5 to 22.5, etc?

The "right" answer is more philosophy than statistics. Luckily, on most

data sets, I don't think it will make any real difference which way it is

done.

SHAR_EOF

if test -f 'Agreg.3'

then

echo shar: over-writing existing file "'Agreg.3'"

fi

cat 'Agreg.3'

Calculated hazard for the "test2" data set.

There are two possible formulations of the hazard. The first is in

reference to the hazard function's time scale. The second is with

reference to a particular subject, who might enter/leave the risk set

multiple times, might change strata, etc, and we want the total hazard that

would be experienced by someone traversing that "path".

In a standard Cox model these are the same, since each person starts

at time zero, has no gaps, and stays in the same strata.

1. Baseline hazard for covariate "x".

Let B= beta, s=exp(B)

Start Stop Event x #at risk 1/(dLambda) xbar

1 2 1 1 2 s+1 s/(s+1)

2 3 1 0 3 s+2 s/(s+2)

5 6 1 0 5 3s+2 3s/(3s+2)

2 7 1 1 4 3s+1 3s/(3s+1)

1 8 1 0 4 3s+1 3s/(3s+1)

7 9 1 1 5 (3s+2)/2 3s/(3s+2)

3 9 1 1 "

4 9 0 1 "

8 14 0 0 2 0 0

8 17 0 0 1 0 0

At times 2,3,6,7,8,9 the baseline hazard is

H = cumsum(dLambda), or

H = cumsum( 1/(s+1), 1/(s+2), 1/(3s+2), 1/(3s+1), 1/(3s+1), 2/(3s+2) )

The desired hazard is exp(Bx) * H

The "Greenwood" term in the variance is similar to H, except square all

of the denominators, i.e., G= cumsum( 1/(s+1)^2, ...)

The increments to d are (xbar-x) dLambda.

The total variance is exp(2Bx) * (G + dId')

B=0, x=0 (Survival = exp(-H))

Time Survival Var(log surv) dId' portion

2 .60653 .289880 .03987948

3 .43460 .444632 .08320533

6 .35582 .548804 .14769310

7 .27711 .748855 .28524405

8 .21582 .993771 .46765942

9 .14467 1.372728 .76661724

B=-.0845293, x=.5

Time Survival Var(log surv) dId' portion/ s

2 .6068014 .2496241 .0000763297

3 .4369373 .3606177 .003492431

6 .3571892 .3999496 .002099579

7 .2767459 .4631417 .000012849

8 .2144194 .5308199 .002807969

9 .1432922 .6152773 .006326855

It is a bit of a surprize just how much difference it can make 'where'

the expected survival curve is computed.

2. Baseline survival for someone with a history. A practical example where

one might wish to do this is hard to come by, particularly so with

time dependent covariates. One problem is that it is all too easy to construct

a covariate/followup/strata history that would never occur with a 'real'

subject, so that the resulting curve, though mathematically accurate, is a

useless summary.

Nevertheless---- Let this be the patient's time/covariate pattern

Patient time hazard time x

----------- ----------- ---

0-3 0-3 0

3-6 - - not at risk for event

6-10 6-10 1

10-15 0-5 0 (some intervention 'restarts'

his hazard pattern)

15-25 5-15 2

Time Inc in cum haz Inc in Greenwood var Inc in d

--- -------------- -------------------- ----------------

2 H[1] g[1] (xbar[1]-0)*H[1]

3 H[2] g[2] (xbar[2]-0)*H[2]

7 H[4]*s g[4]*s^2 (xbar[4]-1)*H[4]*s

8 H[5]*s g[5]*s^2 (xbar[5]-1)*H[5]*s

9 H[6]*s g[6]*s^2 (xbar[6]-1)*H[6]*s

12 H[1] g[1] (xbar[1]-0)*H[1]

13 H[2] g[2] (xbar[2]-0)*H[2]

16 H[3]*s^2 g[3]*s^4 (xbar[3]-2)*H[3]*s^2

17 H[4]*s^2 g[4]*s^4 (xbar[4]-2)*H[4]*s^2

18 H[5]*s^2 g[5]*s^4 (xbar[5]-2)*H[5]*s^2

19 H[6]*s^2 g[6]*s^4 (xbar[6]-2)*H[6]*s^2

In the above, H and g refer to the increments in hazard and in the

Greenwood variance found in part 1 above. One feature to be noted is that

with a time dependent "target" value of x, we can no longer factor the

exp(B*x) term out of the sum of increments.

SHAR_EOF

if test -f 'Agreg.efficiency'

then

echo shar: over-writing existing file "'Agreg.efficiency'"

fi

cat 'Agreg.efficiency'

The agreg function can do both time dependent and multiple event Cox

models. It is modeled on the martingale notation of Anderson and Gill, and

hence the name. In the martingale notation, one has for each subject 3

data objects. N(t) is the cumulative number of events experienced by the

subject, X(t) is the time-dependent covariate vector, and Y(t) is an

indicator function which is 1 whenever the subject is at risk. In the usual

way of martingales and gambling schemes, Y can depend in any way on X and N

as long as it does not look into the future.

In this implimentation, we constuct an X matrix of constant values along

with vectors start, stop, and event. Each subject is entered as multiple

rows of the X matrix. Each row represents an interval (start, stop] over

which the time dependent covariates were constant, and event is a 0/1

indicator with event==1 if an event occured at time "stop". The time interval

is open at the left and closed on the right.

The program can handle multiple events and/or missing intervals of risk.

For example, consider a study where the 'event' is readmission to the

hospital. A patient can have multiple re-admissions. During the period just

after an admission, however, the patient is not at risk (while they are still

in the hospital). Each period of time outside of the hospital would appear

as a separate row of data in x.

The more common way to deal with time dependent cox models is to have

a computation occur at each death time, for example, BMDP and the new SAS

PHREG do this. One advantage of that proceedure over this one is the

ability to code continuously time-dependent covariates: our method only

accomodates step functions. (Giving every subject a separate line of data

per death time in the entire data set would work, so technically we can do

continuous variables, but the task of setting up said data matrix is

daunting). However, our method has several advantages over the BMDP strategy:

1) The ability to handle multiple events per subject, and intervals

without observation.

2) The ablitily to tie the time scale down to something other than

patient entry to the study, i.e., patients who don't start at time zero. For

instance, assume that you were doing a study on factory workers, and that you

wanted to model a baseline hazard that was tied to calendar time rather than

employee exposure time, perhaps there were changes in the equipment or some

such and you expect the baseline hazard to jump on March 1.

An example I have worked with involved Parkinson's disease and

L-Dopa therapy. The natural time 0 for each subject is the time of diagnosis,

so that the baseline hazard tracks the natural history of the disease. But

patients didn't enter the study until they came to our facility for care,

usually through referral. In the BMDP scheme, one might attempt to code this

as t=time from Dx, with a time dependent variable X1(t) = "has come to Mayo".

The fit will show that referral is the worst thing that can happen to you,

since no one dies until they come here!

The issue of "aligning" subjects properly on the hazard time scale has

not gotton much attention in the past, and deserves some thought.

3) Time dependent strata are easily coded. If someone changes strata

at time s they are broken into two observations at that time point. The

second observation can even be restarted in time to start at 0 if you

wish. For instance, if the strata change corresponds to some strong

intervention therapy, then the natural risk set in the second strata might

be based on "time since intervention" rather than "time since entry to study".

4) When only a few subjects have time dependent variates, this method can

be much more efficient. In a recent study here with multiple lab tests,

the proportion with 1,2, etc tests was geometrically decreasing. Only 2

patients had 5 values. Thus most patients had only 1 line of data in x.

Let r(t) be the number of subjects in the risk set at death time t, p be

the number of covariates, q the number of time-dependent covariates, and

s(t) the number of rows in this strata (when set up the agreg way). The

agreg calculation has at each death time a search over s(t) terms with a

sum over r(t) of them, the BMDP calculation has a sum over r(t) terms, each of

which requires q calls to the computation subroutine. So my total time is

O(p*p* E(r)) + O(E(s)), BMDP's is O(p*p* E(r)) + O(q* E(r)), each times the

number of events. If the subroutine is at all complex agreg wins.

As a footnote, I was really surprized the first time I ran agreg on a

stratified data set: it was faster than I expected! When strata are

introduced the program spends less time searching out whom is part of the

current risk set. Without strata it has to look over the entire data set,

with strata it has to look only over the current group.

Terry T.

SHAR_EOF

if test -f 'Coxreg.1'

then

echo shar: over-writing existing file "'Coxreg.1'"

fi

cat 'Coxreg.1'

Examples that test out the Cox and agreg programs.

1) Here is a made up data set for Cox reg, where all of the results have been

worked out exactly.

Time Status x

1 1 1

1 0 1

2 1 1

2 1 0

3 0 0

4 1 0

Define B= beta (since I don't have Greek chars), and s=exp(B)

loglik(B) = 2B - log(3s+3) - 2log(s+3)

U = derivative of loglik with respect to B

= (6 + 3s - s^2)/ ((s+1) (s+3))

I = information matrix = s/ (s+1)^2 + 6s/(s+3)^2

Actual solution is at U=0, or s= .5*( 3 +- sqrt(33) )

s= 4.3722813....

beta-hat = log(s) = 1.4752849

loglik(0) = -4.56435

loglik(B) = -3.8247495

I(B) = .6341681

Iteration is B= 0, 1.6, 1.4727236, 1.4752838, ...

Let a=(s+1)*(3s+3), and b=(s+3)^2.

Then the score residuals are (2s+3)/a

-s /a

-s/a + 3(3-s)/b

s/a + s(s+1)/b

s/a + 2s/b

s/a + 2s/b

Score residuals at B=0 are (1/24) * (10, -2, 7, -1, 5, 5)

Martingale residuals are 1- s/(3s+3)

0- s/(3s+3)

1- s( 1/(3s+s) + 2/(s+3) )

1- ( 1/(3s+s) + 2/(s+3) )

0- ( 1/(3s+s) + 2/(s+3) )

1- s( 1/(3s+s) + 2/(s+3) +1)

Shoenfeld residuals are Time=1: 1/(s+1)

2: (3-s)/(3+s)

4: 0

2. A worked agreg example

Start Stop Event x #at risk 1/(dLambda)

1 2 1 1 2 s+1

2 3 1 0 3 s+2

5 6 1 0 5 3s+2

2 7 1 1 4 3s+1

1 8 1 0 4 3s+1

7 9 1 1 5 (3s+2)/2

3 9 1 1

4 9 0 1

8 14 0 0 2 0

8 17 0 0 1 0

loglik = 4B - ln(s+1) - ln(s+2) - 3ln(3s+2) -2ln(3s+1)

loglik(0) = -9.3926619, loglik(solution) = -9.387015

U = 4 - (63s^4 + 201s^3 +184s^2 + 48s)/(9s^4 +36s^3 +47s^2 +24s +4)

solution at s=30834667/33554432 = .9189446, thanks to Macsyma

beta-hat = log(s) = -.0845293

I = s/(s+1)^2 + 2s/(s+2)^2 + 6s/(3s+2)^2 + 3s/(3s+1)^2

+ 3s/(3s+1)^2 + 12s/(3s+2)^2

I(0) = 2821/1800 = 1.56722222222....

Score residuals at B=log(2), or s=2

1/9, -3/8, -21/32, -165/784, -2417/14112, 33/392, -15/784, -211/784,

3/16, 3/16

Score residuals at B=0

1/4, -2/9, -12/25, -413/3600, 191/1800, 71/400, 7/200, -73/200, 6/25,

6/25

Shoenfeld residuals Time=2 : 1/(s+1)

3 : -s/(s+2)

6 : -3s/(s+2)

7 : 1/(3s+1)

8 : 3s/(3s+1)

9 : 4/(3s+2)

Martingale residuals: Let a,b,c,d, e, and f be the increments of

dLambda listed above, i.e., a= 1/(s+1). Then the residuals are (event - ?)

where ? = a *s

b

c

(b + c + d) *s

a + b + c + d + e

(e + f)*s

(c + d + e + f) *s

(c + d + e + f) *s

f

f

SHAR_EOF

if test -f 'Coxreg.2'

then

echo shar: over-writing existing file "'Coxreg.2'"

fi

cat 'Coxreg.2'

1. Assume that at time t there are d tied deaths. Then the Breslow

approximation to the likelihood is

w[1] w[2] w[d]

----- * ----- * ... * -----

sum(w) sum(w) sum(w)

Here I have numbered the people in the risk set from 1 to n, and made sure

that the d deaths are in positions 1 to d. W is the vector of risk weights

w[i] = exp(beta %*% x[i,])

Let m = the sum of the scores for the deaths = sum(w[1:d]). The Efron

approximation is

w[1] w[2] w[3]

------- * ----------- * ------------ * ....

sum(w) sum(w) - m/d sum(w) - 2m/d

This is equivalent to using new weights at each of the death times within

a tie! Multiply each of the old weights by a factor (1+ 1/d - j/d) to get

the new jth term. Since this downweighting is not a function of beta, none

of the derivatives wrt beta changes in its form, e.g., I is still a sum

of weighted covariances, but I have new weights.

For the canonical coxreg example, at time 2, the weights are

first death: s, 1, 1, 1 (as before)

second : s/2, 1/2, 1, 1

The third term of U is (0 - s/(s+5))

The third term of I is (s/(s+5)) - (s/(s+5))^2

The new solution is s^3 - 23s -30 =0, so beta ~1.6768575

2. Exact partial likelihood. The term is

w[1] * w[2] * ......w[d]

-------------------------

sum(prod(w[d choose n]))

The denominator is the sum over all subsets of size d, and the number inside

the sum is the product of the weights from that subset. For the

coxreg example, at time 2, the likelihood term is now

s * 1 s

------------ = ---

s*1 + s*1 + s*1 + 1*1 + 1*1 + 1*1 3s +1

And the maximum occurs at beta= infinity.

If the problem is changed to "x=c(1,1,2,0,0,0)", the rest staying the

same, then beta=.961687.

-----------------

Residuals under the Efron approximation.

Martingale beta =0 beta-hat

1 - s(1/3s+3) 5/6 .7191708

0 - s(1/3s+3) -1/6 -.2808092

1- (s/3s+3 + s/s+3 + s/s+5) 5/12 -.4383401

1- (1/3s+3 + 1/s+3 + 1/s+5) 5/12 .7310864

0- (1/3s+3 + 1/s+3 + 2/s+5) -3/4 -.365544

" "

Schoenfeld

1/s+1 1/2 .1575124

1- s(s+4)/((s+3)(s+5)) 19/24

0- "" 5/24 -.5787554

Score (I hated this one)

(2s+3)/((s+1)(3s+3)) 5/12 .1132783

-s /((s+1)(3s+3)) -1/12 -.04423407

(675+1305s +756s^2 -4s^3 -79s^4 -13s^5)/

(3(s+1)^2 (s+3)^2 (s+5)^2)) 55/144 -.1029196

(147s -9s^2 -200s^3 -140s^4 -35s^5 -3s^6)/

(3(s+1)^2 (s+3)^2 (s+5)^2)) -5/144 -.4078401

2s (177 + 282s + 182s^2 + 50s^3 + 5s^4)/

(3(s+1)^2 (s+3)^2 (s+5)^2)) 29/144 .2208585

ditto

-------

Residuals under the exact likelihood:

Not in my lifetime.

SHAR_EOF

if test -f 'Hazard.cox'

then

echo shar: over-writing existing file "'Hazard.cox'"

fi

cat 'Hazard.cox'

Here is a simple example of the hazard and variance of the hazard, worked

out by hand.

Time Status x

1 1 1

1 0 1

2 1 1

2 1 0

3 0 0

4 1 0

Define B= beta (since I don't have Greek chars), and s=exp(B)

Now, the weights for each subject are exp(B*x) = (s,s,s,1,1,1), giving the

hazard and cumulative hazard of

Time Hazard Cum hazard

1 1/(3s+3) 1/(3s+3)

2 2/(s+3) " + 2/(s+3)

3 0 " + " + 0

4 1/1 " + " + " + 1

This estimated survival for a subject with x=0 is just exp(-1* cum hazard).

For a subject with arbitrary x, multiply the cumulative hazard by exp(B*x).

The variance of the cumulative hazard is the sum of two terms. Term 1

is a natural extension of Greenwood's formula to the case where there are

weights. It is a running sum, with an increment at each death time of

#deaths/ (sum of weights, for those still at risk)^2

Another reasonable Greenwood term would have r(r-d) instead of r^2 in the

denominator, where r=sum of weights over the risk set and d=sum of weights for

the deaths (though I have seen no one propose this in a paper). For an x other

than zero, the weights are relative to the chosen x, which has the effect

of multiplying the term by exp(2*B*x).

The second term is dId', where I is the variance-covariance matrix of the

Cox model, and d is a vector. The second term accounts for the fact that the

weights themselves have a variance; d is the derivative of S(t;x) wrt beta,

and can be formally written as

exp(B*x) \int_0^t (\bar z(s) - x) d\hat\Lambda_0 (s)

where z is the covariate set that gave rise to the estimated hazard, and

\bar z(s) is the usual Cox model term "weighted mean of z at time s".

This can be recognized as -1* the score residual process, which measures

leverage of a particular observation on the estimate of beta. It is

intuitive that a small score residual --- an obs with such and such

covariates has little influence on beta --- results in a small added

variance, i.e., beta has little influence on the estimated survival. This

is the score residual process if the 'psuedo subject' has no event, so the

correspondence with the actual score residuals (for the data at hand) is

not perfect.

For x=0, the ith element of d increases at each death time by

(weighted mean of ith covariate) * #deaths/(sum of weights in risk set)

The weighted mean is computed over those still in the risk set.

Time Term 1

------ --------

1 1/(3s+3)^2

2 " + 2/(s+3)^2

3 " + "

4 " + " + 1/(1^2)

Time d

----- ----

1 (s/(s+1))* 1/(3s+3)

2 " + (s/(s+3))* 2/(s+3)

3 " + "

4 " + " + 0 * 1

For this data set, I = s/(s+1)^2 + 6s/(s+3)^2; the variance matrix is I

inverse.

For B=0, x=0

Time Variance

1 1/36 + 1.6*(1/12)^2 = 7/180

2 (1/36 + 2/16) + 1.6*(1/12 + 2/16)^2 = 2/9

4 (1/36 + 2/16 + 1) + 1.6*(1/12 + 2/16 + 0)^2 = 11/9

For B=1.4752849, x=0

Time Variance

1 .0038498 + .004021 = .007871

2 .040648 + .0704631 = .111111

4 1.040648 + .0704631 =1.111111

For B=0, X=1

The cum hazard is s*old, the first term of the variance is s^2*old.

The new d has increments of

The variance for a posited subject with arbitrary x is most easily

calculated by first recentering the covariate matrix at that value, and

then computing as in the above. But if you want to do it the other way---

Assume you want the hazard and survival for a subject with covariate

vector c. Let r = exp(B'c). Then the new hazard = r * old hazard. The

first term in the variance = r^2 * old first term in the variance. The

new d = r*(old d - c* Cum hazard). This last term is surprizing at first;

I multiply the old hazard by a constant r, and the variance does not go up

by r^2.

-----------------

Changes due to the Efron approximation.

At time 2, the two deaths now experience a hazard of 1/(s+3) + 1/(s+5),

and the "survivors" at time 2 experience a hazard of 1/(s+3) + 2/(s+5).

The basic idea here is that each of the tied deaths experiences half of the

second hazard jump, so that the identity sum(deaths) = sum(hazard) remains

in force. For curve estimation we have

Time Hazard Cum hazard

1 1/(3s+3) 1/(3s+3)

2 1/(s+3) +2/(s+5) " + 1/(s+3) + 2/(s+5)

3 0 " + " + " +0

4 1/1 " + " + " + 1

The variance formulae extend by noticing that the "Greenwood" term is

the sum of (hazard increment ^2), when you treat a tied death as d separate

hazard increments. The same logic holds for the second term in the hazard.

Time Term 1

------ --------

1 1/(3s+3)^2

2 " + 1/(s+3)^2 + 4/(s+5)^2

3 " + " + "

4 " + " + " + 1/(1^2)

The weighted means at time 2 are now s/(s+3) and s/(s+5).

Time d

----- ----

1 (s/(s+1))* 1/(3s+3)

2 " + (s/(s+3))* 1/(s+3) + (s/(s+5))* 2/(s+5)

3 " + "

4 " + " + 0 * 1

For this data set, I = s/(s+1)^2 + 3s/(s+3)^2 + 5s/(s+5)^2; the variance

matrix is I inverse.

For B=0, x=0 c== I inverse = 144/83 ~ 1.7349

Time Variance

1 1/36 + c(1/12)^2 = 119/2988

2 (1/36 + 1/16 + 4/25) + c(1/12 + 1/16 + 1/18)^2 = 1996/6225

4 (1/36 + 1/16 + 4/25 +1) + c(1/12 + 1/16 + 1/18 +0)^2 = 1 + "

For B=1.676857, x=0

Time Variance

1 .00275667 + .00319386 = 0.0059505

2 .05445330 + .0796212 = 0.134075

4 1.05445330 + .0796212 = 1.134075

SHAR_EOF

if test -f 'Install'

then

echo shar: over-writing existing file "'Install'"

fi

cat 'Install'

First-- a note on my philosophy. I can't ensure that this will work for

all installations or machines, and I won't even try. The instructions below

work for me, on my machines. I am quite willing to give help if these don't

work for you, but make no guarrantees about the efficacy of the same.

-----------------------------

Dynamic loading:

0. Make a directory SHOME/library/survival, and place the code there.

cd SHOME/library/survival

sh < /wherever_you_have_put_it/survival3

"SHOME" refers to the S home directory. This is where you want to

install things so that all S users can see them.

1. In that directory:

mkdir .Data

mkdir .Data/.Help

touch .Data/.Audit

chmod a-w .Data/.Audit

The two lines about the .Audit file are optional-- but I assume that you

don't want to waste a bunch of space with an audit file of the installation

process.

To save some space, you may want to delete those rate tables (for

expected survival) that are irrelevant to your geographic needs:

survexp.az.s Arizona

survexp.azr.s Arizona (by race)

survexp.fl.s Florida

survexp.flr.s Florida

survexp.mn.s Minnesota

survexp.mnwhite.s Minnesota

survexp.wnc.s West-North-Central US

survexp.us.s US total

survexp.usr.s US

survexp.uswhite.s US white

2. If you are running Splus:

a. Remove my Makefile

b. Type "Splus CHAPTER *.c *.d *.s" to make a nice new one

that is optimized for your machine.

c. If you are in subdirectory "xxx" then this new Makefile will

create an object xxx_l.o. Note that CHAPTER will fail if your

directory name has punctuation marks in it (like ".").

c. READ the first portion of the new Makefile. In particular,

find the comments on static versus dynamic loading and set things

up to choose dynamic loading.

d. See the manual page for "library" for instructions on.

e. type "make dyn.load".

3. If you are running ATT S instead of Splus there are 2 choices:

a. They don't recognized ".s" extensions. So do

cat *.s > all.q

S CHAPTER all.q *.c *.d

Then proceed pretty much as before.

b. Try using my Makefile. (I don't use CHAPTER locally because

I need to interface with SCCS). Find the line that defines

"SPLUS = ../../Splus", and change it to point to the actual

shell script. (This is probably ../../S or /usr/local/bin/S).

If your system doesn't have the Unix "install" command, then

find the string "install -m0444" in the Makefile, and change it

to "cp".

The makefile won't work for Ultrix, and perhaps for other systems

as well, as it uses many Sun "template" extensions.

4. Anyone who want to use the code will have to type the following

lines, every time they use S:

library(survival, first=T)

library.dynam("survival", "survival_l.o")

4a. You might want to make a local function out of the lines in (2),

say "newsurv", and place it on the default S search path.

4b. Users of Splus (at least) have 2 other options.

i. In this directory, exectute the "First.lib" function

% Splus

> source("First.lib")

> q()

Now your users only have to type the first line above, i.e.,

library(survival, first=T)

and can skip the second one.

ii. Do what we do at Mayo, and install an local startup file. A copy

of ours in here as First.local. Use it as a template for your own.

% Splus

> source("First.local")

> First.local() #test it!

> q()

% mv .Data/First.local SHOME/splus/.Functions/.First.local

4c. The "first=T" argument is important, since I override 4 of the

standard S functions.

5. You may want to move the Examples directory to another place, or

remove it.

------------------------------

To test the routines, cd to the Examples directory. Be aware that many

of my tests depend on the "date" functions, also available from statlib.

SHAR_EOF

if test -f 'Mart.residual'

then

echo shar: over-writing existing file "'Mart.residual'"

fi

cat 'Mart.residual'

Hopefully you, the reader, know TeX, because that is how the formulas

in this document are written. Notation is that of the Biometrika paper

(1990) ``Martingale Based Residuals for Survival Data''.

\def\bhat{\hat \beta} %define "bhat" to mean "beta hat"

\def\Mhat{\widehat M} %define "Mhat" to mean M-hat

\def\zbar{\bar Z}

\def\lhat{\hat \Lambda}

Let $Y_i(t)$ be the indicator that a given subject is at risk and under

observation at time t, $Z_{ij}(t)$ be the jth covariate of the ith person

(possibly time dependent), and $\bhat$ be the Cox or Agreg estimate of

beta. Let $N_i(t)$ be the step function for the ith subject, which counts

the number of ``events'' for that subject up to time t. Define $w_i(t)$ to

be $exp^{\bhat ' Z_i(t)}$, i.e., the risk score for the ith subject after

the fit is finished. Then the Breslow (or Tsiatis or Link) estimate of the

baseline hazard is

$$ \lhat(t) = {\int_0^n {\sum_{i=1}^n dN_i(s)} \over

{\sum_{j=1}^n Y_i(s)w_i(s)}} .$$

The martingale residual at time t is

$$ \Mhat_i(t) = N_i(t) - \int_0^t w_i(s) Y_i(s) d \lhat(s). $$

The program returns the residual at $t = \infty$.

For non-time-dependent covariates, the $w_i$ term can be factored out of the

integral, giving $w_i \lhat(t)$.

The deviance residual is a normalizing transform of the martingale residual

$$ deviance_i = sign(\Mhat_i) * \sqrt{-\Mhat_i - N_i log((N_i - \Mhat_i)/N_i)}$$

The other two residuals are based on the score process $L_{ij}(t)$ for the

ith subject and the jth variable:

$$ L_{ij}(t) = \int_0^t (Z_{ij}(s) - \zbar_j(s)) d \Mhat_i(s),$$

where $\zbar_j(s)$ is defined as the weighted mean (using $w_i$'s) of the

jth covariate, over the subjects still in the risk set at time s.

The score residual is then defined, for each subject and each variable

(an n by p matrix) as $L_{ij}(\infty)$. It is the sum of the score process

over time.

The other useful transform is to sum the score process up over indivduals

to get a total score process $L_j(t)$. This is just the score vector at

time t, so that at $\bhat$ we must have $L_j(0) = L_j(\infty) = 0$.

Because $\lhat$ is discreet, our estimated score process will also be

discreete, having jumps at each of the unique death times. Residual type

4 is a p column matrix with one row per jump. To get the ``Brownian bridge''

plots found in the technical report, just use the cumsum function on the

returned value.

There are two simplifying identies for the type 4 residuals:

$$ \sum_i L_{ij}(t) = \sum_i \int_0^t Z_{ij}(s) d \Mhat_i(s) $$

$$ \sum_i L_{ij}(t) = \sum_i \int_0^t (Z_{ij}(s) - \zbar_j(s)) d N_i(s) $$

The second of these is equivalent to a residual proposed by Schoenfeld.

Since the sums are the same for all $t$, each increment must be the same.

Our program uses the first identity, as $d \Mhat$ seems to be less work to

compute than $\zbar$. Note that $d \Mhat_i(t)$ is zero when subject i is

not in the risk set at time t.

Residuals 1, and 3 are integrals over time of some object. One

consequence of this is that they work well for the Anderson-Gill

program also. Specificly, in setting up agphm, a single subject is broken

up into multiple lines of the input data, as though he were a set of

different individuals observed over disjoint times. After the agphm

program is finished, the residual for that person is just the sum of the

residuals for these ``psuedo'' subjects. This property is not true for the

deviance residual, however.

Residual 4 works the same in both models, since it is reported at the

unique death times, and not for individuals at all.

SHAR_EOF

if test -f 'Notes'

then

echo shar: over-writing existing file "'Notes'"

fi

cat 'Notes'

Predicted survival after a Cox model

1. Time dependent covariates are near impossible.

a. I need to know who is who. The cohort survival for a study of

sized 2 where $\beta$=large and $x_1(t)=0$, $x_2(t)=1$, is much

different than if subjects 1&2 interchange covariates after a

short term.

In the first case, the overall survival is large -- there is >50%

chance that subject 1 is still alive. In case 2, they are both dead

after a while, since both have had a period of high risk. But the

input files for the two cases are identical!

b. Conceptually, things are undefined if someone has an interval

without risk.

c. For both the Ederer and Hakulinen estimates, I have to predict the

covariate beyond the time of measurement . The conditional estimate

only estimates a difference, not a true survival, so I'd be better

off to plot the time-dep coefficient.

SHAR_EOF

if test -f 'Param'

then

echo shar: over-writing existing file "'Param'"

fi

cat 'Param'

Since the density of a dist with small \sigma can be >1, the loglik for

the saturated model may be >0. Thus the deviance may be more than 2*LL.

SHAR_EOF

if test -f 'ToDo'

then

echo shar: over-writing existing file "'ToDo'"

fi

cat 'ToDo'

survreg: Add predict and plot methods.

Figure out how to use/expand the glm "family" concept to survreg

Impliment the residual tests of Escobar and Meeker PARTIAL

Add Gaussian, generalized gamma, and generalized F dist PARTIAL

Make sure that survtest uses the Left continuous KM estimate! DONE

Add the survival weights of Lin's paper to the program. NOT NEEDED, see paper

with Pat Grambsch.

How do caseweights propogate to the residuals calculation? THEY DON'T

How do risk weights propogate? Just an offset term.

Can't put a survival object inside a data frame. (You can with model.frame,

but then print.data.frame goes nuts). FIXED

Suspect that the plot of null agreg residiuals, not collapsed on subject,

versus the x variate will correctly give the functional form for a time

dependent covariate.

Understand Dave Harrington's deviance residuals arguments.

Add a 'mode' argument to the regression routines. If set to 'single', then

the X matrix is kept in single precision only. Saves a lot of

storage for Frank, et. al.

Allow singular matrices if singular.ok=T. Requires a change to qr style

calculations? DONE-- via a change to cholesky routines.

Add caseweights to coxreg and surv.fit.coxreg.

Just how to do this for the Efron approx is highly unclear. DONE

Does my "na.action" option override an na.action attribute on a data frame? NO

Should or should not missing be included as a strata in surv.fit and coxreg.

ADDED AS AN ARG TO "strata()"

test the singular=ok option for weibull & company

Remove the "infinite beta" check from coxreg. DONE

Allow multiple strata statements. DONE

Set up the "methods" based NA actions. DONE

Allow a strata argument to surv.exp DONE

Test the use of expected rather than observed information in survreg. TRIED -

didn't work out, and better starting estimates obviate the need.

Expand survdiff to rho-gamma tests.

Add frailty to coxph.

Update model.newframe to use hidden transformation parameters.

Individual survival estimates from predict. PARTIAL

Allow the newdata in survfit.coxph to contain only a subset of the values,

and use the means for the rest.

For a factor, default to the first level rather than the mean.

In print.survfit, warn if there is truncation? In this case the estimate of

the mean is problematic.

Hazard ratio output for survreg.

plot methods for coxph

Add a column "number of censored" to Kaplan-Meier printout. For left

truncated data (ie agreg), we need to print the number at risk more

often, which means that essentially the same column is added.

If there is a strata*covariate interaction term in a Cox model, then

the survfit method for the model gets very confused.

Add the Hakulinen method for expected survival post Cox model.

Realize that I will never get all of this done.

SHAR_EOF

if test ! -d 'latex'

then

mkdir 'latex'

fi

cd 'latex'

if test -f 'S.sty'

then

echo shar: over-writing existing file "'S.sty'"

fi

cat 'S.sty'

% S book functions

%

% Example environment: for S examples -- like verbatim except that

% you can insert special things by using \ commands and braces (which

% then themselves have to be escaped)

%

% \s inserts a Postscript symbol that stands for the 1988 S book in references

%

% The next few lines change the text width:

\setlength{\textwidth}{5in}

\setlength{\evensidemargin}{1in}

%

%the default width corresponds to

%\setlength{\textwidth}{4.5in}

%\setlength{\oddsidemargin}{.5in}

%\setlength{\evensidemargin}{1.5in}

%

% some functions for avoiding lousy pagebreaks

\def\need#1 {\vskip 0pt plus #1 \penalty -200 \vskip 0pt plus -#1}

\def\Exampleskip{{\widowpenalty=200\need 20pt }}

%

% \Sfigure{filename} inserts a figure using psfig

\def\docspecials{\do\ \do\$\do\&%

\do\#\do\^\do\^^K\do\_\do\^^A\do\%\do\~}

\def\alltt{\trivlist \item[]\if@minipage\else\vskip\parskip\fi

\leftskip\@totalleftmargin\rightskip\z@

\parindent\z@\parfillskip\@flushglue\parskip\z@

\@tempswafalse \def\par{\if@tempswa\hbox{}\fi\@tempswatrue\@@par}

\obeylines \small \tt \catcode``=13 \@noligs \let\do\@makeother \docspecials

\frenchspacing\@vobeyspaces}

\let\endalltt=\endtrivlist

\newenvironment{Example}{\begin{list}{}{\setlength{\leftmargin}{\parindent}}\item

\begin{alltt}}{\end{alltt}\end{list}}

%%following two lines toggle on/off marginal notes

%%\newcommand{\Marginpar}[1]{\marginpar{\small\em #1}}

\newcommand{\Marginpar}[1]{}

\newcommand{\Co}[1]{{\small\tt #1}}

% some commands from Doug Bates for nonlinear chapter

\newcommand{\E}[1]{{\rm E}\left[ #1 \right]}

\newcommand{\boeta}{\mbox{\boldmath$\eta$}}

\newcommand{\bbeta}{\mbox{\boldmath$\beta$}}

\newcommand{\bG}{\bf G}

\newcommand{\bX}{\bf X}

\newcommand{\by}{\bf y}

\newcommand{\bz}{\bf z}

\newcommand{\trans}{^{\rm T}}

\newcommand{\btheta}{\mbox{\boldmath$\theta$}}

\newcommand{\logit}{\mathop{\rm logit}\nolimits}

%\input{psfig}

%\newsavebox{\theSbook}

%\savebox{\theSbook}{\hbox{

% \lower 2.2pt\psfig{figure=/usr/jmc/book2/bigS.ps,height=11pt,silent=}

%\kern -1.5pt

% }}

%\newcommand{\s}{\usebox{\theSbook}}

\newcommand{\Gets}{\(\leftarrow\)}

% commands for figures

%\newlength{\figurewidth}

%\setlength{\figurewidth}{\textwidth}

%\addtolength{\figurewidth}{2\parindent}

\newcommand{\Sfigure}[1]{\psfig{figure=#1,width=\textwidth,silent=}}

\newcommand{\littlefigure}[1]{\psfig{figure=#1,width=2.4in,silent=}}

\newcommand{\IntroChapter}{1}

\newcommand{\ModelChapter}{2}

\newcommand{\DataChapter}{3}

\newcommand{\LinearChapter}{4}

\newcommand{\DesignChapter}{5}

\newcommand{\GlmChapter}{6}

\newcommand{\GamChapter}{7}

\newcommand{\SmoothChapter}{8}

\newcommand{\TreeChapter}{9}

\newcommand{\NonlinChapter}{10}

\makeatletter

\def\@makechapterhead#1{{\def\\{,}\addtocontents{toc}{{\@author}} }

\vspace*{50pt} { \parindent 0pt \raggedright

\ifnum \c@secnumdepth >\m@ne \huge\bf \@chapapp{} \thechapter \par

\vskip 20pt \fi \Huge \bf #1\par

\vskip 1em {\large \lineskip .75em

\begin{tabular}[t]{l}\@author

\end{tabular}\par}

\nobreak \vskip 40pt }

}

\long\def\@makecaption#1#2{

\small\def\small{\footnotesize}

\vskip 10pt

\setbox\@tempboxa\hbox{ #1: {\it #2}}

\ifdim \wd\@tempboxa >\hsize { #1: {\it #2}}\par \else \hbox

to\hsize{\hfil\box\@tempboxa\hfil}

\fi}

\makeatother

\newcommand{\Twiddle}{\mbox{\(\tt\sim\)}}

\newcommand{\twiddle}{{\Twiddle}}

\newcommand{\Dollar}{{\$}}

\newcommand{\Hat}{\mbox{\boldmath\({}^{\wedge}\)}}

%\newcommand{\DraftHeading}{\chaptermark{\copyright AT\&T: DRAFT \today}}

\newcommand{\DraftHeading}{}

\newcommand{\Plus}{\mbox{ \tt + }}

\newcommand{\Star}{\mbox{ \tt * }}

\newcommand{\Colon}{\mbox{ \tt :\,}}

\newcommand{\Divide}{\mbox{ \tt / }}

\newcommand{\In}{\mbox{ \tt \%in\% }}

\newcommand{\Minus}{\mbox{ \tt - }}

\newcommand{\given}{\;\vert\;}

\newcommand{\var}{{\rm var}}

\newcommand{\ev}{{\it E}}

\newcommand{\dev}{{\it D}}

\newcommand{\AIC}{{\it AIC}}

\newcommand{\tsum}{\mathop{\textstyle\sum}}

\newcommand{\norm}[1]{\left\Vert#1\right\Vert}

\newcommand{\abs}[1]{\left\vert#1\right\vert}

\renewcommand{\vec}[1]{\mbox{\boldmath $#1$}}

\newcommand{\df}{{\rm df}}

\newcommand{\NEWS}{{\s}}

\newcommand{\co}[1]{\Co{#1}}

%These next two are to do with placement of figures; see page 178 of Latex Manual

\renewcommand{\topfraction}{.75}

\renewcommand{\bottomfraction}{.75}

%

%

%This is for the S HELP file documentation

%

%

%

\newcommand{\Paragraph}[1]{

\item[{\small\uppercase{#1}}]\ \\

}

\newcommand{\Title}[2]{

\psfig{figure=gray.ps,rheight=0bp,rwidth=0bp,width=\textwidth,height=.375in,silent=}

\index{#1 (Helpfile)}

\vbox to .375in{\vfill

\hbox to \textwidth {\ {\bf #1}\hfill #2\hfill {\bf #1}\ }

\vfill

}

\markboth{\hskip 1pc{\bf #1}\hfill \uppercase{Appendix B}}{\uppercase{ S Functions and Classes}\hfill{\bf #1}\hskip 1pc}

}

\newenvironment{Helpfile}[2]{\need 2in \Title{#1}{#2}\begin{list}{}{\setlength{\leftmargin}{.58in}\setlength{\labelwidth}{.5in}\setlength{\labelsep}{.08in}}\item}{\end{list}}

\newenvironment{Argument}[1]{\item[{\small\uppercase{#1}}]}{}

\newcommand{\seeTitle}[2]{\need 20pt \Title{#1}{#2}}

%

%

%

\newcommand{\GLIM}{{\sc glim}}

\newcommand{\GENSTAT}{{\sc genstat}}

\newcommand{\FORTRAN}{{\sc fortran}}

\newcommand{\C}{{\sc c}}

\newcommand{\GAM}{{\sc gam}}

\newcommand{\GLM}{{\sc glm}}

\newcommand{\LM}{{\sc lm}}

\newcommand{\UNIX}{{\sc unix}}

\newcommand{\LINPACK}{{\sc linpack}}

\hyphenation{weight-ed data-base para-met-riz-ed Like-li-hood like-li-hood}

\hyphenation{point-wise pre-plot}

SHAR_EOF

if test -f '.gs1.therneau'

then

echo shar: over-writing existing file "'.gs1.therneau'"

fi

cat '.gs1.therneau'

ZZversion 3¨screensize 32x80¨t Tue Nov 29 07:55:53 1994¨8~75¤border^return indentmake window 1¬Readme 12x36 12x36vZª

SHAR_EOF

echo shar: a missing newline was added to "'.gs1.therneau'"

if test -f 'Readme'

then

echo shar: over-writing existing file "'Readme'"

fi

cat 'Readme'

The "official" reference for this document is:

Terry M Therneau, A package for survival analysis in S, Technical Report

number 53, Section of Biostatistics, Mayo Clinic, 1994.

If you have no access to latex we will happily mail you a copy.

Terry Therneau

Section of Biostatistics

Mayo Clinic

Rochester, Minnesota 55905

SHAR_EOF

if test -f 'survival.tex'

then

echo shar: over-writing existing file "'survival.tex'"

fi

cat 'survival.tex'

\documentstyle[S,11pt]{article}

\setlength{\textwidth}{5.75in}

\title {A Package for Survival Analysis in S}

\author{Terry M. Therneau \\

Mayo Foundation }

\date{\today}

\def\bhat{\hat \beta} %define "bhat" to mean "beta hat"

\def\Mhat{\widehat M} %define "Mhat" to mean M-hat

\def\zbar{\bar Z}

\def\lhat{\hat \Lambda}

\def\Ybar{\overline{Y}}

\def\Nbar{\overline{N}}

\def\Vbar{\overline{V}}

\begin{document}

\maketitle

\tableofcontents

\section{Introduction}

Over the last ten years I have been using the S package as a personal tool

for my investigations of survival analysis. This work gained a large

amount of momentum during my joint efforts with Patricia Grambsch

and Tom Fleming on residuals for survival models, some of which is

described in \cite{TGF}.

The set of

routines based on that work has been posted to statlib for several years

under the name ``survival2''

and is also included as a standard part of the S-plus package from

Statistical Sciences. With the advent of the new object-oriented

methods within S, laid

out in the book ``Statistical models in S" by John Chambers and Trevor

Hastie \cite{Smodel}, I became interested in adapting some of these methods to the

survival package.

The execution of this task has turned out to be a substantial effort, but

it afforded a chance to upgrade several of the routines with new features,

and I think the collection has been much improved in both functionality and

ease of use. Substantial opportunities for further

improvements still remain, however.

Section 2 gives a very terse overview of the available commands, without

attempting to explain the various options.

Section 3 contains the formal statististical details for the methods.

Section 4 gives

detailed examples. This is the place to expend most of your reading effort,

at least for early users.

Sections 5, 6 and 7 discuss three separate areas of the

package's interaction with S in general: side effects, missing values,

and conflicts with existing S functions.

There are several good texts on survival analysis; examples are Miller

\cite{Miller} and Kalbfleisch and Prentice \cite{Kalb}. A more

technical treatment of the Cox model, including the important extensions

of time dependent-covariates and multiple events, can be found in

Fleming and Harrington \cite{FandH}, and in Andersen et al. \cite{Ander2}.

My thanks to Sin-Ho Jung, Paul Novotny, and particularly Peter Thall for

helpful comments on an earlier draft of this manuscript.

Frank Harrell continues to provide feedback on the methods, along with

new ideas for improvement. (If I ever find time to implement all of

Frank's suggestions, this will become an {\em amazing} survival package!)

\section{Overview}

The summary below is purposefully very terse. If you are familiar

with survival analysis {\em and} with other S modeling functions it will

provide a good summary. Otherwise, just skim the section to get an overview

of the type of computations available from this package, and move on to

section 3 for a fuller description.

\begin{description}

\item[Surv()] A {\em packaging} function; like I() and C() it doesn't

transform its argument. This is used for the left hand side of all the

formulas.

\begin{itemize}

\item \Co{Surv(time, status)} -- right censored data

\item \Co{Surv(time, s==3)} -- right censored data, a value of 3 = death

\item \Co{Surv(t1, t2, status)} -- counting process data, as in the current

agreg() function

\item \Co{Surv(t1, ind, type='left')} -- left censoring

\item Methods

\begin{verse}

Math.Surv \\

Ops.Surv \\

Summary.Surv \\

{[}.Surv \\

is.na.Surv \\

print.Surv

\end{verse}

\end{itemize}

\item[coxph()] Cox's proportional hazards model.

\begin{itemize}

\item \Co{coxph(Surv(time, status) {\Twiddle}x, data=aml)} --

standard Cox model

\item \Co{coxph(Surv(t1, t2, stat) {\Twiddle} (age + surgery)* transplant)} --

time dependent data.

\item \Co{y 1}r_i}

\right) \ldots,

$$

where ${r_1, r_2, \ldots, r_i}$ are the per subject risk scores.

Assume that the real data are continuous, but the data as recorded have tied

death times. For instance, we might have several subjects die on day 1 of

their hospital stay, but of course they did not all perish at the same

moment.

For a simple example, assume 5 subjects in time order,

with the first two both dying at the same recorded time.

If the time data had been more precise, then

the first two terms in the likelihood would be either

$$

\left(\frac{r_1}{r_1+r_2+r_3+r_4+r_5} \right)

\left( \frac{r_2}{r_2+r_3+r_4+r_5} \right)

$$

or

$$

\left(\frac{r_2}{r_1+r_2+r_3+r_4+r_5} \right)

\left( \frac{r_1}{r_1+r_3+r_4+r_5}\right) \,,

$$

but we don't know which. Notice that the product of the numerators remains

constant, but that of the denominators does not. How do we approximate this?

The Breslow approximation is the most commonly used, as it is the easiest

to program. It simply uses the complete sum $\sum r_i$ for both denominators.

Clearly, if the proportion of ties is large this will deflate the partial

likelihood.

The Efron approximation uses $.5r_1 + .5r_2 + r_3 + r_4 +r_5$ as the

second denominator, based on the idea that $r_1$ and $r_2$ each have

a 50\% chance of appearing in the ``true'' second term. If there were 4

tied deaths, then the ratios for $r_1$ to $r_4$ would be 1, 3/4, 1/2, and

1/4 in each of the four denominator terms, respectively.

Though it is not widely used, the Efron approximation is only slightly more

difficult to program than the Breslow version. In particular, since the

downweighting is independent of $w$ and thus of $\beta$, the form of the

derivatives is unchanged.

There are several ways to approach an ``exact'' calculation. One is to

use the average of the two possible denominators as the denominator for the

second term. This calculation quickly gets cumbersome if the number

of tied subjects $d$ who perish at a given time is at all large, since it

is the average of $d$ terms for the second denominator, ${d \choose 2}$

terms for the third, etc.

Note that if the risk scores for the tied subjects were all equal, then the

Efron approximation agrees precisely with this exact calculation.

Another tack is to use the marginal probability that subjects 1 and

2 both expire before subjects 3, 4 and 5. The form of the likelihood

changes considerably in this case, and the product of terms 1 and 2 is

replaced by

$$

\int_0^\infty \left[1-\exp \left(-\frac{r_1}{r_3+r_4+r_5}\right) \right]

\left[1-\exp \left(-\frac{r_2}{r_3+r_4+r_5}\right) \right]

e^{-t} \,dt

$$

If there are $d$ subjects $r_1$ to $r_d$ tied some given time, and we let $s$ be the

sum of the remaining scores in the risk set, the above integral expands to

$$

1 - \sum_{i=1}^{d}\frac{s}{r_i +s} + \sum_{i \ne j} \frac{s}{r_i + r_j +s}

- \sum_{i \ne j \ne k} \frac{s}{r_i+r_j+r_k +s} + \ldots \,,

$$

which is the same amount of work as the average denominator calculation.

(Though similar, the two expressions are not equivalent for $d>2$).

Some tedious algebra shows that if the risk scores for the tied subjects are

all equal, this equals $d!$ times the Efron approximation,

and thus leads to exactly the same solution for $\bhat$.

This would imply that

the first and second ``exact'' methods would be close for actual data sets.

The exact logistic likelihood, or exact partial likelihood, comes from viewing

the data as genuinely discrete.

The denominator in this case is

$ \sum_{i \ne j} r_i r_j$

if there are two subjects tied, $\sum_{i\ne j \ne k} r_i r_j r_k$ if there

are three subjects tied, etc.

The compute time for this case will be even larger than for the calculation

above. If there are ties, the value can be considerably different than the

first exact method.

The SAS phreg procedure implements the second and third exact method. A

small amount of empiric checking has verified that the Efron approximation is

very close to the exact marginal likelihood, and so only the exact partial

likelihood has been implemented in the S package.

Because of the superiority of the Efron approximation, the coxph function has

departed from all other Cox regression programs (that I know of) by making

it the default option rather than the Breslow approximation.

Note that when there are no ties, all the methods reduce to the same form.

The Efron approximation also induces changes in the residuals' definitions.

In particular, the Cox score statistic is still

\begin{equation}

U = \sum_{i=1}^n \int_0^\infty (Z_i(s) - \bar Z(s))\, dN_i(s)\,,

\end{equation}

but the definition of $\bar Z(s)$ has changed if there are tied deaths

at time $s$.

If there are $d$ deaths at $s$, then there were $d$ different values of $\bar Z$

used at the time point. The Schoenfeld residuals use

$\bar{\bar Z}$, the average of

these $d$ values, in the computation.

The martingale and score residuals require a new definition of

$\lhat$.

If there are $d$ tied deaths at time $t$,

we again assume that in the exact (but unknown) untied data

there are events and corresponding jumps in the cumulative hazard at $t \pm

\epsilon_1 < ...< t \pm \epsilon_d$.

Then each of the tied subjects will in

expectation experience all of the first hazard increment, but only

$(d-1)/d$ of the second, $(d-2)/d$ of the

next, and etc. If we equate observed to expected hazard at each

of the $d$ deaths, then the total increment in hazard at the time point is

the sum of the denominators of the weighted means. Returning to our earlier

example of 5 subjects of which 1 and 2 are a tied deaths:

$$

d\lhat(t) = \frac{1}{r_1+r_2+r_3+r_4+r_5} + \frac{1}{r_1/2 + r_2/2

+r_3 + r_4 + r_5}\,.

$$

For the null model where $r_i=1$ for all $i$, this agrees with the suggestion

of Nelson (1969) to use $1/5+1/4$ rather than $2/5$ as the increment to the

cumulative hazard.

The score residuals do not work out to as neat a formula, though the

computation is no harder. For subject 1 in the example,

the residual at time 1 is the sum $a+b$ of the 2 terms:

\begin{eqnarray*}

a &=& \left(Z_1 - \frac{r_1Z_1 + r_2Z_2+ \ldots + r_5Z_5}

{r_1 + r_2 + \ldots + r_5} \right )

\left ( \frac{dN_1}{2} - \frac{r_1}{r_1 + r_2 + \ldots+ r_5}

\right ) \; {\rm and}\\

b &=& \left(Z_1 - \frac{r_1Z_1/2 + r_2Z_2/2 + \ldots +r_5Z_5}

{r_1/2 + r_2/2 + \ldots + r_5} \right )

\left ( \frac{dN_1}{2} - \frac{r_1/2}{r_1/2 + r_2/2 + \ldots+ r_5}

\right ) \,.

\end{eqnarray*}

This product does not neatly collapse into $(Z_1 - \bar{\bar Z})\, d\Mhat_i$

but is nevertheless fairly easy to compute.

To those who wish to check the algebra: start with the expanded ($d$ term)

definition of the increment to $U$, and repeat the infinitesimal jackknife

calculations of Cain and Lange \cite{Cain}.

This argument carries through as well to predicted survival. There is a

change in weights but no change in the form of the equations.

The connection between residuals and the exact partial likelihood is not

as precise, e.g. the score residuals will not correspond to an infinitesimal

jackknife.

The exact calculation is used only rarely, the form of the

computations will be quite different, and it thus appears to be not worth

the bother. If residuals are requested after a Cox fit with the \Co{exact}

method, the Breslow formulae are used.

\subsubsection{Tests for Proportional Hazards}

The key ideas of this section are taken from Grambsch and Therneau

\cite{GTresid}.

Most of the common alternatives to proportional hazards can be cast in

terms of a {\em time-varying coefficient} model. That is, we assume that

$$

\lambda(t;Z) = \lambda_0(t) e^{\beta_1(t)Z_1 + \beta_2(t)Z_2 + \ldots}

\,.

$$

(If $Z_j$ is a 0-1 covariate, such as treatment, this formulation is completely

general in that it encompasses all alternatives to proportional hazards.)

The proportional hazards assumption is then a test for $\beta(t) = \beta$,

which is a test for zero slope in the appropriate plot of

$\bhat(t)$ on $t$.

Let $i$ index subjects, $j$ index variables, and $k$ index the death times.

Then let $s_k$ be the Schoenfeld residual and $V_k$ be the contribution to

the information matrix at time $t_k$ (see equation \ref{inf}).

Define the rescaled Schoenfeld residual as

$$

s^*_k = \bhat + s_k V_k^{-1} \, .

$$

The main results are:

\begin{itemize}

\item $E(s^*_k) = \beta(t_k)$, so that a smoothed plot of $s^*$ versus time gives a

direct estimate of $\bhat(t)$.

\item Many of the common tests for proportional hazards are linear tests for

zero slope, applied to the plot of $r^*$ versus $g(t)$ for some function $g$.

In particular, the Z:PH test popularized in the SAS PHGLM procedure

corresponds to $g(t) =$ rank of the death time.

The test of Lin \cite{Lin} corresponds to $g(t) = K(t)$, where $K$ is the

Kaplan-Meier.

\item Confidence bands, tests for individual variables, and a global test

are available, and all have the fairly standard ``linear models'' form.

\item The estimates and tests are affected very little if the individual

variance estimates $V_k$ are replaced by their global average

$\Vbar=\sum V_k /d = {\cal I}/d$. Calculations then require only the Schoenfeld

residuals and the standard Cox variance estimate ${\cal I}^{-1}$.

\end{itemize}

For the global test, let $g_1(t), g_2(t), \ldots$ be the desired

transformations of time for variables 1, 2, etc,

and $G_k$ be the diagonal matrix $g_1(t_k), g_2(t_k), \ldots$ .

Then

$$

T = \left( \sum G_k s_k \right)' D^{-1} \left( \sum G_k s_k \right)

$$

is asymptotically $\chi^2$ on $p$ degrees of freedom, where

$$

D = \sum G_kV_kG_k - \left(\sum G_k V_k \right )

\left( \sum V_k \right)^{-1} \left(\sum G_k V_k \right )' \, .

$$

Because the $s_k$ sum to zero, a little algebra shows that the above

expression is invariant if $G_k$ is replaced by $G_k - cI$ for any

constant $c$. Subtraction of a mean will, however, result in less

computer round-off error.

In any rational application, we will have $g_1 = g_2= \ldots = g$, and then

we can replace each matrix $G_k$ with a scalar $g(t_k) \equiv g_k$

in the above formulas. A

further simplification occurs by using $\Vbar$, leading to

\begin{equation}

T = \left[ \sum (g_k-\bar g) s_k \right]'

\left [ \frac{d {\cal I}^{-1}}{\sum(g_k - \bar g)^2} \right ]

\left[ \sum( (g_k- \bar g) s_k \right]

\label{Tph}

\end{equation}

For a given covariate $j$, the diagnostic plot will have $y_k = s^*_{kj}$

on the vertical axis and $g_k$ on the horizontal.

The variance matrix of the $y$ vector is $\Sigma_j = (A - cJ) + cI$,

where $A$ is a $d$x$d$ diagonal matrix whose $k$th diagonal element is

$V^{-1}_{k,jj}$, $c = {\cal I}^{-1}_{jj}$,

$J$ is a $d$x$d$ matrix of ones and $I$ is the identity.

The $cI$ term reflects the uncertainty in $s^*$ due the the $\bhat$

term. If only the shape of $\beta(t)$ is of interest (e.g., is it linear

or sigmoid) the term could be dropped. If absolute values are important

(e.g. $\beta(t)=0$ for $t>2$ years) it should be retained.

For smooths that are linear operators, such as splines or the lowess function,

the final smooth is $\hat y = H y$ for some matrix $H$. Then $\hat y$

will be asymptotically normal with mean 0 and variance $H \Sigma_j H'$.

Standard errors can be computed using ordinary linear model methods.

Although the covariance between residuals is roughly $-1/d$ times their variance, it cannot

be neglected in computing standard errors for the smooth. For larger

smoothing spans, simulations showed up to 25\% inflation in the width of

the interval if covariance was ignored.

If $V_k$ is replaced with $\Vbar$, then $\Sigma_j$ simplifies to

${\cal I}^{-1}_{jj}((d+1)I -J)$. With the same substitution, the

component-wise test for linear association is

\begin{equation}

t_j = \frac{\sum (g_k - \bar g) y_k}

{\sqrt{d {\cal I}^{-1}_{jj} \sum(g_k - \bar g)^2 }}

\label{tph}

\end{equation}

The \Co{cox.zph} function uses (\ref{Tph}) as a global test of proportional

hazards, and (\ref{tph}) to test individual covariates. The plot method for

\Co{cox.zph} uses a natural spline smoother (\Co{lowess} might be preferred,

but the necessary $H$ matrix is not readily obtained);

confidence bands for the smooth are based

on the full covariance matrix, with $\Vbar$ replacing $V_k$.

Formulae aside, reasonably accurate results can be obtained by using other

methods directly on the residuals. The return value of\Co{cox.zph}

contains both the $g(t)$ vector and the $y$ matrix that are appropriate for plots.

These can be used as the $x$ and $y$ data for a \Co{gam} model with

identity link and Gaussian errors. for example.

The size of the confidence

band will be conservative (too large) since,

as discussed above,

the correlation

between the data points has been ignored. This effect will tend to decrease

as the sample size increases, since a smaller fraction of the data will be

in any smoothing window. Secondly, the overall estimate of variation may

be larger, since it is estimated using the variation of each $y$ value from

the fitted function; the $V_k$ estimates are based on the variation of each

$y$ from its risk set.

Though the simulations in Grambsch and Therneau (1993) did not uncover

any situations where the simpler formulae based on $\Vbar$ were

less reliable, such cases could arise. The substitution trades a possible

increase in bias for a substantial reduction in the variance of the individual

$V_k$. It is likely to be unwise in those cases where the variance of

the covariates, within the risk sets, differs substantially between different

risk sets. Two examples come to mind. The first would be a stratified

Cox model, where the strata represent different populations. In a muli-center

clinical trial for instance, inner city, Veterans Administration and

suburban hospitals often service quite disparate populations.

In this case a separate average $\Vbar$ should be formed for each strata.

A second example

is where the covariate mix changes markedly over time, perhaps because of

aggressive censoring of certain patient types.

These cases have not been addressed directly in the software. However,

\Co{coxph.detail} will return all of the $V_k$ matrices,

which can then be used to construct specialized tests for

such situations.

Clearly, no one scaling function $g(t)$ will be optimal for all situations.

The \Co{cox.zph} function directly supports four common choices: identity,

log, rank, and 1 - Kaplan-Meier. By default, it will use the last of these,

based on the following rationale. Since the test for proportional hazards

is essentially a linear regression of the scaled residual on $g(t)$, we

would expect this test to be adversely effected if there are outliers in $x$.

We would also like the test to be only mildly (if at all) effected by the

censoring pattern of the data. The Kaplan-Meier transform appears to satisfy

both of these criteria.

\subsubsection{Robust Variance}

\label{rvar}

Robust variance calculations are based on the {\em sandwich estimate}

$$

V = A B A' \,

$$

where $A^{-1} = \cal I$ is the usual information matrix, and B is a

``correction term''.

The genesis of this formula can be found in Huber \cite{Huber}, who discusses

the behavior of any solution to an estimating equation

$$

\sum_{i=1}^n \phi(x_i, \bhat) =0 \,.

$$

Of particular interest is the case of a maximum likelihood estimate based on distribution

$f$ (so that $\phi = \partial \log f / \partial \beta$),

when in fact the data are observations from distribution $g$.

Then, under appropriate conditions, $\bhat$ is asymptotically normal with

mean $\beta$ and covariance $V= ABA'$, where

\begin{eqnarray*}

A &=& \left ( \frac{\partial E \Phi(\beta)} {\partial \beta}

\right )^{-1} \\

\end{eqnarray*}

and $B$ is the covariance matrix for $\Phi = \sum \phi(x_i, \beta)$.

Under most situations the derivative can be moved inside the expectation,

and $A$ will be the inverse of the usual information matrix.

This formula was rediscovered by White \cite{White1} \cite{White2}

(under less general conditions I believe, but all these papers are a bit

over my head),

and is also known in the econometric literature as White's method.

Under the common case of maximum likelihood estimation we have

\begin{eqnarray*}

\sum \phi &=& \sum_{i=1}^n \frac{\partial \log f(x_i)} {\partial \beta} \\

&\equiv& \sum_{i=1}^n u_i(\beta) \,.

\end{eqnarray*}

Then by interchanging the order of the expectation and the derivative,

$A^{-1}$ is the expected value of the information matrix, which will be

estimated by the observed information ${\cal I}$.

Since $E[u_i(\beta)]$ =0,

\begin{eqnarray}

B = {\rm var}(\Phi) &=& E(\Phi^2) \nonumber \\

&=& \sum_{i=1}^n E[u_i'(\beta) u_i(\beta)]

+ \sum_{i \ne j} E[u_i'(\beta) u_j(\beta)] \,

\label{U4}

\end{eqnarray}

where $u_i(\beta)$ is assumed to be a row vector.

If the observations are independent, then the $u_i$ will also be independent

and the cross terms in equation (\ref{U4}) above will be zero. Then a natural

estimator of $B$ is

\begin{eqnarray*}

\widehat B &=& \sum_{i=1}^n u_i'(\bhat) u_i(\bhat) \\

&=& U'U \,,

\end{eqnarray*}

where $U$ is the matrix of {\em score residuals}, the $i$th row of $U$ equals

$u_i(\bhat)$. The column sums of $U$ are the efficient score vector $\Phi$.

As a simple example consider generalized linear models. McCullagh and

Nelder \cite{glim} maintain that overdispersion ``is the norm in practice and

nominal dispersion the exception.'' To account for overdispersion they

recommend inflating the nominal covariance matrix of the regression

coefficients $A = (X'WX)^{-1}$ by a factor

$$

c = \sum_{i=1}^n \frac{(y_i - \mu_i)^2}{V_i} / (n-p) \, ,

$$

where $V_i$ is the nominal variance. Smith and Heitjan \cite{SH} show that

$AB$ may be regarded as a multivariate version of this variance

adjustment factor, and that $c$ and $AB$ may be interpreted as the average

ratio of actual variance $(y_i - \mu_i)^2$ to nominal variance $V_i$. By

premultiplying by $AB$, each element of the nominal variance-covariance

matrix $A$ is adjusted differentially for departures from nominal

dispersion.

When the observations are not independent, the estimator $B$

must be adjusted accordingly. The ``natural'' choice

$(\sum u_i)^2$ is not available of course, since $\Phi(\bhat)=0$ by

definition. However, a reasonable estimate is available

when the correlation is confined to subgroups. In particular, assume that

the data comes from clustered sampling with $j=1, 2, \dots, k$ clusters, where

there may be correlation within cluster but observations from different

clusters are independent. Using equation (\ref{U4}), the cross-product terms

between clusters can be eliminated, and the resulting equation rearranged

as

$$

{\rm var}(\Phi) = \sum_{j=1}^k \tilde u_j(\beta)' \tilde u_j(\beta) \,,

$$

where $\tilde u_j$ is the sum of $u_i$ over all subjects in the $j$th cluster.

This leads to the {\em modified sandwich estimator}

$$

V = A (\tilde U' \tilde U) A \,,

$$

where the collapsed score matrix $\tilde U$ is obtained by replacement of

each cluster of rows in $U$ by the sum of those rows.

If the total number of clusters is

small, then this estimate will be sharply biased towards zero, and some other

estimate must be considered. In fact, ${\rm rank}(V) < k$, where $k$ is

the number of clusters. Asymptotic results for the modified sandwich

estimator require that the number of clusters tend to infinity.

Application of these results to the Cox model requires an expression for the

score residuals matrix $U$. Equations (\ref{U1}) and (\ref{U2}) show the

partial likelihood written in two forms, and (\ref{U3}) yet another;

which should be used as a basis for

our work? One way to proceed is to define a weighted Cox partial likelihood,

and then let

$$

u_i(\beta) \equiv \left( \frac{\partial U}{\partial w_i}

\right)_ { w =1} \, ,

$$

where $w$ is the vector of weights; this approach isused in Cain and Lange

\cite{Cain} to define a leverage or influence measure for Cox regression.

In particular, they derive the leverage matrix

$$

L = U {\cal I}^{-1}\,,

$$

where $L_{ij}$ is the approximate change in $\bhat$ when observation $i$ is

removed from the data set. Their estimate can be recognized as a form of

the {\em infinitesimal jackknife}, see for example the discussion in Efron

\cite{Efron} for the linear models case.

The same leverage estimate is derived using a slightly different argument

by Reid and Cr\'{e}peau \cite{Reid}. They mention, but do not persue, the use

of $L'L$ as a variance estimate.

Specific applications

of the sandwich and modified sandwich estimators, detailed below, have all

re-derived this result as part of their development.

In fact the connection to the jackknife is quite general. For any model

stated as an estimating equation, the Newton-Raphson iteration has step

$$

\Delta \beta = 1'(U{\cal I}^{-1})\,,

$$

the column sums of the matrix $L= U{\cal I}^{-1}$. At the solution $\hat \beta$

the iteration's step size is, by definition, zero. Consider the following

approximation to the jackknife

\begin{enumerate}

\item treat the information matrix ${\cal I}$ as fixed

\item remove observation $i$

\item beginning at the full data solution $\hat\beta$, do one Newton-Raphson

iteration.

\end{enumerate}

This is equivalent to removing one row from $L$, and using the new column

sum as the increment. Since the column sums of $L(\bhat)$ are zero, the

increment must be $\Delta \beta = -L_{i.}$. That is, the rows of $L$ are

an approximation to the jackknife, and the sandwich estimate of variance

$L'L$ is an approximation to the jackknife estimate of variance.

When the data are correlated, the appropriate form of the jackknife is to

leave out an entire {\em subject} at time, rather than one observation, i.e.,

the grouped jackknife. To approximate this, we leave out groups of

rows from $L$, leading to $\tilde L' \tilde L$ as the approximation to the

jackknife.

Lin and Wei \cite{LW} show the applicability of Huber's work to the partial

likelihood, and derive the ordinary Huber sandwich estimate

$V= {\cal I}^{-1}(U'U){\cal I}^{-1}= L'L$.

They also discuss situations in which this is estimate is preferable,

including the important cases of omitted covariates and incorrect functional

form for the covariate.

The relationship of their estimate to the leverage matrix $L$ is

not noted by the authors.

Lee, Wei and Amato \cite{LWA}

consider highly stratified data sets which arise from inter observation

correlation. As an example they use paired eye data on visual loss

due to diabetic retinopathy, where photocoagulation was randomly assigned

to one eye of each patient.

There are $n/2=1742$ clusters (patients) with 2 observations per cluster.

Treating each pair of eyes as a cluster, they derive the modified sandwich

estimate $V=\tilde L' \tilde L$, where $\tilde L$ is derived from $L$ in the

following way. $L$ will have one row, or observation, per eye.

Because of possible correlation, we want to reduce this to a leverage matrix

$\tilde L$ with one row per individual. The leverage (or row) for an individual is

simply the sum of the rows for each of their eyes. (A subject, if any, with

only one eye would retain that row of leverage data unchanged).

The resulting estimator is shown to be much more efficient than analysis

stratified by cluster.

A second example given in Lee, Wei and Amato

concerns a litter-matched experiment. In this case the number of rats/litter

may vary.

Wei, Lin and Weissfeld \cite{WLW} consider multivariate survival times. An

example is the measurement of both time to progression of disease and

time to death for a group of cancer patients.

The data set again contains $2n$ observations, time and status variables,

subject id, and covariates. It also contains an indicator variable \Co{etype} to

distinguish the event type, progression vs. survival. The suggested model

is stratified on event type, and includes all strata$\times$covariate

interaction terms. One way to do this in S is

\begin{Example}

> fit2 Ltilde newvar 0$.

Using weights appropriate to the survey sampling scheme, Binder suggests use

of the modified sandwich estimate ${\cal I}^{-1} B {\cal I}^{-1}$ where

$B = {\rm var}(U)$ ``can be estimated using design based methods'', though

he gives no specifics on what these might be. His derivation of the

score residual vectors $u_i$ differs from the above, but the same result

is obtained, and shown in the last equation of his section 3.

In a simulation study he compares the naive, sandwich, and modified sandwich

estimators, with the latter being the most reliable.

Lin \cite{Lin} also develops a weighted Cox model, in the context of tests

for proportional hazards. His estimates of the score and hence of

$\bhat$ are based on equation (\ref{bind:u}), but without redefinition of

$\bar Z$ to include weights. It is thus not related to case weights, but

rather to weighted log-rank statistics such as the Tarone-Ware family

\cite{Miller}.

Estimates for this model can be obtained from S in three steps; assume that

$w$ is the weight variable:

\begin{enumerate}

\item Use \Co{coxph} with $w$ as weights and $-\log(w)$ as an offset to

estimate Lin's weighted $\bhat$.

\item Fit a second cox model, without weights or an offset, but with the

coefficients constrained to equal the results of the first model.

(Use initial values and \Co{iter=0}.)

The \Co{coxph.detail}

function can be applied to this second model to obtain the individual

$v_i$ estimates.

\item Estimate the variance of $\bhat$ as $ABA$, where $A= (\sum w_i

v_i)^{-1}$ and $B= \sum w_i^2 v_i$.

\end{enumerate}

Tests for proportional hazards are more easily accomplished, however, using the

\Co{cox.zph} function.

An exciting use of weights is presented in Pugh et al. \cite{Pugh}, for

inference with missing covariate data. Let $\pi_i$ be the probability

that none of the covariates for subject $i$ is missing, and $p_i$

be an indicator function which is 0 if any of the covariates actually

is NA, so that $E(p_i) = \pi_i$.

The usual strategy is to compute the Cox model fit over only

the complete cases, i.e., those with $p_1=1$. If information is not missing

at random, this can lead to serious bias in the estimate of $\bhat$.

A weighted analysis with weights of $p_i/ \pi_i$ will correct for this

imbalance. There is an obvious connection between this idea and survey

sampling: both reweight cases from underrepresented groups.

In practice $\pi_i$ will be unknown, and the authors suggest estimating

it using a logistic regression with $p_i$ as the dependent variable. The

covariates for the logistic regression may be some subset of the Cox

model covariates (those without missing information), as well as others.

In an example, the authors use a logistic model with follow-up time

and status as the predictors.

Let $T$ be the matrix of score residuals from the logistic model, i.e.

$$

T_{ij} = \frac{\partial}{\partial \alpha_j}

[p_i \log \pi_i(\alpha) + (1-p_i) \log(1- \pi_i(\alpha))] ,

$$

where $\alpha$ are the coefficients of the fitted logistic regression.

Then the estimated variance matrix for $\bhat$

is the sandwich estimator ${\cal I}^{-1} B {\cal I}^{-1}$,

where

$$

B = U'U - [U'T] [T'T]^{-1} [T'U] \,.

$$

This is equivalent to first replacing each row of $U$ with the residuals

from a regression of $U$ on $T$, and then forming the product $U'U$.

Note that if the logistic regression is completely uninformative

($\hat\pi_i = $ constant), this reduces to the ordinary sandwich estimate.

For either of the Breslow or the Efron approximations,

the extra programming to handle weights is modest.

For the Breslow method the logic behind the addition is also straightforward,

and corresponds to the derivation given above. For tied data and the

Efron approximation, the formula is based on extending the basic idea of

the approximation, $E(f(r_1, r_2, \ldots)) \approx f(E(r_1), E(r_2), \ldots)$

to include the weights, as necessary. Returning to the simple example

of section \ref{Tied} above, the second term of the partial likelihood is

either

$$

\left( \frac{w_1r_1}{w_1r_1 + w_3r_3 + w_4r_4 + w_5r_5} \right )

$$

or

$$

\left( \frac{w_2r_2}{w_2r_2 + w_3r_3 + w_4r_4 + w_5r_5} \right )\,.

$$

To compute the Efron approximation,

separately replace the numerator with $.5(w_1r_1 + w_2r_2)$ and the

denominator with $ .5w_1r_1 + .5w_2r_2 + w_3r_3 + w_4r_4 + w_5r_5$.

\subsubsection{Estimated survival}

The methods for expected survival after a Cox model parallel those for

expected survival based on a population. One can create individual

survival curves or cohort survival curves, and the latter can be in any

of the Ederer, Hakulinen, or conditional frameworks.

For individual survival,

recall that the estimated cumulative hazard for a subject with covariate

process $x(t)$ is

$$

\Lambda(t; x) = \int_0^t e^{\bhat' x(s)} d\lhat_0(s) ',,

$$

with either the Breslow or Efron estimator of $\lhat_0$.

The choice of the Breslow or Efron estimate should be consistent with the

option chosen to break ties when the Cox model was fit (this is the default

action of the software).

The estimated

survival function is then $\hat S(t;x) = \exp(-\lhat(t;x))$.

If the vector of coefficients $\bhat$ were treated as fixed, then the

variance of the cumulative hazard would be

$$

V(t) = \int_0^t e^{2\bhat' x(s)} \frac {d \bar N(s)}

{(\sum Y_i(s) e^{\bhat' Z_i(s)})^2}

$$

for the Breslow estimator, which is a natural extension of the Aalen estimate

of variance in Nelson's hazard estimator. If Efron's method were used, the

variance estimate will be slightly larger.

The actual variance for the cumulative hazard must also account for the

error in estimation of $\bhat$

Tsiatis \cite{tsiatis} and Link \cite{Link1, Link2} have derived this, and

with some rearragement their result can be written as

\begin{equation}

V_c(t) = \int_0^t \{1 + [x(s) - \zbar(s)]' {\cal I}^{-1} (x(s) - \zbar(s)]\}

dV(s) '\,,

\label{coxvar}

\end{equation}

where $V_c$ is the variance under the Cox model, $V$ is the naive variance

given above, $Z$ is the covariate set for the fitted model, $x$ is the

covariate vector for this curve,

and ${\cal I}^{-1}$ is the variance matrix for $\bhat$.

The increase in variance is largest for a covariate set that is far from

the mean.

Gail and Byar \cite{Gail} have extended this result to the estimation of

multiple curves, and show that if $x(s)$ and $x^*(s)$ were two separate

covariate vectors, then the covariance of the two estimated cumulative

hazards is

$$

\int_0^t \{1 + [x(s) - \zbar(s)]' {\cal I}^{-1} (x^*(s) - \zbar(s)]\}

dV(s) '\,,

$$

where $V$ has had the obvious substitution of $x(s) + x^*(s)$ in place of

$2 x(s)$.

No results are given for a stratified Cox model. Any individual curve

will be for a given covariate set $x$ and a given strata $k$, so the variance

term will be as before with $dV_k$ as a differential. The cross product

terms could presumably be integrated with respect to $(dV_k + dV_{k^*})/2$;

however, this creates extra complexity in the programming and has not yet

been implimented.

Since the formula involves the covariate average over time $\zbar(t)$ for

the {\em original study}, it cannot usually be calculated from published

summaries, e.g., to esimate the expected survival of the patient in front

of you based on a report in the literature. However, for most Cox fits

$\zbar(t)$ will vary only slightly over time, and a reasonable approximation

could be made if the report contained both the initial averages and the

confindence intervals for any particular case.

We can also write $V_c(t) = V(t) + d(t)'{\cal I}^{-1} d(t)$. Some

manipulation shows that

the vector $d(t)$ is the score residual process for a hypothetical new

subject with covariates $x$ and no events, which is a measure of the leverage

of such an observation on the estimate of $\beta$. It is intuitive that

if the covariate value(s) $x$ has small leverage on $\bhat$,

then the variance in $\bhat$ will have small effect on the curve.

The calculations for a weighted model have the same form, using the jumps

in the weighted cumulative hazard function

$$

\lhat_0(t) = \sum_{i=1}^n \int_0^t w_i \frac{dN_i(s)}

{\sum Y_j(s)w_j \hat r_j(s)} \,.

$$

This curve is applicable to a single patient,

and is the appropriate object to plot when considering the predicted survival

for some future patient who has some particular set of covariates.

Another use is to fit a stratified model, for example

\begin{Example}

Surv(time, status) ~ age + sex + strata(treatment)

\end{Example}

Then a plot of the pair of curves is a comparison of the treatment groups,

adjusted to common age and sex values. This can be useful when two treatments

are unbalanced with respect to an important covariate.

It is common practice to use these curves for group survival as well.

The curve for the ``average'' patient, i.e., the curve corresponding to

a fictional subject with mean values for each covariate, is then used

as the predicted survival curve for the entire cohort. Though convenient

this procedure is incorrect. What should be done follows from exactly the

same discussion as found above in the section on expected survival for a

population matched reference. Either the Ederer, Hakulinen, or conditional

computation can be used.

One use of these cohort averages is to summarize and present the results

of a study.

This issue is reviewed in Thomsen, Keiding and Altman \cite{Thomsen},

albeit using very different terms. Center stage is given to the analog of

the Ederer estimate, referred to as the {\em

direct adjusted survival curve}, a term coined earlier by Makuch

\cite{Makuch}.

Using a simple example, Thomsen et. al. demonstrate that the estimate based

on a single average patient will be badly biased whenever there is

large variation in the individual risk scores $\beta'x$.

A second use is for historical comparison.

With the increased availability of published regression analyses of survival

with specific diseases it has become possible to make historical comparisons

between the observed survival in a (possibly fairly small) study group and the

survival that was to be expected from a published regression analysis. One

area where this has been used is in transplantation, where randomized studies

are logistically (and perhaps ethically) impossible, so that the best answer

available to the question ``what would have happened to these patients if they

had not been transplanted?'' will be via comparison with published

pre-transplatation-therapy experience \cite{Dickson}.

In this case $\lambda_i$ will come from the older study, with the original

Cox model covariates as the matching variables. Follow-up and censoring

time will come from the new data set.

A variance estimate for the direct adjusted curve $S_d$ is derived in Gail and

Byar \cite{Gail}. Let

$$

S_d = (1/n) \sum_{i=1}^n S_i(t)

$$

Where $S_i$ is the individual survival curve for subject $i$ in the {\em new}

data set. This is calculated as above, using $x_i(t)$ for the new subject

but risk sets and variance based on the original Cox model fit. Then

$$

{\rm var}(S_d) = (1/n)^2 [ \sum {\rm var}(S_i) + \sum_{i\ne j}

{\rm cov}(S_i, S_j) ].

$$

Thomsen et al. also discuss the

conditional estimate

$$

\exp \left[-\int_0^t \frac{\sum Y_i(s)\lambda_0(s) e^\eta_i(s)}

{\sum Y_i(s)} \,ds\right ] \,.

$$

They conclude that the curve itself is ``not easy to interpret'' because it

mixes observed mortality, through $Y_i$, with expected mortality,

through $\lambda_i$. However, the difference in log survival curves can

be used as an estimate of excess mortality as is done in

Andersen and V\ae th \cite{Ander4}.

This author believes that extension of Hakulinen's cohort method

is the most appropriate way to combine expected curves in the Cox model.

However, I am not aware of any discussion of this in the literature.

The failure of the Ederer method, remember, occurs when there is significant

change in the enrollment criteria over the course of a study. This is of

major concern in historical reviews that span $>10$ years, as the age profile

often changes dramatically. For a Cox model, this will be an issue whenever

there is similar change in the values at entry for one of the variables

that was included in the model.

To conclude this section I must comment on a common {\em abuse} of the

Cox model expected curve. This is the comparison of the

expected curve for a study to the Kaplan-Meier of the same study as a

test for ``goodness of fit''.

This is nonsense since:

\begin{enumerate}

\item Some difference between the two can arise because of different

approximations, i.e., the S functions default to an estimate that is

comparable to the Fleming-Harrington method. This will differ from the

Kaplan-Meier in the tails, where $n$ is small.

\item Some simple algebra shows that the conditional estimator {\em is} the

F-H estimate of the raw data, independent of the value of $\bhat$.

\item If the censoring pattern is not independent of risk, then the Ederer

estimate will differ from the K-M because of the censoring effect, even if

the Cox model is completely correct.

\item For most data sets the value of $\zbar(t)$ varies only slowly with time.

In this case the individual survival curve for the average subject $\zbar(0)$

will also be approximately equal to the F-H estimator, independent of $\bhat$.

\item If a data set includes time-dependent covariates, the individual

survival curve for any fixed $x$ can be very surprizing.

\end{enumerate}

\subsection{Parametric Regression}

\subsubsection{An IRLS formulation}

With some care, parametric survival can be written so as to fit into the

iteratively reweighted least squares formulation used in Generalized

Linear Models of McCullagh and Nelder \cite{glim}.

A detailed description of this setup for general maximum likelihood

computation is found in Green \cite{green}.

Let $y$ be the data vector, and $x_i$ be the vector of covariates for the

$ith$ observation. Assume that

$$ z_i \equiv \frac{t(y_i) - x_i'\beta}{\sigma} \sim f $$

for some distribution $f$, where $y$ may be censored and $t$ is a

differentiable

transformation function.

Then the likelihood for $t(y)$ is

$$ l = \left( \prod_{exact} f(z_i)/\sigma \, \right)

\left( \prod_{right} \int_{z_i}^\infty f(u) du \, \right)

\left( \prod_{left} \int_{-\infty}^{z_i} f(u) du \,\right)

\left( \prod_{interval} \int_{z_i^*}^{z_i} f(u) du \right),

$$

where ``exact'', ``right'', ``left'', and ``interval'' refer to uncensored,

right censored, left censored, and interval censored observations,

respectively,

and $z_i^*$ is the lower endpoint of a censoring interval.

Then the log likelihood is defined as

\begin{equation}

\log(l) = \sum_{exact} g_1(z_i) - log(\sigma) + \sum_{right} g_2(z_i) +

\sum_{left} g_3(z_i) + \sum_{interval} g_4(z_i, z_i^*).

\label{ggdef} \end{equation}

Derivatives of the LL with respect to the regression parameters are:

\begin{eqnarray}

\frac{\partial LL}{\partial \beta_j} &=&

\sum_{i=1}^n x_{ij} \frac{\partial g}{\partial \eta_i} \\

\frac{\partial^2 LL} {\partial \beta_j \beta_k} &=&

\sum x_{ij}x_{ik} \frac{\partial^2 g}{\partial \eta_i^2}\, ,

\end{eqnarray}

where $\eta = X'\beta$ is the vector of linear predictors.

Thus if we treat $\sigma$ as fixed, then iteration

is equivalent to IRLS with weights of $-g''$ and adjusted dependent variable

of $\eta - g'/g''$.

The Newton-Rhaphson step defines an update $\delta$

$$ (X^T DX) \delta = X^T U, $$

where $D$ is the diagonal matrix formed from $-g''$,

and $U$ is the vector $g'$.

The current estimate $\beta$ satisfies $X \beta = \eta$, so that the new

estimate $\beta + \delta$ will have

\begin{eqnarray*}

(X^T DX)(\beta + \delta) &=& X^T D \eta + X^T U \\

&=& (X^T D) (\eta + D^{-1}U)

\end{eqnarray*}

One clever use of this within S is to return an object

that inherits from class {\em glm}. Many of the stepwise methods are

inherited: they do one step update approximations, effectively under the

assumption that the extra parameters are fixed. This is a useful and quick

first approximation for new fits.

Nevertheless, there are several differences from GLM models.

\begin{enumerate}

\item Glm models assume that $y$ comes from a particular distribution, while

its mean follows some link form $t(\mu) = \eta$. Here we assume that the

transformed data $t(y)$ itself follows the given distribution.

If the data were uncensored, the fit of a gaussian model with log link will

{\em not} be the same as a glm fit with log link. Survfit will assume that

the error distribution is log-normal.

\item The nice starting estimate procedure for glm models had to be modified.

The other coefs are not independent of $\sigma$, and $\sigma$ often appears

to be the most touchy variable in the iteration. We start with a naive

estimate: the variance of y, ignoring censoring.

With sigma fixed, the other variables can be estimated by doing one iteration

with $\eta = t(y)$ (for interval censored use the midpoint),

e.g., the glm trick.

Then $\sigma$ is reestimated as the variance of the

unweighted residuals.

\item The maximized value of the loglikelihood

for a right or left censored observation is 0, since by making $\eta$ sufficiently

large or small we can insure that the relevant integral is 1.

Thus there is no ``deviance correction". The deviance correction for

an interval censored datum is a nuisance for the

the asymmetric distributions, but simple for the others.

\end{enumerate}

Most often, the routines will be used with $t(y) = \log(y)$, which

corresponds to the set of accelerated failure time models. But using the

above formulation, it was not very hard to make the programs completely

general. Note that $t()$

does not appear explicitly in any of the derivations.

The only difference between the following two fits is the scale of the

fitted values: in the latter they are in units of $\log(y)$.

\begin{Example}

> fit1 fit2 ovarian coxph(Surv(futime, fustat){\Twiddle} age, ovarian)

Call: coxph(formula = Surv(futime, fustat) {\Twiddle} age,

data=ovarian)

coef exp(coef) se(coef) z p

age 0.162 1.18 0.0497 3.25 0.00116

Likelihood ratio test=14.3 on 1 df, p=0.000156 n= 26

\end{Example}

For a more complicated model, the result should probably be saved in a

temporary variable. It can then be printed multiple times, and residuals

and/or predicted values may be extracted.

\begin{Example}

> fit print(fit)

Call: coxph(formula = Surv(futime, fustat) {\Twiddle} residual.dz + rx + ecog.ps)

coef exp(coef) se(coef) z p

residual.dz 1.347 3.844 0.680 1.980 0.0478

rx -0.749 0.473 0.595 -1.260 0.2078

ecog.ps 0.453 1.573 0.590 0.767 0.4431

Likelihood ratio test=6.03 on 3 df, p=0.11 n= 26

\end{Example}

The print function is invoked automatically, so in the example above the

user could have typed ``fit'' instead of ``print(fit)''. A more complete

printout is produced by the summary function. This adds confidence intervals,

Wald and score tests,

and an $R^2$ measure based on work of Nagelkirke \cite{Nagel}. This

measure needs to be proven over time, but certainly is one of

the easier ones to implement that I've seen, and appears very well founded.

An option to the summary function can be

used to get confidence intervals at levels other than .95.

The stratified Cox model can be obtained by

using a strata directive within the fit.

\begin{Example}

> fit summary(fit)

Call:

coxph(formula = Surv(futime, fustat) {\Twiddle} age + ecog.ps + strata(rx),

data=ovarian)

N= 26

coef exp(coef) se(coef) z p

age 0.1385 1.149 0.048 2.885 0.00391

ecog.ps -0.0967 0.908 0.630 -0.154 0.87799

exp(coef) exp(-coef) lower .95 upper .95

age 1.149 0.871 1.045 1.26

ecog.ps 0.908 1.102 0.264 3.12

Rsquare= 0.387 (max possible= 0.874 )

Likelihood ratio test= 12.7 on 2 df, p=0.00174

Efficient score test = 12.2 on 2 df, p=0.0022

\end{Example}

After the fit is completed residuals from the fitted model can be obtained

using the \Co{resid} function. By default the martingale residuals are

produced, also available are deviance, score, and Schoenfeld residuals.

For any of these it is sufficient to give the shortest unique abbreviation

of the residual type. Two common transformations of the score residuals

can also be requested: \Co{dbeta} and \Co{dfbetas}. These are the

approximate change in the coefficient vector if observation $i$ is dropped,

and that change scaled by the variance of $\beta$.

Martingale residuals are used most often in an

assessment of functional form, score residuals play a role in assessing

influential or leverage data points as well as computation of robust

``sandwich'' variance estimators, and the Schoenfeld residuals are

useful in assessing time trends or lack of proportionality in one of

the coefficients of the model. Deviance residuals, though they have an

interesting theoretical justification, have not proven very useful in

practice.

\begin{Example}

> fit mresid dresid sresid resid(fit, "scho")

age residual.dz rx ecog.ps

59 2.69315678 0.06761161 -0.1256239 -0.5072536

115 5.36390193 0.08039118 -0.1493686 -0.6031317

156 -0.89877404 0.10683988 -0.1985109 0.1984379

268 6.95664457 0.12857952 -0.2389036 0.2388158

329 -15.73656567 0.28889884 -0.5367805 -0.4634169

353 4.06104424 -0.70587652 0.4535120 0.5282024

365 5.50035871 0.25348266 0.4796229 -0.4413864

431 -8.06809462 0.27490178 -0.4297023 -0.5248323

464 -2.15471513 0.23158423 0.5066040 0.4814387

475 0.57065101 0.25226661 0.5518479 0.5244351

563 0.06487254 -0.47274521 0.3319974 0.2747028

638 1.64752693 -0.50593435 -0.6446946 0.2939883

\end{Example}

The martingale and deviance residuals are each a vector of length $n$,

where $n$ is the number of subjects in the data.

The score residuals form an $n$ by $p$ matrix, with one column per regressor

variable, and are components of the first derivative of the partial

likelihood. By definition, the column sums of the score residual matrix

will be zero at $\hat\beta$. The Schoenfeld residuals have one row for

each death in the data set and $p$ columns,

the time point of the death is returned as the

row label of the matrix.

As with other models in S,

a factor variable may be expanded into multiple contrasts in the $X$ matrix

(though there are none in this example).

It will then appear as multiple columns in the score or Schoenfeld residuals

as well.

Tests for proportional hazards are based on rescaled Schoenfeld residuals,

and can be obtained with \Co{cox.zph}.

They are based on Grambsch and Therneau \cite{GTresid}, and are discussed

further in section on mathematical backround.

\begin{Example}

> temp print(temp)

rho chisq p

age -0.03989 0.02618 0.8715

residual.dz -0.14168 0.24626 0.6197

rx 0.13246 0.20012 0.6546

ecog.ps 0.48448 1.88192 0.1701

GLOBAL NA 3.36086 0.4993

> plot(temp)

> plot(temp, var=2)

\end{Example}

The plot shows time (or a monotone function of time) on the $x$ axis and

the rescaled residuals on the $y$ axis, with one plot per covariate.

An overlaid smooth curve is an estimate of $\beta(t)$, a time-dependent

regression coefficient. Under proportional hazards we must have

$\beta(t) = \beta$, i.e., the hazard ratio does not vary with time. The

standard printout includes the correlation coefficients $\rho$ for the plots

of each variable

along with tests for $\rho=0$, and a global test of proportional hazards

based on all the covariates. (There is no applicable value of $\rho$ for

the global test).

The \Co{var} option can be used to

create only a single plot, in this case for the second variable in the

model statement: (\Co{residual.dz}). This is useful when one wishes to

add a main title, a horizontal line at $y=0$, or other annotation to

the plot.

Predicted values are available based on either the linear predictor

$\eta=X'\hat\beta$, the risk for an individual relative to the average subject

within the data set $r=\exp(\eta)$, the expected number of events for an

individual over the time interval that they were observed to be at risk

(which is a component of the martingale residual), or for individual

components of the linear predictor $\eta$.

\begin{Example}

> lp risk expt term round(risk,2)

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15

14.43 18.9 9.71 26.49 0.38 1.14 2.16 0.44 0.54 0.93 1.21 1.18 1.71 5.21 0.42

16 17 18 19 20 21 22 23 24 25 26

1.27 0.16 1.66 0.86 0.1 0.23 0.31 0.2 0.25 0.17 0.72

> fit

\end{Example}

An optional \Co{data} argument to the predict function

is a new data set. It allows one to obtain predicted

values for subjects who are not part of the original

study. Because it is positioned as the second argument to \Co{predict},

the examples above must explicitly include

``type ='' (or a second comma), where this was not necessary in \Co{residual}.

Another option to the

predict function is to return standard errors of the

predicted values.

\subsection{Fitted survival curves}

The survfit function is used for fitting a survival curve, either to

original data or after a Cox model or parametric model fit.

\begin{Example}

> sf summary(sf)

\end{Example}

The above example would result in four survival curves, indexed by the two

levels of treatment and the two levels of residual disease. The right hand

side of the formula is interpreted differently than it would be for an

ordinary linear or Cox model.

Technically, the formula should have been expressed using

a * operator instead of +, since the desired result is for all

four levels or `interactions'.

We process the + symbol as though it

were an interaction, as well as producing labels that are longer (and

hopefully more readable) than the default labels generated by the *

operator.

Each of the following formulas would have produced

the same output curves, though the first has different labels.

\begin{Example}

Surv(futime, fustat) {\Twiddle} interaction(rx,residual.dz)

Surv(futime, fustat) {\Twiddle} strata(rx,residual.dz)

\end{Example}

Another example is shown

below, for a somewhat smaller study of acute

myelogenous leukemia; the data can be found on page

49 of Miller \cite{Miller},

\begin{Example}

> amlsf sf

Call: survfit.formula(formula = Surv(time, status) {\Twiddle} group, data=aml)

n events mean se(mean) median 0.95CI 0.95CI

group=Maintained 11 7 52.6 19.83 31 18 NA

group=Nonmaintained 12 11 22.7 4.18 23 8 NA

\end{Example}

Similarly to other S model programs, the print function gives a very

short synopsis and the summary function provides more complete information.

For survfit the default print contains only n, the total number of events,

the mean survival time and its standard error,

median survival, and confidence intervals for the

median. The mean is based on a truncated estimate, i.e., the survival

curve is assumed to go to zero just past the last follow-up time. It is

thus an under estimate of the mean when the last observation(s) is

censored. The confidence interval for the median is based on the

confidence intervals for the survival curve:

the lower and upper limits are the intersection of a horizontal line at .5

with the lower and upper confidence bands for $S(t)$.

The confidence interval will change if a different

confidence level or confidence type is specified in the \Co{survfit} call.

If the upper confidence band for $S(t)$ never reaches 0.5, as in the example

above, then the upper confidence limit for the median is unknown.

\begin{Example}

>summary(sf)

Call: survfit.formula(formula = Surv(time, status) {\Twiddle} group, data=aml)

group=Maintained

time n.risk n.event survival std.err lower 95% CI upper 95% CI

9 11 1 0.909 0.0867 0.7541 1.000

13 10 1 0.818 0.1163 0.6192 1.000

18 8 1 0.716 0.1397 0.4884 1.000

23 7 1 0.614 0.1526 0.3769 0.999

31 5 1 0.491 0.1642 0.2549 0.946

34 4 1 0.368 0.1627 0.1549 0.875

48 2 1 0.184 0.1535 0.0359 0.944

group=Nonmaintained

time n.risk n.event survival std.err lower 95% CI upper 95% CI

5 12 2 0.8333 0.1076 0.6470 1.000

8 10 2 0.6667 0.1361 0.4468 0.995

12 8 1 0.5833 0.1423 0.3616 0.941

23 6 1 0.4861 0.1481 0.2675 0.883

27 5 1 0.3889 0.1470 0.1854 0.816

30 4 1 0.2917 0.1387 0.1148 0.741

33 3 1 0.1944 0.1219 0.0569 0.664

43 2 1 0.0972 0.0919 0.0153 0.620

45 1 1 0.0000 NA NA NA

\end{Example}

By default, the summary includes one row for each time at which a

death occurred, and for each of these times lists the number of subjects

who are at risk just prior to the event, the number of events that

occurred, the survival estimate, its standard error, and upper and lower

95\% confidence intervals calculated on the hazard scale.

Options to the survfit

routine include estimates based on the Nelson cumulative hazard estimator

instead of the Kaplan-Meier (as proposed by Fleming and Harrington),

confidence intervals based on the log hazard scale, and the level for the

confidence interval. An option to the summary routine

allows the listing to be printed for selected survival times and/or to

include censored time points within the printout.

Survival estimates for the data set as a whole can be generated by

using the null model, i.e. \Co{\Twiddle 1} as the right-hand side of the formula.

\begin{Example}

> survfit(Surv(time, status) {\Twiddle}1, aml)

\end{Example}

It is also allowable to leave off the \Co{\Twiddle 1}; the same result will

be obtained.

The survfit function now supports case weights via the weight argument.

One obvious use of this feature is for data sets with multiplicities of the

input lines, i.e., instead of having one line of data appear three times it

could appear once with a weight of three. For instance, if weights of two

are attached to each of the 23 cases in the AML data this effectively

doubles the data set, the resultant survival is identical to the original,

but the variance of the hazard is halved. Case weights were included as an

option in the program less for this reason, however, than to facilitate

various manipulations that require fractional case weights, see for

example Turnbull \cite{Turnbull}, who uses this in an EM algorithm for

left censored data.

When \Co{survfit} is used with fractional weights, the returned variance

estimate is probably worthless.

The survfit function also can generate predicted survival curves for a Cox

model by using the resultant fit as a first argument to the function.

\begin{Example}

> attach(ovarian)

> fit survfit(fit)

\end{Example}

This will produce two survival curves,

one for each of the two \Co{rx} strata;

each curve will be for a ``pseudo cohort" whose age and ecog performance

score are equal to the mean values for the data set that was fit.

An important aspect of the new modeling language is the (largely undocumented)

number of {\em side effects}.

If the last line above were executed without first attaching the \Co{ovarian}

data frame (perhaps in a later S session) it would fail.

This is because estimation of the survival curve

requires some data summaries that were not saved in the {\em fit} object.

In order to obtain them,

the data is reconstructed based on the call that produced the {\em fit}

object, and this reconstruction requires essentially the same environment to exist as was

originally present.

However, if the original fit had been produced using the \Co{data=ovarian}

argument to \Co{coxph}, no attachment would be necessary.

More subtle errors arise if important options differ at the later call, i.e.,

\Co{na.action} or \Co{contrasts}.

A summary of the side effects in survival models is

found later in this document.

The original version of survfit had an option for {\em risk weights}.

This option had to do with survival curves following

a Cox model with known coefficients, and is now

obtained in a more obvious way. The example below shows a fit

with risk weights, i.e. $\exp(X'\beta)$, where $X'\beta= $ .1 to 2.3.

\begin{Example}

> survfit(coxph(Surv(aml$time, aml$status) {\Twiddle} offset(1:23/10)))

\end{Example}

This example is contrived:

normally the offset statement would contain

the result of a coxph fit, perhaps on a different set of data.

I have used this feature on two occasions in my own analyses. In each case

the second data set had a small number of patients, and we wished to

test the effect of variable ``x'' after adjusting for a set of baseline variates

known to effect survival. Though the data set was not large enough to

model and estimate all of the variables concurrently, it was sufficient when

the regression coefficients for the baseline variates were fixed at the

values from an earlier, larger study.

For instance, if the earlier study had a variable \Co{old} and a fitted

regression coefficient of 2.3, the new model would have had

\Co{offset(2.3 *old)} as a term in the model.

Another use of offsets is to model the {\em relative} mortality of a

population. This is discussed in example Andersen, et al.

\cite{Ander3}

using diabetes in the county of Fyn. In this model the covariates were

age at onset, sex, and the population mortality or hazard ``phaz''

which depends on age and sex and is

set up as a time dependent covariate. Then a model for

absolute mortality is

\begin{Example}

Surv(start, stop, status) ~ age + sex

\end{Example}

and the model for relative mortality is

\begin{Example}

Surv(start, stop, status) ~ age + sex + offset(log(phaz))

\end{Example}

As a comparison of the two models, they suggest fitting a third model

\begin{Example}

Surv(start, stop, status) ~ age + sex + log(phaz)

\end{Example}

where a values of $\bhat_3 =0$ corresponds to absolute mortality and

$\bhat_3 =1$ to a model for the relative mortality. The fitted value of

-.39 suggests that the model for absolute mortality is preferred.

Note that when a curve is requested using only the \Co{offset} directive,

using the same data set that created the coefficients for the offset,

the resultant survival estimate is identical

to the curve based on the model's \Co{fit} component but the variance will be

deflated. The survival curve from a Cox model includes in

its variance a term that accounts for the error in $\hat\beta$, and an offset

model implicitedly states that $\beta$ is known without error.

In order to get curves for a pseudo cohort other than one centered at the

mean of the covariates, use the newdata argument to survfit. The newdata

argument is a list, data frame or matrix which contains in each row all

variables found in the right-hand side of the equation that was fit,

excluding strata terms.

Multiple rows in the new data frame will result in multiple ``cohorts" of

subjects, for example:

\begin{Example}

> fit survfit(fit, newdata=list(age=c(30,50), ecog.ps=c(2,2))

\end{Example}

This will produce two survival curves, the first for an imagined cohort of

subject who were age 30 with a performance score of 2, and the second for

an age 30 group. A more complete example is given below.

(Amendment: if the model includes a strata by covariate interaction, use of

the \Co{newdata} argument currently leads to failure, including a very

nonintuitive error message. This is a bug, and should be fixed in the next

release.)

In keeping with other S models, the newdata argument can also be

the number of a data frame.

(I don't like this, but it was important to be compatible).

When the formula has only one variable on the

right-hand side of the equation and the given newdata argument is a single

integer,

then the intent of the user is ambiguous.

For instance

\begin{Example}

> fit survfit(fit, newdata=3)

\end{Example}

Is the intention to have $x=3$ or to use the variable $x$ found in data

frame number 3?

To avoid this anomaly, be explicit by writing

\Co{survfit(fit, newdata=list(x=3))}.

(And for those who are curious, the ambiguity is, if necessary, resolved by

the following rule:

If the number is a positive

integer, this routine assumes it is a frame number, otherwise the number is

assumed to be the actual value for the covariate.)

Here is another example from the Fleming data

set.

Note the use of \Co{x=T}.

This causes a copy of the final $X$ matrix to be saved in the \Co{fit}

object, including any transformations, dummy variables that represent factors,

etc.

When subsequent calculations are planned for a

fitted Cox model, such as residuals or survival curves, this will save

significant computation time (at the expense of a larger \Co{fit} object)

since $x$ does not have to be reconstructed.

\begin{Example}

> fit summary(fit)

Call:

coxph(formula = Surv(futime, fustat) {\Twiddle} age + ecog.ps + strata(rx),

data=ovarian, x=T)

N= 26

coef exp(coef) se(coef) z p

age 0.1385 1.149 0.048 2.885 0.00391

ecog.ps -0.0967 0.908 0.630 -0.154 0.87799

exp(coef) exp(-coef) lower .95 upper .95

age 1.149 0.871 1.045 1.26

ecog.ps 0.908 1.102 0.264 3.12

Rsquare= 0.387 (max possible= 0.874 )

Likelihood ratio test= 12.7 on 2 df, p=0.00174

Efficient score test = 12.2 on 2 df, p=0.0022

> summary(survfit(fit))

Call: survfit.coxph(object = fit)

rx=1

time n.risk n.event survival std.err lower 95% CI upper 95% CI

59 13 1 0.978 0.0266 0.9275 1

115 12 1 0.951 0.0478 0.8620 1

156 11 1 0.910 0.0760 0.7722 1

268 10 1 0.861 0.1055 0.6776 1

329 9 1 0.736 0.1525 0.4909 1

431 8 1 0.627 0.1704 0.3680 1

638 5 1 0.333 0.2296 0.0865 1

rx=2

time n.risk n.event survival std.err lower 95% CI upper 95% CI

353 13 1 0.943 0.141 0.703 1

365 12 1 0.880 0.144 0.638 1

464 9 1 0.789 0.165 0.523 1

475 8 1 0.697 0.177 0.424 1

563 7 1 0.597 0.182 0.329 1

> summary(survfit(fit, list(age=c(30,70), ecog.ps=c(2,3))))

Call: survfit.coxph(object = fit, newdata = list(age = c(30, 70),

ecog.ps = c( 2, 3)))

rx=1

time n.risk n.event survival1 survival2

59 13 1 0.999 0.87905

115 12 1 0.999 0.74575

156 11 1 0.998 0.57399

268 10 1 0.996 0.41765

329 9 1 0.992 0.16677

431 8 1 0.988 0.06492

638 5 1 0.973 0.00161

rx=2

time n.risk n.event survival1 survival2

353 13 1 0.999 0.7093

365 12 1 0.997 0.4739

464 9 1 0.994 0.2494

475 8 1 0.991 0.1207

563 7 1 0.987 0.0489

\end{Example}

The first call to survfit asks only for a single curve at the mean of the

covariates. (The value of those means is stored in the fit object as

fit\$means.) The second call asks for curves for two hypothetical cohorts, one

has an age of 30 and a performance score of two, the second is age 70 with

a performance score of three. The printout requires some

explanation. The printouts for the two treatment strata are listed in sequence:

since the event times are different in the two strata they cannot be listed

side-by-side. The survivals for the two age * ps cohorts are listed side

by side since they are computed at the same time points.

Age is a very significant variable in the model:

survival of a subject age 70 is significantly worse

than one

age 30. The standard error and confidence intervals

are computed in the second example as they were in

the first, and are present in the returned survival

structure, but since their inclusion would be too wide for the

paper the printing routine leaves them off.

The functions for predicted survival, unfortunately, share a basic flaw

with S prediction in other models. If the model equation involves any

special functions such as \Co{ns}, \Co{poly} {\em or involves factor

variables}, then naive use of \Co{survfit} will result in incorrect answers.

In particular, for a factor it is important that \Co{newdata} has the same

number of levels as the original data, or contrasts will not be coded as

they were originally.

\begin{Example}

> fit srv srv srv srv srv options(contrasts=c("contr.treatment", "contr.poly"))

> sfit.1 _ coxph(Surv(start, stop, event){\Twiddle} (age + surgery)*transplant,

data=jasa1, method='breslow')

> sfit.2 _ coxph(Surv(start, stop, event){\Twiddle} year*transplant,

data=jasa1, method='breslow')

> sfit.3 _ coxph(Surv(start, stop, event){\Twiddle} (age + year)*transplant,

data=jasa1, method='breslow')

> sfit.4 _ coxph(Surv(start, stop, event){\Twiddle} (year + surgery)*transplant,

data=jasa1, method='breslow')

> sfit.5 _ coxph(Surv(start, stop, event){\Twiddle} (age +surgery)*transplant +year,

data=jasa1, method='breslow')

> sfit.6 _ coxph(Surv(start, stop, event){\Twiddle} age*transplant + surgery + year,

data=jasa1, method='breslow')

> summary(sfit.1)

Call:

coxph(formula = Surv(start, stop, event) {\Twiddle} (age + surgery) * transplant)

N= 172

coef exp(coef) se(coef) z p

age 0.0138 1.014 0.0181 0.763 0.446

surgery -0.5457 0.579 0.6109 -0.893 0.372

transplant 0.1181 1.125 0.3277 0.360 0.719

age:transplant 0.0348 1.035 0.0273 1.276 0.202

surgery:transplant -0.2916 0.747 0.7582 -0.385 0.701

exp(coef) exp(-coef) lower .95 upper .95

age 1.014 0.986 0.979 1.05

surgery 0.579 1.726 0.175 1.92

transplant 1.125 0.889 0.592 2.14

age:transplant 1.035 0.966 0.982 1.09

surgery:transplant 0.747 1.339 0.169 3.30

Rsquare= 0.07 (max possible= 0.969 )

Likelihood ratio test= 12.4 on 5 df, p=0.0291

Efficient score test = 12 on 5 df, p=0.0345

> sfit.2

Call: coxph(formula = Surv(start, stop, event) {\Twiddle} year * transplant)

coef exp(coef) se(coef) z p

year -0.265 0.767 0.105 -2.518 0.0118

transplant -0.282 0.754 0.514 -0.549 0.5831

year:transplant 0.136 1.146 0.141 0.967 0.3337

Likelihood ratio test=8.61 on 3 df, p=0.035 n= 172

\end{Example}

One line of the above printout may generate confusion: \Co{N = 172}.

This is the number of {\em observations} in the data set, not the

number of subjects. There are actually 103 patients, of which 69 had a

transplant and are thus represented using 2 rows of data.

When there are time dependent covariates, the predicted survival curve

can present something of a dilemma. The usual call's result is for a

pseudo cohort whose covariates do not change--

\begin{Example}

>fit1 fit2 data survfit(sfit.1, data, individual=T)

\end{Example}

Another useful extension is time dependent strata. The following examples

come from a study of liver transplantation. As in the heart transplant study,

there is a variable waiting time for any patient. One question of interest

is the efficacy of liver transplant, another is the utility of a particular

risk score developed at the Mayo Clinic. The variables for a given

patient are:

\begin{itemize}

\item (tstart, tstop, status]: the time interval, open on the left and closed

on the right. Status is 1 if the subject died at time tstop. All times are

in days since enrollment in the study.

\item base.rsk: The risk score at study entry. This covariate was defined by

the ``Mayo model'' analysis, on an independent data set.

The actual definition involves 5 variables: .871*log(bilirubin) + .039*age + \ldots.

\item trisk: time-dependent risk score. The latest value we have for the

risk score (most patients are evaluated about once a year).

\item transplant: time-dependent transplant status.

\item tx.time: For transplanted patients, the number of days from enrollment

to transplant.

\end{itemize}

There are 83 patients, who generate approximately 600 observations in the

constructed data frame. The number of observations for a given patient

depends on the number of determinations of his/her risk score, and on whether

they were transplanted.

\begin{Example}

> attach(timedep)

> options(contrasts=c("contr.treatment", "contr.poly"))

> yy tfit1 tfit2 tfit3 tfit4 yyy coxph(yyy {\Twiddle} trisk*strata(trplant) )

\end{Example}

In SAS or BMDP, it is possible to mimic time dependent strata by breaking a

subject into two new subjects. Because each subject's time interval

implicitly starts at 0 in these packages, there is effectively a realignment

of the data.

\subsection{Differences in survival}

There is a single function \Co{survdiff} to test for differences between 2

or more survival curves. It implements the $G^\rho$ family of Fleming and

Harrington \cite{FH2}. A single parameter $\rho$ controls the weights

given to different survival times, $\rho=0$ yields the log-rank test and

$\rho=1$ the Peto-Wilcoxon. Other values give a test that is intermediate

to these two. The default value is $\rho=0$.

The interpretation of the formula is the same as for \Co{survfit}, i.e.,

variables on the right hand side of the equation jointly break the

patients into groups.

\begin{Example}

> survdiff(Surv(time, status){\Twiddle} group, aml)

N Observed Expected (O-E)^2/E

Maintained 11 7 10.689 1.273

Nonmaintained 12 11 7.311 1.862

Chisq= 3.4 on 1 degrees of freedom, p= 0.06534

\end{Example}

For one-sample tests see the section on expected survival.

\subsection{Competing Risks}

\label{Compete}

This running example is taken from the paper by Wei, Lin and Weissfeld (WLW)

\cite{WLW}. They use a data set on time to recurrence of bladder

cancer; a copy of the data may be obtained from statlib. A portion of

the data is shown below:

\begin{Example}

Recurrence Time

Treatment Follow-up Initial Initial ----------------

group time number size 1 2 3 4

1 0 1 1

1 1 1 3

1 4 2 1

1 7 1 1

1 10 5 1

1 10 4 1 6

1 14 1 1

1 18 1 1

1 18 1 3 5

1 18 1 1 12 16

1 23 3 3

1 23 3 3 10 15

1 23 1 1 3 16 23

1 23 1 1 3 9 21

. .

. .

. .

\end{Example}

Code for reading in this data can be found in the Examples directory. After

reading it in, we have created the following data set

\begin{Example}

> bladder[1:20,]

id rx size number start stop event enum

1 1 1 1 3 0 1 0 1

2 1 1 1 3 1 1 0 2

3 1 1 1 3 1 1 0 3

4 1 1 1 3 1 1 0 4

5 2 1 2 1 0 4 0 1

6 2 1 2 1 4 4 0 2

7 2 1 2 1 4 4 0 3

8 2 1 2 1 4 4 0 4

9 3 1 1 1 0 7 0 1

10 3 1 1 1 7 7 0 2

11 3 1 1 1 7 7 0 3

12 3 1 1 1 7 7 0 4

13 4 1 5 1 0 10 0 1

14 4 1 5 1 10 10 0 2

15 4 1 5 1 10 10 0 3

16 4 1 5 1 10 10 0 4

17 5 1 4 1 0 6 1 1

18 5 1 4 1 6 10 0 2

19 5 1 4 1 10 10 0 3

20 5 1 4 1 10 10 0 4

. .

. .

. .

\end{Example}

Notice that this data set has exactly 4 observations for each subject.

A second data set, bladder2, has had all of the rows with

\Co{start==stop} removed, and also has a fifth observation for some

subjects (those with follow-up after the fourth recurrence).

The model explored in WLW is easily fit by the following commands.

The key addition to the model is \Co{cluster(id)}, which asserts that

subjects with the same value of the variable \Co{id} may be correlated.

In order to compare the results directly to WLW, we wish to

look at a different set of contrasts than the S default. These are

created ``by hand''

\begin{Example}

> options(contrasts='contr.treatment')

> wfit rx cmat wvar sqrt(diag(wvar))

[1] 0.3075006 0.3639071 0.4151602 0.4896743

\end{Example}

The same coefficients can also be obtained, as WLW do, by performing four

separate fits, and then combining the results.

\begin{Example}

> fit1 fit2 fit3 fit4 sc1 sc2 sc3 sc4 t11 t12 t44 wvar.all wvar fit2pa fit2pb afit sqrt(diag(afit$var))

[1] 0.24183067 0.05689293 0.07221747

> sqrt(diag(afit$naive.var))

[1] 0.20007253 0.04800789 0.07025758

\end{Example}

The naive estimate of standard error is .20, the correct estimate of .24

is intermediate between the naive estimate and the linear combination

estimate. (Since this model does not include strata by covariate

interactions, I would not expect an exact match).

Further discussion on these estimators can be found in section \ref{rvar}.

\subsection{Expected Survival}

Expected survival is closely related to a standard method in

epidemiology called {\em person years}, which consists of counting the total

amount of follow-up time contributed by the subjects within any of several

strata. Person-years analysis

is accomplished within S by the \Co{pyears} function. The

main complication of this function, as opposed to

a straightforward use of \Co{tapply}, is

that a subject may contribute to several different cells of the output array

during his follow-up. For instance, if the desired output table were

treatment group * age in years, a subject with 4 years of observation would

contribute to 5 different cells of the table (4 cells if he entered the

study exactly on his birthdate). The following counts up years of

observation for the Stanford heart patients by age group and surgical

status.

\begin{Example}

age fit plot(x1, resid(fit))

\end{Example}

That is, the residual from a model is the same shape as the {\em input} data,

independent of reductions that may have occurred in the intervening $X$

matrix.

Other actions in this last step are possible. For instance, if the na.action

had imputed numbers for the missing values, the predicted value for such

observations should perhaps be set to NA or otherwise marked as

unreliable.

Of the four effects, the second is dealt with by the \Co{na.action} extension to base

S. The first, the ability to choose a global action, is accomplished with

a change to \Co{model.frame.default}. If neither the particular function

call nor the data frame contained an explicit na.action, the routine now

checks for an na.action option.

To make na.omit the global default, type

\begin{Example}

> options(na.action='na.omit')

\end{Example}

Because \Co{model.frame.default} is called by all of the S modeling functions,

this action will apply to all of them, not just survival.

In order to implement the third and fourth effects, it was necessary to

pass na.action information as part of the returned model fit. This in

turn requires the na.action routine to store something in the model frame

that is passed to the fitting function.

Because the definition of what ``should be done'' to label the printout or

to modify the residuals depends on the na.action chosen, the fitting,

print, residual, and predict routines do not try to manipulate the

passed information directly, but use it as an argument to \Co{naprint}

and \Co{naresid} methods. This allows the actions taken to be extremely

general, and more importantly, they are easily modified by the user.

Specifically

\begin{enumerate}

\item The na.action routine optionally adds an attribute named {\em na.action}

to the data frame. The class of the attribute object should be the name of

the na action, but the contents or form of the object is unrestricted

and depends on the particular na.action.

\item The fitting routine adds this attribute to its output, as a component

named {\em na.action}.

\item The print routines for coxph, survfit, and etc. each contain a call

to the naprint method, with the passed na.action object as its argument.

This method is assumed to return a character string appropriate for inclusion

in the printout.

\item The residual and predicted value routines each pass their final result

through a call to the naresid method, whose arguments are the na.action

object and the vector or matrix of results.

\end{enumerate}

The package includes a modified version of \Co{na.omit} along with the

methods \Co{naprint.omit} and \Co{naresid.omit}.

The default naprint method returns a null string, and the default naresid

method returns its second argument, this mimics the unmodified S.

This allows the survival routines to work with ``standard'' S na.action

functions.

The standard modeling functions such as \Co{lm} ignore any attributes on

the data frame, and work ``as usual'' with the modified na.omit function.

\section {Overrides}

The survival package overrides four S functions. Effects of the

model.frame.default and na.omit functions were discussed above. The na.omit change

should have no effect on other S modeling routines. The change to

model.frame.default is more global: most of the modeling functions will now

``obey'' the na.action specified in an options statement. There are two

exceptions. The tree routine does not use model.frame.default, and appears

to not be affected at all, and the gam function has side effects. I have

not tracked this down completely, but think that it is due to

predict.safe.gam. If a gam model has been fit, and missing values were

deleted due to a global na.action option, then further manipulations using

the result of that fit can give mysterious errors about mismatched dimensions

and such. I believe that when the follow-up functions rebuild portions of

the data frame they do so without calling model.frame.default, and thus

create mismatches with the original fit. For the present, it is best to

explicitly give the na.action in gam.

The overrides to data.frame and print.data.frame allow for the inclusion

of Surv and date objects in a data frame. I have not seen any untoward

effects due to this change.

\begin{thebibliography}{99}

\bibitem{AG} Andersen, P.K. and Gill, R.D. (1982).

Cox's regression model for counting processes: A large sample study.

{\em Ann. Stat.} {\bf 10}, 1100-20.

\bibitem{Ander2} Andersen, P.K., Borgan, O., Gill, R.D., and Keiding, N. (1991)

{\em Statistical Models Based on Counting Processes}.

Springer-Verlag, New York.

\bibitem{Ander3} Andersen, P.K., Borch-Johnsen, K., Deckert, T., Green, A.,

Hougaard, P., Keiding, N., and Kreiner, S. (1985).

A Cox regression model for the relative mortality and its application to

diabetes mellitus survival data.

{\em Biometrics} {\bf 41}, 921-32.

\bibitem{Ander4} Andersen, P.K. and V\ae th, M. (1989).

Simple parametric and non-parametric models for excess and relative

mortality.

{\em Biometrics} {\bf 45}, 523-35.

\bibitem{Barlow} { Barlow, W.E. and Prentice, R.L.} (1988).

Residuals for relative risk regression.

{\em Biometrika} {\bf 75}, 65-74.

\bibitem{Berry} Berry, G. (1983)

The analysis of mortality by the subject years method.

{\em Biometrics} {\bf 39}, 173-84.

\bibitem{Binder} Binder, D.A. (1992).

Fitting Cox's proportional hazards models from survey data.

{\em Biometrika} {\bf 79}, 139-47.

\bibitem{Cain} Cain, K.C. and Lange, N.T. (1984).

Approximate case influence for the proportional hazards regression model

with censored data.

{\em Biometrics} {\bf 40}, 493-9.

\bibitem{Smodel} Chambers, J.M. and Hastie, T.J. (1992).

{\em Statistical Models in S}.

Wadsworth and Brooks/Cole, California.

\bibitem{Chen} Chen, C. and Wang, P.C. (1991).

Diagnostic plots in Cox's regression model.

{\em Biometrics} {\bf 47}, 841-50.

\bibitem{Crowley} Crowley, J. and Hu, M. (1977), Covariance analysis of

heart transplant data.

{\em J. Am. Stat. Assoc.} {\bf 72}, 27-36.

\bibitem{DKorn} Dorey, F.J. and Korn, E.L. (1987).

Effective sample sizes for confidence intervals for survival

probabilities.

{\em Statistics in Medicine} {\bf 6}, 679-87.

\bibitem{Ederer} Ederer, F., Axtell, L.M. and Cutler, S.J. (1961).

The relative survival rate: A statistical methodology.

{National Cancer Institute Mongraphs} {\bf 6}, 101-21.

\bibitem{Eder2} Ederer, F. and Heise, H. (1977).

Instructions to IBM 650 programmers in processing survival computations,

{\em Methodological Note No. 10, End Results Evaluation Section},

National Cancer Institute.

\bibitem{ovarian} Edmunson, J.H., Fleming, T.R., Decker, D.G.,

Malkasian, G.D., Jefferies, J.A., Webb, M.J., and Kvols, L.K. (1979),

Different Chemotherapeutic Sensitivities and Host Factors Affecting

Prognosis in Advanced Ovarian Carcinoma vs. Minimal Residual Disease.

{\em Cancer Treatment Reports} {\bf 63}, 241-47.

\bibitem{Efron} Efron, B. (19xx)

{\em The Jackknife, the bootstrap, and other resampling plans}.

SIAM CMBS-NSF Monograph {\bf 38}.

\bibitem{Escobar} Escobar, L.A. and Meeker W.Q., Jr. (1992).

Assessing influence in regression analysis with censored data.

{\em Biometrics} {\bf 48}, 507-528.

\bibitem{FH2} Fleming, T.R. and Harrington, D.P. (1981),

A class of hypothesis tests for one and two sample censored survival data.

{\em Communications in Statistics} {\bf A10(8)}, 763-94.

\bibitem{FH3} Fleming, T.R. and Harrington, D.P. (1984),

Nonparametric estimation of the survival distribution in censored data.

{\em Comm. in Statistics A} {\bf 13}, 2469-86.

\bibitem{FandH} Fleming, T.R. and Harrington, D.P. (1991),

{\em Counting Processes and Survival Analysis}, Wiley, New York.

\bibitem{Gail} Gail, M.H. and Byar, D.P. (1986).

Variance calculations for direct adjusted survival curves, with

applications to testing for no treatment effect.

{\em Biom J} {\bf 28}, 587-99.

\bibitem{Dickson} Grambsch, P.M., Dickson, E.R., Wiesner, R.H.

and Langworthy, A. (1989).

Application of the Mayo PBC survival model to liver transplant patients.

{\em Mayo Clinic Proc} {\bf 64}, 699-704.

\bibitem{GTresid} Grambsch, P. and Therneau, T.M. (1993)

Proportional hazards tests and diagnostics based on weighted residuals.

{\em Submitted}.

\bibitem{green} Green P.J. (1984).

Iteratively reweighted least squares for maximum likelihood estimation,

and some robust and resistant alternatives (with discussion).

{\em JRSSB} {\bf 46}, 149-92

\bibitem{Hak1} Hakulinen, T. (1982).

Cancer survival corrected for heterogeneity in patient withdrawal.

{\em Biometrics} {\bf 38}, 933.

\bibitem{Hak2} Hakulinen, T. and Abeywickrama, K.H. (1985).

A computer program package for relative survival analysis.

{\em Computer Programs in Biomedicine} {\bf 19}, 197-207.

\bibitem{Hak3} Hakulinen, T. (1977).

On long term relative survival rates.

{\em J. Chronic Diseases} {\bf 30}, 431-43.

\bibitem{Harrington} Harrington, D.P. and Fleming, T.R. (1982).

A class of rank test proceedures for censored survival data.

{\em Biometrika}, {\bf 69}, 553-66.

\bibitem{Hartz} Hartz, A.J., Giefer, E.E. and Hoffmann, G.G. (1983).

A comparison of two method for calculating expected mortality.

{\em Statistics in Medicine} {\bf 2}, 381-6.

\bibitem{Hartz2} Hartz, A.J., Giefer, E.E. and Hoffmann, G.G. (1984).

Letter and rejoinder on

``A comparison of two method for calculating expected mortality."

{\em Statistics in Medicine} {\bf 3}, 301-2.

\bibitem{Hartz3} Hartz, A.J., Giefer, E.E. and Hoffmann, G.G. (1985).

Letters and rejoinder on

``A comparison of two method for calculating expected mortality."

{\em Statistics in Medicine} {\bf 4}, 105-9.

\bibitem{Huber} Huber, P.J. (1967).

The behavior of maximum likelihood estimates under non-standard conditions.

{\em Proceedings of the Fifth Berkeley Symposium on Mathematical Statistics

and Probability} {\bf 1}, 221-33.

\bibitem{Kalb} Kalbfleisch, J.D. and Prentice R.L. (1980).

{\em The Statistical Analysis of Failure Time Data}, Wiley, New York.

\bibitem{Klein} Klein, J.P. (1991).

Small sample moments of some estimators of the variance of the

Kaplan-Meier and Nelson-Aalen estimators.

{\em Scand. J. Statistics} {\bf 18}, 333-40.

\bibitem{LWA} {Lee, E.W., Wei, L.J. and Amato D.} (1992).

{\em Cox-type regression analysis for large number of small groups of

correlated failure time observations}.

In Klein, J.P and Goel, P.K. (eds), Survival Analysis, State of the

Art, 237-247, Kluwer Academic Publishers, Netherlands.

\bibitem{Lin} Lin, D.Y. (1991).

Goodness-of-fit analysis for the Cox regression model based on a

class of parameter estimators.

{\em JASA} {\bf 86}, 725-728.

\bibitem{LW} { Lin, D.Y. and Wei, L.J.} (1989).

The robust inference for the Cox Proportional Hazards Model.

{\em J. Am. Stat. Assoc.} {\bf 84}, 1074-79.

\bibitem{LWZ} { Lin, D.Y., Wei, L.J. and Ying, Z.} (1992).

Checking the Cox model with cumulative sums of martingale-based residuals.

{\em Technical Report \#111}, Dept. of Biostatistics, U. of Washington.

\bibitem{Link1} Link, C.L. (1984).

Confidence intervals for the survival function using Cox's proportional

hazard model with covariates.

{\em Biometrics} {\bf 40}, 601-10.

\bibitem{Link2} Link, C.L. (1986).

Confidence intervals for the survival function in the presence of

covariates.

{\em Biometrics} {\bf 42}, 219-20.

\bibitem{Makuch} Makuch, R.W. (1982).

Adjusted survival curve estimation using covariates.

{\em J Chronic Diseases} {\bf 3}, 437-43.

\bibitem{Mantel} Mantel, N. (1966).

Evaluation of survival data and two new rank order statistics arising in

its consideration.

{\em Cancer Chemotherapy Reports} {\bf 50}, 163-6.

\bibitem{Miller} Miller, R.G. (1981), {\em Survival Analysis},

Wiley, New York.

\bibitem{glim} McCullagh, P. and Nelder, J.A. (1983).

{\em Generalized Linear Models.}

Chapman and Hall.

\bibitem{Nagel} Nagelkirke, N. (1991).

A note on a general definition of the coefficient of determination.

{\em Biometrika} {\bf 78}, 691-2.

\bibitem{Nelson} Nelson, W.B. (1969).

Hazard plotting for incomplete failure data.

{\em J. Quality Technology} {\bf 1}, 27-52.

\bibitem{Pren2} Prentice, R.L., Williams, B.J. and Peterson, A.V. (1981).

On the regression analysis of multivariate failure time data.

{\em Biometrika} {\bf 68}, 373-89.

\bibitem{Pugh} Pugh, M., Robins, J., Lipsitz, S., and Harrington, D. (1993).

Inference in the Cox proportional hazards model with missing covariate

data. In press.

\bibitem{Reid} Reid, N. and Cr\'{e}peau, H. (1985).

Influence functions for proportional hazards regression.

{\em Biometrika} {\bf 72}: 1-9.

\bibitem{Scho} Schoenfeld, D. (1980).

Chi-squared goodness-of-fit tests for the proportional hazards regression

model.

{\em Biometrika} {\bf 67}, 145-53.

\bibitem{SH} Smith, P.J. and Hietjan, D.F. (to appear).

Testing and adjusting for overdispersion in generalized linear models.

{\em J Royal Statistical Soc, Series C}.

\bibitem{Turnbull} Turnbull, B.W. (1974)

Nonparametric estimation of a survivorship function with doubly

censored data.

{\em JASA} {\bf 69}, 169-73.

\bibitem{TGF} Therneau, T.M., Grambsch P.M. and Fleming, T.R. (1990).

Martingale based residuals for survival models.

{\em Biometrika} {\bf 77}, 147-60.

\bibitem {Thomsen} Thomsen, T.L., Keiding, N. and Altman, D.G. (1991),

A note on the calculation of expected survival, illustrated by the

survival of liver transplant patients.

{\em Statistics in Medicine} {\bf 10}, 733-8.

\bibitem{tsiatis} Tsiatis, A.A. (1981).

A large sample study of Cox's regression model.

{\em Ann. Statistics} {\bf 9}, 93-108.

\bibitem{Verheul} Verheul, H.A., Dekker, E., Bossuyt, P., Moulijn, A.C.

and Dunning A.J. (1993).

Backround mortality in clinical survival studies,

{\em Lancet} {\bf 341}, 872-5.

\bibitem{WLW} { Wei, L.J., Lin, D.Y. and Weissfeld, L.} (1989).

Regression analysis of multivariate incomplete failure time data by

modeling marginal distributions.

{\em J. Am. Stat. Assoc.} {\bf 84}, 1065-73.

\bibitem{White1} White, H. (1980).

A heteroskedasticity-consistent covariance matrix estimator and a direct

test for heteroskedasticity.

{\em Econometrica} {\bf 48}, 817-30.

\bibitem{White2} White, H. (1982).

Maximum likelihood estimation of misspecified models.

{\em Econometrika} {\bf 50} 1-25.

\bibitem{smoke} {\em The Health Benefits of Smoking Cessation} (1990).

US Department of Health and Human Services. Public Health Service,

Centers for Disease Control, Center for Chronic Disease Prevention and

Health Promotion, Office on Smoking and Health.

DHHS Publication No (CDC)90-8416.

\end{thebibliography}

\end{document}

SHAR_EOF

cd ..

if test -f 'predict.doc'

then

echo shar: over-writing existing file "'predict.doc'"

fi

cat 'predict.doc'

What do I need to do predictions --

linear predictor: exists

+se : X matrix

+newdata: means of old X matrix, new X matrix, new offset

risk -- same as lp

expected -- cumulative hazard for subject= baseline haz + time + risk

+se : sqrt(expected)

+new : baseline hazard function, new time, new x, means of old X,

new offset, new strata

terms -- : X matrix and the means

+se : I matrix

+new : new X matrix and the old means

survival: baseline cum haz & it's time vector, strata

+se : old x matrix, subject strata, time, risk --> weighted means matrix

+new : above + new risk

The baseline hazard is a function of risk, time, and strata. So is the

cumulative hazard. For survival, se=T will usually be desired.

SHAR_EOF

# End of shell archive

exit 0

La notion de coût-efficacité s'est largement ... - Examen corrige

REPUBLIQUE ALGERIENNE DEMOCRATIQUE ET POPULAIRE ...

Monografia de Mestrado Profissional - IC-Unicamp

Organisation de la première séance - Cours

Correction sujet élève

1. Introduction sur les codes correcteurs - Infoscience - EPFL

TD N°1 :Réseaux et Télécommunications - Exercices corriges

TD N°1 :Réseaux et Télécommunications

Exercice 1 - Td corrigé

Travail pour le jeudi 18 janvier. 1) Lire chez vous les extraits ci ...

Acide éthanoïque

: