Survival

Censored and Truncated

Censored

Event occurs before the observing starting time, event time left to starting time. we called it left censored. For example, event is getting disease. A sample have already got it before we observe it. It is left censored data.
Event occurs after the observing ending time, event time right to ending time. we called it right censored. For example, a sample doesn’t get it until we stop observing. It is right censored data.
When data both left and right censored, we call double censored.

Truncated

The idea is quit similar to censor. Censored data is observed the kind of data but it doesn’t complete. Truncated data is that we will not use the incomplete data as a simple.

We only observe the data with event occuring after starting time, so we truncate those event left to starting time. Those data we called it left truncated.
We only observe the data with event occuring before ending time, so we truncate those event right to ending time. Those data we called it right truncated.
When data both left and right truncated, we call double truncated.

Basic Code

survfit(Surv(time1,time2,delta)~type, data = tongue):
We classify tongue by different type (No type we use 1); censored time from time1 to time2. delta means event, normally 0=alive, 1=dead.

1
2
3
4
5
library(survival)
fit <- survfit(Surv(t2,relapse)~1, data=drug6mp)
summary(fit)
names(summary(fit))
data.frame(fit$time,fit$n.risk,fit$n.event,fit$cumhaz)

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
## Call: survfit(formula = Surv(t2, relapse) ~ 1, data = drug6mp)
## 
##  time n.risk n.event survival std.err lower 95% CI upper 95% CI
##     6     21       3    0.857  0.0764        0.720        1.000
##     7     17       1    0.807  0.0869        0.653        0.996
##    10     15       1    0.753  0.0963        0.586        0.968
##    13     12       1    0.690  0.1068        0.510        0.935
##    16     11       1    0.627  0.1141        0.439        0.896
##    22      7       1    0.538  0.1282        0.337        0.858
##    23      6       1    0.448  0.1346        0.249        0.807


##  [1] "n"             "time"          "n.risk"        "n.event"      
##  [5] "n.censor"      "surv"          "std.err"       "cumhaz"       
##  [9] "std.chaz"      "type"          "logse"         "conf.int"     
## [13] "conf.type"     "lower"         "upper"         "call"         
## [17] "table"         "rmean.endtime"


##    fit.time fit.n.risk fit.n.event fit.cumhaz
## 1         6         21           3  0.1428571
## 2         7         17           1  0.2016807
## 3         9         16           0  0.2016807
## 4        10         15           1  0.2683473
## 5        11         13           0  0.2683473
## 6        13         12           1  0.3516807
## 7        16         11           1  0.4425898
## 8        17         10           0  0.4425898
## 9        19          9           0  0.4425898
## 10       20          8           0  0.4425898
## 11       22          7           1  0.5854469
## 12       23          6           1  0.7521136
## 13       25          5           0  0.7521136
## 14       32          4           0  0.7521136
## 15       34          2           0  0.7521136
## 16       35          1           0  0.7521136

time ( $t_i$ )：發生 event 的時間點，也就是存活函數的跳點
n.risk ( $Y_i$ )：在 $t_i^{-}$ 時還活著的人，包含在 $t_i$ 發生 event 的人

n.event ( $d_i$ )：在 $t_i$ 發生 event 的人數
survival ( $\hat{S(t_i)}$ )：K.M. estimator
std.err： $\hat{S(t_i)}$ 的standard error ，使用 Greenwood’s formula 算出
cumhaz：N.A. estimator，碰到censored的 $t_i$ 時，值不變

Left-Truncated

資料從左邊切斷，像是從資料的變數 ageentry 從 $(80 * 12)$ 開始看

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
library(KMsurv)
library(survival)
library(dplyr)

data(channing)
attach(channing)

surv.object<-Surv(ageentry,age,death)
surv.object[time==0]

r.left.80 <- channing %>%
  filter(ageentry >= (80 * 12)) %>%
  survfit(Surv(ageentry, age, death) ~ 1, data = .)

summary(r.left.80)
plot(r.left.80, xscale = 12)
abline(v = 80 * 12)

Censoring

Right-censored

假設總有一天會發生事件，而且發生在觀察時間的右邊

Left-censored

事件發生在觀察時間的左邊，像是毒癮者，接受治療前已經開始使用毒品

Interval-censored

事件發生在兩個觀察時間之間，像是每隔10小時才觀察一次，只知道事件發生的時間區間。

Estimator

Nelson-Aalen estimator

\begin{align*} \overset{\sim}{H(t)} = \left \lbrace \begin{array}{lll} \ \ 0 & \text{if} & t \leq t_1 \\\\\\ \displaystyle \sum_{t_i \leq t} \frac{d_i}{Y_i} & \text{if} & t \geq t_1 \end{array} \right. \\\\\\ \end{align*}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
library(KMsurv)
library(survival)
library(magrittr)

data <- c(3,2,0,1,5,3,5)
i <- c(1,0,1,1,0,1,1)
sur.hw <- survfit(Surv(data,i)~1)

plot(sur.hw,
     fun = "cumhaz",
     main="Nelson-Aalen estimator")

在有事件發生時，cummulative hazard 馬上往上跳；碰到censored的 $t_i$ 時，值不變。所以是個右連續的階梯函數。(每一段的實心點在左端點)

KM estimator

\begin{align*} \hat{S(t)} = \left \lbrace \begin{array}{lll} \ \ 1 & \text{if} & t < t_1 \\\ \displaystyle \prod_{t_i \leq t} (1- \frac{d_i}{Y_i}) & \text{if} & t \geq t_1 \end{array} \right. \\\ \end{align*}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
library(magrittr)

data <- c(3, 2, 0, 1, 5, 3, 5)
i <- c(1, 0, 1, 1, 0, 1, 1)
sur.hw <- survfit(Surv(data, i) ~ 1)

par(mfrow = c(1, 2))

# original plot
plot(sur.hw, 
     xlab = "t", 
     ylab = "estimated S(t)", 
     main = expression(paste("KM estimator: ", hat(S)(t)))
)


# right cinti plot
sur.hw %>% 
  {stepfun(.$time[.$n.event > 0], 
           c(1, .$surv[.$n.event > 0]), 
           right = TRUE)} %>%
  plot(verticals = FALSE,
       ylim = c(0, 1),
       xlim = c(0, max(sur.hw$time)),
       xlab = "t",
       ylab = "estimated S(t)", 
       main = expression(paste("KM estimator: ", hat(S)(t)))
       )

# plot upper confidence interval
sur.hw %>% 
  {stepfun(.$time[.$n.event > 0], 
           c(1, .$upper[.$n.event > 0]), 
           right = TRUE)} %>%
  plot(add = TRUE,
       lty = 2,
       verticals = FALSE,
       col = "red")

# plot lower confidence interval
sur.hw %>% 
  {stepfun(.$time[.$n.event > 0], 
           c(1, .$lower[.$n.event > 0]), 
           right = TRUE)} %>%
  plot(add = TRUE, 
       lty = 2, 
       verticals = FALSE,
       col = "red")

Proportional Hazard Model (Cox)

$Z_1:I(女)$
$Z_2:I(黑人)$
$Z_3:I(Z_1 \times Z_2)$
4 groups: black female, black male, white female, white male

Baseline

Baseline is $Z_1,\ Z_2,\ Z_3$ are $0$ . It means white male.

Hazard ratio

\begin{align*} \frac{h(t\ |\ \text{black male})}{h(t \ |\ \text{white male})} &= \frac{h(t|(Z_1,Z_2,Z_3)=(0,1,0))}{h(t|(Z_1,Z_2,Z_3)=(0,0,0))} \exp{(\beta_2)} \\\ & = \exp{(-0.08878)} \\\ & =0.92 \\\ \end{align*}

\begin{align*} \frac{h(t\ |\ \text{black male})}{h(t \ |\ \text{white female})} &= \frac{h(t|(Z_1,Z_2,Z_3)=(0,1,0))}{h(t|(Z_1,Z_2,Z_3)=(1,0,0))} \exp{(\beta_2 - \beta_1)} \\\ & = \exp{(-0.08878+0.24848)} \\\ & =1.17 \\\ \end{align*}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
library(KMsurv)
library(survival)

data(kidtran)
?kidtran

t<-kidtran$time
d<-kidtran$delta
z1<-ifelse(kidtran$gender==2,1,0)
z2<-ifelse(kidtran$race==2,1,0)
z3<-z1*z2
r1<-coxph(Surv(t,d)~z1+z2+z3)
r1
summary(r1)


# Wald test
#solve(vcov(r1)) is inverse of I(b)
b<-r1$coefficients
stat_wald <- t(as.matrix(b))%*%solve(vcov(r1))%*%(as.matrix(b))
1-pchisq(stat_wald, df=3)

# Likelihood ratio
LL_b<-r1$loglik
?coxph.object
r0 <- coxph(Surv(t,d)~1)
summary(r0)

stat_LR <- 2*(LL_b[2]-LL_b[1])
1-pchisq(stat_LR, df=3)

Outcome

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
Call:
coxph(formula = Surv(t, d) ~ z1 + z2 + z3)

       coef exp(coef) se(coef)      z      p
z1 -0.24848   0.77998  0.19854 -1.252 0.2107
z2 -0.08878   0.91504  0.29182 -0.304 0.7609
z3  0.74549   2.10747  0.42711  1.745 0.0809

Likelihood ratio test=4.37  on 3 df, p=0.2243
n= 863, number of events= 140 
Call:
coxph(formula = Surv(t, d) ~ z1 + z2 + z3)

  n= 863, number of events= 140 

       coef exp(coef) se(coef)      z Pr(>|z|)  
z1 -0.24848   0.77998  0.19854 -1.252   0.2107  
z2 -0.08878   0.91504  0.29182 -0.304   0.7609  
z3  0.74549   2.10747  0.42711  1.745   0.0809 .
---
Signif. codes:  0 ‘***’ 0.001 ‘**’ 0.01 ‘*’ 0.05 ‘.’ 0.1 ‘ ’ 1

   exp(coef) exp(-coef) lower .95 upper .95
z1     0.780     1.2821    0.5285     1.151
z2     0.915     1.0928    0.5165     1.621
z3     2.107     0.4745    0.9124     4.868

Concordance= 0.54  (se = 0.024 )
Likelihood ratio test= 4.37  on 3 df,   p=0.2
Wald test            = 4.64  on 3 df,   p=0.2
Score (logrank) test = 4.74  on 3 df,   p=0.2
          [,1]
[1,] 0.1997922
Call:  coxph(formula = Surv(t, d) ~ 1)

Null model
  log likelihood= -879.1705 
  n= 863 
[1] 0.2242548

Conditional Survival function

\begin{align*} \hat{S(t)}= \prod_{a \leq t_i \leq t} 1- \frac{d_i}{Y_i} ,\ t \geq a \\\\\\ \end{align*}

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
library(KMsurv)
library(survival)

data(channing)
?channing

attach(channing)
head(channing)
table(channing$gender)

# chan.m 是男生的資料; chan.f 是男生的資料
chan.m<-channing[channing$gender==1,] 
chan.f<-channing[channing$gender==2,] 
min(chan.m$ageentry)
min(chan.f$ageentry)


r.m<-survfit(Surv(ageentry-.1,age,death)~1,data=chan.m)
r.f<-survfit(Surv(ageentry-.1,age,death)~1,data=chan.f)
summary(r.m)
summary(r.f)

summary(r.m)

r.m2<-survfit(Surv(ageentry-.1,age,death)~1,data=chan.m[chan.m$ageentry>781,])

781/12 # age in years

summary(r.m2,68*12-.1)
p.68<-summary(r.m2,68*12-.1)$surv #Pr(X>=68*12)
summary(r.m2,80*12-.1)
p.80<-summary(r.m2,80*12-.1)$surv #Pr(X>=80*12)

### female
head(chan.f[order(chan.f$ageentry),])
head(chan.f[order(-chan.f$death,chan.f$age),])

r.f<-survfit(Surv(ageentry-.1,age,death)~1,data=chan.f)

summary(r.f)
summary(r.f,68*12-.1)
p.68<-summary(r.f,68*12-.1)$surv #Pr(X>=68*12)
summary(r.f,80*12-.1)
p.80<-summary(r.f,80*12-.1)$surv #Pr(X>=80*12)

Risk plot

1
2
3
4
5
# risk plot
plot(r.m$time,r.m$n.risk,lty=2,col=1,type="l",xlim=c(700,1200),ylim=c(0,180),xlab="age in months",ylab="number at risk")
points(r.f$time,r.f$n.risk,lty=2,col=2,type="l")
legend("topright",
       c("male","female"),lty=1,col=1:2)

Survival Function for Left-truncated data

分別觀察68與80歲以上的生存曲線，left-truncated at 68 and 80

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
##  plot
# 畫男性的生存曲線
plot(stepfun(r.m2$time / 12, c(1, r.m2$surv), right = TRUE), do.points = FALSE, 
     main = "Survival Curve - Male vs Female", xlab = "Age", 
     ylab = "Estimated Conditional Survival Probability", col = "blue")

# 添加參考線
abline(v = 68, lty = 2, col = "darkgrey")
abline(v = 80, lty = 2, col = "darkgrey")

# 添加男性的條件生存曲線，do.points = FALSE 避免空心點
lines(stepfun(c(80 * 12, r.m2$time[27:81]) / 12, c(1, r.m2$surv[26:81] / p.80), right = TRUE), 
      do.points = FALSE, col = "red")

# 添加女性的生存曲線，do.points = FALSE 避免空心點
lines(stepfun(c(r.f$time[-(1:3)] / 12), c(1, r.f$surv[-(1:3)] / p.68), right = TRUE), 
      do.points = FALSE, col = "green")
lines(stepfun(c(r.f$time[-(1:84)] / 12), c(1, r.f$surv[-(1:84)] / p.80), right = TRUE), 
      do.points = FALSE, col = "purple")

# 添加圖例
legend("topright",   
       lwd = 1,
       col = c("blue", "red", "green", "purple"),                               
       legend = c("Male - since 68", 
                  "Male - since 80", 
                  "Female - since 68", 
                  "Female - since 80")
       )

Reference

Type of censoring