class: left, top, title-slide
Survival Analysis
An Overview
Nicole Erler
Department of Biostatistics, Erasmus MC
n.erler@erasmusmc.nl
N_Erler
NErler
https://nerler.com
---
count: false
layout: true

<div class="my-footer"><span> <a href="https://twitter.com/N_Erler"><i class="fab fa-twitter"></i> N_Erler</a>      <a href="https://github.com/NErler"><i class="fab fa-github"></i> NErler</a>      <a href = "https://nerler.com"><i class="fas fa-globe-americas"></i> nerler.com</a> </span></div>

---
count: false

## Time-to-event Data

**Endpoint:** Time until an event of interest occurs.

* common baseline time
* event may not be observed within study duration

<br>

**Censoring:**

* end of follow-up
* drop-out ⇨ may be related to "survival" ⇨ **informative censoring**

---

## Events and Censoring

<img src="index_files/figure-html/unnamed-chunk-1-1.png" width="100%" style="display: block; margin: auto;" />

---

## Interval Censoring

<img src="index_files/figure-html/unnamed-chunk-2-1.png" width="100%" style="display: block; margin: auto;" />

---

## Survival Function

Probability of "surviving" until time `\(t\)`:
`$$S(t) = \Pr(T > t)\quad {\scriptsize\color{var(--nord3)}{\text{with } T: \text{true event time}}}$$`

<img src="index_files/figure-html/unnamed-chunk-3-1.png" width="80%" style="display: block; margin: auto;" />

---

## Hazard Function

Instantaneous risk of an event at time `\(t\)`, given that no event has occurred before `\(t\)`:
`$$h(t) = \lim_{\Delta t\rightarrow 0} \frac{\Pr(t\leq T < t + \Delta t \mid T\geq t)}{\Delta t}$$`

<img src="index_files/figure-html/unnamed-chunk-4-1.png" width="80%" style="display: block; margin: auto;" />

---

## Cumulative Hazard Function

Cumulative risk up to time `\(t\)` (expected number of events by time `\(t\)`):
`$$H(t) = \int_0^t h(s) ds\qquad \text{and}\qquad H(t) = -\log S(t)$$`

<img src="index_files/figure-html/unnamed-chunk-5-1.png" width="80%" style="display: block; margin: auto;" />

---

## Hazard & Cumulative Hazard

<img src="index_files/figure-html/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" />

---
count: false

## Hazard & Cumulative Hazard

<img src="index_files/figure-html/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" />

---
count: false

## Hazard & Cumulative Hazard

<img src="index_files/figure-html/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" />

---
count: false

## Hazard & Cumulative Hazard

<img src="index_files/figure-html/unnamed-chunk-9-1.png" width="100%" style="display: block; margin: auto;" />

---

## Median Survival

<img src="index_files/figure-html/unnamed-chunk-10-1.png" width="100%" style="display: block; margin: auto;" />

<div style = "position: fixed; top: 30%; right: 10%; width:450px;">
median survival: \(T_{0.5} = S^{-1}(0.5)\)
<br>
The time by which half of the subjects will have experienced the event.
</div>

---

## Mean Survival

<img src="index_files/figure-html/unnamed-chunk-11-1.png" width="100%" style="display: block; margin: auto;" />

<div style = "position: fixed; top: 30%; right: 10%; width:450px;">
mean/average survival: \(\displaystyle \int_0^\infty S(t)dt\)
The expected failure time.
</div>

---

## Estimation

**Idea:** Probability of being "alive" at time `\(t\)`:
`$$\hat S(t) = \Pr(T_i > t) = \frac{\text{number of patients alive at \(t\)}}{n}$$`

--

**Problem:** We do not observe the true event time for all subjects.
.pull-left[
* observed event time<br>
  `\(T_i = \min(T_i^*, C_i)\)`
* event indicator:
  - `\(\delta_i = 1\)` if event,
  - `\(\delta_i = 0\)` if censored
]

.pull-right[
<br>
<table class=" idtab" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:right;"> id </th> <th style="text-align:right;"> time </th> <th style="text-align:left;"> status </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 4.00 </td> <td style="text-align:left;"> event </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 45.00 </td> <td style="text-align:left;"> censored </td> </tr>
<tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 10.12 </td> <td style="text-align:left;"> event </td> </tr>
<tr> <td style="text-align:right;"> 4 </td> <td style="text-align:right;"> 19.25 </td> <td style="text-align:left;"> event </td> </tr>
</tbody>
</table>
]

---

## Estimation

Divide the follow-up into intervals and condition on being alive at the beginning of the time interval:
`\begin{eqnarray*} \Pr(T_i > 1) &=& \frac{\text{# patients alive at \(t = 1\)}}{n}\\ \Pr(T_i > 2 \mid T_i > 1) &=& \frac{\text{# patients alive at \(t = 2\)}} {\text{# patients alive at \(t = 1\)}} \end{eqnarray*}`

--

⇨ `\(\displaystyle \Pr(T_i > 2) = \Pr(T_i > 1) \times \Pr(T_i > 2 \mid T_i > 1)\)`

--

<br>

⇨ `\(\displaystyle \Pr(T_i > t) = \Pr(T_i > 1) \times \Pr(T_i > 2 \mid T_i > 1) \times \ldots \times \Pr(T_i > t \mid T_i > t-1)\)`

---

## The Kaplan-Meier Estimator

`$$\hat S_{KM}(t) = \prod_{i:t_i\leq t}\frac{r_i - d_i}{r_i}$$`

with

* `\(d_i\)`: number of events at `\(t_i\)`
* `\(r_i\)`: number of patients "at risk" just before `\(t_i\)`

⇨ Implies the same survival probability for observed and censored patients.

???
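A minimal sketch of the product formula on this slide, in plain Python. The function name `kaplan_meier` and its interface are illustrative assumptions, not part of the original material; the four subjects are those from the table on the Estimation slide. Under these assumptions the estimate steps down to roughly 0.75, 0.50 and 0.25 at the three event times, and the subject censored at 45 leaves it unchanged.

```python
def kaplan_meier(times, events):
    """Return [(t_i, S_KM(t_i))] for each distinct event time t_i.

    times  : observed times T_i = min(T_i*, C_i)
    events : event indicators delta_i (True = event, False = censored)
    """
    event_times = sorted({t for t, e in zip(times, events) if e})
    surv = 1.0
    curve = []
    for t in event_times:
        # r_i: subjects still at risk just before t (observed time >= t)
        at_risk = sum(1 for u in times if u >= t)
        # d_i: number of events exactly at t
        n_events = sum(1 for u, e in zip(times, events) if u == t and e)
        surv *= (at_risk - n_events) / at_risk
        curve.append((t, surv))
    return curve


# Data from the table on the Estimation slide (ids 1 to 4).
times = [4.00, 45.00, 10.12, 19.25]
events = [True, False, True, True]

km_curve = kaplan_meier(times, events)
for t, s in km_curve:
    print(f"t = {t:6.2f}  S_KM(t) = {s:.2f}")
```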
To estimate the survival probability at time `\(t\)`, take all unique event or censoring times `\(t_i\)` up until `\(t\)`, and for each of these times calculate the ratio of the number of patients who survived the current interval to the number of patients at risk at the beginning of this interval. The estimate is the product of these ratios.

---

## The Kaplan-Meier Estimator

<img src="index_files/figure-html/unnamed-chunk-13-1.png" width="100%" style="display: block; margin: auto;" />

---

## The Kaplan-Meier Estimator

<img src="index_files/figure-html/unnamed-chunk-14-1.png" width="100%" style="display: block; margin: auto;" />

---

## Comparing Survival Curves

**Log-Rank Test** (time-stratified Cochran-Mantel-Haenszel test)

* non-parametric
* based on comparison of observed and expected events (⇨ \\(\chi^2\\))

<div>
<table class = "lrtab">
<tr> <th> </th> <th>Group 1</th> <th>Group 2</th> <th>Total</th> </tr>
<tr> <td>event</td> <td>\(d_{1i}\)</td> <td>\(d_{2i}\)</td> <td>\(d_{i}\)</td> </tr>
<tr> <td>no event</td> <td>\(r_{1i} - d_{1i}\)</td> <td>\(r_{2i} - d_{2i}\)</td> <td>\(r_{i} - d_{i}\)</td> </tr>
<tr> <td>at risk</td> <td>\(r_{1i}\)</td> <td>\(r_{2i}\)</td> <td>\(r_{i}\)</td> </tr>
</table>
</div>

--

<br>

* requires **uninformative censoring** & **proportional hazards**
* gives the same weight to all follow-up times

---

## Comparing Survival Curves

**Alternatives:** .sgrey[(weighted versions of the log-rank test with different weights)]

- Peto & Peto modification of the Gehan-Wilcoxon test (aka Breslow)
- Tarone-Ware test
- Fleming-Harrington test
- ...
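--

The weighted log-rank family can be sketched in a few lines of plain Python. This is a rough illustration under stated assumptions, not the implementation used for these slides: the function name `weighted_logrank`, the weight labels and the toy data are all made up for the example. At each event time it compares observed events in group 1 with their expected number from the 2x2 table, using weight 1 (log-rank), `\(r_i\)` (Gehan-Breslow type) or `\(\sqrt{r_i}\)` (Tarone-Ware). Survival-based weights such as Peto & Peto or Fleming-Harrington are not covered.

```python
import math

# Weight as a function of the total number at risk r_i at event time t_i.
WEIGHTS = {
    "logrank": lambda r: 1.0,               # same weight at all event times
    "gehan": lambda r: float(r),            # more weight on early event times
    "tarone-ware": lambda r: math.sqrt(r),  # in between the two
}


def weighted_logrank(times, events, groups, weight="logrank"):
    """Two-group weighted log-rank test; returns (chi2, p_value) with 1 df.

    groups must contain the labels 1 and 2.
    """
    w_fn = WEIGHTS[weight]
    data = list(zip(times, events, groups))
    score, variance = 0.0, 0.0
    for t in sorted({u for u, e, _ in data if e}):
        # Entries of the 2x2 table at event time t.
        r1 = sum(1 for u, _, g in data if u >= t and g == 1)
        r2 = sum(1 for u, _, g in data if u >= t and g == 2)
        d1 = sum(1 for u, e, g in data if u == t and e and g == 1)
        d2 = sum(1 for u, e, g in data if u == t and e and g == 2)
        r, d = r1 + r2, d1 + d2
        w = w_fn(r)
        # Observed minus expected events in group 1.
        score += w * (d1 - d * r1 / r)
        # Hypergeometric variance (zero when only one subject is at risk).
        if r > 1:
            variance += w**2 * r1 * r2 * d * (r - d) / (r**2 * (r - 1))
    chi2 = score**2 / variance
    # Upper tail of a chi-square distribution with 1 degree of freedom.
    p_value = math.erfc(math.sqrt(chi2 / 2.0))
    return chi2, p_value


# Hypothetical toy data (not from the slides): 5 subjects per group.
times = [5, 8, 12, 20, 24, 2, 3, 6, 9, 15]
events = [1, 1, 0, 1, 0, 1, 1, 1, 0, 1]
groups = [1, 1, 1, 1, 1, 2, 2, 2, 2, 2]

for name in WEIGHTS:
    chi2, p = weighted_logrank(times, events, groups, weight=name)
    print(f"{name:12s} chi2 = {chi2:.3f}, p = {p:.3f}")
```

The sketch assumes both groups are at risk at some event time; otherwise the variance is zero and the statistic is undefined.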
--

<br>

**Log-Rank test vs Breslow test:**

- Breslow gives earlier event times more weight
- Log-rank more powerful for proportional hazards
- Breslow more powerful for non-proportional hazards

For crossing survival curves, neither test is optimal!<br>
⇨ check PH assumption with plot of cumulative hazards

---

## Proportional Hazards Assumption

<img src="index_files/figure-html/unnamed-chunk-16-1.png" width="100%" style="display: block; margin: auto;" />

---

## KM + Log-Rank to Identify Risk Factors?

<img src="index_files/figure-html/unnamed-chunk-17-1.png" width="100%" style="display: block; margin: auto;" />

---

## KM + Log-Rank to Identify Risk Factors?

* KM estimates **cannot be "adjusted"** for other covariates ⇨ univariable test
* ok for comparing treatment arms in most **randomized trials**
* likely **biased in observational studies**
* Continuous covariates have to be categorized <i class = "far fa-frown"></i>

--

<br>

**Also:** Not appropriate for

* time-varying covariates
* competing risk settings

--

<br>

Better: **multivariable analyses** using

* AFT models
* proportional hazards models

---

## Accelerated Failure Time (AFT) Models

"Direct" **extension** of linear regression **for survival data**:
`$$\log(T_i^*) = \beta_0 + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip} + \varepsilon_i$$`
with `\(\varepsilon_i\sim\)` Normal, Student's `\(t\)`, Logistic, Extreme Value, ...

--

<br>

⇨ direct (additive) effect of `\(\mathbf x_i\)` on `\(\log(T_i^*)\)`

--

<br>

**But:** Sensitive to specification of the distribution for `\(\varepsilon_i\)`.<br>
⇨ May not fit the distribution of the survival times.

???

* AFT models are a direct extension of linear regression models to time-to-event data.
* Differences:
  - response `\(T_i^*\)` is always positive ⇨ use `\(\log\)`
  - censoring ⇨ more sensitive to choice of error distribution

---

## Proportional Hazards Model (aka Cox Model)

`$$\underset{\text{baseline hazard}}{h_i(t) = \underset{\uparrow}{h_0(t)} \exp(\beta_1} x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip})$$`

--

**Hazard Ratio:**
`$$\frac{h_{i'}(t)}{h_i(t)} = \frac{h_0(t) \exp(\beta_1 x_{i'1} + \beta_2 x_{i'2} + \ldots + \beta_p x_{i'p})}{h_0(t) \exp(\beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip})}$$`

--

If subjects `\(i'\)` and `\(i\)` differ only in the `\(k\)`-th covariate:
`$$x_{i'k} = x_{ik} + 1 \quad \Rightarrow \quad \frac{h_{i'}(t)}{h_i(t)} = \exp(\beta_k) \qquad \Rightarrow \quad h_{i'}(t) = h_i(t) \times\exp(\beta_k)$$`

⇨ **multiplicative effect** of `\(x_{ik}\)` on `\(h_i(t)\)`

---

## Proportional Hazards Model (aka Cox Model)

The model can also be written as
`$$\log h_i(t) = \log h_0(t) + \beta_1 x_{i1} + \beta_2 x_{i2} + \ldots + \beta_p x_{ip}$$`

<br>

⇨ additive effect of `\(x_{ik}\)` on `\(\log h_i(t)\)`
`$$x_{i'k} = x_{ik} + 1 \quad \Rightarrow \quad \log h_{i'}(t) = \log h_i(t) + \beta_k$$`

--

<br>

Interpretation of the effect on the log-hazard or hazard (i.e., hazard ratio) is **not intuitive**.

⇨ For interpretation: plot of the **expected survival** curves.

---

## Expected Survival

Because `\(\displaystyle H(t) = \int_0^t h(s) ds = -\log S(t)\)`:
`$$S(t) = \exp\underset{\;\;\;\;\;\;h_0(s)\exp(\mathbf X\boldsymbol\beta)}{\left\{ -\int_0^t \underbrace{h(s)}ds\right\}}$$`

--

In the Cox model, `\(h_0(t)\)` is unspecified<br>
⇨ we need a separate estimator for the baseline hazard:

$$ \hat S(t) = \exp\bigg\\{ -\underset{\substack{\text{Breslow}\\\\\text{estimator}}}{\underbrace{\hat H_0(t)}} \exp\underset{\substack{\text{from}\\\\\text{Cox model}}}{\mathbf (\mathbf X\underset{\uparrow}{\boldsymbol{\hat \beta}})\bigg\\}} $$

---

## Expected Survival

1. Fit a model, e.g.,
`$$h_i(t) = h_0(t)\exp(\beta_1 \text{age}_i + \beta_2 \text{sex}_i + \beta_3\text{ascites}_i + \beta_4 \text{hepato}_i + \beta_5 \text{albumin}_i)$$`

--

2. Create data of hypothetical patients:
<table class=" simpletable" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:right;"> age </th> <th style="text-align:left;"> sex </th> <th style="text-align:right;"> albumin </th> <th style="text-align:right;"> hepato </th> <th style="text-align:left;"> ascites </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 51 </td> <td style="text-align:left;"> f </td> <td style="text-align:right;"> 3.5 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:left;"> yes </td> </tr>
<tr> <td style="text-align:right;"> 51 </td> <td style="text-align:left;"> f </td> <td style="text-align:right;"> 3.5 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:left;"> no </td> </tr>
</tbody>
</table>

--

3. Calculate expected survival for the hypothetical patient(s):<br>
<div style="background-color: var(--nord0); padding: 25px; margin-top: 20px;">
Expected survival probability of a 51-year-old female with albumin of 3.5 and no hepatomegaly, with or without ascites.
</div>

---

## Expected Survival

<img src="index_files/figure-html/unnamed-chunk-20-1.png" width="100%" style="display: block; margin: auto;" />

---

## Expected Survival `\(\neq\)` Kaplan-Meier

<img src="index_files/figure-html/unnamed-chunk-21-1.png" width="100%" style="display: block; margin: auto;" />

---

## Survival & Cumulative Incidence

Usually: `\(\displaystyle \text{cumulative incidence} = 1 - \text{(expected) survival}\)`

<img src="index_files/figure-html/unnamed-chunk-22-1.png" width="100%" style="display: block; margin: auto;" />

---

## Competing Risks

Often, multiple things can happen to a patient:

* transplant
* liver-related death
* death from other cause
* ...

When having one event changes the risk of experiencing another event, we have **competing risks**.

⇨ Censor at the time of the competing events?

---

## Competing Risks & Kaplan-Meier

`$$\color{grey}{\text{informally:}}\qquad\hat S_{KM}(t) = \prod\frac{\text{survivors of interval }i}{\text{at risk in interval }i}$$`

* estimation includes only those still at risk
* implicit extrapolation of the probability to censored subjects

--

<p class = "smallbreak"> </p>

**Example:** Event of interest is LTx (liver transplantation)<br>
⇨ KM assumes that patients under follow-up have the same "risk" of LTx as

- patients who were lost to follow-up
- patients who died!?!

.warning[
<i class="fas fa-exclamation-triangle" style="color: var(--nord11);"></i> Overestimation of the risk ⇨ overestimation of cumulative incidence <i class="fas fa-exclamation-triangle" style="color: var(--nord11);"></i>
]

---

## Competing Risks & Kaplan-Meier

<img src="index_files/figure-html/unnamed-chunk-23-1.png" width="100%" style="display: block; margin: auto;" />

---

## Competing Risks

Two approaches for competing risks:

**Cause-specific hazards:**<br>
Cox model in which alternative events are censored

* HR remains valid!
* cumulative incidence calculation changes (cause-specific!)
* survival = overall event-free survival (not cause-specific)
* interaction of covariates with cause-specific cumulative incidences

<br>

**Fine-Gray model:**

* directly models the cumulative incidence
* can't handle time-varying covariates

---

## Competing Risks: Example

<table class=" risktab" style="width: auto !important; margin-left: auto; margin-right: auto;">
<caption>Risk of an event.</caption>
<thead>
<tr> <th style="text-align:left;"> gender </th> <th style="text-align:left;"> dead </th> <th style="text-align:left;"> LTx </th> <th style="text-align:left;"> alive </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> female </td> <td style="text-align:left;"> 10% </td> <td style="text-align:left;"> 27% </td> <td style="text-align:left;"> 63% </td> </tr>
<tr> <td style="text-align:left;"> male </td> <td style="text-align:left;"> 30% </td> <td style="text-align:left;"> 30% </td> <td style="text-align:left;"> 40% </td> </tr>
</tbody>
</table>

--

<br>

.flex-grid[
.col[
<table class=" risktab" style="width: auto !important; margin-left: auto; margin-right: auto;">
<caption>baseline</caption>
<thead>
<tr> <th style="text-align:left;"> gender </th> <th style="text-align:right;"> alive </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> female </td> <td style="text-align:right;"> 100 </td> </tr>
<tr> <td style="text-align:left;"> male </td> <td style="text-align:right;"> 100 </td> </tr>
</tbody>
</table>
]
.col[
<table class=" risktab" style="width: auto !important; margin-left: auto; margin-right: auto;">
<caption>after 1 year</caption>
<thead>
<tr> <th style="text-align:left;"> gender </th> <th style="text-align:right;"> dead </th> <th style="text-align:right;"> LTx </th> <th style="text-align:right;"> alive </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> female </td> <td style="text-align:right;"> 10 </td> <td style="text-align:right;"> 27 </td> <td style="text-align:right;"> 63 </td> </tr>
<tr> <td style="text-align:left;"> male </td> <td style="text-align:right;"> 30 </td> <td style="text-align:right;"> 30 </td> <td style="text-align:right;"> 40 </td> </tr>
</tbody>
</table>
]
.col[
<table class=" risktab" style="width: auto !important; margin-left: auto; margin-right: auto;">
<caption>after 2 years</caption>
<thead>
<tr> <th style="text-align:left;"> gender </th> <th style="text-align:right;"> dead </th> <th style="text-align:right;"> LTx </th> <th style="text-align:right;"> alive </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:left;"> female </td> <td style="text-align:right;"> 16 </td> <td style="text-align:right;"> 45 </td> <td style="text-align:right;"> 39 </td> </tr>
<tr> <td style="text-align:left;"> male </td> <td style="text-align:right;"> 42 </td> <td style="text-align:right;"> 42 </td> <td style="text-align:right;"> 16 </td> </tr>
</tbody>
</table>
]
]

---

## Competing Risks: Example

<img src="index_files/figure-html/unnamed-chunk-29-1.png" width="100%" style="display: block; margin: auto;" />

---

## Competing Risks: Example

<img src="index_files/figure-html/unnamed-chunk-30-1.png" width="100%" style="display: block; margin: auto;" />

---

## Time-dependent Covariates

.pull-left[
Covariates may change throughout follow-up.

Data in long format:

* one row per measurement<br>
  ⇨ multiple rows per subject
* start & stop time of interval
]

.pull-right[
<table class=" idtab" style="width: auto !important; margin-left: auto; margin-right: auto;">
<thead>
<tr> <th style="text-align:right;"> id </th> <th style="text-align:right;"> tstart </th> <th style="text-align:right;"> tstop </th> <th style="text-align:right;"> endpt </th> <th style="text-align:right;"> age </th> <th style="text-align:left;"> sex </th> <th style="text-align:right;"> bili </th> </tr>
</thead>
<tbody>
<tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 192 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 58.77 </td> <td style="text-align:left;"> f </td> <td style="text-align:right;"> 14.5 </td> </tr>
<tr> <td style="text-align:right;"> 1 </td> <td style="text-align:right;"> 192 </td> <td style="text-align:right;"> 400 </td> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 58.77 </td> <td style="text-align:left;"> f </td> <td style="text-align:right;"> 21.3 </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 182 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 56.45 </td> <td style="text-align:left;"> f </td> <td style="text-align:right;"> 1.1 </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 182 </td> <td style="text-align:right;"> 365 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 56.45 </td> <td style="text-align:left;"> f </td> <td style="text-align:right;"> 0.8 </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 365 </td> <td style="text-align:right;"> 768 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 56.45 </td> <td style="text-align:left;"> f </td> <td style="text-align:right;"> 1.0 </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 768 </td> <td style="text-align:right;"> 1790 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 56.45 </td> <td style="text-align:left;"> f </td> <td style="text-align:right;"> 1.9 </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 1790 </td> <td style="text-align:right;"> 2151 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 56.45 </td> <td style="text-align:left;"> f </td> <td style="text-align:right;"> 2.6 </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 2151 </td> <td style="text-align:right;"> 2515 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 56.45 </td> <td style="text-align:left;"> f </td> <td style="text-align:right;"> 3.6 </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 2515 </td> <td style="text-align:right;"> 2882 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 56.45 </td> <td style="text-align:left;"> f </td> <td style="text-align:right;"> 4.2 </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 2882 </td> <td style="text-align:right;"> 3226 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 56.45 </td> <td style="text-align:left;"> f </td> <td style="text-align:right;"> 3.6 </td> </tr>
<tr> <td style="text-align:right;"> 2 </td> <td style="text-align:right;"> 3226 </td> <td style="text-align:right;"> 4500 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 56.45 </td> <td style="text-align:left;"> f </td> <td style="text-align:right;"> 4.6 </td> </tr>
<tr> <td style="text-align:right;"> 3 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 176 </td> <td style="text-align:right;"> 0 </td> <td style="text-align:right;"> 70.07 </td> <td style="text-align:left;"> m </td> <td style="text-align:right;"> 1.4 </td> </tr>
</tbody>
</table>
]

---

## Time-dependent Cox Model

The time-dependent Cox model assumes

.flex-grid[
.col[
<div style = "width: 450px;"></div>

* **constant** value during the time interval
* **exogenous** covariates (e.g., the value of the covariate is independent of whether the event has occurred)

<br>

.warning[
<i class="fas fa-exclamation-triangle" style="color: var(--nord11);"></i> Often violated ⇨ bias
]
]
.col[
<img src="index_files/figure-html/unnamed-chunk-32-1.png" width="100%" style="display: block; margin: auto;" />
]
]

---

## Time-dependent Covariates: Joint Model

.flex-grid[
.col[
<div style = "width: 450px;"></div>

* model the underlying trajectory of the covariate
* jointly with the survival model

<br>

⇨ Joint Model for Longitudinal and Survival Data
]
.col[
<img src="index_files/figure-html/unnamed-chunk-33-1.png" width="100%" style="display: block; margin: auto;" />
]
]

---

## Other Things to Consider

* Clustered Data .sgrey[(e.g., multi-center studies)]
* Discrimination .sgrey[(potential issues with censoring)]
* Stratified Cox .sgrey[(different baseline hazards for different groups)]
* Missing Values .sgrey[(valid imputation in survival setting is not straightforward)]

--

**Things to watch out for:**

* interval censoring
* competing risks
* time-varying covariates

**and:**

* Consider Kaplan-Meier curves to be descriptive, not inferential.
* Use expected survival curves to visualize/interpret the model results.
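---

## Appendix: Competing Risks Example

The overestimation warned about on the Competing Risks & Kaplan-Meier slides can be checked by hand with the counts from the competing risks example (100 patients per gender; cumulative counts after 1 and 2 years). The sketch below is only an illustration under simplifying assumptions: yearly intervals, no censoring other than death, and deaths treated as censored at the end of the interval in which they occur. The function names are made up for this example.

```python
# Cumulative counts per 100 patients from the example tables:
# (dead, LTx, alive) after 1 year and after 2 years.
counts = {
    "female": [(10, 27, 63), (16, 45, 39)],
    "male": [(30, 30, 40), (42, 42, 16)],
}
n_start = 100  # patients alive at baseline, per gender


def naive_km_incidence(rows, n):
    """1 - KM for LTx when deaths are treated as censored observations."""
    surv, at_risk, prev_ltx = 1.0, n, 0
    for dead, ltx, alive in rows:
        new_ltx = ltx - prev_ltx              # LTx events in this interval
        surv *= (at_risk - new_ltx) / at_risk
        # Only patients still alive without LTx remain at risk.
        at_risk, prev_ltx = alive, ltx
    return 1.0 - surv


def cumulative_incidence(rows, n):
    """Observed proportion of patients with LTx by the end of follow-up."""
    return rows[-1][1] / n


results = {
    gender: (naive_km_incidence(rows, n_start), cumulative_incidence(rows, n_start))
    for gender, rows in counts.items()
}

for gender, (km, cif) in results.items():
    print(f"{gender}: 1 - KM = {km:.3f}, cumulative incidence = {cif:.3f}")
```

With these counts, 1 - KM comes out at about 0.479 (female) and 0.510 (male), compared with observed 2-year LTx proportions of 0.45 and 0.42: censoring the deaths implicitly assigns them the same LTx "risk" as patients still under follow-up.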