class: left, top, title-slide
Dealing with Incomplete Covariates in Survival Models
A Bayesian Approach
Workshop on Missing Data & Survival Analysis
June 1<sup>st</sup>, 2022
Nicole Erler
Department of Biostatistics
n.erler@erasmusmc.nl
N_Erler
NErler
https://nerler.com
--- count: false layout: true <div class="my-footer"><span> <a href="https://twitter.com/N_Erler"><i class="fab fa-twitter"></i> N_Erler</a>      <a href="https://github.com/NErler"><i class="fab fa-github"></i> NErler</a>      <a href = "https://nerler.com"><i class="fas fa-globe-americas"></i> nerler.com</a> </span></div> --- count: false ## Outline * Motivating Example * Standard Approach to Imputation * Associated Issues * Fully Bayesian Approach * Theory * Implementation in the <i class="fab fa-r-project"></i> Package JointAI * Extensions to More Complex Settings:<br> Multivariate Joint Model for Longitudinal and Survival Data * More Info on JointAI ??? To give you a bit of an **idea where we are headed** and what to expect, here is a quick overview. I will start with a **motivating example** from my work with the department of Gastroenterology & Hepatology at the Erasmus Medical Center, and then take a look at **what method would usually be applied** to handle the missing values in this type of setting, what the **issues** are that we have with this approach, and **why we need an alternative.** You will be familiar with much of this, but I think it is relevant to **get some context** and to get **familiar with my notation**. We'll then have a look at a **fully Bayesian alternative**. This approach is **implemented in the R package JointAI**, and I'll show you briefly how we could **analyse the motivating example** using JointAI, and how this **approach and implementation work for even more complex settings**, specifically a multivariate joint model for longitudinal and survival data. And if there is still time, we can have a more in-depth look into what else you can do with JointAI. **Slides on Website** --- ## Example: PSC Recurrence & Survival .footnote[ [<i class = "fas fa-file-alt"></i> Visseren, Erler et al. 
(2021)](https://doi.org/10.1111/tri.13925) ] 1549 patients received a liver transplant (LTx) to "cure" PSC * 288 patients had a 2<sup>nd</sup> LTx * 34 patients had a 3<sup>rd</sup> LTx * 8 patients had a 4<sup>th</sup> LTx **Covariate of interest:** recurrence of PSC ⇨ graft number & recurrence are time-varying covariates ??? PSC, short for **Primary Sclerosing Cholangitis**, is a liver disease. The mechanisms behind this disease are not yet fully understood; there is currently no cure, so doctors can only treat the patient's symptoms. For advanced or complicated cases, **liver transplant is the only potentially curative treatment**, but in 10 - 30% of patients the **disease recurs** in the new organ. We had data from the European Liver Transplant Registry on **1549 patients who received a liver transplant for PSC**, and several of these patients actually received a second, third or even fourth transplant. The covariate of interest in our analysis was **recurrence of PSC**, and we wanted to include both **recurrence and the graft number as time-varying covariates.** - - - -- **Additionally**: 8 (time-constant) covariates * 3 incomplete (4% - 11% NA) * 88% complete cases ??? In addition, we had **8 time-constant covariates**, of which **3 had missing values** in up to 11% of the cases. - - - -- <div style = "width: 35%; padding: 1.15em 2em; background-color: var(--nord0); position: absolute; bottom: 80px; right: 60px;"> <strong>⇨ time-dependent PH model</strong> </div> ??? So the analysis that would **typically** be performed if there were no missing values (or by ignoring cases with missing values) would be a **time-dependent Cox model**. --- ## Example: PSC Recurrence & Survival <img src="index_files/figure-html/patsurv-1.png" width="100%" /> ??? To make it a bit clearer what this time-varying structure looks like, I have tried to visualize it here. 
All patients receive a transplant because of their PSC and that is where the follow-up time starts. At some point, some of the patients have recurrence and we have a variable in the data that then changes from 0 to 1. And during follow-up patients may also receive additional transplants, either because of the PSC or due to other reasons, and we have a count variable that for everyone starts with the value 1, and then increases to 2 and later to 3 whenever a patient has another transplant. We follow the patients until they die or until the end of follow-up, which was the time when the data was extracted from the registry. --- ## How to Handle the Missing Values? Default: **Multiple Imputation** <img src="figures/MI.png" height="450" style="margin: auto; display: block;"> ??? The **most common** way to handle missing values in covariates is **multiple imputation**. This refers to the general principle that we impute the missing values multiple times and fill them into the original, incomplete data to create multiple completed datasets that we can then analyse using standard methods. <div style = "color: grey;"> In the end, we need to pool the results from the separate analyses to create one overall result that takes into account both the uncertainty that we have about the parameter estimates from each of the analyses, measured by the standard errors, and the additional uncertainty due to the missing values, which is measured by the variation between the results from the different imputations. </div> --- ## Imputation Step <div style = "width: 75%; padding: 1.15em 2em; background-color: var(--nord0); margin: auto; display: block;"> Sample from the <strong>predictive distribution</strong> of the <span style = "font-weight: bold; color: var(--nord15);">missing values</span> given the <strong>observed values</strong>. 
</div> .footnote[ assuming <span style="font-weight: bold;">ignorable missingness</span> in a covariate `\(x\)` [<i class = "fas fa-book"></i> Rubin (2004)](https://doi.org/10.1198/000313004X6355)] ??? The critical, or **most interesting**, part of this is the **imputation step**. When Donald Rubin developed multiple imputation in the 1970s, he realized that **imputed values have to be sampled from the predictive distribution of the missing values given the observed values**. This is an inherently **Bayesian idea**. The way I see it, multiple imputation is **based on Bayesian reasoning**, but instead of working with the full posterior distribution it only uses a **handful of samples** for practical purposes. - - - -- <br> `\begin{align*} p(\color{var(--nord15)}{\mathbf x} \mid \text{everything else}) = p(\color{var(--nord15)}{\mathbf x} \;\mid\; & \text{outcome},\\ & \text{other covariates},\\ &\text{parameters}) \end{align*}` ??? That means we need to **sample** the **imputed values for an incomplete covariate `\(x\)`** from a **distribution for `\(x\)` conditional on everything else**, where "everything else" includes the outcome/response, other covariates, and parameters. 
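As a side note, the pooling step shown in grey on the multiple-imputation slide can be written down in a few lines. A plain-Python sketch of Rubin's combining rules (the function name `pool_rubin` is mine, not from any package):

```python
def pool_rubin(estimates, variances):
    """Combine results from m analyses of imputed data via Rubin's rules.

    estimates: the m point estimates of one parameter
    variances: the m squared standard errors
    returns:   (pooled estimate, total variance)
    """
    m = len(estimates)
    qbar = sum(estimates) / m                              # pooled point estimate
    ubar = sum(variances) / m                              # within-imputation variance
    b = sum((q - qbar) ** 2 for q in estimates) / (m - 1)  # between-imputation variance
    total = ubar + (1 + 1 / m) * b                         # total variance
    return qbar, total

# e.g. three imputed-data analyses of the same coefficient:
est, var = pool_rubin([0.9, 1.1, 1.0], [0.040, 0.050, 0.045])
```

In real analyses the pooled variance also feeds into degrees-of-freedom corrections for confidence intervals, which this sketch omits.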
--- ## Imputation Step .flex-grid[ .col[ <table class="data-table"> <tr> <th></th> <th>\(\mathbf y\)</th> <th>\(\color{var(--nord15)}{\mathbf x_1}\)</th> <th>\(\mathbf x_2\)</th> <th>\(\mathbf x_3\)</th> </tr> <tr><td></td><td colspan = "4"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class = "rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> </tr> </table> * `\(\mathbf y\)`: **response** variable * `\(\color{var(--nord15)}{\mathbf x_1}\)`: **incomplete** covariate * `\(\mathbf x_2\)`, `\(\mathbf x_3\)`: **complete** covariates <div style = "width: 450px;"></div> ] .col[ **Sampling from the predictive distribution:  ** {{content}} ] ] ??? Let's have a look at this in a **very simple setting** where we have a response `\(y\)`, an incomplete covariate `\(x_1\)` and two completely observed covariates `\(x_2\)` and `\(x_3\)`. - - - - - -- Define a model for the incomplete variable `\(\color{var(--nord15)}{\mathbf x_1}\)`, e.g. `$$\color{var(--nord15)}{\mathbf x_1} = \beta_0 + \beta_1 \mathbf y + \beta_2 \mathbf x_2 + \beta_3 \mathbf x_3 + \varepsilon$$` to get estimates `\(\boldsymbol{\hat\beta}\)` (and `\(\hat\sigma\)`) {{content}} ??? 
We can **think of sampling from the predictive distribution** of `\(x_1\)` as **fitting a model for `\(x_1\)`** that has the other three variables in the linear predictor, to the cases **where `\(x_1\)` is observed**. From this model, we can **get estimates** for the regression coefficients `\(\beta\)` and the standard deviation of the residuals `\(\sigma\)`. - - - -- <br> Sample `\(\color{var(--nord15)}{\hat{x}_{i1}}\)` from the predictive distribution `$$p(\color{var(--nord15)}{x_{i1}} \mid y_i, x_{i2}, x_{i3}, \boldsymbol{\hat\beta}, \hat\sigma)$$` ??? And using these parameter estimates we can then **calculate the expected value** of `\(x_1\)` for those cases where `\(x_1\)` is missing. The **distributional assumption** that we made when we **chose the type of model** to use, together with the **expected value** and estimate of the **standard deviation of the error terms** specifies this predictive distribution that we need to sample imputations for the missing values in `\(x_1\)` from. 
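The two steps in the notes above can be written out directly. A numpy sketch on simulated data (all names are illustrative; a proper MI procedure would additionally draw `\(\boldsymbol\beta\)` and `\(\sigma\)` from their posterior, whereas this sketch conditions on the point estimates, matching the simplified notation on the slide):

```python
import numpy as np

rng = np.random.default_rng(2022)

# Simulated toy data: x1 depends on y, x2, x3; ~20% of x1 is missing.
n = 500
y, x2, x3 = rng.normal(size=(3, n))
x1 = 1.0 + 0.5 * y - 0.3 * x2 + 0.2 * x3 + rng.normal(scale=0.7, size=n)
miss = rng.random(n) < 0.2

# Step 1: fit the model x1 ~ y + x2 + x3 on the cases where x1 is observed.
X = np.column_stack([np.ones(n), y, x2, x3])
beta_hat, *_ = np.linalg.lstsq(X[~miss], x1[~miss], rcond=None)
sigma_hat = (x1[~miss] - X[~miss] @ beta_hat).std(ddof=4)

# Step 2: sample imputations from the predictive distribution
#         N(x' beta_hat, sigma_hat^2) for the cases where x1 is missing.
x1_completed = x1.copy()
x1_completed[miss] = X[miss] @ beta_hat + rng.normal(scale=sigma_hat,
                                                     size=miss.sum())
```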
--- ## Multiple Imputation using FCS (MICE) .flex-grid[ .col[ <div style = "text-align: center; margin-bottom: 25px;"> <strong>Multivariate<br>Missingness</strong></div> <table class="data-table"> <tr> <th></th> <th>\(\mathbf y\)</th> <th>\(\color{var(--nord15)}{\mathbf x_1}\)</th> <th>\(\color{var(--nord15)}{\mathbf x_2}\)</th> <th>\(\color{var(--nord15)}{\mathbf x_3}\)</th> <th>\(\ldots\)</th> </tr> <tr><td></td><td colspan = "5"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class = "rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td></td> </tr> </table> ] .col[ <div style = "width: 700px;"> ⇨ \(p(\color{var(--nord15)}{\mathbf X_{mis}} \mid \text{everything else})\) is <strong>multivariate</strong>. </div> {{content}} ] ] ??? In practice, we usually do not have missing values in just a single variable, but in **multiple variables**, and the **predictive distribution** is then **multivariate**. 
When the incomplete **variables are of mixed type**, meaning continuous and categorical, this multivariate distribution does **not have a closed form** and things get a bit more complicated. - - - -- <br> **Most common approach:**<br> <span style = "font-weight: bold;">MICE</span> <span style = "color: var(--lgrey);">(multivariate imputation by chained equations)</span><br> <span style = "font-weight: bold;">FCS</span> <span style = "color: var(--lgrey);">(fully conditional specification), i.e.,</span> <div> \begin{alignat}{10} \color{var(--nord15)}{\mathbf x_1} &= \beta_0 &+& \beta_1 \mathbf y &+& \beta_2 \color{var(--nord15)}{\mathbf x_2} &+& \beta_3 \color{var(--nord15)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\ \color{var(--nord15)}{\mathbf x_2} &= \alpha_0 &+& \alpha_1 \mathbf y &+& \alpha_2 \color{var(--nord15)}{\mathbf x_1} &+& \alpha_3 \color{var(--nord15)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\ \color{var(--nord15)}{\mathbf x_3} &= \theta_0 &+& \theta_1 \mathbf y &+& \theta_2 \color{var(--nord15)}{\mathbf x_1} &+& \theta_3 \color{var(--nord15)}{\mathbf x_2} &+& \ldots &+& \boldsymbol\varepsilon \end{alignat} </div> <p class = "smallbreak"> </p> * iterative * based on the idea of the Gibbs sampler ??? The most common approach to solve this issue is MICE, short for **multivariate imputation by chained equations**, also called **fully conditional specification**. The principle is a **direct extension** of what we've seen on the previous slide: We impute missing values using models that have all other data in their linear predictor. Because these imputation **models condition on unobserved values**, we have to use an **iterative algorithm** that starts by **randomly drawing initial values** from the observed part of the data, and then **cycles through the incomplete variables** and imputes one at a time. 
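The chained-equations loop itself fits in a few lines. A numpy sketch of the idea (illustrative only; real implementations such as the mice package also draw the imputation-model parameters from an approximate posterior rather than conditioning on point estimates):

```python
import numpy as np

rng = np.random.default_rng(1)

def draw_from_conditional(target, predictors, miss, rng):
    """Fit a linear model for `target` on its observed cases, then draw
    imputations for the missing cases from N(x' beta, sigma^2)."""
    X = np.column_stack([np.ones(len(target))] + predictors)
    obs = ~miss
    beta, *_ = np.linalg.lstsq(X[obs], target[obs], rcond=None)
    sigma = (target[obs] - X[obs] @ beta).std(ddof=X.shape[1])
    out = target.copy()
    out[miss] = X[miss] @ beta + rng.normal(scale=sigma, size=miss.sum())
    return out

# Toy data: x1 and x2 both incomplete, y complete.
n = 300
y = rng.normal(size=n)
x1 = 0.8 * y + rng.normal(size=n)
x2 = -0.5 * y + 0.3 * x1 + rng.normal(size=n)
m1 = rng.random(n) < 0.15
m2 = rng.random(n) < 0.15

# Start with random draws from the observed values, then cycle through
# the incomplete variables, re-imputing one at a time (the FCS loop).
x1c, x2c = x1.copy(), x2.copy()
x1c[m1] = rng.choice(x1[~m1], size=m1.sum())
x2c[m2] = rng.choice(x2[~m2], size=m2.sum())
for _ in range(10):
    x1c = draw_from_conditional(x1c, [y, x2c], m1, rng)
    x2c = draw_from_conditional(x2c, [y, x1c], m2, rng)
```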
This algorithm is based on the **idea of the Gibbs sampler** which creates a **sample from a multivariate distribution** by iteratively sampling from a **set of univariate full-conditional distributions**. --- ## Time-to-Event Outcomes .pull-left[ (Simple) **Proportional Hazards Model:** `$$h(t) = h_0(t) \exp\left(\color{var(--nord15)}{\mathbf X} \boldsymbol\beta\right)$$` <br> <ul style = "color: var(--nord4);"> <li>\(h(t)\): hazard function</li> <li>\(h_0(t)\): baseline hazard</li> <li>\(\color{var(--nord15)}{\mathbf X} = (\mathbf x_1, \mathbf x_2, \ldots)\): (in)complete covariates</li> <br><br> <li>\(\mathbf T\): observed event / censoring time</li> <li>\(\boldsymbol\delta\): event indicator</li> </ul> ] .pull-right[ <br> <table class="data-table"> <tr> <th></th> <th>\(\mathbf T\)</th> <th>\(\boldsymbol\delta\)</th> <th>\(\color{var(--nord15)}{\mathbf x_1}\)</th> <th>\(\color{var(--nord15)}{\mathbf x_2}\)</th> <th>\(\ldots\)</th> </tr> <tr><td></td><td colspan = "5"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class = 
"rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td></td> </tr> </table> ] ??? How does this look like when we have **survival data** like in our motivating example. When we use the **Cox model**, we model the **hazard function**, and here I consider only the **simple setting** in which we do not have time-varying covariates. `\(\mathbf X\)` refers to the **design matrix** which contains completely observed and incomplete covariates. But the **data** that we use to fit such a model, **does not contain the hazard** as a response variable. Instead, we have the **time of the event** or censoring, and the **event indicator `\(\delta\)`**. --- ## Time-to-Event Outcomes <br> Using FCS (naively), the **imputation model** for `\(\color{var(--nord15)}{\mathbf x_1}\)` would be `$$\color{var(--nord15)}{\mathbf x_1} = \theta_0 + \theta_1 \mathbf T + \theta_2 \boldsymbol\delta + \theta_3 \color{var(--nord15)}{\mathbf x_2} + \theta_4 \color{var(--nord15)}{\mathbf x_3} + \ldots,$$` i.e., assume a **linear association** between `\(\color{var(--nord15)}{\mathbf x_1}\)` and `\(\mathbf T\)` and `\(\boldsymbol\delta\)`. .footnote[ [<i class = "fas fa-file-alt"></i> White & Royston (2009)](https://doi.org/10.1002/sim.3618) [<i class = "fas fa-file-alt"></i> Bartlett et al. (2015)](https://doi.org/10.1177/0962280214521348) ] ??? When we now use the **MICE algorithm** on survival data, the following happens: We specify (maybe without knowing it, because the software does it for us) **imputation models** for the incomplete covariates that look like this: they **contain the event time and event indicator and all other covariates in its linear predictor. At first glance, this **seems OK**, because we have specified a full conditional distribution. But this model **assumes** that there is a **linear association** between the incomplete covariate `\(x_1\)` and the event time and event indicator. 
- - - -- <br> This is **inconsistent** with the log-likelihood of the **proportional hazards model**: `$$\log p(\mathbf T, \boldsymbol \delta \mid \color{var(--nord15)}{\mathbf X}, \boldsymbol\beta) = \boldsymbol\delta \left(\log h_0(T) + \color{var(--nord15)}{\mathbf X} \boldsymbol\beta\right) - \int_0^T h_0(s)\exp\left( \color{var(--nord15)}{\mathbf X} \boldsymbol\beta\right)ds$$` ??? But the **log-likelihood of the proportional hazards model** makes a **different assumption**. Our **analysis model implies** that there is a **non-linear association** between covariates and the event time and indicator. So we have specified a full-conditional distribution as our imputation model, but **not the correct full-conditional distribution**. The **implication** of this becomes clearer when we look at a **simpler example** that can be visualized more easily. --- ## A Simple Example .pull-left[ **Implied Assumption:**<br> <span>Linear association</span> between `\(\color{var(--nord15)}{\mathbf x_1}\)` and `\(\mathbf y\)`: `$$\color{var(--nord15)}{\mathbf x_1} = \theta_0 + \bbox[#3B4252, 2pt]{\theta_1 \mathbf y} + \theta_2 \mathbf x_2 + \theta_3 \mathbf x_3 + \ldots$$` <img src="index_files/figure-html/unnamed-chunk-2-1.png" width="100%" /> ] ??? Here I go back to the very simple setting with a **response `\(y\)`** and missing values in the **covariate `\(x_1\)`**, and assume both are continuous and can be modelled using linear regression. So the **imputation model for `\(x_1\)`** is a linear regression that has `\(y\)` in its linear predictor, and this **implies a linear association** between `\(x_1\)` and `\(y\)`. - - - -- .pull-right[ <br> But what if `$$\mathbf y = \beta_0 + \bbox[#3B4252, 2pt]{\beta_1 \color{var(--nord15)}{\mathbf x_1} + \beta_2 \color{var(--nord15)}{\mathbf x_1}^2} + \beta_3 \mathbf x_2 + \ldots$$` <img src="index_files/figure-html/unnamed-chunk-3-1.png" width="100%" /> ] ??? 
Assume that we do in fact have a non-linear association and that our analysis model includes also a **quadratic term of `\(x_1\)`.** --- ## Non-linear Associations .pull-left[ * <span style="font-weight: bold; color:var(--nord4);">true association</span>: non-linear * <span style="font-weight: bold; color:var(--nord7);">imputation assumption</span>: linear ] .pull-right[ ] <img src="index_files/figure-html/unnamed-chunk-4-1.png" width="80%" /> ??? Because we assumed a linear association in the imputation model, all imputed values, here indicated by the greenish diamonds, will follow this linear association, and the observed data show the true non-linear association. --- count: false class: animated, fadeIn ## Non-linear Associations .pull-left[ * <span style="font-weight: bold; color:var(--nord4);">true association</span>: non-linear * <span style="font-weight: bold; color:var(--nord7);">imputation assumption</span>: linear ] .pull-right[ <span style="font-size: 56pt; position: relative; right: 110px; bottom: 20px;">} ⇨</span> <span style = "color: var(--nord11); font-size: 1.2rem; font-weight: bold; position: relative; bottom: 30px; right: 100px;"> bias!</span> ] <img src="index_files/figure-html/unnamed-chunk-5-1.png" width="80%" /> ??? When we now analyse this imputed data we will of course get very different results than what we would have gotten had the data been complete, even if we use a model that allows for the correct non-linear association. The values that have been imputed under the wrong assumption distort the association and introduce bias. And the same that we have visualized here for a quadratic effect will also happen when we use imputation with the MICE algorithm naively with a time-to-event outcome. To avoid this bias we would need **imputation models that also assume a non-linear association**. 
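The bias shown in these figures is easy to reproduce. A small numpy simulation (not the code behind the slides): the true association is quadratic, the imputation model is linear in `\(y\)`, and the quadratic coefficient re-estimated after imputation is pulled toward zero:

```python
import numpy as np

rng = np.random.default_rng(42)

# Truth: a quadratic association between x1 and y.
n = 2000
x1 = rng.normal(size=n)
y = 1.0 + 0.5 * x1 + 1.0 * x1**2 + rng.normal(scale=0.5, size=n)
miss = rng.random(n) < 0.5           # half of x1 is missing
obs = ~miss

# Naive linear imputation model: x1 ~ y (fit on the observed cases).
A = np.column_stack([np.ones(obs.sum()), y[obs]])
theta, *_ = np.linalg.lstsq(A, x1[obs], rcond=None)
sigma = (x1[obs] - A @ theta).std(ddof=2)
x1_imp = x1.copy()
x1_imp[miss] = theta[0] + theta[1] * y[miss] + rng.normal(scale=sigma,
                                                          size=miss.sum())

def quad_coef(x, y):
    """Quadratic coefficient from the analysis model y ~ x + x^2."""
    X = np.column_stack([np.ones(len(x)), x, x**2])
    return np.linalg.lstsq(X, y, rcond=None)[0][2]

b2_complete = quad_coef(x1, y)       # full data: close to the true value 1
b2_imputed = quad_coef(x1_imp, y)    # after naive imputation: attenuated
```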
But it is not going to be straightforward to get the correct shape of this association because usually we cannot just get the **inverse of the functional form** used in the analysis model. --- ## Example: PSC Recurrence & Survival .footnote[ [<i class = "fas fa-file-alt"></i> Erler et al. (2016)](https://doi.org/10.1002/sim.6944) ] .pull-left[ Recurrence of PSC and graft number are **time-varying variables**: <ul class = "arrowlist"> <li>the data is in "long" format</li> <li>multiple rows refer to the same patient</li> </ul> {{content}} ] .pull-right[ <div style = "width: 300px;"> </div> <table class="data-table"> <tr> <th>id</th> <th>\(\mathbf t_{start}\)</th> <th>\(\mathbf t_{stop}\)</th> <th>\(\boldsymbol\delta\)</th> <th>graft</th> <th>rPSC</th> <th>\(\color{var(--nord15)}{\mathbf x_1}\)</th> <th>\(\color{var(--nord15)}{\mathbf x_2}\)</th> <th>\(\ldots\)</th> </tr> <tr><td colspan = "9"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td>\(\ldots\)</td> </tr> <tr class="hlgt-row"> <td class="rownr">\(i\)</td> <td>0</td> <td>2.7</td> <td>0</td> <td>1</td> <td>no</td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td>\(\ldots\)</td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td>2.7</td> <td>3.2</td> <td>0</td> <td>1</td> <td>yes</td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td>\(\ldots\)</td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td>3.2</td> <td>6.5</td> <td>1</td> <td>2</td> <td>yes</td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> 
<td>\(\ldots\)</td> </tr> <tr> <td class="rownr"></td> <td><i class="fas fa-check"></i></td> <td><i class="fas fa-check"></i></td> <td><i class="fas fa-check"></i></td> <td><i class="fas fa-check"></i></td> <td><i class="fas fa-check"></i></td> <td><i class="fas fa-check"></i></td> <td style="color: var(--nord15);"><i class="fas fa-question"></i></td> <td>\(\ldots\)</td> </tr> <tr> <td class="rownr"></td> <td><i class="fas fa-check"></i></td> <td><i class="fas fa-check"></i></td> <td><i class="fas fa-check"></i></td> <td><i class="fas fa-check"></i></td> <td><i class="fas fa-check"></i></td> <td><i class="fas fa-check"></i></td> <td style="color: var(--nord15);"><i class="fas fa-question"></i></td> <td>\(\ldots\)</td> </tr> <tr> <td class="rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\ddots\)</td> </tr> </table> ] ??? And in the motivating example, there is an additional difficulty for imputation. We have **two time-varying variables**, recurrence of PSC and the graft number. This means that we will usually have our **data in long format**, so that there are multiple rows per patient, with the number of rows depending on whether the patient experienced recurrence and how many transplants he or she got. Depending on the software implementation, we then have two time variables, which I called `\(t_{start}\)` and `\(t_{stop}\)` here, that indicate the beginning and the end of the time interval in which the patient had the values of the covariates given in that row of the data. - - - -- Imputed values of baseline covariates `\(\color{var(--nord15)}{\mathbf x_1}\)`, `\(\color{var(--nord15)}{\mathbf x_2}\)`, ... should be **constant across time/rows**. **Not** all rows are **independent**<br> ⇨ violation of model assumptions. ??? 
In our motivating example, the incomplete variables were all time-constant covariates, so the values of these variables should be constant across the different rows of the data belonging to the same patient. But if we applied the MICE algorithm to a dataset in long format, the imputed values would generally differ across rows. So we would obtain inconsistent imputed values and introduce bias. And what we also need to take into account is that rows belonging to the same patient are not independent. But we apply imputation models to these data that do assume that all observations are independent. Violation of this assumption will also result in bias. --- ## Non-linear Associations & Multi-level In settings with **non-linear associations** and **multi-level data** the **correct predictive distribution** $$ p(\color{var(--nord15)}{\mathbf X_{mis}} \mid \text{everything else})$$ may not have a closed form. <br> <div style="width: 85%; padding: 1.15em 2em; background-color: var(--nord0); margin: auto; display: block; text-align: center;"> <strong><span style="font-size: 1.5rem;">⇨</span> We cannot easily specify the correct imputation model directly.</strong> </div> ??? So, in summary, we need to sample imputed values from the predictive distribution of the missing values given everything else, but whenever we have a non-linear association between the response and incomplete covariates, as is the case for time-to-event outcomes, or when we have multi-level data, it is not straightforward to specify this distribution directly. When we use MICE, however, this is exactly what we are doing. By using standard regression models for imputation, we specify **a** distribution of the incomplete variables given everything else, but **not the correct distribution**. --- ## Getting the Correct Distribution ??? The interesting question now is: How do we get the correct predictive distribution? 
- - - -- .flex-grid[ .col[ * We need `\(\;p(\color{var(--nord15)}{\mathbf X_{mis}} \mid \mathbf y, \ldots)\)` * We know `\(\;p(\mathbf y \mid \color{var(--nord15)}{\mathbf X_{mis}}, \ldots)\)` ] .col[ <div style = "color: var(--nord3);"> <ul> <li>\(\mathbf x\): incomplete covariate(s)</li> <li>\(\mathbf y\): outcome(s)</li> <li>\(\ldots\): everything else </li> </ul> </div> ] ] ??? Remember: what we need is the distribution of an incomplete variable `\(x\)` conditional on the response and everything else. But what we do know is the distribution of the response `\(y\)` conditional on the covariates and everything else, because that's our analysis model. And here Bayes comes to the rescue. - - - -- **Bayes Theorem:** `\begin{align} p(\color{var(--nord15)}{\mathbf X_{mis}} \mid \mathbf y, \ldots) &= \frac{p(\mathbf y \mid \color{var(--nord15)}{\mathbf X_{mis}}, \ldots)\; p(\color{var(--nord15)}{\mathbf X_{mis}},\ldots)}{p\left(\mathbf y, \ldots\right)}\\ &\propto p(\mathbf y \mid \color{var(--nord15)}{\mathbf X_{mis}}, \ldots)\; p(\color{var(--nord15)}{\mathbf X_{mis}},\ldots) = \underset{\text{joint distribution}}{\underbrace{p(\mathbf y, \color{var(--nord15)}{\mathbf X_{mis}}, \ldots)}} \end{align}` ??? Bayes' theorem allows us to convert a conditional distribution into the reversed conditional. Using this theorem, we derive that the distribution of interest, the distribution of the incomplete covariate conditional on everything else, is proportional to the product of the distribution of the **response conditional on the covariates** and the **joint distribution of the covariates and everything else**. The product of these two terms describes the full joint distribution of everything: all the data, response, covariates, observed or missing, and parameters. 
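This proportionality is exactly what MCMC exploits later on: we can sample from `\(p(\mathbf y \mid \mathbf x, \ldots)\, p(\mathbf x, \ldots)\)` without ever computing the normalizing constant. A toy Metropolis sketch in numpy (the model and numbers are made up; in this conjugate example the posterior is known exactly, which lets us check the sampler):

```python
import numpy as np

rng = np.random.default_rng(7)

# Toy setup: analysis model y ~ N(beta * x, 1); covariate model x ~ N(0, 1).
# Subject i has y observed but x missing.
beta = 2.0
y_i = 3.0

def log_joint(x):
    """log p(y_i | x) + log p(x): the unnormalized log of p(x | y_i)."""
    return -0.5 * (y_i - beta * x) ** 2 - 0.5 * x ** 2

# Random-walk Metropolis targeting p(x | y_i) ∝ p(y_i | x) p(x):
# the normalizing constant p(y_i) is never needed.
x, chain = 0.0, []
for _ in range(20000):
    prop = x + rng.normal(scale=1.0)
    if np.log(rng.random()) < log_joint(prop) - log_joint(x):
        x = prop
    chain.append(x)
sample = np.array(chain[2000:])      # discard burn-in

# Here the answer is known in closed form,
# x | y_i ~ N(beta * y_i / (beta^2 + 1), 1 / (beta^2 + 1)),
# so we can verify the sampler against it.
post_mean = beta * y_i / (beta**2 + 1)
```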
--- ## Factorization of the Joint Distribution `$$p(\mathbf y, \color{var(--nord15)}{\mathbf X}, \boldsymbol\theta) = \underset{\substack{\text{analysis}\\\text{model}}}{\underbrace{p(\mathbf y \mid \color{var(--nord15)}{\mathbf X}, \boldsymbol\theta)}}\; \bbox[#2E3440, 5pt]{\underset{\substack{\text{covariate}\\\text{model(s)}}}{\underbrace{p(\color{var(--nord15)}{\mathbf X} \mid \boldsymbol\theta)}}}\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}}$$` ??? And in general, we can write this joint distribution as the product of * the analysis model, which is the distribution of the response given the covariates and parameters, * a model for the covariates conditional on parameters, * and the prior distribution for the parameters. **We've now moved to the Bayesian framework** The analysis model and the priors are the same as when we perform a Bayesian analysis in a complete data setting. But because we have missing values in the covariates, we need to specify a distribution for the covariates as well. - - - -- .flex-grid[ .col[ **Covariate Models** * multiple variables * mixed type (continuous, categorical) * multi-level ] .col[ <br><br> ⇨ **no closed-form** distribution ] ] ??? And this can be a bit tricky. Generally, we will have multiple incomplete covariates so we need a multivariate distribution, and the variables may be of mixed type, so, continuous and categorical and that means that there is not going to be a closed form for this multivariate distribution. On top of that, we could have covariates on different levels in a multi-level setting. - - - -- <div style="width: 90%; padding: 1em; background-color: var(--nord0); margin: auto; display: block; text-align: center;"> ⇨ specification as a <strong>sequence of</strong> (univariate) <strong>conditional distributions</strong> </div> ??? 
But we can use the same principle we used to divide the joint distribution into analysis model, covariate model and priors, to divide the joint distribution of the covariates into a bunch of univariate conditional distributions that are a lot easier to specify. --- name: covariatemodels ## Covariate Models Example: `\(\qquad p(\color{var(--nord15)}{\mathbf X} \mid \boldsymbol\theta) = p(\color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \color{var(--nord15)}{\mathbf x_3(t)}, \mathbf x_4(t) \mid \boldsymbol\theta)\)` .flex-grid[ .col[ * `\(\color{var(--nord15)}{\mathbf x_1}\)`: time-constant, incomplete * `\(\mathbf x_2\)`: time-constant, complete ] .col[ * `\(\color{var(--nord15)}{\mathbf x_3(t)}\)`: time-varying, incomplete * `\(\mathbf x_4(t)\)`: time-varying, complete ] ] ??? As an example, let's assume that we have 4 covariates. `\(x_1\)` and `\(x_2\)` are time-constant and `\(x_3\)` and `\(x_4\)` are time-varying, and `\(x_1\)` and `\(x_3\)` have missing values while `\(x_2\)` and `\(x_4\)` are completely observed. - - - -- `\begin{align} p(\color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \color{var(--nord15)}{\mathbf x_3(t)}, \mathbf x_4(t) \mid \boldsymbol\theta) =\, & p(\color{var(--nord15)}{\mathbf x_3(t)} \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_4(t), \boldsymbol\theta) & \color{var(--nord3)}{\small\text{e.g., GLMM}}\\ & p(\mathbf x_4(t) \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \boldsymbol\theta) & \color{var(--nord3)}{\small\text{e.g., GLMM}}\\ & p(\color{var(--nord15)}{\mathbf x_1} \mid \mathbf x_2, \boldsymbol\theta) & \color{var(--nord3)}{\small\text{e.g., GLM}}\\ & \color{var(--nord3)}{p(\mathbf x_2 \mid \boldsymbol\theta)} & \color{var(--nord3)}{\small\text{(can be omitted)}} \end{align}` ??? 
I can then write this joint distribution of the 4 variables as the product of

* the distribution of `\(x_3\)` conditional on `\(x_1\)`, `\(x_2\)` and `\(x_4\)`,
* the distribution of `\(x_4\)` conditional on `\(x_1\)` and `\(x_2\)`, and
* the distribution of `\(x_1\)` conditional on `\(x_2\)`.

Because they are univariate, I can easily find appropriate ways to specify each of them, for example using generalized linear mixed models for the distributions of the time-varying variables `\(x_4\)` and `\(x_3\)` and a generalized linear model for the distribution of `\(x_1\)`.

I have chosen this particular sequence of models on purpose. When I start with the distributions of the time-varying covariates, they do not appear in the linear predictor of the subsequent models for the time-constant covariates.

Had I done it the other way around and started with a model for `\(x_1\)`, I would have to condition on `\(x_3\)` and `\(x_4\)`, and it would not be straightforward to specify a model for that, because this would mean specifying a model for a time-constant, baseline, covariate with a time-varying, longitudinal variable in the linear predictor, which doesn't really work.

I could start with either `\(x_4\)` or `\(x_3\)`; both would work. My reasoning here is that mis-specification of models for incomplete variables probably has more impact on the results than mis-specification of models for complete variables, and by including more covariates in the model for the incomplete `\(x_3\)` I make it more likely that the model fits the data well.

Another question that you might have is why I specified a model for the completely observed `\(x_4\)`, but not for the completely observed `\(x_2\)`. This is required to model the dependence between the variables. The model for `\(x_4\)` is what describes the association between `\(x_4\)` and `\(x_1\)`, because `\(x_4\)` does not appear in the linear predictor of the model for `\(x_1\)`.
If I left out this model, that would imply that `\(x_1\)` is independent of `\(x_4\)`. But because `\(x_2\)` comes last in the sequence, it appears in all the linear predictors of the other models, and a model for `\(x_2\)` would not add any relevant information.

<!-- [Jump to Hep C example default values slide](#hepcdefaults) -->

- - -

--

<div style="width: 55%; padding: 0.9em; background-color: var(--nord0); margin: auto; display: block; text-align: center;">
covariate models \(\neq\) imputation models!
</div>

???

One important thing to realize here is that these models that we specify for the covariates are not actually the imputation models. They are not the models describing the distribution from which imputed values are being sampled.

---
## Imputation Models

<br>

`$$\bbox[#2E3440, 5pt]{\underset{\text{imputation model}}{\underbrace{p(\color{var(--nord15)}{\mathbf x_1} \mid \text{everything else})}}} \propto \underset{\substack{\text{joint}\\\text{distribution}}}{\underbrace{p(\mathbf y, \color{var(--nord15)}{\mathbf X}, \boldsymbol\theta)}} = \underset{\substack{\text{analysis}\\\text{model}}}{\underbrace{p(\mathbf y \mid \color{var(--nord15)}{\mathbf X}, \boldsymbol\theta)}}\; \underset{\substack{\text{covariate}\\\text{model(s)}}}{\underbrace{p(\color{var(--nord15)}{\mathbf X} \mid \boldsymbol\theta)}}\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}}$$`

???

The imputation model for an incomplete covariate `\(x_1\)`, for example, can be derived from the joint distribution of everything. It is proportional to this joint distribution. And this joint distribution is specified as the product of the analysis model, the joint distribution of the covariates, which we specified in a sequential way on the previous slide, and prior distributions.

- - -

--

<br>

**Imputation Models** typically do not have a closed form.

⇨ Estimation via **MCMC sampling**, with the help of

* Gibbs sampler
* Metropolis-Hastings (or the like)

???
Because the joint distribution does not generally have a closed form, this is also the case for the imputation models. So we cannot derive the posterior distribution of the missing values analytically. This is the same as for the posterior distribution of the parameters from a Bayesian model, unless the model is extremely simple and we use conjugate priors.

Therefore, in practice, we use MCMC sampling to obtain the posterior distribution. And because we usually have a multivariate posterior distribution, with multiple unknowns (the parameters as well as the missing values), this often involves the Gibbs sampler, and because the full conditional distributions in the Gibbs sampler do not have closed forms, we typically also need something like the Metropolis-Hastings algorithm.

So, here in the Bayesian approach we use Gibbs sampling, but MICE is also based on the Gibbs sampler. Why does one work and the other not?

---
## Fully Bayesian vs MICE

.pull-left[
**Fully Bayesian**
<img src = "figures/Gibbs.png", height = 350 style = "position: block; margins: auto;">
]

.pull-right[
**MICE / FCS**
<img src = "figures/MICE.png", height = 350; style = "position: block; margins: auto;">
]

???

The difference is that in the fully Bayesian approach we do actually specify the joint distribution. We specify it using a sequence of conditional models. And the full conditional distributions that are then used to create the posterior sample are derived from this joint distribution.

When we perform multiple imputation using MICE, we do it the other way around. We specify imputation models (or the software specifies those models for us) and even though these imputation models may even describe full conditional models, there is no guarantee that a joint distribution exists that has the distributions described by the imputation models as its full conditionals.
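To make the Gibbs/Metropolis-Hastings idea concrete, here is a minimal, purely illustrative Python sketch (not JointAI's actual sampler, which is generated JAGS code): one regression parameter and one missing covariate value are treated as unknowns and updated in turn, the parameter by a closed-form Gibbs step and the missing value by a Metropolis step whose target is analysis model times covariate model.

```python
import numpy as np

rng = np.random.default_rng(1)

# Simulated data: y = 2 * x + N(0, 1); the last covariate value is "missing".
x_true = rng.normal(size=21)
y = 2.0 * x_true + rng.normal(size=21)
x_obs = x_true[:20]

beta, x_mis = 0.0, 0.0
beta_chain = []
for _ in range(2000):
    # Gibbs step: beta | x, y is conjugate normal (prior beta ~ N(0, 10^2))
    x_all = np.append(x_obs, x_mis)
    prec = x_all @ x_all + 1 / 100
    beta = rng.normal((x_all @ y) / prec, 1 / np.sqrt(prec))

    # Metropolis step for the missing covariate: its full conditional is
    # proportional to p(y | x_mis, beta) * p(x_mis), i.e. analysis model
    # times covariate model. (Here it is actually tractable, but in general
    # it is not, which is why Metropolis-Hastings is needed.)
    def log_cond(x):
        return -0.5 * (y[-1] - beta * x) ** 2 - 0.5 * x ** 2
    prop = x_mis + rng.normal(0.0, 0.5)
    if np.log(rng.uniform()) < log_cond(prop) - log_cond(x_mis):
        x_mis = prop
    beta_chain.append(beta)

beta_hat = np.mean(beta_chain[500:])   # posterior mean after burn-in
```

The key point for the comparison with MICE on the next slide: both updates are derived from one explicitly specified joint distribution.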
For example, in the setting where an imputation model contradicts the analysis model or another imputation model, as was the case when we had an analysis model that assumes a non-linear association and an imputation model that assumes a linear association, there is no joint distribution that has these two models as full conditionals. And when that is the case, when this joint distribution does not exist, MICE will not give us valid imputations.

---
## Advantages of the Bayesian Approach

`$$p(\mathbf y, \mathbf x, \boldsymbol\theta) = \bbox[#2E3440, 5pt]{\underset{\text{analysis model}}{\underbrace{p(\mathbf y \mid \mathbf x, \color{var(--nord14)}{\boldsymbol\theta_{y\mid x}})}}}\; p(\mathbf x \mid \boldsymbol\theta_x)\; p(\color{var(--nord14)}{\boldsymbol\theta_{y\mid x}}, \boldsymbol\theta_x)\qquad \color{var(--nord3)}{\text{with } \boldsymbol\theta = (\boldsymbol\theta_{y\mid x}, \boldsymbol\theta_x)}$$`

* specification of the joint distribution<br>
⇨ assures **compatibility** of imputation models

???

This is one of the advantages of the Bayesian approach. Because we specify the joint distribution, we can be sure that all imputation models are compatible with each other.

- - -

--

* use of the analysis model in the specification of the joint distribution<br>
⇨ **parameters of interest** `\(\color{var(--nord14)}{\boldsymbol\theta_{y\mid x}}\)` are estimated directly<br>
⇨ **non-linear associations** are taken into account<br>
⇨ assures **congeniality**

???

And because the analysis model is part of the specification of this joint distribution, we can be sure that the imputation models are also compatible with the analysis model, even when the analysis model involves non-linear associations with incomplete covariates.

In addition, the parameters of interest, so, the parameters of the analysis model, are directly estimated. We do not have separate steps for imputation and analysis, but everything happens simultaneously in the same procedure.
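A small illustration of why non-linear associations matter here: for a quadratic analysis model, the imputation model implied by the joint distribution is bimodal, so no normal linear imputation model (the kind an FCS default would use) can be a full conditional of that joint. A grid evaluation in Python, with illustrative numbers:

```python
import numpy as np

# Analysis model: y | x ~ N(x^2, 0.5^2) (non-linear in x);
# covariate model: x ~ N(0, 1). The correct imputation model p(x | y)
# is proportional to their product; evaluated here on a grid for y = 1.
y = 1.0
x = np.linspace(-3, 3, 1201)
log_dens = -0.5 * ((y - x**2) / 0.5) ** 2 - 0.5 * x**2
dens = np.exp(log_dens - log_dens.max())
dens /= dens.sum() * (x[1] - x[0])     # normalize on the grid

# The implied imputation density has two modes (near +/- 1), so it cannot
# be matched by any normal distribution with a linear predictor.
is_peak = (dens[1:-1] > dens[:-2]) & (dens[1:-1] > dens[2:])
n_modes = int(is_peak.sum())
```

In the fully Bayesian approach, sampling from exactly this implied conditional is what the MCMC sampler does automatically.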
- - -

--

* response is not in any linear predictor<br>
⇨ no problem to use **complex outcomes**

???

Because we can specify the joint distribution so that one of the conditional models is the analysis model, we can usually make sure that the outcome does not need to appear in the linear predictor of models for covariates.

This makes it possible to use this approach in settings where we have complex outcomes that could not easily be included in the linear predictor of the models for some or all of the covariates, for example when we have a repeatedly measured response and missing values in baseline covariates.

---
## In practice: <i class="fab fa-r-project"></i> Package JointAI

.footnote[
[<i class = "fas fa-file-alt"></i> Erler et al. (2021)](https://doi.org/10.18637/jss.v100.i20)
[<i class = "fas fa-globe-americas"></i> nerler.github.io/JointAI](https://nerler.github.io/JointAI)
]

<div class = "container">
<div class = "box" style = "margin-left: 20px;">
<div class = "box-row">
<div class = "box-cell" style="width:180px;">
<strong>Specification</strong><br>of the joint<br> distribution:
</div>
<div class = "box-cell" style = "background: var(--nord0);">
<div style = "padding-bottom: 10px;"> <strong>User</strong></div>
<div class = "box-cell" style = "background: var(--nord1); border: 2px solid var(--nord8); ">analysis<br>model</div>
</div>
<div class = "box-cell" style = "background: var(--nord0);">
<div style = "padding-bottom: 10px;"> <strong>JointAI</strong></div>
<div class = "box-row">
<div class = "box-cell" style = "background: var(--nord1);">covariate<br>models</div>
<div class = "box-cell" style = "background: var(--nord1)">priors</div>
</div></div>
</div></div></div>

<a href = "https://nerler.github.io/JointAI/">
<img src = "figures/JointAI.png" style = "position: absolute; right: 62px; top: 117px; width: 165px;"></a>

???
We've now seen that this Bayesian approach is theoretically valid (at least as long as we have missing at random and our models fit the data well enough) and has certain advantages over using MICE, but how can we actually use it in practice?

I've implemented this approach in the R package JointAI, short for Joint Analysis and Imputation. Using the R package, the user only has to specify the analysis model of interest, and the software specifies the models for the covariates involved in this analysis model and sets prior distributions.

As when using the mice package in R, the types of models used for the covariates are based on how the data is coded, whether a variable is continuous or a factor. Of course the default model types may not always be appropriate, and there are options to adapt these default settings.

- - -

--

<br>

<div class = "container">
<div class = "box" style = "margin-left: 20px;">
<div class = "box-row">
<div class = "box-cell" style="width:180px;">
<strong>Estimation:</strong>
</div>
<div class = "box-cell" style = "background: var(--nord0);">
<div style = "padding-bottom: 10px;"> <strong>JointAI</strong></div>
<div class = "box-cell" style = "background: var(--nord1);">pre-processing</div>
</div>
<div class = "box-cell" style = "background: var(--nord0);">
<div style = "padding-bottom: 10px;"> <strong>rjags / JAGS</strong></div>
<div class = "box-cell" style = "background: var(--nord1);">MCMC sampling</div>
</div>
<div class = "box-cell" style = "background: var(--nord0);">
<div style = "padding-bottom: 10px;"> <strong>JointAI</strong></div>
<div class = "box-cell" style = "background: var(--nord1);">post-processing</div>
</div>
</div></div></div>

???

The estimation of the model is done with the help of JAGS, an external and freely available software that does Gibbs sampling.
JointAI does some pre-processing: it takes the data and the model formula, figures out which additional models need to be specified and creates all the information that needs to be provided to JAGS. The R package rjags is used as an interface between R and JAGS, and the MCMC sampling is then performed in JAGS.

JointAI then does some post-processing on the MCMC samples so that you get an output that resembles what you would get from fitting a model using the standard model types for complete data in R, like `lm()` or `glm()`, or `coxph()` from the **survival** package.

---
## Example: PSC Recurrence & Survival

(Time-Dependent) Proportional Hazards Model in **JointAI**:

* B-splines to model the baseline hazard
* Random effect notation to "group" rows of the same patient

???

As a simple demonstration, we can have a look at how we would need to specify the time-dependent Cox model that we wanted to use for the analysis of our motivating example of recurrence of PSC.

One difference between the model used in JointAI and the model fitted using `coxph()` from the survival package is that in the Bayesian approach we cannot leave the baseline hazard unspecified. Instead, we model it flexibly using B-splines.

When we have time-varying covariates and need to take into account that some rows in the data belong to the same subject, we can do this in **JointAI** by specifying a random intercept for the subject level.

- - -

--

```r
library("JointAI")

fmla_patsurv <- Surv(pat_time, pat_event) ~ recip_sex + graft_type + ... +
  graftnr * recurrence + `(1 | ID_pat)`

patsurv <- coxph_imp(fmla_patsurv, data = PSC_data, `timevar = "futime"`,
                     `n.iter = 1000`, `n.chains = 8`)
```

* `timevar`: links covariate measurement time to event time
* `n.iter`: number of iterations for the MCMC sampling
* `n.chains`: number of MCMC chains

???

The model specification in **JointAI** follows the specification of the standard modelling functions in R as closely as possible.
To make the syntax a bit clearer, I have split the specification and first specify the model formula and then the model.

In the model formula we use a survival object to specify the outcome, using the event or censoring time and the event indicator, so not a start and stop time for the interval as we would have to use in the survival package. This is simply because it was more convenient to implement this way. Baseline and time-varying covariates are specified in the same way. I've here also included an interaction between the two time-varying covariates graft number and recurrence of PSC.

To indicate the ID variable that identifies rows belonging to the same patient, we use a term like a random intercept specification in the **lme4** package.

To run the model we use the function `coxph_imp()`. Besides the model formula, we also need to specify the number of iterations of the MCMC sampler. In addition, I set the number of MCMC chains to 8; the default would be 3 chains.

Because we have time-dependent covariates, we also have to specify the argument `timevar`, which is set to the name of the variable in the data that contains the measurement times of the longitudinal variables.

---
## Example: PSC Recurrence & Survival

* **proportional hazards model**:<br>
`pat_time, pat_event` | `graftnr`, `recurrence`, `ischemic_hrs`, ..., `year`

???

When we fit this model, **JointAI** specifies the following sequence of models:

* The proportional hazards model using the event time and event indicator, conditional on all the covariates we have specified.

- - -

--

<p class = "smallbreak"> </p>

* **cumulative logit mixed model**:<br>
`graftnr` | `recurrence`, `futime`, `ischemic_hrs`, `donor_age`, ..., `recip_age`, `year`

* **logistic mixed model**:<br>
`recurrence` | `futime`, `ischemic_hrs`, `donor_age`, ..., `recip_age`, `year`

???

Then, two mixed models for the two time-varying covariates `graft number` and `recurrence`.
There are no missing values in these variables, but we need these two models to model an association between them and the incomplete covariates.

Because the graft number was coded as an ordered factor, a cumulative logit mixed model was chosen, and for the binary recurrence status a logistic mixed model is the default. And as you can see, both models contain all baseline covariates, and in addition the measurement time, and recurrence is also included in the model for graft number.

- - -

--

<p class = "smallbreak"> </p>

* **linear model**: <br>
`ischemic_hrs` | `donor_age`, `graft_type`, `recip_sex`, `recip_age`, `year`

* **linear model**:<br>
`donor_age` | `graft_type`, `recip_sex`, `recip_age`, `year`

* **multinomial logit model**:<br>
`graft_type` | `recip_sex`, `recip_age`, `year`

???

And **JointAI** also specifies the models for the incomplete covariates. We have the ischemic time (the time the transplanted organ was outside the body, measured in hours) and the age of the donor, for which we use linear regression models, and the graft type, so whether it was a living or deceased donor and whether it was a full liver or just part of it, which was coded as an unordered factor, and **JointAI** therefore automatically chose a multinomial logit model.

The covariates are again all the remaining variables for which no model has been specified yet in this sequence of conditional models. The last three covariates, recipient sex, recipient age and year of the first transplant, did not have any missing values, and so we do not need to specify models for them.

One thing I should add here is that the covariates that refer to the donor should actually have been time-varying. Unfortunately, we only had them available for the first graft, and so they had to be included as time-constant. There is just not enough information available in the data to use them as time-varying and consider the information for a second and third graft to be missing.
---
## Example: PSC Recurrence & Survival

```r
summary(patsurv)
```

.scroll450[

```
## 
## Bayesian proportional hazards model fitted with JointAI
## 
## Call:
## coxph_imp(formula = Surv(patsurv, last_status) ~ RecSex + DonorGraftCat + 
##     ns(RecAgeLTx1, df = 2) + ischemic_hrs + TH1_Donor_age + ns(year, 
##     df = 2) + graftnr * recurrence + (1 | IDPAT), data = patsurv_subDF, 
##     n.chains = n.chains, n.iter = n.iter, seed = seed, timevar = "futime")
## 
## 
## Number of events: 356 
## 
## Posterior summary:
##                             Mean     SD     2.5%   97.5% tail-prob. GR-crit MCE/SD
## RecSexF                 -0.20115 0.1187 -0.42839  0.0269    0.09000    1.00 0.0186
## DonorGraftCatL           0.36646 0.3064 -0.27545  0.9332    0.23375    1.00 0.0190
## DonorGraftCatDBD_P      -0.53683 0.3783 -1.34258  0.1554    0.14300    1.00 0.0187
## DonorGraftCatDCD        -0.19627 0.8227 -2.06765  1.1567    0.89700    1.00 0.0170
## ns(RecAgeLTx1, df = 2)1  0.43062 0.4092 -0.37078  1.2488    0.29200    1.00 0.0152
## ns(RecAgeLTx1, df = 2)2  1.69408 0.2650  1.16088  2.2088    0.00000    1.00 0.0177
## ischemic_hrs             0.02557 0.0175 -0.00965  0.0596    0.14100    1.00 0.0201
## TH1_Donor_age            0.00976 0.0037  0.00253  0.0170    0.00575    1.00 0.0197
## ns(year, df = 2)1       -2.31520 0.6036 -3.51769 -1.1646    0.00000    1.00 0.0221
## ns(year, df = 2)2       -0.30971 0.2976 -0.88715  0.2671    0.29700    1.00 0.0200
## graftnr2                 0.87075 0.1563  0.55022  1.1618    0.00000    1.00 0.0223
## graftnr3+                1.00351 0.4884 -0.07082  1.8601    0.06325    1.01 0.0373
## recurrence1              0.83676 0.1962  0.43004  1.2035    0.00000    1.01 0.0271
## graftnr2:recurrence1    -0.93363 0.3539 -1.62133 -0.2441    0.00725    1.00 0.0255
## graftnr3+:recurrence1    0.92089 0.6152 -0.24485  2.1896    0.11825    1.01 0.0375
## 
## Posterior summary of other parameters:
##                              Mean    SD  2.5% 97.5% tail-prob. GR-crit MCE/SD
## beta_Bh0_Srv_ptsrv_lst_s[1] -3.08 0.186 -3.45 -2.73      0.000    1.00 0.0250
## beta_Bh0_Srv_ptsrv_lst_s[2] -3.89 0.254 -4.40 -3.41      0.000    1.00 0.0372
## beta_Bh0_Srv_ptsrv_lst_s[3] -4.48 0.292 -5.04 -3.90      0.000    1.01 0.0450
## beta_Bh0_Srv_ptsrv_lst_s[4] -3.89 0.568 -5.00 -2.80      0.000    1.01 0.0536
## beta_Bh0_Srv_ptsrv_lst_s[5] -4.44 1.076 -6.42 -2.25      0.001    1.01 0.0510
## beta_Bh0_Srv_ptsrv_lst_s[6] -4.34 2.037 -8.89 -1.08      0.003    1.01 0.0380
## 
## 
## MCMC settings:
## Iterations = 101:1100
## Sample size per chain = 1000 
## Thinning interval = 1 
## Number of chains = 8 
## 
## Number of observations: 3411
## Number of groups:
##  - IDPAT: 1428
```

]

???

After having fitted the model, we of course want to see the results, and as with all the standard regression functions in R, you can obtain them using the function `summary()`.

Here is a quick overview of what this model summary looks like for a **JointAI** model. Don't pay too close attention to the model formula and the names of the variables; this is output from the actual model I fitted in the PSC project, but I changed the variable names and model structure a bit for the earlier examples to make it easier to read.

The most interesting part is the posterior summary, where we get the posterior mean and standard deviation. Instead of confidence intervals, we use the 2.5% and 97.5% quantiles of the posterior sample, which give us the credible interval. The output also shows us the tail probability, which is sometimes called the Bayesian p-value, the Gelman-Rubin criterion for convergence (aka R-hat or potential scale reduction factor), and the ratio of the Monte Carlo error to the posterior standard deviation, as a criterion of sampling precision.

The output is split into the main parameters and, for some model types, other parameters. That would be variance parameters, or, here in the proportional hazards model, the coefficients from the spline used to model the baseline hazard.
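The columns of the posterior summary are simple functions of the posterior sample. Conceptually (a Python sketch with a stand-in sample, not JointAI's internal code; in practice the Monte Carlo error is estimated with methods that account for autocorrelation, such as batch means):

```python
import numpy as np

# Stand-in posterior sample for one coefficient (illustrative values only)
rng = np.random.default_rng(42)
draws = rng.normal(loc=0.8, scale=0.2, size=8000)

post_mean = draws.mean()
post_sd = draws.std(ddof=1)
ci_low, ci_up = np.quantile(draws, [0.025, 0.975])   # 95% credible interval

# Tail probability ("Bayesian p-value"): 2 * min(P(theta > 0), P(theta < 0))
tail_prob = 2 * min((draws > 0).mean(), (draws < 0).mean())

# Monte Carlo error relative to the posterior SD: for (near-)independent
# draws MCE = SD / sqrt(n), so MCE/SD = 1 / sqrt(n)
mce_sd = 1 / np.sqrt(draws.size)
```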
At the end, we get some information on the MCMC sampling: the number of iterations, thinning and number of chains, and the number of observations. When we have multi-level data, the number of observations is given for each level of the hierarchy.

---
## Example: PSC Recurrence & Graft Survival

<img src="index_files/figure-html/graftsurv-1.png" width="100%" />

???

In this project on recurrence of PSC after liver transplant we were not only interested in patient survival, but also investigated graft survival. In this second analysis, we saw the transplanted organs as the unit of investigation.

That changes things a bit. We still have recurrence as a time-varying covariate. Some grafts never experience recurrence, for some grafts recurrence changes from 0 to 1, and grafts that are transplanted after the patient has experienced recurrence will have the value 1 for recurrence the whole time.

The graft number is now no longer time-varying because it does not change throughout the "life" of a graft.

And the endpoint changed a bit. The event of interest is now graft death, and this includes the case where the patient died, but also when the patient received a new transplant.

---
## Example: PSC Recurrence & Graft Survival

**Here:** analysis on the graft level

* grafts are nested in patients ⇨ random intercept for `ID_pat`
* repeated measurements within grafts ⇨ random intercept for `ID_graft`

```r
fmla_graftsurv <- Surv(graft_time, graft_event) ~ recip_sex + graft_type + ... +
  graftnr * recurrence + `(1 | ID_graft) + (1 | ID_pat)`

graftsurv <- coxph_imp(fmla_graftsurv, data = PSC_data, timevar = "futime",
                       n.iter = 1000, n.chains = 8)
```

???

To model this in **JointAI** we only need to change the model a bit. Because we now consider the grafts as our unit of interest, we have grafts nested in patients, and we have repeated measurements of recurrence within each graft, which means we add an additional level to our hierarchy and add another random intercept to the model.
For the rest, the model specification remains the same.

- - -

--

* Possible to add **more hierarchical levels**:<br>
grafts > patients > hospitals > countries
* Can be seen as gap time approach for **recurrent events** in patients.

???

Technically, we could add even more levels to the hierarchy. For example, we could model that not just grafts are nested in patients, but that patients are nested in hospitals, which are nested in countries. This could be done by just adding the random intercepts for the corresponding ID variables to the model formula.

In our application, we considered the transplanted organs as our analysis unit of interest. But we could also interpret this model differently: we can look at it as using a gap-time approach for modelling recurrent events within patients.

Besides the fact that several of our covariates should ideally be measured repeatedly within each graft, and not just once at baseline, another criticism of this model is that the time-dependent Cox model assumes the covariates to be exogenous, but in many medical applications that is not a realistic assumption. The values of the time-dependent covariates usually depend on the survival status of the patient. And so in our applications, it is often more appropriate to fit a Joint Model for longitudinal and survival data instead.

---
## Extension to More Complex Analyses

<br>

<div class = "container">
<div class = "box">
<div class = "box-row">
<div class = "box-cell" style = "background: var(--nord0);">joint<br>distribution</div>
<div class = "box-cell">\(=\)</div>
<div class = "box-cell" style = "background: var(--nord0);">analysis<br>model</div>
<div class = "box-cell" style = "background: var(--nord0);">covariate<br>models</div>
<div class = "box-cell" style = "background: var(--nord0)">priors</div>
</div>
</div>
</div>

???
And this brings me to my next point: to show you that this Bayesian approach to handling incomplete covariates is very suitable for quite complex models.

Because in this approach we specify the joint distribution as the product of the analysis model, a specification for the covariates and the priors, we can simply exchange the analysis model for a more complex model.

- - -

--

<br>

**(Multivariate) Joint Model for Longitudinal and Survival Data:**

`$$p(\mathbf T, \boldsymbol\delta, \mathbf y, \color{var(--nord15)}{\mathbf x}, \mathbf b, \boldsymbol\theta) = \underset{\text{analysis model}}{\underbrace{ \underset{\substack{\text{survival}\\\text{model}}}{\underbrace{ p(\mathbf T, \boldsymbol\delta \mid \mathbf b, \color{var(--nord15)}{\mathbf x}, \boldsymbol \theta)}}\;\; \underset{\substack{\text{(multivariate)}\\\text{longitudinal}\\\text{model}}}{\underbrace{ p(\mathbf y \mid \mathbf b, \color{var(--nord15)}{\mathbf x}, \boldsymbol\theta)\; p(\mathbf b\mid \boldsymbol \theta)}} }}\;\; \underset{\substack{\text{covariate}\\\text{models}}}{\underbrace{ p(\color{var(--nord15)}{\mathbf x} \mid \boldsymbol\theta)}}\;\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}}$$`

???

For example, a multivariate joint model for longitudinal and survival data. There, the model for the response consists of multiple parts: the time-to-event model, and the multivariate mixed model for the longitudinal biomarkers. But the principle remains the same.

To specify such a joint model in **JointAI** we can do the following:

---
## Example: Multivariate Joint Model

Combine multiple models in a list of `formula` objects:

```r
fmla <- list(
  Surv(etime, event) ~ age + sex + ... + `logBili + logAST + Plt`,
  Plt ~ age + sex + logAST + logBili + ns(time, df = 3) +
    (ns(time, df = 3) | StudyID),
  logBili ~ age + sex + logAST + time + (time | StudyID),
  logAST ~ age + sex + ns(time, df = 5) + (ns(time, df = 5) | StudyID)
)
```

???

I again first specify the model formula.
Because our analysis model now consists of multiple sub-models, I use a list of `formula` objects. This list contains the formula for a proportional hazards model for the time-to-event outcome, and separate formulas for each longitudinal outcome.

Here, I use a different example, from a project on patients with chronic hepatitis C. We have three biomarkers: the log-transformed bilirubin and AST, and the platelet count. For each of them I can use separate specifications of the random effects structure and select different sets of covariates.

I included all three of them as covariates in the survival model, and, to model the association between the three biomarkers, I also added them into each other's linear predictor in a sequential way.

- - -

--

```r
mod <- JM_imp(fmla, data = HCVdat, `timevar = "time"`, `rd_vcov = "full"`,
              n.iter = 10000, n.chains = 8)
```

`rd_vcov`: type of random effects variance-covariance matrix.

???

To fit the model we can use the function `JM_imp()`. We need to again specify the name of the time variable that contains the timing of the longitudinal measurements.

In addition, we can specify how we would like to model the variance-covariance matrix of the random effects. Here I have specified a full variance-covariance matrix, so that all the random effects from the three longitudinal models are modelled in one big multivariate normal distribution. Alternatively, I could have specified a block-diagonal structure, which treats the random effects for the different outcomes as independent, or set all random effects to be independent.
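The difference between the structures mentioned above can be pictured with a small sketch (Python, made-up numbers; only the `"full"` option name is taken from the code above, the alternatives are described in the package documentation):

```python
import numpy as np

# Random-effects covariance for two longitudinal outcomes, each with a
# random intercept and slope (4 random effects in total; values made up).
D1 = np.array([[1.0, 0.3],
               [0.3, 0.5]])      # outcome 1: intercept & slope
D2 = np.array([[0.8, -0.2],
               [-0.2, 0.4]])     # outcome 2: intercept & slope

# Block-diagonal: random effects of different outcomes are uncorrelated
D_block = np.zeros((4, 4))
D_block[:2, :2] = D1
D_block[2:, 2:] = D2

# "full": one joint covariance matrix, additionally allowing cross-outcome
# covariances (here a small made-up value)
cross = np.full((2, 2), 0.1)
D_full = D_block.copy()
D_full[:2, 2:] = cross
D_full[2:, :2] = cross.T

assert np.allclose(D_block[:2, 2:], 0)           # no cross-outcome covariance
assert np.all(np.linalg.eigvalsh(D_full) > 0)    # still positive definite
```

The full structure is the most flexible but has the most parameters; the independent structure is the most restrictive and cheapest to estimate.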
--- ## JointAI: Model Types .flex-grid[ .col[ **Generalized Linear Models** * `lm_imp()` * `glm_imp()` * `clm_imp()` * `mlogit_imp()` * `betareg_imp()` * `lognorm_imp()` ] .col[ **Mixed Models** * `lme_imp()` * `glme_imp()` * `clmm_imp()` * `mlogitmm_imp()` * `betamm_imp()` * `lognormmm_imp()` ] .col[ **Survival Models** * `coxph_imp()` * `survreg_imp()` * `JM_imp()` ] ] .footnote[ [Full documentation at https://nerler.github.io/JointAI/](https://nerler.github.io/JointAI/) ] ??? The types of analysis models that are implemented in **JointAI** are listed here: You can fit generalized linear regression models and generalized linear mixed models, as well as cumulative logit models, multinomial logit models, and models using a beta distribution or a log-normal distribution. For survival data, there are the two types of models we've already seen and a parametric model using a Weibull distribution. For any of these models it is possible to specify multiple analysis models by using a list of model formulas, as we've seen in the joint model example. 
--- ## JointAI: More Features .flex-grid[ .col[ <button class="modal-button" href="#myModal1">Model Specification</button> <div id="myModal1" class="modal"> <div class="modal-content"> <span class="close">×</span> <ul> <li>auxiliary variables (<a href = "https://nerler.github.io/JointAI/articles/ModelSpecification.html#auxiliary-variables"><code>auxvars</code></a>)</li> <li> shrinkage (ridge) (<a href = "https://nerler.github.io/JointAI/articles/ModelSpecification.html#shrinkage"><code>shrinkage</code></a>) <li> truncation of continuous distributions (<a href = "https://nerler.github.io/JointAI/articles/ModelSpecification.html#functions-with-restricted-support"><code>trunc</code></a>) <li> setting reference categories (<a href="https://nerler.github.io/JointAI/articles/ModelSpecification.html#reference-categories"><code>refcats</code></a>) <li> change hyper-parameters (<a href = "https://nerler.github.io/JointAI/articles/ModelSpecification.html#hyper-parameters"><code>hyperpars</code></a>) <li> parameters to be monitored (<a href="https://nerler.github.io/JointAI/articles/SelectingParameters.html"><code>monitor_params</code></a>)</li> <li> export the JAGS model (<code>modelname</code>, <code>modeldir</code>, <code>keep_model</code>) <li> customize the JAGS model (<code>modelname</code>, <code>modeldir</code>, <code>overwrite</code>) </ul> </div> </div> <button class="modal-button" href="#myModal2">MCMC settings</button> <div id="myModal2" class="modal"> <div class="modal-content"> <span class="close">×</span> <ul> <li>adaptive phase (<a href="https://nerler.github.io/JointAI/articles/MCMCsettings.html#adaptive-phase"><code>n.adapt</code></a>)</li> <li>sampling phase (<a href="https://nerler.github.io/JointAI/articles/MCMCsettings.html#sampling-iterations"><code>n.iter</code></a>)</li> <li>number of chains (<a href="https://nerler.github.io/JointAI/articles/MCMCsettings.html#number-of-chains"><code>n.chains</code></a>)</li> <li>thinning (<a 
href="https://nerler.github.io/JointAI/articles/MCMCsettings.html#thinning"><code>thin</code></a>)</li> <li>initial values (<a href="https://nerler.github.io/JointAI/articles/MCMCsettings.html#initial-values"><code>inits</code></a>, <code><object>$mcmc_settings$inits</code>)</li> <li>seed value (<code>seed</code>)</li> <li>continue sampling (<a href="https://nerler.github.io/JointAI/reference/add_samples.html"><code>add_samples()</code></a>)</li> </ul> </div> </div> <button class="modal-button" href = "#modal-plots">Plots</button> <div class="modal" id="modal-plots"> <div class="modal-content"> <span class="close">×</span> <ul> <li>data distribution (<a href="https://nerler.github.io/JointAI/articles/VisualizingIncompleteData.html#visualize-the-distribution-of-each-variable"> <code>plot_all()</code></a>)</li> <li>missing data pattern (<a href = "https://nerler.github.io/JointAI/articles/VisualizingIncompleteData.html#missing-data-pattern"> <code>md_pattern()</code></a>)</li> <li>MCMC chains (<a href = "https://nerler.github.io/JointAI/articles/AfterFitting.html#trace-plot"> <code>traceplot()</code></a>)</li> <li>posterior density (<a href="https://nerler.github.io/JointAI/articles/AfterFitting.html#density-plot"><code>densplot()</code></a>)</li> <li>imputed vs observed data (<a href="https://nerler.github.io/JointAI/articles/AfterFitting.html#export-of-imputed-values"> <code>plot_imp_distr()</code></a>)</li> <li>Monte Carlo Error (<a href="https://nerler.github.io/JointAI/articles/AfterFitting.html#sec:mcerror"><code>plot(MC_error(<object>))</code></a>)</li> </ul> </div> </div> <button class="modal-button" href = "#modal-parallel">Parallel Sampling</button> <div class="modal" id="modal-parallel"> <div class="modal-content"> <span class="close">×</span> Using the R package <strong>future</strong>: <pre><code class="r hljs remark-code"> <div class="remark-code-line"><span class="hljs-keyword">library</span>(<span class="hljs-string">"future"</span>)</div> <div 
class="remark-code-line">plan(multisession, workers = <span class="hljs-number">8</span>)</div> <div class="remark-code-line"><span class="hljs-comment">## fit JointAI model ...</span></div><br> <div class="remark-code-line"><span class="hljs-comment">## to re-set to sequential evaluation:</span></div> <div class="remark-code-line">plan(sequential)</div> </code></pre> </div> </div> ] .col[ <button class="modal-button" href = "#modal-clm"> Cumulative Logit Models</button> <div class="modal" id="modal-clm"> <div class="modal-content"> <span class="close">×</span> <ul> <li>invert the OR (<a href="https://nerler.github.io/JointAI/reference/model_imp.html#cumulative-logit-mixed-models"><code>rev</code></a>) \[\log\left(\frac{P(y_i \color{var(--nord14)}{>} k)}{P(y_i \color{var(--nord14)}{\leq} k)}\right) = \gamma_k + \eta_i \quad\text{vs}\quad \log\left(\frac{P(y_i \color{var(--nord14)}{\leq} k)}{P(y_i \color{var(--nord14)}{>} k)}\right) = \gamma_k + \eta_i\] </li> <li>partial proportional odds (<a href="https://nerler.github.io/JointAI/reference/model_imp.html#cumulative-logit-mixed-models"><code>nonprop</code></a>) \[\log\left(\frac{P(y_i > k)}{P(y_i \leq k)}\right) = \gamma_k + \eta_i \quad\text{vs}\quad \log\left(\frac{P(y_i > k)}{P(y_i \leq k)}\right) = \gamma_k + \eta_i \color{var(--nord14)}{+ \eta_{ki}}\] </li> </ul> </div> </div> <button class="modal-button" href = "#modal-multi-level"> Multi-level Models</button> <div class="modal" id="modal-multi-level"> <div class="modal-content"> <span class="close">×</span> <ul> <li><strong>lme4</strong> or <strong>nlme</strong> type specification</li> <li>nested and crossed random effects</li><br> <li>2, 3, 4, ... levels of grouping, i.e., <code> ... 
+ (time | id) + (1 | group) + (1 | center) + (1 | country) + ...</code></li> <li>multi-level structure also for survival models</li><br> <li>uses hierarchical centering, i.e., \(\mathbf b \sim N(\mathbf X\boldsymbol\beta, \mathbf D)\)</li> </ul> <a href="https://nerler.github.io/JointAI/reference/model_imp.html#model-formulas">⇨ For more info see the package documentation.</a> </div> </div> <button class="modal-button" href = "#modal-survival"> Survival<br>(with Time-varying Covariates) </button> <div class="modal" id="modal-survival"> <div class="modal-content"> <span class="close">×</span> <ul> <li>proportional hazards model with <strong>time-varying covariates</strong><br> (using <a href="https://nerler.github.io/JointAI/reference/model_imp.html#survival-models-with-frailties-or-time-varying-covariates"> <code> + (1 | id)</code></a> and <a href="https://nerler.github.io/JointAI/reference/model_imp.html#survival-models-with-frailties-or-time-varying-covariates"> <code> timevar = "<...>"</code></a>)</li> <li><strong>baseline hazard</strong>: B-spline with <code>df_basehaz</code> degrees of freedom</li><br> <li><strong>joint model</strong> for longitudinal and survival data<br> (using <a href = "https://nerler.github.io/JointAI/reference/model_imp.html#modelling-multiple-models-simultaneously-joint-models"> <code>formula = list(<...>)</code></a> and <a href="https://nerler.github.io/JointAI/reference/model_imp.html#modelling-multiple-models-simultaneously-joint-models"> <code>timevar = "<...>"</code></a>)</li> <li><strong>association structure</strong> (<code>assoc_type</code>) <ul> <li>underlying value (<code>underl.value</code>)</li> <li>observed/imputed value (<code>obs.value</code>)</li> </ul> </li> <li><strong>multivariate joint model</strong>:<br> full / block-diagonal / independent random effects</li> </ul> </div> </div> ] .col[ <button class="modal-button" href = "#modal-blackbox"> Shining Light into the Black Box</button> <div class="modal" 
id="modal-blackbox"> <div class="modal-content"> <span class="close">×</span> <ul> <li>monitor any node (<a href="https://nerler.github.io/JointAI/articles/SelectingParameters.html#other-parameters"> <code>monitor_params</code></a>)</li> <li>see the JAGS model (<code><object>$jagsmodel</code>)</li> <li>print model info (<a href="https://nerler.github.io/JointAI/articles/SelectingParameters.html#side-note-getting-information-about-of-the-imputation-models"> <code>list_models()</code></a>, <code><object>$models</code>)</li> <li>access the JAGS model object (<code><object>$model</code>)</li> <li>list all monitored parameters (<a href="https://nerler.github.io/JointAI/articles/SelectingParameters.html#parameters-of-the-analysis-model"> <code>parameters()</code></a>)</li> <li>list all regression coefficients (<code><object>$coef_list</code>)</li> </ul> </div> </div> <button class="modal-button" href = "#modal-compinfo"> Computational Info</button> <div class="modal" id="modal-compinfo"> <div class="modal-content"> <span class="close">×</span> <code><object>$comp_info</code>: <ul> <li>start time</li> <li>duration</li> <li>package version</li> <li>R version</li> <li>parallel setting</li> </ul> </div> </div> <button class="modal-button" href = "#modal-subset"> Output Options</button> <div class="modal" id="modal-subset"> <div class="modal-content"> <span class="close">×</span> <ul> <li>Subset selection <ul> <li>iterations (<a href="https://nerler.github.io/JointAI/articles/AfterFitting.html#subset-of-mcmc-samples"> <code>start</code>, <code>end</code>, <code>thin</code></a>)</li> <li>nodes (<a href="https://nerler.github.io/JointAI/articles/AfterFitting.html#subset-of-parameters"> <code>subset</code></a>)</li> <li>chains (<a href = "https://nerler.github.io/JointAI/articles/AfterFitting.html#subset-of-mcmc-samples"><code>exclude_chains</code></a>)</li> <li>sub-models (<code>outcome</code>)</li> </ul><br> <li>export imputed values<br>(<code>monitor_params = c(imps = 
TRUE)</code>, <a href = "https://nerler.github.io/JointAI/reference/get_MIdat.html"><code>get_MIdat()</code></a>)</li> </ul> </div> </div> <button class="modal-button" href = "#modal-prediction">Prediction</button> <div class="modal" id="modal-prediction"> <div class="modal-content"> <span class="close">×</span> <ul> <li>create "newdata" for effect plots (<a href="https://nerler.github.io/JointAI/reference/predDF.html"> <code>predDF()</code></a>)</li><br> <li>prediction (<code>predict()</code>) <ul> <li><strong>GLM(M)</strong> type models: linear predictor or outcome scale<br> <span style = "text-decoration: underline;">for now</span>: <span style = "color: var(--nord12);"> assumption: random effects = 0</span> </li> <li><strong>prop. hazard models</strong>: (log) hazard, (-log) survival</li> <li><strong>joint model</strong> for longitudinal & survival data: <span style = "color:var(--nord11);">not <span style = "text-decoration: underline;">yet</span> possible</span></li> </ul></li> </ul> </div> </div> ] ] --- ## Summary Imputation of covariates ⇨ sampling from the **correct predictive distributions**. **Direct specification is not straightforward** in settings with - non-linear associations (incl. survival), - interactions, - multi-level data. -- .flex-grid[ .col[ **MICE/FCS:**<br> uses direct specification<br> ⇨ **biased** in these settings ] .col[ **Fully Bayesian approach:**<br> - specification of the joint distribution - derives full-conditionals from the joint<br> ⇨ Ensures compatibility & congeniality - extensible to complex/custom models & MNAR ] ] --- class: the-end, center, middle layout: true count: false # Thank you for your attention! 
<div class="contact"> <i class="fas fa-envelope"></i> <a href="mailto:n.erler@erasmusmc.nl" class="email">n.erler@erasmusmc.nl</a>  <a href="https://twitter.com/N_Erler" target="_blank"><i class="fab fa-twitter"></i> N_Erler</a>  <a href="https://github.com/NErler" target="_blank"><i class="fab fa-github"></i> NErler</a>  <a href="https://nerler.com" target="_blank"><i class="fas fa-globe-americas"></i> https://nerler.com</a> </div> --- count: false <script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML"> </script> <script> // Get the button that opens the modal var btn = document.querySelectorAll("button.modal-button"); // All page modals var modals = document.querySelectorAll('.modal'); // Get the <span> element that closes the modal var spans = document.getElementsByClassName("close"); // When the user clicks the button, open the modal for (var i = 0; i < btn.length; i++) { btn[i].onclick = function(e) { e.preventDefault(); modal = document.querySelector(e.target.getAttribute("href")); modal.style.display = "block"; } } // When the user clicks on <span> (x), close the modal for (var i = 0; i < spans.length; i++) { spans[i].onclick = function() { for (var index in modals) { if (typeof modals[index].style !== 'undefined') modals[index].style.display = "none"; } } } // When the user clicks anywhere outside of the modal, close it window.onclick = function(event) { if (event.target.classList.contains('modal')) { for (var index in modals) { if (typeof modals[index].style !== 'undefined') modals[index].style.display = "none"; } } } </script>