class: left, top, title-slide
Joint Models for Incomplete Longitudinal and Survival Data
CISNET mid-year meeting 2021
Nicole Erler
Erasmus Medical Center
n.erler@erasmusmc.nl
N_Erler
NErler
https://nerler.com
--- count: false layout: true <div class="my-footer"><span> <a href="https://twitter.com/N_Erler"><i class="fab fa-twitter"></i> N_Erler</a>      <a href="https://github.com/NErler"><i class="fab fa-github"></i> NErler</a>      <a href = "https://nerler.com"><i class="fas fa-globe-americas"></i> nerler.com</a> </span></div> --- count: false ## Motivation: Chronic Hepatitis C * N `\(\approx\)` 700 patients with chronic hepatitis C * follow-up: `\(\leq\)` 38 years * several **baseline variables**: * age, sex, smoking, BMI, ... * **0 - 19% missing values**   ⇨ 66% complete cases * multiple **biomarkers**<br> (repeatedly measured) <img src="figures/plot_nobs.png", width = "650", style="position:absolute; bottom:45px; right:60px;"> ??? To find answers to many of the currently relevant questions in medical research, it is necessary to follow patients over a period of time, and to collect data at various occasions. For example, because we want to **study how certain characteristics or measurements change over time**, to **monitor progression** of a disease, or because we want to **predict clinical outcomes**. And having detailed information on the patient's history can lead to much better predictions. One such **example** is a study on **chronic hepatitis C patients**. Hep C is a **slowly progressing** disease, and there are **highly effective treatments**, but patients are usually only **diagnosed once the damage to the liver has progressed** so far that the patient shows **symptoms of liver damage** or even liver failure. In many of those cases, liver **transplantation** is the only way to save the patient and, in the light of scarcity of donor organs, it is of interest to **identify factors that indicate a patient's risk** for liver failure. We have data from approx. **700 patients** available to find an answer to this question. In this data there are several **variables that are only measured once**, at baseline. 
This includes variables like the patient's **age** and **sex**, and the **year of diagnosis**, but also information like **alcohol consumption**, **smoking**, **BMI**, which could change over time but were only measured once. In some of these variables, we have **missing values** for **up to 19% of the patients**. If we'd **exclude** everyone with missing values in these variables, we would **lose about a third of the data**. We also have a number of **biomarkers** that are **repeatedly measured**, but, since the data is not from a prospective study, the measurements are **very irregular**. The number of measurements per patient is shown here in the histogram. The **median** number of measurement times is **15**, but there are a few patients with just **one single observation**, and one patient with **735 measurement occasions**. --- ## Missing Biomarker Values <img src="figures/HCVlongmissing.png" width="100%" style="display: block; margin: auto;" /> ??? In many cases the biomarkers are **measured at the same time points**, but it also happens that one biomarker value is missing, or was measured at another time. In the **plots** I show the trajectories of 6 different biomarkers of interest for 7 patients. The values are standardized to have similar scales for the purpose of this visualization. Observations that we would lose if we'd **restrict our data** to only those time points at which all 6 markers are measured are shown as transparent. For the full dataset this would result in the loss of 50% of the biomarker observations. --- ## Multivariate Joint Model **Proportional hazards model** for time until event: `$$h_i(t) = h_0(t)\;\exp\left(\underset{\substack{\text{time}\\\text{constant}} }{\underbrace{\mathbf x_i^\top \boldsymbol\beta^{(tc)}}} + \underset{\text{time varying}}{\underbrace{\sum_{k=1}^K \eta_{ki}(t)^\top\beta^{(tv)}_k}}\right)$$` ??? The **analysis model** that we would like to fit on this data is a multivariate joint model. 
This model consists of **two parts**, a **proportional hazards** model for the time-to-event outcome, and a **multivariate mixed model** for the longitudinal biomarkers. The **proportional hazards** model has the usual form, where we model the subject specific hazard using a **baseline hazard and a linear predictor**. In the linear predictor we have some variables that are **time constant**, our baseline covariates, and we have the **time-varying biomarker values**. - - - -- **Longitudinal (mixed) model** for each biomarker `\(k = 1, ... K\)`: `\begin{align*} \mathbb E(y_{ki}(t)\mid \mathbf b_{ki}) &= \eta_{ki}(t)\\ &= \underset{\text{fixed effects}}{\underbrace{\boldsymbol x_{ki}(t)^\top\boldsymbol\beta^{(k)}}} + \underset{\text{random effects}}{\underbrace{\mathbf z_{ki}(t)^\top \mathbf b_{ki}}} \qquad\scriptsize\text{with } \begin{pmatrix}\mathbf b_{1i}\\\vdots\\\mathbf b_{Ki}\end{pmatrix} \sim N(\mathbf 0, \mathbf D) \end{align*}` ??? The biomarker values are modelled using mixed models, where the linear predictor `\(\eta\)` consists of the fixed effects part and a random effects part. The **random effects** of the models for the different biomarkers are then **modelled jointly** in a multivariate normal distribution. By using the linear predictors `\(\eta\)` of the longitudinal models as covariates in the time-to-event model we assume that the underlying value of the biomarkers, which may be measured with error, is associated with the risk of an event at the same time point. It is of course possible to assume different association structures, for example a time-lag, or a cumulative effect, but since the focus here is more about the missing values, we'll keep it simple. 
--- count: false ## Multivariate Joint Model **Proportional hazards model** for time until event: `$$h_i(t) = h_0(t)\;\exp\left(\underset{\substack{\text{time}\\\text{constant}} }{\underbrace{\bbox[#3B4252, 2px]{\color{var(--nord15)}{\mathbf x_i}^\top} \boldsymbol\beta^{(tc)}}} + \underset{\text{time varying}}{\underbrace{\sum_{k=1}^K \eta_{ki}(t)^\top\beta^{(tv)}_k}}\right)$$` **Longitudinal (mixed) model** for each biomarker `\(k = 1, ... K\)`: `\begin{align*} \mathbb E(y_{ki}(t)\mid \mathbf b_{ki}) &= \eta_{ki}(t)\\ &= \underset{\text{fixed effects}}{\underbrace{\bbox[#3B4252, 2px]{\color{var(--nord15)}{\boldsymbol x_{ki}(t)}^\top}\boldsymbol\beta^{(k)}}} + \underset{\text{random effects}}{\underbrace{\mathbf z_{ki}(t)^\top \mathbf b_{ki}}} \qquad\scriptsize\text{with } \begin{pmatrix}\mathbf b_{1i}\\\vdots\\\mathbf b_{Ki}\end{pmatrix} \sim N(\mathbf 0, \mathbf D) \end{align*}` ??? In both model parts we might want to use baseline covariates, however in our motivating data, those variables have missing values. The missing values in the longitudinal biomarkers are not that interesting at this point, because we are fitting models for those variables anyway. And under ignorable missingness, the mixed models we use provide unbiased estimates. But for the missing baseline covariate values we need to perform some sort of imputation. --- ## Imputation of Missing Values * **uncertainty** about the missing value ??? The basic ideas about imputation go back to Donald Rubin and his work in the 1960s and 70s. The important issue in imputing missing values is that there is **uncertainty** about what the value would have been. And so we **can't just pick** one value and fill it in, because then we would ignore this uncertainty. - - - - -- * some values **more likely** than others * relationship with **other** available **data** ??? But we have some knowledge about the missing values. 
Usually, some values are going to be more likely than others, and in many cases there is a relationship between the variable that has missing values and the other data that we have collected. - - - - -- .pull-left[ **⇨ missing values have a distribution** <img src="index_files/figure-html/unnamed-chunk-5-1.png" width="60%" style="display: block; margin: auto;" /> <!-- <img src="figures/ImpDens.png", height = 250, style = "margin: auto; display: block;"> --> ] ??? So, it makes sense to assume that missing values have a distribution, and that we need a model to learn how the incomplete variable is related to the other data. - - - - -- .pull-right[ <br> .hlgt-box[ **Predictive distribution** of the missing values given the observed values. `$$p(x_{mis}\mid\text{everything else})$$` ] ] ??? And this means that we can impute the missing values by sampling from the predictive distribution of the missing values conditional on everything else. This everything else includes the observed data, including the response variables, other covariates and parameters. 
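The idea of sampling from a predictive distribution can be sketched in a few lines of R. This is only a minimal illustration on simulated toy data (the variables `x` and `y`, the sample size and all parameter values are made up, and a simple linear model is assumed for `x` given `y`); it is not the model used for the hepatitis C data.

```r
set.seed(2021)

# Hypothetical toy data (not the hepatitis C data)
n <- 200
y <- rnorm(n)
x <- 1 + 0.5 * y + rnorm(n, sd = 0.8)
x[sample(n, 40)] <- NA                 # set 20% of x to missing

obs <- !is.na(x)
fit <- lm(x ~ y, subset = obs)         # model for x given the other data

# Draw the parameters from their (approximate) posterior so that the
# uncertainty about the model itself is carried over to the imputations
df_res     <- fit$df.residual
sigma_draw <- sigma(fit) * sqrt(df_res / rchisq(1, df_res))
XtX_inv    <- summary(fit)$cov.unscaled
beta_draw  <- unname(coef(fit)) +
  sigma_draw * drop(t(chol(XtX_inv)) %*% rnorm(length(coef(fit))))

# Sample the missing values from the predictive distribution
mu_mis <- beta_draw[1] + beta_draw[2] * y[!obs]
x_imp  <- x
x_imp[!obs] <- rnorm(sum(!obs), mean = mu_mis, sd = sigma_draw)
```

Repeating the last two steps gives several different imputed values per missing observation, which is how the uncertainty about the missing value is represented.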
--- ## Multiple Imputation using FCS (MICE) .flex-grid[ .col[ <div style = "text-align: center; margin-bottom: 25px;"> <strong>Multivariate<br>Missingness</strong></div> <table class="simpletable2"> <tr> <th></th> <th>\(\mathbf y\)</th> <th>\(\color{var(--nord15)}{\mathbf x_1}\)</th> <th>\(\color{var(--nord15)}{\mathbf x_2}\)</th> <th>\(\color{var(--nord15)}{\mathbf x_3}\)</th> <th>\(\ldots\)</th> </tr> <tr><td></td><td colspan = "5"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class = "rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td></td> </tr> </table> ] .col[ **Most common approach:**<br> <span style = "font-weight: bold;">MICE</span> <span style = "color: var(--lgrey);">(multivariate imputation by chained equations)</span><br> <span style = "font-weight: bold;">FCS</span> <span style = "color: var(--lgrey);">(fully conditional specification)</span> <div> \begin{alignat}{10} \color{var(--nord15)}{\mathbf x_1} &= \beta_0 &+& \beta_1 \mathbf y &+& 
\beta_2 \color{var(--nord15)}{\mathbf x_2} &+& \beta_3 \color{var(--nord15)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\ \color{var(--nord15)}{\mathbf x_2} &= \alpha_0 &+& \alpha_1 \mathbf y &+& \alpha_2 \color{var(--nord15)}{\mathbf x_1} &+& \alpha_3 \color{var(--nord15)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\ \color{var(--nord15)}{\mathbf x_3} &= \theta_0 &+& \theta_1 \mathbf y &+& \theta_2 \color{var(--nord15)}{\mathbf x_1} &+& \theta_3 \color{var(--nord15)}{\mathbf x_2} &+& \ldots &+& \boldsymbol\varepsilon \end{alignat} </div> {{content}} ] ] ??? In practice, we usually have missing values not in just a single variable, but in multiple variables. And the most common approach to imputation in this setting is MICE, short for **multivariate imputation by chained equations**, an approach that is also called **fully conditional specification**. The principle is an extension to what we've seen on the previous slide. We impute missing values using models that have all other data in their linear predictor. - - - -- <br> * iterative * based on the idea of the Gibbs sampler ??? Because in these imputation models we now have incomplete covariates, we use an iterative algorithm. We start by randomly drawing starting values from the observed part of the data, and then we cycle through the incomplete variables and impute one at a time. Once we have imputed each missing value, we start again with the first variable, but now use the imputed values of the other variables instead of the starting values, and we do this a few times until the algorithm has converged. This algorithm is based on the idea of the Gibbs sampler, which allows you to sample from a multivariate distribution by iteratively sampling from the full conditionals derived from the joint distribution. 
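The iterative algorithm described in these notes could be sketched in base R roughly as follows. This is a deliberately simplified illustration, not how the `mice` package is implemented: the data are simulated, all variable names are hypothetical, only linear models are used, and the regression parameters are not re-drawn in each iteration (which a proper implementation would do).

```r
set.seed(1)

# Hypothetical toy data with two incomplete covariates
n   <- 300
y   <- rnorm(n)
x1  <- 0.5 * y + rnorm(n)
x2  <- 0.3 * y + 0.4 * x1 + rnorm(n)
dat <- data.frame(y, x1, x2)
dat$x1[sample(n, 60)] <- NA
dat$x2[sample(n, 60)] <- NA

mis      <- is.na(dat)                 # logical matrix of missingness
inc_vars <- c("x1", "x2")
imp      <- dat

# Starting values: random draws from the observed part of each variable
for (v in inc_vars) {
  imp[mis[, v], v] <- sample(dat[!mis[, v], v], sum(mis[, v]), replace = TRUE)
}

# Cycle through the incomplete variables, imputing one at a time
for (it in 1:10) {
  for (v in inc_vars) {
    fmla <- reformulate(setdiff(names(imp), v), response = v)
    # fit on rows where v was observed; other variables use current imputations
    fit  <- lm(fmla, data = imp[!mis[, v], ])
    mu   <- predict(fit, newdata = imp[mis[, v], ])
    imp[mis[, v], v] <- rnorm(sum(mis[, v]), mean = mu, sd = sigma(fit))
  }
}
```

Running the whole procedure several times with different seeds would give multiple imputed datasets.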
--- ## A Simple Example .pull-left[ **Implied Assumption:**<br> <span>Linear association</span> between `\(\color{var(--nord15)}{\mathbf x_1}\)` and `\(\mathbf y\)`: `$$\mathbb{E}(\color{var(--nord15)}{\mathbf x_1}) = \theta_0 + \bbox[#3B4252, 2pt]{\theta_1 \mathbf y} + \theta_2 \mathbf x_2 + \theta_3 \mathbf x_3$$` <!-- <img src="figures/linplot.png", width = "450", height = "300", style="position:absolute; bottom:45px;"> --> <img src="index_files/figure-html/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" /> ] ??? One thing that is implied when we use such simple regression models for the imputation is that there is a linear association between the incomplete covariate and the response (and the other covariates). -- .pull-right[ <br> But what if `$$\mathbb{E}(\mathbf y) = \beta_0 + \bbox[#3B4252, 2pt]{\beta_1 \color{var(--nord15)}{\mathbf x_1} + \beta_2 \color{var(--nord15)}{\mathbf x_1}^2} + \beta_3 \mathbf x_2 + \beta_4 \mathbf x_3$$` <img src="index_files/figure-html/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" /> <!-- <img src="figures/qdrplot.png", width = "450", height = "300", style="position:absolute; bottom:45px;"> --> ] ??? But that is of course not always the case. What if we have a setting where we assume that there is a non-linear association, for example quadratic? --- ## Non-linear Associations .pull-left[ * <span style="font-weight: bold; color:var(--nord4);">true association</span>: non-linear * <span style="font-weight: bold; color:var(--nord7);">imputation assumption</span>: linear ] .pull-right[ <!-- <span style="font-size: 56pt; position: relative; right: 110px; bottom: 20px;">} ⇨</span> --> <!-- <span style = "color: var(--nord11); font-size: 1.2rem; font-weight: bold; position: relative; bottom: 30px; right: 100px;"> --> <!-- bias!</span> --> ] <img src="figures/impplot.png", height = 350, style = "margin: auto; display: block;"> ??? 
If we * correctly assume a non-linear association in the analysis model * but a linear association in the imputation model we introduce bias, even if we analyse the imputed data under the correct assumption, because the imputed values will follow the linear association and thereby flatten the non-linear structure seen in the observed data. --- count: false class: animated, fadeIn ## Non-linear Associations .pull-left[ * <span style="font-weight: bold; color:var(--nord4);">true association</span>: non-linear * <span style="font-weight: bold; color:var(--nord7);">imputation assumption</span>: linear ] .pull-right[ <span style="font-size: 56pt; position: relative; right: 110px; bottom: 20px;">} ⇨</span> <span style = "color: var(--nord11); font-size: 1.2rem; font-weight: bold; position: relative; bottom: 30px; right: 100px;"> bias!</span> ] <img src="figures/impplot2.png", height = 350, style = "margin: auto; display: block;"> ??? If we * correctly assume a non-linear association in the analysis model * but a linear association in the imputation model we introduce bias, even if we analyse the imputed data under the correct assumption, because the imputed values will follow the linear association and thereby flatten the non-linear structure seen in the observed data. --- ## Time-to-Event Outcomes (Simple) **Proportional Hazards Model:** `$$h_i(t) = h_0(t) \exp\left(\color{var(--nord15)}{\mathbf x_i}^\top \boldsymbol\beta^{(tc)}\right)$$` **Log-likelihood** `$$\log p(\mathbf T, \boldsymbol \delta \mid \color{var(--nord15)}{\mathbf x}, \boldsymbol\beta^{(tc)}) = \boldsymbol\delta \left(\log h_0(T) + \color{var(--nord15)}{\mathbf x} \boldsymbol\beta^{(tc)}\right) - \int_0^T h_0(s)\exp\left( \color{var(--nord15)}{\mathbf x} \boldsymbol\beta^{(tc)}\right)ds$$` <br> Inconsistent with the (naive) imputation model: `$$\color{var(--nord15)}{\mathbf x} = \theta_0 + \theta_1 \mathbf T + \theta_2 \boldsymbol\delta + \theta_3\ldots$$` ??? 
Alright, but when we don't have such non-linear associations with incomplete covariates the imputation should just be fine? Not when we are talking about survival data. Take a look at this simple proportional hazards model. I've excluded the time-varying part for now. - - - The corresponding log likelihood then is given by this formula here. This is the likelihood used in our analysis model, and it implies a non-linear association between the response variables, the event time and event indicator, and the potentially incomplete covariates `\(x\)`. --- ## Multi-level Data **Aim:** Impute `\(\color{var(--nord15)}{x_{mis}}\)` from the predictive distribution `\(p(\color{var(--nord15)}{x_{mis}}\mid \text{everything else})\)`. <p class = "smallbreak"> </p> .flex-grid[ .col[ <div style = "width: 770px;"></div> For example: `$$p(\text{BMI} \mid \text{age}, \text{sex}, ..., \text{AST}, \text{ALT}, ...)$$` {{content}} ] .col[ <table class="simpletable2"> <tr> <th></th> <th>age</th> <th>BMI</th> <th>AST</th> <th>ALT</th> <th>\(\ldots\)</th> </tr> <tr><td></td><td colspan = "5"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td>\(\ldots\)</td> </tr> <tr class="hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td>\(\ldots\)</td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td>\(\ldots\)</td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i 
class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td>\(\ldots\)</td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td>\(\ldots\)</td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td>\(\ldots\)</td> </tr> <tr> <td class = "rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td></td> </tr> </table> ] ] ??? But also with regard to the other component of our data, the longitudinally measured biomarkers, we might have some problems. Shown here are the observation times from the hepatitis C data. On the x-axis we have the time of the measurement, which is measured in years since diagnosis, and on the y-axis just the different patients, sorted by the length of their follow-up. As you can see, there are quite some differences in the number of measurements we have for the patients, and when they were taken. And so we will have our data in long format, so that we have multiple rows per patient. To impute missing values in one of the baseline covariates, for example in BMI, we need the distribution of that covariate, conditional on everything else, which includes the longitudinally measured biomarker values, for example, AST and ALT. The imputations for the three missing observations of BMI for subject `\(i\)`, indicated with the question marks, should of course all have the same value, because BMI was only measured at baseline, and the baseline BMI does not change over time. -- ⇨ time-varying imputations for time-constant covariates * **Average** imputed values? * **Summarize** time-varying variables ⇨ wide format? 
<p class = "smallbreak"> </p> <ul class = "arrowlist"> <li>Imputation of level-2 (baseline) variables is not straightforward.</li> </ul> ??? But when our data is in long format, the imputations we'd get from a regression model, whether we'd use a simple GLM or a mixed model, would not be constant over time. We could then consider averaging the imputed values, but then we'd not take into account that measurements of the biomarkers that were taken closer to baseline should probably have a stronger association with BMI than values taken 20 years later. Alternatively, we could think about summarizing the longitudinal variables so that we can represent our data in wide format. But simple summaries will lose information and thereby probably introduce bias, and when we need to impute missing values in both baseline and time-varying covariates, then we'd have to switch between long and wide format. It is probably possible to come up with some algorithm that would work well enough, but the standard implementations of MICE do not handle this setting correctly. --- ## Non-linear Associations & Multi-level In settings with **non-linear associations** and **multi-level data** the **correct predictive distribution** $$ p(\color{var(--nord15)}{\mathbf x_{mis}} \mid \text{everything else})$$ may not have a closed form. <br> <div style="width: 85%; padding: 1.15em 2em; background-color: var(--nord0); margin: auto; display: block; text-align: center;"> <strong><span style="font-size: 1.5rem;">⇨</span> We cannot easily specify the correct imputation model directly.</strong> </div> ??? So, in summary, whenever we have a non-linear association between the response and incomplete covariates, including time-to-event outcomes, or when we have multi-level data, it is not straightforward to specify the correct imputation model, meaning the correct predictive distribution of the missing values, directly because usually they do not have a closed form. 
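The multi-level problem, that row-wise imputation of a baseline covariate in long-format data gives values that change over time, can be made concrete with a small sketch. The data below are simulated and all names (`id`, `time`, `bmi`, `alt`) and effect sizes are hypothetical stand-ins; the model is a plain linear regression that ignores the clustering.

```r
set.seed(3)

# Hypothetical long-format data: 50 subjects with 4 visits each
id    <- rep(1:50, each = 4)
time  <- rep(0:3, times = 50)
bmi_i <- rnorm(50, mean = 25, sd = 3)      # one baseline BMI per subject
bmi   <- bmi_i[id]                         # repeated on every row
alt   <- 2 + 0.05 * bmi + 0.1 * time + rnorm(200, sd = 0.3)
long  <- data.frame(id, time, bmi, alt)

long$bmi[long$id == 1] <- NA               # baseline BMI missing for subject 1

# Naive row-wise imputation model for BMI (rows with missing BMI are dropped)
fit  <- lm(bmi ~ alt + time, data = long)
pred <- predict(fit, newdata = long[long$id == 1, ])
pred   # four different predictions for one time-constant value
```

Each of the four rows of subject 1 gets its own predicted BMI, although there is only one baseline BMI to impute.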
--- ## Getting the Correct Distribution .flex-grid[ .col[ * We need `\(\;p(\color{var(--nord15)}{\mathbf x} \mid \mathbf y, \ldots)\)` * We know `\(\;p(\mathbf y \mid \color{var(--nord15)}{\mathbf x}, \ldots)\)` ] .col[ <div style = "color: var(--nord3);"> <ul> <li>\(\mathbf x\): incomplete covariate(s)</li> <li>\(\mathbf y\): outcome(s)</li> <li>\(\ldots\): everything else </li> </ul> </div> ] ] ??? The interesting question now is: How do we get this predictive distribution? What we need is the distribution of the incomplete variable `\(x\)` conditional on the response and everything else. And what we do know is the distribution of the response conditional on the covariates and everything else, because this is the analysis model that we have specified already. And here Bayes comes to the rescue. -- **Bayes Theorem:** `\begin{align} p(\color{var(--nord15)}{\mathbf x} \mid \mathbf y, \ldots) = \frac{p(\mathbf y \mid \color{var(--nord15)}{\mathbf x}, \ldots)\; p(\color{var(--nord15)}{\mathbf x},\ldots)}{p\left(\mathbf y, \ldots\right)} &\propto \underset{\text{joint distribution}}{\underbrace{p(\mathbf y, \color{var(--nord15)}{\mathbf x}, \ldots)}} \end{align}` ??? Because Bayes theorem allows us to convert a conditional distribution into the reversed conditional, and so using this theorem we can derive that the distribution of interest, the distribution of the incomplete covariate conditional on everything else is proportional to the distribution of the response conditional on the covariates, and the distribution of the covariates and everything else. We can now factorize this further, but I find it easier to just look at it as a specification of the joint distribution of everything. 
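To make the proportionality concrete, here is a minimal sketch for a single subject with a missing covariate value, using a grid approximation. It assumes a quadratic analysis model and a normal covariate model with known parameters; all values (`beta`, `sigma_y`, `mu_x`, `sigma_x`, `y_obs`) are made up for illustration, and in a real analysis the parameters would be unknown and sampled together with the missing value.

```r
set.seed(42)

# Hypothetical, fixed parameter values
beta    <- c(0, 1, -0.5)   # analysis model: y | x ~ N(b0 + b1 x + b2 x^2, sigma_y^2)
sigma_y <- 0.5
mu_x    <- 0               # covariate model: x ~ N(mu_x, sigma_x^2)
sigma_x <- 1
y_obs   <- -1.5            # observed response of the subject with missing x

# Unnormalised log density: log p(y | x) + log p(x)
x_grid   <- seq(-4, 4, length.out = 801)
mean_y   <- beta[1] + beta[2] * x_grid + beta[3] * x_grid^2
log_dens <- dnorm(y_obs, mean = mean_y, sd = sigma_y, log = TRUE) +
  dnorm(x_grid, mean = mu_x, sd = sigma_x, log = TRUE)

# Normalise over the grid and sample imputations for x
w <- exp(log_dens - max(log_dens))
w <- w / sum(w)
x_draws <- sample(x_grid, size = 1000, replace = TRUE, prob = w)
```

With these values, `y_obs` is matched by the quadratic at both x = -1 and x = 3, so the resulting distribution is not normal: most of the mass sits near -1, where the covariate model also puts weight, and a second, much smaller bump appears near 3. A linear imputation model for `x` given `y` could not represent this shape.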
-- For example: $$ p(\mathbf y, \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_3, \boldsymbol\theta) = \underset{\text{analysis model}}{\underbrace{p(\mathbf y \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_3, \boldsymbol\theta)}}\; \underset{\text{covariate model(s)}}{\underbrace{p(\color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_3 \mid \boldsymbol\theta)}}\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}} $$ ??? And we can use this principle for our example. So we can write the joint distribution of `\(y\)` and `\(x_1\)`, `\(x_2\)` and `\(x_3\)` as the conditional distribution of `\(y\)` given the covariates, the distribution of the covariates conditional on the parameters, and, of course, prior distributions for those parameters. --- ## Bayesian Analysis of Incomplete Data <div class = "container"> <div class = "box"> <div class = "box-row"> <div class = "box-cell" style = "background: var(--nord0);">joint<br>distribution</div> <div class = "box-cell">\(=\)</div> <div class = "box-cell" style = "background: var(--nord0);">analysis<br>model</div> <div class = "box-cell" style = "background: var(--nord0);">covariate<br>models</div> <div class = "box-cell" style = "background: var(--nord0)">priors</div> </div> </div> </div> ??? And this means that we can always divide the joint distribution into the analysis model, a model for the covariates, and prior distributions. And this also works for fairly complex models. 
- - - -- <br> **(Multivariate) joint model for longitudinal and survival data:** `$$p(\mathbf T, \boldsymbol\delta, \mathbf y, \color{var(--nord15)}{\mathbf x}, \mathbf b, \boldsymbol\theta) = \underset{\text{analysis model}}{\underbrace{ \underset{\substack{\text{survival}\\\text{model}}}{\underbrace{ p(\mathbf T, \boldsymbol\delta \mid \mathbf b, \color{var(--nord15)}{\mathbf x}, \boldsymbol \theta)}}\;\; \underset{\substack{\text{(multivariate)}\\\text{longitudinal}\\\text{model}}}{\underbrace{ p(\mathbf y \mid \mathbf b, \color{var(--nord15)}{\mathbf x}, \boldsymbol\theta)\; p(\mathbf b\mid \boldsymbol \theta)}} }}\;\; \underset{\substack{\text{covariate}\\\text{models}}}{\underbrace{ p(\color{var(--nord15)}{\mathbf x} \mid \boldsymbol\theta)}}\;\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}}$$` ??? For example, for our multivariate joint model for longitudinal and survival data. There, the model for the response consists of multiple parts, the time-to-event model, and the multivariate mixed model for the longitudinal biomarkers. But we are still left with the question how we can specify a model for the covariates. This is usually a multivariate distribution, and because we have covariates of different types, they could be continuous or categorical, and they could be from different levels in a multi-level setting, the distribution does not have a closed form. 
--- name: covariatemodels ## Covariate Models For example `\begin{align} p(\color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \color{var(--nord15)}{\mathbf x_3(t)}, \mathbf x_4(t) \mid \boldsymbol\theta) =\, & p(\color{var(--nord15)}{\mathbf x_3(t)} \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_4(t), \boldsymbol\theta) & \color{var(--nord3)}{\small\text{e.g., GLMM}}\\ & p(\mathbf x_4(t) \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \boldsymbol\theta) & \color{var(--nord3)}{\small\text{e.g., GLMM}}\\ & p(\color{var(--nord15)}{\mathbf x_1} \mid \mathbf x_2, \boldsymbol\theta) & \color{var(--nord3)}{\small\text{e.g., GLM}}\\ & \color{var(--nord3)}{p(\mathbf x_2 \mid \boldsymbol\theta)} & \color{var(--nord3)}{\small\text{(can be omitted)}} \end{align}` ??? [Jump to Hep C example default values slide](#hepcdefaults) -- <br> <div style="width: 65%; padding: 1.15em 2em; background-color: var(--nord0); margin: auto; display: block; text-align: center;"> Covariate models \(\neq\) imputation models! </div> --- ## Estimation <div class = "container"> <div class = "box"> <div class = "box-row"> <div class = "box-cell" style = "background: var(--nord0);">imputation<br>models</div> <div class = "box-cell">\(\propto\)</div> <div class = "box-cell" style = "background: var(--nord0);">joint<br>distribution</div> <div class = "box-cell">\(=\)</div> <div class = "box-cell" style = "background: var(--nord0);">analysis<br>model</div> <div class = "box-cell" style = "background: var(--nord0);">covariate<br>models</div> <div class = "box-cell" style = "background: var(--nord0)">priors</div> </div> </div> </div> .flex-grid[ .col[ <img src = "graphics/Gibbs.png", height = 350;> ] .col[ <br> <div style = "width: 650px;"> Estimation via MCMC<br> ⇨ Gibbs sampler </div> ] ] ??? In most cases, the posterior distribution will not have a closed form and so we won't be able to derive it analytically. 
Instead, Markov Chain Monte Carlo methods are used to create a sample from the posterior distribution. The results from the Bayesian analysis are then presented as summary measures of this sample, usually the mean and the 2.5% and 97.5% quantiles, which form the 95% credible interval. Since, in the Bayesian framework, the result is given in terms of the probability distribution of the unknown parameters conditional on the data that was observed, these results have a more intuitive interpretation than frequentist results. -- <img src = "graphics/MICE.png", height = 300; style = "position: absolute; bottom: 50px; right: 60px;"> --- ## Advantages of the Bayesian Approach `$$p(\mathbf y, \mathbf x, \boldsymbol\theta) = \bbox[#2E3440, 5pt]{\underset{\text{analysis model}}{\underbrace{p(\mathbf y \mid \mathbf x, \color{var(--nord14)}{\boldsymbol\theta_{y\mid x}})}}}\; p(\mathbf x \mid \boldsymbol\theta_x)\; p(\color{var(--nord14)}{\boldsymbol\theta_{y\mid x}}, \boldsymbol\theta_x)\qquad \color{var(--nord3)}{\text{with } \boldsymbol\theta = (\boldsymbol\theta_{y\mid x}, \boldsymbol\theta_x)}$$` * specification of the joint distribution<br> ⇨ assures **compatibility** of imputation models -- * use of the analysis model in the specification of the joint distribution<br> ⇨ **parameters of interest** `\(\color{var(--nord14)}{\boldsymbol\theta_{y\mid x}}\)` are estimated directly<br> ⇨ **non-linear associations** are taken into account<br> ⇨ assures **congeniality** -- * response is not in any linear predictor<br> ⇨ no problem to use **complex outcomes** --- ## In practice: <i class="fab fa-r-project"></i> Package JointAI <div class = "container"> <div class = "box" style = "margin-left: 20px;"> <div class = "box-row"> <div class = "box-cell" style="width:180px;"> <strong>Specification</strong><br>of the joint<br> distribution: </div> <div class = "box-cell" style = "background: var(--nord0);"> <div style = "padding-bottom: 10px;"> <strong>User</strong></div> <div class = 
"box-cell" style = "background: var(--nord1); border: 2px solid var(--nord8); ">analysis<br>model</div> </div> <div class = "box-cell" style = "background: var(--nord0);"> <div style = "padding-bottom: 10px;"> <strong>JointAI</strong></div> <div class = "box-row"> <div class = "box-cell" style = "background: var(--nord1);">covariate<br>models</div> <div class = "box-cell" style = "background: var(--nord1)">priors</div> </div></div> </div></div></div> -- <br> <div class = "container"> <div class = "box" style = "margin-left: 20px;"> <div class = "box-row"> <div class = "box-cell" style="width:180px;"> <strong>Estimation:</strong> </div> <div class = "box-cell" style = "background: var(--nord0);"> <div style = "padding-bottom: 10px;"> <strong>JointAI</strong></div> <div class = "box-cell" style = "background: var(--nord1);">pre-processing</div> </div> <div class = "box-cell" style = "background: var(--nord0);"> <div style = "padding-bottom: 10px;"> <strong>rjags / JAGS</strong></div> <div class = "box-cell" style = "background: var(--nord1);">MCMC sampling</div> </div> <div class = "box-cell" style = "background: var(--nord0);"> <div style = "padding-bottom: 10px;"> <strong>JointAI</strong></div> <div class = "box-cell" style = "background: var(--nord1);">post-processing</div> </div> </div></div></div> --- ## Analysis of the Hepatitis C Data ```r fmla <- list( Surv(etime, event) ~ age + sex + ... + DM + `logBili` + `logALT` + `logAST` + `Plt`, `Plt` ~ age + sex + `logAST` + `logALT` + `logBili` + ns(time, df = 3) + (ns(time, df = 3) | StudyID), `logBili` ~ age + sex + `logAST` + `logALT` + time + (time | StudyID), `logAST` ~ age + sex + `logALT` + ns(time, df = 5) + (ns(time, df = 5) | StudyID), `logALT` ~ age + sex + ns(time, df = 3) + (ns(time, df = 3) | StudyID) ) ``` -- ```r library("JointAI") mod <- JM_imp(fmla, data = HCVdat, timevar = "time", inits = <inits>, n.iter = 10000, n.chains = 8) ``` ??? 
Initial values for `\(\tau\)`, `\(\mathbf b\)` and `\(\mathbf D^{-1}\)` --- name: hepcdefaults ## JointAI: Default Setting .flex-grid[ .col[ * Baseline hazard: B-spline * Defaults: * 6 df in the B-spline * block-diagonal structure for `\(\mathbf D\)` * underlying value association * no model for `"time"` * normal distr. for continuous covariates ... ] .col[ <img src = "graphics/VC.png", height = 450;> ] ] ??? [Jump to covariate models](#covariatemodels) --- ## JointAI: List all Models .scroll450[ ```r list_models(mod) ``` ``` ## Joint survival and longitudinal model for "Surv_etime_evnt" ## * Predictor variables: ## age, sexFemale, year, alcoholYes, smokingNo, BMI, antiHBcYes, DMYes, logBili, ## logALT, logAST, Plt ## * Regression coefficients: ## beta[1:12] (normal prior(s) with mean 0 and precision 0.001) ## * Regression coefficients of the baseline hazard: ## beta_Bh0_Surv_etime_evnt[1:6] (normal priors with mean 0 and precision 0.001) ## * association types: ## - logBili: underl.value ## - logALT: underl.value ## - logAST: underl.value ## - Plt: underl.value ## ## Linear mixed model for "Plt" ## family: gaussian ## link: identity ## * Predictor variables: ## (Intercept), age, sexFemale, logAST, logALT, logBili, ## ns(time, df = 4)1, ns(time, df = 4)2, ns(time, df = 4)3, ns(time, df = 4)4 ## * Regression coefficients: ## beta[13:22] (normal prior(s) with mean 0 and precision 1e-04) ## * Precision of "Plt" : ## tau_Plt (Gamma prior with shape parameter 0.01 and rate parameter 0.01) ## ## ## Linear mixed model for "logBili" ## family: gaussian ## link: identity ## * Predictor variables: ## (Intercept), age, sexFemale, logAST, logALT, ns(time, df = 3)1, ## ns(time, df = 3)2, ns(time, df = 3)3 ## * Regression coefficients: ## beta[23:30] (normal prior(s) with mean 0 and precision 1e-04) ## * Precision of "logBili" : ## tau_logBili (Gamma prior with shape parameter 0.01 and rate parameter 0.01) ## ## ## Linear mixed model for "logAST" ## family: gaussian ## link: identity ## *
Predictor variables: ## (Intercept), age, sexFemale, logALT, ns(time, df = 3)1, ns(time, df = 3)2, ## ns(time, df = 3)3 ## * Regression coefficients: ## beta[31:37] (normal prior(s) with mean 0 and precision 1e-04) ## * Precision of "logAST" : ## tau_logAST (Gamma prior with shape parameter 0.01 and rate parameter 0.01) ## ## ## Linear mixed model for "logALT" ## family: gaussian ## link: identity ## * Predictor variables: ## (Intercept), age, sexFemale, ns(time, df = 5)1, ns(time, df = 5)2, ## ns(time, df = 5)3, ns(time, df = 5)4, ns(time, df = 5)5 ## * Regression coefficients: ## beta[38:45] (normal prior(s) with mean 0 and precision 1e-04) ## * Precision of "logALT" : ## tau_logALT (Gamma prior with shape parameter 0.01 and rate parameter 0.01) ## ## ## Linear model for "BMI" ## family: gaussian ## link: identity ## * Predictor variables: ## (Intercept), age, sexFemale, year, alcoholYes, smokingNo, antiHBcYes, DMYes ## * Regression coefficients: ## alpha[1:8] (normal prior(s) with mean 0 and precision 1e-04) ## * Precision of "BMI" : ## tau_BMI (Gamma prior with shape parameter 0.01 and rate parameter 0.01) ## ## ## Binomial model for "smoking" ## family: binomial ## link: logit ## * Reference category: "Yes" ## * Predictor variables: ## (Intercept), age, sexFemale, year, alcoholYes, antiHBcYes, DMYes ## * Regression coefficients: ## alpha[9:15] (normal prior(s) with mean 0 and precision 1e-04) ## ## ## Binomial model for "antiHBc" ## family: binomial ## link: logit ## * Reference category: "No" ## * Predictor variables: ## (Intercept), age, sexFemale, year, alcoholYes, DMYes ## * Regression coefficients: ## alpha[16:21] (normal prior(s) with mean 0 and precision 1e-04) ## ## ## Binomial model for "alcohol" ## family: binomial ## link: logit ## * Reference category: "No" ## * Predictor variables: ## (Intercept), age, sexFemale, year, DMYes ## * Regression coefficients: ## alpha[22:26] (normal prior(s) with mean 0 and precision 1e-04) ## ## ## Binomial model for 
"DM" ## family: binomial ## link: logit ## * Reference category: "No" ## * Predictor variables: ## (Intercept), age, sexFemale, year ## * Regression coefficients: ## alpha[27:30] (normal prior(s) with mean 0 and precision 1e-04) ``` ] --- ## JointAI: Model Types .flex-grid[ .col[ **Univariate Models:** * `lm_imp()` * `glm_imp()` * `clm_imp()` * `mlogit_imp()` * `betareg_imp()` * `lognorm_imp()` ] .col[ **Mixed Models** * `lme_imp()` * `glme_imp()` * `clmm_imp()` * `mlogitmm_imp()` * `betamm_imp()` * `lognormmm_imp()` ] .col[ **Survival Models** * `coxph_imp()` * `survreg_imp()` * `JM_imp()` ] ] .footnote[ [Full documentation at https://nerler.github.io/JointAI/](https://nerler.github.io/JointAI/) ] --- ## JointAI: More Features .flex-grid[ .col[ <button class="modal-button" href="#myModal1">Model Specification</button> <div id="myModal1" class="modal"> <div class="modal-content"> <span class="close">×</span> <ul> <li>auxiliary variables (<a href = "https://nerler.github.io/JointAI/articles/ModelSpecification.html#auxiliary-variables"><code>auxvars</code></a>)</li> <li> shrinkage (ridge) (<a href = "https://nerler.github.io/JointAI/articles/ModelSpecification.html#shrinkage"><code>shrinkage</code></a>) <li> truncation of continuous distributions (<a href = "https://nerler.github.io/JointAI/articles/ModelSpecification.html#functions-with-restricted-support"><code>trunc</code></a>) <li> setting reference categories (<a href="https://nerler.github.io/JointAI/articles/ModelSpecification.html#reference-categories"><code>refcats</code></a>) <li> change hyper-parameters (<a href = "https://nerler.github.io/JointAI/articles/ModelSpecification.html#hyper-parameters"><code>hyperpars</code></a>) <li> parameters to be monitored (<a href="https://nerler.github.io/JointAI/articles/SelectingParameters.html"><code>monitor_params</code></a>)</li> <li> export the JAGS model (<code>modelname</code>, <code>modeldir</code>, <code>keep_model</code>) <li> customize the JAGS model 
(<code>modelname</code>, <code>modeldir</code>, <code>overwrite</code>) </ul> </div> </div> <button class="modal-button" href="#myModal2">MCMC settings</button> <div id="myModal2" class="modal"> <div class="modal-content"> <span class="close">×</span> <ul> <li>adaptive phase (<a href="https://nerler.github.io/JointAI/articles/MCMCsettings.html#adaptive-phase"><code>n.adapt</code></a>)</li> <li>sampling phase (<a href="https://nerler.github.io/JointAI/articles/MCMCsettings.html#sampling-iterations"><code>n.iter</code></a>)</li> <li>number of chains (<a href="https://nerler.github.io/JointAI/articles/MCMCsettings.html#number-of-chains"><code>n.chains</code></a>)</li> <li>thinning (<a href="https://nerler.github.io/JointAI/articles/MCMCsettings.html#thinning"><code>thin</code></a>)</li> <li>initial values (<a href="https://nerler.github.io/JointAI/articles/MCMCsettings.html#initial-values"><code>inits</code></a>, <code><object>$mcmc_settings$inits</code>)</li> <li>seed value (<code>seed</code>)</li> <li>continue sampling (<a href="https://nerler.github.io/JointAI/reference/add_samples.html"><code>add_samples()</code></a>)</li> </ul> </div> </div> <button class="modal-button" href = "#modal-plots">Plots</button> <div class="modal" id="modal-plots"> <div class="modal-content"> <span class="close">×</span> <ul> <li>data distribution (<a href="https://nerler.github.io/JointAI/articles/VisualizingIncompleteData.html#visualize-the-distribution-of-each-variable"> <code>plot_all()</code></a>)</li> <li>missing data pattern (<a href = "https://nerler.github.io/JointAI/articles/VisualizingIncompleteData.html#missing-data-pattern"> <code>md_pattern()</code></a>)</li> <li>MCMC chains (<a href = "https://nerler.github.io/JointAI/articles/AfterFitting.html#trace-plot"> <code>traceplot()</code></a>)</li> <li>posterior density (<a href="https://nerler.github.io/JointAI/articles/AfterFitting.html#density-plot"><code>densplot()</code></a>)</li> <li>imputed vs observed data (<a 
href="https://nerler.github.io/JointAI/articles/AfterFitting.html#export-of-imputed-values"> <code>plot_imp_distr()</code></a>)</li> <li>Monte Carlo Error (<a href="https://nerler.github.io/JointAI/articles/AfterFitting.html#sec:mcerror"><code>plot(MC_error(<object>))</code></a>)</li> </ul> </div> </div> <button class="modal-button" href = "#modal-parallel">Parallel Sampling</button> <div class="modal" id="modal-parallel"> <div class="modal-content"> <span class="close">×</span> Using the R packages <strong>future</strong> and <strong>doFuture</strong>: <pre><code class="r hljs remark-code"> <div class="remark-code-line"><span class="hljs-keyword">library</span>(<span class="hljs-string">"doFuture"</span>)</div> <div class="remark-code-line">plan(multisession, workers = <span class="hljs-number">6</span>)</div> <div class="remark-code-line">registerDoFuture()</div> <div class="remark-code-line"></div> <div class="remark-code-line"><span class="hljs-comment">## fit JointAI model ...</span></div><br> <div class="remark-code-line"><span class="hljs-comment">## to re-set to sequential evaluation:</span></div> <div class="remark-code-line">plan(sequential)</div> </code></pre> </div> </div> ] .col[ <button class="modal-button" href = "#modal-clm"> Cumulative Logit Models</button> <div class="modal" id="modal-clm"> <div class="modal-content"> <span class="close">×</span> <ul> <li>invert the OR (<a href="https://nerler.github.io/JointAI/reference/model_imp.html#cumulative-logit-mixed-models"><code>rev</code></a>) \[\log\left(\frac{P(y_i \color{var(--nord14)}{>} k)}{P(y_i \color{var(--nord14)}{\leq} k)}\right) = \gamma_k + \eta_i \quad\text{vs}\quad \log\left(\frac{P(y_i \color{var(--nord14)}{\leq} k)}{P(y_i \color{var(--nord14)}{>} k)}\right) = \gamma_k + \eta_i\] </li> <li>partial proportional odds (<a href="https://nerler.github.io/JointAI/reference/model_imp.html#cumulative-logit-mixed-models"><code>nonprop</code></a>) \[\log\left(\frac{P(y_i > k)}{P(y_i \leq
k)}\right) = \gamma_k + \eta_i \quad\text{vs}\quad \log\left(\frac{P(y_i > k)}{P(y_i \leq k)}\right) = \gamma_k + \eta_i \color{var(--nord14)}{+ \eta_{ki}}\] </li> </ul> </div> </div> <button class="modal-button" href = "#modal-multi-level"> Multi-level Models</button> <div class="modal" id="modal-multi-level"> <div class="modal-content"> <span class="close">×</span> <ul> <li><strong>lme4</strong> or <strong>nlme</strong> type specification</li> <li>nested and crossed random effects<br>(determined by data structure)</li> <li>2, 3, 4, ... levels of grouping, e.g., <code> ... + (time | id) + (1 | group) + (1 | center) + (1 | country) + ...</code></li> <li>multi-level structure also for survival models</li><br> <li>uses hierarchical centering, i.e., \(\mathbf b \sim N(\mathbf X\boldsymbol\beta, \mathbf D)\)</li> </ul> <a href="https://nerler.github.io/JointAI/reference/model_imp.html#model-formulas">⇨ For more info see the package documentation.</a> </div> </div> <button class="modal-button" href = "#modal-survival"> Survival<br>(with Time-varying Covariates) </button> <div class="modal" id="modal-survival"> <div class="modal-content"> <span class="close">×</span> <ul> <li>proportional hazards model with <strong>time-varying covariates</strong><br> (using <a href="https://nerler.github.io/JointAI/reference/model_imp.html#survival-models-with-frailties-or-time-varying-covariates"> <code> + (1 | id)</code></a> and <a href="https://nerler.github.io/JointAI/reference/model_imp.html#survival-models-with-frailties-or-time-varying-covariates"> <code> timevar = "<...>"</code></a>)</li> <li><strong>baseline hazard</strong>: B-spline with <code>df_basehaz</code> degrees of freedom</li><br> <li><strong>joint model</strong> for longitudinal and survival data<br> (using <a href = "https://nerler.github.io/JointAI/reference/model_imp.html#modelling-multiple-models-simultaneously-joint-models"> <code>formula = list(<...>)</code></a> and <a
href="https://nerler.github.io/JointAI/reference/model_imp.html#modelling-multiple-models-simultaneously-joint-models"> <code>timevar = "<...>"</code></a>)</li> <li><strong>association structure</strong> (<code>assoc_type</code>) <ul> <li>underlying value (<code>underl.value</code>)</li> <li>observed/imputed value (<code>obs.value</code>)</li> </ul> </li> <li><strong>multivariate joint model</strong>:<br> on CRAN: <span style = "color:var(--nord12);">block-diagonal random effects</span><br> on GitHub (rd_vcov): <span style = "color:var(--nord12);">full / block-diagonal / independent random effects</span></li> </ul> </div> </div> ] .col[ <button class="modal-button" href = "#modal-blackbox"> Shining Light into the Black Box</button> <div class="modal" id="modal-blackbox"> <div class="modal-content"> <span class="close">×</span> <ul> <li>monitor any node (<a href="https://nerler.github.io/JointAI/articles/SelectingParameters.html#other-parameters"> <code>monitor_params</code></a>)</li> <li>see the JAGS model (<code><object>$jagsmodel</code>)</li> <li>print model info (<a href="https://nerler.github.io/JointAI/articles/SelectingParameters.html#side-note-getting-information-about-of-the-imputation-models"> <code>list_models()</code></a>, <code><object>$models</code>)</li> <li>access the JAGS model object (<code><object>$model</code>)</li> <li>list all monitored parameters (<a href="https://nerler.github.io/JointAI/articles/SelectingParameters.html#parameters-of-the-analysis-model"> <code>parameters()</code></a>)</li> <li>list all regression coefficients (<code><object>$coef_list</code>)</li> </ul> </div> </div> <button class="modal-button" href = "#modal-compinfo"> Computational Info</button> <div class="modal" id="modal-compinfo"> <div class="modal-content"> <span class="close">×</span> <code><object>$comp_info</code>: <ul> <li>start time</li> <li>duration</li> <li>package version</li> <li>parallel setting</li> </ul> </div> </div> <button class="modal-button" href =
"#modal-subset"> Output Options</button> <div class="modal" id="modal-subset"> <div class="modal-content"> <span class="close">×</span> <ul> <li>Subset selection <ul> <li>iterations (<a href="https://nerler.github.io/JointAI/articles/AfterFitting.html#subset-of-mcmc-samples"> <code>start</code>, <code>end</code>, <code>thin</code></a>)</li> <li>nodes (<a href="https://nerler.github.io/JointAI/articles/AfterFitting.html#subset-of-parameters"> <code>subset</code></a>)</li> <li>chains (<code>exclude_chains</code>)</li> <li>sub-models (<code>outcome</code>)</li> </ul><br> <li>export imputed values<br>(<code>monitor_params = c(imps = TRUE)</code>, <code>get_MIdat()</code>)</li> </ul> </div> </div> <button class="modal-button" href = "#modal-prediction">Prediction</button> <div class="modal" id="modal-prediction"> <div class="modal-content"> <span class="close">×</span> <ul> <li>create "newdata" for effect plots (<a href="https://nerler.github.io/JointAI/reference/predDF.html"> <code>predDF()</code></a>)</li><br> <li>prediction (<code>predict()</code>) <ul> <li><strong>GLM(M)</strong> type models: linear predictor or outcome scale<br> <span style = "text-decoration: underline;">for now</span>: <span style = "color: var(--nord12);"> assumption random effects = 0</span> </li> <li><strong>prop. 
hazard models</strong>: (log) hazard, (-log) survival</li> <li><strong>joint model</strong> for longitudinal & survival data: <span style = "color:var(--nord11);">not <span style = "text-decoration: underline;">yet</span> possible</span></li> </ul></li> </ul> </div> </div> ] ] --- ## Summary Imputation via the posterior predictive distribution<br> ⇨ difficult to specify directly in complex settings * **MICE** uses direct specification<br> ⇨ violations in complex settings ⇨ bias * **Bayes**: specification via factorization of the joint distribution<br> ⇨ theoretically valid -- <br> **Extensions:** * different **analysis models** * "other" joint models * extension to **MNAR** --- class: the-end, center, middle layout: true count: false # Thank you for your attention! <div class="contact"> <i class="fas fa-envelope"></i> <a href="mailto:n.erler@erasmusmc.nl" class="email">n.erler@erasmusmc.nl</a>  <a href="https://twitter.com/N_Erler" target="_blank"><i class="fab fa-twitter"></i> N_Erler</a>  <a href="https://github.com/NErler" target="_blank"><i class="fab fa-github"></i> NErler</a>  <a href="https://nerler.com" target="_blank"><i class="fas fa-globe-americas"></i> https://nerler.com</a> </div> --- count: false <script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML"> </script> <script> // Get the button that opens the modal var btn = document.querySelectorAll("button.modal-button"); // All page modals var modals = document.querySelectorAll('.modal'); // Get the <span> element that closes the modal var spans = document.getElementsByClassName("close"); // When the user clicks the button, open the modal for (var i = 0; i < btn.length; i++) { btn[i].onclick = function(e) { e.preventDefault(); modal = document.querySelector(e.target.getAttribute("href")); modal.style.display = "block"; } } // When the user clicks on <span> (x), close the modal for (var i = 0; i < spans.length; i++) { 
spans[i].onclick = function() { for (var index in modals) { if (typeof modals[index].style !== 'undefined') modals[index].style.display = "none"; } } } // When the user clicks anywhere outside of the modal, close it window.onclick = function(event) { if (event.target.classList.contains('modal')) { for (var index in modals) { if (typeof modals[index].style !== 'undefined') modals[index].style.display = "none"; } } } </script>