Bayesian Methods for Handling Missing Values in Complex Settings

<div class = "title">Bayesian Methods for Handling Missing Values in Complex Settings</div>
<div class = "subtitle">CMStatistics 2021</div>
<div class  = "author">Nicole Erler</div>
<div class = "institute">Department of Biostatistics</div>
<div class = "contact">
<i class="fas fa-envelope"></i> <a href="mailto:n.erler@erasmusmc.nl" class="email">n.erler@erasmusmc.nl</a> 
<a href= https://twitter.com/N_Erler><i class="fab fa-twitter"></i> N_Erler</a> 
  <a href= https://github.com/NErler><i class="fab fa-github"></i> NErler</a> 
  <a href= https://nerler.com><i class="fas fa-globe-americas"></i> https://nerler.com</a>
</div>

---

<div class="my-footer"><span>
<a href="https://twitter.com/N_Erler"><i class="fab fa-twitter"></i> N_Erler</a>
&emsp;&emsp;&emsp;&emsp;
<a href="https://github.com/NErler"><i class="fab fa-github"></i> NErler</a> &emsp;&emsp;&emsp;&emsp;
<a href = "https://nerler.com"><i class="fas fa-globe-americas"></i> nerler.com</a>
</span></div>

---

## How do we Handle Missing Data?

[Shivasabesan et al. (2018)](https://doi.org/10.1016/j.injury.2018.03.035)

* all trauma registry-based manuscripts (2015 - 2017)
* 241/539 (45%) described how missing data was handled:
    * 234 (43%) **complete case analysis** (CCA)
    * 18 (3%) **multiple imputation** (MI)
    * 34 (6%) combination of CCA & MI

???

Life isn't perfect, and neither is real world data.
And, so in most datasets, there are some or even many observations missing,
at least when you work with clinical data like I do.

In randomized controlled trials, missing data is usually restricted to missing
outcome values when participants drop out early, but in observational studies
missing values are a lot bigger issue.

We need to include many covariates in the model to reduce bias due to confounding,
and the more variables we need, the more likely it is that some haven't been 
recorded very thoroughly.
When data is collected retrospectively things are usually worse then in prospective
studies, and proportion of missing values up to 50% in some variables are not
uncommon.

Missing data are such a widespread issue, that it is not even feasible to get an
overview of how many studies have to deal with them.

But there are a few studies that have looked into how missing data are handled
and reported in specific types of studies or fields of application.

- - -

A more recent systematic review in the clinical journal "Injury"
investigated missing data in trauma registries.
About half of the 539 included papers described how missing data was handled,
and the most common approach was again to perform a complete case analysis,
followed by a combination with multiple imputation or just multiple imputation.

[Carroll et al. (2020)](https://doi.org/10.1186/s12874-020-01018-7)

* observational time-to-event studies in oncology using prop. hazards model (2012 - 2018)
* approx. 130/148 (88%) reported on the missing data method
    * 79 (53%) **complete case analysis**
    * 33 (22%) **multiple imputation**

???

Last year, Carroll and colleagues published their investigation on missing
data methods in observational time-to-event studies that used proportional
hazards models. They included 148 studies, of which most reported on how
missing data was handled, and, again, half of the studies performed complete
case analyses, and the next most popular method was multiple imputation.

---

## How do we Handle Missing Data?

[Hunt et al. (2021)](https://doi.org/10.1002/pds.5245)

* multi-database pharmacoepidemiologic studies (2018 - 2019)
* 19/62 (31%) reported the missing data method:
    * 13 (21%) **complete case analysis**
    * 4 (6%) **multiple imputation**

???

Even more recently, a review on multi-database studies in pharmacoepidemiology
found that in that area of research, only on third of the studies reported on
the missing data method, and also there, complete cases analysis, followed by
multiple imputation was the most commonly reported method.

- - -

More and more clinical researchers are aware that in most settings complete case
analysis is not appropriate, and multiple imputation is somehow considered the
universal remedy to the missing data problem.

---

## Multiple Imputation

???

The principle of multiple imputation is simple:

* We impute each missing value multiple times, in order to take into account the
uncertainty that we have about the missing value, and thereby create a set of
completed datasets.

* Each of these datasets can then be analysed by standard complete data methods,
and the results of these separate analyses are then pooled to obtain an overall
result.

One often cited advantage of this approach is that we can run this imputation
once, and then re-use the imputed data for multiple analyses performed by 
different researchers.

---

## Imputation Step

* `$\mathbf y$`: **response** variable
* `$\color{var(--nord15)}{\mathbf x_1}$`: **incomplete** covariate
* `$\mathbf x_2$`, `$\mathbf x_3$`: **complete** covariates

]

**Sampling from the predictive distribution:&emsp;&nbsp;&nbsp;&nbsp;**

]
]

???

The most interesting part of this multiple imputation is the imputation step.

Missing values should be imputed from their predictive distribution.

The idea for imputation of `$x_1$` is similar to predicting the missing values.

We can think of this as fitting a model to the cases where `$x_1$` is observed,
**for example**, this could be a **linear regression model** that has the 
**response `$y$`** and the other covariates as its covariates.

From this model, we can get estimates for the regression coefficients `$\beta$`
and the standard deviation of the residuals `$\sigma$`.

- - - - -

* Define a model for the incomplete variable
`$\color{var(--nord15)}{\mathbf x_1}$`, e.g.
`$$\color{var(--nord15)}{\mathbf x_1} = \beta_0 + \beta_1 \mathbf y +
\beta_2 \mathbf x_2 + \beta_3 \mathbf x_3 + \varepsilon$$`
to get estimates `$\boldsymbol{\hat\beta}$` (and `$\hat\sigma$`)
{{content}}

<br>

* Sample
`$\color{var(--nord15)}{\hat{x}_{i1}}$` from the predictive distribution
`$$p(\color{var(--nord15)}{x_{i1}} \mid y_i, x_{i2}, x_{i3}, \boldsymbol{\hat\beta}, \hat\sigma)$$`

???

And using these parameter estimates we can then calculate the fitted or
predicted value of `$x_1$` for those cases where `$x_1$` is missing.

---

## Multiple Imputation using FCS (MICE)

<div style = "text-align: center; margin-bottom: 25px;">
<strong>Multivariate<br>Missingness</strong></div>

<table class="simpletable">
<tr>
<th></th>
<th>$\mathbf y$</th>
<th>$\color{var(--nord15)}{\mathbf x_1}$</th>
<th>$\color{var(--nord15)}{\mathbf x_2}$</th>
<th>$\color{var(--nord15)}{\mathbf x_3}$</th>
<th>$\ldots$</th>
</tr>
<tr><td></td><td colspan = "5"; style = "padding: 0px;"><hr /></td><tr>
<td class="rownr"></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
<td style="color: var(--nord15);"><i class = "fas fa-question"></i></td>
<th>$\ldots$</th>
</tr>
<tr>
<td class="rownr">$i$</td>
<td><i class = "fas fa-check"</i></td>
<td style="color: var(--nord15);"><i class = "fas fa-question"></i></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
<th>$\ldots$</th>
</tr>
<tr>
<td class="rownr"></td>
<td><i class = "fas fa-check"</i></td>
<td style="color: var(--nord15);"><i class = "fas fa-question"></i></td>
<td style="color: var(--nord15);"><i class = "fas fa-question"></i></td>
<td><i class = "fas fa-check"</i></td>
<th>$\ldots$</th>
</tr>
<tr>
<td class="rownr"></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
<th>$\ldots$</th>
</tr>
<tr>
<td class = "rownr"></td>
<td>$\vdots$</td>
<td>$\vdots$</td>
<td>$\vdots$</td>
<td>$\vdots$</td>
<td></td>
</tr>
</table>

]

**Most common approach:**<br>
<span style = "font-weight: bold;">MICE</span> 
  <span style = "color: var(--lgrey);">(multivariate imputation by chained equations)</span><br>
  <span style = "font-weight: bold;">FCS</span> 
  <span style = "color: var(--lgrey);">(fully conditional specification), i.e.,</span>
  
  `$$p(\color{var(--nord15)}{x_{mis}} \mid \text{everything else})$$`

<div>
\begin{alignat}{10}
\color{var(--nord15)}{\mathbf x_1} &= \beta_0 &+& \beta_1 \mathbf y &+&
\beta_2 \color{var(--nord15)}{\mathbf x_2} &+& \beta_3 \color{var(--nord15)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\
\color{var(--nord15)}{\mathbf x_2} &= \alpha_0 &+& \alpha_1 \mathbf y &+&
\alpha_2 \color{var(--nord15)}{\mathbf x_1} &+& \alpha_3 \color{var(--nord15)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\
\color{var(--nord15)}{\mathbf x_3} &= \theta_0 &+& \theta_1 \mathbf y &+&
\theta_2 \color{var(--nord15)}{\mathbf x_1} &+& \theta_3 \color{var(--nord15)}{\mathbf x_2} &+& \ldots &+& \boldsymbol\varepsilon
\end{alignat} 
</div>

{{content}}
]
]

???

In practice, we usually have missing values in just a single variable, but in
multiple variables.

And the most common approach to imputation in this setting is MICE, short for
**multivariate imputation by chained equations**, an approach that is also
called **fully conditional specification**.

The principle is an extension to what we've seen on the previous slide.
We impute missing values using models that have all other data in their linear
predictor.
- - -

--
<br>

* iterative
* based on the idea of the Gibbs sampler

???

Because in these imputation models we now have incomplete covariates, we use an
iterative algorithm. You start by randomly drawing starting values from the 
observed part of the data, and then you cycle through the
incomplete variables and impute one at a time.

Once we have imputed each missing value, we start again with the first
variable, but now use the imputed values of the other variables instead of the
starting values, and we do this a few times until the algorithm has converged.

This algorithm is based on the idea of the Gibbs sampler, which allows you to 
sample from a multivariate distribution by iteratively sampling from the
full conditionals derived from the joint distribution.

---

## A Simple Example

.pull-left[
**Implied Assumption:**<br>
<span>Linear association</span>
between `$\color{var(--nord15)}{\mathbf x_1}$` and `$\mathbf y$`:

`$$\color{var(--nord15)}{\mathbf x_1} = 
\theta_0 + \bbox[#3B4252, 2pt]{\theta_1 \mathbf y} +
\theta_2 \mathbf x_2 + \theta_3 \mathbf x_3 + \ldots$$`

]

???

One thing that is implied when we use such simple regression models for the
imputation is that there is a linear association between incompl. covariate and
the response (and other covariates).

.pull-right[
<br>
But what if 
`$$\mathbf y = \beta_0 + 
\bbox[#3B4252, 2pt]{\beta_1 \color{var(--nord15)}{\mathbf x_1} +
\beta_2 \color{var(--nord15)}{\mathbf x_1}^2} +
\beta_3 \mathbf x_2 + \ldots$$`

]

???

But that is of course not always the case. 
What if we have a setting where we assume that there is a non-linear 
association, for example quadratic?

---

## Non-linear Associations

.pull-left[
* <span style="font-weight: bold; color:var(--nord4);">true association</span>: non-linear
* <span style="font-weight: bold; color:var(--nord7);">imputation assumption</span>: linear
]

???

If we

* correctly assume a non-linear association in the analysis model
* but a linear association in the imputation model

we introduce bias, even if we analyse the imputed data under the correct assumption
because the imputed values will follow the linear association and thereby
reduce the shape of the structure in the observed data.

---
count: false
class: animated, fadeIn

## Non-linear Associations

.pull-left[
* <span style="font-weight: bold; color:var(--nord4);">true association</span>: non-linear
* <span style="font-weight: bold; color:var(--nord7);">imputation assumption</span>: linear
]

.pull-right[
<span style="font-size: 56pt; position: relative; right: 110px; bottom: 20px;">} &#8680;</span>
<span style = "color: var(--nord11); font-size: 1.2rem; font-weight: bold; position: relative; bottom: 30px; right: 100px;">
bias!</span>
]

???

If we

* correctly assume a non-linear association in the analysis model
* but a linear association in the imputation model

---

## Time-to-Event Outcomes

(Simple) **Proportional Hazards Model:**
`$$h(t) = h_0(t) \exp\left(\color{var(--nord15)}{\mathbf x}^\top \boldsymbol\beta\right)$$`

**Log-likelihood**
`$$p(\mathbf T, \boldsymbol \delta \mid \color{var(--nord15)}{\mathbf x}, \boldsymbol\beta) =
\boldsymbol\delta \left(\log h_0(T) + \color{var(--nord15)}{\mathbf x} \boldsymbol\beta\right) - 
\int_0^T h_0(s)\exp\left(
\color{var(--nord15)}{\mathbf x} \boldsymbol\beta\right)ds$$`

<br>

Inconsistent with the (naive) imputation model:
`$$\color{var(--nord15)}{\mathbf x} = \theta_0 + \theta_1 \mathbf T + \theta_2  \boldsymbol\delta + \theta_3\ldots$$`
???

The corresponding log likelihood then is given by this formula here.

This is the likelihood used in our analysis model, and it implies a non-linear
association between the response variables, the event time and event indicator,
and the potentially incomplete covariates `$x$`.

---

## Multi-level Data

.flex-grid[
.col[
<img src="index_files/figure-html/unnamed-chunk-7-1.png" width="100%" style="display: block; margin: auto;" />
]

<table class="simpletable">
<tr>
<th></th>
<th>$\mathbf y$</th>
<th>$\color{var(--nord15)}{\mathbf x_1}$</th>
<th>$\mathbf x_2$</th>
<th>$\mathbf x_3$</th>
</tr>
<tr><td></td><td colspan = "4"; style = "padding: 0px;"><hr /></td><tr>
<td class="rownr"></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
</tr>
<tr class="hlgt-row">
<td class="rownr">$i$</td>
<td><i class = "fas fa-check"</i></td>
<td style="color: var(--nord15);"><i class = "fas fa-question"></i></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
</tr>
<tr class = "hlgt-row">
<td class="rownr">$i$</td>
<td><i class = "fas fa-check"</i></td>
<td style="color: var(--nord15);"><i class = "fas fa-question"></i></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
</tr>
<tr class = "hlgt-row">
<td class="rownr">$i$</td>
<td><i class = "fas fa-check"</i></td>
<td style="color: var(--nord15);"><i class = "fas fa-question"></i></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
</tr>
<tr>
<td class="rownr"></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
</tr>
<tr>
<td class="rownr"></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
</tr>
<tr>
<td class = "rownr"></td>
<td>$\vdots$</td>
<td>$\vdots$</td>
<td>$\vdots$</td>
<td>$\vdots$</td>
</tr>
</table>

]
]

???

We have multi-level data when we have measured the same variable repeatedly in
the same patient, but also when we have a clustering structure in our data,
for example in a multi-center study. In both cases, observations from the 
same patient, or the same cluster are not independent.

We typically represent this type of data in long format, so that we now have
multiple rows that belong to the same patient.

---

## FCS in Multi-level Data

If `$\color{var(--nord15)}{\mathbf x_1}$` is **level-1**:<br>
`$p(\color{var(--nord15}{x_{i1}(t)} \mid \text{everything else})$` could be
a mixed model, e.g.:
`$$x_{i1}(t) = 
\underset{\color{var(--nord3)}{\text{fixed effects}}}{\color{var(--nord3)}{\underbrace{\color{var(--nord4)}{\theta_0 + \theta_1 y_i(t) + \theta_2 x_{2i}(t) + \theta_3 x_{3i}(t)}}}} + 
\underset{\color{var(--nord3)}{\substack{\text{random}\\\text{effects}}}}{\color{var(--nord3)}{\underbrace{\color{var(--nord4)}{\mathbf b_i \mathbf z_i(t)}}}} +
\varepsilon_i(t)$$`

{{content}}
]

<table class="simpletable">
<tr>
<th></th>
<th>$\mathbf y$</th>
<th>$\color{var(--nord15)}{\mathbf x_1}$</th>
<th>$\mathbf x_2$</th>
<th>$\mathbf x_3$</th>
</tr>
<tr><td></td><td colspan = "4"; style = "padding: 0px;"><hr /></td>
<tr>
<td class = "rownr"></td>
<td>$\vdots$</td>
<td>$\vdots$</td>
<td>$\vdots$</td>
<td>$\vdots$</td>
</tr>
<tr class="hlgt-row">
<td class="rownr">$i$</td>
<td><i class = "fas fa-check"</i></td>
<td style="color: var(--nord15);"><i class = "fas fa-question"></i></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
</tr>
<tr class = "hlgt-row">
<td class="rownr">$i$</td>
<td><i class = "fas fa-check"</i></td>
<td style="color: var(--nord15);"><i class = "fas fa-question"></i></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
</tr>
<tr class = "hlgt-row">
<td class="rownr">$i$</td>
<td><i class = "fas fa-check"</i></td>
<td style="color: var(--nord15);"><i class = "fas fa-question"></i></td>
<td><i class = "fas fa-check"</i></td>
<td><i class = "fas fa-check"</i></td>
</tr>
<tr>
<td class = "rownr"></td>
<td>$\vdots$</td>
<td>$\vdots$</td>
<td>$\vdots$</td>
<td>$\vdots$</td>
</tr>
</table>
]
]

???
If `$x_1$` is a level-2 variable:

* imputed values for the same subject should be identical
{{content}}
--
* Use a mixed model?&emsp;{{content}}
--
<i class="fas fa-thumbs-down" style = "color:var(--nord11);"></i>
{{content}}
--
* Use a GLM?&emsp;{{content}}
--
<i class="fas fa-thumbs-down" style = "color:var(--nord11);"></i>
{{content}}
--
<br>
<ul class = "arrowlist">
<li><strong>Average</strong> imputed values?</li>
<li><strong>Summarize</strong> time-varying variables?</li>
<li>Imputation in <strong>wide format</strong>?</li>
</ul>

???
&#8680; For incomplete baseline variables imputation in wide format might be better(?)

---

## Non-linear Associations & Multi-level

In settings with **non-linear associations** and **multi-level data**
the **correct predictive distribution**
$$ p(\color{var(--nord15)}{\mathbf x_{mis}} \mid \text{everything else})$$
may not have a closed form.

<br>

<div style="width: 85%; padding: 1.15em 2em; background-color: var(--nord0); margin: auto; 
display: block; text-align: center;">
<strong><span style="font-size: 1.5rem;">&#8680;</span> 
We cannot easily specify the correct imputation model directly.</strong>
</div>

???

So, in summary, whenever we have a non-linear association between the response
and incomplete covariates, including time-to-event outcomes,
or when we have multi-level data, it is not straightforward to specify the
correct imputation model, meaning the correct predictive distribution of the 
missing values, directly because usually they do not have a closed form.

---

## Getting the Correct Distribution

.flex-grid[
.col[
* We need `$\;p(\color{var(--nord15)}{\mathbf x} \mid \mathbf y, \ldots)$`
* We know `$\;p(\mathbf y \mid \color{var(--nord15)}{\mathbf x}, \ldots)$`
]

.col[
<div style = "color: var(--nord3);">
<ul>
<li>$\mathbf x$: incomplete covariate(s)</li>
<li>$\mathbf y$: outcome(s)</li>
<li>$\ldots$: everything else </li>
</ul>
</div>
]
]

???

The interesting question now is: How do we get this predictive distribution?
What we need is the distribution of the incomplete variable `$x$` conditional on
the response and everything else.

And what we do know is the distribution of the response conditional on the
covariates and everything else, because this is the analysis model that we have
specified already.

And here Bayes comes to the rescue.

**Bayes Theorem:**

`\begin{align}
p(\color{var(--nord15)}{\mathbf x} \mid \mathbf y, \ldots) = 
\frac{p(\mathbf y \mid \color{var(--nord15)}{\mathbf x}, \ldots)\;
p(\color{var(--nord15)}{\mathbf x},\ldots)}{p\left(\mathbf y, \ldots\right)}
&\propto \underset{\text{joint distribution}}{\underbrace{p(\mathbf y, \color{var(--nord15)}{\mathbf x}, \ldots)}}
\end{align}`

???

Because Bayes theorem allows us to convert a conditional distribution into the
reversed conditional, and so using this theorem we can derive that the
distribution of interest, the distribution of the incomplete covariate
conditional on everything else is proportional to the distribution of the
response conditional on the covariates, and the distribution of the covariates
and everything else. We can now factorize this further, but I find it easier to
just look at it as a specification of the joint distribution of everything.

For example:

$$
p(\mathbf y, \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_3, \boldsymbol\theta) = 
\underset{\text{analysis model}}{\underbrace{p(\mathbf y \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_3, \boldsymbol\theta)}}\;
\underset{\text{covariate model(s)}}{\underbrace{p(\color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_3 \mid \boldsymbol\theta)}}\;
\underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}}
$$

???

And we can use this principle for our example. So we can write the joint 
distribution of `$y$` and `$x_1$`, `$x_2$` and `$x_3$` as the conditional distribution
of `$y$` given the covariates., the distribution of the covariates conditional on the parameters, and, of course, prior distributions for those parameters.

---

## Covariate Models

For example

`\begin{align}
 p(\color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \color{var(--nord15)}{\mathbf x_3(t)}, \mathbf x_4(t) \mid \boldsymbol\theta)
=\, & p(\color{var(--nord15)}{\mathbf x_3(t)} \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_4(t), \boldsymbol\theta) & \color{var(--nord3)}{\small\text{e.g., GLMM}}\\
& p(\mathbf x_4(t) \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \boldsymbol\theta) &
\color{var(--nord3)}{\small\text{e.g., GLMM}}\\
& p(\color{var(--nord15)}{\mathbf x_1} \mid \mathbf x_2, \boldsymbol\theta) & \color{var(--nord3)}{\small\text{e.g., GLM}}\\
& \color{var(--nord3)}{p(\mathbf x_2 \mid \boldsymbol\theta)} & \color{var(--nord3)}{\small\text{(can be omitted)}}
\end{align}`

???

[Jump to Hep C example default values slide](#hepcdefaults)

<br>
<div style="width: 65%; padding: 1.15em 2em; background-color: var(--nord0); margin: auto; display: block; text-align: center;">
Covariate models $\neq$ imputation models!
</div>

---

## Estimation

<div class = "container">
<div class = "box">
<div class = "box-row">
<div class = "box-cell" style = "background: var(--nord0);">imputation<br>models</div>&nbsp;
<div class = "box-cell">$\propto$</div>
<div class = "box-cell" style = "background: var(--nord0);">joint<br>distribution</div>&nbsp;
<div class = "box-cell">$=$</div> 
<div class = "box-cell" style = "background: var(--nord0);">analysis<br>model</div>&nbsp;
<div class = "box-cell" style = "background: var(--nord0);">covariate<br>models</div>&nbsp;
<div class = "box-cell" style = "background: var(--nord0)">priors</div>
</div>
</div>
</div>

]
.col[
<br>
<div style = "width: 650px;">
Estimation via MCMC<br>
&#8680; Gibbs sampler
</div>
]
]

???
In most cases, the posterior distribution will not have a closed form and so 
we won't be able to derive it analytically.

Instead, Markov Chain Monte Carlo methods are used to create a sample from the
posterior distribution. The results from the Bayesian analysis are then
presented as summary measures of this sample, usually the mean and the 2.5% and
97.5% quantiles, which form the 95% credible interval.

Since, in the Bayesian framework, the result is given in terms of the
probability distribution of the unknown parameters conditional on the data that
was observed, these results have a more intuitive interpretation than
frequentist results.

---

## Advantages of the Bayesian Approach
`$$p(\mathbf y, \mathbf x, \boldsymbol\theta) =
\bbox[#2E3440, 5pt]{\underset{\text{analysis model}}{\underbrace{p(\mathbf y \mid \mathbf x, \color{var(--nord14)}{\boldsymbol\theta_{y\mid x}})}}}\;
p(\mathbf x \mid \boldsymbol\theta_x)\;
p(\color{var(--nord14}{\boldsymbol\theta_{y\mid x}}, \boldsymbol\theta_x)\qquad
\color{var(--nord3)}{\text{with } \boldsymbol\theta = (\boldsymbol\theta_{y\mid x}, \boldsymbol\theta_x)}$$`

* specification of the joint distribution<br>
  &#8680; assures **compatibility** of imputation models

* use of the analysis model in the specification of the joint distribution<br>
  &#8680; **parameters of interest** `$\color{var(--nord14)}{\boldsymbol\theta_{y\mid x}}$` are estimated directly<br>
  &#8680; **non-linear associations** are taken into account<br>
  &#8680; assures **congeniality**

* response is not in any linear predictor<br>
  &#8680; no problem to use **complex outcomes**

---

## In Complex Settings

**(Multivariate) joint model for longitudinal and survival data:**

`$$p(\mathbf T, \boldsymbol\delta, \mathbf y, \color{var(--nord15)}{\mathbf x}, \mathbf b, \boldsymbol\theta) =
\underset{\text{analysis model}}{\underbrace{
\underset{\substack{\text{survival}\\\text{model}}}{\underbrace{
p(\mathbf T, \boldsymbol\delta \mid \mathbf b, \color{var(--nord15)}{\mathbf x}, \boldsymbol \theta)}}\;\;
\underset{\substack{\text{(multivariate)}\\\text{longitudinal}\\\text{model}}}{\underbrace{
p(\mathbf y \mid \mathbf b, \color{var(--nord15)}{\mathbf x}, \boldsymbol\theta)\;
p(\mathbf b\mid \boldsymbol \theta)}}
}}\;\;
\underset{\substack{\text{covariate}\\\text{models}}}{\underbrace{
p(\color{var(--nord15)}{\mathbf x} \mid \boldsymbol\theta)}}\;\;
\underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}}$$`

???

For example, for our multivariate joint model for longitudinal and survival
data. There, the model for the response consists of multiple parts,
the time-to-event model, and the multivariate mixed model for the longitudinal
biomarkers.

<br>

<div class = "container">
<div class = "box">
<div class = "box-row">
<div class = "box-cell" style = "background: var(--nord0);">joint<br>distribution</div>&nbsp;
<div class = "box-cell">$=$</div> 
<div class = "box-cell" style = "background: var(--nord0);">analysis<br>model</div>&nbsp;
<div class = "box-cell" style = "background: var(--nord0);">covariate<br>models</div>&nbsp;
<div class = "box-cell" style = "background: var(--nord0)">priors</div>
</div>
</div>
</div>

???

And this means that we can always divide the joint distribution into the 
analysis model, a model for the covariates, and prior distributions.

And this also works for fairly complex models.

---

## In practice: <i class="fab fa-r-project"></i> Package JointAI

<div class = "box-cell" style="width:180px;">
<strong>Specification</strong><br>of the joint<br> distribution:
</div>

<div class = "box-cell" style = "background: var(--nord0);">
<div style = "padding-bottom: 10px;"> <strong>User</strong></div>
<div class = "box-cell" style = "background: var(--nord1); border: 2px solid var(--nord8); ">analysis<br>model</div>
</div>

&nbsp;

<div class = "box-cell" style = "background: var(--nord0);">
<div style = "padding-bottom: 10px;"> <strong>JointAI</strong></div>
<div class = "box-row">
<div class = "box-cell" style = "background: var(--nord1);">covariate<br>models</div>&nbsp;
<div class = "box-cell" style = "background: var(--nord1)">priors</div>
</div></div>

</div></div></div>

<br>

<div class = "box-cell" style="width:180px;">
<strong>Estimation:</strong>
</div>

<div class = "box-cell" style = "background: var(--nord0);">
<div style = "padding-bottom: 10px;"> <strong>JointAI</strong></div>
<div class = "box-cell" style = "background: var(--nord1);">pre-processing</div>
</div>

&nbsp;

<div class = "box-cell" style = "background: var(--nord0);">
<div style = "padding-bottom: 10px;"> <strong>rjags / JAGS</strong></div>
<div class = "box-cell" style = "background: var(--nord1);">MCMC sampling</div>
</div>

&nbsp;

</div></div></div>

---

## JointAI: Model Types & Specification

* `lm_imp()`
* `glm_imp()`
* `clm_imp()`
* `mlogit_imp()`
* `betareg_imp()`
* `lognorm_imp()`
]

* `lme_imp()`
* `glme_imp()`
* `clmm_imp()`
* `mlogitmm_imp()`
* `betamm_imp()`
* `lognormmm_imp()`
]

* `coxph_imp()`
* `survreg_imp()`
* `JM_imp()`
]
]

**For Example:**

```r
mod <- lm_imp(SBP ~ gender + age + alc * creat, data = NHANES, n.iter = 200)
```

.footnote[
[Full documentation at https://nerler.github.io/JointAI/](https://nerler.github.io/JointAI/)
]

---

## JointAI: More Features

.flex-grid[
.col[
<button class="modal-button" href="#myModal1">Model Specification</button>
<div id="myModal1" class="modal">
  <div class="modal-content">
    <span class="close">&times;</span>
    <ul>
    <li>auxiliary variables (<a href = "https://nerler.github.io/JointAI/articles/ModelSpecification.html#auxiliary-variables"><code>auxvars</code></a>)</li>
    <li> shrinkage (ridge)
    (<a href = "https://nerler.github.io/JointAI/articles/ModelSpecification.html#shrinkage"><code>shrinkage</code></a>)
    <li> truncation of continuous distributions 
    (<a href = "https://nerler.github.io/JointAI/articles/ModelSpecification.html#functions-with-restricted-support"><code>trunc</code></a>)
    <li> setting reference categories 
    (<a href="https://nerler.github.io/JointAI/articles/ModelSpecification.html#reference-categories"><code>refcats</code></a>)
    <li> change hyper-parameters 
    (<a href = "https://nerler.github.io/JointAI/articles/ModelSpecification.html#hyper-parameters"><code>hyperpars</code></a>)
    <li> parameters to be monitored 
    (<a href="https://nerler.github.io/JointAI/articles/SelectingParameters.html"><code>monitor_params</code></a>)</li>
    <li> export the JAGS model
    (<code>modelname</code>,
    <code>modeldir</code>,
    <code>keep_model</code>)
    <li> customize the JAGS model (<code>modelname</code>,
    <code>modeldir</code>,
    <code>overwrite</code>)
    </ul>
  </div>
</div>

<button class="modal-button" href="#myModal2">MCMC settings</button>
<div id="myModal2" class="modal">
  <div class="modal-content">
    <span class="close">&times;</span>
    <ul>
    <li>adaptive phase 
    (<a href="https://nerler.github.io/JointAI/articles/MCMCsettings.html#adaptive-phase"><code>n.adapt</code></a>)</li>
    <li>sampling phase 
        (<a href="https://nerler.github.io/JointAI/articles/MCMCsettings.html#sampling-iterations"><code>n.iter</code></a>)</li>
    <li>number of chains
        (<a href="https://nerler.github.io/JointAI/articles/MCMCsettings.html#number-of-chains"><code>n.chains</code></a>)</li>
    <li>thinning
        (<a href="https://nerler.github.io/JointAI/articles/MCMCsettings.html#thinning"><code>thin</code></a>)</li>
    <li>initial values 
        (<a href="https://nerler.github.io/JointAI/articles/MCMCsettings.html#initial-values"><code>inits</code></a>,
    <code>&lt;object&gt;$mcmc_settings$inits</code>)</li>
    <li>seed value (<code>seed</code>)</li>
    <li>continue sampling 
    (<a href="https://nerler.github.io/JointAI/reference/add_samples.html"><code>add_samples()</code></a>)</li>
    </ul>
  </div>
</div>

<button class="modal-button" href = "#modal-plots">Plots</button>
<div class="modal" id="modal-plots">
  <div class="modal-content">
    <span class="close">&times;</span>
    <ul>
    <li>data distribution 
        (<a href="https://nerler.github.io/JointAI/articles/VisualizingIncompleteData.html#visualize-the-distribution-of-each-variable">
        <code>plot_all()</code></a>)</li>
    <li>missing data pattern 
        (<a href = "https://nerler.github.io/JointAI/articles/VisualizingIncompleteData.html#missing-data-pattern">
        <code>md_pattern()</code></a>)</li>
    <li>MCMC chains 
        (<a href = "https://nerler.github.io/JointAI/articles/AfterFitting.html#trace-plot">
        <code>traceplot()</code></a>)</li>
    <li>posterior density 
        (<a href="https://nerler.github.io/JointAI/articles/AfterFitting.html#density-plot"><code>densplot()</code></a>)</li>
    <li>imputed vs observed data 
        (<a href="https://nerler.github.io/JointAI/articles/AfterFitting.html#export-of-imputed-values">
        <code>plot_imp_distr()</code></a>)</li>
    <li>Monte Carlo Error 
    (<a href="https://nerler.github.io/JointAI/articles/AfterFitting.html#sec:mcerror"><code>plot(MC_error(&lt;object&gt;))</code></a>)</li>
    </ul>
  </div>
</div>

<button class="modal-button" href = "#modal-parallel">Parallel Sampling</button>
<div class="modal" id="modal-parallel">
  <div class="modal-content">
    <span class="close">&times;</span>
    Using the R packages <strong>future</strong> and <strong>doFuture</strong>:
<pre><code class="r hljs remark-code">
<div class="remark-code-line"><span class="hljs-keyword">library</span>(<span class="hljs-string">"doFuture"</span>)</div>
<div class="remark-code-line">plan(multiprocess(workers = <span class="hljs-number">6</span>))</div>
<div class="remark-code-line">registerDoFuture()</div>
<div class="remark-code-line"></div>
<div class="remark-code-line"><span class="hljs-comment">## fit JointAI model ...</span></div><br>
<div class="remark-code-line"><span class="hljs-comment">## to re-set to sequential evaluation:</span></div>
<div class="remark-code-line">plan(sequential())</div>
</code></pre>
</div>
</div>

]

<button class="modal-button" href = "#modal-clm">
Cumulative Logit Models</button>
<div class="modal" id="modal-clm">
  <div class="modal-content">
    <span class="close">&times;</span>
    <ul>
    <li>invert the OR 
    (<a href="https://nerler.github.io/JointAI/reference/model_imp.html#cumulative-logit-mixed-models"><code>rev</code></a>)
        \[\log\left(\frac{P(y_i \color{var(--nord14)}{>} k)}{P(y_i \color{var(--nord14)}{\leq} k)}\right) = \gamma_k + \eta_i
        \quad\text{vs}\quad
        \log\left(\frac{P(y_i \color{var(--nord14)}{\leq} k)}{P(y_i \color{var(--nord14)}{>} k)}\right) = \gamma_k + \eta_i\]
    </li>
    <li>partial proportional odds 
        (<a href="https://nerler.github.io/JointAI/reference/model_imp.html#cumulative-logit-mixed-models"><code>nonprop</code></a>)
        \[\log\left(\frac{P(y_i > k)}{P(y_i \leq k)}\right) = \gamma_k + \eta_i
        \quad\text{vs}\quad
        \log\left(\frac{P(y_i \leq k)}{P(y_i > k)}\right) = \gamma_k + \eta_i \color{var(--nord14)}{+ \eta_{ki}}\]
    </li>
    </ul>
  </div>
</div>

<button class="modal-button" href = "#modal-multi-level">
Multi-level Models</button>
<div class="modal" id="modal-multi-level">
  <div class="modal-content">
    <span class="close">&times;</span>
    <ul>
    <li><strong>lme4</strong> or <strong>nlme</strong> type specification</li>
    <li>nested and crossed random effects<br>(determined by data structure)</li>
    <li>2, 3, 4, ... levels of grouping,
        i.e., <code> ... + (time | id) + (1 | group) + (1 | center ) + (1 | country) + ...</code></li>
    <li>multi-level structure also for survival models</li><br>
    <li>uses hierarchical centering, i.e., 
    $\mathbf b \sim N(\mathbf X\boldsymbol\beta, \mathbf D)$</li>
    </ul>
    <a href="https://nerler.github.io/JointAI/reference/model_imp.html#model-formulas">&#8680; For more info see the package documentation.</a>
  </div>
</div>

<button class="modal-button" href = "#modal-survival">
Survival<br>(with Time-varying Covariates) </button>
<div class="modal" id="modal-survival">
  <div class="modal-content">
    <span class="close">&times;</span>
    <ul>
    <li>proportional hazards model with <strong>time-varying covariates</strong><br>
    (using <a href="https://nerler.github.io/JointAI/reference/model_imp.html#survival-models-with-frailties-or-time-varying-covariates">
    <code> + (1 | id)</code></a> and
    <a href="https://nerler.github.io/JointAI/reference/model_imp.html#survival-models-with-frailties-or-time-varying-covariates">
    <code> timevar = "&lt;...&gt;"</code></a>)</li>
    <li><strong>baseline hazard</strong>: B-spline with <code>df_basehaz</code> degrees of freedom</li><br>
    <li></strong>joint model</strong> for longitudinal and survival data<br>
    (using <a href = "https://nerler.github.io/JointAI/reference/model_imp.html#modelling-multiple-models-simultaneously-joint-models">
    <code>formula = list(&lt;...&gt;)</code></a> and
    <a href="https://nerler.github.io/JointAI/reference/model_imp.html#modelling-multiple-models-simultaneously-joint-models">
    <code>timevar = "&lt;...&gt;"</code></a>)</li>
    <li><strong>association structure</strong> (<code>assoc_type</code>)
        <ul>
        <li>underlying value (<code>underl.value</code>)</li>
        <li>observed/imputed value (<code>obs.value</code>)</li>
        </ul>
    </li>
    <li><strong>multivariate joint model</strong>:<br>
    full / block-diagonal / independent random effects</li>
    </ul>
  </div>
</div>

]

<button class="modal-button" href = "#modal-blackbox">
Shining Light into the Black Box</button>
<div class="modal" id="modal-blackbox">
  <div class="modal-content">
    <span class="close">&times;</span>
    <ul>
    <li>monitor any node 
        (<a href="https://nerler.github.io/JointAI/articles/SelectingParameters.html#other-parameters">
        <code>monitor_params</code></a>)</li>
    <li>see the JAGS model (<code>&lt;object&gt;$jagsmodel</code>)</li>
    <li>print model info (<a href="https://nerler.github.io/JointAI/articles/SelectingParameters.html#side-note-getting-information-about-of-the-imputation-models">
    <code>list_models()</code></a>,
    <code>&lt;object&gt;$models</code>)</li>
    <li>access the JAGS model object
    (<code>&lt;object&gt;$model</code>)</li>
    <li>list all monitored parameters 
    (<a href="https://nerler.github.io/JointAI/articles/SelectingParameters.html#parameters-of-the-analysis-model">
    <code>parameters()</code></a>)</li>
    <li>list all regression coefficients 
    (<code>&lt;object&gt;$coef_list</code>)</li>
    </ul>
  </div>
</div>

<button class="modal-button" href = "#modal-compinfo">
Computational Infos</button>
<div class="modal" id="modal-compinfo">
  <div class="modal-content">
    <span class="close">&times;</span>
    <code>&lt;object&gt;$comp_info</code>:
    <ul>
    <li>start time</li>
    <li>duration</li>
    <li>package version</li>
    <li>parallel setting</li>
    </ul>
  </div>
</div>

<button class="modal-button" href = "#modal-subset">
Output Options</button>
<div class="modal" id="modal-subset">
  <div class="modal-content">
    <span class="close">&times;</span>
    <ul>
    <li>Subset selection
        <ul>
        <li>iterations 
            (<a href="https://nerler.github.io/JointAI/articles/AfterFitting.html#subset-of-mcmc-samples">
            <code>start</code>, <code>end</code>, <code>thin</code></a>)</li>
        <li>nodes
        (<a href="https://nerler.github.io/JointAI/articles/AfterFitting.html#subset-of-parameters">
        <code>subset</code></a>)</li>
        <li>chains (<code>exclude_chains</code>)</li>
        <li>sub-models (<code>outcome</code>)</li>
        </ul><br>
    <li>export imputed values<br>(<code>monitor_params = c(imps = TRUE)</code>,
<code>get_MIdat()</code>)</li>
    </ul>
  </div>
</div>

<button class="modal-button" href = "#modal-prediction">Prediction</button>
<div class="modal" id="modal-prediction">
  <div class="modal-content">
    <span class="close">&times;</span>
    <ul>
    <li>create "newdata" for effect plots 
        (<a href="https://nerler.github.io/JointAI/reference/predDF.html">
        <code>predDF()</code></a>)</li><br>
    <li>prediction (<code>predict()</code>)
    <ul> 
    <li><strong>GLM(M)</strong> type models: linear predictor or outcome scale<br>
    <span style = "text-decoration: underline;">for now</span>:
    <span style = "color: var(--nord12);">
    assumption random effects = 0</span>
    </li>
    <li><strong>prop. hazard models</strong>: (log) hazard, (-log) survival</li>
    <li><strong>joint model</strong> for longitudinal & survival data:
    <span style = "color:var(--nord11);">not 
    <span style = "text-decoration: underline;">yet</span> possible</span></li>
    </ul></li>
    </ul>
  </div>
</div>

]

---

## Summary

Imputation via the posterior predictive distribution<br>
&#8680; difficult to specify directly in complex settings

* **MICE** uses direct specification<br>
  &#8680; violations in complex settings &#8680; bias
* **Bayes**: specification via factorization of the joint distribution<br>
  &#8680; theoretically valid

<br>

Valid under **MAR** <br>
&#8680; Extension to **MNAR** by specifying models for missingness process.

Allows **tailoring to** model **specific features** of the data.

---
class: the-end, center, middle
layout: true
count: false

# Thank you for your attention!

<div class="contact">
<i class="fas fa-envelope"></i> <a href="mailto:n.erler@erasmusmc.nl" class="email">n.erler@erasmusmc.nl</a>&emsp;
<a href="https://twitter.com/N_Erler" target="_blank"><i class="fab fa-twitter"></i> N_Erler</a>&emsp;
  <a href="https://github.com/NErler" target="_blank"><i class="fab fa-github"></i> NErler</a>&emsp;
  <a href="https://nerler.com" target="_blank"><i class="fas fa-globe-americas"></i> https://nerler.com</a>
</div>

---
count: false

// All page modals
var modals = document.querySelectorAll('.modal');

// Get the <span> element that closes the modal
var spans = document.getElementsByClassName("close");

// When the user clicks the button, open the modal
for (var i = 0; i < btn.length; i++) {
 btn[i].onclick = function(e) {
    e.preventDefault();
    modal = document.querySelector(e.target.getAttribute("href"));
    modal.style.display = "block";
 }
}

// When the user clicks on <span> (x), close the modal
for (var i = 0; i < spans.length; i++) {
 spans[i].onclick = function() {
    for (var index in modals) {
      if (typeof modals[index].style !== 'undefined') modals[index].style.display = "none";    
    }
 }
}

// When the user clicks anywhere outside of the modal, close it
window.onclick = function(event) {
    if (event.target.classList.contains('modal')) {
     for (var index in modals) {
      if (typeof modals[index].style !== 'undefined') modals[index].style.display = "none";    
     }
    }
}
</script>