class: left, top, title-slide
When and Why Imputation with MICE / FCS Might Fail
Nicole Erler
Assistant Professor
Department of Biostatistics
n.erler@erasmusmc.nl
N_Erler
NErler
https://nerler.com
--- count: false layout: true <div class="my-footer"><span> <a href="https://twitter.com/N_Erler"><i class="fab fa-twitter"></i> N_Erler</a>      <a href="https://github.com/NErler"><i class="fab fa-github"></i> NErler</a>      <a href = "https://nerler.com"><i class="fas fa-globe-americas"></i> nerler.com</a> </span></div> --- count: false ## Outline .itlist-wider[ * Motivation / Setting the Stage * (Multiple) Imputation & MICE / FCS * Potential Issues with MICE * A Fully Bayesian Alternative * Summary ] -- <i class="far fa-images link-slides "></i> <span class = "link-address"> <a href = "https://www.nerler.com/talk/greifswald2023/">https://nerler.com/#talks</a> </span> --- class: center, middle # Setting the Stage --- ## Motivation: Chronic Hepatitis C * N `\(\approx\)` 700 patients with chronic hepatitis C * follow-up: `\(\leq\)` 38 years * several **baseline variables**: * age, sex, smoking, BMI, ... * **0 - 19% missing values**   ⇨ 66% complete cases ??? This project is motivated by a study in patients with a **chronic hepatitis C infection**. In many patients, the hepatitis has caused severe liver damage, and they will need a transplant. In light of the scarcity of donor organs, it is important to **identify factors that indicate a patient's risk** for liver failure. We have observational data from approx. **700 patients**. In this data, several **variables are only measured once**, at baseline. This includes the patient's **age at baseline** and **sex** and the **year of diagnosis**, as well as variables like **alcohol consumption**, **smoking**, or **BMI**, which could change over time, but were only measured once. In some of these variables, we have **missing values** for **up to 19% of the patients**. If we'd **exclude** everyone with missing values in these variables, we would **lose about a third of the data**. 
- - - -- * multiple **biomarkers**<br> (repeatedly measured) <img src="figures/plot_nobs.png", width = "650", style="position:absolute; bottom:45px; right:60px;"> ??? We also have several **biomarkers** that are **repeatedly measured**, but since the data is not from a prospective study, the measurements are **very irregular**. The number of measurements per patient is shown here in the histogram. The **median** number of measurement times per patient is **15**, but there are a few patients with just **a single observation** and even one patient with **> 700 measurement** occasions. -- * **Time-to-event Outcome**:<br> Clinical events<br> (death, transplant, ...) --- ## Missing Biomarker Values <img src="figures/HCVlongmissing2.png" width="100%" style="display: block; margin: auto;" /> ??? In the **plots** here, you see the trajectories of 6 different biomarkers of interest for 7 patients. The values are standardized to have similar scales for the purpose of this visualization. In many cases, the biomarkers are **measured at the same time points**, but it also happens that one biomarker value is missing or was measured at another time. Observations that we would lose if we **restricted our data** to only those time points at which all 6 markers are measured are shown as transparent. For the full dataset, this would result in the loss of 50% of the biomarker observations. [2:00 min] --- ## Analysis: Multivariate Joint Model **Proportional hazards model** for time until event: `$$h_i(t) = h_0(t)\;\exp\left(\underset{\substack{\text{time}\\\text{constant}} }{\underbrace{\mathbf x_i^\top \boldsymbol\beta^{(tc)}}} + \underset{\text{time varying}}{\underbrace{\sum_{k=1}^K \eta_{ki}(t)^\top\beta^{(tv)}_k}}\right)$$` ??? To analyse this data, we'd like to use a multivariate joint model. This model consists of **two parts**, a **proportional hazards** model for the time-to-event outcome and a **multivariate mixed model** for the longitudinal biomarkers.
The **proportional hazards** model has the usual form. In the linear predictor, we have some variables that are **time constant**, our baseline covariates, and we have the **time-varying biomarker values**. - - - -- **Longitudinal (mixed) model** for each biomarker `\(k = 1, ... K\)`: `\begin{align*} \mathbb E(y_{ki}(t)\mid \mathbf b_{ki}) &= \eta_{ki}(t)\\ &= \underset{\text{fixed effects}}{\underbrace{\mathbf x_{ki}(t)^\top\boldsymbol\beta^{(k)}}} + \underset{\text{random effects}}{\underbrace{\mathbf z_{ki}(t)^\top \mathbf b_{ki}}} \qquad\scriptsize\text{with } \begin{pmatrix}\mathbf b_{1i}\\\vdots\\\mathbf b_{Ki}\end{pmatrix} \sim N(\mathbf 0, \mathbf D) \end{align*}` ??? Each biomarker is modelled using a mixed model. - - - -- <div style = "border: solid 3pt var(--nord14); width: 270px; height: 130px; position: absolute; right: 170px; bottom: 65px;"> </div> ??? The **random effects** of the models for the different biomarkers are **modelled jointly** in a multivariate normal distribution. - - - -- <div style = "border: solid 10pt var(--nordbg); width: 255px; height: 115px; position: absolute; right: 168px; bottom: 63px;"> </div> <div style = "border: solid 3pt var(--nord14); width: 80px; height: 60px; position: absolute; left: 405px; bottom: 172px;"> </div> <div style = "position: absolute; top: 170px; right: 430px; color: var(--nord14); font-weight: 500; font-size: 3rem;"> ⇩ </div> ??? By using the linear predictors `\(\eta\)` of the longitudinal models as covariates in the time-to-event model, we assume that the underlying value of the biomarkers, which may be measured with error, is associated with the risk of an event at the same time point. It is, of course, possible to assume different association structures, for example, a time lag or a cumulative effect, but since the focus here is on the missing values, we'll keep it simple. 
--- count: false ## Analysis: Multivariate Joint Model **Proportional hazards model** for time until event: `$$h_i(t) = h_0(t)\;\exp\left(\underset{\substack{\text{time}\\\text{constant}} }{\underbrace{\bbox[#3B4252, 2px]{\color{var(--nord15)}{\mathbf x_i}^\top} \boldsymbol\beta^{(tc)}}} + \underset{\text{time varying}}{\underbrace{\sum_{k=1}^K \eta_{ki}(t)^\top\beta^{(tv)}_k}}\right)$$` **Longitudinal (mixed) model** for each biomarker `\(k = 1, ... K\)`: `\begin{align*} \mathbb E(y_{ki}(t)\mid \mathbf b_{ki}) &= \eta_{ki}(t)\\ &= \underset{\text{fixed effects}}{\underbrace{\bbox[#3B4252, 2px]{\color{var(--nord15)}{\mathbf x_{ki}(t)}^\top}\boldsymbol\beta^{(k)}}} + \underset{\text{random effects}}{\underbrace{\mathbf z_{ki}(t)^\top \mathbf b_{ki}}} \qquad\scriptsize\text{with } \begin{pmatrix}\mathbf b_{1i}\\\vdots\\\mathbf b_{Ki}\end{pmatrix} \sim N(\mathbf 0, \mathbf D) \end{align*}` ??? In both model parts, we might want to use **baseline covariates**. However, in our motivating data, those variables **have missing values**. [3:22 min] --- ## How are Missing Data Handled? [<i class="fas fa-file-alt"></i> Shivasabesan et al. (2018)](https://doi.org/10.1016/j.injury.2018.03.035) * all trauma registry-based manuscripts (2015 - 2017) * 241/539 (45%) described how missing data was handled: * 234 (43%) **complete case analysis** (CCA) * 18 (3%) **multiple imputation** (MI) * 34 (6%) combination of CCA & MI ??? Missing data are a widespread issue, especially in observational data. But how are they usually dealt with? It is not easy to get a very good overview of how Missing Values are handled in practice. But there are a few studies that have looked into how missing data are handled and reported in specific types of studies or fields of application. - - - A systematic review that was published in the clinical journal "Injury" investigated missing data in trauma registries. 
About half of the 539 included papers described how missing data was handled, and the most common approach was again to perform a complete case analysis, followed by a combination with multiple imputation or just multiple imputation. - - - -- <p class = "smallbreak"> </p> [<i class="fas fa-file-alt"></i> Carroll et al. (2020)](https://doi.org/10.1186/s12874-020-01018-7) * observational time-to-event studies in oncology using prop. hazards model (2012 - 2018) * approx. 130/148 (88%) reported on the missing data method * 79 (53%) **complete case analysis** * 33 (22%) **multiple imputation** ??? In 2020, Carroll and colleagues published their investigation on missing data methods in observational time-to-event studies that used proportional hazards models. They included 148 studies, of which most reported on how missing data was handled, and, again, half of the studies performed complete case analyses, and the next most popular method was multiple imputation. --- ## How are Missing Data Handled? [<i class="fas fa-file-alt"></i> Hunt et al. (2021)](https://doi.org/10.1002/pds.5245) * multi-database pharmacoepidemiologic studies (2018 - 2019) * 19/62 (31%) reported the missing data method: * 13 (21%) **complete case analysis** * 4 (6%) **multiple imputation** ??? Even more recently, a review on multi-database studies in pharmacoepidemiology found that in that area of research, only one third of the studies reported on the missing data method, and also there, complete case analysis, followed by multiple imputation, was the most commonly reported method. - - - More and more clinical researchers are aware that in most settings complete case analysis is not appropriate, and multiple imputation is somehow considered the universal remedy to the missing data problem.
--- ## Complete Case Analysis .pull-left[ Complete Case Analysis is <ul class="fa-ul"> <li><span class = "fa-li" style = "color:var(--nord11);"><i class="far fa-frown"></i></span>inefficient</li> <li><span class = "fa-li" style = "color:var(--nord11);"><i class="far fa-frown"></i></span>usually biased</li> </ul> <br> **For Example:** <ul class="fa-ul"> <li> <a href = "https://thestatsgeek.com/2013/07/06/when-is-complete-case-analysis-unbiased"> <span class = "fa-li"><i class="fab fa-wordpress"></i></span> thestatsgeek.com (2013)</a> </li> <li> <a href = "https://doi.org/10.1002/sim.3944"> <span class = "fa-li"><i class="fas fa-file-alt"></i></span> White & Carlin (2010)</a> </li> <li> <a href = "https://doi.org/10.1016/j.jclinepi.2009.08.028"> <span class = "fa-li"><i class="fas fa-file-alt"></i></span> Knol et al. (2010)</a> </li> <li> <a href = "https://doi.org/10.1016/j.jclinepi.2006.01.015"> <span class = "fa-li"><i class="fas fa-file-alt"></i></span> Van der Heijden et al. (2006)</a> </li> <li> <a href = "https://doi.org/10.1016/j.jclinepi.2009.12.008"> <span class = "fa-li"><i class="fas fa-file-alt"></i></span> Janssen et al. (2010)</a> </li> </ul> ] .pull-right[ <img src="index_files/figure-html/ccdemo-1.png" width="120%" align = "right" style="display: block; margin: auto;" /> ] ??? The issue with complete case analysis is that it is at least inefficient. Depending on the number of variables that have missing values and the proportion of values missing, you may lose quite a large proportion of your data, especially in observational studies, which require the inclusion of many covariates to reduce the bias due to confounding. I have listed here a couple of resources on the issues associated with complete case analysis. --- class: center, middle # (Multiple) Imputation & MICE / FCS --- ## Development of (Multiple) Imputation **In the 1960s/70s:** Increasing number of missing values in the US census:<br> ⇨ Need to solve the missing data problem.
-- <img src = "graphics/computer.png" class = "img-computer"> -- <br> .hlgt-box.w60[ **Idea / Requirement**<br> ⇨ fix the missing data problem once (centrally)<br> ⇨ supply completed data to many analysts ] --- ## Imputation of Missing Values .footnote[ [<i class = "fas fa-book"></i> Rubin (2004)](https://doi.org/10.1198/000313004X6355) ] **Rubin (in the 1970s):** * **uncertainty** about the missing value * some values **more likely** than others * relationship with **other** available **data** ??? The basic ideas about imputation go back to Donald Rubin and his work in the 1960s and 70s. The important issue in working with missing values is the **uncertainty** about what the value would have been. And so we **can't just pick** a single value and fill it in because then we would ignore this uncertainty. But we have some knowledge about the missing values. Usually, some values will be more likely than others, and often there is a relationship between the incomplete variable and the other data. - - - -- <div style = "position: absolute; right: 60px; top: 140px; width: 500px;"> <strong>⇨</strong> missing values have a <strong>distribution</strong> <img src="index_files/figure-html/unnamed-chunk-5-1.png" width="60%" style="display: block; margin: auto;" /> <!-- <img src="figures/ImpDens.png", height = 250, style = "margin: auto; display: block;"> --> </div> ??? So, it makes sense to assume that missing values have a distribution and that we need a model to describe how an incomplete variable is related to the other data. - - - - -- <div class = "bayes-text"> Bayesian Concept!!! </div> -- <br><br><br> .hlgt-box[ Impute from the **predictive distribution** of the .n15[missing values] given the observed values: `$$p(\color{var(--nord15)}{\mathbf{\mathcal D}_{mis}}\mid \mathbf{\mathcal D}_{obs})$$` ] ??? And this means that we can impute the missing values by **sampling from the predictive distribution of the missing values conditional on the observed data**. 
[4:00 min] --- ## Multiple Imputation <img src = "graphics/MI.png", height = 480, style = "margin: auto; display: block;"> ??? The principle of multiple imputation is simple: * We impute each missing value multiple times, in order to take into account the uncertainty that we have about the missing value, and thereby create a set of completed datasets. * Each of these datasets can then be analysed by standard complete data methods, and the results of these separate analyses are then pooled to obtain an overall result. One often cited advantage of this approach is that we can run this imputation once, and then re-use the imputed data for multiple analyses performed by different researchers. -- .covr-box[ ] --- ## Imputation Step .flex-grid[ .col[ <br> <table class="simpletable2"> <tr> <th></th> <th>\(\mathbf y\)</th> <th>\(\color{var(--nord15)}{\mathbf x_1}\)</th> <th>\(\mathbf x_2\)</th> <th>\(\mathbf x_3\)</th> </tr> <tr><td></td><td colspan = "4"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class = "rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> </tr> </table> <br> * `\(\mathbf y\)`: **response** variable * 
`\(\color{var(--nord15)}{\mathbf x_1}\)`: **incomplete** covariate * `\(\mathbf x_2\)`, `\(\mathbf x_3\)`: **complete** covariates <div style = "width: 450px;"></div> ] .col[ **Sampling from the predictive distribution:  ** {{content}} ] ] ??? The most interesting part of this multiple imputation is the imputation step. Missing values should be imputed from their predictive distribution. The idea for imputation of `\(x_1\)` is similar to predicting the missing values. We can think of this as fitting a model to the cases where `\(x_1\)` is observed, **for example**, this could be a **linear regression model** with the **response `\(y\)`** and the other covariates as predictors. From this model, we can get estimates for the regression coefficients `\(\beta\)` and the standard deviation of the residuals `\(\sigma\)`. - - - - - -- * Define a model for the incomplete variable `\(\color{var(--nord15)}{\mathbf x_1}\)`, e.g. `$$\color{var(--nord15)}{\mathbf x_1} = \beta_0 + \beta_1 \mathbf y + \beta_2 \mathbf x_2 + \beta_3 \mathbf x_3 + \varepsilon$$` to get estimates `\(\boldsymbol{\hat\beta}\)` (and `\(\hat\sigma\)`) {{content}} -- <br> * Sample `\(\color{var(--nord15)}{\hat{x}_{i1}}\)` from the predictive distribution `$$p(\color{var(--nord15)}{x_{i1}} \mid y_i, x_{i2}, x_{i3}, \boldsymbol{\hat\beta}, \hat\sigma)$$` ??? And using these parameter estimates we can then sample a value of `\(x_1\)` from its predictive distribution for those cases where `\(x_1\)` is missing, rather than simply filling in the fitted value.
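These two steps can be sketched in a few lines of code. The talk's own tooling is R (the `mice` package's `norm` method implements this idea); the following is a minimal, self-contained numpy sketch with simulated data, and all variable names are hypothetical. Note that it also draws the parameters (not just the missing value) to propagate parameter uncertainty:

```python
import numpy as np

rng = np.random.default_rng(2023)

def impute_draw(x1, y, x2, x3, rng):
    """One stochastic imputation of x1 from p(x1 | y, x2, x3),
    drawing the parameters first to propagate their uncertainty."""
    obs = ~np.isnan(x1)
    Z = np.column_stack([np.ones(len(y)), y, x2, x3])
    Zo, xo = Z[obs], x1[obs]
    beta_hat, *_ = np.linalg.lstsq(Zo, xo, rcond=None)  # fit on observed cases
    resid = xo - Zo @ beta_hat
    df = Zo.shape[0] - Zo.shape[1]
    # draw sigma^2, then beta | sigma^2 (normal linear model posterior)
    sigma2 = resid @ resid / rng.chisquare(df)
    beta = rng.multivariate_normal(beta_hat, sigma2 * np.linalg.inv(Zo.T @ Zo))
    # draw the missing x1 values from the predictive distribution
    out = x1.copy()
    out[~obs] = Z[~obs] @ beta + rng.normal(0.0, np.sqrt(sigma2), (~obs).sum())
    return out

# hypothetical data with roughly 30% of x1 missing completely at random
n = 500
x2, x3 = rng.normal(size=n), rng.normal(size=n)
x1 = 0.5 * x2 - 0.3 * x3 + rng.normal(size=n)
y = 1 + 2 * x1 + x2 + rng.normal(size=n)
x1_mis = np.where(rng.random(n) < 0.3, np.nan, x1)

completed = impute_draw(x1_mis, y, x2, x3, rng)
```

Repeating this draw `\(m\)` times, each time with fresh parameter draws, yields the `\(m\)` completed datasets used in multiple imputation.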
--- ## Multivariate Missingness <div style = "width: 300px; margin-right: 50px;"> <div style = "text-align: center; margin-bottom: 25px; font-weight: 600;"> Multivariate<br>Missingness</div> <table class="simpletable2"> <tr> <th></th> <th>\(\mathbf y\)</th> <th>\(\color{var(--nord15)}{\mathbf x_1}\)</th> <th>\(\color{var(--nord15)}{\mathbf x_2}\)</th> <th>\(\color{var(--nord15)}{\mathbf x_3}\)</th> <th>\(\ldots\)</th> </tr> <tr><td></td><td colspan = "5"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class = "rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td></td> </tr> </table> <br> <div style = "display: block; margin: auto;"> <ul class = "arrowlist"> <li>\(p(\color{var(--nord15)}{\mathbf{\mathcal D}_{mis}}\mid \mathbf{\mathcal D}_{obs})\) is<br> <strong>multivariate</strong></li> </ul> </div> </div> ??? In practice, we usually have missing values in multiple variables, which means that this **predictive distribution is multivariate**. 
- - - -- <div style = "width: 60%; position: absolute; top: 100px; right: 60px;"> <i class="fas fa-exclamation-triangle fa-2x" style = "color: var(--nord11);"></i> <div style = "position: relative; left: 75px; top: -57px; margin-bottom: -57px;"> When covariates are of <strong>mixed type</strong>:<br> <span class = "red bold">no closed form</span>. </div> ⇨ approximate `\(p(\color{var(--nord15)}{\mathbf{\mathcal D}_{mis}}\mid \mathbf{\mathcal D}_{obs})\)` {{content}} </div> ??? When we also have covariates of mixed type, meaning continuous and categorical, this multivariate distribution does not have a closed-form, and so we need to approximate it. - - - -- <p class = "smallbreak"> </p> **"Original" approach:**<br> Joint Model (Multiple) Imputation * Approximate `\(p(\color{var(--nord15)}{\mathbf{\mathcal D}_{mis}}\mid \mathbf{\mathcal D}_{obs})\)` with a known multivariate distribution. {{content}} -- <p class = "smallbreak"> </p> **Most common approach:**<br> <span class = "n4"> <span class = "n6 bold">M</span><span class = "s">ultivariate</span> <span class = "n6 bold">I</span><span class = "s">mputation by</span> <span class = "n6 bold">C</span><span class = "s">hained</span> <span class = "n6 bold">E</span><span class = "s">quations</span> </span><br> <span class = "n4"> <span class = "n6 bold">F</span><span class = "s">ully</span> <span class = "n6 bold">C</span><span class = "s">onditional</span> <span class = "n6 bold">S</span><span class = "s">pecification</span> </span><br> * Based on the idea of the **Gibbs sampler** --- ## Joint Model MI For example, assume all incomplete variables and the outcome are (latent) normal: `$$\mathbf x_k \text{ binary }\rightarrow \text{ latent } \boldsymbol{\hat{\mathbf{x}}_k} \text{ is standard normal: } \left\{\begin{array}{c} \mathbf x_k = 1\\ \mathbf x_k = 0\end{array}\right.\text{ if } \begin{array}{c} \boldsymbol{\hat{\mathbf{x}}_k}\geq \kappa_k\\ \boldsymbol{\hat{\mathbf{x}}_k} < \kappa_k\end{array}$$` .pull-left[ ⇨ 
`\(p(\color{var(--nord15)}{\mathbf{\mathcal D}_{mis}}\mid \mathbf{\mathcal D}_{obs})\)` is multivariate normal<br> ⇨ easy to sample from <br> ⇨ <i class="fab fa-r-project"></i> package [**jomo**](https://CRAN.R-project.org/package=jomo). ] .pull-right[ <img src="index_files/figure-html/unnamed-chunk-6-1.png" width="100%" style="display: block; margin: auto;" /> ] --- ## MICE / FCS Iterative sampling from the **full-conditionals**. -- .flex-grid[ .col[ * Specify a model for each incomplete variable with **all other variables in the linear predictor**. {{content}} ] .col[ <div style = "width: 400px; margin-left: 60px;"> <strong>Example:</strong> <table class="simpletable2"> <tr> <th></th> <th>\(\mathbf y\)</th> <th>\(\color{var(--nord15)}{\mathbf x_1}\)</th> <th>\(\color{var(--nord15)}{\mathbf x_2}\)</th> <th>\(\color{var(--nord15)}{\mathbf x_3}\)</th> <th>\(\ldots\)</th> </tr> <tr><td></td><td colspan = "5"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class = "rownr"></td> <td>\(\vdots\)</td> 
<td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td></td> </tr> </table> <br> <div class = "s"> \begin{alignat}{10} \color{var(--nord15)}{\mathbf x_1} &= \theta^{(1)}_0 &+& \theta^{(1)}_1 \mathbf y &+& \theta^{(1)}_2 \color{var(--nord15)}{\mathbf x_2} &+& \theta^{(1)}_3 \color{var(--nord15)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\ \color{var(--nord15)}{\mathbf x_2} &= \theta^{(2)}_0 &+& \theta^{(2)}_1 \mathbf y &+& \theta^{(2)}_2 \color{var(--nord15)}{\mathbf x_1} &+& \theta^{(2)}_3 \color{var(--nord15)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\ \color{var(--nord15)}{\mathbf x_3} &= \theta_0^{(3)} &+& \theta_1^{(3)} \mathbf y &+& \theta_2^{(3)} \color{var(--nord15)}{\mathbf x_1} &+& \theta_3^{(3)} \color{var(--nord15)}{\mathbf x_2} &+& \ldots &+& \boldsymbol\varepsilon \end{alignat} </div> ] ] ??? The most common approach to handle such multivariate missingness is the MICE algorithm, which is also called imputation using a **fully conditional specification** because that is exactly what it does. It follows the idea of the Gibbs sampler to obtain a sample from the multivariate distribution by iterative sampling from the full conditional distribution of each of the incomplete variables. **In practice**, these full conditionals are typically specified using **regression models** that include **all other variables in the linear predictor**. An implication of this is that we assume linear associations between the dependent variables, here, the incomplete variables, and the independent variables, in our case the other covariates and the response of our analysis model of interest. [5:15 min] -- * Draw random **starting values** from the observed data. {{content}} -- * **Cycle** through all incomplete variables until convergence. 
{{content}} -- * Keep last value for each missing observation:<br> **⇨ one imputed dataset** {{content}} -- **Multiple runs** with different starting values<br> ⇨ multiple imputed datasets --- class: center, middle # Potential Issues with MICE --- ## A Simple Example .pull-left[ **Implied Assumption:**<br> <span>Linear association</span> between `\(\color{var(--nord15)}{\mathbf x_1}\)` and `\(\mathbf y\)`: `$$\color{var(--nord15)}{\mathbf x_1} = \theta_0 + \bbox[#3B4252, 2pt]{\theta_1 \mathbf y} + \theta_2 \mathbf x_2 + \theta_3 \mathbf x_3 + \ldots$$` <!-- <img src="figures/linplot.png", width = "450", height = "300", style="position:absolute; bottom:45px;"> --> <img src="index_files/figure-html/unnamed-chunk-8-1.png" width="100%" style="display: block; margin: auto;" /> ] ??? One thing that is implied when we use such simple regression models for the imputation is that there is a linear association between incompl. covariate and the response (and other covariates). -- .pull-right[ <br> But what if `$$\mathbf y = \beta_0 + \bbox[#3B4252, 2pt]{\beta_1 \color{var(--nord15)}{\mathbf x_1} + \beta_2 \color{var(--nord15)}{\mathbf x_1}^2} + \beta_3 \mathbf x_2 + \ldots$$` <img src="index_files/figure-html/unnamed-chunk-9-1.png" width="100%" style="display: block; margin: auto;" /> <!-- <img src="figures/qdrplot.png", width = "450", height = "300", style="position:absolute; bottom:45px;"> --> ] ??? But that is of course not always the case. What if we have a setting where we assume that there is a non-linear association, for example quadratic? --- ## Non-linear Associations * .bold[true association]: non-linear * .bold.n7[imputation assumption]: linear <img src="figures/p_obs.png", height = 350, style = "position: absolute; right: 226px; bottom: 97px;"> ??? But that is, of course, not always the case. We might, for example, have a quadratic association between the response `\(y\)` and the incomplete covariate `\(x_1\)`. 
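This failure is easy to reproduce in a small simulation — a hypothetical numpy sketch, unrelated to the data behind the figures: generate `\(y\)` quadratic in `\(x_1\)`, impute `\(x_1\)` from a naive linear model in `\(y\)`, and then refit the correct quadratic analysis model on the imputed data.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 2000
x1 = rng.normal(size=n)
y = 1 + x1 + x1**2 + rng.normal(size=n)            # true model: quadratic in x1

x1_mis = np.where(rng.random(n) < 0.4, np.nan, x1)  # 40% missing (MCAR)
obs = ~np.isnan(x1_mis)

# naive imputation model: x1 linear in y (plus residual noise)
Z = np.column_stack([np.ones(obs.sum()), y[obs]])
theta, *_ = np.linalg.lstsq(Z, x1_mis[obs], rcond=None)
sigma = np.std(x1_mis[obs] - Z @ theta)
x1_imp = np.where(obs, x1_mis,
                  theta[0] + theta[1] * y + rng.normal(0.0, sigma, n))

# refit the (correct) quadratic analysis model on the singly imputed data
X = np.column_stack([np.ones(n), x1_imp, x1_imp**2])
coef_imp = np.linalg.lstsq(X, y, rcond=None)[0]
# the quadratic coefficient is attenuated relative to its true value of 1
```

Even though the analysis model is correctly specified, the linearly imputed values dilute the curvature, so the estimated quadratic coefficient is biased towards zero.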
- - - -- <img src="figures/p_imp.png", height = 350, style = "position: absolute; right: 226px; bottom: 97px;"> ??? If we now impute missing values in `\(x_1\)` using a standard regression model, all the imputed values follow a linear relationship. - - - -- <div style = "position: absolute; right: 380px; top: 120px;"> <span style="font-size: 56pt; position: relative; right: 110px; bottom: 20px;">} ⇨</span> <span style = "color: var(--nord11); font-size: 1.2rem; font-weight: bold; position: relative; bottom: 30px; right: 100px;"> bias!</span> </div> <img src="figures/p_bias.png", height = 350, style = "position: absolute; right: 226px; bottom: 97px;"> ??? And even if we correctly assume the non-linear association in the analysis model that we fit on the imputed data, we will get biased results. I'm talking about a quadratic effect here because I can visualize it more easily than the non-linear relationship implied by the survival part of our joint model. --- ## Time-to-Event Outcomes (Simple) **Proportional Hazards Model:** `$$h_i(t) = h_0(t) \exp\left(\color{var(--nord15)}{\mathbf x_i}^\top \boldsymbol\beta^{(tc)}\right)$$` **Likelihood** `$$p(T_i, \delta_i \mid \color{var(--nord15)}{\mathbf x_i}, \boldsymbol\beta^{(tc)}) = \left[h_0(T_i) \exp \left\{ \color{var(--nord15)}{\mathbf x_i}^\top \boldsymbol\beta^{(tc)}\right\}\right]^{\delta_i} \exp \left[-\int_0^{T_i} h_0(s)\exp\left( \color{var(--nord15)}{\mathbf x_i}^\top \boldsymbol\beta^{(tc)}\right)ds\right]$$` ??? But we can see the issue when we look at this simple proportional hazards model (I've excluded the time-varying part for now) and the corresponding likelihood for a subject `\(i\)`. This likelihood describes a complex and non-linear relationship between the response (in this case, the event time `\(T\)` and event indicator `\(\delta\)`) and incomplete covariates `\(x\)`.
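A common partial remedy in this setting (White & Royston, 2009) is to impute the covariates using the event indicator `\(\delta\)` and the Nelson–Aalen estimate of the cumulative hazard `\(\hat H(T)\)` instead of the raw event time; in R, the mice package provides a `nelsonaalen()` helper for this. Below is a rough numpy sketch of that estimator on hypothetical simulated data (ties are handled only approximately):

```python
import numpy as np

def nelson_aalen(time, delta):
    """Nelson-Aalen estimate of the cumulative hazard, evaluated at each
    subject's own (possibly censored) time; ties handled only approximately."""
    order = np.argsort(time, kind="stable")
    at_risk = len(time) - np.arange(len(time))   # risk-set size at each time
    increments = delta[order] / at_risk          # d_i / n_i at event times
    H = np.empty(len(time))
    H[order] = np.cumsum(increments)
    return H

# hypothetical survival data: hazard depends on x1, with random censoring
rng = np.random.default_rng(7)
n = 300
x1 = rng.normal(size=n)
t_event = rng.exponential(1.0 / np.exp(0.5 * x1))
t_cens = rng.exponential(2.0, size=n)
time = np.minimum(t_event, t_cens)
delta = (t_event <= t_cens).astype(float)

# imputation-model design matrix: delta and H(T) instead of the raw time
H = nelson_aalen(time, delta)
Z = np.column_stack([np.ones(n), delta, H])
```

This approximation works reasonably well in many cases, but it does not remove the fundamental mismatch between the imputation model and the likelihood above.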
- - - -- <p class = "smallbreak"> </p> .bold.red[Inconsistent] with the (naive) imputation model: `$$\color{var(--nord15)}{\mathbf x_1} = \theta_0 + \theta_1 \mathbf T + \theta_2 \boldsymbol \delta + \theta_3 \mathbf x_2 + \ldots$$` .footnote[ [<i class = "fas fa-file-alt"></i> White & Royston (2009)](https://doi.org/10.1002/sim.3618) [<i class = "fas fa-file-alt"></i> Bartlett et al. (2015)](https://doi.org/10.1177/0962280214521348) ] ??? But the regression model we would use to impute values in, say `\(x_1\)`, would look like this and imply a linear association. This inconsistency between the analysis model and imputation model will result in bias. [6:30 min] --- ## Multi-level Data .flex-grid[ .col[ <img src="index_files/figure-html/unnamed-chunk-11-1.png" width="100%" style="display: block; margin: auto;" /> ] .col[ <div style = "width: 300px;"> </div> <table class="simpletable2"> <tr> <th></th> <th>\(\mathbf y\)</th> <th>\(\color{var(--nord15)}{\mathbf x_1}\)</th> <th>\(\mathbf x_2\)</th> <th>\(\mathbf x_3\)</th> </tr> <tr><td></td><td colspan = "4"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr class="hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> 
<td><i class = "fas fa-check"</i></td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> </tr> <tr> <td class = "rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> </tr> </table> ] ] ??? We have multi-level data when we have measured the same variable repeatedly in the same patient, but also when we have a clustering structure in our data, for example in a multi-center study. In both cases, observations from the same patient, or the same cluster are not independent. We typically represent this type of data in long format, so that we now have multiple rows that belong to the same patient. --- ## Multi-level Data .footnote[ [<i class = "fas fa-file-alt"></i> Erler et al. 
(2016)](https://doi.org/10.1002/sim.6944) ] .flex-grid[ .col[ <div style = "width: 780px;"></div> For example: `$$p(\color{var(--nord15)}{\text{sex}} \mid \text{age}, ..., \underset{\text{time varying}}{\underbrace{\text{ALT}, \text{AST}, ...}})$$` {{content}} ] .col[ <table class="simpletable2">
<tr> <th></th> <th>age</th> <th>sex</th> <th>time</th> <th>ALT</th> <th>AST</th> <th>\(\ldots\)</th> </tr>
<tr><td></td><td colspan = "5" style = "padding: 0px;"><hr /></td></tr>
<tr> <td class="rownr"></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td>\(\ldots\)</td> </tr>
<tr> <td class="rownr"></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td>\(\ldots\)</td> </tr>
<tr class="hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"></i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td>\(\ldots\)</td> </tr>
<tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"></i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"></i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"></i></td> <td>\(\ldots\)</td> </tr>
<tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"></i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td>\(\ldots\)</td> </tr>
<tr> <td class="rownr"></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td>\(\ldots\)</td> </tr>
<tr> <td class="rownr"></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"></i></td> <td>\(\ldots\)</td> </tr>
<tr> <td class="rownr"></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td><i class = "fas fa-check"></i></td> <td>\(\ldots\)</td> </tr>
<tr> <td class = "rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td></td> </tr>
</table> ] ] ??? The other component of our data, the **longitudinally measured biomarkers, gives us an additional challenge** when dealing with the missing values in the baseline covariates. Because we have **unbalanced** biomarker measurements, our data is in **long format**, where multiple rows belong to the same patient. To impute missing values in baseline covariates, for example in `sex`, we need the distribution of that covariate, conditional on everything else, which includes the longitudinally measured biomarker values, for example `AST` and `ALT`. Given that `sex` does not change over time, the imputed values should also all be the same in the rows belonging to the same patient. - - - -- <i class = "fas fa-bolt red"></i> time-varying imputations for time-constant covariates!<br> <i class = "fas fa-bolt red"></i> ignore correlation between repeated observations? {{content}} ??? If we now applied the MICE algorithm to our data in long format, we would get **time-varying imputations for the time-constant covariates**. 
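This pitfall is easy to demonstrate. Below is a minimal numpy sketch (my own toy illustration, not the `mice` package or the hepatitis C data): drawing the imputation for a time-constant covariate independently in each row of long-format data produces within-patient inconsistency, whereas a single patient-level draw does not. The `alt` values and the fixed logistic curve are stand-ins for whatever imputation model would actually be fitted.

```python
import numpy as np

rng = np.random.default_rng(1)

# One patient in long format: many biomarker rows, but the
# (time-constant) covariate `sex` is missing in every row.
n_rows = 1000
alt = rng.normal(0.2, 1.0, size=n_rows)   # hypothetical ALT measurements

# Plug-in "imputation model": P(sex = 1 | ALT) via a fixed logistic curve.
p_row = 1 / (1 + np.exp(-alt))

# Naive row-wise imputation (what running FCS on long-format data amounts to):
# every row gets its own Bernoulli draw, so `sex` appears to vary over time.
sex_rowwise = rng.binomial(1, p_row)

# Patient-level imputation: one draw, shared by all rows of the patient.
sex_patient = np.full(n_rows, rng.binomial(1, 1 / (1 + np.exp(-alt.mean()))))

print(np.unique(sex_rowwise))   # expect both 0 and 1 within this one patient
print(np.unique(sex_patient))   # a single value
```

Note that averaging `sex_rowwise` afterwards would weight all ALT measurements equally, regardless of when they were taken.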
And if we're not careful, we **might also ignore** that rows belonging to the same patient are **correlated**. - - - -- <ul class = "arrowlist"> <li> <strong>Average</strong> imputed values?</li> <li> <strong>Summarize</strong> time-varying variables ⇨ wide format?</li> </ul> <p class = "smallbreak"> </p> Imputation of baseline covariates is not straightforward. ??? So what could we do? We could try to **average the repeated imputations** of baseline covariates, but that would imply that all biomarker values are equally related to baseline variables, irrespective of when they were measured (and in our data we have follow-up of more than 20 years...). Or we could somehow **summarize time-varying variables** to get our data into wide format. But neither strategy really captures the hierarchical structure of the data. [7:45 min] --- ## Non-linear Associations & Multi-level In settings with **non-linear associations** and **multi-level data** the **correct predictive distribution** $$ p(\color{var(--nord15)}{\mathbf x_{mis}} \mid \text{everything else})$$ may not have a closed form. <br> <div style="width: 85%; padding: 1.15em 2em; background-color: var(--nord0); margin: auto; display: block; text-align: center;"> <strong><span style="font-size: 1.5rem;">⇨</span> We cannot easily specify the correct imputation model directly.</strong> </div> ??? So, in summary, whenever we have a non-linear association between the response and incomplete covariates, including time-to-event outcomes, or when we have multi-level data, it is not straightforward to specify the correct imputation model, meaning the correct predictive distribution of the missing values, directly because usually they do not have a closed form. 
--- class: center, middle # A Fully Bayesian Alternative --- ## Getting the Correct Distribution .flex-grid[ .col[ * We need `\(\;p(\color{var(--nord15)}{\mathbf X_{mis}} \mid \mathbf y, \ldots)\)` * We know `\(\;p(\mathbf y \mid \color{var(--nord15)}{\mathbf X_{mis}}, \ldots)\)` ] .col[ <div style = "color: var(--nord3);"> <ul> <li>\(\mathbf X_{mis}\): incomplete covariate(s)</li> <li>\(\mathbf y\): outcome(s)</li> <li>\(\ldots\): everything else </li> </ul> </div> ] ] ??? The interesting question now is: How do we get this predictive distribution? What we need is the distribution of the incomplete variable `\(x\)` conditional on the response and everything else. And what we do know is the distribution of the response conditional on the covariates and everything else, because this is the analysis model that we have specified already. And here Bayes comes to the rescue. -- **Bayes Theorem:** `\begin{align} p(\color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta \mid \mathbf y, \mathbf X_{obs}) = \frac{p(\mathbf y, \mathbf X_{obs} \mid \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)\; p(\color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)}{p\left(\mathbf y, \mathbf X_{obs}\right)} &\propto \underset{\text{joint distribution}}{\underbrace{p(\mathbf y, \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)}} \end{align}` ??? Because Bayes theorem allows us to convert a conditional distribution into the reversed conditional, and so using this theorem we can derive that the distribution of interest, the distribution of the incomplete covariate conditional on everything else is proportional to the distribution of the response conditional on the covariates, and the distribution of the covariates and everything else. We can now factorize this further, but I find it easier to just look at it as a specification of the joint distribution of everything. ??? And so, in the Bayesian approach, we go a different route. 
From Bayes theorem, we can derive that the posterior distribution of the missing values and parameters `\(\theta\)` is proportional to the joint distribution of all our data, observed and unobserved, and the parameters. I use `\(y\)` here to represent the response of an analysis model of interest, whatever that might be. - - - -- <br> This joint distribution does not have a closed form either **⇨ factorization!** ??? This joint distribution does not have a closed-form either, but we can specify it conveniently as a product of conditional distributions. [8:30 min] --- ## Factorization of the Joint Distribution <br> `$$p(\mathbf y, \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta) = p(\mathbf y \mid \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)\;\; p(\mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}\mid \boldsymbol\theta)\;\; p(\boldsymbol\theta)$$` ??? We can, for instance, split it into the distribution for the response `\(y\)` conditional on the covariates, the distribution of the covariates, observed and missing, and the distribution of the parameter vector `\(\boldsymbol\theta\)`. - - - -- <div style = "position: relative; top: -66px; background-color: var(--nordbg);"> \[ p(\mathbf y, \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta) = \underset{\text{analysis model}}{\underbrace{p(\mathbf y \mid \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)}}\;\; \underset{\text{covariate model(s)}}{\underbrace{p(\mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}\mid \boldsymbol\theta)}}\;\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}} \] </div> ??? By factorizing the joint distribution like this, we have written it as the product of our analysis model, a model for the covariates, and the prior distributions for the parameters. The nice thing about this division is that it works irrespective of how complex our analysis model of interest is. 
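To make the "derive rather than specify" idea concrete, consider a deliberately simple toy case (my own example, not from the study): a linear analysis model `\(y_i = \beta x_i + \varepsilon_i\)` with `\(\varepsilon_i \sim N(0, \sigma^2)\)` and covariate model `\(x_i \sim N(\mu, \tau^2)\)`. Multiplying likelihood and covariate model and completing the square gives the imputation model in closed form:

`$$p(\color{var(--nord15)}{x_i} \mid y_i, \boldsymbol\theta) \propto p(y_i \mid \color{var(--nord15)}{x_i}, \boldsymbol\theta)\, p(\color{var(--nord15)}{x_i} \mid \boldsymbol\theta) \quad\Rightarrow\quad \color{var(--nord15)}{x_i} \mid y_i, \boldsymbol\theta \sim N\left(\frac{\mu/\tau^2 + \beta y_i/\sigma^2}{1/\tau^2 + \beta^2/\sigma^2},\; \frac{1}{1/\tau^2 + \beta^2/\sigma^2}\right)$$`

In more complex settings this full conditional has no closed form, but it can still be sampled from, which is exactly the route taken below.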
- - - -- <br> **For Example:** `$$p(\mathbf y, \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_3, \boldsymbol\theta) = \underset{\text{analysis model}}{\underbrace{p(\mathbf y \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_3, \boldsymbol\theta)}}\; \underset{\text{covariate model(s)}}{\underbrace{p(\color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_3 \mid \boldsymbol\theta)}}\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}}$$` ??? And we can use this principle for our example. So we can write the joint distribution of `\(y\)` and `\(x_1\)`, `\(x_2\)` and `\(x_3\)` as the conditional distribution of `\(y\)` given the covariates., the distribution of the covariates conditional on the parameters, and, of course, prior distributions for those parameters. --- ## Factorization of the Joint Distribution <br> `$$p(\mathbf y, \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta) = \underset{\text{analysis model}}{\underbrace{p(\mathbf y \mid \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)}}\;\; \underset{\text{covariate model(s)}}{\underbrace{p(\mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}\mid \boldsymbol\theta)}}\;\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}}$$` <br> **(Multivariate) joint model for longitudinal and survival data:** `$$p(\mathbf T, \boldsymbol\delta, \mathbf y, \color{var(--nord15)}{\mathbf X}, \mathbf b, \boldsymbol\theta) = \underset{\text{analysis model}}{\underbrace{ \underset{\substack{\text{survival}\\\text{model}}}{\underbrace{ p(\mathbf T, \boldsymbol\delta \mid \mathbf b, \color{var(--nord15)}{\mathbf X}, \boldsymbol \theta)}}\;\; \underset{\substack{\text{(multivariate)}\\\text{longitudinal}\\\text{model}}}{\underbrace{ p(\mathbf y \mid \mathbf b, \color{var(--nord15)}{\mathbf X}, \boldsymbol\theta)\; p(\mathbf b\mid \boldsymbol \theta)}} }}\; \bbox[#242933, 3pt, border: 2px solid 
#242933]{\underset{\substack{\text{covariate}\\\text{models}}}{\underbrace{ p(\color{var(--nord15)}{\mathbf X} \mid \boldsymbol\theta)}}}\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}}$$` ??? For example, for our multivariate joint model, the analysis model consists of multiple parts, the time-to-event sub-model and the model for the longitudinal biomarkers, which also involves the specification of the distribution of the random effects `\(\mathbf b\)`. --- count: false ## Factorization of the Joint Distribution <br> `$$p(\mathbf y, \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta) = \underset{\text{analysis model}}{\underbrace{p(\mathbf y \mid \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)}}\;\; \underset{\text{covariate model(s)}}{\underbrace{p(\mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}\mid \boldsymbol\theta)}}\;\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}}$$` <br> **(Multivariate) joint model for longitudinal and survival data:** `$$p(\mathbf T, \boldsymbol\delta, \mathbf y, \color{var(--nord15)}{\mathbf X}, \mathbf b, \boldsymbol\theta) = \underset{\text{analysis model}}{\underbrace{ \underset{\substack{\text{survival}\\\text{model}}}{\underbrace{ p(\mathbf T, \boldsymbol\delta \mid \mathbf b, \color{var(--nord15)}{\mathbf X}, \boldsymbol \theta)}}\;\; \underset{\substack{\text{(multivariate)}\\\text{longitudinal}\\\text{model}}}{\underbrace{ p(\mathbf y \mid \mathbf b, \color{var(--nord15)}{\mathbf X}, \boldsymbol\theta)\; p(\mathbf b\mid \boldsymbol \theta)}} }}\; \bbox[#2E3440, 3pt, border: 2px solid #88C0D0]{\underset{\substack{\text{covariate}\\\text{models}}}{\underbrace{ p(\color{var(--nord15)}{\mathbf X} \mid \boldsymbol\theta)}}}\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}}$$` ??? Most of this is straightforward to specify because it is the same that we'd need to specify when doing a Bayesian analysis on complete data. 
The only extra thing we need to do is specify the distribution of the covariates. --- ## Covariate Models <i class="fas fa-exclamation-triangle" style="color: var(--nord11);"></i> When `\(\color{var(--nord15)}{\mathbf X}\)` is multivariate & of mixed type ⇨ .bold.red[no closed form] <div style = "margin: auto; display: block; text-align: center;"> <strong>⇨ Factorization!</strong> </div> ??? But we still have a multivariate distribution with no closed form. The trick is to apply the same strategy as for the joint distribution and to specify it as a product of conditional distributions. - - - -- For example `\begin{align} p(\color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \color{var(--nord15)}{\mathbf x_3(t)}, \mathbf x_4(t) \mid \boldsymbol\theta) =\, & p(\color{var(--nord15)}{\mathbf x_3(t)} \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_4(t), \boldsymbol\theta) & \color{var(--nord3)}{\small\text{e.g., GLMM}}\\ & p(\mathbf x_4(t) \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \boldsymbol\theta) & \color{var(--nord3)}{\small\text{e.g., GLMM}}\\ & p(\color{var(--nord15)}{\mathbf x_1} \mid \mathbf x_2, \boldsymbol\theta) & \color{var(--nord3)}{\small\text{e.g., GLM}}\\ & \color{var(--nord3)}{p(\mathbf x_2 \mid \boldsymbol\theta)} & \color{var(--nord3)}{\small\text{(not needed)}} \end{align}` ??? For example, if we had 4 covariates, `\(x_1\)` and `\(x_3\)` having missing values, and `\(x_1\)` and `\(x_2\)` being baseline covariates while `\(x_3\)` and `\(x_4\)` are time-varying, we could specify the multivariate distribution using this sequence of models. Here I always condition on those other covariates for which I haven't specified a model yet. And when we are smart about the order in which we take the covariates, we can start with the models for time-varying covariates, because then we no longer need to include them in the linear predictors of the subsequent models for the baseline covariates. 
This solves our issue with the hierarchical structure of the data. - - - -- <p class = "smallbreak"> </p> <div style="width: 60%; padding: 1em 0 0.7em; background-color: var(--nord0); margin: auto; display: block; text-align: center; border: solid 2px var(--nord8); border-radius: 10px;"> covariate models \(\neq\) imputation models! </div> ??? An important thing to realize is that these models here are not the imputation models. They are still part of the specification of the joint distribution of all the data. [10:45 min] --- ## Imputation Models <div class = "container"> <div class = "box"> <div class = "box-row"> <div class = "box-cell" style = "background: var(--nord0);">imputation<br>models</div> <div class = "box-cell">\(\propto\)</div> <div class = "box-cell" style = "background: var(--nord0);">joint<br>distribution</div> <div class = "box-cell">\(=\)</div> <div class = "box-cell" style = "background: var(--nord0);">analysis<br>model</div> <div class = "box-cell" style = "background: var(--nord0);">covariate<br>models</div> <div class = "box-cell" style = "background: var(--nord0)">priors</div> </div> </div> </div> <img src = "graphics/Gibbs.png", height = 350; style = "position: absolute; left: 60px; bottom: 80px;"> <div style = "position: absolute; left: 430px; bottom: 250px;"> <span style = "font-weight: bold; color: var(--nord7);">Gibbs sampler</span> </div> ??? The imputation models are then derived as full-conditional distributions from this joint distribution in the same way we do it with the posterior distributions of the parameters, typically using Gibbs sampling, - - - -- <div style = "position: absolute; right: 60px; bottom: 350px;"> MICE / FCS </div> <img src = "graphics/MICE.png", height = 300; style = "position: absolute; bottom: 50px; right: 60px;"> ??? 
The important difference to the MICE algorithm is that in the Bayesian approach, we do specify the joint distribution and derive the imputation models as full conditionals from this joint distribution. The models are usually much more complex than the ones we specify when using MICE and can provide a sample from the joint distribution of the missing values, even in complex settings. In MICE, we specify the imputation models directly, which means they typically have a relatively simple structure, and they could be incompatible with each other and the analysis model. Hence they may not produce a joint distribution that approximates the correct joint distribution well enough. [11:30 min] -- <div class = "MH"> Metropolis-Hastings </div> --- ## Advantages of the Bayesian Approach `$$p(\mathbf y, \color{var(--nord15)}{\mathbf X}, \boldsymbol\theta) = \bbox[#2E3440, 5pt]{\underset{\text{analysis model}}{\underbrace{p(\mathbf y \mid \color{var(--nord15)}{\mathbf X}, \color{var(--nord14)}{\boldsymbol\theta_{y\mid x}})}}}\; p(\color{var(--nord15)}{\mathbf X} \mid \boldsymbol\theta_x)\; p(\color{var(--nord14}{\boldsymbol\theta_{y\mid x}}, \boldsymbol\theta_x)\qquad \color{var(--nord3)}{\text{with } \boldsymbol\theta = (\boldsymbol\theta_{y\mid x}, \boldsymbol\theta_x)}$$` * specification of the joint distribution<br> ⇨ assures **compatibility** of imputation models ??? This is one of the advantages of the Bayesian approach. Because we specify the joint distribution, we can ensure that all imputation models are compatible with each other. - - - -- * use of the analysis model in the specification of the joint distribution<br> ⇨ **parameters of interest** `\(\color{var(--nord14)}{\boldsymbol\theta_{y\mid x}}\)` are estimated directly<br> ⇨ **non-linear associations** are taken into account<br> ⇨ assures **congeniality** ??? 
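These mechanics can be sketched in a few lines of code (a toy model of my own, not JointAI or the joint model from the talk): analysis model `\(y_i = \beta x_i + \varepsilon_i\)` with `\(\varepsilon_i \sim N(0, 1)\)`, covariate model `\(x_i \sim N(0, 1)\)`, and prior `\(\beta \sim N(0, 10^2)\)`. Both full conditionals below are derived from this joint specification; no imputation model is specified directly.

```python
import numpy as np

rng = np.random.default_rng(42)

# Simulate data from the toy joint model, then delete 20% of x.
n, beta_true = 200, 2.0
x = rng.normal(size=n)
y = beta_true * x + rng.normal(size=n)
mis = np.zeros(n, dtype=bool)
mis[:40] = True

# Gibbs sampler: alternate between the two full conditionals
# derived from the joint distribution p(y, x, beta).
n_iter, burn = 3000, 500
x_cur = np.where(mis, 0.0, x)
beta_draws, x_mis_draws = [], []
for it in range(n_iter):
    # p(beta | y, x): normal, combining the N(0, 100) prior with the likelihood
    prec_b = 1 / 100 + x_cur @ x_cur
    mean_b = (x_cur @ y) / prec_b
    beta = rng.normal(mean_b, np.sqrt(1 / prec_b))
    # p(x_mis | y, beta): normal, combining the N(0, 1) covariate model
    # with the likelihood contribution of y (this IS the imputation model)
    prec_x = 1 + beta ** 2
    x_cur[mis] = rng.normal(beta * y[mis] / prec_x, np.sqrt(1 / prec_x))
    if it >= burn:
        beta_draws.append(beta)
        x_mis_draws.append(x_cur[mis].copy())

beta_hat = np.mean(beta_draws)          # posterior mean, close to beta_true = 2
x_mis_hat = np.mean(x_mis_draws, axis=0)
```

The parameter of interest and the imputations come out of the same run, which is the "simultaneous analysis and imputation" idea that JointAI (via JAGS) implements for far more complex models.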
And because the analysis model is part of the specification of the joint distribution, we can be sure that the imputation models are also compatible with the analysis model, even when there are non-linear associations with covariates. In addition, the parameters of interest, so the parameters of the analysis model, are directly estimated. We do not have separate steps for imputation and analysis, but everything happens simultaneously in the same procedure. - - - -- * response is not in any linear predictor<br> ⇨ no problem to use **complex outcomes** ??? Because we can specify the joint distribution so that one of the conditional models is the analysis model, we can usually make sure that the outcome does not need to appear in the linear predictor of models for covariates. This makes it possible to use this approach in settings where we have complex outcomes that could not be easily included in the linear predictor of some or all of the covariates. [12:30 min] --- ## In practice: <i class="fab fa-r-project"></i> Package JointAI .footnote[ [<i class = "fas fa-file-alt"></i> Erler et al. 
(2021)](https://doi.org/10.18637/jss.v100.i20) [<i class = "fas fa-globe-americas"></i> nerler.github.io/JointAI](https://nerler.github.io/JointAI) ] <div class = "container"> <div class = "box" style = "margin-left: 20px;"> <div class = "box-row"> <div class = "box-cell" style="width:180px;"> <strong>Specification</strong><br>of the joint<br> distribution: </div> <div class = "box-cell" style = "background: var(--nord0);"> <div style = "padding-bottom: 10px;"> <strong>User</strong></div> <div class = "box-cell" style = "background: var(--nord1); border: 2px solid var(--nord8); ">analysis<br>model</div> </div> <div class = "box-cell" style = "background: var(--nord0);"> <div style = "padding-bottom: 10px;"> <strong>JointAI</strong></div> <div class = "box-row"> <div class = "box-cell" style = "background: var(--nord1);">covariate<br>models</div> <div class = "box-cell" style = "background: var(--nord1)">priors</div> </div></div> </div></div></div> <a href = "https://nerler.github.io/JointAI/"> <img src = "graphics/JointAI.png" style = "position: absolute; right: 62px; top: 117px; width: 165px;"></a> ??? To use this Bayesian approach, for example, to fit the multivariate joint model in the motivating example, you can use the R package JointAI, short for Joint Analysis and Imputation. To specify the joint distribution, the user only has to specify the analysis model of interest, and the software sets defaults for the covariate models and priors. Several different analysis models are available, and the specification is almost the same as for standard functions in R like `lm()` or `glm()`. 
- - - -- <br> <div class = "container"> <div class = "box" style = "margin-left: 20px;"> <div class = "box-row"> <div class = "box-cell" style="width:180px;"> <strong>Estimation:</strong> </div> <div class = "box-cell" style = "background: var(--nord0);"> <div style = "padding-bottom: 10px;"> <strong>JointAI</strong></div> <div class = "box-cell" style = "background: var(--nord1);">pre-processing</div> </div> <div class = "box-cell" style = "background: var(--nord0);"> <div style = "padding-bottom: 10px;"> <strong>rjags / JAGS</strong></div> <div class = "box-cell" style = "background: var(--nord1);">MCMC sampling</div> </div> <div class = "box-cell" style = "background: var(--nord0);"> <div style = "padding-bottom: 10px;"> <strong>JointAI</strong></div> <div class = "box-cell" style = "background: var(--nord1);">post-processing</div> </div> </div></div></div> ??? The model estimation is done via MCMC with the help of JAGS, an external and freely available software that does Gibbs sampling. 
[13:15] --- ## JointAI: Model Types & Specification .flex-grid[ .col[ **Univariate Models:** * `lm_imp()` * `glm_imp()` * `clm_imp()` * `mlogit_imp()` * `betareg_imp()` * `lognorm_imp()` ] .col[ **Mixed Models** * `lme_imp()` * `glme_imp()` * `clmm_imp()` * `mlogitmm_imp()` * `betamm_imp()` * `lognormmm_imp()` ] .col[ **Survival Models** * `coxph_imp()` * `survreg_imp()` * `JM_imp()` ] ] **For Example:** ```r mod <- lm_imp(SBP ~ gender + age + alc * creat, data = NHANES, n.iter = 200) ``` <p class = "smallbreak"> </p> .footnote[ [Full documentation at https://nerler.github.io/JointAI/](https://nerler.github.io/JointAI/) ] --- class: center, middle # Summary --- ## In Comparison <table class = "ppt" style = "margin-top: -20px;"> <tr> <th style = "text-align: center;"></th> <th style = "width: 25%;">MICE</th> <th style = "width: 30%;">Joint Model MI</th> <th>Bayesian Analysis</th> </tr> <tr> <td></td> <td colspan = "2" style = "text-align: center;"> <strong>separate</strong> imputation & analysis</td> <td><strong>simultaneous</strong> analysis & imputation</td> </tr> <tr> <td></td> <td colspan = "2" style = "text-align: center;"><strong>direct</strong> specification of imputation model</td> <td><strong>indirect</strong> specification of the imputation model</td> </tr> <tr style = "padding-top: 0px;"> <td style = "text-align: center;"><i class="fas fa-thumbs-up fa-2x" style = "color: var(--nord10)"></i></td> <td><ul> <li>simple settings</li> </ul></td> <td><ul> <li>simple settings</li> <li>multi-level data</li> </ul></td> <td><ul> <li>non-linear associations</li> <li>time-to-event outcomes</li> <li>multi-level data</li> <li>more complex analyses</li></ul></td> </tr> <tr> <td style = "text-align: center;"><i class="fas fa-thumbs-down fa-2x" style = "color: var(--nord10)"></i></td> <td> <ul> <li>non-linear associations</li> <li>complex outcomes / data structure</li> </ul></td> <td><ul> <li>non-linear associations</li> <li>many incomplete variables / complex random 
effects</li> </ul> </td> <td><ul> <li> very large datasets</li> </ul></td> </tr> <tr style = "text-align: center;"> <td><i class="fab fa-r-project fa-2x" style = "color: var(--nord10);"></i></td> <td><span style = "color: var(--nord10); font-weight: bold;">mice</span></td> <td><span style = "color: var(--nord10); font-weight: bold;">jomo</span></td> <td><span style = "color: var(--nord10); font-weight: bold;">JointAI</span></td> </tr> </table> ??? MICE and joint model imputation are two different options to perform the imputation step in a multiple imputation procedure. This means that in both cases the imputation is completely separate from the analysis. This separation can be convenient when the same incomplete data is used in multiple analyses, but it also introduces the risk of having imputation models that are not compatible with the analysis, for example, when there is a non-linear association in the analysis model. In the Bayesian approach, analysis and imputation are combined. This combination assures that the imputation and analysis models do not contradict each other. It is, however, possible to extract the imputed values sampled in the Bayesian approach, so this method can also serve as the imputation step in a multiple imputation procedure. In MICE and joint model imputation, we specify the imputation models directly. This is what makes these approaches difficult to use when the incomplete variables do not just have simple linear associations with the other variables. In the Bayesian approach, we specify the likelihood for the data, but the imputed values are sampled from the posterior distribution that is derived from the likelihood and the prior. With all the advanced sampling techniques available nowadays, this also works when the posterior does not have a closed form, so this approach is well suited for settings with complex associations, while MICE is better suited for simpler settings. 
Joint model imputation can handle multi-level settings, but assumes linear associations between all sub-models. Because the Bayesian approach is more computationally intensive, it may be less well suited for very large datasets. All three approaches are available in R, as the R packages mice, jomo and JointAI. --- ## Summary Imputation requires sampling from `\(p(\color{var(--nord15)}{\mathbf{\mathcal D}_{mis}} \mid \mathbf{\mathcal D}_{obs})\)`. -- **Multiple** Imputation: * Practical approach to handle the uncertainty about the missing values:<br> ⇨ Sample just a "few" values from `\(p(\color{var(--nord15)}{\mathbf{\mathcal D}_{mis}} \mid \mathbf{\mathcal D}_{obs})\)`. * Pooling to take into account within- & between-imputation variation. -- <p class = "smallbreak"> </p> For models that involve * survival outcomes * other non-linear associations * multi-level data it is **difficult to specify** `\(p(\color{var(--nord15)}{\mathbf{\mathcal D}_{mis}} \mid \mathbf{\mathcal D}_{obs})\)` directly. --- ## Summary **MICE / FCS**<br> Approximate `\(p(\color{var(--nord15)}{\mathbf{\mathcal D}_{mis}} \mid \mathbf{\mathcal D}_{obs})\)` by sampling from directly specified full conditionals `\(p(\color{var(--nord15)}{\mathbf x_k} \mid \cdot)\)`. <ul class = "arrowlist"> <li>incompatibility / uncongeniality issues</li> <li>bias</li> </ul> ??? MICE tries to approximate this distribution using the idea of the Gibbs sampler. But because the directly specified full conditionals may not be a good approximation of the correct full conditionals, it has issues with incompatibility and uncongeniality in complex settings and may result in bias. - - - -- <p class = "smallbreak"> </p> <strong>Fully Bayesian approach</strong> <ul> <li>Specification of the joint distribution \(p(\color{var(--nord15)}{\mathbf{\mathcal D}_{mis}}, \mathbf{\mathcal D}_{obs}, \boldsymbol \theta)\) via factorization.</li> <li>Derive \(p(\color{var(--nord15)}{\mathbf x_k} \mid \cdot)\) from this joint distribution. 
<ul class = "arrowlist"> <li>theoretically valid, compatible & congenial</li> </ul> </li> </ul> <span style = "float: right;"> .sgrey[... MAR, model fit, other model assumptions, ...] </span> ??? Using a fully Bayesian approach, we can specify the joint distribution as a sequence of conditional models that gives us a number of advantages. The imputation models are then derived from this joint distribution, resulting in a theoretically valid approach that assures compatibility and congeniality. --- class: the-end, center, middle layout: true count: false # Thank you for your attention! <div class="contact"> <i class="fas fa-envelope"></i> <a href="mailto:n.erler@erasmusmc.nl" class="email">n.erler@erasmusmc.nl</a>  <a href="https://twitter.com/N_Erler" target="_blank"><i class="fab fa-twitter"></i> N_Erler</a>  <a href="https://github.com/NErler" target="_blank"><i class="fab fa-github"></i> NErler</a>  <a href="https://nerler.com" target="_blank"><i class="fas fa-globe-americas"></i> https://nerler.com</a> </div> --- count: false