class: left, top, title-slide
Dealing with Missing Values in Multivariate Joint Models for Longitudinal and Survival Data
IBC2022 Riga, July 2022
Nicole Erler
Erasmus Medical Center
n.erler@erasmusmc.nl
N_Erler
NErler
https://nerler.com
--- count: false layout: true <div class="my-footer"><span> <a href="https://twitter.com/N_Erler"><i class="fab fa-twitter"></i> N_Erler</a>      <a href="https://github.com/NErler"><i class="fab fa-github"></i> NErler</a>      <a href = "https://nerler.com"><i class="fas fa-globe-americas"></i> nerler.com</a> </span></div> --- count: false ## Conflict of Interest <br><br> I have no current or past relationships with commercial entities. ??? I do not have any conflicts of interest. --- ## Motivation: Chronic Hepatitis C * N `\(\approx\)` 700 patients with chronic hepatitis C * follow-up: `\(\leq\)` 38 years * several **baseline variables**: * age, sex, smoking, BMI, ... * **0 - 19% missing values**   ⇨ 66% complete cases ??? This project is motivated by a study in patients with a **chronic hepatitis C infection**. In many patients, the hepatitis has caused severe liver damage, and they will need a transplant. In light of the scarcity of donor organs, it is important to **identify factors that indicate a patient's risk** for liver failure. We have observational data from approx. **700 patients**. In this data, several **variables are only measured once**, at baseline. This includes the patient's **age at baseline**, **sex**, and the **year of diagnosis**, as well as variables like **alcohol consumption**, **smoking**, or **BMI**, which could change over time, but were only measured once. In some of these variables, we have **missing values** for **up to 19% of the patients**. If we'd **exclude** everyone with missing values in these variables, we would **lose about a third of the data**. - - - -- * multiple **biomarkers**<br> (repeatedly measured) <img src="figures/plot_nobs.png", width = "650", style="position:absolute; bottom:45px; right:60px;"> ??? We also have several **biomarkers** that are **repeatedly measured**, but since the data is not from a prospective study, the measurements are **very irregular**. The number of measurements per patient is shown here in the histogram. The **median** number of measurement times per patient is **15**, but there are a few patients with just **one single observation** and even one patient with **> 700 measurement** occasions. --- ## Missing Biomarker Values <img src="figures/HCVlongmissing.png" width="100%" style="display: block; margin: auto;" /> ??? In the **plots** here, you see the trajectories of 6 different biomarkers of interest for 7 patients. The values are standardized to have similar scales for the purpose of this visualization. In many cases, the biomarkers are **measured at the same time points**, but it also happens that one biomarker value is missing or was measured at another time. Observations that we would lose if we'd **restrict our data** to only those time points at which all 6 markers are measured are shown as transparent. For the full dataset, this would result in the loss of 50% of the biomarker observations. [2:00 min] --- ## Multivariate Joint Model **Proportional hazards model** for time until event: `$$h_i(t) = h_0(t)\;\exp\left(\underset{\substack{\text{time}\\\text{constant}} }{\underbrace{\mathbf x_i^\top \boldsymbol\beta^{(tc)}}} + \underset{\text{time varying}}{\underbrace{\sum_{k=1}^K \eta_{ki}(t)^\top\beta^{(tv)}_k}}\right)$$` ??? To analyse this data, we'd like to use a multivariate joint model. This model consists of **two parts**, a **proportional hazards** model for the time-to-event outcome and a **multivariate mixed model** for the longitudinal biomarkers.
The **proportional hazards** model has the usual form. In the linear predictor, we have some variables that are **time constant**, our baseline covariates, and we have the **time-varying biomarker values**. - - - -- **Longitudinal (mixed) model** for each biomarker `\(k = 1, ... K\)`: `\begin{align*} \mathbb E(y_{ki}(t)\mid \mathbf b_{ki}) &= \eta_{ki}(t)\\ &= \underset{\text{fixed effects}}{\underbrace{\mathbf x_{ki}(t)^\top\boldsymbol\beta^{(k)}}} + \underset{\text{random effects}}{\underbrace{\mathbf z_{ki}(t)^\top \mathbf b_{ki}}} \qquad\scriptsize\text{with } \begin{pmatrix}\mathbf b_{1i}\\\vdots\\\mathbf b_{Ki}\end{pmatrix} \sim N(\mathbf 0, \mathbf D) \end{align*}` ??? Each biomarker is modelled using a mixed model. - - - -- <div style = "border: solid 3pt var(--nord14); width: 270px; height: 130px; position: absolute; right: 170px; bottom: 65px;"> </div> ??? The **random effects** of the models for the different biomarkers are **modelled jointly** in a multivariate normal distribution. - - - -- <div style = "border: solid 10pt var(--nordbg); width: 255px; height: 115px; position: absolute; right: 168px; bottom: 63px;"> </div> <div style = "border: solid 3pt var(--nord14); width: 80px; height: 60px; position: absolute; left: 405px; bottom: 172px;"> </div> <div style = "position: absolute; top: 170px; right: 430px; color: var(--nord14); font-weight: 500; font-size: 3rem;"> ⇩ </div> ??? By using the linear predictors `\(\eta\)` of the longitudinal models as covariates in the time-to-event model, we assume that the underlying value of the biomarkers, which may be measured with error, is associated with the risk of an event at the same time point. It is, of course, possible to assume different association structures, for example, a time lag or a cumulative effect, but since the focus here is on the missing values, we'll keep it simple. --- count: false ## Multivariate Joint Model **Proportional hazards model** for time until event: `$$h_i(t) = h_0(t)\;\exp\left(\underset{\substack{\text{time}\\\text{constant}} }{\underbrace{\bbox[#3B4252, 2px]{\color{var(--nord15)}{\mathbf x_i}^\top} \boldsymbol\beta^{(tc)}}} + \underset{\text{time varying}}{\underbrace{\sum_{k=1}^K \eta_{ki}(t)^\top\beta^{(tv)}_k}}\right)$$` **Longitudinal (mixed) model** for each biomarker `\(k = 1, ... K\)`: `\begin{align*} \mathbb E(y_{ki}(t)\mid \mathbf b_{ki}) &= \eta_{ki}(t)\\ &= \underset{\text{fixed effects}}{\underbrace{\bbox[#3B4252, 2px]{\color{var(--nord15)}{\mathbf x_{ki}(t)}^\top}\boldsymbol\beta^{(k)}}} + \underset{\text{random effects}}{\underbrace{\mathbf z_{ki}(t)^\top \mathbf b_{ki}}} \qquad\scriptsize\text{with } \begin{pmatrix}\mathbf b_{1i}\\\vdots\\\mathbf b_{Ki}\end{pmatrix} \sim N(\mathbf 0, \mathbf D) \end{align*}` ??? In both model parts, we might want to use **baseline covariates**. However, in our motivating data, those variables **have missing values**. The missing values in the longitudinal biomarkers are not that interesting at this point because we are fitting models for those variables anyway. And under ignorable missingness, the mixed models we use provide unbiased estimates. But for the missing baseline covariate values, we need to perform some sort of imputation. [3:22 min] --- ## Imputation of Missing Values .footnote[ [<i class = "fas fa-book"></i> Rubin (2004)](https://doi.org/10.1198/000313004X6355) ] **Rubin (in the 1970s):** * **uncertainty** about the missing value * some values **more likely** than others * relationship with **other** available **data** ??? 
The basic ideas about imputation go back to Donald Rubin and his work in the 1960s and 70s. The important issue in working with missing values is the **uncertainty** about what the value would have been. And so we **can't just pick** a single value and fill it in because then we would ignore this uncertainty. But we have some knowledge about the missing values. Usually, some values will be more likely than others, and often there is a relationship between the incomplete variable and the other data. - - - -- <div style = "position: absolute; right: 60px; top: 140px; width: 500px;"> <strong>⇨</strong> missing values have a <strong>distribution</strong> <img src="index_files/figure-html/unnamed-chunk-5-1.png" width="60%" style="display: block; margin: auto;" /> <!-- <img src="figures/ImpDens.png", height = 250, style = "margin: auto; display: block;"> --> </div> ??? So, it makes sense to assume that missing values have a distribution and that we need a model to describe how an incomplete variable is related to the other data. - - - - -- <br><br><br> .hlgt-box[ Impute from the **predictive distribution** of the missing values given the observed values: `$$p(\color{var(--nord15)}{\mathbf X_{mis}}\mid \mathbf X_{obs})$$` ] ??? And this means that we can impute the missing values by **sampling from the predictive distribution of the missing values conditional on the observed data**. [4:00 min] --- ## Multivariate Missingness .flex-grid[ .col[ <div style = "text-align: center; margin-bottom: 25px; font-weight: 600;"> Multivariate<br>Missingness</div> <table class="simpletable2"> <tr> <th></th> <th>\(\mathbf y\)</th> <th>\(\color{var(--nord15)}{\mathbf x_1}\)</th> <th>\(\color{var(--nord15)}{\mathbf x_2}\)</th> <th>\(\color{var(--nord15)}{\mathbf x_3}\)</th> <th>\(\ldots\)</th> </tr> <tr><td></td><td colspan = "5"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <th>\(\ldots\)</th> </tr> <tr> <td class = "rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td></td> </tr> </table> <br> <div style = "display: block; margin: auto;"> <ul class = "arrowlist"> <li>\(p(\color{var(--nord15)}{\mathbf X_{mis}}\mid \mathbf X_{obs})\) is<br> <strong>multivariate</strong></li> </ul> </div> ] .col[ {{content}} ] ] ??? In practice, we usually have missing values in multiple variables, which means that this **predictive distribution is multivariate**. - - - -- <p style = "margin-top: -10px;"> When covariates are of <strong>mixed type</strong>:<br> <span class = "red bold">no closed form</span>. 
</p> <div style = "width: 630px;"> ⇨ approximate \(p(\color{var(--nord15)}{\mathbf X_{mis}}\mid \mathbf X_{obs})\) </div> {{content}} <br> <i class="fas fa-exclamation-triangle fa-2x" style = "color: var(--nord11); position: absolute; top: 150px; right: 720px;"></i> ??? When we also have covariates of mixed type, meaning continuous and categorical, this multivariate distribution does not have a closed-form, and so we need to approximate it. - - - -- **Most common approach:**<br> <span class = "n4"> <span class = "n6 bold">M</span><span class = "s">ultivariate</span> <span class = "n6 bold">I</span><span class = "s">mputation by</span> <span class = "n6 bold">C</span><span class = "s">hained</span> <span class = "n6 bold">E</span><span class = "s">quations</span> </span><br> <span class = "n4"> <span class = "n6 bold">F</span><span class = "s">ully</span> <span class = "n6 bold">C</span><span class = "s">onditional</span> <span class = "n6 bold">S</span><span class = "s">pecification</span> </span><br> Based on the idea of the **Gibbs sampler**<br> ⇨ iterative sampling from the **full-conditionals**: <div class = "s"> \begin{alignat}{10} \color{var(--nord15)}{\mathbf x_1} &= \theta^{(1)}_0 &+& \theta^{(1)}_1 \mathbf y &+& \theta^{(1)}_2 \color{var(--nord15)}{\mathbf x_2} &+& \theta^{(1)}_3 \color{var(--nord15)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\ \color{var(--nord15)}{\mathbf x_2} &= \theta^{(2)}_0 &+& \theta^{(2)}_1 \mathbf y &+& \theta^{(2)}_2 \color{var(--nord15)}{\mathbf x_1} &+& \theta^{(2)}_3 \color{var(--nord15)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\ \color{var(--nord15)}{\mathbf x_3} &= \theta_0^{(3)} &+& \theta_1^{(3)} \mathbf y &+& \theta_2^{(3)} \color{var(--nord15)}{\mathbf x_1} &+& \theta_3^{(3)} \color{var(--nord15)}{\mathbf x_2} &+& \ldots &+& \boldsymbol\varepsilon \end{alignat} </div> {{content}} ??? The most common approach to handle such multivariate missingness is the MICE algorithm, which is also called imputation using a **fully conditional specification** because that is exactly what it does. It follows the idea of the Gibbs sampler to obtain a sample from the multivariate distribution by iterative sampling from the full conditional distribution of each of the incomplete variables. **In practice**, these full conditionals are typically specified using **regression models** that include **all other variables in the linear predictor**. An implication of this is that we assume linear associations between the dependent variables, here, the incomplete variables, and the independent variables, in our case the other covariates and the response of our analysis model of interest. [5:15 min] --- ## Non-linear Associations * .bold[true association]: non-linear * .bold.n7[imputation assumption]: linear <img src="figures/p_obs.png", height = 350, style = "position: absolute; right: 226px; bottom: 97px;"> ??? But that is, of course, not always the case. We might, for example, have a quadratic association between the response `\(y\)` and the incomplete covariate `\(x_1\)`. - - - -- <img src="figures/p_imp.png", height = 350, style = "position: absolute; right: 226px; bottom: 97px;"> ??? If we now impute missing values in `\(x_1\)` using a standard regression model, all the imputed values follow a linear relationship. 
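To make this concrete, here is a minimal sketch with purely simulated data (hypothetical variable names, not the hepatitis C study) of how a default linear imputation model distorts a quadratic association:

```r
library(mice)

## Simulated example: the true model for y is quadratic in x1,
## but x1 is imputed with a linear (normal) imputation model.
set.seed(2022)
n  <- 500
x1 <- rnorm(n)
y  <- 1 + 2 * x1 + 1.5 * x1^2 + rnorm(n)
dat <- data.frame(y = y, x1 = x1)
dat$x1[sample(n, 150)] <- NA                # 30% of x1 missing

imp <- mice(dat, method = "norm", m = 10, printFlag = FALSE)

## Analysis model with the correct quadratic term, fitted on each
## imputed data set and pooled: the coefficient of I(x1^2) is
## typically attenuated because the imputations assumed linearity.
fit <- with(imp, lm(y ~ x1 + I(x1^2)))
summary(pool(fit))
```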
- - - -- <div style = "position: absolute; right: 380px; top: 120px;"> <span style="font-size: 56pt; position: relative; right: 110px; bottom: 20px;">} ⇨</span> <span style = "color: var(--nord11); font-size: 1.2rem; font-weight: bold; position: relative; bottom: 30px; right: 100px;"> bias!</span> </div> <img src="figures/p_bias.png", height = 350, style = "position: absolute; right: 226px; bottom: 97px;"> ??? And even if we correctly assume the non-linear association in the analysis model that we fit on the imputed data, we will get biased results. I'm talking about a quadratic effect here because I can visualize it more easily than the non-linear relationship implied by the survival part of our joint model. --- ## Time-to-Event Outcomes (Simple) **Proportional Hazards Model:** `$$h_i(t) = h_0(t) \exp\left(\color{var(--nord15)}{\mathbf x_i}^\top \boldsymbol\beta^{(tc)}\right)$$` **Likelihood:** `$$p(T_i, \delta_i \mid \color{var(--nord15)}{\mathbf x_i}, \boldsymbol\beta^{(tc)}) = \left[h_0(T_i) \exp \left\{ \color{var(--nord15)}{\mathbf x_i}^\top \boldsymbol\beta^{(tc)}\right\}\right]^{\delta_i} \exp \left[-\int_0^{T_i} h_0(s)\exp\left( \color{var(--nord15)}{\mathbf x_i}^\top \boldsymbol\beta^{(tc)}\right)ds\right]$$` ??? But we can see the issue when we look at this simple proportional hazards model (I've excluded the time-varying part for now) and the corresponding likelihood for a subject `\(i\)`. This likelihood describes a complex and non-linear relationship between the response (in this case, the event time `\(T\)` and event indicator `\(\delta\)`) and incomplete covariates `\(x\)`. - - - -- <p class = "smallbreak"> </p> .bold.red[Inconsistent] with the (naive) imputation model: `$$\color{var(--nord15)}{\mathbf x_1} = \theta_0 + \theta_1 \mathbf T + \theta_2 \boldsymbol \delta + \theta_3 \mathbf x_2 + \ldots$$` .footnote[ [<i class = "fas fa-file-alt"></i> White & Royston (2009)](https://doi.org/10.1002/sim.3618) [<i class = "fas fa-file-alt"></i> Bartlett et al. (2015)](https://doi.org/10.1177/0962280214521348) ] ??? But the regression model we would use to impute values in, say `\(x_1\)`, would look like this and imply a linear association. This inconsistency between the analysis model and imputation model will result in bias. [6:30 min] --- ## Multi-level Data .footnote[ [<i class = "fas fa-file-alt"></i> Erler et al.
(2016)](https://doi.org/10.1002/sim.6944) ] .flex-grid[ .col[ <div style = "width: 780px;"></div> For example: `$$p(\color{var(--nord15)}{\text{sex}} \mid \text{age}, ..., \underset{\text{time varying}}{\underbrace{\text{ALT}, \text{AST}, ...}})$$` {{content}} ] .col[ <table class="simpletable2"> <tr> <th></th> <th>age</th> <th>sex</th> <th>time</th> <th>ALT</th> <th>AST</th> <th>\(\ldots\)</th> </tr> <tr><td></td><td colspan = "5"; style = "padding: 0px;"><hr /></td><tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td>\(\ldots\)</td> </tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td>\(\ldots\)</td> </tr> <tr class="hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td>\(\ldots\)</td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td>\(\ldots\)</td> </tr> <tr class = "hlgt-row"> <td class="rownr">\(i\)</td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td>\(\ldots\)</td> </tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td>\(\ldots\)</td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td style="color: var(--nord15);"><i class = "fas fa-question"></i></td> <td><i class = "fas fa-check"</i></td> <td>\(\ldots\)</td> </tr> <tr> <td class="rownr"></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td><i class = "fas fa-check"</i></td> <td>\(\ldots\)</td> </tr> <tr> <td class = "rownr"></td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td>\(\vdots\)</td> <td></td> </tr> </table> ] ] ??? But also the other component of our data, the **longitudinally measured biomarkers, give us an additional challenge** when dealing with the missing values in the baseline covariates. Because we have **unbalanced** biomarker measurements, our data is in **long format** where multiple rows belong to the same patient. To impute missing values in baseline covariates, for example, in `sex`, we need the distribution of that covariate, conditional on everything else, which includes the longitudinally measured biomarker values, for example, `AST` and `ALT`. 
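As a rough sketch of what goes wrong when this multi-level structure is ignored (toy simulated data, hypothetical variable names), default row-wise imputation treats each row of a patient separately:

```r
library(mice)

## Toy long-format data: 5 rows per patient, 'sex' missing for the
## first 10 patients (purely illustrative, not the study data).
set.seed(1)
n_pat <- 50
n_obs <- 5
long <- data.frame(
  id   = rep(seq_len(n_pat), each = n_obs),
  sex  = factor(rep(sample(c("female", "male"), n_pat, replace = TRUE),
                    each = n_obs)),
  time = rep(seq_len(n_obs), times = n_pat),
  ALT  = rnorm(n_pat * n_obs, mean = 40, sd = 10)
)
long$sex[long$id <= 10] <- NA

## Row-wise imputation that ignores the grouping by 'id':
imp  <- mice(long, method = c("", "logreg", "", ""), m = 1,
             printFlag = FALSE)
comp <- complete(imp)

## Number of distinct imputed values of 'sex' per incomplete patient
## (often 2, although sex is constant within a patient):
tapply(comp$sex, comp$id, function(x) length(unique(x)))[1:10]
```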
Given that `sex` does not change over time, the imputed values should also all be the same in the rows belonging to the same patient. - - - -- <i class = "fas fa-bolt red"></i> time-varying imputations for time-constant covariates!<br> <i class = "fas fa-bolt red"></i> ignore correlation between repeated observations? {{content}} ??? If we would now apply the MICE algorithm to our data in long format, we'd get **time-varying imputations for the time-constant covariates**. And if we're not careful, we **might also ignore** that rows belonging to the same patient are **correlated**. - - - -- <ul class = "arrowlist"> <li> <strong>Average</strong> imputed values?</li> <li> <strong>Summarize</strong> time-varying variables ⇨ wide format?</li> </ul> <p class = "smallbreak"> </p> Imputation of baseline covariates is not straightforward. ??? So what could we do? We could try to **average the repeated imputations** of baseline covariates, but that would imply that all biomarker values are equally related to baseline variables, irrespective of when they were measured (and in our data we have follow-up of more than 20 years...). Or we could somehow **summarize time-varying variables** to get our data into wide format. But neither strategy really captures the hierarchical structure of the data. [7:45 min] --- ## Non-linear Associations & Multi-level * non-linear associations * multi-level data <div style="font-size: 56pt; position: absolute; left: 430px; top: 100px;">}</div> <div style = "position: absolute; right: 280px; top: 120px; width: 450px;"> Direct specification of <strong>appropriate full-conditionals</strong> is difficult! </div> ??? So, in summary, because our time-to-event component implies complex non-linear associations and we have multi-level data, it is not straightforward to specify appropriate full-conditional distributions to impute incomplete covariates. The imputation models typically specified in the MICE algorithm will, in this case, not result in a joint distribution that approximates the correct multivariate distribution of the missing values given the observed values sufficiently well. - - - -- <br> **Bayes Theorem:** `\begin{align} p(\color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta \mid \mathbf y, \mathbf X_{obs}) = \frac{p(\mathbf y, \mathbf X_{obs} \mid \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)\; p(\color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)}{p\left(\mathbf y, \mathbf X_{obs}\right)} &\propto \underset{\text{joint distribution}}{\underbrace{p(\mathbf y, \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)}} \end{align}` ??? And so, in the Bayesian approach, we go a different route. From Bayes theorem, we can derive that the posterior distribution of the missing values and parameters `\(\theta\)` is proportional to the joint distribution of all our data, observed and unobserved, and the parameters. I use `\(y\)` here to represent the response of an analysis model of interest, whatever that might be. - - - -- <br> This joint distribution does not have a closed form either **⇨ factorization!** ??? This joint distribution does not have a closed-form either, but we can specify it conveniently as a product of conditional distributions. 
[8:30 min] --- ## Factorization of the Joint Distribution <br> `$$p(\mathbf y, \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta) = p(\mathbf y \mid \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)\;\; p(\mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}\mid \boldsymbol\theta)\;\; p(\boldsymbol\theta)$$` ??? We can, for instance, split it into the distribution for the response `\(y\)` conditional on the covariates, the distribution of the covariates, observed and missing, and the distribution of the parameter vector `\(\boldsymbol\theta\)`. - - - -- <div style = "position: relative; top: -60px; background-color: var(--nordbg);"> \[ p(\mathbf y, \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta) = \underset{\text{analysis model}}{\underbrace{p(\mathbf y \mid \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)}}\;\; \underset{\text{covariate model(s)}}{\underbrace{p(\mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}\mid \boldsymbol\theta)}}\;\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}} \] </div> ??? By factorizing the joint distribution like this, we have written it as the product of our analysis model, a model for the covariates, and the prior distributions for the parameters. The nice thing about this division is that it works irrespective of how complex our analysis model of interest is. - - - -- <p class = "smallbreak"> </p> **(Multivariate) joint model for longitudinal and survival data:** `$$p(\mathbf T, \boldsymbol\delta, \mathbf y, \color{var(--nord15)}{\mathbf X}, \mathbf b, \boldsymbol\theta) = \underset{\text{analysis model}}{\underbrace{ \underset{\substack{\text{survival}\\\text{model}}}{\underbrace{ p(\mathbf T, \boldsymbol\delta \mid \mathbf b, \color{var(--nord15)}{\mathbf X}, \boldsymbol \theta)}}\;\; \underset{\substack{\text{(multivariate)}\\\text{longitudinal}\\\text{model}}}{\underbrace{ p(\mathbf y \mid \mathbf b, \color{var(--nord15)}{\mathbf X}, \boldsymbol\theta)\; p(\mathbf b\mid \boldsymbol \theta)}} }}\; \bbox[#242933, 3pt, border: 2px solid #242933]{\underset{\substack{\text{covariate}\\\text{models}}}{\underbrace{ p(\color{var(--nord15)}{\mathbf X} \mid \boldsymbol\theta)}}}\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}}$$` ??? For example, for our multivariate joint model, the analysis model consists of multiple parts, the time-to-event sub-model and the model for the longitudinal biomarkers, which also involves the specification of the distribution of the random effects `\(\mathbf b\)`. 
--- count: false ## Factorization of the Joint Distribution <br> `$$p(\mathbf y, \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta) = p(\mathbf y \mid \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)\;\; p(\mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}\mid \boldsymbol\theta)\;\; p(\boldsymbol\theta)$$` <div style = "position: relative; top: -60px; background-color: var(--nordbg);"> \[ p(\mathbf y, \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta) = \underset{\text{analysis model}}{\underbrace{p(\mathbf y \mid \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)}}\;\; \underset{\text{covariate model(s)}}{\underbrace{p(\mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}\mid \boldsymbol\theta)}}\;\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}} \] </div> <p class = "smallbreak"> </p> **(Multivariate) joint model for longitudinal and survival data:** `$$p(\mathbf T, \boldsymbol\delta, \mathbf y, \color{var(--nord15)}{\mathbf X}, \mathbf b, \boldsymbol\theta) = \underset{\text{analysis model}}{\underbrace{ \underset{\substack{\text{survival}\\\text{model}}}{\underbrace{ p(\mathbf T, \boldsymbol\delta \mid \mathbf b, \color{var(--nord15)}{\mathbf X}, \boldsymbol \theta)}}\;\; \underset{\substack{\text{(multivariate)}\\\text{longitudinal}\\\text{model}}}{\underbrace{ p(\mathbf y \mid \mathbf b, \color{var(--nord15)}{\mathbf X}, \boldsymbol\theta)\; p(\mathbf b\mid \boldsymbol \theta)}} }}\; \bbox[#2E3440, 3pt, border: 2px solid #88C0D0]{\underset{\substack{\text{covariate}\\\text{models}}}{\underbrace{ p(\color{var(--nord15)}{\mathbf X} \mid \boldsymbol\theta)}}}\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}}$$` ??? Most of this is straightforward to specify because it is the same that we'd need to specify when doing a Bayesian analysis on complete data. The only extra thing we need to do is specify the distribution of the covariates. --- ## Covariate Models <i class="fas fa-exclamation-triangle" style="color: var(--nord11);"></i> When `\(\color{var(--nord15)}{\mathbf X}\)` is multivariate & of mixed type ⇨ .bold.red[no closed form] <div style = "margin: auto; display: block; text-align: center;"> <strong>⇨ Factorization!</strong> </div> ??? But we still have a multivariate distribution with no closed-form. The trick is to apply the same strategy as for the joint distribution and to specify it as a product of conditional distributions. - - - -- For example `\begin{align} p(\color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \color{var(--nord15)}{\mathbf x_3(t)}, \mathbf x_4(t) \mid \boldsymbol\theta) =\, & p(\color{var(--nord15)}{\mathbf x_3(t)} \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_4(t), \boldsymbol\theta) & \color{var(--nord3)}{\small\text{e.g., GLMM}}\\ & p(\mathbf x_4(t) \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \boldsymbol\theta) & \color{var(--nord3)}{\small\text{e.g., GLMM}}\\ & p(\color{var(--nord15)}{\mathbf x_1} \mid \mathbf x_2, \boldsymbol\theta) & \color{var(--nord3)}{\small\text{e.g., GLM}}\\ & \color{var(--nord3)}{p(\mathbf x_2 \mid \boldsymbol\theta)} & \color{var(--nord3)}{\small\text{(not needed)}} \end{align}` ??? For example, if we had 4 covariates, `\(x_1\)` and `\(x_3\)` having missing values, and `\(x_1\)` and `\(x_2\)` being baseline covariates while `\(x_3\)` and `\(x_4\)` are time-varying, we could specify the multivariate distribution using this sequence of models. 
Here I always condition on those other covariates for which I haven't specified a model yet. And when we are smart about the order in which we take the covariates, we can start with the models for time-varying covariates because then we no longer need to include them in the linear predictors of the subsequent models for the baseline covariates. This solves our issue with the hierarchical structure of the data. - - - -- <br> <div style="width: 60%; padding: 1.15em 2em; background-color: var(--nord0); margin: auto; display: block; text-align: center; border: solid 2px var(--nord8); border-radius: 10px;"> covariate models \(\neq\) imputation models! </div> ??? An important thing to realize is that these models here are not the imputation models. They are still part of the specification of the joint distribution of all the data. [10:45 min] --- ## Imputation Models <div class = "container"> <div class = "box"> <div class = "box-row"> <div class = "box-cell" style = "background: var(--nord0);">imputation<br>models</div> <div class = "box-cell">\(\propto\)</div> <div class = "box-cell" style = "background: var(--nord0);">joint<br>distribution</div> <div class = "box-cell">\(=\)</div> <div class = "box-cell" style = "background: var(--nord0);">analysis<br>model</div> <div class = "box-cell" style = "background: var(--nord0);">covariate<br>models</div> <div class = "box-cell" style = "background: var(--nord0)">priors</div> </div> </div> </div> <img src = "graphics/Gibbs.png", height = 350; style = "position: absolute; left: 60px; bottom: 80px;"> <div style = "position: absolute; left: 430px; bottom: 250px;"> <span style = "font-weight: bold; color: var(--nord7);">Gibbs sampler</span> </div> ??? The imputation models are then derived as full-conditional distributions from this joint distribution, in the same way as for the posterior distributions of the parameters, typically using Gibbs sampling. - - - -- <div style = "position: absolute; right: 60px; bottom: 350px;"> MICE / FCS </div> <img src = "graphics/MICE.png", height = 300; style = "position: absolute; bottom: 50px; right: 60px;"> ??? The important difference from the MICE algorithm is that in the Bayesian approach, we do specify the joint distribution and derive the imputation models as full conditionals from this joint distribution. The models are usually much more complex than the ones we specify when using MICE and can provide a sample from the joint distribution of the missing values, even in complex settings. In MICE, we specify the imputation models directly, which means they typically have a relatively simple structure, and they could be incompatible with each other and the analysis model. Hence they may not produce a joint distribution that approximates the correct joint distribution well enough. [11:30 min] --- ## Advantages of the Bayesian Approach `$$p(\mathbf y, \color{var(--nord15)}{\mathbf X}, \boldsymbol\theta) = \bbox[#2E3440, 5pt]{\underset{\text{analysis model}}{\underbrace{p(\mathbf y \mid \color{var(--nord15)}{\mathbf X}, \color{var(--nord14)}{\boldsymbol\theta_{y\mid x}})}}}\; p(\color{var(--nord15)}{\mathbf X} \mid \boldsymbol\theta_x)\; p(\color{var(--nord14)}{\boldsymbol\theta_{y\mid x}}, \boldsymbol\theta_x)\qquad \color{var(--nord3)}{\text{with } \boldsymbol\theta = (\boldsymbol\theta_{y\mid x}, \boldsymbol\theta_x)}$$` * specification of the joint distribution<br> ⇨ assures **compatibility** of imputation models ??? This is one of the advantages of the Bayesian approach.
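To make this explicit for the four-covariate example from before (with a generic analysis model for `\(\mathbf y\)`): the full conditional used to impute `\(\color{var(--nord15)}{\mathbf x_1}\)` is proportional to the product of all factors of the joint distribution in which `\(\color{var(--nord15)}{\mathbf x_1}\)` appears,

`$$p(\color{var(--nord15)}{\mathbf x_1} \mid \cdot) \propto p(\mathbf y \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \color{var(--nord15)}{\mathbf x_3(t)}, \mathbf x_4(t), \boldsymbol\theta)\; p(\color{var(--nord15)}{\mathbf x_3(t)} \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_4(t), \boldsymbol\theta)\; p(\mathbf x_4(t) \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \boldsymbol\theta)\; p(\color{var(--nord15)}{\mathbf x_1} \mid \mathbf x_2, \boldsymbol\theta),$$`

so the analysis model is one of its factors, and any non-linear way in which `\(\color{var(--nord15)}{\mathbf x_1}\)` enters the analysis model is automatically carried over into its imputation.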
Because we specify the joint distribution, we can ensure that all imputation models are compatible with each other. - - - -- * use of the analysis model in the specification of the joint distribution<br> ⇨ **parameters of interest** `\(\color{var(--nord14)}{\boldsymbol\theta_{y\mid x}}\)` are estimated directly<br> ⇨ **non-linear associations** are taken into account<br> ⇨ assures **congeniality** ??? And because the analysis model is part of the specification of the joint distribution, we can be sure that the imputation models are also compatible with the analysis model, even when there are non-linear associations with covariates. In addition, the parameters of interest, so the parameters of the analysis model, are directly estimated. We do not have separate steps for imputation and analysis, but everything happens simultaneously in the same procedure. - - - -- * response is not in any linear predictor<br> ⇨ no problem to use **complex outcomes** ??? Because we can specify the joint distribution so that one of the conditional models is the analysis model, we can usually make sure that the outcome does not need to appear in the linear predictor of models for covariates. This makes it possible to use this approach in settings where we have complex outcomes that could not be easily included in the linear predictor of some or all of the covariates. [12:30 min] --- ## In practice: <i class="fab fa-r-project"></i> Package JointAI .footnote[ [<i class = "fas fa-file-alt"></i> Erler et al. (2021)](https://doi.org/10.18637/jss.v100.i20) [<i class = "fas fa-globe-americas"></i> nerler.github.io/JointAI](https://nerler.github.io/JointAI) ] <div class = "container"> <div class = "box" style = "margin-left: 20px;"> <div class = "box-row"> <div class = "box-cell" style="width:180px;"> <strong>Specification</strong><br>of the joint<br> distribution: </div> <div class = "box-cell" style = "background: var(--nord0);"> <div style = "padding-bottom: 10px;"> <strong>User</strong></div> <div class = "box-cell" style = "background: var(--nord1); border: 2px solid var(--nord8); ">analysis<br>model</div> </div> <div class = "box-cell" style = "background: var(--nord0);"> <div style = "padding-bottom: 10px;"> <strong>JointAI</strong></div> <div class = "box-row"> <div class = "box-cell" style = "background: var(--nord1);">covariate<br>models</div> <div class = "box-cell" style = "background: var(--nord1)">priors</div> </div></div> </div></div></div> <a href = "https://nerler.github.io/JointAI/"> <img src = "graphics/JointAI.png" style = "position: absolute; right: 62px; top: 117px; width: 165px;"></a> ??? To use this Bayesian approach, for example, to fit the multivariate joint model in the motivating example, you can use the R package JointAI, short for Joint Analysis and Imputation. To specify the joint distribution, the user only has to specify the analysis model of interest, and the software sets defaults for the covariate models and priors. Several different analysis models are available, and the specification is almost the same as for standard functions in R like `lm()` or `glm()`. 
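As a rough sketch of what such a specification could look like for a model in the spirit of the motivating example (hypothetical data set and variable names; argument names and defaults should be checked against the JointAI documentation):

```r
library(JointAI)
library(survival)  # for Surv()

## Hypothetical long-format data 'dat' with patient id 'id', event
## time 'etime', event indicator 'event', measurement time 'time',
## incomplete baseline covariates, and two biomarkers.
fit <- JM_imp(
  list(
    ## survival sub-model; listing the biomarkers here links the two
    ## parts (see the package documentation for association options)
    Surv(etime, event) ~ age + sex + smoking + bmi + albumin + bili,
    ## longitudinal sub-models, one per biomarker
    albumin ~ time + age + sex,
    bili ~ time + age + sex
  ),
  random   = ~ time | id,   # random effects of the longitudinal models
  timevar  = "time",
  data     = dat,
  n.iter   = 5000,
  n.chains = 3
)

summary(fit)     # posterior summaries of the analysis-model parameters
traceplot(fit)   # MCMC convergence checks
```

The covariate models for the incomplete baseline covariates and the priors are then added by the package with default settings.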
- - - -- <br> <div class = "container"> <div class = "box" style = "margin-left: 20px;"> <div class = "box-row"> <div class = "box-cell" style="width:180px;"> <strong>Estimation:</strong> </div> <div class = "box-cell" style = "background: var(--nord0);"> <div style = "padding-bottom: 10px;"> <strong>JointAI</strong></div> <div class = "box-cell" style = "background: var(--nord1);">pre-processing</div> </div> <div class = "box-cell" style = "background: var(--nord0);"> <div style = "padding-bottom: 10px;"> <strong>rjags / JAGS</strong></div> <div class = "box-cell" style = "background: var(--nord1);">MCMC sampling</div> </div> <div class = "box-cell" style = "background: var(--nord0);"> <div style = "padding-bottom: 10px;"> <strong>JointAI</strong></div> <div class = "box-cell" style = "background: var(--nord1);">post-processing</div> </div> </div></div></div> ??? The model estimation is done via MCMC with the help of JAGS, an external and freely available software that does Gibbs sampling. [13:15] --- ## Summary Multivariate Joint Models involve * survival outcome ⇨ non-linear association * multi-level data <div style = "position: absolute; right: 60px; top: 150px; width: 25%"> ⇨ difficult to specify \(p(\color{var(--nord15)}{\mathbf X_{mis}} \mid \mathbf X_{obs})\) directly </div> ??? In summary, we’ve seen that both the survival component and the hierarchical structure of joint models make it difficult to directly specify the correct distribution from which missing covariate values should be sampled. - - - -- <div class = "sbox" style = "left: 60px;"> <strong>MICE/FCS</strong><br> <ul> <li>approximate \(p(\color{var(--nord15)}{\mathbf X_{mis}} \mid \mathbf X_{obs})\) by (direct) sampling from full-conditionals \(p(\color{var(--nord15)}{\mathbf x_k} \mid \cdot)\)</li> </ul> <ul class = "arrowlist"> <li>incompatibility / uncongeniality issues</li> <li>bias</li> </ul> </div> ??? MICE tries to approximate this distribution using the idea of the Gibbs sampler. But because the full-conditionals may not be a good approximation of the correct full conditionals, it has issues with incompatibility and uncongeniality in complex settings and may result in bias. - - - -- <div class = "sbox" style = "right: 60px;"> <strong>Fully Bayesian approach</strong> <ul> <li>specification of the joint distribution via factorization</li> <li>derive \(p(\color{var(--nord15)}{\mathbf x_k} \mid \cdot)\) from the joint distribution</li> </ul> <ul class = "arrowlist"> <li>theoretically valid, compatible & congenial</li> </ul> </div> ??? Using a fully Bayesian approach, we can specify the joint distribution as a sequence of conditional models that gives us a number of advantages. The imputation models are then derived from this joint distribution, resulting in a theoretically valid approach that assures compatibility and congeniality. --- class: the-end, center, middle layout: true count: false # Thank you for your attention! <div class="contact"> <i class="fas fa-envelope"></i> <a href="mailto:n.erler@erasmusmc.nl" class="email">n.erler@erasmusmc.nl</a>  <a href="https://twitter.com/N_Erler" target="_blank"><i class="fab fa-twitter"></i> N_Erler</a>  <a href="https://github.com/NErler" target="_blank"><i class="fab fa-github"></i> NErler</a>  <a href="https://nerler.com" target="_blank"><i class="fas fa-globe-americas"></i> https://nerler.com</a> </div> --- count: false <script type="text/javascript" async src="https://cdnjs.cloudflare.com/ajax/libs/mathjax/2.7.7/MathJax.js?config=TeX-MML-AM_CHTML"> </script>