When and Why Imputation with MICE / FCS Might Fail
Nicole Erler
Assistant Professor
Department of Biostatistics

Outline

  • Motivation / Setting the Stage
  • (Multiple) Imputation & MICE / FCS
  • Potential Issues with MICE
  • A Fully Bayesian Alternative
  • Summary
https://nerler.com/#talks


Setting the Stage


Motivation: Chronic Hepatitis C

  • N ≈ 700 patients with chronic hepatitis C
  • follow-up: 38 years
  • several baseline variables:
    • age, sex, smoking, BMI, ...
    • 0 - 19% missing values   ⇨ 66% complete cases
  • multiple biomarkers
    (repeatedly measured)

  • Time-to-event Outcome:
    Clinical events
    (death, transplant, ...)

This project is motivated by a study in patients with a chronic hepatitis C infection.

In many patients, the hepatitis has caused severe liver damage, and they will need a transplant. In light of the scarcity of donor organs, it is important to identify factors that indicate a patient's risk for liver failure.

We have observational data from approx. 700 patients. In this data, several variables are only measured once, at baseline. This includes the patient's age at baseline and sex and the year of diagnosis, as well as variables like alcohol consumption, smoking, or BMI, which could change over time, but were only measured once.

In some of these variables, we have missing values for up to 19% of the patients. If we excluded everyone with missing values in these variables, we would lose about a third of the data.


We also have several biomarkers that are repeatedly measured, but since the data is not from a prospective study, the measurements are very irregular. The number of measurements per patient is shown here in the histogram.

The median number of measurement times per patient is 15, but there are a few patients with just one single observation and even one patient with > 700 measurement occasions.

Missing Biomarker Values


In the plots here, you see the trajectories of 6 different biomarkers of interest for 7 patients. The values are standardized to have similar scales for the purpose of this visualization.

In many cases, the biomarkers are measured at the same time points, but it also happens that one biomarker value is missing or was measured at another time.

Observations that we would lose if we restricted our data to only those time points at which all 6 markers are measured are shown as transparent.

For the full dataset, this would result in the loss of 50% of the biomarker observations.

[2:00 min]

Analysis: Multivariate Joint Model

Proportional hazards model for time until event:
$$h_i(t) = h_0(t)\;\exp\left(\underbrace{\mathbf x_i^\top \boldsymbol\beta^{(tc)}}_{\text{time constant}} + \underbrace{\sum_{k=1}^K \eta_{ki}(t)^\top\beta^{(tv)}_k}_{\text{time varying}}\right)$$

Longitudinal (mixed) model for each biomarker $k = 1, \ldots, K$:
$$\begin{aligned}
\mathbb E(y_{ki}(t)\mid \mathbf b_{ki}) &= \eta_{ki}(t)\\
&= \underbrace{\mathbf x_{ki}(t)^\top\boldsymbol\beta^{(k)}}_{\text{fixed effects}} + \underbrace{\mathbf z_{ki}(t)^\top \mathbf b_{ki}}_{\text{random effects}}
\qquad\text{with } \begin{pmatrix}\mathbf b_{1i}\\\vdots\\\mathbf b_{Ki}\end{pmatrix} \sim N(\mathbf 0, \mathbf D)
\end{aligned}$$

To analyse this data, we'd like to use a multivariate joint model.

This model consists of two parts, a proportional hazards model for the time-to-event outcome and a multivariate mixed model for the longitudinal biomarkers.

The proportional hazards model has the usual form. In the linear predictor, we have some variables that are time constant, our baseline covariates, and we have the time-varying biomarker values.


Each biomarker is modelled using a mixed model.


The random effects of the models for the different biomarkers are modelled jointly in a multivariate normal distribution.


By using the linear predictors $\eta_{ki}(t)$ of the longitudinal models as covariates in the time-to-event model, we assume that the underlying value of the biomarkers, which may be measured with error, is associated with the risk of an event at the same time point.

It is, of course, possible to assume different association structures, for example, a time lag or a cumulative effect, but since the focus here is on the missing values, we'll keep it simple.

Analysis: Multivariate Joint Model

Proportional hazards model for time until event:
$$h_i(t) = h_0(t)\;\exp\left(\underbrace{\boxed{\mathbf x_i^\top}\,\boldsymbol\beta^{(tc)}}_{\text{time constant}} + \underbrace{\sum_{k=1}^K \eta_{ki}(t)^\top\beta^{(tv)}_k}_{\text{time varying}}\right)$$

Longitudinal (mixed) model for each biomarker $k = 1, \ldots, K$:
$$\begin{aligned}
\mathbb E(y_{ki}(t)\mid \mathbf b_{ki}) &= \eta_{ki}(t)\\
&= \underbrace{\boxed{\mathbf x_{ki}(t)^\top}\,\boldsymbol\beta^{(k)}}_{\text{fixed effects}} + \underbrace{\mathbf z_{ki}(t)^\top \mathbf b_{ki}}_{\text{random effects}}
\qquad\text{with } \begin{pmatrix}\mathbf b_{1i}\\\vdots\\\mathbf b_{Ki}\end{pmatrix} \sim N(\mathbf 0, \mathbf D)
\end{aligned}$$

In both model parts, we might want to use baseline covariates. However, in our motivating data, those variables have missing values.

[3:22 min]

How are Missing Data Handled?

Shivasabesan et al. (2018)

  • all trauma registry-based manuscripts (2015 - 2017)
  • 241/539 (45%) described how missing data was handled:
    • 234 (43%) complete case analysis (CCA)
    • 18 (3%) multiple imputation (MI)
    • 34 (6%) combination of CCA & MI

 

Carroll et al. (2020)

  • observational time-to-event studies in oncology using prop. hazards model (2012 - 2018)
  • approx. 130/148 (88%) reported on the missing data method
    • 79 (53%) complete case analysis
    • 33 (22%) multiple imputation

Missing data are a widespread issue, especially in observational data. But how are they usually dealt with?

It is not easy to get a good overview of how missing values are handled in practice.

But there are a few studies that have looked into how missing data are handled and reported in specific types of studies or fields of application.


A systematic review that was published in the clinical journal "Injury" investigated missing data in trauma registries.

About half of the 539 included papers described how missing data was handled. The most common approach was complete case analysis, followed by a combination of complete case analysis and multiple imputation, and then multiple imputation alone.


In 2020, Carroll and colleagues published their investigation on missing data methods in observational time-to-event studies that used proportional hazards models. They included 148 studies, of which most reported on how missing data was handled, and, again, half of the studies performed complete case analyses, and the next most popular method was multiple imputation.

How are Missing Data Handled?

Hunt et al. (2021)

  • multi-database pharmacoepidemiologic studies (2018 - 2019)
  • 19/62 (31%) reported the missing data method:
    • 13 (21%) complete case analysis
    • 4 (6%) multiple imputation

Even more recently, a review of multi-database studies in pharmacoepidemiology found that, in that area of research, only one third of the studies reported the missing data method. There, too, complete case analysis was the most commonly reported method, followed by multiple imputation.


More and more clinical researchers are aware that in most settings complete case analysis is not appropriate, and multiple imputation is somehow considered the universal remedy to the missing data problem.

Complete Case Analysis


The issue with complete case analysis is that it is, at the very least, inefficient.

Depending on the number of variables that have missing values and the proportion of values missing, you may lose quite a large proportion of your data, especially in observational studies that require the inclusion of many covariates to reduce the bias due to confounding.

I have listed here a couple of resources on the issues associated with complete case analysis.
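
As a rough illustration of how quickly complete cases can dwindle, the sketch below computes the expected proportion of complete cases when several variables have missing values. It assumes, purely for illustration, that missingness is independent across variables. The function name and the example rates are hypothetical and are not taken from the hepatitis C data.

```python
from math import prod


def expected_complete_fraction(missing_rates):
    """Expected share of complete cases if missingness is independent across variables.

    Independence is an illustrative assumption; in real data, missingness
    in different variables is often correlated.
    """
    return prod(1.0 - r for r in missing_rates)


# Hypothetical example: five covariates, each with 10% missing values
frac = expected_complete_fraction([0.10] * 5)
print(f"{frac:.2f}")  # prints 0.59
```

Even with a modest 10% of values missing per variable, only about 59% of the cases would be expected to be complete in this hypothetical setting.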

(Multiple) Imputation & MICE / FCS


Development of (Multiple) Imputation

In the 1960s/70s:

Increasing number of missing values in the US census:
⇨ Need to solve the missing data problem.


Idea / Requirement
⇨ fix the missing data problem once (centrally)
⇨ supply completed data to many analysts


Imputation of Missing Values

Rubin (in the 1970s):

  • uncertainty about the missing value
  • some values more likely than others
  • relationship with other available data
missing values have a distribution
Bayesian Concept!!!




Impute from the predictive distribution of the missing values given the observed values: $p(\mathcal D_{mis} \mid \mathcal D_{obs})$

The basic ideas about imputation go back to Donald Rubin and his work in the 1960s and 70s.

The important issue in working with missing values is the uncertainty about what the value would have been. And so we can't just pick a single value and fill it in because then we would ignore this uncertainty.

But we have some knowledge about the missing values. Usually, some values will be more likely than others, and often there is a relationship between the incomplete variable and the other data.


So, it makes sense to assume that missing values have a distribution and that we need a model to describe how an incomplete variable is related to the other data.


And this means that we can impute the missing values by sampling from the predictive distribution of the missing values conditional on the observed data.

[4:00 min]

Multiple Imputation


The principle of multiple imputation is simple:

  • We impute each missing value multiple times, in order to take into account the uncertainty that we have about the missing value, and thereby create a set of completed datasets.

  • Each of these datasets can then be analysed by standard complete data methods, and the results of these separate analyses are then pooled to obtain an overall result.

One often cited advantage of this approach is that we can run this imputation once, and then re-use the imputed data for multiple analyses performed by different researchers.
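
The pooling step is commonly carried out with Rubin's rules. The slides do not spell these out, so the following is a minimal sketch under that assumption, for a single scalar parameter. The function name and the example numbers are hypothetical.

```python
import numpy as np


def pool_rubin(estimates, variances):
    """Pool one scalar parameter across m imputed datasets using Rubin's rules."""
    q = np.asarray(estimates, dtype=float)   # point estimates, one per imputed dataset
    u = np.asarray(variances, dtype=float)   # squared standard errors, one per dataset
    m = len(q)
    q_bar = q.mean()                         # pooled point estimate
    w = u.mean()                             # within-imputation variance
    b = q.var(ddof=1)                        # between-imputation variance
    t = w + (1 + 1 / m) * b                  # total variance
    return q_bar, t


# Hypothetical estimates and squared standard errors from m = 5 analyses
est, total_var = pool_rubin([1.0, 1.2, 0.9, 1.1, 0.8],
                            [0.04, 0.05, 0.04, 0.05, 0.04])
```

The between-imputation variance is what carries the uncertainty about the missing values into the pooled result, which is why a single imputation would understate the total variance.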

Imputation Step


[Schematic data matrix with columns $\mathbf y$, $\mathbf x_1$, $\mathbf x_2$, $\mathbf x_3$ and rows indexed by $i$]

  • $\mathbf y$: response variable
  • $\mathbf x_1$: incomplete covariate
  • $\mathbf x_2, \mathbf x_3$: complete covariates

Sampling from the predictive distribution:

  • Define a model for the incomplete variable $\mathbf x_1$, e.g. $\mathbf x_1 = \beta_0 + \beta_1 \mathbf y + \beta_2 \mathbf x_2 + \beta_3 \mathbf x_3 + \varepsilon$, to get estimates $\boldsymbol{\hat\beta}$ (and $\hat\sigma$)
  • Sample $\hat{x}_{i1}$ from the predictive distribution $p(x_{i1} \mid y_i, x_{i2}, x_{i3}, \boldsymbol{\hat\beta}, \hat\sigma)$

The most interesting part of this multiple imputation is the imputation step.

Missing values should be imputed from their predictive distribution.

The idea for imputation of x_1 is similar to predicting the missing values.

We can think of this as fitting a model to the cases where x_1 is observed. For example, this could be a linear regression model for x_1 that has the response y and the other covariates as its predictors.

From this model, we can get estimates for the regression coefficients $\boldsymbol\beta$ and the standard deviation of the residuals $\sigma$.


And using these parameter estimates we can then calculate the fitted or predicted value of x_1 for those cases where x_1 is missing.
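
The two steps on this slide can be sketched as follows for a continuous x_1. This is a simplified illustration on simulated toy data, not the implementation of any particular package: the function `impute_once` and all variable names are hypothetical, and the parameter estimates are treated as fixed, so the uncertainty about them is not propagated.

```python
import numpy as np


def impute_once(y, x1, x2, x3, rng):
    """Draw one set of imputations for x1 from a fitted linear model.

    Simplification: beta and sigma are fixed at their estimates, as on the
    slide; the uncertainty in these estimates is not propagated here.
    """
    miss = np.isnan(x1)
    X = np.column_stack([np.ones_like(y), y, x2, x3])
    # Fit the imputation model on the cases where x1 is observed
    beta_hat, *_ = np.linalg.lstsq(X[~miss], x1[~miss], rcond=None)
    resid = x1[~miss] - X[~miss] @ beta_hat
    dof = (~miss).sum() - X.shape[1]
    sigma_hat = np.sqrt(resid @ resid / dof)
    # Sample from N(fitted value, sigma_hat^2) instead of plugging in the fitted value
    x1_imp = x1.copy()
    x1_imp[miss] = X[miss] @ beta_hat + rng.normal(0.0, sigma_hat, miss.sum())
    return x1_imp


# Simulated toy data (hypothetical), with about 20% of x1 set to missing
rng = np.random.default_rng(1)
n = 200
x2, x3 = rng.normal(size=n), rng.normal(size=n)
x1_true = 0.5 * x2 - 0.3 * x3 + rng.normal(size=n)
y = 1.0 + x1_true + x2 + rng.normal(size=n)
x1 = x1_true.copy()
x1[rng.random(n) < 0.2] = np.nan
x1_completed = impute_once(y, x1, x2, x3, rng)
```

Calling `impute_once` repeatedly gives different imputed values each time, which is what distinguishes sampling from the predictive distribution from simply filling in the fitted value.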

Multivariate Missingness

[Schematic data matrix with columns $\mathbf y$, $\mathbf x_1$, $\mathbf x_2$, $\mathbf x_3$, $\ldots$, in which $\mathbf x_1$, $\mathbf x_2$ and $\mathbf x_3$ are incomplete]

  • $p(\mathcal D_{mis} \mid \mathcal D_{obs})$ is multivariate

When covariates are of mixed type: no closed form.

⇨ approximate $p(\mathcal D_{mis} \mid \mathcal D_{obs})$

"Original" approach:
Joint Model (Multiple) Imputation

  • Approximate $p(\mathcal D_{mis} \mid \mathcal D_{obs})$ with a known multivariate distribution.

Most common approach:
Multivariate Imputation by Chained Equations (MICE) /
Fully Conditional Specification (FCS)

  • Based on the idea of the Gibbs sampler

In practice, we usually have missing values in multiple variables, which means that this predictive distribution is multivariate.


When we also have covariates of mixed type, meaning continuous and categorical, this multivariate distribution does not have a closed-form, and so we need to approximate it.


Joint Model MI

For example, assume all incomplete variables and the outcome are (latent) normal. For a binary $\mathbf x_k$, the latent variable $\hat{\mathbf x}_k$ is standard normal, with
$$\mathbf x_k = \begin{cases} 1 & \text{if } \hat{\mathbf x}_k \geq \kappa_k\\ 0 & \text{if } \hat{\mathbf x}_k < \kappa_k \end{cases}$$

$p(\mathcal D_{mis} \mid \mathcal D_{obs})$ is multivariate normal
⇨ easy to sample from

R package jomo.
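
The latent-normal representation of a binary variable can be illustrated with a short sketch. It only shows the thresholding idea from this slide and is not how jomo is implemented. The helper names and the 30% prevalence are hypothetical.

```python
import numpy as np
from statistics import NormalDist


def threshold_for(prob_one):
    """Threshold kappa such that P(latent >= kappa) = prob_one for a standard normal latent."""
    return NormalDist().inv_cdf(1.0 - prob_one)


def to_binary(latent, kappa):
    """Map latent normal values to the binary variable: 1 if latent >= kappa, else 0."""
    return (np.asarray(latent) >= kappa).astype(int)


kappa = threshold_for(0.3)             # hypothetical: 30% of subjects have x_k = 1
rng = np.random.default_rng(0)
latent = rng.standard_normal(10_000)   # draws of the latent variable
x_k = to_binary(latent, kappa)         # roughly 30% of these are 1
```

Because the latent variable is normal, it can be modelled jointly with the continuous variables in one multivariate normal distribution, and imputed latent values are mapped back to categories via the threshold.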


MICE / FCS

Iterative sampling from the full-conditionals.

  • Specify a model for each incomplete variable with all other variables in the linear predictor.
  • Draw random starting values from the observed data.
  • Cycle through all incomplete variables until convergence.
Example:
\mathbf y \color{var(--nord15)}{\mathbf x_1} \color{var(--nord15)}{\mathbf x_2} \color{var(--nord15)}{\mathbf x_3} \ldots

\ldots
i \ldots
\ldots
\ldots
\vdots \vdots \vdots \vdots

\begin{alignat}{10} \color{var(--nord15)}{\mathbf x_1} &= \theta^{(1)}_0 &+& \theta^{(1)}_1 \mathbf y &+& \theta^{(1)}_2 \color{var(--nord15)}{\mathbf x_2} &+& \theta^{(1)}_3 \color{var(--nord15)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\ \color{var(--nord15)}{\mathbf x_2} &= \theta^{(2)}_0 &+& \theta^{(2)}_1 \mathbf y &+& \theta^{(2)}_2 \color{var(--nord15)}{\mathbf x_1} &+& \theta^{(2)}_3 \color{var(--nord15)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\ \color{var(--nord15)}{\mathbf x_3} &= \theta_0^{(3)} &+& \theta_1^{(3)} \mathbf y &+& \theta_2^{(3)} \color{var(--nord15)}{\mathbf x_1} &+& \theta_3^{(3)} \color{var(--nord15)}{\mathbf x_2} &+& \ldots &+& \boldsymbol\varepsilon \end{alignat}
16

The most common approach to handle such multivariate missingness is the MICE algorithm, which is also called imputation using a fully conditional specification because that is exactly what it does.

It follows the idea of the Gibbs sampler to obtain a sample from the multivariate distribution by iterative sampling from the full conditional distribution of each of the incomplete variables.

In practice, these full conditionals are typically specified using regression models that include all other variables in the linear predictor.

An implication of this is that we assume linear associations between the dependent variables, here, the incomplete variables, and the independent variables, in our case the other covariates and the response of our analysis model of interest.

[5:15 min]

MICE / FCS

Iterative sampling from the full-conditionals.

  • Specify a model for each incomplete variable with all other variables in the linear predictor.
  • Draw random starting values from the observed data.
  • Cycle through all incomplete variables until convergence.
  • Keep last value for each missing observation:
    ⇨ one imputed dataset
Example:
\mathbf y \color{var(--nord15)}{\mathbf x_1} \color{var(--nord15)}{\mathbf x_2} \color{var(--nord15)}{\mathbf x_3} \ldots

\ldots
i \ldots
\ldots
\ldots
\vdots \vdots \vdots \vdots

\begin{alignat}{10} \color{var(--nord15)}{\mathbf x_1} &= \theta^{(1)}_0 &+& \theta^{(1)}_1 \mathbf y &+& \theta^{(1)}_2 \color{var(--nord15)}{\mathbf x_2} &+& \theta^{(1)}_3 \color{var(--nord15)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\ \color{var(--nord15)}{\mathbf x_2} &= \theta^{(2)}_0 &+& \theta^{(2)}_1 \mathbf y &+& \theta^{(2)}_2 \color{var(--nord15)}{\mathbf x_1} &+& \theta^{(2)}_3 \color{var(--nord15)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\ \color{var(--nord15)}{\mathbf x_3} &= \theta_0^{(3)} &+& \theta_1^{(3)} \mathbf y &+& \theta_2^{(3)} \color{var(--nord15)}{\mathbf x_1} &+& \theta_3^{(3)} \color{var(--nord15)}{\mathbf x_2} &+& \ldots &+& \boldsymbol\varepsilon \end{alignat}
16

The most common approach to handle such multivariate missingness is the MICE algorithm, which is also called imputation using a fully conditional specification because that is exactly what it does.

It follows the idea of the Gibbs sampler to obtain a sample from the multivariate distribution by iterative sampling from the full conditional distribution of each of the incomplete variables.

In practice, these full conditionals are typically specified using regression models that include all other variables in the linear predictor.

An implication of this is that we assume linear associations between the dependent variables, here, the incomplete variables, and the independent variables, in our case the other covariates and the response of our analysis model of interest.

[5:15 min]

MICE / FCS

Iterative sampling from the full-conditionals.

  • Specify a model for each incomplete variable with all other variables in the linear predictor.
  • Draw random starting values from the observed data.
  • Cycle through all incomplete variables until convergence.
  • Keep last value for each missing observation:
    ⇨ one imputed dataset

Multiple runs with different starting values
⇨ multiple imputed datasets

Example (data table with one row per subject i; the highlighted columns contain missing values):
\mathbf y, \color{var(--nord15)}{\mathbf x_1}, \color{var(--nord15)}{\mathbf x_2}, \color{var(--nord15)}{\mathbf x_3}, \ldots

\begin{alignat}{10} \color{var(--nord15)}{\mathbf x_1} &= \theta^{(1)}_0 &+& \theta^{(1)}_1 \mathbf y &+& \theta^{(1)}_2 \color{var(--nord15)}{\mathbf x_2} &+& \theta^{(1)}_3 \color{var(--nord15)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\ \color{var(--nord15)}{\mathbf x_2} &= \theta^{(2)}_0 &+& \theta^{(2)}_1 \mathbf y &+& \theta^{(2)}_2 \color{var(--nord15)}{\mathbf x_1} &+& \theta^{(2)}_3 \color{var(--nord15)}{\mathbf x_3} &+& \ldots &+& \boldsymbol\varepsilon \\ \color{var(--nord15)}{\mathbf x_3} &= \theta_0^{(3)} &+& \theta_1^{(3)} \mathbf y &+& \theta_2^{(3)} \color{var(--nord15)}{\mathbf x_1} &+& \theta_3^{(3)} \color{var(--nord15)}{\mathbf x_2} &+& \ldots &+& \boldsymbol\varepsilon \end{alignat}
16

The most common approach to handle such multivariate missingness is the MICE algorithm, which is also called imputation using a fully conditional specification because that is exactly what it does.

It follows the idea of the Gibbs sampler to obtain a sample from the multivariate distribution by iterative sampling from the full conditional distribution of each of the incomplete variables.

In practice, these full conditionals are typically specified using regression models that include all other variables in the linear predictor.

An implication of this is that we assume linear associations between the dependent variables, here, the incomplete variables, and the independent variables, in our case the other covariates and the response of our analysis model of interest.
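To make the steps of the algorithm concrete, here is a minimal Python sketch of one possible implementation. It is an illustration under simplifying assumptions, not the code of any particular package: all variables are treated as continuous, every imputation model is a linear regression on all other variables, and the regression parameters are plugged in at their least-squares estimates (a proper multiple imputation would also draw them). The function names `mice_once` and `mice` are made up for this example.

```python
import numpy as np

def mice_once(data, n_iter=10, rng=None):
    """One chain of a simplified MICE / FCS run; returns one imputed dataset.

    data: 2-D float array with np.nan marking missing values.
    Simplification (assumption of this sketch): regression parameters are
    fixed at their least-squares estimates instead of being drawn.
    """
    rng = np.random.default_rng() if rng is None else rng
    data = np.asarray(data, dtype=float)
    missing = np.isnan(data)
    incomplete = np.flatnonzero(missing.any(axis=0))
    filled = data.copy()

    # Starting values: random draws from the observed values of each column.
    for j in incomplete:
        observed = data[~missing[:, j], j]
        filled[missing[:, j], j] = rng.choice(observed, size=missing[:, j].sum())

    for _ in range(n_iter):
        for j in incomplete:                      # cycle through incomplete variables
            mis, obs = missing[:, j], ~missing[:, j]
            # Linear predictor: intercept plus all other variables.
            design = np.column_stack([np.ones(len(filled)),
                                      np.delete(filled, j, axis=1)])
            theta, *_ = np.linalg.lstsq(design[obs], filled[obs, j], rcond=None)
            resid = filled[obs, j] - design[obs] @ theta
            sigma = np.sqrt(resid @ resid / max(obs.sum() - design.shape[1], 1))
            # Impute from the fitted (linear, normal) full conditional.
            filled[mis, j] = design[mis] @ theta + rng.normal(0.0, sigma, size=mis.sum())
    return filled                                  # keep the last values

def mice(data, m=5, n_iter=10, seed=0):
    """Multiple runs with different random draws give m imputed datasets."""
    rng = np.random.default_rng(seed)
    return [mice_once(data, n_iter=n_iter, rng=rng) for _ in range(m)]
```

Calling `mice(data, m=5)` on an array with `np.nan` entries returns five completed copies that agree in the observed cells and differ in the imputed ones.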

[5:15 min]

Potential Issues with MICE

17

A Simple Example

Implied Assumption:
Linear association between \color{var(--nord15)}{\mathbf x_1} and \mathbf y:

\color{var(--nord15)}{\mathbf x_1} = \theta_0 + \bbox[#3B4252, 2pt]{\theta_1 \mathbf y} + \theta_2 \mathbf x_2 + \theta_3 \mathbf x_3 + \ldots


But what if \mathbf y = \beta_0 + \bbox[#3B4252, 2pt]{\beta_1 \color{var(--nord15)}{\mathbf x_1} + \beta_2 \color{var(--nord15)}{\mathbf x_1}^2} + \beta_3 \mathbf x_2 + \ldots

18

One thing that is implied when we use such simple regression models for the imputation is that there is a linear association between the incomplete covariate and the response (and the other covariates).

But that is of course not always the case. What if we have a setting where we assume that there is a non-linear association, for example quadratic?

Non-linear Associations

  • true association: non-linear
  • imputation assumption: linear

⇨ bias!

19

But that is, of course, not always the case. We might, for example, have a quadratic association between the response y and the incomplete covariate x_1.


If we now impute missing values in x_1 using a standard regression model, all the imputed values follow a linear relationship.


And even if we correctly assume the non-linear association in the analysis model that we fit on the imputed data, we will get biased results.

I'm talking about a quadratic effect here because I can visualize it more easily than the non-linear relationship implied by the survival part of our joint model.
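The effect can also be checked in a small simulation. The following Python sketch is only an illustration: the sample size, the coefficients (all set to 1), the 50% missingness completely at random, and the use of a single imputation are assumptions chosen for this example, not values from the hepatitis C study.

```python
import numpy as np

rng = np.random.default_rng(2024)
n = 5000

# Complete data: quadratic association between y and x1 (illustrative values).
x1 = rng.normal(size=n)
y = 1.0 + 1.0 * x1 + 1.0 * x1**2 + rng.normal(size=n)

def fit_quadratic(x, y):
    """Least-squares fit of y = b0 + b1*x + b2*x^2; returns (b0, b1, b2)."""
    design = np.column_stack([np.ones_like(x), x, x**2])
    beta, *_ = np.linalg.lstsq(design, y, rcond=None)
    return beta

# Delete half of the x1 values completely at random.
mis = rng.random(n) < 0.5

# Naive imputation model: x1 = theta0 + theta1 * y + eps, fitted on observed x1.
design_obs = np.column_stack([np.ones((~mis).sum()), y[~mis]])
theta, *_ = np.linalg.lstsq(design_obs, x1[~mis], rcond=None)
resid = x1[~mis] - design_obs @ theta
sigma = resid.std(ddof=2)

x1_imp = x1.copy()
x1_imp[mis] = theta[0] + theta[1] * y[mis] + rng.normal(0.0, sigma, size=mis.sum())

# Analysis model with the correct quadratic term, fitted on both versions.
beta_complete = fit_quadratic(x1, y)
beta_imputed = fit_quadratic(x1_imp, y)
print("quadratic coefficient, complete data:          ", beta_complete[2])
print("quadratic coefficient, after linear imputation:", beta_imputed[2])
```

Even though the analysis model contains the quadratic term, the estimate of its coefficient is pulled towards zero once half of the x_1 values come from the linear imputation model.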

Time-to-Event Outcomes

(Simple) Proportional Hazards Model: h_i(t) = h_0(t) \exp\left(\color{var(--nord15)}{\mathbf x_i}^\top \boldsymbol\beta^{(tc)}\right)

Likelihood p(T_i, \delta_i \mid \color{var(--nord15)}{\mathbf x_i}, \boldsymbol\beta^{(tc)}) = \left[h_0(T_i) \exp \left\{ \color{var(--nord15)}{\mathbf x_i}^\top \boldsymbol\beta^{(tc)}\right\}\right]^{\delta_i} \exp \left[-\int_0^{T_i} h_0(s)\exp\left( \color{var(--nord15)}{\mathbf x_i}^\top \boldsymbol\beta^{(tc)}\right)ds\right]

 

Inconsistent with the (naive) imputation model: \color{var(--nord15)}{\mathbf x_1} = \theta_0 + \theta_1 \mathbf T + \theta_2 \boldsymbol \delta + \theta_3 \mathbf x_2 + \ldots

20

But we can see the issue when we look at this simple proportional hazards model (I've excluded the time-varying part for now) and the corresponding likelihood for a subject i.

This likelihood describes a complex and non-linear relationship between the response (in this case, the event time T and event indicator \delta) and incomplete covariates x.


But the regression model we would use to impute values in, say, x_1 would look like this and imply a linear association.

This inconsistency between the analysis model and imputation model will result in bias.
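A small numerical check can illustrate the inconsistency. The Python sketch below evaluates the likelihood contribution from the slide under two assumptions made only for this example: a constant baseline hazard h_0 (so that the integral has a closed form) and a single covariate x_1 with a standard normal distribution. It then computes the mean of p(x_1 | T, \delta), which is proportional to p(T, \delta | x_1) p(x_1), on a grid. The function names are hypothetical.

```python
import numpy as np

def ph_loglik(T, delta, x, beta, h0=1.0):
    """Log-likelihood contribution of one subject in a proportional hazards
    model with constant baseline hazard h0 (an assumption of this sketch)."""
    lin_pred = np.dot(x, beta)
    log_hazard = np.log(h0) + lin_pred           # log h_i(T)
    cum_hazard = h0 * T * np.exp(lin_pred)       # integral of h_i(s) from 0 to T
    return delta * log_hazard - cum_hazard

def conditional_mean_x1(T, delta, beta1=1.0, h0=1.0):
    """Mean of p(x1 | T, delta), proportional to p(T, delta | x1) p(x1),
    with x1 ~ N(0, 1), evaluated numerically on an equally spaced grid."""
    grid = np.linspace(-6.0, 6.0, 2001)
    log_post = np.array([ph_loglik(T, delta, [x], [beta1], h0) for x in grid])
    log_post += -0.5 * grid**2                   # log density of x1 up to a constant
    weights = np.exp(log_post - log_post.max())
    return float(np.sum(grid * weights) / np.sum(weights))

# Conditional mean of x1 for three equally spaced event times.
means = [conditional_mean_x1(T, delta=1) for T in (1.0, 2.0, 3.0)]
print(means)
```

Equally spaced event times do not give equally spaced conditional means, whereas the naive imputation model with a term \theta_1 T would imply exactly that.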

[6:30 min]

Multi-level Data

 
(Data table in long format with columns \mathbf y, \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_3; several rows belong to the same subject i.)
21

We have multi-level data when we have measured the same variable repeatedly in the same patient, but also when we have a clustering structure in our data, for example, in a multi-center study. In both cases, observations from the same patient or the same cluster are not independent.

We typically represent this type of data in long format, so that we now have multiple rows that belong to the same patient.

Multi-level Data

For example: p(\color{var(--nord15)}{\text{sex}} \mid \text{age}, ..., \underset{\text{time varying}}{\underbrace{\text{ALT}, \text{AST}, ...}})

time-varying imputations for time-constant covariates!
ignore correlation between repeated observations?

  • Average imputed values?
  • Summarize time-varying variables ⇨ wide format?

 

Imputation of baseline covariates is not straightforward.

(Data table in long format with columns age, sex, time, ALT, AST, \ldots; several rows belong to the same patient i.)
22

The other component of our data, the longitudinally measured biomarkers, poses an additional challenge when dealing with the missing values in the baseline covariates.

Because we have unbalanced biomarker measurements, our data is in long format where multiple rows belong to the same patient.

To impute missing values in baseline covariates, for example, in sex, we need the distribution of that covariate, conditional on everything else, which includes the longitudinally measured biomarker values, for example, AST and ALT.

Given that sex does not change over time, the imputed values should also all be the same in the rows belonging to the same patient.


If we now applied the MICE algorithm to our data in long format, we'd get time-varying imputations for the time-constant covariates. And if we're not careful, we might also ignore that rows belonging to the same patient are correlated.


So what could we do?

We could try to average the repeated imputations of baseline covariates, but that would imply that all biomarker values are equally related to baseline variables, irrespective of when they were measured (and in our data we have follow-up of more than 20 years...).

Or we could somehow summarize time-varying variables to get our data into wide format.

But neither strategy really captures the hierarchical structure of the data.
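The first problem is easy to reproduce. The following Python sketch uses made-up data (50 patients with 4 visits each, a continuous baseline BMI, and one time-varying biomarker called ALT) and a naive row-wise linear imputation; all numbers are assumptions for illustration only.

```python
import numpy as np

rng = np.random.default_rng(7)
n_patients, n_visits = 50, 4

# Long format: one row per visit, baseline covariate repeated within patient.
patient = np.repeat(np.arange(n_patients), n_visits)
bmi_baseline = rng.normal(25.0, 4.0, size=n_patients)          # time-constant
alt = (30.0 + 0.8 * bmi_baseline[patient]
       + rng.normal(0.0, 5.0, size=patient.size))               # time-varying

# BMI is missing for the first 10 patients, i.e. in all of their rows.
bmi_long = bmi_baseline[patient].copy()
mis = patient < 10
bmi_long[mis] = np.nan

# Naive row-wise imputation: regress BMI on ALT, treating rows as independent.
design = np.column_stack([np.ones(patient.size), alt])
theta, *_ = np.linalg.lstsq(design[~mis], bmi_long[~mis], rcond=None)
resid = bmi_long[~mis] - design[~mis] @ theta
bmi_long[mis] = design[mis] @ theta + rng.normal(0.0, resid.std(ddof=2), size=mis.sum())

# Number of distinct "baseline" BMI values per imputed patient (should be 1).
n_distinct = [np.unique(bmi_long[patient == i]).size for i in range(10)]
print(n_distinct)
```

Each patient with a missing BMI ends up with a different imputed value in every row, although the covariate is time-constant.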

[7:45 min]

Non-linear Associations & Multi-level

In settings with non-linear associations and multi-level data the correct predictive distribution p(\color{var(--nord15)}{\mathbf x_{mis}} \mid \text{everything else}) may not have a closed form.


We cannot easily specify the correct imputation model directly.
23

So, in summary, whenever we have a non-linear association between the response and incomplete covariates, including time-to-event outcomes, or when we have multi-level data, it is not straightforward to specify the correct imputation model, meaning the correct predictive distribution of the missing values, directly, because this distribution usually does not have a closed form.

A Fully Bayesian Alternative

24

Getting the Correct Distribution

  • We need \;p(\color{var(--nord15)}{\mathbf X_{mis}} \mid \mathbf y, \ldots)
  • We know \;p(\mathbf y \mid \color{var(--nord15)}{\mathbf X_{mis}}, \ldots)
  • \mathbf X_{mis}: incomplete covariate(s)
  • \mathbf y: outcome(s)
  • \ldots: everything else

Bayes Theorem:

\begin{align} p(\color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta \mid \mathbf y, \mathbf X_{obs}) = \frac{p(\mathbf y, \mathbf X_{obs} \mid \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)\; p(\color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)}{p\left(\mathbf y, \mathbf X_{obs}\right)} &\propto \underset{\text{joint distribution}}{\underbrace{p(\mathbf y, \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)}} \end{align}


This joint distribution does not have a closed form either ⇨ factorization!

25

The interesting question now is: How do we get this predictive distribution? What we need is the distribution of the incomplete variable x conditional on the response and everything else.

And what we do know is the distribution of the response conditional on the covariates and everything else, because this is the analysis model that we have specified already.

And here Bayes comes to the rescue.

And so, in the Bayesian approach, we go a different route.

From Bayes theorem, we can derive that the posterior distribution of the missing values and parameters \theta is proportional to the joint distribution of all our data, observed and unobserved, and the parameters.

I use y here to represent the response of an analysis model of interest, whatever that might be.


This joint distribution does not have a closed form either, but we can specify it conveniently as a product of conditional distributions.

[8:30 min]

Factorization of the Joint Distribution


p(\mathbf y, \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta) = \underset{\text{analysis model}}{\underbrace{p(\mathbf y \mid \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)}}\;\; \underset{\text{covariate model(s)}}{\underbrace{p(\mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}\mid \boldsymbol\theta)}}\;\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}}


For Example:

p(\mathbf y, \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_3, \boldsymbol\theta) = \underset{\text{analysis model}}{\underbrace{p(\mathbf y \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_3, \boldsymbol\theta)}}\; \underset{\text{covariate model(s)}}{\underbrace{p(\color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_3 \mid \boldsymbol\theta)}}\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}}

26

We can, for instance, split it into the distribution for the response y conditional on the covariates, the distribution of the covariates, observed and missing, and the distribution of the parameter vector \boldsymbol\theta.


By factorizing the joint distribution like this, we have written it as the product of our analysis model, a model for the covariates, and the prior distributions for the parameters.

The nice thing about this division is that it works irrespective of how complex our analysis model of interest is.


And we can use this principle for our example. We can write the joint distribution of y, x_1, x_2 and x_3 as the product of the conditional distribution of y given the covariates, the distribution of the covariates conditional on the parameters, and, of course, the prior distributions for those parameters.
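In code, this factorization is just a sum of three log-density terms. The Python sketch below is one possible way to write it for the example, under assumptions that are not part of the slides: a normal analysis model that is quadratic in x_1, a normal linear covariate model for x_1 given the completely observed x_2 and x_3, fixed standard deviations, and independent normal priors for all regression coefficients. The names `log_joint`, `beta` and `alpha` are chosen for this example.

```python
import numpy as np

def normal_logpdf(value, mean, sd):
    """Log density of a normal distribution, evaluated elementwise."""
    return -0.5 * np.log(2.0 * np.pi * sd**2) - 0.5 * ((value - mean) / sd) ** 2

def log_joint(y, x1, x2, x3, beta, alpha, sd_y=1.0, sd_x=1.0, prior_sd=10.0):
    """log p(y, x1, x2, x3, theta), up to the constant contribution of the
    completely observed x2 and x3 (for which no model is needed)."""
    # Analysis model: p(y | x1, x2, x3, theta), quadratic in x1.
    mean_y = beta[0] + beta[1] * x1 + beta[2] * x1**2 + beta[3] * x2 + beta[4] * x3
    log_analysis = np.sum(normal_logpdf(y, mean_y, sd_y))
    # Covariate model: p(x1 | x2, x3, theta).
    mean_x1 = alpha[0] + alpha[1] * x2 + alpha[2] * x3
    log_covariate = np.sum(normal_logpdf(x1, mean_x1, sd_x))
    # Priors: independent normal priors on all regression coefficients.
    log_prior = np.sum(normal_logpdf(np.concatenate([beta, alpha]), 0.0, prior_sd))
    return log_analysis + log_covariate + log_prior

# Usage with made-up data; missing x1 entries hold their current imputed values.
rng = np.random.default_rng(3)
x2, x3 = rng.normal(size=20), rng.normal(size=20)
x1 = 0.5 * x2 + rng.normal(size=20)
y = 1.0 + x1 + 0.5 * x1**2 + rng.normal(size=20)
print(log_joint(y, x1, x2, x3, beta=np.array([1.0, 1.0, 0.5, 0.0, 0.0]),
                alpha=np.array([0.0, 0.5, 0.0])))
```

A sampler only needs this function up to a constant: changing one missing x_1 value changes the analysis-model term and the covariate-model term, and nothing else.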

Factorization of the Joint Distribution


p(\mathbf y, \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta) = \underset{\text{analysis model}}{\underbrace{p(\mathbf y \mid \mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}, \boldsymbol\theta)}}\;\; \underset{\text{covariate model(s)}}{\underbrace{p(\mathbf X_{obs}, \color{var(--nord15)}{\mathbf X_{mis}}\mid \boldsymbol\theta)}}\;\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}}


(Multivariate) joint model for longitudinal and survival data:

p(\mathbf T, \boldsymbol\delta, \mathbf y, \color{var(--nord15)}{\mathbf X}, \mathbf b, \boldsymbol\theta) = \underset{\text{analysis model}}{\underbrace{ \underset{\substack{\text{survival}\\\text{model}}}{\underbrace{ p(\mathbf T, \boldsymbol\delta \mid \mathbf b, \color{var(--nord15)}{\mathbf X}, \boldsymbol \theta)}}\;\; \underset{\substack{\text{(multivariate)}\\\text{longitudinal}\\\text{model}}}{\underbrace{ p(\mathbf y \mid \mathbf b, \color{var(--nord15)}{\mathbf X}, \boldsymbol\theta)\; p(\mathbf b\mid \boldsymbol \theta)}} }}\; \bbox[#242933, 3pt, border: 2px solid #242933]{\underset{\substack{\text{covariate}\\\text{models}}}{\underbrace{ p(\color{var(--nord15)}{\mathbf X} \mid \boldsymbol\theta)}}}\; \underset{\text{priors}}{\underbrace{p(\boldsymbol\theta)}}

27

For example, for our multivariate joint model, the analysis model consists of multiple parts, the time-to-event sub-model and the model for the longitudinal biomarkers, which also involves the specification of the distribution of the random effects \mathbf b.

Most of this is straightforward to specify because it is the same as what we'd need to specify when doing a Bayesian analysis on complete data.

The only extra thing we need to do is specify the distribution of the covariates.

Covariate Models

When \color{var(--nord15)}{\mathbf X} is multivariate & of mixed type ⇨ no closed form

⇨ Factorization!

For example

\begin{align} p(\color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \color{var(--nord15)}{\mathbf x_3(t)}, \mathbf x_4(t) \mid \boldsymbol\theta) =\, & p(\color{var(--nord15)}{\mathbf x_3(t)} \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \mathbf x_4(t), \boldsymbol\theta) & \color{var(--nord3)}{\small\text{e.g., GLMM}}\\ & p(\mathbf x_4(t) \mid \color{var(--nord15)}{\mathbf x_1}, \mathbf x_2, \boldsymbol\theta) & \color{var(--nord3)}{\small\text{e.g., GLMM}}\\ & p(\color{var(--nord15)}{\mathbf x_1} \mid \mathbf x_2, \boldsymbol\theta) & \color{var(--nord3)}{\small\text{e.g., GLM}}\\ & \color{var(--nord3)}{p(\mathbf x_2 \mid \boldsymbol\theta)} & \color{var(--nord3)}{\small\text{(not needed)}} \end{align}

 

covariate models \neq imputation models!
28

But we still have a multivariate distribution with no closed form. The trick is to apply the same strategy as for the joint distribution and to specify it as a product of conditional distributions.


For example, if we had 4 covariates, x_1 and x_3 having missing values, and x_1 and x_2 being baseline covariates while x_3 and x_4 are time-varying, we could specify the multivariate distribution using this sequence of models.

Here I always condition on those other covariates for which I haven't specified a model yet.

And when we are smart about the order in which we take the covariates, we can start with the models for the time-varying covariates, because then we no longer need to include those in the linear predictors of the subsequent models for the baseline covariates.

This solves our issue with the hierarchical structure of the data.


An important thing to realize is that these models here are not the imputation models. They are still part of the specification of the joint distribution of all the data.

[10:45 min]

Imputation Models

imputation models \propto joint distribution = analysis model \times covariate models \times priors

Gibbs sampler, MICE / FCS, Metropolis-Hastings

The imputation models are then derived as full-conditional distributions from this joint distribution, in the same way we do it with the posterior distributions of the parameters, typically using Gibbs sampling.


The important difference from the MICE algorithm is that in the Bayesian approach, we do specify the joint distribution and derive the imputation models as full conditionals from this joint distribution. The resulting imputation models are usually much more complex than the ones we specify when using MICE and can provide a sample from the joint distribution of the missing values, even in complex settings.

In MICE, we specify the imputation models directly, which means they typically have a relatively simple structure, and they could be incompatible with each other and the analysis model. Hence they may not produce a joint distribution that approximates the correct joint distribution well enough.
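To make "derived as full conditionals" concrete, here is a sketch of the derivation for a single missing value. It assumes a simplified cross-sectional setting (no random effects) with covariates ordered x_1, ..., x_p and one parameter block \boldsymbol\theta_{x_j} per covariate model; the subject index i and this indexing are notation introduced for illustration only.

```latex
\begin{align}
p(\mathbf X \mid \boldsymbol\theta_x)
  &= \prod_{j=1}^{p} p(\mathbf x_j \mid \mathbf x_{j+1}, \ldots, \mathbf x_p, \boldsymbol\theta_{x_j}) \\
p(x_{ik} \mid \cdot)
  &\propto \underbrace{p(y_i \mid \mathbf x_i, \boldsymbol\theta_{y\mid x})}_{\text{analysis model}}\;
     \underbrace{p(x_{ik} \mid x_{i,k+1}, \ldots, x_{ip}, \boldsymbol\theta_{x_k})}_{\text{model for } x_k}\;
     \prod_{j<k} \underbrace{p(x_{ij} \mid x_{i,j+1}, \ldots, x_{ip}, \boldsymbol\theta_{x_j})}_{\text{models with } x_k \text{ as predictor}}
\end{align}
```

Only the factors of the joint distribution that contain x_{ik} remain. The outcome enters through the analysis model itself rather than through a linear predictor, and the product generally has no standard form, which is why it is sampled rather than written down directly.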


Advantages of the Bayesian Approach

p(\mathbf y, \color{var(--nord15)}{\mathbf X}, \boldsymbol\theta) = \bbox[#2E3440, 5pt]{\underset{\text{analysis model}}{\underbrace{p(\mathbf y \mid \color{var(--nord15)}{\mathbf X}, \color{var(--nord14)}{\boldsymbol\theta_{y\mid x}})}}}\; p(\color{var(--nord15)}{\mathbf X} \mid \boldsymbol\theta_x)\; p(\color{var(--nord14)}{\boldsymbol\theta_{y\mid x}}, \boldsymbol\theta_x)\qquad \color{var(--nord3)}{\text{with } \boldsymbol\theta = (\boldsymbol\theta_{y\mid x}, \boldsymbol\theta_x)}

  • specification of the joint distribution
    ⇨ assures compatibility of imputation models

  • use of the analysis model in the specification of the joint distribution
    parameters of interest \color{var(--nord14)}{\boldsymbol\theta_{y\mid x}} are estimated directly
    non-linear associations are taken into account
    ⇨ assures congeniality

  • response is not in any linear predictor
    ⇨ no problem to use complex outcomes


This is one of the advantages of the Bayesian approach. Because we specify the joint distribution, we can ensure that all imputation models are compatible with each other.


And because the analysis model is part of the specification of the joint distribution, we can be sure that the imputation models are also compatible with the analysis model, even when there are non-linear associations with covariates.

In addition, the parameters of interest, so the parameters of the analysis model, are directly estimated. We do not have separate steps for imputation and analysis, but everything happens simultaneously in the same procedure.


Because we can specify the joint distribution so that one of the conditional models is the analysis model, we can usually make sure that the outcome does not need to appear in the linear predictor of models for covariates. This makes it possible to use this approach in settings where we have complex outcomes that could not be easily included in the linear predictor of some or all of the covariates.
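As a rough illustration of sampling a missing covariate value from such a full conditional when the analysis model is non-linear, the following base R sketch uses a random-walk Metropolis-Hastings step. Everything in it is hypothetical: the parameter values are treated as known (in the actual Bayesian procedure they are sampled as well), and the names `beta`, `sigma`, `mu_x`, `tau_x` and `y_obs` are made up for this toy setting.

```r
set.seed(1)

# Hypothetical toy setting with known parameters and one missing covariate value x
beta  <- c(1, 0.5, -0.8)  # analysis model: y ~ N(b0 + b1 * x + b2 * x^2, sigma^2)
sigma <- 0.5
mu_x  <- 0                # covariate model: x ~ N(mu_x, tau_x^2)
tau_x <- 1
y_obs <- 0.2              # observed outcome of the subject whose x is missing

# Unnormalised log full conditional of the missing x:
# log p(y | x, theta_y|x) + log p(x | theta_x)
log_fc <- function(x) {
  dnorm(y_obs, beta[1] + beta[2] * x + beta[3] * x^2, sigma, log = TRUE) +
    dnorm(x, mu_x, tau_x, log = TRUE)
}

# Random-walk Metropolis-Hastings on the full conditional
n_iter <- 5000
draws  <- numeric(n_iter)
x_cur  <- 0
for (i in seq_len(n_iter)) {
  x_prop <- x_cur + rnorm(1, mean = 0, sd = 0.5)
  if (log(runif(1)) < log_fc(x_prop) - log_fc(x_cur)) x_cur <- x_prop
  draws[i] <- x_cur
}

# A few widely spaced draws after burn-in could serve as imputed values
imputations <- draws[seq(1000, n_iter, by = 400)]
```

Because of the quadratic term, two different values of x are consistent with the observed outcome, so the full conditional is not a normal distribution. A directly specified linear imputation model could not represent this.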


In practice: Package JointAI

Specification of the joint distribution:

  • User: analysis model
  • JointAI: covariate models, priors

Estimation:

  • JointAI: pre-processing
  • rjags / JAGS: MCMC sampling
  • JointAI: post-processing

To use this Bayesian approach, for example, to fit the multivariate joint model in the motivating example, you can use the R package JointAI, short for Joint Analysis and Imputation.

To specify the joint distribution, the user only has to specify the analysis model of interest, and the software sets defaults for the covariate models and priors.

Several different analysis models are available, and the specification is almost the same as for standard functions in R like lm() or glm().


The model estimation is done via MCMC with the help of JAGS, an external, freely available program that performs Gibbs sampling.


JointAI: Model Types & Specification

Univariate Models:

  • lm_imp()
  • glm_imp()
  • clm_imp()
  • mlogit_imp()
  • betareg_imp()
  • lognorm_imp()

Mixed Models:

  • lme_imp()
  • glme_imp()
  • clmm_imp()
  • mlogitmm_imp()
  • betamm_imp()
  • lognormmm_imp()

Survival Models:

  • coxph_imp()
  • survreg_imp()
  • JM_imp()

For Example:

library(JointAI)  # provides lm_imp() and the NHANES example data
mod <- lm_imp(SBP ~ gender + age + alc * creat, data = NHANES, n.iter = 200)
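A possible next step after fitting is to look at what the software set up and whether the chains have converged. The sketch below repeats the model call so that it is self-contained; it assumes that JointAI and JAGS are installed, and the small number of iterations is chosen for speed rather than as a recommendation.

```r
library(JointAI)

# Same analysis model as in the example; n.iter kept small for speed
mod <- lm_imp(SBP ~ gender + age + alc * creat, data = NHANES,
              n.iter = 200, progress.bar = "none")

list_models(mod)  # covariate models that were set by default
traceplot(mod)    # visual check of the MCMC chains
GR_crit(mod)      # Gelman-Rubin criterion as a convergence diagnostic
summary(mod)      # posterior summary of the analysis model parameters
```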

 


Summary


In Comparison

MICE and Joint Model MI:
  • separate imputation & analysis
  • direct specification of the imputation model

Bayesian Analysis:
  • simultaneous analysis & imputation
  • indirect specification of the imputation model

Well suited for:
  • MICE: simple settings
  • Joint Model MI: simple settings; multi-level data
  • Bayesian Analysis: non-linear associations; time-to-event outcomes; multi-level data; more complex analyses

Less well suited for:
  • MICE: non-linear associations; complex outcomes / data structure
  • Joint Model MI: non-linear associations
  • Bayesian Analysis: many incomplete variables / complex random effects; very large datasets

R packages: mice (MICE), jomo (Joint Model MI), JointAI (Bayesian Analysis)

MICE and Joint Model imputation are two different options to perform the imputation step in a multiple imputation procedure. This means that in both cases the imputation is completely separate from the analysis. This separation can be convenient when the same incomplete data is used in multiple analyses, but it also introduces the risk of having imputation models that are not compatible with the analysis, for example when there is a non-linear association in the analysis model.
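The separation of imputation and analysis can be seen in a typical mice workflow. This is only a minimal sketch using the small `nhanes` example data shipped with mice and the package defaults; the analysis model is chosen arbitrarily for illustration.

```r
library(mice)

# Step 1: imputation, done without reference to any specific analysis model
imp <- mice(nhanes, m = 5, printFlag = FALSE, seed = 123)

# Step 2: analysis, fitted separately to each completed dataset
fits <- with(imp, lm(chl ~ age + bmi))

# Step 3: pooling of the m sets of results
pooled <- pool(fits)
summary(pooled)
```

Nothing in step 1 knows which model will be fitted in step 2, which is what makes incompatibility possible.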

In the Bayesian approach, analysis and imputation are combined. This combination assures that the imputation and analysis models do not contradict each other. It is, however, possible to extract the imputed values sampled in the Bayesian approach so that this method could also serve as the imputation step in a multiple imputation.
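Extracting imputed values from a JointAI model could look roughly like the following. The sketch assumes that JointAI and JAGS are installed and that the imputed values were monitored during sampling (`monitor_params = c(imps = TRUE)`); the number of iterations and the seed are arbitrary.

```r
library(JointAI)

# Monitor the imputed values so that they can be extracted afterwards
mod_imp <- lm_imp(SBP ~ gender + age + alc * creat, data = NHANES,
                  n.iter = 200, monitor_params = c(imps = TRUE),
                  progress.bar = "none")

# Stacked completed datasets, indexed by the column "Imputation_"
imp_long <- get_MIdat(mod_imp, m = 5, include = FALSE, seed = 2024)

# The completed datasets can then be analysed one by one, as in standard MI
fits <- lapply(split(imp_long, imp_long$Imputation_),
               function(d) lm(SBP ~ gender + age + alc * creat, data = d))
```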

In MICE and the Joint model imputation we specify the imputation models directly. This is what makes these approaches difficult to use when the incomplete variables do not just have simple linear associations with the other variables.

In the Bayesian approach we specify the likelihood for the data, but the imputed values are sampled from the posterior distribution that is derived from the likelihood and the prior. With all the advanced sampling techniques available nowadays, this also works when the posterior does not have a closed form, and so this approach is well suited for settings with complex associations, while MICE is better suited for simpler settings.

Joint model imputation can handle multi-level settings, but assumes linear associations between all sub-models.

Because the Bayesian approach is more computationally intensive, it may be less well suited for very large datasets.

All three approaches are available in R, as the R packages mice, jomo and JointAI.

Summary

Imputation requires sampling from p(\color{var(--nord15)}{\mathbf{\mathcal D}_{mis}} \mid \mathbf{\mathcal D}_{obs}).

Multiple Imputation:

  • Practical approach to handle the uncertainty about the missing values:
    ⇨ Sample just a "few" values from p(\color{var(--nord15)}{\mathbf{\mathcal D}_{mis}} \mid \mathbf{\mathcal D}_{obs}).
  • Pooling to take into account within & between imputation variation.
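The pooling step for a single parameter can be sketched in a few lines of base R using Rubin's rules. The function name `pool_rubin` and the numbers in the usage line are made up for illustration; degrees of freedom and confidence intervals are left out.

```r
# Pool m point estimates and their standard errors with Rubin's rules
pool_rubin <- function(est, se) {
  m     <- length(est)
  q_bar <- mean(est)               # pooled point estimate
  w     <- mean(se^2)              # within-imputation variance
  b     <- var(est)                # between-imputation variance
  t_var <- w + (1 + 1 / m) * b     # total variance
  c(estimate = q_bar, within = w, between = b, total_se = sqrt(t_var))
}

# Hypothetical estimates and standard errors from m = 5 imputed datasets
res <- pool_rubin(est = c(1.9, 2.1, 2.0, 2.3, 1.7),
                  se  = c(0.40, 0.38, 0.41, 0.39, 0.42))
```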

 

For models that involve

  • survival outcomes
  • other non-linear associations
  • multi-level data

it is difficult to specify p(\color{var(--nord15)}{\mathbf{\mathcal D}_{mis}} \mid \mathbf{\mathcal D}_{obs}) directly.


Summary

MICE / FCS
Approximate p(\color{var(--nord15)}{\mathbf{\mathcal D}_{mis}} \mid \mathbf{\mathcal D}_{obs}) by sampling from directly specified full-conditionals p(\color{var(--nord15)}{\mathbf x_k} \mid \cdot).

  • incompatibility / uncongeniality issues
  • bias

 

Fully Bayesian approach

  • Specification of the joint distribution p(\color{var(--nord15)}{\mathbf{\mathcal D}_{mis}}, \mathbf{\mathcal D}_{obs}, \boldsymbol \theta) via factorization.
  • Derive p(\color{var(--nord15)}{\mathbf x_k} \mid \cdot) from this joint distribution.
    • theoretically valid, compatible & congenial

... MAR, model fit, other model assumptions, ...


MICE tries to approximate this distribution using the idea of the Gibbs sampler. But because the directly specified full conditionals may not be a good approximation of the correct full conditionals, it has issues with incompatibility and uncongeniality in complex settings and may result in bias.
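The idea of cycling through directly specified full conditionals can be sketched in base R as follows. This is a deliberately simplified illustration and not the algorithm as implemented in the mice package: the toy data are simulated, both imputation models are plain linear regressions, and parameter uncertainty is ignored when drawing the imputed values.

```r
set.seed(42)

# Hypothetical toy data: y is complete, x1 and x2 are incomplete
n  <- 200
x1 <- rnorm(n)
x2 <- 0.5 * x1 + rnorm(n)
y  <- 1 + x1 - x2 + rnorm(n)
dat <- data.frame(y, x1, x2)
dat$x1[sample(n, 40)] <- NA
dat$x2[sample(n, 40)] <- NA

mis <- lapply(dat, is.na)  # remember which values were missing

# Initialise the missing values with the observed means
for (v in c("x1", "x2")) dat[[v]][mis[[v]]] <- mean(dat[[v]], na.rm = TRUE)

# One directly specified (linear) full conditional per incomplete variable
forms <- list(x1 = x1 ~ y + x2, x2 = x2 ~ y + x1)

for (iter in 1:10) {
  for (v in names(forms)) {
    # Fit the imputation model on the rows where v was observed
    fit <- lm(forms[[v]], data = dat[!mis[[v]], ])
    mu  <- predict(fit, newdata = dat[mis[[v]], ])
    # Draw new values from the fitted conditional distribution
    # (simplification: the regression parameters are treated as known)
    dat[[v]][mis[[v]]] <- mu + rnorm(length(mu), 0, summary(fit)$sigma)
  }
}
```

Each model in `forms` is written down directly by the analyst. Nothing guarantees that the two models correspond to a common joint distribution or agree with the model fitted afterwards.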


Using a fully Bayesian approach, we can specify the joint distribution as a sequence of conditional models, which gives us a number of advantages. The imputation models are then derived from this joint distribution, resulting in a theoretically valid approach that assures compatibility and congeniality.

Thank you for your attention!
