Software
The output in this practical was generated using R version 4.2.1 (2022-06-23 ucrt) and JointAI version 1.0.3.9000 (with rjags version 4.13 and JAGS version 4.3.1).

Data

load("Data/datHCV.RData")

We continue to work with the simulated dataset that is based on a cohort of patients with chronic hepatitis C, "datHCV":

  • 250 patients (id) from 10 hospitals (center)

  • 985 rows in the data

  • potential additional grouping factor Genotype (crossed with id)

  • 4 repeatedly measured variables: logCreatinin, Albumin, logFIB4, Creatinin

  • time variable: time

  • 10 baseline covariates: Age0, Sex, alc_pwk, AntiHBc, DM, race, Weight, Height, Alcoholabuse, BMI

  • time-to-event outcome: event (values: FALSE, TRUE) and corresponding event/censoring time etime

The data has the following distribution

library("JointAI")
par(mar = c(3, 3, 2, 1), mgp = c(2, 0.6, 0))
plot_all(datHCV, idvars = c("id", "center", "Genotype"), breaks = 50)

and patterns of missingness:

Missing data pattern of the time-constant variables:

md_pattern(subset(datHCV, select = -c(logFIB4, Albumin, logCreatinin, time)))

Missing data pattern of the longitudinal variables:

md_pattern(subset(datHCV, select = c(logFIB4, Albumin, logCreatinin, time)))

(Generalized) Linear Mixed Models

The specification of (generalized) linear mixed models in JointAI is analogous to the specification in the packages nlme and lme4.

JointAI provides functions

  • lme_imp(), lmer_imp(): linear mixed model
  • glme_imp(), glmer_imp(): generalized linear mixed model
  • betamm_imp(): beta mixed model
  • lognormmm_imp(): log-normal mixed model
  • clmm_imp(): cumulative logit mixed model
  • mlogitmm_imp(): multinomial logit mixed model

Random Effects Specification

Random effects can be specified either as in nlme using arguments fixed and random, or as in lme4 (e.g., ... + (time | id)).

A few examples:

  • random = ~ 1 | id: random intercept only, with id as grouping variable
  • random = ~ time | id: random intercept and slope for variable time
  • random = ~ time + I(time^2) | id: random intercept, slope and quadratic random effect for time
  • random = ~ time + x | id: random intercept, random slope for time, and a random effect for variable x

The corresponding specifications using the argument formula would be

  • <fixed effects> + (1 | id)
  • <fixed effects> + (time | id)
  • <fixed effects> + (time + I(time^2) | id)
  • <fixed effects> + (time + x | id)

For hierarchical data with more than two levels, the lme4-type specification has to be used, as sketched below.
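
For illustration, a model with random effects at more than two levels could be specified as follows (a minimal sketch using the datHCV grouping variables; n.iter and n.adapt are set to 0 so that the model is only set up, without MCMC sampling):

# three levels: repeated measurements, nested in id, nested in center
lmer_imp(logFIB4 ~ Age0 + time + (time | id) + (1 | center),
         data = datHCV, n.iter = 0, n.adapt = 0)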

Crossed vs nested random effects

There is no distinction between crossed and nested random effects in how they are specified (i.e., (1 | id) + (1 | center) is the same as (1 | center/id)); however, it is important that the grouping variables are coded appropriately: for nested random effects, the same IDs must not be re-used in different clusters.

This data structure will result in nested random effects:

  id center
   1      A
   2      A
   3      A
   4      B
   5      B
   6      B

This data structure will result in crossed random effects:

  id center
   1      A
   2      A
   3      A
   1      B
   2      B
   3      B
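
If subject IDs are re-used across clusters even though the subjects are in fact nested within clusters (the second structure above, but with nesting intended), unique IDs should be created before fitting the model. A minimal sketch, assuming a data.frame dat with columns id and center:

# create globally unique subject IDs so that the random effects for id
# are treated as nested within center
dat$id_unique <- interaction(dat$center, dat$id, drop = TRUE)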

Covariate Model Types

For repeatedly measured covariates the following model types are available:
  • lmm: linear (normal) mixed model with identity link (alternatively: glmm_gaussian_identity); default for continuous variables
  • glmm_gaussian_log: linear (normal) mixed model with log link
  • glmm_gaussian_inverse: linear (normal) mixed model with inverse link
  • glmm_logit: logistic mixed model for binary data (alternatively: glmm_binomial_logit); default for binary variables
  • glmm_probit: probit model for binary data (alternatively: glmm_binomial_probit)
  • glmm_binomial_log: binomial mixed model with log link
  • glmm_binomial_cloglog: binomial mixed model with complementary log-log link
  • glmm_gamma_inverse: gamma mixed model with inverse link for skewed continuous data
  • glmm_gamma_identity: gamma mixed model with identity link for skewed continuous data
  • glmm_gamma_log: gamma mixed model with log link for skewed continuous data
  • glmm_poisson_log: Poisson mixed model with log link for count data
  • glmm_poisson_identity: Poisson mixed model with identity link for count data
  • glmm_lognorm: log-normal mixed model for skewed covariates
  • glmm_beta: beta mixed model for continuous data in (0, 1)
  • mlogitmm: multinomial logit mixed model for unordered categorical variables; default for unordered factors with > 2 levels
  • clmm: cumulative logit mixed model for ordered factors; default for ordered factors
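
The defaults can be overridden via the models argument of the *_imp() functions, a named vector of model types per covariate. For example, to request a gamma mixed model with log link for Albumin (a sketch, not run here; the remaining covariates keep their default model types):

lme_imp(logFIB4 ~ Age0 + Sex + time + AntiHBc + Albumin + DM + (time | id),
        models = c(Albumin = "glmm_gamma_log"),
        data = datHCV, n.iter = 200)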

A Simple Linear Mixed Model

In the datHCV data, there are 250 subjects (id) nested within 10 hospitals (center), i.e., there are 250 unique values of id.

In addition, the variable Genotype (the genotype of the hepatitis C virus – admittedly a bit of an artificial example…) can be seen as grouping factor which is crossed with center, i.e., all repeated measurements with the same id have the same value for Genotype, but subjects within the same center may have different Genotypes, and all values of Genotype may occur at any center.

Task

Fit a linear mixed model for the response logFIB4 with fixed effects for Age0, Sex, time, AntiHBc, Albumin and DM, with a random intercept for id and a random slope for time.

Inspect which models for the covariates JointAI has specified.

You could either use the function list_models() or get just the types of models via the models element of the fitted JointAI object.

Solution

The model can be specified as

lme1 <- lme_imp(logFIB4 ~ Age0 + Sex + time + AntiHBc + Albumin + DM +  (time | id),
                data = datHCV, n.iter = 200)

The following specifications also work:

# using lmer_imp(), and lme4-type random effects
lmer_imp(logFIB4 ~ Age0 + Sex + time + AntiHBc + Albumin + DM + (time | id),
         data = datHCV, n.iter = 200)

# using lme_imp() and the argument "random"
lme_imp(logFIB4 ~ Age0 + Sex + time + AntiHBc + Albumin + DM,
        random = ~ time | id, data = datHCV, n.iter = 200)

# using lmer_imp() and the argument "random"
lmer_imp(logFIB4 ~ Age0 + Sex + time + AntiHBc + Albumin + DM,
         random = ~ time | id, data = datHCV, n.iter = 200)

Via the models element of the fitted JointAI object, we can see the types of models that are used. The order in which the models for the covariates are given is the sequence in which they are specified, i.e., it indicates which covariate enters the linear predictor of which other covariate: variables later in the sequence are included as covariates in the models for variables earlier in the sequence, but not vice versa.

lme1$models
##                  logFIB4                  Albumin                     time                       DM 
## "glmm_gaussian_identity"                    "lmm"                    "lmm"     "glm_binomial_logit" 
##                  AntiHBc 
##     "glm_binomial_logit"

This becomes more obvious when we use list_models()

list_models(lme1, priors = FALSE, regcoef = FALSE, otherpars = FALSE, refcat = FALSE)
## Linear mixed model for "logFIB4" 
##    family: gaussian 
##    link: identity 
## * Predictor variables:
##   (Intercept), Age0, SexFemale, AntiHBcPositive, DMYes, time, Albumin 
## 
## 
## Linear mixed model for "Albumin" 
##    family: gaussian 
##    link: identity 
## * Predictor variables:
##   (Intercept), Age0, SexFemale, AntiHBcPositive, DMYes, time 
## 
## 
## Linear mixed model for "time" 
##    family: gaussian 
##    link: identity 
## * Predictor variables:
##   (Intercept), Age0, SexFemale, AntiHBcPositive, DMYes 
## 
## 
## Binomial model for "DM" 
##    family: binomial 
##    link: logit 
## * Predictor variables:
##   (Intercept), Age0, SexFemale, AntiHBcPositive 
## 
## 
## Binomial model for "AntiHBc" 
##    family: binomial 
##    link: logit 
## * Predictor variables:
##   (Intercept), Age0, SexFemale

We see

  • the model for the response logFIB4 comes first
  • then the models for the repeatedly measured variables Albumin and time
  • last the models for the incomplete baseline covariates DM and AntiHBc

Note that JointAI has automatically specified a model for time, even though there are no missing values in time (but there are no models for the complete variables Age0 and Sex).

Sequence of Covariate Models

In JointAI, models for covariates are arranged

  1. by the hierarchical level of the variable
  2. according to the proportion of missing values (so that variables with the most missing values have models with the most covariates).

Level-1 models:

  • Albumin | time, DM, AntiHBc, Age0, Sex  (28.5% NA)
  • time | DM, AntiHBc, Age0, Sex   (0% NA)

Level-2 models:

  • DM | AntiHBc, Age0, Sex   (31.2% NA)
  • AntiHBc | Age0, Sex   (18.4% NA)

Question

Why do we need models for complete level-1 variables but not for complete level-2 variables?

Answer

The model for time provides the link between this level-1 variable and the (incomplete) level-2 variables.

For example, missing values in DM are imputed from the model \[\begin{eqnarray*} p(\texttt{DM} \mid \texttt{Albumin}, \texttt{time}, \texttt{AntiHBc}, \texttt{Age0}, \texttt{Sex}, \boldsymbol\theta, \mathbf b) &\propto& p(\texttt{Albumin} \mid \texttt{time}, \texttt{DM}, \texttt{AntiHBc}, \texttt{Age0}, \texttt{Sex}, \boldsymbol\theta, \mathbf b) \;\times\\ && p(\texttt{time} \mid \texttt{DM}, \texttt{AntiHBc}, \texttt{Age0}, \texttt{Sex}, \boldsymbol\theta, \mathbf b)\;\times\\ && p(\texttt{DM} \mid \texttt{AntiHBc}, \texttt{Age0}, \texttt{Sex}, \boldsymbol\theta)\;\times\\ && p(\boldsymbol\theta) \end{eqnarray*}\]

If there were no model for time, this would imply the assumption that DM is independent of time.

Specifically for time variables it may be reasonable to assume that the values of incomplete baseline covariates are independent of the timing of (future) measurements of other variables. However, they are likely associated with the values of other (completely observed) time-varying covariates, e.g., repeatedly measured biomarkers.

Task

By default, JointAI includes models for all lower level variables, complete or incomplete, whenever there are missing values in higher level variables.

The argument no_model can be used to specify the names of variables for which no model should be specified.

Re-fit lme1, but now without the model for time.

Have a look at the JAGS model that JointAI has specified. You can obtain this model via the jagsmodel element of the fitted model object.

Solution

(I set n.iter and n.adapt to 0 to skip the MCMC sampling, both to save time and because we don’t need the MCMC samples from this model.)

lme2 <- update(lme1, no_model = "time", n.adapt = 0, n.iter = 0)
## 
## Note: No MCMC sample will be created when n.iter is set to 0.
lme2$jagsmodel
## model { 
## 
##    # Normal mixed effects model for logFIB4 ----------------------------------------
##   for (i in 1:985) {
##     M_lvlone[i, 1] ~ dnorm(mu_logFIB4[i], tau_logFIB4)
##     mu_logFIB4[i] <- b_logFIB4_id[group_id[i], 1] +
##                      b_logFIB4_id[group_id[i], 2] * (M_lvlone[i, 3] - spM_lvlone[3, 1])/spM_lvlone[3, 2] +
##                      beta[7] * (M_lvlone[i, 2] - spM_lvlone[2, 1])/spM_lvlone[2, 2]
##   }
## 
##   for (ii in 1:250) {
##     b_logFIB4_id[ii, 1:2] ~ dmnorm(mu_b_logFIB4_id[ii, ], invD_logFIB4_id[ , ])
##     mu_b_logFIB4_id[ii, 1] <- M_id[ii, 3] * beta[1] +
##                               (M_id[ii, 4] - spM_id[4, 1])/spM_id[4, 2] * beta[2] +
##                               M_id[ii, 5] * beta[3] + M_id[ii, 6] * beta[4] +
##                               M_id[ii, 7] * beta[5]
##     mu_b_logFIB4_id[ii, 2] <- beta[6]
##   }
## 
##   # Priors for the model for logFIB4
##   for (k in 1:7) {
##     beta[k] ~ dnorm(mu_reg_norm, tau_reg_norm)
##   }
##   tau_logFIB4 ~ dgamma(shape_tau_norm, rate_tau_norm)
##   sigma_logFIB4 <- sqrt(1/tau_logFIB4)
## 
##   for (k in 1:2) {
##     RinvD_logFIB4_id[k, k] ~ dgamma(shape_diag_RinvD, rate_diag_RinvD)
##   }
##   invD_logFIB4_id[1:2, 1:2] ~ dwish(RinvD_logFIB4_id[ , ], KinvD_logFIB4_id)
##   D_logFIB4_id[1:2, 1:2] <- inverse(invD_logFIB4_id[ , ])
## 
## 
##   # Normal mixed effects model for Albumin ----------------------------------------
##   for (i in 1:985) {
##     M_lvlone[i, 2] ~ dnorm(mu_Albumin[i], tau_Albumin)
##     mu_Albumin[i] <- b_Albumin_id[group_id[i], 1] +
##                      alpha[6] * (M_lvlone[i, 3] - spM_lvlone[3, 1])/spM_lvlone[3, 2]
##   }
## 
##   for (ii in 1:250) {
##     b_Albumin_id[ii, 1:1] ~ dnorm(mu_b_Albumin_id[ii, ], invD_Albumin_id[ , ])
##     mu_b_Albumin_id[ii, 1] <- M_id[ii, 3] * alpha[1] +
##                               (M_id[ii, 4] - spM_id[4, 1])/spM_id[4, 2] * alpha[2] +
##                               M_id[ii, 5] * alpha[3] + M_id[ii, 6] * alpha[4] +
##                               M_id[ii, 7] * alpha[5]
##   }
## 
##   # Priors for the model for Albumin
##   for (k in 1:6) {
##     alpha[k] ~ dnorm(mu_reg_norm, tau_reg_norm)
##   }
##   tau_Albumin ~ dgamma(shape_tau_norm, rate_tau_norm)
##   sigma_Albumin <- sqrt(1/tau_Albumin)
## 
##   invD_Albumin_id[1, 1] ~ dgamma(shape_diag_RinvD, rate_diag_RinvD)T(1e-16, 1e16)
##   D_Albumin_id[1, 1] <- 1 / (invD_Albumin_id[1, 1])
## 
## 
## 
##   # Binomial model for DM ---------------------------------------------------------
##   for (ii in 1:250) {
##     M_id[ii, 1] ~ dbern(max(1e-16, min(1 - 1e-16, mu_DM[ii])))
##     logit(mu_DM[ii]) <- M_id[ii, 3] * alpha[7] +
##                        (M_id[ii, 4] - spM_id[4, 1])/spM_id[4, 2] * alpha[8] +
##                        M_id[ii, 5] * alpha[9] + M_id[ii, 6] * alpha[10]
## 
##     M_id[ii, 7] <- ifelse(M_id[ii, 1] == 1, 1, 0)
## 
##   }
## 
##   # Priors for the model for DM
##   for (k in 7:10) {
##     alpha[k] ~ dnorm(mu_reg_binom, tau_reg_binom)
##   }
## 
## 
## 
## 
##   # Binomial model for AntiHBc ----------------------------------------------------
##   for (ii in 1:250) {
##     M_id[ii, 2] ~ dbern(max(1e-16, min(1 - 1e-16, mu_AntiHBc[ii])))
##     logit(mu_AntiHBc[ii]) <- M_id[ii, 3] * alpha[11] +
##                             (M_id[ii, 4] - spM_id[4, 1])/spM_id[4, 2] * alpha[12] +
##                             M_id[ii, 5] * alpha[13]
## 
##     M_id[ii, 6] <- ifelse(M_id[ii, 2] == 1, 1, 0)
## 
##   }
## 
##   # Priors for the model for AntiHBc
##   for (k in 11:13) {
##     alpha[k] ~ dnorm(mu_reg_binom, tau_reg_binom)
##   }
##  
##  
## }

The JAGS model

The JAGS model that JointAI automatically generated is returned in the jagsmodel element of a JointAI object.

The model consists of the set of sub-models together with specifications of the prior distributions, and calculation of interactions and functions of incomplete variables.

This is all wrapped inside a “model” environment:

model {
  
  ...
  
}

In JointAI, the JAGS model notation follows these conventions:

  • The index i is used for level-1, index ii for level-2, index iii for level-3, …
  • Vectors group_<level> and pos_<level> (not used in this example) are used to link indexing of different levels of the hierarchy.
  • There is one design matrix per level, named M_<level>, where the first level is called lvlone, and the other levels are named by the corresponding grouping variables (e.g., “id”).
  • Each design matrix has an associated matrix containing scaling parameters, named spM_<level>.
  • Random effects are named b_<response>_<level>
  • beta is the vector of regression coefficients in the analysis model(s), alpha the vector of regression coefficients in models for covariates.


JointAI uses hierarchical centring, i.e., the random effects are centred around a linear predictor and not around zero.

A linear mixed model that would typically be written as

\[\begin{aligned} y_{ij} &\sim \text{N}(\mu_{ij},\; \sigma^2)\\ \mu_{ij} &= \underset{\substack{\text{time constant}\\\text{covariates}}}{\underbrace{\mathbf X_i^{(tc)}\boldsymbol\beta^{(tc)}}} + \beta^{(time)}\text{time}_{ij} + \underset{\substack{\text{time-varying}\\\text{covariates}}}{\underbrace{\mathbf X_{ij}^{(tv)}\boldsymbol\beta^{(tv)}}} + \underset{\substack{\text{interactions of time-}\\\text{constant covariates}\\\text{with time}}}{\underbrace{\mathbf X_{ij}^{(tci)}\boldsymbol\beta^{(tci)}\text{time}_{ij}}} + \underset{\substack{\text{interactions of time-}\\\text{varying covariates}\\\text{with time}}}{\underbrace{\mathbf X_{ij}^{(tvi)}\boldsymbol\beta^{(tvi)}\text{time}_{ij}}} + \underset{\text{random effects}}{\underbrace{b_{i0} + b_{i1}\text{time}_{ij}}}\\ \left[\begin{array}{c} b_{i0}\\b_{i1} \end{array}\right] &\sim \text{MVN}\left(\left[\begin{array}{c} 0\\0 \end{array}\right],\; \mathbf D\right) \end{aligned}\]


is instead specified as


\[\begin{aligned} y_{ij} &\sim \text{N}(\mu_{ij},\; \sigma_y^2)\\ \mu_{ij} &= b_{i0} + b_{i1}\text{time}_{ij} + \mathbf X_{ij}^{(tv)}\boldsymbol\beta^{(tv)} + \mathbf X_{ij}^{(tvi)}\boldsymbol\beta^{(tvi)}\text{time}_{ij}\\ \left[\begin{array}{c} b_{i0}\\b_{i1} \end{array}\right] &\sim \text{MVN}\left(\left[\begin{array}{c} \mathbf X_i^{(tc)}\boldsymbol\beta^{(tc)}\\ \beta^{(time)} + \mathbf X_i^{(tci)}\boldsymbol\beta^{(tci)}\end{array}\right],\; \mathbf D\right) \end{aligned}\]

in JointAI.

In most cases, this results in better convergence/mixing.


Hyper-parameters
The parameters in the prior distributions for regression coefficients and precision or variance-covariance parameters are fixed hyper-parameters. The list of default hyper-parameters is returned by the function default_hyperpars() (⇨ see vignette) and alternative hyper-parameters can be specified via the hyperpars argument (in the main analysis functions *_imp()) (⇨ see example).
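
For example, the prior precision of the regression coefficients in normal (mixed) models could be changed like this (a sketch; the element and parameter names are those shown by default_hyperpars(), and the chosen value is arbitrary):

hp <- default_hyperpars()
hp$norm["tau_reg_norm"] <- 0.001   # larger precision = slightly more informative prior
lme_imp(logFIB4 ~ Age0 + Sex + time + AntiHBc + Albumin + DM + (time | id),
        data = datHCV, n.iter = 200, hyperpars = hp)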

Design and scaling matrices and hyper-parameters are part of the data_list element of a JointAI object, i.e.,

head(lme1$data_list$M_id)
##         DM AntiHBc (Intercept)     Age0 SexFemale AntiHBcPositive DMYes
## 960.1    0       0           1 24.02857         0              NA    NA
## 512.112  0       0           1 35.28016         0              NA    NA
## 704.445  0       1           1 42.65734         1              NA    NA
## 960.889  0       1           1 49.15190         0              NA    NA
## 671      0       0           1 47.18493         1              NA    NA
## 936.46  NA       0           1 22.20862         0              NA    NA
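
The scaling matrices should be accessible in the same way (a sketch; I assume here that the element names follow the spM_<level> convention described above):

lme1$data_list$spM_id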

Let’s have a look at the model syntax for the analysis model, i.e., the model for logFIB4.

##    # Normal mixed effects model for logFIB4 ----------------------------------------
##   for (i in 1:985) {
##     M_lvlone[i, 1] ~ dnorm(mu_logFIB4[i], tau_logFIB4)
##     mu_logFIB4[i] <- b_logFIB4_id[group_id[i], 1] +
##                      b_logFIB4_id[group_id[i], 2] * (M_lvlone[i, 3] - spM_lvlone[3, 1])/spM_lvlone[3, 2] +
##                      beta[7] * (M_lvlone[i, 2] - spM_lvlone[2, 1])/spM_lvlone[2, 2]
##   }
## 
##   for (ii in 1:250) {
##     b_logFIB4_id[ii, 1:2] ~ dmnorm(mu_b_logFIB4_id[ii, ], invD_logFIB4_id[ , ])
##     mu_b_logFIB4_id[ii, 1] <- M_id[ii, 3] * beta[1] +
##                               (M_id[ii, 4] - spM_id[4, 1])/spM_id[4, 2] * beta[2] +
##                               M_id[ii, 5] * beta[3] + M_id[ii, 6] * beta[4] +
##                               M_id[ii, 7] * beta[5]
##     mu_b_logFIB4_id[ii, 2] <- beta[6]
##   }
## [...]

The first for-loop (with index i) runs over the 985 repeated measurements in the data and specifies that M_lvlone[ , 1] (that is where the variable logFIB4 is stored) is normally distributed with mean mu_logFIB4 and precision tau_logFIB4.

The linear predictor mu_logFIB4 consists of the random intercept b_logFIB4_id[, 1], the random slope b_logFIB4_id[, 2], which is multiplied by a scaled version of the variable time (M_lvlone[, 3]), and the effect (beta[7]) of a scaled version of Albumin (M_lvlone[, 2]).

The second for-loop (with index ii) runs over the 250 subjects and specifies that the random effects b_logFIB4_id[, 1:2] follow a bivariate normal distribution with mean mu_b_logFIB4_id and precision matrix invD_logFIB4_id.

The random intercept b_logFIB4_id[, 1] has as its mean the linear combination of the baseline covariates, and the random slope b_logFIB4_id[, 2] has as its mean the fixed-effects parameter for time.

In mathematical notation, this would be

\[\begin{eqnarray*} \texttt{logFIB4}_i &\sim& N(\mu_i, \tau_\text{FIB4}) \qquad \color{lightgrey}{\text{(where } \tau_{FIB4} \text{ is the precision)}}\\ \mu_{i} &=& b_{g(i), 1} + b_{g(i), 2} \frac{\texttt{time}_i - \text{mean}(\texttt{time})}{\text{sd}(\texttt{time})} + \beta_7 \frac{\texttt{Albumin}_i - \text{mean}(\texttt{Albumin})}{\text{sd}(\texttt{Albumin})}\\\\ (b_{ii1}, b_{ii2}) &\sim& N(\boldsymbol\mu_{b, ii}, \mathbf D^{-1})\\ \mu_{b, ii, 1} &=& \mathbf{1}\beta_1 + \frac{\texttt{Age0}_{ii} - \text{mean}(\texttt{Age0})}{\text{sd}(\texttt{Age0})}\beta_2 + \texttt{Sex}_{ii}\beta_3 +\texttt{AntiHBc}_{ii}\beta_4 + \texttt{DM}_{ii}\beta_5\\ \mu_{b, ii, 2} &=& \beta_6 \end{eqnarray*}\] where \(i\) indexes the repeated measurements, \(ii\) indexes the subjects, and \(g(i)\) is a function that links repeated measurements to the corresponding subject.

The JAGS model then continues with the specification of the prior distributions for the model for logFIB4:

## [...]
##   # Priors for the model for logFIB4
##   for (k in 1:7) {
##     beta[k] ~ dnorm(mu_reg_norm, tau_reg_norm)
##   }
##   tau_logFIB4 ~ dgamma(shape_tau_norm, rate_tau_norm)
##   sigma_logFIB4 <- sqrt(1/tau_logFIB4)
## 
##   for (k in 1:2) {
##     RinvD_logFIB4_id[k, k] ~ dgamma(shape_diag_RinvD, rate_diag_RinvD)
##   }
##   invD_logFIB4_id[1:2, 1:2] ~ dwish(RinvD_logFIB4_id[ , ], KinvD_logFIB4_id)
##   D_logFIB4_id[1:2, 1:2] <- inverse(invD_logFIB4_id[ , ])
## 
## [...]

The 7 regression coefficients beta follow independent normal distributions with mean mu_reg_norm and precision tau_reg_norm, and the precision tau_logFIB4 of the model for logFIB4 has a Gamma distribution with shape shape_tau_norm and rate rate_tau_norm.

In mathematical notation, this part of the model would be

\[\begin{eqnarray*} \beta_k &\sim& N(\mu_\beta, \tau_\beta) \quad k = 1, \ldots, 7, \qquad \color{lightgrey}{\text{(where } \tau_\beta \text{ is the precision)}}\\ \tau_\text{FIB4} &\sim& \text{Gamma}(s_\tau, r_\tau) \qquad \color{lightgrey}{\text{(where } s_\tau \text{ is the shape and } r_\tau \text{ is the rate parameter)}}\\ \sigma_\text{FIB4} &=& \sqrt{1/\tau_\text{FIB4}}\\\\ R_{1,1}, R_{2,2} &\sim& \text{Gamma}(s_R, r_R)\qquad \color{lightgrey}{\text{(where } s_R \text{ is the shape and } r_R \text{ is the rate parameter)}}\\ \mathbf D^{-1} &\sim& \text{Wishart}(\mathbf R, K) \end{eqnarray*}\]

The default values for the hyper-parameters are

default_hyperpars()$norm
##    mu_reg_norm   tau_reg_norm shape_tau_norm  rate_tau_norm 
##          0e+00          1e-04          1e-02          1e-02

The variance-covariance matrix of the random effects, D_logFIB4_id, is assumed to follow an inverse Wishart distribution. In JAGS, we specify that the inverse of this matrix, i.e., invD_logFIB4_id, has a Wishart distribution with scale matrix RinvD_logFIB4_id and degrees of freedom KinvD_logFIB4_id.

The scale matrix is diagonal: the off-diagonal elements are fixed to zero (this is pre-specified in the data passed to JAGS) and the diagonal elements have Gamma hyper-priors.

The degrees of freedom are set to be one larger than the number of random effects:

default_hyperpars()$ranef
## shape_diag_RinvD  rate_diag_RinvD       KinvD_expr 
##           "0.01"          "0.001"   "nranef + 1.0"

By default, continuous covariates are scaled. This usually improves convergence/mixing (⇨ see vignette). The regression coefficients reported in the model summary() etc. are scaled back to the original scale of the data.

By default, the model is written to a temporary file and deleted when MCMC sampling is complete. The arguments modelname and modeldir can be used to specify a name and directory for the model file. With keep_model = TRUE the file is not deleted, and by setting overwrite = FALSE it is possible to use an existing model file instead of the one that is automatically created by JointAI.
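
For example, to write the model to a file in the current working directory and keep it after sampling (a sketch; the file name is an arbitrary example):

lme_imp(logFIB4 ~ Age0 + Sex + time + AntiHBc + Albumin + DM + (time | id),
        data = datHCV, n.iter = 0, n.adapt = 0,
        modelname = "lme_logFIB4.txt", modeldir = getwd(),
        keep_model = TRUE)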

Task

Re-fit lme2, but now also include random intercepts for center and for Genotype, and use a natural cubic spline with 2 degrees of freedom for time, i.e., ns(time, df = 2) (in the fixed and random effects).

To specify a hierarchical model with more than two levels, you need to use the lme4 type specification of the random effects.


  • Run this model for a couple of iterations (i.e., n.iter should not be zero) so that you can also inspect the model summary().
  • Investigate how the JAGS model changes compared to what we saw in lme2.

Solution

(I only run the model for 200 iterations to keep the computational time short. Because this model is more complex, we would need more iterations to get reliable results.)

lme3 <- lme_imp(logFIB4 ~ Age0 + Sex + ns(time, df = 2) + AntiHBc + Albumin + DM + 
                  (ns(time, df = 2) | id) + (1 | center) + (1 | Genotype),
                no_model = "time", data = datHCV, n.iter = 200, seed = 1234)

Let’s first look at the model summary(), and focus there on the results from the random effect variance-covariance matrices.

summary(lme3)
## [...]
## Posterior summary of random effects covariance matrix:
## 
## * For level "id":
##                      Mean     SD    2.5%   97.5% tail-prob. GR-crit MCE/SD
## D_logFIB4_id[1,1]  0.0923 0.0224  0.0527  0.1392               1.09  0.125
## D_logFIB4_id[1,2] -0.1350 0.0629 -0.2476 -0.0214    0.00667    1.12  0.210
## D_logFIB4_id[2,2]  0.6995 0.1854  0.3959  1.1024               1.06  0.220
## D_logFIB4_id[1,3]  0.0318 0.0877 -0.1177  0.2267    0.71333    2.18  0.256
## D_logFIB4_id[2,3]  0.2369 0.1926 -0.0817  0.6662    0.21667    1.06  0.147
## D_logFIB4_id[3,3]  1.3209 0.6056  0.4688  2.7605               3.03  0.329
## 
## * For level "center":
##                        Mean    SD  2.5% 97.5% tail-prob. GR-crit MCE/SD
## D_logFIB4_center[1,1] 0.555 0.384 0.202  1.49               1.07 0.0408
## 
## * For level "Genotype":
##                            Mean      SD     2.5%  97.5% tail-prob. GR-crit MCE/SD
## D_logFIB4_Genotype[1,1] 0.00419 0.00697 0.000408 0.0183               1.16 0.0782
## 
## [...]

In the model summary we now see separate entries for the variance covariance matrices of the random effects on the different levels.

At the end of the output, the total number of observations and the number of groups per level is reported:

## [...]
## 
## Number of observations: 985 
## Number of groups:
##  - center: 10
##  - Genotype: 7
##  - id: 250

When we inspect the JAGS model from lme3 we see the following:

In the model part for the response logFIB4, we see the additional random intercepts b_logFIB4_center and b_logFIB4_Genotype, and there are now 3 random effects for id (a random intercept and two random spline coefficients) because we used a spline with 2 df:

## [...]
##    # Normal mixed effects model for logFIB4 ----------------------------------------
##   for (i in 1:985) {
##     M_lvlone[i, 1] ~ dnorm(mu_logFIB4[i], tau_logFIB4)
##     mu_logFIB4[i] <- b_logFIB4_center[group_center[i], 1] +
##                      b_logFIB4_Genotype[group_Genotype[i], 1] +
##                      b_logFIB4_id[group_id[i], 1] +
##                      b_logFIB4_id[group_id[i], 2] * (M_lvlone[i, 3] - spM_lvlone[3, 1])/spM_lvlone[3, 2] +
##                      b_logFIB4_id[group_id[i], 3] * (M_lvlone[i, 4] - spM_lvlone[4, 1])/spM_lvlone[4, 2] +
##                      beta[8] * (M_lvlone[i, 2] - spM_lvlone[2, 1])/spM_lvlone[2, 2]
##   }
## [...]

The random intercept for center uses index iii (level-3) and index iiii is used for Genotype:

## [...]
## 
##   for (iii in 1:10) {
##     b_logFIB4_center[iii, 1:1] ~ dnorm(mu_b_logFIB4_center[iii, ], invD_logFIB4_center[ , ])
##     mu_b_logFIB4_center[iii, 1] <- M_center[iii, 1] * beta[1]
##   }
## 
##   for (iiii in 1:7) {
##     b_logFIB4_Genotype[iiii, 1:1] ~ dnorm(mu_b_logFIB4_Genotype[iiii, ], invD_logFIB4_Genotype[ , ])
##     mu_b_logFIB4_Genotype[iiii, 1] <- 0
##   }
## 
##   for (ii in 1:250) {
##     b_logFIB4_id[ii, 1:3] ~ dmnorm(mu_b_logFIB4_id[ii, ], invD_logFIB4_id[ , ])
##     mu_b_logFIB4_id[ii, 1] <- (M_id[ii, 3] - spM_id[3, 1])/spM_id[3, 2] * beta[2] +
##                               M_id[ii, 4] * beta[3] + M_id[ii, 5] * beta[4] +
##                               M_id[ii, 6] * beta[5]
##     mu_b_logFIB4_id[ii, 2] <- beta[6]
##     mu_b_logFIB4_id[ii, 3] <- beta[7]
##   }
## 
## [...]

Moreover, the random intercept for center is centred around the fixed-effects intercept (beta[1]), and the intercept no longer appears in the linear predictor of the random intercept for id. The random intercept for Genotype has mean 0.

⇨ There is only one fixed effects intercept, hence, it can only appear in one of the random intercepts. It does not really matter in which.

Because there is only a random intercept for the levels center and Genotype, a Gamma prior is used for the corresponding precision instead of a Wishart distribution.

## [...]
##   invD_logFIB4_center[1, 1] ~ dgamma(shape_diag_RinvD, rate_diag_RinvD)T(1e-16, 1e16)
##   D_logFIB4_center[1, 1] <- 1 / (invD_logFIB4_center[1, 1])
## 
##   invD_logFIB4_Genotype[1, 1] ~ dgamma(shape_diag_RinvD, rate_diag_RinvD)T(1e-16, 1e16)
##   D_logFIB4_Genotype[1, 1] <- 1 / (invD_logFIB4_Genotype[1, 1])
## 
##   for (k in 1:3) {
##     RinvD_logFIB4_id[k, k] ~ dgamma(shape_diag_RinvD, rate_diag_RinvD)
##   }
##   invD_logFIB4_id[1:3, 1:3] ~ dwish(RinvD_logFIB4_id[ , ], KinvD_logFIB4_id)
##   D_logFIB4_id[1:3, 1:3] <- inverse(invD_logFIB4_id[ , ])
## 
## [...]

The truncation T(1e-16, 1e16) is used to prevent numerical problems when the sampler moves too close to 0 or \(\infty\) .

Parameters to follow

JAGS allows us to obtain MCMC samples for any node of the model (any parameter or other unknowns), but it has to be specified in advance for which of the nodes the MCMC samples should be monitored.

In JointAI, the nodes are grouped under the following names:
  • analysis_main: betas and sigma_main, tau_main (for beta regression) or shape_main (for parametric survival models), gamma_main (for cumulative logit models), D_main (for multi-level models) and basehaz (in proportional hazards models)
  • analysis_random: ranef_main, D_main, invD_main, RinvD_main
  • other_models: alphas, tau_other, gamma_other, delta_other
  • imps: imputed values
  • betas: regression coefficients of the main analysis model
  • tau_main: precision of the residuals from the main analysis model(s)
  • sigma_main: standard deviation of the residuals from the main analysis model(s)
  • gamma_main: intercepts in ordinal main model(s)
  • delta_main: increments of ordinal main model(s)
  • ranef_main: random effects from the main analysis model(s) (b)
  • D_main: covariance matrix of the random effects from the main model(s)
  • invD_main: inverse(s) of D_main
  • RinvD_main: matrices in the priors for invD_main
  • alphas: regression coefficients in the covariate models
  • tau_other: precision parameters of the residuals from covariate models
  • gamma_other: intercepts in ordinal covariate models
  • delta_other: increments of ordinal covariate models
  • ranef_other: random effects from the other models (b)
  • D_other: covariance matrix of the random effects from the other models
  • invD_other: inverses of D_other
  • RinvD_other: matrices in the priors for invD_other
  • other: additional parameters

Which nodes are monitored can be specified via the argument monitor_params. By default, only the main parameters of the analysis model are monitored, i.e., monitor_params = c(analysis_main = TRUE) (⇨ see vignette and reference).

This argument is available in the main analysis functions (*_imp()) and in add_samples().
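
For example, to additionally monitor the random effects and their covariance matrices from the main model (a sketch):

lme_imp(logFIB4 ~ Age0 + Sex + time + AntiHBc + Albumin + DM + (time | id),
        data = datHCV, n.iter = 200,
        monitor_params = c(analysis_main = TRUE, analysis_random = TRUE))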

Subset of MCMC samples
The functions summary(), coef(), confint(), traceplot(), densplot(), predict(), … have an argument subset which allows us to select which subset of the monitored nodes to include in the output. This argument follows the same principle as monitor_params.

These functions also have the additional arguments start, end, thin and exclude_chains which allow you to select a subset of the iterations to be used in the output.

start and end specify the first and last iteration to be used.

thin is the thinning interval, i.e., thin = 5 would only use every fifth iteration, and exclude_chains takes a vector of integers specifying which of the MCMC chains should be excluded, e.g., because they did not converge.
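
For example, to summarize only the regression coefficients of the main model in lme3, discarding roughly the first half of the iterations and excluding the second chain (a sketch; which chains to exclude is, of course, model specific):

summary(lme3, subset = c(analysis_main = FALSE, betas = TRUE),
        start = 200, exclude_chains = 2)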

Task

Continue sampling from lme3 using add_samples() but now also monitor the parameters of the models for the covariates (other_models). When changing what is monitored, the argument add = FALSE has to be set in add_samples().

Explore the traceplot() or densplot() and summary() (with argument subset = c(other_models = TRUE)) for the covariate models.

Solution

lme3b <- add_samples(lme3, add = FALSE,
                     monitor_params = c(other_models = TRUE),
                     n.iter = 200)
## NOTE: Stopping adaptation

To be able to distinguish the plots of the regression coefficients from the different models, the names of the response variables are added to the plot titles:

traceplot(lme3b, subset = c(analysis_main = FALSE, other_models = TRUE),
          ncol = 5, nrow = 5)

To also include the results from the covariate models in the model summary, we additionally need to specify subset = c(other_models = TRUE):

summary(lme3b, subset = c(other_models = TRUE))
## 
## Bayesian joint model fitted with JointAI 
## 
## Call:
## list(lme_imp(fixed = logFIB4 ~ Age0 + Sex + ns(time, df = 2) + 
##     AntiHBc + Albumin + DM + (ns(time, df = 2) | id) + (1 | center) + 
##     (1 | Genotype), data = datHCV, n.iter = 200, no_model = "time", 
##     seed = 1234), add_samples(object = lme3, n.iter = 200, add = FALSE, 
##     monitor_params = c(other_models = TRUE)))
## 
## 
## # --------------------------------------------------------------------- #
##   Bayesian linear mixed model for "logFIB4"
## # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
## 
## Posterior summary:
##                      Mean      SD    2.5%    97.5% tail-prob. GR-crit MCE/SD
## (Intercept)        1.0251 0.34501  0.2773  1.69183    0.00667    1.02 0.0833
## Age0               0.0263 0.00220  0.0219  0.03048    0.00000    1.04 0.0929
## SexFemale         -0.0930 0.04726 -0.1916 -0.00948    0.03333    1.07 0.1284
## AntiHBcPositive   -0.0941 0.05679 -0.2056  0.01370    0.09333    1.50 0.1532
## DMYes             -0.0646 0.06654 -0.1942  0.08228    0.30333    1.05 0.1181
## ns(time, df = 2)1  0.9117 0.10369  0.7287  1.11861    0.00000    1.21 0.1478
## ns(time, df = 2)2  0.4777 0.22114  0.0426  0.95329    0.01333    1.20 0.2233
## Albumin           -0.0484 0.00364 -0.0553 -0.04036    0.00000    1.05 0.1045
## 
## 
## Posterior summary of random effects covariance matrix:
## 
## * For level "id":
##                      Mean     SD    2.5%    97.5% tail-prob. GR-crit MCE/SD
## D_logFIB4_id[1,1]  0.0801 0.0206  0.0468  0.12385               1.74  0.187
## D_logFIB4_id[1,2] -0.1162 0.0661 -0.2645 -0.00164     0.0467    2.42  0.244
## D_logFIB4_id[2,2]  0.6373 0.2267  0.2913  1.13695               2.46  0.256
## D_logFIB4_id[1,3] -0.0146 0.0742 -0.1616  0.12518     0.8433    1.04  0.229
## D_logFIB4_id[2,3]  0.3067 0.2129 -0.0882  0.79711     0.1133    1.08  0.160
## D_logFIB4_id[3,3]  1.3193 0.4339  0.6126  2.23555               1.29  0.188
## 
## * For level "center":
##                        Mean    SD  2.5% 97.5% tail-prob. GR-crit MCE/SD
## D_logFIB4_center[1,1] 0.558 0.372 0.203  1.66               1.01 0.0408
## 
## * For level "Genotype":
##                            Mean      SD     2.5%  97.5% tail-prob. GR-crit MCE/SD
## D_logFIB4_Genotype[1,1] 0.00547 0.00648 0.000534 0.0203               1.16 0.0893
## 
## 
## Posterior summary of residual std. deviation:
##                Mean     SD  2.5% 97.5% GR-crit MCE/SD
## sigma_logFIB4 0.411 0.0134 0.384 0.438    1.39 0.0863
## 
## 
## # --------------------------------------------------------------------- #
##   Bayesian linear mixed model for "Albumin"
## # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
## 
## Posterior summary:
##                    Mean     SD   2.5%   97.5% tail-prob. GR-crit MCE/SD
## (Intercept)     50.6092 1.8448 47.140 54.2956    0.00000    1.08 0.0688
## Age0            -0.0846 0.0260 -0.136 -0.0324    0.00667    1.02 0.0972
## SexFemale       -0.7492 0.6047 -1.828  0.5376    0.19000    1.38 0.1214
## AntiHBcPositive -0.4117 0.7497 -1.784  1.0967    0.55333    1.54 0.1890
## DMYes            0.3402 0.9070 -1.360  2.0047    0.71667    1.34 0.1406
## time            -0.0739 0.0303 -0.133 -0.0159    0.02000    1.02 0.0770
## 
## 
## Posterior summary of random effects covariance matrix:
## 
## * For level "id":
##                   Mean   SD 2.5% 97.5% tail-prob. GR-crit MCE/SD
## D_Albumin_id[1,1] 9.46 1.55 6.76  12.6               1.04  0.103
## 
## * For level "center":
##                       Mean   SD 2.5% 97.5% tail-prob. GR-crit MCE/SD
## D_Albumin_center[1,1] 16.3 12.3 5.11  40.8               1.18 0.0449
## 
## * For level "Genotype":
##                          Mean    SD    2.5% 97.5% tail-prob. GR-crit MCE/SD
## D_Albumin_Genotype[1,1] 0.084 0.182 0.00111 0.612               1.36  0.127
## 
## 
## Posterior summary of residual std. deviation:
##               Mean    SD 2.5% 97.5% GR-crit MCE/SD
## sigma_Albumin  4.5 0.134 4.23  4.76    1.06 0.0609
## 
## 
## # --------------------------------------------------------------------- #
##   Bayesian binomial mixed model for "DM"
## # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
## 
## Posterior summary:
##                    Mean     SD    2.5%   97.5% tail-prob. GR-crit MCE/SD
## (Intercept)     -2.4601 0.7811 -3.8923 -0.8492    0.00333    1.01 0.0856
## Age0             0.0193 0.0164 -0.0149  0.0495    0.23000    1.00 0.0728
## SexFemale        0.1005 0.4015 -0.7381  0.8206    0.81000    1.01 0.1114
## AntiHBcPositive -0.2346 0.4608 -1.2744  0.5704    0.65000    1.01 0.1192
## 
## 
## Posterior summary of random effects covariance matrix:
## 
## * For level "center":
##                    Mean    SD    2.5% 97.5% tail-prob. GR-crit MCE/SD
## D_DM_center[1,1] 0.0529 0.134 0.00059 0.501               1.78  0.194
## 
## * For level "Genotype":
##                     Mean    SD   2.5% 97.5% tail-prob. GR-crit MCE/SD
## D_DM_Genotype[1,1] 0.122 0.224 0.0011  0.74               1.32  0.105
## 
## 
## 
## # --------------------------------------------------------------------- #
##   Bayesian binomial mixed model for "AntiHBc"
## # - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - - #
## 
## Posterior summary:
##                Mean     SD    2.5%    97.5% tail-prob. GR-crit MCE/SD
## (Intercept)  0.7725 0.6195 -0.4687  1.91576     0.2267    1.05 0.0656
## Age0        -0.0295 0.0132 -0.0539 -0.00441     0.0233    1.06 0.0678
## SexFemale    0.0366 0.3178 -0.6153  0.66609     0.8767    1.02 0.0946
## 
## 
## Posterior summary of random effects covariance matrix:
## 
## * For level "center":
##                       Mean    SD    2.5% 97.5% tail-prob. GR-crit MCE/SD
## D_AntiHBc_center[1,1] 0.17 0.239 0.00175 0.834               1.46  0.154
## 
## * For level "Genotype":
##                           Mean     SD     2.5% 97.5% tail-prob. GR-crit MCE/SD
## D_AntiHBc_Genotype[1,1] 0.0326 0.0786 0.000474 0.241               1.28  0.113
## 
## 
## 
## # ----------------------------------------------------------- #
## 
## MCMC settings:
## Iterations = 101:300
## Sample size per chain = 200 
## Thinning interval = 1 
## Number of chains = 3 
## 
## Number of observations: 985 
## Number of groups:
##  - center: 10
##  - Genotype: 7
##  - id: 250

Note that these are the models used in the specification of the joint distribution and not the models the missing values are imputed from!

Predicted values

Whenever a model contains non-linear effects or interactions, it can be helpful for interpretation to visualize these effects.

JointAI provides two useful functions to obtain fitted values, e.g., to create an effect plot.

predDF() creates a data.frame containing all variables used in a model, where the variables specified in the vars argument (as a one-sided formula) vary and all other variables are set to either the median (for continuous variables) or the reference category (for categorical variables).

The values of the varying variables are either each category or a sequence of values from the minimum to maximum observed value in the data. This can be changed using the ... argument, in which vectors of alternative values per variable in vars can be provided.

The function predict() then calculates fitted values, either for the original data, or for the data provided via the newdata argument. It returns the fitted values and 95% credible intervals, added as columns to newdata (element newdata) as well as separately (element fitted) (⇨ vignette).

Task

Create an effect plot visualizing the effects of time, Sex and Age0 in lme3, by plotting the fitted value across time for all combinations of Sex and ages 35, 45 and 55.

Getting a data.frame containing the covariate values for which we want the fitted values is easy using predDF(). You need to specify the argument vars as ~ time + Age0 + Sex.

To make sure that Age0 only uses the values 35, 45 and 55, additionally specify (as an argument) Age0 = c(35, 45, 55).


There is no automatic plotting of the fitted values in JointAI. I find using ggplot2 the most convenient.

(Note: there is nothing particularly interesting about the effects of Age0 and Sex in this model. This is just for demonstration of the functionality.)

Solution

We can get the data.frame containing the covariates for the scenarios we want to show in the effect plot using the following syntax:

newDF <- predDF(lme3, vars = ~ time + Sex + Age0,
                Age0 = c(35, 45, 55))

newDF is a data.frame with 600 rows (100 different values for time \(\times\) 2 different values for Sex \(\times\) 3 different values for Age0).

head(newDF, 10)
##       logFIB4 Age0    Sex     time  AntiHBc  Albumin DM  id center Genotype
## 1  0.03907563   35   Male 0.000000 Negative 46.65857 No 124      5        4
## 2  0.03907563   45   Male 0.000000 Negative 46.65857 No 124      5        4
## 3  0.03907563   55   Male 0.000000 Negative 46.65857 No 124      5        4
## 4  0.03907563   35 Female 0.000000 Negative 46.65857 No 124      5        4
## 5  0.03907563   45 Female 0.000000 Negative 46.65857 No 124      5        4
## 6  0.03907563   55 Female 0.000000 Negative 46.65857 No 124      5        4
## 7  0.03907563   35   Male 0.298107 Negative 46.65857 No 124      5        4
## 8  0.03907563   45   Male 0.298107 Negative 46.65857 No 124      5        4
## 9  0.03907563   55   Male 0.298107 Negative 46.65857 No 124      5        4
## 10 0.03907563   35 Female 0.298107 Negative 46.65857 No 124      5        4

Using newDF, we can obtain fitted values using the function predict():

pred <- predict(lme3, newdata = newDF)
## Warning: 
## Prediction in multi-level settings currently only takes into account the fixed effects,
## i.e., assumes that the random effect realizations are equal to zero.

You will get a warning message stating that prediction in multi-level models only takes into account the fixed effects, i.e., the fitted values are calculated as \(\mathbf X_i \boldsymbol\beta\), not as \(\mathbf X_i \boldsymbol\beta + \mathbf Z_i \mathbf b_i\).

Getting subject-specific fitted or predicted values for observations in the data the model was fitted on would require monitoring the random effects (and then calculating the fitted values yourself since this is not yet implemented).

For observations not in the original data, random effects would have to be sampled from their posterior distribution, which is not yet implemented in JointAI.
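
Monitoring the random effects is possible via the keyword ranef_main (a sketch; the subject-specific fitted values would then have to be assembled manually from the MCMC samples):

# re-fit with the random effects monitored (adds many monitored nodes)
lme3_re <- update(lme3, monitor_params = c(analysis_main = TRUE, ranef_main = TRUE),
                  n.iter = 200)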

predict() returns a list with two elements:

head(pred$newdata)
##      logFIB4 Age0    Sex time  AntiHBc  Albumin DM  id center Genotype         fit       2.5%
## 1 0.03907563   35   Male    0 Negative 46.65857 No 124      5        4 -0.33399567 -0.8121508
## 2 0.03907563   45   Male    0 Negative 46.65857 No 124      5        4 -0.06494344 -0.5450195
## 3 0.03907563   55   Male    0 Negative 46.65857 No 124      5        4  0.20410879 -0.3001955
## 4 0.03907563   35 Female    0 Negative 46.65857 No 124      5        4 -0.41517233 -0.8874518
## 5 0.03907563   45 Female    0 Negative 46.65857 No 124      5        4 -0.14612009 -0.6237139
## 6 0.03907563   55 Female    0 Negative 46.65857 No 124      5        4  0.12293214 -0.3871148
##        97.5%
## 1 0.13530395
## 2 0.39068868
## 3 0.65100043
## 4 0.04208335
## 5 0.30526889
## 6 0.56403345
head(pred$fitted)
##           fit       2.5%      97.5%
## 1 -0.33399567 -0.8121508 0.13530395
## 2 -0.06494344 -0.5450195 0.39068868
## 3  0.20410879 -0.3001955 0.65100043
## 4 -0.41517233 -0.8874518 0.04208335
## 5 -0.14612009 -0.6237139 0.30526889
## 6  0.12293214 -0.3871148 0.56403345

Using ggplot2 we can then create the effect plot:

library("ggplot2")

ggplot(pred$newdata,
       aes(x = time, y = fit, color = factor(Age0),
           fill = factor(Age0))) +
  geom_ribbon(aes(ymin = `2.5%`, ymax = `97.5%`), alpha = 0.3, color = NA) +
  geom_line() +
  facet_wrap("Sex") +
  theme(legend.position = "top")

Export of imputed values

Although JointAI does not perform multiple imputation in the classical sense, it is possible to use it to obtain imputed values, which can then be used in a multiple imputation framework.

First, the imputed values have to be monitored by setting monitor_params = c(imps = TRUE) when running the model.

Then, the function get_MIdat() can be used to extract the imputed values from the fitted JointAI object (⇨ vignette).

Arguments of get_MIdat():
Using the argument m we can select the number of imputed datasets that should be created.

The imputed values are the MCMC samples of the missing values from m randomly selected iterations. The argument start specifies the first iteration that may be used (to discard a burn-in). Because subsequent MCMC samples (from the same chain) are correlated but multiple imputed values should be independent, the argument minspace (default: 50) sets the minimum number of iterations between the iterations that may be selected.

Output of get_MIdat():
get_MIdat() returns a data.frame in which the m imputed datasets are stacked on top of each other. If include = TRUE, the original, incomplete data is included as well.

The newly created variable Imputation_ identifies the different imputed datasets (where 0 refers to the original data).
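
For example, to extract five imputed datasets, discard the first 100 iterations as burn-in, require at least 100 iterations between the selected samples, and include the original data (a sketch; this assumes a fitted object in which the imputed values were monitored, such as lme3_imps created in the solution below):

imps <- get_MIdat(lme3_imps, m = 5, start = 100, minspace = 100, include = TRUE)
table(imps$Imputation_)   # 0 = original (incomplete) data, 1-5 = imputed datasets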

Task

Re-fit model lme3 but now monitor the imputed values.

You can visualize the (marginal) distributions of the observed and imputed values using the function plot_imp_distr(). You will have to set the argument id = "id" in this function.

Solution

lme3_imps <- update(lme3, monitor_params = c(imps = TRUE), n.iter = 200)

impDF <- get_MIdat(lme3_imps, m = 10)

We can visualize the distribution of observed and imputed values:

plot_imp_distr(impDF, id = "id")

Keep in mind that these are the marginal distributions. To properly investigate the fit of the (covariate) models we would have to investigate the conditional distributions, e.g., perform posterior predictive checks.

To continue working with the imputed data in a multiple imputation framework using the package mice we can convert the stacked imputed data into a mids object with the help of the function as.mids() from the mice package. For multi-level data, the .id argument should not be the subject ID (that would lead to an error about duplicate row.names) but the variable .rownr.

library("mice")
my_mids <- as.mids(impDF, .imp = "Imputation_", .id = ".rownr")

We could then continue with analysis and pooling as if the data had been imputed using mice:

library("lme4")

my_mira <- with(my_mids,
             lmer(logFIB4 ~ Age0 + Sex + ns(time, df = 2) + AntiHBc + Albumin + 
                    DM + (ns(time, df = 2) | id) + (1 | center) + 
                    (1 | Genotype))
)


library("broom.mixed")
summary(mice::pool(my_mira))
##                term    estimate   std.error   statistic        df      p.value
## 1       (Intercept)  1.02952392 0.306011941   3.3643260 230.32590 0.0008986698
## 2              Age0  0.02638457 0.002147383  12.2868475 283.64236 0.0000000000
## 3         SexFemale -0.08763096 0.048633314  -1.8018711 613.31547 0.0720566140
## 4 ns(time, df = 2)1  0.95311581 0.106016877   8.9902272 894.26077 0.0000000000
## 5 ns(time, df = 2)2  0.53028313 0.208356485   2.5450762 781.82618 0.0111164482
## 6   AntiHBcPositive -0.07901239 0.052995282  -1.4909325 174.86080 0.1377809482
## 7           Albumin -0.04886846 0.003772610 -12.9534868  90.86330 0.0000000000
## 8             DMYes -0.04531812 0.069786480  -0.6493824  96.13983 0.5176396589