JointAI: Joint Analysis and Imputation of Incomplete Data

The JointAI package performs simultaneous imputation and inference for incomplete or complete data under the Bayesian framework. Models for incomplete covariates, conditional on other covariates, are specified automatically and modelled jointly with the analysis model. MCMC sampling is performed in 'JAGS' via the R package rjags.
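Here is a rough sketch of what it means to model the covariate models jointly with the analysis model. Consider a cross-sectional analysis model for an outcome y, with completely observed covariates X_obs and incomplete covariates x_1, ..., x_q. This notation is introduced here for illustration, and the priors are assumed to be independent. The joint distribution can be factorized into the analysis model and a sequence of univariate conditional models for the incomplete covariates, each with its own parameters and prior. For l = 1 the conditioning set is X_obs only.

```latex
p(\mathbf{y}, \mathbf{x}_1, \ldots, \mathbf{x}_q, \boldsymbol{\theta} \mid \mathbf{X}_{\text{obs}})
  = p(\mathbf{y} \mid \mathbf{X}_{\text{obs}}, \mathbf{x}_1, \ldots, \mathbf{x}_q, \boldsymbol{\theta}_y)
    \prod_{\ell=1}^{q} p(\mathbf{x}_\ell \mid \mathbf{X}_{\text{obs}}, \mathbf{x}_1, \ldots, \mathbf{x}_{\ell-1}, \boldsymbol{\theta}_{x_\ell})
    \; p(\boldsymbol{\theta}_y) \prod_{\ell=1}^{q} p(\boldsymbol{\theta}_{x_\ell})
```

The missing values in x_1, ..., x_q are treated as additional unknowns and are sampled together with the parameters. The order of the sequence and the form of each conditional model are modelling choices. The Theoretical Background vignette explains the method the package actually implements.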

Main functions

JointAI provides the following main functions that facilitate analysis with different models:

lm_imp for linear regression
glm_imp for generalized linear regression
betareg_imp for regression using a beta distribution
lognorm_imp for regression using a log-normal distribution
clm_imp for (ordinal) cumulative logit models
mlogit_imp for multinomial models
lme_imp or lmer_imp for linear mixed models
glme_imp or glmer_imp for generalized linear mixed models
betamm_imp for mixed models using a beta distribution
lognormmm_imp for mixed models using a log-normal distribution
clmm_imp for (ordinal) cumulative logit mixed models
survreg_imp for parametric (Weibull) survival models
coxph_imp for (Cox) proportional hazard models
JM_imp for joint models of longitudinal and survival data

As far as possible, the specification of these functions is analogous to that of widely used functions for the analysis of complete data, such as lm, glm, lme (from the package nlme), survreg and coxph (both from the package survival).

Computations can be performed in parallel, using the package future, to reduce computational time. The argument shrinkage allows the user to impose a penalty on the regression coefficients of some or all models involved, and hyper-parameters can be changed via the argument hyperpars.
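The idea behind a ridge-type penalty in a Bayesian model can be seen in the simplest conjugate case. Take a no-intercept linear regression with known residual variance sigma2 and a normal prior N(0, 1/tau) on the slope. The posterior mean then equals a ridge estimate with penalty sigma2 * tau, which pulls it towards zero relative to least squares.

This is an illustration only, written in Python (JointAI itself is an R package). The function name slope_estimates is hypothetical. JointAI's actual shrinkage prior may differ, for example by giving the precision its own hyperprior.

```python
def slope_estimates(x, y, sigma2, tau):
    """Least-squares slope vs. posterior mean under a N(0, 1/tau) prior.

    Assumes a no-intercept model y_i ~ N(beta * x_i, sigma2) with known
    sigma2, so the posterior of beta is normal with a closed form.
    """
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    ols = sxy / sxx
    # Posterior precision is sxx / sigma2 + tau, so the posterior mean is
    # (sxy / sigma2) / (sxx / sigma2 + tau) = sxy / (sxx + sigma2 * tau).
    post_mean = sxy / (sxx + sigma2 * tau)
    return ols, post_mean


# Toy data for illustration only.
x = [1.0, 2.0, 3.0, 4.0]
y = [2.1, 3.9, 6.2, 7.8]
for tau in (0.0, 1.0, 10.0):
    ols, shrunk = slope_estimates(x, y, sigma2=1.0, tau=tau)
    print(f"tau={tau:5.1f}  least squares={ols:.3f}  posterior mean={shrunk:.3f}")
```

Setting tau = 0 (a flat prior) recovers least squares. Larger values of tau, meaning a more concentrated prior, pull the estimate further towards zero.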

To obtain summaries of the results, the functions summary(), coef() and confint() are available, and results can be visualized with the help of traceplot() or densplot().
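summary() and confint() report quantities computed from the MCMC sample. The sketch below computes the typical ones: the posterior mean, the posterior standard deviation and an equal-tailed 95% credible interval. It assumes the draws for a single parameter, pooled over chains, are available as a plain list.

It is written in Python for illustration and is not JointAI's implementation. summarize_draws is a hypothetical name.

```python
import random
import statistics


def summarize_draws(draws):
    """Posterior mean, SD and equal-tailed 95% credible interval of MCMC draws."""
    # With n=40 there are 39 cut points at 1/40, ..., 39/40; the first and
    # last are the 2.5% and 97.5% quantiles.
    cuts = statistics.quantiles(draws, n=40, method="inclusive")
    return {
        "mean": statistics.fmean(draws),
        "sd": statistics.stdev(draws),
        "2.5%": cuts[0],
        "97.5%": cuts[-1],
    }


rng = random.Random(1)
draws = [rng.gauss(0.5, 0.2) for _ in range(3000)]  # simulated stand-in for MCMC draws
print(summarize_draws(draws))
```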

The function predict() allows prediction (including credible intervals) from JointAI models.
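A credible interval for a prediction can be obtained by evaluating the prediction once per posterior draw and then summarizing the resulting values. The sketch below does this for a linear predictor with an intercept and one slope. It assumes the draws are available as (beta0, beta1) pairs.

It is written in Python for illustration; predict_with_interval is a hypothetical name, not JointAI's API. The interval refers to the expected outcome. An interval for a new observation would also need the residual variance.

```python
import random
import statistics


def predict_with_interval(draws, x_new):
    """Posterior mean and 95% credible interval of beta0 + beta1 * x_new.

    draws: list of (beta0, beta1) tuples, one per MCMC iteration.
    """
    # Evaluate the linear predictor once per posterior draw, so that the
    # uncertainty in the coefficients carries over to the prediction.
    fitted = [b0 + b1 * x_new for b0, b1 in draws]
    cuts = statistics.quantiles(fitted, n=40, method="inclusive")
    return statistics.fmean(fitted), cuts[0], cuts[-1]


rng = random.Random(1)
# Simulated stand-in for posterior draws of (intercept, slope).
draws = [(rng.gauss(1.0, 0.1), rng.gauss(2.0, 0.05)) for _ in range(2000)]
print(predict_with_interval(draws, x_new=3.0))
```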

Evaluation and export

Two criteria for evaluating the convergence and precision of the posterior estimates are available:

GR_crit implements the Gelman-Rubin criterion ('potential scale reduction factor') for convergence
MC_error calculates the Monte Carlo error to evaluate the precision of the MCMC sample
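The quantities behind these two criteria can be sketched in Python on plain lists of draws:

- **Gelman-Rubin factor.** It compares the between-chain and within-chain variance. Values close to 1 suggest that the chains have mixed.
- **Monte Carlo error.** The version below uses non-overlapping batch means, one common estimator.

Both are simplified textbook versions, not the code behind GR_crit or MC_error, which may apply corrections or use a different estimator. gelman_rubin and mc_error are hypothetical names.

```python
import random
import statistics


def gelman_rubin(chains):
    """Basic potential scale reduction factor for one parameter.

    chains: list of at least two equal-length lists of draws.
    No degrees-of-freedom correction is applied.
    """
    n = len(chains[0])
    chain_means = [statistics.fmean(c) for c in chains]
    # W: average within-chain variance
    w = statistics.fmean([statistics.variance(c) for c in chains])
    # B: variance of the chain means, scaled by the chain length n
    b = n * statistics.variance(chain_means)
    # Pooled estimate of the posterior variance
    var_hat = (n - 1) / n * w + b / n
    return (var_hat / w) ** 0.5


def mc_error(draws, n_batches=20):
    """Monte Carlo standard error via non-overlapping batch means."""
    size = len(draws) // n_batches  # leftover draws at the end are ignored
    batch_means = [
        statistics.fmean(draws[i * size:(i + 1) * size]) for i in range(n_batches)
    ]
    return statistics.stdev(batch_means) / n_batches ** 0.5


rng = random.Random(2024)
# Four simulated chains sampling the same distribution.
mixed = [[rng.gauss(0, 1) for _ in range(1000)] for _ in range(4)]
# Same, but the last chain is centred elsewhere (as if it had not converged).
stuck = mixed[:3] + [[rng.gauss(5, 1) for _ in range(1000)]]
print(gelman_rubin(mixed), gelman_rubin(stuck), mc_error(mixed[0]))
```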

Imputed data can be extracted (and exported to SPSS) using get_MIdat(). The function plot_imp_distr() allows visual comparison of the distribution of observed and imputed values.
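In a fully Bayesian approach, the imputed values are MCMC draws of the missing data. A set of completed datasets can therefore be assembled by taking the sampled values from m different iterations. The sketch below does this for a single incomplete variable.

It is written in Python. The helper name and the choice of evenly spaced iterations are assumptions for illustration; get_MIdat() may select iterations differently.

```python
def completed_datasets(values, imputation_draws, m):
    """Build m completed copies of one incomplete variable.

    values: observed data, with None marking missing entries.
    imputation_draws: one list per MCMC iteration, holding a sampled value
        for each missing entry (in the order the missing entries appear).
    m: number of completed datasets; must not exceed the number of iterations.
    """
    missing_idx = [i for i, v in enumerate(values) if v is None]
    step = len(imputation_draws) // m
    datasets = []
    for k in range(m):
        # Take evenly spaced iterations so the datasets are spread over the chain.
        draw = imputation_draws[(k + 1) * step - 1]
        completed = list(values)  # copy, so the observed data stay untouched
        for pos, i in enumerate(missing_idx):
            completed[i] = draw[pos]
        datasets.append(completed)
    return datasets


values = [1.0, None, 3.0, None]
draws = [[10.0 + k, 20.0 + k] for k in range(10)]  # toy draws, not real output
for d in completed_datasets(values, draws, m=2):
    print(d)
```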

Other useful functions

parameters and list_models to gain insight into the specified model
plot_all and md_pattern to visualize the distribution of the data and the missing data pattern
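A missing data pattern is the combination of observed and missing variables in a row. Tabulating how often each pattern occurs is the core of a missing data pattern summary. The sketch below does this in Python on toy data, assuming the rows are dicts with None marking missing values.

It is not how md_pattern() is implemented. The coding 1 = observed, 0 = missing is a choice made here.

```python
from collections import Counter


def missing_patterns(rows, columns):
    """Count how often each missing-data pattern occurs.

    rows: list of dicts; None (or an absent key) marks a missing value.
    Returns (pattern, count) pairs, most frequent first, where a pattern is a
    tuple with 1 = observed and 0 = missing for each column in `columns`.
    """
    patterns = Counter(
        tuple(0 if row.get(col) is None else 1 for col in columns) for row in rows
    )
    return patterns.most_common()


# Toy data for illustration only.
rows = [
    {"age": 54, "bmi": 27.1, "chol": 5.2},
    {"age": 61, "bmi": None, "chol": 4.8},
    {"age": 47, "bmi": 24.3, "chol": None},
    {"age": 39, "bmi": 22.0, "chol": 5.9},
]
for pattern, count in missing_patterns(rows, ["age", "bmi", "chol"]):
    print(pattern, count)
```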

Vignettes

The following vignettes are available:

Minimal Example:
A minimal example demonstrating the use of lm_imp, summary.JointAI, traceplot and densplot.
Visualizing Incomplete Data:
Demonstrations of the options in plot_all (plotting histograms and bar plots for all variables in the data) and md_pattern (plotting or printing the missing data pattern).
Model Specification:
Explanation and demonstration of all parameters that are required or optional to specify the model structure in lm_imp, glm_imp and lme_imp. Among others, the functions parameters, list_models and set_refcat are used.
Parameter Selection:
Examples of how to select the parameters/variables/nodes to follow using the argument monitor_params, and how to select which of them are displayed in the summary, traceplot or densplot, or used in GR_crit or MC_error.
MCMC Settings:
Examples demonstrating how to set the arguments controlling settings of the MCMC sampling, i.e., n.adapt, n.iter, n.chains, thin, inits.
After Fitting:
Examples of the use of functions that are applied after the model has been fitted, including traceplot, densplot, summary, GR_crit, MC_error, predict, predDF and get_MIdat.
Theoretical Background:
Explanation of the statistical method implemented in JointAI.
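Among the MCMC settings listed above is thin. Thinning keeps only every thin-th draw, which reduces storage and the autocorrelation between the draws that are kept, though it does not add information.

The Python sketch below uses a simulated AR(1) chain as a stand-in for real MCMC output. For an AR(1) chain with coefficient phi, keeping every k-th draw gives a lag-1 autocorrelation of phi**k. The function names are hypothetical, and the sketch does not model how JAGS or JointAI count the retained iterations.

```python
import random


def ar1_chain(n, phi, seed):
    """Toy autocorrelated chain standing in for MCMC output (AR(1) process)."""
    rng = random.Random(seed)
    x, chain = 0.0, []
    for _ in range(n):
        x = phi * x + rng.gauss(0, 1)
        chain.append(x)
    return chain


def lag1_autocorrelation(chain):
    """Sample autocorrelation between consecutive draws."""
    n = len(chain)
    mean = sum(chain) / n
    dev = [v - mean for v in chain]
    return sum(a * b for a, b in zip(dev, dev[1:])) / sum(d * d for d in dev)


chain = ar1_chain(20000, phi=0.9, seed=7)
thin = 10
thinned = chain[::thin]  # keep every 10th draw
print(len(thinned), lag1_autocorrelation(chain), lag1_autocorrelation(thinned))
```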

References

Erler, N.S., Rizopoulos, D., & Lesaffre, E.M.E.H. (2021). JointAI: Joint Analysis and Imputation of Incomplete Data in R. Journal of Statistical Software, 100(20), 1–56. doi:10.18637/jss.v100.i20

Erler, N.S., Rizopoulos, D., van Rosmalen, J., Jaddoe, V.W.V., Franco, O.H., & Lesaffre, E.M.E.H. (2016). Dealing with missing covariates in epidemiologic studies: A comparison between multiple imputation and a full Bayesian approach. Statistics in Medicine, 35(17), 2955–2974. doi:10.1002/sim.6944

Erler, N.S., Rizopoulos, D., Jaddoe, V.W.V., Franco, O.H., & Lesaffre, E.M.E.H. (2019). Bayesian imputation of time-varying covariates in linear mixed models. Statistical Methods in Medical Research, 28(2), 555–568. doi:10.1177/0962280217730851

Author

Maintainer: Nicole S. Erler <n.s.erler@umcutrecht.nl>