model_imp.Rd
Functions to estimate (generalized) linear and (generalized) linear mixed models, ordinal and ordinal mixed models, and parametric (Weibull) as well as Cox proportional hazards survival models using MCMC sampling, while imputing missing values.
lm_imp(formula, data, n.chains = 3, n.adapt = 100, n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL, models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE, n.cores = NULL, scale_vars = NULL, scale_pars = NULL, hyperpars = NULL, modelname = NULL, modeldir = NULL, keep_model = FALSE, overwrite = NULL, quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE, keep_scaled_mcmc = FALSE, ...)
glm_imp(formula, family, data, n.chains = 3, n.adapt = 100, n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL, models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE, n.cores = NULL, scale_vars = NULL, scale_pars = NULL, hyperpars = NULL, modelname = NULL, modeldir = NULL, keep_model = FALSE, overwrite = NULL, quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE, keep_scaled_mcmc = FALSE, ...)
clm_imp(fixed, data, n.chains = 3, n.adapt = 100, n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL, models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE, n.cores = NULL, scale_vars = NULL, scale_pars = NULL, hyperpars = NULL, modelname = NULL, modeldir = NULL, keep_model = FALSE, overwrite = NULL, quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE, keep_scaled_mcmc = FALSE, ...)
lme_imp(fixed, data, random, n.chains = 3, n.adapt = 100, n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL, models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE, n.cores = NULL, scale_vars = NULL, scale_pars = NULL, hyperpars = NULL, modelname = NULL, modeldir = NULL, keep_model = FALSE, overwrite = NULL, quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE, keep_scaled_mcmc = FALSE, ...)
glme_imp(fixed, data, random, family, n.chains = 3, n.adapt = 100, n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL, models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE, n.cores = NULL, scale_vars = NULL, scale_pars = NULL, hyperpars = NULL, modelname = NULL, modeldir = NULL, keep_model = FALSE, overwrite = NULL, quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE, keep_scaled_mcmc = FALSE, ...)
clmm_imp(fixed, data, random, n.chains = 3, n.adapt = 100, n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL, models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE, n.cores = NULL, scale_vars = NULL, scale_pars = NULL, hyperpars = NULL, modelname = NULL, modeldir = NULL, keep_model = FALSE, overwrite = NULL, quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE, keep_scaled_mcmc = FALSE, ...)
survreg_imp(formula, data, n.chains = 3, n.adapt = 100, n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL, models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE, n.cores = NULL, scale_vars = NULL, scale_pars = NULL, hyperpars = NULL, modelname = NULL, modeldir = NULL, keep_model = FALSE, overwrite = NULL, quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE, keep_scaled_mcmc = FALSE, ...)
coxph_imp(formula, data, n.chains = 3, n.adapt = 100, n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL, models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE, n.cores = NULL, scale_vars = NULL, scale_pars = NULL, hyperpars = NULL, modelname = NULL, modeldir = NULL, keep_model = FALSE, overwrite = NULL, quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE, keep_scaled_mcmc = FALSE, ...)
formula  a two sided model formula (see 

data  a 
n.chains  the number of MCMC chains to be used 
n.adapt  the number of iterations for adaptation of the MCMC samplers
(see also 
n.iter  the number of iterations of the MCMC chain (after adaptation;
see also 
thin  thinning interval (see 
monitor_params  named vector specifying which parameters should be monitored (see details) 
auxvars  optional one-sided formula of variables that should be used as predictors in the imputation procedure (and will be imputed if necessary) but are not part of the analysis model 
refcats  optional; either one of 
models  optional named vector specifying the types of models for
(incomplete) covariates.
This argument replaces the argument 
no_model  names of variables for which no model should be specified. Note that this is only possible for completely observed variables and implies the assumption of independence between the excluded variable and the incomplete variables. 
trunc  optional named list specifying the limits of truncation for the distribution of the named incomplete variables (see the vignette ModelSpecification) 
ridge  logical; should the parameters of the main model be penalized using ridge regression? Default is FALSE.
ppc  logical: should monitors for posterior predictive checks be set? (not yet used) 
seed  optional seed value for reproducibility 
inits  optional specification of initial values in the form of a list
or a function (see 
parallel  logical; should the chains be sampled using parallel computation? Default is FALSE.
n.cores  number of cores to use for parallel computation; if left empty, all except two cores will be used 
scale_vars  optional; named vector of (continuous) variables that will
be scaled (such that mean = 0 and sd = 1) to improve
convergence of the MCMC sampling. Default is that all
continuous variables that are not transformed by a function
(e.g. 
scale_pars  optional matrix of parameters used for centering and
scaling of continuous covariates. If not specified, this will
be calculated automatically. If 
hyperpars  list of hyperparameters, as obtained by 
modelname  optional; character string specifying the name of the model file (including the ending, either .R or .txt). If unspecified a random name will be generated. 
modeldir  optional; directory containing the model file or directory in which the model file should be written. If unspecified a temporary directory will be created. 
keep_model  logical; whether the created JAGS model should be saved
or removed from the disk ( 
overwrite  logical; whether an existing model file with the specified

quiet  if 
progress.bar  character string specifying the type of progress bar.
Possible values are "text", "gui", and "none" (see

warn  logical; should warnings be given? Default is TRUE.
mess  logical; should messages be given? Default is TRUE.
keep_scaled_mcmc  should the "original" MCMC sample
(i.e., the scaled version returned by 
...  additional, optional arguments 
family  only for 
fixed  a two sided formula describing the fixed-effects part of the
model (see 
random  only for multilevel models:
a one-sided formula of the form 
An object of class JointAI.
See also the vignettes Model Specification, MCMC Settings and Parameter Selection.
Distribution families and link functions available in glm_imp() and glme_imp():
gaussian  with links: identity, log
binomial  with links: logit, probit, log, cloglog
Gamma  with links: inverse, identity, log
poisson  with links: log, identity
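As a minimal sketch of how one of the family and link combinations listed above might be passed to glm_imp(): the family objects are the standard ones from base R's stats package, while the data set wideDF and the variables B1, C1, C2 and M1 are borrowed from the examples of this help page. The call is guarded so that the snippet also runs when JointAI (or JAGS) is not installed; n.iter = 100 is an arbitrary illustrative value.

```r
# Family objects from base R (stats); the link is chosen via `link`.
fam_probit <- binomial(link = "probit")
fam_gamma  <- Gamma(link = "log")

# Inspect which family/link combination was selected.
fam_probit$family  # "binomial"
fam_probit$link    # "probit"

# Guarded call: only run if JointAI (and hence JAGS) is available.
if (requireNamespace("JointAI", quietly = TRUE)) {
  library(JointAI)
  mod_probit <- glm_imp(B1 ~ C1 + C2 + M1, data = wideDF,
                        family = fam_probit, n.iter = 100)
}
```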
Model types that can be specified in the argument models are:
norm  linear model 
lognorm  lognormal model for skewed continuous data 
gamma  gamma model (with log link) for skewed continuous data 
beta  beta model (with logit link) for skewed continuous data in (0, 1) 
logit  logistic model for binary data 
multilogit  multinomial logit model for unordered categorical variables 
cumlogit  cumulative logit model for ordered categorical variables 
lmm  linear mixed model for continuous longitudinal covariates 
glmm_lognorm  lognormal mixed model for skewed longitudinal covariates 
glmm_gamma  Gamma mixed model for skewed longitudinal covariates 
glmm_logit  logit mixed model for binary longitudinal covariates 
glmm_poisson  Poisson mixed model for longitudinal count covariates 
clmm  cumulative logit mixed model for longitudinal ordered factors 
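A minimal sketch of how a model type from the list above might be assigned through the models argument. It assumes, purely for illustration, that C2 is an incomplete continuous covariate in wideDF; the type "norm" is used because it makes no assumption about the sign or range of the values. Whether another type such as "lognorm" or "gamma" is appropriate depends on the actual data and should be checked first.

```r
# Named character vector: names are covariates, values are model types
# taken from the list above. C2 being incomplete is an assumption.
cov_models <- c(C2 = "norm")

# Sanity check in base R: every named covariate should occur in the formula.
stopifnot(all(names(cov_models) %in% all.vars(y ~ C1 + C2 + M1 + B1)))

# Guarded call: only run if JointAI (and hence JAGS) is available.
if (requireNamespace("JointAI", quietly = TRUE)) {
  library(JointAI)
  mod_models <- lm_imp(y ~ C1 + C2 + M1 + B1, data = wideDF,
                       models = cov_models, n.iter = 100)
}
```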
monitor_params: Except for other, in which parameter names are specified directly, parameters (or groups of parameters) are simply set to TRUE or FALSE.
If left unspecified, monitor_params = c("analysis_main" = TRUE) will be used.
name/key word  what is monitored 
analysis_main  betas and sigma_y (and D in multilevel models) 
analysis_random  ranef, D, invD, RinvD
imp_pars  alphas, tau_imp, gamma_imp, delta_imp
imps  imputed values 
betas  regression coefficients of the analysis model 
tau_y  precision of the residuals from the analysis model 
sigma_y  standard deviation of the residuals from the analysis model 
ranef  random effects b 
D  covariance matrix of the random effects 
invD  inverse of D 
RinvD  matrix in the prior for invD 
alphas  regression coefficients in the covariate models 
tau_imp  precision parameters of the residuals from covariate models 
gamma_imp  intercepts in ordinal covariate models 
delta_imp  increments of ordinal intercepts 
other  additional parameters 
monitor_params = c(analysis_main = TRUE, tau_y = TRUE, sigma_y = FALSE) would monitor the regression parameters betas and the residual precision tau_y instead of the residual standard deviation sigma_y.
monitor_params = c(imps = TRUE) would monitor betas, tau_y, and sigma_y (because analysis_main = TRUE by default) as well as the imputed values.
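The two settings described above can be written as named logical vectors and passed to any of the model functions. The following is a sketch only: the model formula and wideDF are taken from the examples of this help page, and n.iter = 100 is an arbitrary choice.

```r
# Monitor betas and the residual precision tau_y, but not sigma_y.
mp_precision <- c(analysis_main = TRUE, tau_y = TRUE, sigma_y = FALSE)

# Additionally monitor the imputed values; analysis_main keeps its default.
mp_imps <- c(imps = TRUE)

# Guarded call: only run if JointAI (and hence JAGS) is available.
if (requireNamespace("JointAI", quietly = TRUE)) {
  library(JointAI)
  mod_mp <- lm_imp(y ~ C1 + C2 + M1 + B1, data = wideDF, n.iter = 100,
                   monitor_params = mp_precision)
}
```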
By default, the type of covariate model is chosen based on the class of each of the incomplete variables, distinguishing between numeric, factor with two levels, unordered factor with >2 levels and ordered factor with >2 levels. These defaults can be changed using the argument models.
Variables of class logical are automatically converted to unordered factors.
Dummy coding (contr.treatment contrasts) is used for ordered factors in any linear predictor. It is not possible to overwrite this behavior using the base R contrasts specification. However, since the order of levels in an ordered factor contains information relevant to the imputation of missing values, it is important that incomplete ordinal variables are coded as such.
If the model formula contains a function of a variable, for example log(x), and x has missing values, x will be imputed and used in the linear predictor of models for covariates, i.e., it is assumed that the other variables have a linear association with x but not with log(x). The log() of the observed and imputed values of x is calculated and used in the linear predictor of the analysis model.
If, instead of log(x) in the model formula, a precalculated variable logx is used, this variable is imputed directly and used in the linear predictors of all models, implying that variables that have logx in their linear predictors have a linear association with logx but not with x.
If the model formula contains both x and x2 (where x2 = x^2), they are treated as separate variables and imputed with separate models. Imputed values of x2 are thus not equal to the square of imputed values of x. Instead, x and I(x^2) should be used in the model formula. Then only x is imputed and used in the linear predictor of models for other incomplete variables, and x^2 is calculated from the imputed values of x internally. The same applies to interactions involving incomplete variables.
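The difference between a precalculated square and I(x^2) is visible in the formula itself. The sketch below uses base R's all.vars() to list the variables each formula refers to; x, x2 and y are the placeholder names from the text above. The final, guarded call assumes that C2 in wideDF is an incomplete continuous covariate and only illustrates the recommended formula syntax.

```r
# With a precalculated square, x2 is a separate variable in the formula,
# so it would need its own model.
vars_separate <- all.vars(y ~ x + x2)
vars_separate  # "y" "x" "x2"

# With I(x^2), only x appears as a variable; the square is derived from x.
vars_function <- all.vars(y ~ x + I(x^2))
vars_function  # "y" "x"

# Guarded call: only run if JointAI (and hence JAGS) is available.
if (requireNamespace("JointAI", quietly = TRUE)) {
  library(JointAI)
  mod_sq <- lm_imp(y ~ C1 + C2 + I(C2^2) + M1 + B1, data = wideDF,
                   n.iter = 100)
}
```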
Features that are not yet supported:
multiple nesting levels of random effects (nested or crossed)
prediction (using predict) conditional on random effects
the use of splines for incomplete variables
the use of pspline, frailty, cluster or strata in survival models
left censored or interval censored data
set_refcat, get_models, traceplot, densplot, summary.JointAI, MC_error, GR_crit, predict.JointAI, add_samples, JointAIObject, parameters, list_models
Vignettes
# Example 1: Linear regression with incomplete covariates
mod1 <- lm_imp(y ~ C1 + C2 + M1 + B1, data = wideDF, n.iter = 100)

# Example 2: Logistic regression with incomplete covariates
mod2 <- glm_imp(B1 ~ C1 + C2 + M1, data = wideDF,
                family = binomial(link = "logit"), n.iter = 100)

# Example 3: Linear mixed model with incomplete covariates
mod3 <- lme_imp(y ~ C1 + B2 + c1 + time, random = ~ time | id,
                data = longDF, n.iter = 300)