model_imp.Rd
Functions to estimate (generalized) linear and (generalized) linear mixed models, ordinal and ordinal mixed models, and parametric (Weibull) as well as Cox proportional hazards survival models using MCMC sampling, while imputing missing values.
lm_imp(formula, data, n.chains = 3, n.adapt = 100, n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL, models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE, ncores = NULL, scale_vars = NULL, scale_pars = NULL, hyperpars = NULL, modelname = NULL, modeldir = NULL, keep_model = FALSE, overwrite = NULL, quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE, keep_scaled_mcmc = FALSE, ...)

glm_imp(formula, family, data, n.chains = 3, n.adapt = 100, n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL, models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE, ncores = NULL, scale_vars = NULL, scale_pars = NULL, hyperpars = NULL, modelname = NULL, modeldir = NULL, keep_model = FALSE, overwrite = NULL, quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE, keep_scaled_mcmc = FALSE, ...)

clm_imp(fixed, data, n.chains = 3, n.adapt = 100, n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL, models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE, ncores = NULL, scale_vars = NULL, scale_pars = NULL, hyperpars = NULL, modelname = NULL, modeldir = NULL, keep_model = FALSE, overwrite = NULL, quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE, keep_scaled_mcmc = FALSE, ...)

lme_imp(fixed, data, random, n.chains = 3, n.adapt = 100, n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL, models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE, ncores = NULL, scale_vars = NULL, scale_pars = NULL, hyperpars = NULL, modelname = NULL, modeldir = NULL, keep_model = FALSE, overwrite = NULL, quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE, keep_scaled_mcmc = FALSE, ...)

glme_imp(fixed, data, random, family, n.chains = 3, n.adapt = 100, n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL, models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE, ncores = NULL, scale_vars = NULL, scale_pars = NULL, hyperpars = NULL, modelname = NULL, modeldir = NULL, keep_model = FALSE, overwrite = NULL, quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE, keep_scaled_mcmc = FALSE, ...)

clmm_imp(fixed, data, random, n.chains = 3, n.adapt = 100, n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL, models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE, ncores = NULL, scale_vars = NULL, scale_pars = NULL, hyperpars = NULL, modelname = NULL, modeldir = NULL, keep_model = FALSE, overwrite = NULL, quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE, keep_scaled_mcmc = FALSE, ...)

survreg_imp(formula, data, n.chains = 3, n.adapt = 100, n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL, models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE, ncores = NULL, scale_vars = NULL, scale_pars = NULL, hyperpars = NULL, modelname = NULL, modeldir = NULL, keep_model = FALSE, overwrite = NULL, quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE, keep_scaled_mcmc = FALSE, ...)

coxph_imp(formula, data, n.chains = 3, n.adapt = 100, n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL, models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE, ncores = NULL, scale_vars = NULL, scale_pars = NULL, hyperpars = NULL, modelname = NULL, modeldir = NULL, keep_model = FALSE, overwrite = NULL, quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE, keep_scaled_mcmc = FALSE, ...)
formula  a two-sided model formula (see

data  a 
n.chains  the number of parallel chains for the model 
n.adapt  the number of iterations for adaptation. See

n.iter  number of iterations to monitor 
thin  thinning interval for monitors 
monitor_params  named vector specifying which parameters should be monitored 
auxvars  optional vector of variable names that should be used as predictors in the imputation procedure (and will be imputed if necessary) but are not part of the analysis model 
refcats  optional; a named list specifying which category should be
used as reference category for each of the categorical variables.
Options are the category label, the category number,
"first" (the first category), "last" (the last category)
or "largest" (chooses the category with the most observations).
Default is "first". (See also 
models  optional named vector specifying the order and types of the
models for incomplete covariates and longitudinal covariates.
This argument replaces the argument
no_model  names of variables for which no model should be specified. Note that this is only possible for completely observed variables and may imply assumptions of independence between the excluded variable and incomplete variables. 
trunc  optional named list specifying the limits of truncation for the distribution of the named incomplete variables 
ridge  logical; should the parameters of the main model be penalized using ridge regression? Default is 
ppc  logical: should monitors for posterior predictive checks be set? (not yet used) 
seed  optional seed value for reproducibility 
inits  optional specification of initial values in the form of a list
or a function (see 
parallel  logical; should the chains be sampled using parallel computation? Default is 
ncores  number of cores to use for parallel computation; if left empty all except two cores will be used 
scale_vars  optional; named vector of (continuous) variables that will
be scaled (such that mean = 0 and sd = 1) to improve
convergence of the MCMC sampling. Default is that all
continuous variables that are not transformed by a function
(e.g. 
scale_pars  optional matrix of parameters used for centering and
scaling continuous covariates. If not specified, this will
be calculated automatically. If 
hyperpars  list of hyperparameters, as obtained by 
modelname  optional; character string specifying the name of the model file (including the ending, either .R or .txt). If unspecified a random name will be generated. 
modeldir  optional; directory containing model file or directory in which the model file will be written. If unspecified a temporary directory will be created. 
keep_model  logical; whether the created JAGS model should be saved
or removed from the disk ( 
overwrite  logical; whether an existing model file with the specified

quiet  if 
progress.bar  character string specifying the type of progress bar.
Possible values are "text", "gui", and "none".
See 
warn  logical; should warnings be given? Default is

mess  logical; should messages be given? Default is

keep_scaled_mcmc  should the "original" MCMC sample
(i.e., the scaled version returned by 
...  additional, optional arguments 
family  only for 
fixed  a two-sided formula describing the fixed-effects part of the
model (see 
random  only for 
An object of class JointAI.
See also the vignette: Model Specification
Implemented distribution families and link functions for glm_imp() and glme_imp():
gaussian  with links: identity, log
binomial  with links: logit, probit, log, cloglog
Gamma  with links: identity, log
poisson  with links: log, identity
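As an illustration, the family and link combinations listed above correspond to the standard family objects from base R's stats package, which is the form used in Example 2 (family = binomial(link = "logit")). The sketch below only builds and inspects such objects. It does not call glm_imp() or glme_imp(), and the object names (fam_gauss etc.) are made up for this example.

```r
# Family objects from base R (package stats), one per family listed above.
fam_gauss <- gaussian(link = "log")
fam_binom <- binomial(link = "probit")
fam_gamma <- Gamma(link = "identity")
fam_pois  <- poisson(link = "identity")

# Each object records the distribution and the chosen link function.
fams <- list(fam_gauss, fam_binom, fam_gamma, fam_pois)
overview <- data.frame(
  family = vapply(fams, function(f) f$family, character(1)),
  link   = vapply(fams, function(f) f$link, character(1)),
  stringsAsFactors = FALSE
)
print(overview)
```

An object such as fam_binom would then be supplied as the family argument, in the same way as in Example 2.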
Implemented model types that can be specified in the argument models are:
norm  linear model
lognorm  log-linear model for skewed continuous data
gamma  gamma model (with log-link) for skewed continuous data
beta  beta model (with logit-link) for skewed continuous data in (0, 1)
logit  logistic model for binary data
multilogit  multinomial logit model for unordered categorical variables
cumlogit  cumulative logit model for ordered categorical variables
lmm  linear mixed model for continuous longitudinal covariates
glmm_gamma  Gamma mixed model for skewed longitudinal covariates
glmm_logit  logit mixed model for binary longitudinal covariates
glmm_poisson  Poisson mixed model for longitudinal count covariates
clmm  cumulative logit mixed model for longitudinal ordered factors
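A minimal sketch of how such a named vector might be put together and checked against the list above before it is passed to models. The variable names C2 and M2 are borrowed from the examples on this page, and the types assigned to them are purely illustrative. The vector valid_types and the check itself are assumptions made for this sketch, not part of the package.

```r
# Model types from the list above (copied by hand; not an object exported
# by the package).
valid_types <- c("norm", "lognorm", "gamma", "beta", "logit", "multilogit",
                 "cumlogit", "lmm", "glmm_gamma", "glmm_logit",
                 "glmm_poisson", "clmm")

# Named vector: names are the incomplete variables, values the model types.
models <- c(C2 = "norm", M2 = "multilogit")

# Flag any entry that is not one of the listed types.
unknown <- models[!models %in% valid_types]
if (length(unknown) > 0) {
  stop("Unknown model type(s): ", paste(unknown, collapse = ", "))
}
```

The resulting vector would then be supplied as models = models in one of the *_imp() calls.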
monitor_params
Except for other, in which parameter names are specified directly, parameters (or groups of parameters) are simply set to TRUE or FALSE.
If left unspecified, monitor_params = c("analysis_main" = TRUE) will be used.
name/key word  what is monitored
analysis_main  betas and sigma_y (and D in mixed models)
analysis_random  ranef, D, invD, RinvD
imp_pars  alphas, tau_imp, gamma_imp, delta_imp
imps  imputed values
betas  regression coefficients of the analysis model
tau_y  precision of the residuals from the analysis model
sigma_y  standard deviation of the residuals from the analysis model
ranef  random effects b
D  covariance matrix of the random effects
invD  inverse of D
RinvD  matrix in the prior for invD
alphas  regression coefficients in the imputation models
tau_imp  precision parameters of the residuals from imputation models
gamma_imp  intercepts in ordinal imputation models
delta_imp  increments of ordinal intercepts
other  additional parameters
monitor_params = c(analysis_main = TRUE, tau_y = TRUE, sigma_y = FALSE) would monitor the regression parameters betas and the residual precision tau_y instead of the residual standard deviation sigma_y.
monitor_params = c(imps = TRUE) would monitor betas and sigma_y (because analysis_main = TRUE by default) as well as the imputed values.
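To make the interplay between key words and individual parameter names concrete, here is a small sketch in base R. The lookup table groups and the helper expand_monitor_params() are hypothetical and written only for this illustration; they are not the package's implementation. The sketch assumes that groups are switched on first and that settings for individual parameters are applied afterwards. The key word other and the extra D for mixed models are left out.

```r
# Hypothetical lookup table mirroring the key-word table above.
groups <- list(
  analysis_main   = c("betas", "sigma_y"),
  analysis_random = c("ranef", "D", "invD", "RinvD"),
  imp_pars        = c("alphas", "tau_imp", "gamma_imp", "delta_imp")
)

# Hypothetical helper: turn a monitor_params vector into parameter names.
expand_monitor_params <- function(monitor_params = NULL) {
  # analysis_main = TRUE is the documented default
  if (!"analysis_main" %in% names(monitor_params)) {
    monitor_params <- c(analysis_main = TRUE, monitor_params)
  }
  is_group <- names(monitor_params) %in% names(groups)

  # first switch on all groups that are set to TRUE
  selected <- character(0)
  for (g in names(monitor_params)[is_group & monitor_params]) {
    selected <- union(selected, groups[[g]])
  }

  # then add or remove the parameters that were named individually
  single   <- monitor_params[!is_group]
  selected <- union(selected, names(single)[single])
  setdiff(selected, names(single)[!single])
}

res <- expand_monitor_params(
  c(analysis_main = TRUE, tau_y = TRUE, sigma_y = FALSE)
)
print(res)  # "betas" "tau_y"
```

Under these assumptions the first example above resolves to betas and tau_y, matching the description in the text.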
The default imputation models are chosen based on the class of each of the incomplete variables, distinguishing between numeric, factor with two levels, unordered factor with >2 levels and ordered factor with >2 levels.
When a continuous incomplete variable has only two different values, it is
assumed to be binary and its coding and default imputation model will be
changed accordingly. This behavior can be overridden by specifying the imputation
method for that variable directly.
Variables of type logical are automatically converted to unordered factors.
Contrary to base R behavior, dummy coding (i.e., contr.treatment contrasts)
is used for ordered factors in any linear predictor.
However, since the order of levels in an ordered factor contains information relevant
to the imputation of missing values, it is important that incomplete ordinal
variables are coded as such.
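The difference between the base R default and dummy coding for ordered factors can be seen with model.matrix(). This is a sketch in base R only; the variables o and l are made up for the illustration, and the conversion of the logical variable is done by hand here rather than by the package.

```r
# An ordered factor with three levels.
o <- factor(c("low", "mid", "high", "mid"),
            levels = c("low", "mid", "high"), ordered = TRUE)

# Base R default for ordered factors: polynomial contrasts (contr.poly).
mm_poly <- model.matrix(~ o)
print(colnames(mm_poly))

# Dummy coding as described above: treatment contrasts (contr.treatment),
# giving one indicator column per non-reference level.
mm_dummy <- model.matrix(~ o, contrasts.arg = list(o = "contr.treatment"))
print(colnames(mm_dummy))

# A logical variable turned into an unordered factor.
l <- factor(c(TRUE, FALSE, NA, TRUE))
print(levels(l))
print(is.ordered(l))
```

Note that o remains an ordered factor in both cases; only the coding in the design matrix changes.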
If a function of a variable is used in the model formula, for example log(x), and x has missing values, x will be imputed and used in the linear predictor of models for other incomplete variables, i.e., it is assumed that the other variables have a linear association with x but not with log(x). The log() of the observed and imputed values of x is calculated and used in the linear predictor of the analysis model.
If, instead of log(x), a precalculated variable logx is used in the model formula, this variable is imputed directly and used in the linear predictors of all models, implying that other incomplete variables that have logx in their linear predictors have a linear association with logx but not with x.
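A quick way to see which variable a model formula refers to in each of the two cases is base R's all.vars(). The formulas and variable names below (y, C1, x, logx) are hypothetical, and this only inspects the formula; it is not a description of how the package processes it internally.

```r
# Transformation written in the formula: the underlying variable is x.
f_fun <- y ~ C1 + log(x)
vars_fun <- all.vars(f_fun)
print(vars_fun)  # "y" "C1" "x"

# Precalculated transformation: the formula only knows about logx.
f_pre <- y ~ C1 + logx
vars_pre <- all.vars(f_pre)
print(vars_pre)  # "y" "C1" "logx"
```

In the first case x is the variable that can be imputed; in the second case only logx is available.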
When different transformations of the same incomplete variable are used in one model, it is strongly discouraged to calculate these transformations beforehand and supply them as different variables.
If, for example, a model formula contains both x and x2 (where x2 = x^2), they are treated as separate variables and imputed with separate models. Imputed values of x2 are thus not equal to the square of imputed values of x.
Instead, x + I(x^2) should be used in the model formula. Then, only x is imputed and used in the linear predictor of models for other incomplete variables, and x^2 is calculated from the imputed values of x internally.
The same applies to interactions involving incomplete variables.
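The inconsistency can be reproduced with a toy example in base R. Mean imputation is used here purely as a stand-in for "separate imputation models"; it is not what the package does, and the data are made up.

```r
dat <- data.frame(x = c(1, 2, NA, 4, 5))
dat$x2 <- dat$x^2  # precalculated square, also missing in row 3

# Stand-in for separate imputation: fill each column with its own mean.
x_sep  <- ifelse(is.na(dat$x),  mean(dat$x,  na.rm = TRUE), dat$x)
x2_sep <- ifelse(is.na(dat$x2), mean(dat$x2, na.rm = TRUE), dat$x2)

# Row 3: imputed x is 3 and imputed x2 is 11.5, but 3^2 is 9,
# so the imputed x2 is not the square of the imputed x.
print(c(x = x_sep[3], x2 = x2_sep[3], x_squared = x_sep[3]^2))

# With I(x^2) in the formula, the square is derived from the (imputed) x,
# so the two columns of the design matrix stay consistent.
mm <- model.matrix(~ x + I(x^2), data = data.frame(x = x_sep))
print(mm[3, ])
```

The same reasoning carries over to interaction terms: computing the product from the imputed values keeps it consistent with its components.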
Not (yet) possible:
multiple nesting levels of random effects (nested or crossed)
prediction (using predict) conditional on random effects
the use of splines for incomplete variables
the use of pspline, frailty, cluster or strata in survival models
left censored or interval censored data
set_refcat, get_models, traceplot, densplot, summary.JointAI, MC_error, GR_crit, jags.model, coda.samples, predict.JointAI, JointAIObject, add_samples, parameters, list_impmodels
Vignettes
# Example 1: Linear regression with incomplete covariates
mod1 <- lm_imp(y ~ C1 + C2 + M2, data = wideDF, n.iter = 100)

# Example 2: Logistic regression with incomplete covariates
mod2 <- glm_imp(B1 ~ C1 + C2 + M2, data = wideDF, family = binomial(link = "logit"), n.iter = 100)

# Example 3: Linear mixed model with incomplete covariates
mod3 <- lme_imp(y ~ C1 + B2 + c1 + time, random = ~ time|id, data = longDF, n.iter = 500)