Functions to estimate (generalized) linear and (generalized) linear mixed models, ordinal and ordinal mixed models, and parametric (Weibull) as well as Cox proportional hazards survival models using MCMC sampling, while imputing missing values.

lm_imp(formula, data, n.chains = 3, n.adapt = 100, n.iter = 0,
  thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL,
  models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE,
  ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE,
  ncores = NULL, scale_vars = NULL, scale_pars = NULL,
  hyperpars = NULL, modelname = NULL, modeldir = NULL,
  keep_model = FALSE, overwrite = NULL, quiet = TRUE,
  progress.bar = "text", warn = TRUE, mess = TRUE,
  keep_scaled_mcmc = FALSE, ...)

glm_imp(formula, family, data, n.chains = 3, n.adapt = 100,
  n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL,
  refcats = NULL, models = NULL, no_model = NULL, trunc = NULL,
  ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL,
  parallel = FALSE, ncores = NULL, scale_vars = NULL,
  scale_pars = NULL, hyperpars = NULL, modelname = NULL,
  modeldir = NULL, keep_model = FALSE, overwrite = NULL,
  quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE,
  keep_scaled_mcmc = FALSE, ...)

clm_imp(fixed, data, n.chains = 3, n.adapt = 100, n.iter = 0,
  thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL,
  models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE,
  ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE,
  ncores = NULL, scale_vars = NULL, scale_pars = NULL,
  hyperpars = NULL, modelname = NULL, modeldir = NULL,
  keep_model = FALSE, overwrite = NULL, quiet = TRUE,
  progress.bar = "text", warn = TRUE, mess = TRUE,
  keep_scaled_mcmc = FALSE, ...)

lme_imp(fixed, data, random, n.chains = 3, n.adapt = 100, n.iter = 0,
  thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL,
  models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE,
  ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE,
  ncores = NULL, scale_vars = NULL, scale_pars = NULL,
  hyperpars = NULL, modelname = NULL, modeldir = NULL,
  keep_model = FALSE, overwrite = NULL, quiet = TRUE,
  progress.bar = "text", warn = TRUE, mess = TRUE,
  keep_scaled_mcmc = FALSE, ...)

glme_imp(fixed, data, random, family, n.chains = 3, n.adapt = 100,
  n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL,
  refcats = NULL, models = NULL, no_model = NULL, trunc = NULL,
  ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL,
  parallel = FALSE, ncores = NULL, scale_vars = NULL,
  scale_pars = NULL, hyperpars = NULL, modelname = NULL,
  modeldir = NULL, keep_model = FALSE, overwrite = NULL,
  quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE,
  keep_scaled_mcmc = FALSE, ...)

clmm_imp(fixed, data, random, n.chains = 3, n.adapt = 100,
  n.iter = 0, thin = 1, monitor_params = NULL, auxvars = NULL,
  refcats = NULL, models = NULL, no_model = NULL, trunc = NULL,
  ridge = FALSE, ppc = TRUE, seed = NULL, inits = NULL,
  parallel = FALSE, ncores = NULL, scale_vars = NULL,
  scale_pars = NULL, hyperpars = NULL, modelname = NULL,
  modeldir = NULL, keep_model = FALSE, overwrite = NULL,
  quiet = TRUE, progress.bar = "text", warn = TRUE, mess = TRUE,
  keep_scaled_mcmc = FALSE, ...)

survreg_imp(formula, data, n.chains = 3, n.adapt = 100, n.iter = 0,
  thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL,
  models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE,
  ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE,
  ncores = NULL, scale_vars = NULL, scale_pars = NULL,
  hyperpars = NULL, modelname = NULL, modeldir = NULL,
  keep_model = FALSE, overwrite = NULL, quiet = TRUE,
  progress.bar = "text", warn = TRUE, mess = TRUE,
  keep_scaled_mcmc = FALSE, ...)

coxph_imp(formula, data, n.chains = 3, n.adapt = 100, n.iter = 0,
  thin = 1, monitor_params = NULL, auxvars = NULL, refcats = NULL,
  models = NULL, no_model = NULL, trunc = NULL, ridge = FALSE,
  ppc = TRUE, seed = NULL, inits = NULL, parallel = FALSE,
  ncores = NULL, scale_vars = NULL, scale_pars = NULL,
  hyperpars = NULL, modelname = NULL, modeldir = NULL,
  keep_model = FALSE, overwrite = NULL, quiet = TRUE,
  progress.bar = "text", warn = TRUE, mess = TRUE,
  keep_scaled_mcmc = FALSE, ...)

Arguments

formula

a two sided model formula (see formula)

data

a data.frame

n.chains

the number of parallel chains for the model

n.adapt

the number of iterations for adaptation. See adapt for details. If n.adapt = 0 then no adaptation takes place.

n.iter

number of iterations to monitor

thin

thinning interval for monitors

monitor_params

named vector specifying which parameters should be monitored

auxvars

optional vector of variable names that should be used as predictors in the imputation procedure (and will be imputed if necessary) but are not part of the analysis model

refcats

optional; a named list specifying which category should be used as reference category for each of the categorical variables. Options are the category label, the category number, "first" (the first category), "last" (the last category) or "largest" (chooses the category with the most observations). Default is "first". (See also set_refcat)

models

optional named vector specifying the order and types of the models for incomplete covariates and longitudinal covariates. This arguments replaces the argument meth used in earlier versions. If NULL (default) models will be determined automatically based on the class of the respective columns of data (see Details). The default order is incomplete baseline covariates, complete longitudinal covariates, incomplete longitudinal covariates, and within each group variables are ordered according to the proportion of missing values (increasing).

no_model

names of variables for which no model should be specified. Note that this is only possible for completely observed variables and may imply assumptions of independence between the excluded variable and incomplete variables.

trunc

optional named list specifying the limits of truncation for the distribution of the named incomplete variables

ridge

logical; should the parameters of the main model be penalized using ridge regression? Default is FALSE

ppc

logical: should monitors for posterior predictive checks be set? (not yet used)

seed

optional seed value for reproducibility

inits

optional specification of initial values in the form of a list or a function (see jags.model). If omitted, initial values will be generated automatically by JAGS. It is an error to supply an initial value for an observed node.

parallel

logical; should the chains be sampled using parallel computation? Default is FALSE

ncores

number of cores to use for parallel computation; if left empty all except two cores will be used

scale_vars

optional; named vector of (continuous) variables that will be scaled (such that mean = 0 and sd = 1) to improve convergence of the MCMC sampling. Default is that all continuous variables that are not transformed by a function (e.g. log(), ns()) will be scaled. Variables for which "lognorm" is set as imputation model are only scaled with regards to the standard deviation, but not centered. Variables to be imputed with "gamma", "beta" or "glmm_gamma" are not scaled. If set to FALSE no scaling will be done.

scale_pars

optional matrix of parameters used for centering and scaling continuous covariates. If not specified, this will be calculated automatically. If FALSE, no scaling will be done.

hyperpars

list of hyperparameters, as obtained by default_hyperpars(); only needs to be supplied if hyperparameters other than the default should be used

modelname

optional; character string specifying the name of the model file (including the ending, either .R or .txt). If unspecified a random name will be generated.

modeldir

optional; directory containing model file or directory in which the model file will be written. If unspecified a temporary directory will be created.

keep_model

logical; whether the created JAGS model should be saved or removed from the disk (FALSE; default) when the sampling has finished.

overwrite

logical; whether an existing model file with the specified <modeldir>/<modelname> should be overwritten. If set to FALSE and a model already exists, that model will be used. If unspecified (NULL) and a file exists, the user is asked for input on how to proceed

quiet

if TRUE then messages generated during compilation will be suppressed, as well as the progress bar during adaptation.

progress.bar

character string specifying the type of progress bar. Possible values are "text", "gui", and "none". See update.

warn

logical; should warnings be given? Default is TRUE. Note: this applies only to warnings given directly by JointAI.

mess

logical; should messages be given? Default is TRUE. Note: this applies only to messages given directly by JointAI.

keep_scaled_mcmc

should the "original" MCMC sample (i.e., the scaled version returned by coda.samples()) be kept? (The MCMC sample that is re-scaled to the scale of the data is always kept.)

...

additional, optional arguments

family

only for glm_imp and glmm_imp: a description of the distribution and link function to be used in the model. This can be a character string naming a family function, a family function or the result of a call to a family function. (See family and the `Details` section below.)

fixed

a two sided formula describing the fixed-effects part of the model (see formula)

random

only for lme_imp, glme_imp and clmm_imp: a one-sided formula of the form ~x1 + ... + xn | g, where x1 + ... + xn specifies the model for the random effects and g the grouping variable

Value

An object of class JointAI.

Details

See also the vignette: Model Specification

Implemented distribution families and link functions for glm_imp() and glme_imp()

gaussian
with links:
identity,
log
binomial
with links:
logit,
probit,
log,
cloglog
Gamma
with links:
identity,
log
poisson
with links:
log,
identity

Imputation methods

Implemented imputation models that can be chosen in the argument models are:
norm
linear model
lognorm
log-linear model for skewed continuous data
gamma
gamma model (with log-link) for skewed continuous data
beta
beta model (with logit-link) for skewed continuous data in (0, 1)
logit
logistic model for binary data
multilogit
multinomial logit model for unordered categorical variables
cumlogit
cumulative logit model for ordered categorical variables
lmm
linear mixed model for continuous longitudinal covariates
glmm_gamma
Gamma mixed model for skewed longitudinal covariates
glmm_logit
logit mixed model for binary longitudinal covariates
glmm_poisson
Poisson mixed model for longitudinal count covariates
clmm
cumulative logit mixed model for longitudinal ordered factors
When models are specified for only a subset of the incomplete or longitudinal covariates involved in a model, the default choices are used for all unspecified variables.

Parameters to follow (monitor_params)

See also the vignette: Parameter Selection
Named vector specifying which parameters should be monitored. This can be done either directly by specifying the name of the parameter or indirectly by one of the key words selecting a set of parameters. Except for other, in which parameter names are specified directly, parameter (groups) are just set as TRUE or FALSE. If left unspecified, monitor_params = c("analysis_main" = TRUE) will be used.
name/key word
what is monitored
analysis_main
betasand
sigma_y(and
Din mixed models)
analysis_random
ranef,
D,
invD,
RinvD
imp_pars
alphas,
tau_imp,
gamma_imp,
delta_imp
imps
imputed values
betas
regression coefficients of the analysis model
tau_y
precision of the residuals from the analysis model
sigma_y
standard deviation of the residuals from the analysis model
ranef
random effects
b
D
covariance matrix of the random effects
invD
inverse of
D
RinvD
matrix in the prior for
invD
alphas
regression coefficients in the imputation models
tau_imp
precision parameters of the residuals from imputation models
gamma_imp
intercepts in ordinal imputation models
delta_imp
increments of ordinal intercepts
other
additional parameters
For example: monitor_params = c(analysis_main = TRUE, tau_y = TRUE, sigma_y = FALSE) would monitor the regression parameters betas and the residual precision tau_y instead of the residual standard deviation sigma_y. monitor_params = c(imps = TRUE) would monitor betas, tau_y, and sigma_y (because analysis_main = TRUE by default) as well as the imputed values.

Note

Coding of variables:

The default imputation methods are chosen based on the class of each of the incomplete variables, distinguishing between numeric, factor with two levels, unordered factor with >2 levels and ordered factor with >2 levels. When a continuous incomplete variable has only two different values it is assumed to be binary and its coding and default imputation model will be changed accordingly. This behavior can be overwritten when the imputation method for that variable is specified directly by the user. Variables of type `logical` are automatically be converted to unordered factors. Contrary to base R behavior, dummy coding (i.e., contr.treatment contrasts) are used for ordered factors in any linear predictor. However, since the order of levels in an ordered factor contains information relevant to the imputation of missing values, it is important that incomplete ordinal variables are coded as such.

Non-linear effects and transformation of variables:

JointAI handles non-linear effects, transformation of covariates and interactions the following way: When, for instance, the model formula contains the function log(x) and x has missing values, x will be imputed and used in the linear predictor of models for other incomplete variables, i.e., it is assumed that the other variables have a linear association with x but not with log(x). The log() of the observed and imputed values of x is calculated and used in the linear predictor of the analysis model. If, instead of using log(x) in the model formula a pre-calculated variable logx would be used instead, this variable is imputed directly and used in the linear predictors of all models, implying that other incomplete variables that have logx in their linear predictors have a linear association with logx but not with x. When different transformations of the same incomplete variable are used in one model it is strongly discouraged to calculate these transformations beforehand and supply them as different variables. If, for example, a model formula contains both x and x2 (where x2 = x^2), they are treated as separate variables and imputed with separate models. Imputed values of x2 are thus not equal to the square of imputed values of x. Instead, x + I(x^2) should be used in the model formula. Then, only x is imputed and used in the linear predictor of models for other incomplete variables, and x^2 is calculated from the imputed values of x internally. The same applies to interactions involving incomplete variables.

Not (yet) possible:

  • multiple nesting levels of random effects (nested or crossed)

  • prediction (using predict) conditional on random effects

  • the use of splines for incomplete variables

  • the use of pspline, frailty, cluster or strata in survival models

  • left censored or interval censored data

See also

Examples

# Example 1: Linear regression with incomplete covariates mod1 <- lm_imp(y~C1 + C2 + M2, data = wideDF, n.iter = 100)
#> This is new software. Please report any bugs to the package maintainer.
# Example 2: Logistic regression with incomplete covariats mod2 <- glm_imp(B1 ~ C1 + C2 + M2, data = wideDF, family = binomial(link = "logit"), n.iter = 100)
#> This is new software. Please report any bugs to the package maintainer.
# Example 3: Linear mixed model with incomplete covariates mod3 <- lme_imp(y ~ C1 + B2 + c1 + time, random = ~ time|id, data = longDF, n.iter = 500)
#> This is new software. Please report any bugs to the package maintainer.