Preface

R packages

In this practical, a number of R packages are used. If any of them are not installed, you may still be able to follow the practical, but you will not be able to run all of the code. The packages used (with the versions that were used to generate the solutions) are:

  • R version 3.6.0 (2019-04-26)
  • mice (version: 3.4.0)
  • RColorBrewer (version: 1.1.2)
  • reshape2 (version: 1.4.3)
  • ggplot2 (version: 3.1.1)

Help files

You can find help files for any function by adding a ? before the name of the function.

Alternatively, you can look up the help pages online at https://www.rdocumentation.org/ or find the whole manual for a package at https://cran.r-project.org/web/packages/available_packages_by_name.html
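For example, the help page of mice() and the page describing mids objects (used further below) can be opened as shown here. The help() call is the function form of ?; checking the length of its result is just one way to confirm that a page was found.

```r
library(mice)

# Interactive use: opens the help page (e.g. in the RStudio Help pane)
# ?mice
# ?mids

# Function form of ?, restricted to the mice package
h <- help("mids", package = "mice")
print(length(h) > 0)  # TRUE if a help page for "mids" was found
```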

Dataset

For this practical, we will again use the NHANES dataset that we used in the previous practical.

To load this dataset, you can use load(file.choose()). The function file.choose() opens a file browser that allows you to navigate to the location of the file NHANES_for_practicals.RData on your computer. If you know the path to the file, you can also use load("<path>/NHANES_for_practicals.RData").

If you have not followed the first practical, or if you have re-loaded the NHANES data, you need to re-code the variable educ again:

Imputed data

The imputed data are stored in a mids object called imp that we created in the previous practical.

If you are using RStudio, you can load it into your workspace by clicking on the file imps.RData. Alternatively, you can load this workspace using load("<path>/imps.RData"). You then need to run:

The help file tells us that a mids object is a list with several elements:

data: Original (incomplete) data set.
imp: The imputed values: A list of ncol(data) components, each list component is a matrix with nmis[j] rows and m columns.
m: The number of imputations.
where: The missingness indicator matrix.
blocks: The blocks argument of the mice() function.
call: The call that created the mids object.
nmis: The number of missing observations per variable.
method: The vector of imputation methods.
predictorMatrix: The predictor matrix.
visitSequence: The sequence in which columns are visited during imputation.
formulas: A named list of formulas corresponding to the imputed variables (blocks).
post: A vector of strings of length length(blocks) with commands for post-processing.
seed: The seed value of the solution.
iteration: The number of iterations.
lastSeedValue: The most recent seed value.
chainMean: The mean of imputed values per variable and iteration: a list of m components. Each component is a matrix with maxit columns and length(visitSequence) rows.
chainVar: The variances of imputed values per variable and iteration (same structure as chainMean).
loggedEvents: A data.frame with the record of automatic corrective actions and warnings (NULL if no action was made).
version: Version number of the mice package that created the object.
date: Date at which the object was created.
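To see some of these elements in practice, the sketch below inspects a small mids object. Because imp from the practical may not be loaded, it uses a stand-in (imp_demo, a hypothetical name) created from the nhanes data that ship with mice; with imp in your workspace, the same calls apply. The last line checks that the imputed values for one variable have nmis[j] rows and m columns (nrow() and ncol() work whether a component is stored as a matrix or as a data frame).

```r
library(mice)

# Stand-in for the practical's `imp`: a small imputation of the nhanes data
# shipped with mice (hypothetical example, for illustration only)
imp_demo <- mice(nhanes, m = 3, maxit = 2, seed = 123, printFlag = FALSE)

class(imp_demo)   # "mids"
names(imp_demo)   # compare with the elements listed above
imp_demo$m        # number of imputations
imp_demo$nmis     # number of missing values per variable

# Imputed values of bmi: one row per missing value, one column per imputation
dim(imp_demo$imp$bmi)
```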

Details of the loggedEvents:

mice() does some pre-processing of the data:

  • variables that contain missing values and are used as predictors but are not imputed are removed
  • constant variables are removed
  • collinear variables are removed

Furthermore, during each iteration:

  • variables (or dummy variables) that are linearly dependent are removed
  • polr imputation that does not converge is replaced by polyreg.

The data.frame in loggedEvents has the following columns:

it: iteration number
im: imputation number
dep: name of the variable being imputed
meth: imputation method used
out: character vector with names of altered/removed predictors
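To see what a logged event looks like, the following sketch deliberately adds a constant column (the hypothetical const) to a copy of the nhanes data shipped with mice. Following the pre-processing rules above, mice() is expected to drop this column and record the action in loggedEvents; the exact rows may differ between mice versions.

```r
library(mice)

# Hypothetical example: a copy of nhanes with an added constant column
dat <- nhanes
dat$const <- 1

# suppressWarnings() hides a possible warning about the number of logged events
imp_le <- suppressWarnings(
  mice(dat, m = 1, maxit = 1, seed = 1, printFlag = FALSE)
)

# The record of corrective actions (columns it, im, dep, meth, out)
imp_le$loggedEvents
```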

Evaluate the imputation

Checking the settings

It is good practice to make sure that mice() has not done any processing of the data that you did not plan or are not aware of. This means checking that the correct method, predictorMatrix and visitSequence were used.

Task

Do these checks for imp.

Solution

##           occup             BMI             HDL         hypchol          hypten             SBP 
##       "polyreg" "~I(wgt/hgt^2)"          "norm"        "logreg"        "logreg"          "norm" 
##            bili            educ        HyperMed           creat          cohort             age 
##          "norm"              ""              ""           "pmm"              ""              "" 
##             DBP           smoke             wgt             alc              WC             hgt 
##          "norm"          "polr"          "norm"          "polr"          "norm"          "norm" 
##            race            chol        uricacid              DM          gender            albu 
##              ""          "norm"          "norm"              ""              ""          "norm"
##          occup BMI HDL hypchol hypten SBP bili educ HyperMed creat cohort age DBP smoke wgt alc WC
## occup        0   1   1       1      1   1    1    1        0     1      0   1   1     1   0   1  1
## BMI          1   0   1       1      1   1    1    1        0     1      0   1   1     1   0   1  1
## HDL          1   1   0       1      1   1    1    1        0     1      0   1   1     1   0   1  1
## hypchol      1   1   1       0      1   1    1    1        0     1      0   1   1     1   0   1  1
## hypten       1   1   1       1      0   1    1    1        0     1      0   1   1     1   0   1  1
## SBP          1   1   1       1      1   0    1    1        0     1      0   1   1     1   0   1  1
## bili         1   1   1       1      1   1    0    1        0     1      0   1   1     1   0   1  1
## educ         1   1   1       1      1   1    1    0        0     1      0   1   1     1   0   1  1
## HyperMed     1   1   1       1      1   1    1    1        0     1      0   1   1     1   0   1  1
## creat        1   1   1       1      1   1    1    1        0     0      0   1   1     1   0   1  1
## cohort       1   1   1       1      1   1    1    1        0     1      0   1   1     1   0   1  1
## age          1   1   1       1      1   1    1    1        0     1      0   0   1     1   0   1  1
## DBP          1   1   1       1      1   1    1    1        0     1      0   1   0     1   0   1  1
## smoke        1   1   1       1      1   1    1    1        0     1      0   1   1     0   0   1  1
## wgt          1   0   1       1      1   1    1    1        0     1      0   1   1     1   0   1  0
## alc          1   1   1       1      1   1    1    1        0     1      0   1   1     1   0   0  1
## WC           1   1   1       1      1   1    1    1        0     1      0   1   1     1   0   1  0
## hgt          1   0   1       1      1   1    1    1        0     1      0   1   1     1   1   1  1
## race         1   1   1       1      1   1    1    1        0     1      0   1   1     1   0   1  1
## chol         1   1   1       0      1   1    1    1        0     1      0   1   1     1   0   1  1
## uricacid     1   1   1       1      1   1    1    1        0     1      0   1   1     1   0   1  1
## DM           1   1   1       1      1   1    1    1        0     1      0   1   1     1   0   1  1
## gender       1   1   1       1      1   1    1    1        0     1      0   1   1     1   0   1  1
## albu         1   1   1       1      1   1    1    1        0     1      0   1   1     1   0   1  1
##          hgt race chol uricacid DM gender albu
## occup      0    1    1        1  1      1    1
## BMI        0    1    1        1  1      1    1
## HDL        0    1    1        1  1      1    1
## hypchol    0    1    1        1  1      1    1
## hypten     0    1    1        1  1      1    1
## SBP        0    1    1        1  1      1    1
## bili       0    1    1        1  1      1    1
## educ       0    1    1        1  1      1    1
## HyperMed   0    1    1        1  1      1    1
## creat      0    1    1        1  1      1    1
## cohort     0    1    1        1  1      1    1
## age        0    1    1        1  1      1    1
## DBP        0    1    1        1  1      1    1
## smoke      0    1    1        1  1      1    1
## wgt        1    1    1        1  1      1    1
## alc        0    1    1        1  1      1    1
## WC         0    1    1        1  1      1    1
## hgt        0    1    1        1  1      1    1
## race       0    0    1        1  1      1    1
## chol       0    1    0        1  1      1    1
## uricacid   0    1    1        0  1      1    1
## DM         0    1    1        1  0      1    1
## gender     0    1    1        1  1      0    1
## albu       0    1    1        1  1      1    0
##  [1] "occup"    "HDL"      "hypchol"  "hypten"   "SBP"      "bili"     "educ"     "HyperMed"
##  [9] "creat"    "cohort"   "age"      "DBP"      "smoke"    "wgt"      "alc"      "WC"      
## [17] "hgt"      "race"     "chol"     "uricacid" "DM"       "gender"   "albu"     "BMI"
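The output above corresponds to printing the method, predictorMatrix and visitSequence elements of imp. The sketch below does this and adds two quick consistency checks that compare method with nmis. An empty result is not required; the point is to confirm that every listed variable is intended (e.g. a variable that is deliberately left unimputed). If imp is not in the workspace, the code falls back to a small imputation of the nhanes data shipped with mice so that it also runs on its own; the names not_imputed and method_no_na are illustrative.

```r
library(mice)

# In the practical, `imp` is loaded from imps.RData. Otherwise, use a small
# stand-in imputation of mice's nhanes data (assumption, for illustration only)
if (!exists("imp")) {
  imp <- mice(nhanes, m = 2, maxit = 2, seed = 123, printFlag = FALSE)
}

# The three settings shown in the solution above
imp$method           # imputation method per variable ("" = not imputed)
imp$predictorMatrix  # rows: imputed variables, columns: predictors (1 = used)
imp$visitSequence    # order in which the variables are visited

# Variables with missing values that are not imputed
not_imputed <- names(imp$nmis)[imp$nmis > 0 & imp$method[names(imp$nmis)] == ""]
not_imputed

# Variables without missing values that nevertheless have a method
method_no_na <- names(imp$nmis)[imp$nmis == 0 & imp$method[names(imp$nmis)] != ""]
method_no_na
```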

Logged events

Checking the loggedEvents shows us whether mice() detected any problems during the imputation.

Task 1

Check the loggedEvents for imp.

Solution 1

## NULL

There are no logged events, great!
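If you want to repeat this check in a script, a small helper such as the hypothetical report_logged_events() below can be used. It relies only on the loggedEvents element described above and, when events are present, tabulates the meth column so you can see which kinds of corrective actions were taken.

```r
# Hypothetical helper: summarise the loggedEvents of a mids object
report_logged_events <- function(mids_obj) {
  le <- mids_obj$loggedEvents
  if (is.null(le)) {
    message("No logged events.")
    return(invisible(NULL))
  }
  print(table(le$meth))  # number of events per type of corrective action
  invisible(le)          # return the full record for further inspection
}
```

With the imp from this practical, report_logged_events(imp) should simply report that there are no logged events.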

Task 2

Let’s see what would have happened if we had not prepared the predictorMatrix, method and visitSequence before imputation.

Run the imputation without setting any additional arguments:
impnaive <- mice(NHANES, m = 5, maxit = 30)

Take a look at the loggedEvents of impnaive.
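One way to get an overview of the (potentially many) events is to tabulate them by type and list the affected predictors. The sketch assumes impnaive was created with the call above; if it is not available, it creates a hypothetical stand-in from the nhanes data shipped with mice with an added constant column, so that the snippet runs on its own.

```r
library(mice)

# Use impnaive from the call above; otherwise create a hypothetical stand-in
if (!exists("impnaive")) {
  dat <- nhanes
  dat$const <- 1
  impnaive <- suppressWarnings(
    mice(dat, m = 1, maxit = 1, seed = 1, printFlag = FALSE)
  )
}

le <- impnaive$loggedEvents
if (!is.null(le)) {
  print(head(le))        # first few logged events
  print(table(le$meth))  # number of events per type of corrective action
  print(unique(le$out))  # predictors that were removed or altered
}
```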

Solution 2