Preface

R packages

In this practical, a number of R packages are used. If any of them are not installed, you may be able to follow the practical but will not be able to run all of the code. The packages used (with the versions that were used to generate the solutions) are:

  • R version 4.3.0 (2023-04-21 ucrt)
  • mice (version: 3.15.0)
  • RColorBrewer (version: 1.1.3)
  • reshape2 (version: 1.4.4)
  • ggplot2 (version: 3.4.2)

Help files

You can find help files for any function by adding a ? before the name of the function.

Alternatively, you can look up the help pages online at https://www.rdocumentation.org/ or find the whole manual for a package at https://cran.r-project.org/web/packages/available_packages_by_name.html

Dataset

For this practical, we will use the EP16dat1 dataset, which is a subset of the NHANES (National Health and Nutrition Examination Survey) data.

To get the EP16dat1 dataset, load the file EP16dat1.RData. You can download it here.

To load this dataset into R, you can use load(file.choose()). The command file.choose() opens the file explorer and allows you to navigate to the location of the file on your computer; load() then reads the selected file.

If you know the path to the file, you can also use load("<path>/EP16dat1.RData").
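As a small illustration of how load() behaves, the sketch below saves a made-up data frame (toy, a hypothetical stand-in and not EP16dat1) to a temporary file and loads it again. load() restores the objects under their original names and invisibly returns those names, which is a convenient way to check what a .RData file contained.

```r
# Made-up data frame as a stand-in for the real dataset (NOT EP16dat1)
toy <- data.frame(id = 1:3, educ = c("low", "mid", "high"))

# Save it to a temporary .RData file and remove it from the workspace
path <- file.path(tempdir(), "toy.RData")
save(toy, file = path)
rm(toy)

# load() restores the saved objects and invisibly returns their names
loaded <- load(path)
print(loaded)
str(toy)

# Interactive alternative: pick the file in a dialog
# load(file.choose())
```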

If you have not followed the first practical, or if you have re-loaded the EP16dat1 data, you need to re-code the variable educ:

EP16dat1$educ <- as.ordered(EP16dat1$educ)
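To see what this re-coding does, here is a minimal sketch with a made-up factor (educ_toy and its level names are illustrative and are not the actual levels of educ). as.ordered() keeps the existing level order but marks the factor as ordered, so that comparisons between levels become meaningful. This matters for the imputation because mice() by default selects the method polr for ordered factors with more than two levels and polyreg for unordered ones.

```r
# Made-up factor; the levels are listed from lowest to highest
educ_toy <- factor(c("low", "high", "mid", NA),
                   levels = c("low", "mid", "high"))
is.ordered(educ_toy)          # FALSE: a plain (unordered) factor

# Convert to an ordered factor; the level order is kept
educ_ord <- as.ordered(educ_toy)
is.ordered(educ_ord)          # TRUE
levels(educ_ord)

# Comparisons between levels are now defined
educ_ord[1] < educ_ord[2]     # "low" < "high"
```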

Imputed data

The imputed data are stored in a mids object called imp that we created in the previous practical.

You can load it into your workspace by clicking the object imps.RData if you are using RStudio. Alternatively, you can load this workspace using load("<path>/imps.RData").

The help file tells us that a mids object is a list with several elements:

data: Original (incomplete) data set.
imp: The imputed values: A list of ncol(data) components, each list component is a matrix with nmis[j] rows and m columns.
m: The number of imputations.
where: The missingness indicator matrix.
blocks: The blocks argument of the mice() function.
call: The call that created the mids object.
nmis: The number of missing observations per variable.
method: The vector of imputation methods.
predictorMatrix: The predictor matrix.
visitSequence: The sequence in which columns are visited during imputation.
formulas: A named list of formulas corresponding to the imputed variables (blocks).
post: A vector of strings of length length(blocks) with commands for post-processing.
seed: The seed value of the solution.
iteration: The number of iterations.
lastSeedValue: The most recent seed value.
chainMean: The mean of imputed values per variable and iteration: a list of m components. Each component is a matrix with maxit columns and length(visitSequence) rows.
chainVar: The variances of imputed values per variable and iteration (same structure as chainMean).
loggedEvents: A data.frame with the record of automatic corrective actions and warnings; (NULL if no action was made).
version: Version number of the mice package that created the object.
date: Date at which the object was created.
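A quick way to get a feeling for these elements is to look at a small mids object. The sketch below is only an illustration: it uses nhanes, a small example dataset that ships with mice (not EP16dat1), and the object name imp_toy as well as the settings m = 2 and maxit = 2 are arbitrary choices made to keep it fast.

```r
library(mice)

# nhanes is a small example dataset shipped with mice (NOT EP16dat1)
imp_toy <- mice(nhanes, m = 2, maxit = 2, seed = 2023, printFlag = FALSE)

class(imp_toy)          # the class of the returned object
names(imp_toy)          # names of the list elements
imp_toy$m               # number of imputations
imp_toy$nmis            # number of missing values per variable
imp_toy$method          # imputation method per variable
dim(imp_toy$imp$bmi)    # nmis["bmi"] rows and m columns
```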

Details of the loggedEvents:

mice() does some pre-processing of the data:

  • variables that contain missing values and are used as predictors, but are not imputed themselves, are removed
  • constant variables are removed
  • collinear variables are removed

Furthermore, during each iteration

  • variables (or dummy variables) that are linearly dependent are removed
  • polr imputation that does not converge is replaced by polyreg.

The data.frame in loggedEvents has the following columns:

it: iteration number
im: imputation number
dep: name of the variable being imputed
meth: imputation method used
out: character vector with names of altered/removed predictors

Evaluate the imputation

Checking the settings

It is good practice to make sure that mice() has not done any processing of the data that was not planned or that you are not aware of. This means checking that the correct method, predictorMatrix and visitSequence were used.

Task

Do these checks for imp.

Solution

imp$method
##             HDL            race             DBP            bili           smoke              DM 
##          "norm"              ""          "norm"          "norm"          "polr"              "" 
##          gender              WC            chol        HyperMed             alc             SBP 
##              ""          "norm"          "norm"              ""          "polr"          "norm" 
##             wgt          hypten          cohort           occup             age            educ 
##          "norm"        "logreg"              ""       "polyreg"              ""          "polr" 
##            albu           creat        uricacid             BMI         hypchol             hgt 
##          "norm"           "pmm"          "norm" "~I(wgt/hgt^2)"        "logreg"          "norm"
imp$predictorMatrix
##          HDL race DBP bili smoke DM gender WC chol HyperMed alc SBP wgt hypten cohort occup age
## HDL        0    1   1    1     1  1      1  1    1        0   1   1   0      1      0     1   1
## race       1    0   1    1     1  1      1  1    1        0   1   1   0      1      0     1   1
## DBP        1    1   0    1     1  1      1  1    1        0   1   1   0      1      0     1   1
## bili       1    1   1    0     1  1      1  1    1        0   1   1   0      1      0     1   1
## smoke      1    1   1    1     0  1      1  1    1        0   1   1   0      1      0     1   1
## DM         1    1   1    1     1  0      1  1    1        0   1   1   0      1      0     1   1
## gender     1    1   1    1     1  1      0  1    1        0   1   1   0      1      0     1   1
## WC         1    1   1    1     1  1      1  0    1        0   1   1   0      1      0     1   1
## chol       1    1   1    1     1  1      1  1    0        0   1   1   0      1      0     1   1
## HyperMed   1    1   1    1     1  1      1  1    1        0   1   1   0      1      0     1   1
## alc        1    1   1    1     1  1      1  1    1        0   0   1   0      1      0     1   1
## SBP        1    1   1    1     1  1      1  1    1        0   1   0   0      1      0     1   1
## wgt        1    1   1    1     1  1      1  0    1        0   1   1   0      1      0     1   1
## hypten     1    1   1    1     1  1      1  1    1        0   1   1   0      0      0     1   1
## cohort     1    1   1    1     1  1      1  1    1        0   1   1   0      1      0     1   1
## occup      1    1   1    1     1  1      1  1    1        0   1   1   0      1      0     0   1
## age        1    1   1    1     1  1      1  1    1        0   1   1   0      1      0     1   0
## educ       1    1   1    1     1  1      1  1    1        0   1   1   0      1      0     1   1
## albu       1    1   1    1     1  1      1  1    1        0   1   1   0      1      0     1   1
## creat      1    1   1    1     1  1      1  1    1        0   1   1   0      1      0     1   1
## uricacid   1    1   1    1     1  1      1  1    1        0   1   1   0      1      0     1   1
## BMI        1    1   1    1     1  1      1  1    1        0   1   1   0      1      0     1   1
## hypchol    1    1   1    1     1  1      1  1    1        0   1   1   0      1      0     1   1
## hgt        1    1   1    1     1  1      1  1    1        0   1   1   1      1      0     1   1
##          educ albu creat uricacid BMI hypchol hgt
## HDL         1    1     1        1   1       1   0
## race        1    1     1        1   1       1   0
## DBP         1    1     1        1   1       1   0
## bili        1    1     1        1   1       1   0
## smoke       1    1     1        1   1       1   0
## DM          1    1     1        1   1       1   0
## gender      1    1     1        1   1       1   0
## WC          1    1     1        1   1       1   0
## chol        1    1     1        1   1       0   0
## HyperMed    1    1     1        1   1       1   0
## alc         1    1     1        1   1       1   0
## SBP         1    1     1        1   1       1   0
## wgt         1    1     1        1   0       1   1
## hypten      1    1     1        1   1       1   0
## cohort      1    1     1        1   1       1   0
## occup       1    1     1        1   1       1   0
## age         1    1     1        1   1       1   0
## educ        0    1     1        1   1       1   0
## albu        1    0     1        1   1       1   0
## creat       1    1     0        1   1       1   0
## uricacid    1    1     1        0   1       1   0
## BMI         1    1     1        1   0       1   0
## hypchol     1    1     1        1   1       0   0
## hgt         1    1     1        1   0       1   0
imp$visitSequence
##  [1] "HDL"      "race"     "DBP"      "bili"     "smoke"    "DM"       "gender"   "WC"      
##  [9] "chol"     "HyperMed" "alc"      "SBP"      "wgt"      "hypten"   "cohort"   "occup"   
## [17] "age"      "educ"     "albu"     "creat"    "uricacid" "hypchol"  "hgt"      "BMI"
# you can also try
# identical(imp$method, meth)
# identical(imp$predictorMatrix, pred)
# identical(imp$visitSequence, visSeq)
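Scanning a large predictorMatrix by eye is error-prone, so it can help to ask specific questions in code. The following is a minimal sketch on made-up settings for four variables (meth_toy, pred_toy and the helper predictors_of() are hypothetical names, not objects from the practical). It lists which variables are imputed and confirms that a passively imputed BMI is not used as a predictor for its own components.

```r
# Made-up settings for four variables (NOT the settings of imp)
vars <- c("wgt", "hgt", "BMI", "age")
meth_toy <- c(wgt = "norm", hgt = "norm", BMI = "~I(wgt/hgt^2)", age = "")

# Start from "everything predicts everything else" ...
pred_toy <- matrix(1, nrow = 4, ncol = 4, dimnames = list(vars, vars))
diag(pred_toy) <- 0
# ... and exclude BMI as predictor of the variables it is calculated from
pred_toy[c("wgt", "hgt"), "BMI"] <- 0

# Variables with a non-empty method are the ones that get imputed
imputed_vars <- names(meth_toy)[meth_toy != ""]

# Hypothetical helper: names of the predictors used for one variable
predictors_of <- function(pred, var) colnames(pred)[pred[var, ] == 1]

imputed_vars
predictors_of(pred_toy, "hgt")
"BMI" %in% predictors_of(pred_toy, "wgt")
```

The same two expressions could be applied to imp$method and imp$predictorMatrix.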

Logged events

Checking the loggedEvents shows us if mice() detected any problems during the imputation.

Task 1

Check the loggedEvents for imp.

Solution 1

imp$loggedEvents
## NULL

There are no logged events, great!

Task 2

Let’s see what would have happened if we had not prepared the predictorMatrix, method and visitSequence before imputation.

Run the imputation without setting any additional arguments:
impnaive <- mice(EP16dat1, m = 5, maxit = 30)

Take a look at the loggedEvents of impnaive.

Solution 2

impnaive$loggedEvents
##     it im      dep     meth       out
## 1    0  0          constant    cohort
## 2    1  1 HyperMed     polr hyptenyes
## 3    1  2 HyperMed     polr hyptenyes
## 4    1  3 HyperMed     polr hyptenyes
## 5    1  4 HyperMed     polr hyptenyes
## 6    1  5 HyperMed     polr hyptenyes
## 7    2  1 HyperMed     polr hyptenyes
## 8    2  2 HyperMed     polr hyptenyes
## 9    2  3 HyperMed     polr hyptenyes
## 10   2  4 HyperMed     polr hyptenyes
## 11   2  5 HyperMed     polr hyptenyes
## 12   3  1 HyperMed     polr hyptenyes
## 13   3  2 HyperMed     polr hyptenyes
## 14   3  3 HyperMed     polr hyptenyes
## 15   3  4 HyperMed     polr hyptenyes
## 16   3  5 HyperMed     polr hyptenyes
## 17   4  1 HyperMed     polr hyptenyes
## 18   4  2 HyperMed     polr hyptenyes
## 19   4  3 HyperMed     polr hyptenyes
## 20   4  4 HyperMed     polr hyptenyes
## 21   4  5 HyperMed     polr hyptenyes
## 22   5  1 HyperMed     polr hyptenyes
## 23   5  2 HyperMed     polr hyptenyes
## 24   5  3 HyperMed     polr hyptenyes
## 25   5  4 HyperMed     polr hyptenyes
## 26   5  5 HyperMed     polr hyptenyes
## 27   6  1 HyperMed     polr hyptenyes
## 28   6  2 HyperMed     polr hyptenyes
## 29   6  3 HyperMed     polr hyptenyes
## 30   6  4 HyperMed     polr hyptenyes
## 31   6  5 HyperMed     polr hyptenyes
## 32   7  1 HyperMed     polr hyptenyes
## 33   7  2 HyperMed     polr hyptenyes
## 34   7  3 HyperMed     polr hyptenyes
## 35   7  4 HyperMed     polr hyptenyes
## 36   7  5 HyperMed     polr hyptenyes
## 37   8  1 HyperMed     polr hyptenyes
## 38   8  2 HyperMed     polr hyptenyes
## 39   8  3 HyperMed     polr hyptenyes
## 40   8  4 HyperMed     polr hyptenyes
## 41   8  5 HyperMed     polr hyptenyes
## 42   9  1 HyperMed     polr hyptenyes
## 43   9  2 HyperMed     polr hyptenyes
## 44   9  3 HyperMed     polr hyptenyes
## 45   9  4 HyperMed     polr hyptenyes
## 46   9  5 HyperMed     polr hyptenyes
## 47  10  1 HyperMed     polr hyptenyes
## 48  10  2 HyperMed     polr hyptenyes
## 49  10  3 HyperMed     polr hyptenyes
## 50  10  4 HyperMed     polr hyptenyes
## 51  10  5 HyperMed     polr hyptenyes
## 52  11  1 HyperMed     polr hyptenyes
## 53  11  2 HyperMed     polr hyptenyes
## 54  11  3 HyperMed     polr hyptenyes
## 55  11  4 HyperMed     polr hyptenyes
## 56  11  5 HyperMed     polr hyptenyes
## 57  12  1 HyperMed     polr hyptenyes
## 58  12  2 HyperMed     polr hyptenyes
## 59  12  3 HyperMed     polr hyptenyes
## 60  12  4 HyperMed     polr hyptenyes
## 61  12  5 HyperMed     polr hyptenyes
## 62  13  1 HyperMed     polr hyptenyes
## 63  13  2 HyperMed     polr hyptenyes
## 64  13  3 HyperMed     polr hyptenyes
## 65  13  4 HyperMed     polr hyptenyes
## 66  13  5 HyperMed     polr hyptenyes
## 67  14  1 HyperMed     polr hyptenyes
## 68  14  2 HyperMed     polr hyptenyes
## 69  14  3 HyperMed     polr hyptenyes
## 70  14  4 HyperMed     polr hyptenyes
## 71  14  5 HyperMed     polr hyptenyes
## 72  15  1 HyperMed     polr hyptenyes
## 73  15  2 HyperMed     polr hyptenyes
## 74  15  3 HyperMed     polr hyptenyes
## 75  15  4 HyperMed     polr hyptenyes
## 76  15  5 HyperMed     polr hyptenyes
## 77  16  1 HyperMed     polr hyptenyes
## 78  16  2 HyperMed     polr hyptenyes
## 79  16  3 HyperMed     polr hyptenyes
## 80  16  4 HyperMed     polr hyptenyes
## 81  16  5 HyperMed     polr hyptenyes
## 82  17  1 HyperMed     polr hyptenyes
## 83  17  2 HyperMed     polr hyptenyes
## 84  17  3 HyperMed     polr hyptenyes
## 85  17  4 HyperMed     polr hyptenyes
## 86  17  5 HyperMed     polr hyptenyes
## 87  18  1 HyperMed     polr hyptenyes
## 88  18  2 HyperMed     polr hyptenyes
## 89  18  3 HyperMed     polr hyptenyes
## 90  18  4 HyperMed     polr hyptenyes
## 91  18  5 HyperMed     polr hyptenyes
## 92  19  1 HyperMed     polr hyptenyes
## 93  19  2 HyperMed     polr hyptenyes
## 94  19  3 HyperMed     polr hyptenyes
## 95  19  4 HyperMed     polr hyptenyes
## 96  19  5 HyperMed     polr hyptenyes
## 97  20  1 HyperMed     polr hyptenyes
## 98  20  2 HyperMed     polr hyptenyes
## 99  20  3 HyperMed     polr hyptenyes
## 100 20  4 HyperMed     polr hyptenyes
## 101 20  5 HyperMed     polr hyptenyes
## 102 21  1 HyperMed     polr hyptenyes
## 103 21  2 HyperMed     polr hyptenyes
## 104 21  3 HyperMed     polr hyptenyes
## 105 21  4 HyperMed     polr hyptenyes
## 106 21  5 HyperMed     polr hyptenyes
## 107 22  1 HyperMed     polr hyptenyes
## 108 22  2 HyperMed     polr hyptenyes
## 109 22  3 HyperMed     polr hyptenyes
## 110 22  4 HyperMed     polr hyptenyes
## 111 22  5 HyperMed     polr hyptenyes
## 112 23  1 HyperMed     polr hyptenyes
## 113 23  2 HyperMed     polr hyptenyes
## 114 23  3 HyperMed     polr hyptenyes
## 115 23  4 HyperMed     polr hyptenyes
## 116 23  5 HyperMed     polr hyptenyes
## 117 24  1 HyperMed     polr hyptenyes
## 118 24  2 HyperMed     polr hyptenyes
## 119 24  3 HyperMed     polr hyptenyes
## 120 24  4 HyperMed     polr hyptenyes
## 121 24  5 HyperMed     polr hyptenyes
## 122 25  1 HyperMed     polr hyptenyes
## 123 25  2 HyperMed     polr hyptenyes
## 124 25  3 HyperMed     polr hyptenyes
## 125 25  4 HyperMed     polr hyptenyes
## 126 25  5 HyperMed     polr hyptenyes
## 127 26  1 HyperMed     polr hyptenyes
## 128 26  2 HyperMed     polr hyptenyes
## 129 26  3 HyperMed     polr hyptenyes
## 130 26  4 HyperMed     polr hyptenyes
## 131 26  5 HyperMed     polr hyptenyes
## 132 27  1 HyperMed     polr hyptenyes
## 133 27  2 HyperMed     polr hyptenyes
## 134 27  3 HyperMed     polr hyptenyes
## 135 27  4 HyperMed     polr hyptenyes
## 136 27  5 HyperMed     polr hyptenyes
## 137 28  1 HyperMed     polr hyptenyes
## 138 28  2 HyperMed     polr hyptenyes
## 139 28  3 HyperMed     polr hyptenyes
## 140 28  4 HyperMed     polr hyptenyes
## 141 28  5 HyperMed     polr hyptenyes
## 142 29  1 HyperMed     polr hyptenyes
## 143 29  2 HyperMed     polr hyptenyes
## 144 29  3 HyperMed     polr hyptenyes
## 145 29  4 HyperMed     polr hyptenyes
## 146 29  5 HyperMed     polr hyptenyes
## 147 30  1 HyperMed     polr hyptenyes
## 148 30  2 HyperMed     polr hyptenyes
## 149 30  3 HyperMed     polr hyptenyes
## 150 30  4 HyperMed     polr hyptenyes
## 151 30  5 HyperMed     polr hyptenyes

The loggedEvents of the “naive” imputation show that the constant variable cohort was excluded before the imputation (as it should be). Furthermore, in the imputation model for HyperMed, the variable hyptenyes was excluded (hyptenyes is the dummy variable belonging to hypten).
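When the same event is logged in every iteration and imputation, the printed table becomes long without adding information. A possible way to condense it is sketched below on a made-up data.frame with the same columns as loggedEvents (le_toy and its values are illustrative only).

```r
# Made-up data.frame with the columns of loggedEvents (values are illustrative)
le_toy <- data.frame(
  it   = c(0, 1, 1, 2, 2),
  im   = c(0, 1, 2, 1, 2),
  dep  = c("", rep("HyperMed", 4)),
  meth = c("constant", rep("polr", 4)),
  out  = c("cohort", rep("hyptenyes", 4))
)

# Distinct kinds of events, ignoring iteration and imputation number
kinds <- unique(le_toy[, c("dep", "meth", "out")])
kinds

# How often each kind of event was logged
le_toy$n <- 1
counts <- aggregate(n ~ dep + meth + out, data = le_toy, FUN = sum)
counts
```

Applied to the naive imputation, unique(impnaive$loggedEvents[, c("dep", "meth", "out")]) would give the same kind of overview.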


Task 3

We did not change the visitSequence in impnaive. Find out how that affected the imputed values of BMI.

You can get an imputed dataset from a mids object using the function complete().

Solution 3

naiveDF1 <- complete(impnaive, 1)
naivecalcBMI <- with(naiveDF1, wgt/hgt^2)

impDF1 <- complete(imp, 1)
impcalcBMI <- with(impDF1, wgt/hgt^2)

cbind(naiveBMI = naiveDF1$BMI, naivecalcBMI,
      impBMI = impDF1$BMI, impcalcBMI)[which(is.na(EP16dat1$BMI)), ]
##       naiveBMI naivecalcBMI   impBMI impcalcBMI
##  [1,] 29.28804     29.28669 12.29198   12.29198
##  [2,] 45.77853     46.16466 26.31271   26.31271
##  [3,] 37.58893     39.85655 21.32477   21.32477
##  [4,] 28.24545     27.92749 12.11641   12.11641
##  [5,] 28.50083     29.28615 26.55661   26.55661
##  [6,] 46.22079     54.49122 37.22307   37.22307
##  [7,] 29.81680     30.72272 19.89599   19.89599
##  [8,] 26.36511     26.52057 28.69099   28.69099
##  [9,] 22.31422     22.70998 25.80886   25.80886
## [10,] 41.53878     40.16697 38.69418   38.69418
## [11,] 26.62512     26.52057 29.93671   29.93671
## [12,] 24.03070     23.29698 22.30932   22.30932
## [13,] 24.27618     24.21038 22.70058   22.70058
## [14,] 30.84897     31.17668 26.87031   26.87031
## [15,] 26.25750     25.79309 36.32201   36.32201
## [16,] 34.54235     37.10645 31.52410   31.52410
## [17,] 25.74718     24.18514 22.72177   22.72177
## [18,] 24.53319     24.53319 21.04168   21.04168
## [19,] 20.19399     20.71344 22.31184   22.31184
## [20,] 27.45674     27.01934 26.08453   26.08453
## [21,] 26.38655     20.85066 21.31324   21.31324
## [22,] 24.40756     25.05929 25.05159   25.05159
## [23,] 29.13142     28.21474 25.19507   25.19507
## [24,] 24.69459     22.67395 20.61340   20.61340
## [25,] 29.53453     29.26408 31.60415   31.60415
## [26,] 39.57982     45.89482 44.32482   44.32482
## [27,] 36.31558     36.02272 34.53010   34.53010
## [28,] 25.24027     24.41214 23.56695   23.56695
## [29,] 25.39970     26.25655 26.99558   26.99558
## [30,] 28.27628     26.45250 24.79892   24.79892
## [31,] 26.49935     25.15391 26.77051   26.77051
## [32,] 21.78935     22.95737 33.93099   33.93099
## [33,] 31.93006     33.12522 36.44915   36.44915
## [34,] 27.43844     26.45250 43.50957   43.50957
## [35,] 20.36067     21.41254 21.04539   21.04539
## [36,] 41.14373     44.28506 22.62897   22.62897
## [37,] 29.95326     29.75790 31.75090   31.75090

When we compare the imputed and calculated values of BMI from impnaive, we can see that the imputed values of hgt and wgt imply a different BMI than the one that was imputed. This is because, with the default visitSequence (the order of the columns in the data), BMI is imputed before hgt, which means that the most recent imputed value of hgt is from the previous iteration.

Changing the visitSequence in imp prevented this inconsistency.
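Instead of comparing the columns by eye, the inconsistent rows can be flagged in code. The sketch below uses a small made-up completed dataset (completed_toy; the numbers are invented and the second BMI is deliberately wrong) and a numerical tolerance, since two floating point results should not be compared with ==.

```r
# Made-up "completed" data; the second BMI does not match wgt / hgt^2
completed_toy <- data.frame(
  wgt = c(70, 82, 95),
  hgt = c(1.75, 1.80, 1.68),
  BMI = c(70 / 1.75^2, 27.0, 95 / 1.68^2)
)

# Recalculate BMI from the (imputed) weight and height
recalc <- with(completed_toy, wgt / hgt^2)

# Flag rows in which stored and recalculated BMI differ
inconsistent <- abs(completed_toy$BMI - recalc) > 1e-8
which(inconsistent)
max(abs(completed_toy$BMI - recalc))
```

The same comparison could be run on complete(imp, 1) and complete(impnaive, 1).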

Convergence

In order to obtain correct results, the MICE algorithm needs to have converged. This can be checked visually by plotting summaries of the imputed values across the iterations.

The mean and variance of the imputed values per iteration and variable are stored in the elements chainMean and chainVar of the mids object.
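To illustrate how these elements can be used directly, the sketch below draws a traceplot by hand for one variable. It is only a sketch under assumptions: it uses the nhanes example data shipped with mice (not EP16dat1), the name imp_toy and the settings are arbitrary, and it assumes that chainMean can be indexed by variable, iteration and chain, which is how recent versions of mice appear to store it (check dim() on your own object).

```r
library(mice)

# Small example imputation on nhanes (NOT the imputation from the practical)
imp_toy <- mice(nhanes, m = 3, maxit = 5, seed = 2023, printFlag = FALSE)

# Assumed layout: variables x iterations x chains
dim(imp_toy$chainMean)

# Mean of the imputed bmi values: one row per iteration, one column per chain
bmi_means <- imp_toy$chainMean["bmi", , ]

# One line per chain across the iterations
matplot(bmi_means, type = "l", lty = 1,
        xlab = "iteration", ylab = "mean of imputed bmi")
```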

Task

Plot them to see if our imputation imp has converged.

Solution

# implemented plotting function (use layout to change the number of rows and columns)
plot(imp, layout = c(6, 6))

The chains in imp seem to have converged; however, this is difficult to judge based on only 10 iterations. In practice, more iterations should be done.

To save you some time, I ran the imputation again with 30 iterations and the traceplots confirm convergence:

plot(imp30, layout = c(6, 6))

In comparison, impnaive had some convergence problems:

plot(impnaive, layout = c(6, 7))

hgt and wgt show a clear trend, and the chains do not mix well, i.e., there is more variation between the chains than within each chain (the same is the case for BMI).

These are clear signs that there are correlation or identification problems between these variables and some other variables (which is why we made adjustments to the predictorMatrix for imp).
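The idea of "more variation between the chains than within each chain" can be made concrete with a rough numerical summary. The sketch below is an illustration on simulated chains, not a formal convergence diagnostic; var_ratio() is a hypothetical helper, and the two matrices (one column per chain, one row per iteration) are made up.

```r
set.seed(1)
maxit <- 30
m <- 5

# Well-mixing chains: all chains fluctuate around the same level
mixed <- matrix(rnorm(maxit * m), nrow = maxit, ncol = m)

# Poorly mixing chains: every chain stays at its own level
offsets <- c(-4, -2, 0, 2, 4)
stuck <- sweep(mixed, 2, offsets, "+")

# Hypothetical helper: between-chain variance relative to within-chain variance
var_ratio <- function(chains) {
  between <- var(colMeans(chains))         # variance of the chain means
  within <- mean(apply(chains, 2, var))    # average variance inside a chain
  between / within
}

var_ratio(mixed)   # small: the chains overlap
var_ratio(stuck)   # large: the chains are separated
```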

Imputed values

Now that we know that imp has converged, we can compare the distribution of the imputed values against the distribution of the observed values. When our imputation models fit the data well, they should have similar distributions (conditional on the covariates used in the imputation model).

Task 1

  • Plot the distributions of the imputed variables (continuous and categorical).
  • Make sure the imputed values are realistic (e.g., no height of 2.50 m or weight of 10 kg for adults).

You can use densityplot() and propplot() to get plots for all continuous and categorical variables.

propplot() is not part of any package. Copy the following syntax that defines this function:

To check all imputed values you can either get a summary of the imp element of the mids object or create a complete dataset containing all imputations using the function complete() and get the summary of that.

Solution 1

# plot densities of continuous variables
densityplot(imp)

# plot for all categorical variables
propplot(imp, legend.position = 'bottom')

# get the summary of the imputed values
sapply(Filter(function(x) nrow(x) > 0, imp$imp),
       function(x) summary(unlist(x))
)
## $HDL
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.2946  1.1362  1.4342  1.4106  1.6614  2.6618 
## 
## $DBP
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   38.71   63.39   71.77   71.44   78.46  111.44 
## 
## $bili
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
## -0.2331  0.5785  0.7556  0.7684  0.9569  1.7881 
## 
## $smoke
##   never  former current 
##       4       2       4 
## 
## $WC
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   57.63   87.10   98.25   99.52  109.17  149.89 
## 
## $chol
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.964   4.169   5.007   4.946   5.701   7.757 
## 
## $HyperMed
##    Mode    NA's 
## logical    4065 
## 
## $alc
##   0 <=1 1-3 3-7  >7 
## 243 730 221 161 240 
## 
## $SBP
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   54.63  111.00  125.98  124.95  138.58  171.71 
## 
## $wgt
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   28.14   62.81   73.79   76.09   92.02  125.26 
## 
## $hypten
##  no yes 
## 167  43 
## 
## $occup
##          working looking for work      not working 
##               56                4               25 
## 
## $educ
##  Less than 9th grade         9-11th grade High school graduate         some college 
##                    0                    0                    1                    2 
##     College or above 
##                    2 
## 
## $albu
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   3.293   4.075   4.301   4.290   4.527   5.285 
## 
## $creat
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##  0.4500  0.7150  0.8600  0.8689  0.9800  2.3400 
## 
## $uricacid
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.090   4.382   5.213   5.295   6.231  10.074 
## 
## $BMI
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   12.12   24.25   27.47   28.46   31.33   48.05 
## 
## $hypchol
##  no yes 
## 363  67 
## 
## $hgt
##    Min. 1st Qu.  Median    Mean 3rd Qu.    Max. 
##   1.386   1.532   1.615   1.620   1.680   1.961

Unfortunately, we have some negative imputed values for bili. Often, this would not result in bias in the analysis, but may be difficult to explain when providing a summary of the imputed data in a publication. In the present example we can see that the observed values have a slightly right-skewed distribution compared to the imputed values. Re-doing the imputation with pmm instead of norm for bili should fix this. (However, since the imputations seem fine overall, and there is little knowledge gain in re-doing the previous steps, we will skip this repetition in this practical.)
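The reason pmm avoids such values is that it imputes by borrowing observed values from similar cases, so imputed values cannot fall outside the range of the observed data. The sketch below demonstrates this on the nhanes example data shipped with mice (not EP16dat1); imp_pmm and the settings are arbitrary choices for illustration.

```r
library(mice)

# Impute the small nhanes example data with predictive mean matching
imp_pmm <- mice(nhanes, method = "pmm", m = 5, maxit = 5,
                seed = 2023, printFlag = FALSE)

# Range of the observed and of the imputed bmi values
range(nhanes$bmi, na.rm = TRUE)
range(unlist(imp_pmm$imp$bmi))

# Every imputed value is a copy of an observed value
all(unlist(imp_pmm$imp$bmi) %in% nhanes$bmi)
```

In the practical, the corresponding change would be to set the entry for bili in the method vector to "pmm" (for example meth["bili"] <- "pmm", assuming the vector from the previous practical is called meth) and to re-run mice().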

The distributions of the imputed values for hgt and SBP differ a bit from the distributions of the observed data.

We also imputed a larger proportion than might have been expected in the highest category of alc, and the distribution of values for smoke looks a bit weird (but smoke only has one missing value, which makes it difficult to judge the distribution of the imputed values).
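The comparison of proportions can also be tabulated. The sketch below uses made-up observed and imputed values of a three-level factor (the data and the helper prop_by_level() are hypothetical). It also shows why a variable with very few imputed values is hard to judge: each single value shifts the proportions considerably.

```r
# Made-up observed and imputed values of a categorical variable
lev <- c("never", "former", "current")
observed <- factor(c("never", "never", "former", "current", "never", "former"),
                   levels = lev)
imputed <- factor(c("current", "never", "current"), levels = lev)

# Hypothetical helper: proportion of values in each level
prop_by_level <- function(f) {
  tab <- table(f)
  setNames(as.vector(tab) / sum(tab), names(tab))
}

# One row for the observed and one row for the imputed values
props <- rbind(observed = prop_by_level(observed),
               imputed = prop_by_level(imputed))
round(props, 2)
```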

Task 2

Investigate whether differences in the distributions of observed and imputed values can be explained by other variables. Check this for

  • SBP conditional on gender and hypten
  • hgt conditional on gender
  • alc conditional on gender or smoke

Solution 2

densityplot(imp, ~SBP|hypten + gender)

densityplot(imp, ~hgt|gender)

propplot(imp, alc ~ gender + smoke, legend.position = 'bottom')

 

© Nicole Erler