In this practical, a number of R packages are used. If any of them are not installed you may be able to follow the practical but will not be able to run all of the code. The packages used (with versions that were used to generate the solutions) are:

- R version 3.6.0 (2019-04-26)
- `mice` (version: 3.4.0)
- `JointAI` (version: 0.5.1)
- `ggplot2` (version: 3.1.1)
- `reshape2` (version: 1.4.3)
- `ggpubr` (version: 0.2)

You can find help files for any function by adding a `?` before the name of the function.

Alternatively, you can look up the help pages online at https://www.rdocumentation.org/ or find the whole manual for a package at https://cran.r-project.org/web/packages/available_packages_by_name.html

For this practical, we will use a subset of the **NHANES** dataset that we have seen in the previous practicals. It contains only those cases that have observed `wgt`, and some columns that are not needed were excluded.

Download the file `NHANES_for_practicals_2.RData` from here. To load this dataset, you can use `load(file.choose())`: `file.choose()` opens the file explorer and allows you to navigate to the location of the file `NHANES_for_practicals_2.RData` on your computer, and `load()` then reads the selected file. If you know the path to the file, you can also use `load("<path>/NHANES_for_practicals_2.RData")`.
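As a minimal sketch of how `load()` behaves, the following uses a small stand-in data frame (the name `toy` and the temporary file are hypothetical and not part of the practical). `load()` restores objects under the names they were saved with and returns those names.

```r
# hypothetical stand-in for the practical's data file
toy <- data.frame(wgt = c(70.3, 82.1), age = c(34, 51))
path <- file.path(tempdir(), "toy_example.RData")
save(toy, file = path)
rm(toy)

# load() recreates the saved objects and returns their names
loaded <- load(path)
loaded
head(toy)

# interactive alternative for the real file (opens a file chooser):
# load(file.choose())
```

Printing the value returned by `load()` is a quick way to check which objects a `.RData` file has added to the workspace.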

The focus of this practical is the imputation of data that has features that require special attention.

In the interest of time, we will focus on these features and **abbreviate steps that are the same as in any imputation setting** (e.g., getting to know the data or checking that imputed values are realistic). **Nevertheless, these steps are of course required when analysing data in practice.**

Our aim is to fit the following **linear regression model for weight**:

We expect that the effects of cholesterol and HDL may differ with age and, hence, include **interaction terms** between `age` and `chol` and between `age` and `HDL`.
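In R's formula syntax, such interactions can be written with `*`. The sketch below uses a reduced, hypothetical formula that contains only the three variables involved in the interactions (it is not the full analysis model) and lists the terms R derives from it.

```r
# reduced, hypothetical formula: only the variables involved in the interactions
fmla <- wgt ~ age * (chol + HDL)

# term labels derived from the formula: main effects plus the two interactions
trms <- attr(terms(fmla), "term.labels")
trms
```

The labels `age:chol` and `age:HDL` are the interaction terms; the main effects of `age`, `chol` and `HDL` are included automatically.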

Additionally, we want to include the other variables in the dataset as auxiliary variables.

When the analysis model of interest involves interaction terms between incomplete variables, **mice** has limited options to reduce the bias that may be introduced by naive handling of the missing values.

Use of the “Just Another Variable” (JAV) approach can in some settings reduce bias. Alternatively, we can use passive imputation, i.e., re-calculate the interaction terms from the imputed values in each iteration of the MICE algorithm. Furthermore, predictive mean matching tends to lead to less bias than normal imputation models.

- Calculate the interaction terms in the incomplete data.
- Perform the setup-run of `mice()` without any iterations.

```
library(mice)

# calculate the interaction terms
NHANES$agechol <- NHANES$age * NHANES$chol
NHANES$ageHDL <- NHANES$age * NHANES$HDL

# setup run (maxit = 0: no iterations, only the default settings are created)
imp0 <- mice(NHANES, maxit = 0,
             defaultMethod = c('norm', 'logreg', 'polyreg', 'polr'))
imp0
```

```
## Class: mids
## Number of multiple imputations: 5
## Imputation methods:
##       wgt   gender     bili      age     chol      HDL      hgt     educ     race      SBP   hypten
##        ""       ""   "norm"       ""   "norm"   "norm"   "norm"   "polr"       ""   "norm" "logreg"
##        WC  agechol   ageHDL
##    "norm"   "norm"   "norm"
## PredictorMatrix:
##         wgt gender bili age chol HDL hgt educ race SBP hypten WC agechol ageHDL
## wgt       0      1    1   1    1   1   1    1    1   1      1  1       1      1
## gender    1      0    1   1    1   1   1    1    1   1      1  1       1      1
## bili      1      1    0   1    1   1   1    1    1   1      1  1       1      1
## age       1      1    1   0    1   1   1    1    1   1      1  1       1      1
## chol      1      1    1   1    0   1   1    1    1   1      1  1       1      1
## HDL       1      1    1   1    1   0   1    1    1   1      1  1       1      1
```

Apply the necessary change to the imputation method and predictor matrix.

Since the interaction terms are calculated from the original variables, these interaction terms should not be used to impute the original variables.

```
meth <- imp0$method
pred <- imp0$predictorMatrix

# change imputation for "bili" to pmm (to prevent negative values)
meth["bili"] <- 'pmm'

# changes in predictor matrix to prevent original variables being imputed based
# on the interaction terms
pred["chol", "agechol"] <- 0
pred["HDL", "ageHDL"] <- 0

meth
pred
```

```
##       wgt   gender     bili      age     chol      HDL      hgt     educ     race      SBP   hypten
##        ""       ""    "pmm"       ""   "norm"   "norm"   "norm"   "polr"       ""   "norm" "logreg"
##        WC  agechol   ageHDL
##    "norm"   "norm"   "norm"
```

```
##         wgt gender bili age chol HDL hgt educ race SBP hypten WC agechol ageHDL
## wgt       0      1    1   1    1   1   1    1    1   1      1  1       1      1
## gender    1      0    1   1    1   1   1    1    1   1      1  1       1      1
## bili      1      1    0   1    1   1   1    1    1   1      1  1       1      1
## age       1      1    1   0    1   1   1    1    1   1      1  1       1      1
## chol      1      1    1   1    0   1   1    1    1   1      1  1       0      1
## HDL       1      1    1   1    1   0   1    1    1   1      1  1       1      0
## hgt       1      1    1   1    1   1   0    1    1   1      1  1       1      1
## educ      1      1    1   1    1   1   1    0    1   1      1  1       1      1
## race      1      1    1   1    1   1   1    1    0   1      1  1       1      1
## SBP       1      1    1   1    1   1   1    1    1   0      1  1       1      1
## hypten    1      1    1   1    1   1   1    1    1   1      0  1       1      1
## WC        1      1    1   1    1   1   1    1    1   1      1  0       1      1
## agechol   1      1    1   1    1   1   1    1    1   1      1  1       0      1
## ageHDL    1      1    1   1    1   1   1    1    1   1      1  1       1      0
```

Run the imputation using the **JAV approach** and check the traceplot.
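A minimal sketch of this step, run on simulated stand-in data so that it is self-contained. The data frame `toy`, the object name `impJAV`, `maxit = 10` and the seed are assumptions for illustration, not the settings of the solution. With the practical's data, the same `mice()` call would use `NHANES`, `meth` and `pred` from above.

```r
library(mice)

# simulated stand-in data (hypothetical, NOT the practical's NHANES subset)
set.seed(2019)
n <- 200
toy <- data.frame(age = round(runif(n, 20, 80)),
                  chol = rnorm(n, 5, 1),
                  HDL = rnorm(n, 1.4, 0.4))
toy$wgt <- 60 + 0.1 * toy$age + 2 * toy$chol - 5 * toy$HDL + rnorm(n, sd = 8)
toy$chol[sample(n, 30)] <- NA
toy$HDL[sample(n, 30)] <- NA

# JAV: the interaction terms are treated as ordinary incomplete variables
toy$agechol <- toy$age * toy$chol
toy$ageHDL <- toy$age * toy$HDL

imp0 <- mice(toy, maxit = 0,
             defaultMethod = c('norm', 'logreg', 'polyreg', 'polr'))
meth <- imp0$method
pred <- imp0$predictorMatrix
pred["chol", "agechol"] <- 0
pred["HDL", "ageHDL"] <- 0

# run the imputation (maxit and seed are assumed values)
impJAV <- mice(toy, method = meth, predictorMatrix = pred,
               maxit = 10, seed = 2019, printFlag = FALSE)

# traceplot of the mean and standard deviation of the imputed values
plot(impJAV)

# under JAV the imputed products are not forced to equal age * chol
comp <- complete(impJAV, 1)
summary(abs(comp$agechol - comp$age * comp$chol))
```

The last line illustrates the defining feature of JAV: imputed values of `agechol` are drawn like those of any other variable and therefore generally differ from the product of the imputed `chol` and `age`.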

We skip the more detailed evaluation of the imputed values. With the settings given in the solution, the chains have converged and the distributions of the imputed values match the distributions of the observed data closely enough.

- Analyse the imputed data and pool the results.
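The general pattern for this step is `with()` followed by `pool()`. The sketch below shows it on the small `nhanes` example data shipped with **mice** (not the dataset of this practical); the model `bmi ~ age + chl` and the object names are chosen for illustration only.

```r
library(mice)

# small example data shipped with mice (NOT the practical's NHANES subset)
imp <- mice(nhanes, m = 5, maxit = 5, seed = 2019, printFlag = FALSE)

# fit the analysis model on each imputed dataset
fits <- with(imp, lm(bmi ~ age + chl))

# combine the estimates across the imputed datasets with Rubin's rules
pooled <- pool(fits)
summary(pooled, conf.int = TRUE)
```

For the practical, the imputed object from the JAV run and the analysis model for `wgt` would take the place of `imp` and the `lm()` formula.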

For the passive imputation, we can re-use the adjusted versions of `meth` and `pred` we created for the JAV approach, but additional changes to `meth` are necessary.

Specify the new imputation method, i.e., adapt `meth` and save it as `methPAS`.

For passive imputation, instead of the name of an imputation method, you need to specify the formula used to calculate the value that is imputed passively.
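In **mice**, such a formula is given as a character string starting with `~`. A possible specification is sketched below. To keep the example self-contained, `meth` is re-created by hand from the method vector printed above for the JAV approach; in the practical you would simply start from the existing `meth`.

```r
# method vector as printed above for the JAV approach (re-created by hand)
meth <- c(wgt = "", gender = "", bili = "pmm", age = "", chol = "norm",
          HDL = "norm", hgt = "norm", educ = "polr", race = "", SBP = "norm",
          hypten = "logreg", WC = "norm", agechol = "norm", ageHDL = "norm")

# replace the imputation methods of the interaction terms by formulas
methPAS <- meth
methPAS[c("agechol", "ageHDL")] <- c("~I(age * chol)", "~I(age * HDL)")
methPAS
```

Wrapping the product in `I()` makes sure that `*` is interpreted as multiplication rather than as formula syntax.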

Run the imputation using **passive imputation** and check the traceplot.
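A minimal, self-contained sketch of this step on simulated stand-in data. The data frame `toy`, the object names `predPAS` and `impPAS`, `maxit = 10`, the seed and the use of the default imputation methods are assumptions for illustration, not the settings of the solution. With the practical's data, the `mice()` call would use `NHANES`, `methPAS` and `pred`.

```r
library(mice)

# simulated stand-in data (hypothetical, NOT the practical's NHANES subset)
set.seed(2019)
n <- 200
toy <- data.frame(age = round(runif(n, 20, 80)),
                  chol = rnorm(n, 5, 1),
                  HDL = rnorm(n, 1.4, 0.4))
toy$wgt <- 60 + 0.1 * toy$age + 2 * toy$chol - 5 * toy$HDL + rnorm(n, sd = 8)
toy$chol[sample(n, 30)] <- NA
toy$HDL[sample(n, 30)] <- NA
toy$agechol <- toy$age * toy$chol
toy$ageHDL <- toy$age * toy$HDL

# setup run with the default imputation methods
imp0 <- mice(toy, maxit = 0)

# passive imputation: formulas instead of imputation methods
methPAS <- imp0$method
methPAS[c("agechol", "ageHDL")] <- c("~I(age * chol)", "~I(age * HDL)")

# as before, do not impute a variable from its own interaction term
predPAS <- imp0$predictorMatrix
predPAS["chol", "agechol"] <- 0
predPAS["HDL", "ageHDL"] <- 0

impPAS <- mice(toy, method = methPAS, predictorMatrix = predPAS,
               maxit = 10, seed = 2019, printFlag = FALSE)

# traceplot
plot(impPAS)

# check that the interaction term is consistent with its components
comp <- complete(impPAS, 1)
all.equal(as.numeric(comp$agechol), comp$age * comp$chol)
```

Because the interaction columns come after `chol` and `HDL` in the data, they are re-calculated after those variables have been imputed in each iteration, so we would expect the completed data to satisfy `agechol = age * chol`.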