In this practical, a number of R packages are used. If any of them are not installed you can still follow the practical but will not be able to run all of the code. The packages used (with versions that were used to generate the solutions) are:
mice
(version: 3.4.0)You can find help files for any function by adding a ?
before the name of the function.
Alternatively, you can look up the help pages online at https://www.rdocumentation.org/ or find the whole manual for a package at https://cran.rproject.org/web/packages/available_packages_by_name.html
For this practical, we will again use the NHANES dataset that we have seen in the previous practicals.
To load this dataset, you can use the command file.choose()
which opens the explorer and allows you to navigate to the location of the file NHANES_for_practicals.RData
on your computer. If you know the path to the file, you can also use load("<path>/NHANES_for_practicals.RData")
.
If you have not followed the first practical or if you reloaded the NHANES data you need to recode the variable educ
again:
The imputed data are stored in a mids
object called imp
that we created in the previous practical.
You can load it into your workspace by clicking the object imps.RData
if you are using RStudio. Alternatively, you can load this workspace using load("<path>/imps.RData")
. You then need to run:
mids
objectsWhen we are confident that the imputation was successfull, the imputed data can be analysed with any standard complete data method.
Depending on the research question, for the NHANES data you might use
lm()
glm()
with family = binomial()
glm()
with an appropriate family
Since our imputed data imp
is not a data.frame
but a mids
object, we cannot use the standard specification lm(formula = ..., data = imp, ...)
but need to use the function with()
from the mice package:
with(imp, lm(formula, ...))
class
and structure of the resulting object.For example:
mod1 < with(imp, lm(SBP ~ age + gender + bili + chol + BMI))
mod2 < with(imp, glm(hypchol ~ age + gender + bili + race + BMI, family = "binomial"))
mod3 < with(imp, glm(creat ~ age + gender + uricacid + WC + BMI, family = Gamma(link = "log")))
# You may, for example, choose one of these models:
mod1 < with(imp, lm(SBP ~ age + gender + bili + chol + BMI))
mod2 < with(imp, glm(hypchol ~ age + gender + bili + race + BMI,
family = "binomial"))
mod3 < with(imp, glm(creat ~ age + gender + uricacid + WC + BMI,
family = Gamma(link = 'log')))
## [1] "mira" "matrix"
## [1] "call" "call1" "nmis" "analyses"
## $call
## [1] "call"
##
## $call1
## [1] "call"
##
## $nmis
## [1] "integer"
##
## $analyses
## [1] "list"
## [[1]]
##
## Call:
## lm(formula = SBP ~ age + gender + bili + chol + BMI)
##
## Coefficients:
## (Intercept) age genderfemale bili chol BMI
## 91.3922 0.4009 5.2851 3.4062 1.6550 0.2715
##
##
## [[2]]
##
## Call:
## lm(formula = SBP ~ age + gender + bili + chol + BMI)
##
## Coefficients:
## (Intercept) age genderfemale bili chol BMI
## 93.8528 0.4096 5.5699 2.8959 0.9808 0.2801
##
##
## [[3]]
##
## Call:
## lm(formula = SBP ~ age + gender + bili + chol + BMI)
##
## Coefficients:
## (Intercept) age genderfemale bili chol BMI
## 93.8626 0.3982 5.4364 3.7986 1.1172 0.2914
##
##
## [[4]]
##
## Call:
## lm(formula = SBP ~ age + gender + bili + chol + BMI)
##
## Coefficients:
## (Intercept) age genderfemale bili chol BMI
## 92.4412 0.4116 5.5242 2.5347 1.1863 0.2862
##
##
## [[5]]
##
## Call:
## lm(formula = SBP ~ age + gender + bili + chol + BMI)
##
## Coefficients:
## (Intercept) age genderfemale bili chol BMI
## 92.5550 0.3970 5.6263 2.8590 1.2476 0.2956
##
## Call:
## lm(formula = SBP ~ age + gender + bili + chol + BMI)
##
## Residuals:
## Min 1Q Median 3Q Max
## 35.939 9.111 0.540 7.751 68.422
##
## Coefficients:
## Estimate Std. Error t value Pr(>t)
## (Intercept) 91.39220 3.20770 28.491 < 2e16 ***
## age 0.40092 0.02739 14.640 < 2e16 ***
## genderfemale 5.28511 0.90512 5.839 7.10e09 ***
## bili 3.40618 1.50234 2.267 0.023589 *
## chol 1.65496 0.42232 3.919 9.51e05 ***
## BMI 0.27148 0.07175 3.784 0.000164 ***
## 
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 13.93 on 994 degrees of freedom
## Multiple Rsquared: 0.2441, Adjusted Rsquared: 0.2403
## Fstatistic: 64.19 on 5 and 994 DF, pvalue: < 2.2e16
mira
objectsAnalyses performed on mids
objects will result in a mira
object (multiply imputed repeated analyses).
mira
object is a list with the following elements:
call :

The call that created the mira object.

call1 :

The call that created the mids object.

nmis :

The number of missing observations per column. 
analyses :

The models fitted on each of the imputed datasets: A list with imp$m components, where each component contains a fitted model object.

The class of each of the components of imp$analyses
will depend on the type of model that was fitted.
If you want to check the results from one particular imputed dataset, e.g., the 3rd, you can access the fitted model via mod1$analyses[[3]]
. Alternatively you can use the function getfit()
.
To pool the results from the repeated analyses contained in a mira
object, the function pool()
can be used.
Pool one of the provided mira
objects (mod1
, mod2
or mod3
) and investigate its structure.
## [1] "mipo" "data.frame"
## [1] "call" "m" "pooled"
## Classes 'mipo' and 'data.frame': 0 obs. of 3 variables:
## $ call : language pool(object = mod1)
## $ m : int 5
## $ pooled:'data.frame': 6 obs. of 9 variables:
## ..$ estimate: num 92.821 0.403 5.488 3.099 1.237 ...
## ..$ ubar : num 1.02e+01 7.53e04 8.26e01 2.24 1.77e01 ...
## ..$ b : num 1.10 4.48e05 1.77e02 2.50e01 6.43e02 ...
## ..$ t : num 1.15e+01 8.06e04 8.47e01 2.54 2.54e01 ...
## ..$ dfcom : int 994 994 994 994 994 994
## ..$ df : num 225.4 456.3 839 216.2 40.8 ...
## ..$ riv : num 0.1298 0.0714 0.0258 0.1338 0.4362 ...
## ..$ lambda : num 0.1149 0.0667 0.0251 0.118 0.3037 ...
## ..$ fmi : num 0.1226 0.0707 0.0274 0.1261 0.3355 ...
mipo
objectsPooling of a mira
object results in a mipo
object (multiply imputed pooled analysis).
mipo
object is a list that has the following elements:
call :

The call that created the mipo object.

m :

The number of imputed datasets. 
pooled :

A data.frame containing pooled estimates and variances.

pooled
has the following columns:
estimate :

Pooled parameter estimates 
ubar :

The withinimputation variance of the estimate

b :

The betweenimputation variance of the estimate

t :

The total variance of the estimate

dfcom :

Degrees of freedom in the complete data 
df :

Degrees of freedom associated with the \(t\)statistics. 
riv :

The relative increase in variance due to the missing values: \((b+b/m)/ubar\) 
lambda :

Proportion of the variation due to the missing values: \((b+b/m)/t\). 
fmi :

Fraction of missing information. 
From the elements contained in the mipo
object, we could calculate the pooled confidence intervals and pvalues by hand, using the function provided in the course notes.
However, the function summary()
applied to the mipo
object will do this for us. This function has four more arguments that may be relevant:
conf.int
: needs to be set to TRUE
in order to to get confidence intervalsconf.level
: the confidence level, by default this is 0.95