In this practical, we will fit multiple linear regression models that include non-linear effects and interactions, and visualize the estimated effects using effect plots.
To focus on the relevant issues with regards to non-linear effects and interactions we will skip some of the model evaluation steps that would of course be required in an analysis performed to answer a real research question. For the same reason, we also do not follow the recommended procedure for model building here.
This practical is intended to be “learning by doing”, i.e., you may not be able to solve all tasks by yourself without the hints or even checking the solution, and you will come across functions that you haven’t seen in the lecture slides.
When a function is new to you, you can find out more about it in the
help file that you can access by typing
?<function name>
.
For several tasks of this practical the given solution is only one of many possible solutions.
There are a few parts indicated as “Advanced Level” containing some more complex solutions (for those of you, for whom working with is a piece of cake).
For this practical we will use the same subset of the NHANES (National Health and Nutrition Examination Survey) data as used in the first two practicals.
To load this dataset into , you can
use the command file.choose()
which opens the explorer and
allows you to navigate to the location of the file on your computer.
If you know the path to the file, you can also use
load("<path>/nhanes.RData")
.
Here a short overview over the data:
## SBP gender age race WC alc educ creat albu uricacid bili occup smoke
## 1 110.6667 male 22 Non-Hispanic White 81.0 <NA> low 0.91 4.8 4.9 0.6 working never
## 2 118.0000 female 44 Non-Hispanic White 80.1 <NA> high 0.89 3.7 4.5 0.3 working never
## 3 124.6667 male 21 other 69.6 <1 low 0.87 4.4 5.4 0.2 not working never
## 4 102.0000 female 43 Non-Hispanic Black 120.4 >=1 low 0.68 4.4 5.0 0.8 not working current
## 5 146.6667 male 51 other 81.1 <NA> low 0.99 4.1 5.0 0.5 looking for work current
## 6 122.0000 male 80 Non-Hispanic White 112.5 <NA> low 1.01 4.2 4.8 0.7 not working never
## 'data.frame': 2765 obs. of 13 variables:
## $ SBP : num 111 118 125 102 147 ...
## $ gender : Factor w/ 2 levels "male","female": 1 2 1 2 1 1 1 2 1 1 ...
## $ age : num 22 44 21 43 51 80 26 30 70 35 ...
## $ race : Factor w/ 5 levels "Mexican American",..: 3 3 5 4 5 3 4 5 4 4 ...
## $ WC : num 81 80.1 69.6 120.4 81.1 ...
## $ alc : Factor w/ 2 levels "<1",">=1": NA NA 1 2 NA NA 2 2 1 NA ...
## $ educ : Factor w/ 2 levels "low","high": 1 2 1 1 1 1 1 2 1 2 ...
## $ creat : num 0.91 0.89 0.87 0.68 0.99 1.01 1.03 0.91 0.8 1.1 ...
## $ albu : num 4.8 3.7 4.4 4.4 4.1 4.2 5 4.8 3.7 4 ...
## $ uricacid: num 4.9 4.5 5.4 5 5 4.8 5.4 6.7 5.4 6.7 ...
## $ bili : num 0.6 0.3 0.2 0.8 0.5 0.7 1.1 0.9 0.9 0.9 ...
## $ occup : Factor w/ 3 levels "working","looking for work",..: 1 1 3 3 2 3 1 1 3 1 ...
## $ smoke : Ord.factor w/ 3 levels "never"<"former"<..: 1 1 1 3 3 1 1 1 2 1 ...
We want to model the effect of age
, gender
,
smoke
, SBP
, albu
and
creat
on bili
.
Fit the corresponding linear regression model.
Based on the distribution of the residuals, do you think this model is appropriate for the data at hand?
The variable smoke
is an ordered factor. You may want to
specify the type of coding used for this variable.
contrasts
argument (see also Task 3
from the Linear
Regression Summary practical).
Fit the linear regression model for bili
:
lm1 <- lm(bili ~ age + gender + smoke + SBP + albu + creat, data = nhanes,
contrasts = list(smoke = "contr.treatment"))
Because smoke
is an ordered factor
would by default use orthogonal
polynomials. Typically, we prefer dummy coding and specify this via the
contrasts
argument here.
To check the distribution of the residuals we can create an QQ-plot and a histogram:
par(mfrow = c(1, 2), mar = c(3, 3, 2, 1), mgp = c(2, 0.6, 0))
car::qqPlot(rstandard(lm1), ylim = c(-3, 15))
hist(rstandard(lm1), breaks = 100)
library()
) but can use
::
to indicate from which package the function is, e.g.,
car::qqPlot()
.
Here we set the limits of the y-axis (ylim
in
qqPlot()
) because by default, the range of the y-axis is
based on the quantiles, but does not include the 95% confidence
envelope, which would then be partially cut off.
The QQ-plot shows that the residuals are not normally distributed. They are right-skewed with some residuals being a lot larger than expected under a standard normal distribution. We can also see this in the histogram.
How can we improve our model to better comply with the assumptions?
Fit the improved model and check if the change did indeed improve the normality of the residuals.
The response variable bili
is right-skewed, and because
the covariates used in lm1
cannot sufficiently explain the
skewness, the residuals are skewed as well.
Transformation of the response may result in more normally distributed residuals.
For variables that are right-skewed, transformation with the logarithm or wit the square root can help to obtain a more symmetric distribution.
To find a suitable transformation, we can plot histograms of
different transformations of bili
. Note that having a
normally distributed response does not guarantee that the residuals are
normally distributed as well, but it many cases the response and
covariates have similar distributions (and it is more convenient to work
with the response because we have it available without having to fit a
model each time).
par(mfrow = c(1, 3))
hist(nhanes$bili, breaks = 100)
hist(log(nhanes$bili), breaks = 100)
hist(sqrt(nhanes$bili), breaks = 100)
The transformation with the natural logarithm gives us a relatively symmetric distribution, so we will work with this transformation.
Re-fit the model with the transformed response:
update()
instead of
re-writing the model because we only need to change one detail of the
model: the response. Therefore, we specify the new model formula via the
argument formula
. The dot after the tilde indicates that we
keep the right hand side of the model formula the same as it was in
lm1
.
Check the distribution of the residuals by re-creating the plots:
The distribution of the residuals is now closer to a normal distribution.
Now that we have transformed the response, how does this change the interpretation of the results?
The estimates for the regression coefficients from lm2
give the expected change in log(bili)
for a 1 unit increase
in the covariate, not the change in bili
.
Visualize the effect of albu
and gender
on
log(bili)
as well as on bili
with an effect
plot (using lm2
). What do the plots show you about these
associations?
bili
you need to transform
the fitted values obtained from the model for log(bili)
back.
Getting a nice-looking plot for gender
is a bit complex
when using the base framework. Try, but don’t worry if you
can’t figure it out.
Did you get the following error message (followed by some warning messages)?
## Error in plot.window(...): need finite 'xlim' values
When creating the plot, checks the
range of the data to determine the range of the x-axis and y-axis for
the plot. To do this, the data is converted to numeric values, and then
the range is determined (using range()
) for the subset of
values that are finite, i.e., range(x[is.finite(x)])
.
Check what type of variable gender
is in the data you
created for prediction and see what happens when you try to convert it
to numeric values.
gender
to a factor.
First, we need to create the data for which we want to calculate the
fitted values. We use a sequence of values for albu
, both
factor levels for gender
, and reference values for all
other covariates (the median or the first category).
# data for visualization of the effect of "albu"
ndf1 <- data.frame(albu = seq(min(nhanes$albu, na.rm = TRUE),
max(nhanes$albu, na.rm = TRUE),
by = 0.05),
age = median(nhanes$age),
gender = "male",
smoke = "never",
SBP = median(nhanes$SBP, na.rm = TRUE),
creat = median(nhanes$creat, na.rm = TRUE))
# data for visualization of the effect of "gender"
ndf2 <- data.frame(albu = median(nhanes$albu, na.rm = TRUE),
age = median(nhanes$age),
gender = levels(nhanes$gender),
smoke = "never",
SBP = median(nhanes$SBP, na.rm = TRUE),
creat = median(nhanes$creat, na.rm = TRUE))
I specified two separate datasets for visualization of the effects of
albu
and gender
since there is no interaction
between the two variables. This means that the association between one
of these variables and the outcome is independent from the value of the
other variable.
If we would specify one dataset in which both albu
and
gender
vary, we’d get a dataset with all the different
values for albu
for males, and then the same for females.
This would allow us to visualize the effects of both albu
and gender
in the same plot. This would be required to
visualize an interaction between the two variables.
In the following, we would then only have to use
predict()
once.
plot()
or
matplot()
) would get a bit more complicated, though.
Then we can calculate the fitted values and corresponding 95%
confidence intervals using the function predict()
:
pred1 <- predict(lm2, newdata = ndf1, interval = "confidence")
pred2 <- predict(lm2, newdata = ndf2, interval = "confidence")
albumin
The fitted values for bili
can be obtained by
transforming the fitted values using exp()
, the inverse of
the natural logarithm.
To compare the fitted values on both scales, i.e.,
log(bili)
and bili
we plot them next to each
other. For the continuous variable albu
, we can use the
function matplot()
which allows us to simultaneously plot
the fitted values and confidence bands.
gender
The function matplot()
does not work well for
categorical variables because the default type of plot for categorical
variables is a bar plot. To get points and error bars we use
plot.default()
.
This may sound confusing, because I just said that the default is a
bar plot, but plot.default()
is specifically plotting
scatter plots. When we call plot()
,
checks of which type the provided
objects are, and then internally chooses whether to use
plot.default()
— for continuous data — or another type of
plot.
When we use plot.default()
(or, in general
plot()
), we need to plot the confidence intervals
separately.
plot.default()
tries to convert the data that are
provided to numeric values. That is not possible for character vectors
(and gender
is a character vector in ndf
), but
it is possible for factors. We therefore convert gender
to
a factor.
par(mfrow = c(1, 2))
plot.default(factor(ndf2$gender), pred2[, "fit"],
ylim = range(pred2), xlim = c(0, 3), xaxt = "n",
xlab = "gender", ylab = "expected value of log(bili)")
axis(side = 1, at = seq_along(ndf2$gender), labels = levels(factor(ndf2$gender)))
arrows(x0 = as.numeric(factor(ndf2$gender)),
x1 = as.numeric(factor(ndf2$gender)),
y0 = pred2[, "lwr"],
y1 = pred2[, "upr"],
code = 3, angle = 90, length = 0.1)
plot.default(factor(ndf2$gender), exp(pred2[, "fit"]),
ylim = range(exp(pred2)), xlim = c(0, 3), xaxt = "n",
xlab = "gender", ylab = "expected value of bili")
axis(side = 1, at = seq_along(ndf2$gender), labels = levels(factor(ndf2$gender)))
arrows(x0 = as.numeric(factor(ndf2$gender)),
x1 = as.numeric(factor(ndf2$gender)),
y0 = exp(pred2[, "lwr"]),
y1 = exp(pred2[, "upr"]),
code = 3, angle = 90, length = 0.1)
To get a nicely labelled x-axis even though we are now plotting
gender
as numeric values, i.e., 1
and
2
instead of female
and male
, we
can create the x-axis ourselves.
First, we omit the x-axis from the plot by setting the x-axis type to
“none” (xaxt = "n"
). Then we create the axis using
axis()
. The argument side
specifies on which
side of the plot the axis should be printed (1
is the
bottom, 2
would be left, 3
the top and
4
the right side).
at
specifies the location on the axis at which we want the
axis to have values and tick marks. seq_along(ndf2$gender)
creates a sequence of integers that “counts” the elements in
ndf2$gender
, i.e, it will return the vector
c(1, 2)
. With labels
we can change what is
printed at the tick mark locations on the axis.
arrows()
. x0
,
x1
, y0
and y1
set the x- and
y-coordinates of the arrows to be drawn, code
specifies the
type of arrow (at which end the arrow head should be drawn, where
3
refers to both ends). The arguments angel
and length
determine how the arrow head looks like. Since
we do not actually want an arrow, but just a horizontal line, we set
angle = 90
(degrees). length
specifies how
wide the horizontal lines at the ends of the error bars are.
The plots demonstrate that transformation of the response introduces
a non-linear association between the covariates and the response on the
original scale. The effect plot for log(bili)
and
albu
shows a straight line, because the model assumed a
linear effect of albu
, so that a step of size 1 on the
x-axis will always result in the same change on the y-axis.
But this is not the case for the plot with bili
on the
y-axis. The higher the value of albu
, the larger the effect
of a 1 unit increase on bili
.
In the plots for the effect of gender
we do not see
large differences with respect to the differences in the expected
values. But plotting the exp()
(or another transformation)
of the expected value will lead to confidence intervals that are no
longer symmetric around the expected value. In this particular example,
however, this is not visible due to the small expected values.
As we have seen, transformation of the response introduces non-linear associations between the response and the covariates. It may be, however, that the shape of the association that is introduced is not appropriate. Either, because a covariate has a linear effect with the response on its original scale, or because the true association has a different, maybe more complex shape.
We want to explore the shapes of the associations for the continuous
covariates in lm2
.
Explore the distributions of the continuous covariates used in
lm2
to decide if the default settings
uses for splines are appropriate or
not.
Then update lm2
to model the associations with the
continuous covariates as non-linear using natural cubic splines with 3
degrees of freedom.
ns()
. You
need to load the splines package. See also the slides
on non-linear effects
First, we explore the distributions of the three continuous
covariates age
, albu
and
bili
:
par(mfrow = c(4, 1), mgp = c(2, 0.6, 0), mar = c(3, 3, 2, 1))
hist(nhanes$age, breaks = 200)
hist(nhanes$SBP, breaks = 200)
hist(nhanes$albu, breaks = 200)
hist(nhanes$creat, breaks = 200)
breaks
in the histograms.
The variable age
is relatively uniformly distributed.
albu
is symmetric but has a few observations in the tails
of the distribution and SBP
and creat
are
skewed with some observations that are much larger than the rest.
The default behaviour for ns()
is that the
boundary knots are placed at the minimum and maximum
value of the covariate.
For SBP
, albu
and creat
, the
minima and/or maxima are outside the area where we have many
observations. We should, therefore, change the boundary knots so that
they are not at the minimum and maximum of the distribution, but exclude
the few more extreme values.
We can indicate the positions of certain quantiles using vertical lines. This can help us to find where to place the boundary knots.
par(mfrow = c(4, 1), mgp = c(2, 0.6, 0), mar = c(3, 3, 2, 1))
hist(lm2$model$age, breaks = 200)
hist(lm2$model$SBP, breaks = 200)
abline(v = quantile(nhanes$SBP, c(0.01, 0.99), na.rm = TRUE),
lty = 2, col = "red")
hist(lm2$model$albu, breaks = 200)
abline(v = quantile(nhanes$albu, c(0.01, 0.99), na.rm = TRUE),
lty = 2, col = "red")
hist(lm2$model$creat, breaks = 200)
abline(v = quantile(lm2$model$creat, c(0, 0.975), na.rm = TRUE),
lty = 2, col = "red")
The positions of the quantiles shown by the vertical lines were chosen based on trial-and-error and visual inspection.
Here, I do not plot the covariates from the original data
(nhanes
), but use the element model
from the
fitted model object lm2
. The reason is that we have some
missing values in the nhanes
data and the
data.frame
returned by lm2$model
only contains
the observations that were actually used in the model. If there are
extreme observations in the original data which are excluded due to a
missing value in another variable, these observations are not relevant
for us here.
In the specification of the possible locations for the boundary
knots, indicated by the red vertical lines, we use the quantiles of the
original data (nhanes
) because when we specify the boundary
knots using quantiles()
in ns()
,
calculates the quantiles in the input
data, which is nhanes
.
In many applications it may not matter too much if we use the original data or the complete cases to investigate the locations of the boundary knots. But when there are many missing values, the complete cases may differ quite a bit from the full data and hence the chosen positions for the knots are not as appropriate as we may think they are.
Because the variables (might) contain missing values, we specify the
argument na.rm = TRUE
in quantile()
so that
missing values are excluded when calculating the quantiles. Without this
argument, quantile()
would return NA
if there
are any values missing.
We can now update our model using ns()
. Because we want
to replace the linear effects of age
, albu
and
creat
with the spline specification we need to remove them
from the formula. Otherwise we would have both the linear effect and the
spline in the model.
library("splines")
lm3 <- update(lm2,
formula = .~ . - age - SBP - creat - albu +
ns(age, df = 3) +
ns(SBP, df = 3, B = quantile(SBP, c(0.01, 0.99), na.rm = TRUE)) +
ns(albu, df = 3, B = quantile(albu, c(0.01, 0.99), na.rm = TRUE)) +
ns(creat, df = 3, B = quantile(creat, c(0, 0.975), na.rm = TRUE))
)
Although, with such extensive changes in the model we can also
consider to just write the new model from scratch and not use
update()
.
Boundary.knots
with just B
. This can be convenient because the variable
names in the output can get really long when using splines so that the
output is broken into multiply lines and hard to read.
The resulting model now has the formula
## log(bili) ~ gender + smoke + ns(age, df = 3) + ns(SBP, df = 3,
## B = quantile(SBP, c(0.01, 0.99), na.rm = TRUE)) + ns(albu,
## df = 3, B = quantile(albu, c(0.01, 0.99), na.rm = TRUE)) +
## ns(creat, df = 3, B = quantile(creat, c(0, 0.975), na.rm = TRUE))
We want to explore the estimated shapes of the associations of the continuous covariates.
We could do this by creating all the effect plots ourselves, but we can also use existing packages for this. We work with these packages here because they are convenient alternatives to doing everything by hand, but for the purpose of this course, you don’t need to become an expert in the use of these packages.
install.packages()
.packageVersion("<package name>")
which returns the
version number of the package if it is available and give an error if
the package is not available.
Read the help pages for these functions (and take a look at the Examples section in the help file!!!) to find out how to use these functions and what exactly they calculate. Do they all show the same thing?
Create effect plots using the functions
visreg()
from the package visreg,predictorEffects()
from the package
effects (and plot()
), andggpredict()
from the package ggeffects
(and plot()
).ggplot
plots into one figure you can
use the function ggpubr::ggarrange()
. You may need to
install the ggpubr package first.
visreg::visreg
The visreg package works with the
Trellis graphics framework. To show multiple plots in
the same figure, we can use the general graphics settings
(par()
) that we’ve used before.
The function visreg
has several additional arguments.
One of these arguments is partial
, which allows us to
choose whether we want to display the partial residuals or not.
Showing the partial residuals can help to interpret the non-linearity of the fit in the context of the remaining uncertainty.
Using the argument trans
we could transform the values
on the y-axis. This means, we can also show the expected values of
bili
instead of log(bili)
.
To explore whether we need the spline for a continuous covariate we
need to explore the plots with the y-axis using the scale the model was
fitted with. In our case, that is log(bili)
. If the effect
plot shows a relatively straight line (or rather a confidence band
through which we could draw a straight line) then this indicates that
there is no evidence for a non-linear association.
Since a linear effect on the transformed response corresponds to a curve when we plot the expected response on the original scale of the data, then it is not easily possible to identify whether the spline is needed or not.
The effect plot for lm3
shows that the effects of
SBP
and age
are relatively linear, and the
effects for albu
and creat
show some more
curvature. Because creat
is very skewed there is very
little information for large values of creat
and we cannot
see much of the details for the part of creat
that has the
most observations.
With the argument xvar
we can specify a single variable
for which the effect should be visualized.
We can create a single plot for creat
in which we adjust
the range of the x-axis:
In the explanation for the argument type
in the
Arguments section of the help
?visreg::visreg
we can read that when
type = "conditional"
the plot returned shows the value of the variable on the x-axis and the change in response on the y-axis, holding all other variables constant (by default, median for numeric variables and most common category for factors).
In the plots we obtained from visreg, the chosen “reference” values were, thus, the median and largest category.
To get more information about the visreg package, see
?visreg::`visreg-faq`
effects::predictorEffects
The function predictorEffects
from the package
effects calculates the effects, but does not
immediately plot them.
## [1] "gender" "smoke" "age" "SBP" "albu" "creat"
##
## gender effect
## gender
## male female
## -0.3094679 -0.4475585
##
## Lower 95 Percent Confidence Limits
## gender
## male female
## -0.3366397 -0.4812631
##
## Upper 95 Percent Confidence Limits
## gender
## male female
## -0.2822960 -0.4138538
We can obtain plots by using plot()
on the resulting
object:
When we compare the plots with the plots we obtained from the visreg package, they look very similar, however, the fitted values are different, because other “reference” values are used.
effects uses “average” values for the covariates not shown in a particular plot. This is the mean of the observed data for continuous variables and a weighted average of the categories (as numeric values), weighted by the frequency each category is observed in the data.
For more information on the options in the effects package, see the vignette.
ggeffects::ggpredict
As the name already indicates, ggeffects works with ggplot2, i.e., uses a different graphics framework than the other two functions.
The function ggpredict()
calculates the fitted values
and corresponding 95% confidence intervals
## $gender
## # Predicted values of bili
##
## gender | Predicted | 95% CI
## -----------------------------------
## male | -0.28 | [-0.31, -0.25]
## female | -0.42 | [-0.45, -0.38]
##
## Adjusted for:
## * smoke = never
## * age = 48.79
## * SBP = 122.71
## * albu = 4.26
## * creat = 0.90
##
## $smoke
## # Predicted values of bili
##
## smoke | Predicted | 95% CI
## ------------------------------------
## never | -0.28 | [-0.31, -0.25]
## former | -0.31 | [-0.35, -0.28]
## current | -0.39 | [-0.42, -0.35]
##
## Adjusted for:
## * gender = male
## * age = 48.79
## * SBP = 122.71
## * albu = 4.26
## * creat = 0.90
##
## $age
## # Predicted values of bili
##
## age | Predicted | 95% CI
## --------------------------------
## 20 | -0.33 | [-0.38, -0.27]
## 27 | -0.31 | [-0.34, -0.27]
## 35 | -0.29 | [-0.33, -0.26]
## 43 | -0.28 | [-0.32, -0.25]
## 50 | -0.28 | [-0.31, -0.25]
## 57 | -0.27 | [-0.31, -0.24]
## 65 | -0.25 | [-0.28, -0.21]
## 80 | -0.17 | [-0.23, -0.12]
##
## Adjusted for:
## * gender = male
## * smoke = never
## * SBP = 122.71
## * albu = 4.26
## * creat = 0.90
##
## Not all rows are shown in the ouput. Use `print(..., n = Inf)` to show all rows.
##
## $SBP
## # Predicted values of bili
##
## SBP | Predicted | 95% CI
## -----------------------------------
## 81.33 | -0.20 | [-0.31, -0.10]
## 96.67 | -0.23 | [-0.28, -0.18]
## 110.00 | -0.25 | [-0.29, -0.22]
## 122.67 | -0.28 | [-0.31, -0.25]
## 134.00 | -0.30 | [-0.34, -0.27]
## 147.33 | -0.31 | [-0.35, -0.27]
## 160.00 | -0.30 | [-0.35, -0.25]
## 234.67 | -0.20 | [-0.40, 0.00]
##
## Adjusted for:
## * gender = male
## * smoke = never
## * age = 48.79
## * albu = 4.26
## * creat = 0.90
##
## Not all rows are shown in the ouput. Use `print(..., n = Inf)` to show all rows.
##
## $albu
## # Predicted values of bili
##
## albu | Predicted | 95% CI
## ---------------------------------
## 2.60 | -0.45 | [-0.63, -0.26]
## 3.10 | -0.43 | [-0.54, -0.32]
## 3.40 | -0.42 | [-0.49, -0.35]
## 3.70 | -0.41 | [-0.45, -0.37]
## 4.10 | -0.34 | [-0.37, -0.30]
## 4.40 | -0.23 | [-0.26, -0.20]
## 4.70 | -0.14 | [-0.18, -0.11]
## 5.40 | -0.01 | [-0.14, 0.12]
##
## Adjusted for:
## * gender = male
## * smoke = never
## * age = 48.79
## * SBP = 122.71
## * creat = 0.90
##
## Not all rows are shown in the ouput. Use `print(..., n = Inf)` to show all rows.
##
## $creat
## # Predicted values of bili
##
## creat | Predicted | 95% CI
## ----------------------------------
## 0.34 | -0.48 | [-0.60, -0.36]
## 0.59 | -0.34 | [-0.39, -0.30]
## 0.79 | -0.29 | [-0.32, -0.25]
## 0.98 | -0.28 | [-0.31, -0.25]
## 1.17 | -0.28 | [-0.32, -0.25]
## 1.37 | -0.29 | [-0.33, -0.25]
## 1.63 | -0.29 | [-0.33, -0.25]
## 9.51 | -0.43 | [-0.76, -0.11]
##
## Adjusted for:
## * gender = male
## * smoke = never
## * age = 48.79
## * SBP = 122.71
## * albu = 4.26
##
## Not all rows are shown in the ouput. Use `print(..., n = Inf)` to show all rows.
##
## attr(,"class")
## [1] "ggalleffects" "list"
## attr(,"model.name")
## [1] "lm3"
By default, ggpredict
will transform the results back to
the original scale of the response. We change this behaviour here using
the argument back.transform
.
The argument typical
controls how the “reference” values
for the covariates not of interest are set. The default is that the mean
is used for continuous covariates.
To create a plot from the calculated fitted values we can use the
function plot()
. This results in a list
of
ggplot
objects and each plot is shown separately. You can
click through them by using the arrow buttons in the Plot tab
in RStudio.
To show all plots in one figure, we can, for instance, use the
function ggarrange()
from the ggpubr
package:
Even though we set back.transform = "FALSE"
, which means
that we get the fitted values for log(bili)
, the y-axis is
(wrongly) labelled bili
.
The ggeffects package has another function,
ggeffect()
, which does something similar to what
ggpredict()
does. In the help file we can read that
ggpredict()
usespredict()
for generating predictions, whileggeffect()
computes marginal effects by internally callingeffects::Effect()
…
More information on the ggeffects package can be found on the package website https://strengejacke.github.io/ggeffects/.
The four functions from the three packages thus do not show us the same thing.
This is because they use different reference values:package | categorical | continuous |
---|---|---|
visreg::visreg
|
largest | median |
effects::predictorEffects
|
weighted average | mean |
ggeffects::ggpredict
|
reference category | mean |
ggeffects::ggeffect
|
weighted average | mean |
For variables that do not have an interaction with other variables this results only in a change of the absolute value (they are shifted along the y-axis).
We can see this, for instance, in the results for the covariate
gender
:
## gender smoke age SBP albu creat bili visregFit visregLwr visregUpr
## 1 male never 48 120 4.3 0.84 0.7 -0.2594933 -0.2933712 -0.2256154
## 2 female never 48 120 4.3 0.84 0.7 -0.3975839 -0.4316886 -0.3634792
##
## gender predictor effect
##
## gender effect
## gender
## male female
## -0.3094679 -0.4475585
## # Predicted values of bili
##
## gender | Predicted | 95% CI
## -----------------------------------
## male | -0.28 | [-0.31, -0.25]
## female | -0.42 | [-0.45, -0.38]
##
## Adjusted for:
## * smoke = never
## * age = 48.79
## * SBP = 122.71
## * albu = 4.26
## * creat = 0.90
## # Predicted values of bili
##
## gender | Predicted | 95% CI
## -----------------------------------
## male | -0.31 | [-0.34, -0.28]
## female | -0.45 | [-0.48, -0.41]
The difference in expected value between different values of the covariate remains the same with all four functions:
In the effect plots from the previous section we saw that
age
and SBP
seemed to have quite linear
effects.
To simplify the model a bit, we model these two variables using linear effects but will now introduce interactions into the model.
log(bili)
, with
natural cubic splines for creat
and albu
(both
with 3 degrees of freedom), linear effects for age
and
SBP
, gender
, smoke
, and the
interactions between gender
and age
.age
and
gender
.lm4 <- lm(log(bili) ~ SBP + smoke +
gender * age +
ns(creat, df = 3, B = quantile(creat, c(0.025, 0.975), na.rm = TRUE)) +
ns(albu, df = 3, B = quantile(albu, c(0.01, 0.99), na.rm = TRUE)),
data = nhanes, contrasts = list(smoke = "contr.treatment"))
##
## Call:
## lm(formula = log(bili) ~ SBP + smoke + gender * age + ns(creat,
## df = 3, B = quantile(creat, c(0.025, 0.975), na.rm = TRUE)) +
## ns(albu, df = 3, B = quantile(albu, c(0.01, 0.99), na.rm = TRUE)),
## data = nhanes, contrasts = list(smoke = "contr.treatment"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.79300 -0.20056 0.00025 0.19427 1.92024
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.4635453 0.0699667 -6.625 4.30e-11 ***
## SBP -0.0010210 0.0004285 -2.383 0.01726 *
## smokeformer -0.0295654 0.0177632 -1.664 0.09616 .
## smokecurrent -0.1081718 0.0178814 -6.049 1.69e-09 ***
## genderfemale -0.1459250 0.0445310 -3.277 0.00106 **
## age 0.0018190 0.0006141 2.962 0.00309 **
## ns(creat, df = 3, B = quantile(creat, c(0.025, 0.975), na.rm = TRUE))1 0.0654868 0.0264376 2.477 0.01332 *
## ns(creat, df = 3, B = quantile(creat, c(0.025, 0.975), na.rm = TRUE))2 0.1770324 0.0544186 3.253 0.00116 **
## ns(creat, df = 3, B = quantile(creat, c(0.025, 0.975), na.rm = TRUE))3 0.0488002 0.0247937 1.968 0.04916 *
## ns(albu, df = 3, B = quantile(albu, c(0.01, 0.99), na.rm = TRUE))1 0.2410533 0.0294314 8.190 4.26e-16 ***
## ns(albu, df = 3, B = quantile(albu, c(0.01, 0.99), na.rm = TRUE))2 0.3129724 0.0715270 4.376 1.26e-05 ***
## ns(albu, df = 3, B = quantile(albu, c(0.01, 0.99), na.rm = TRUE))3 0.3323907 0.0350030 9.496 < 2e-16 ***
## genderfemale:age 0.0002956 0.0008007 0.369 0.71207
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3247 on 2309 degrees of freedom
## (443 observations deleted due to missingness)
## Multiple R-squared: 0.1552, Adjusted R-squared: 0.1508
## F-statistic: 35.34 on 12 and 2309 DF, p-value: < 2.2e-16
The relevant subset of the results table is
## Estimate Std. Error t value Pr(>|t|)
## genderfemale -0.145925042 0.0445309694 -3.2769339 0.001064956
## age 0.001818959 0.0006141296 2.9618494 0.003089180
## genderfemale:age 0.000295560 0.0008007182 0.3691187 0.712073122
For males, a one unit increase in age
is associated with
a 0.002 higher log(bili)
. There was no evidence that the
effect of age
on log(bili)
differs for males
and females (and there is no evidence that the effect of
gender
changes with age
). At zero years of
age, females have a -0.146 “higher” log(bili)
than
males.
Visualize the effects of gender
and age
using an effect plot.
data.frame
objects does not
work here.
Because there is an interaction between gender
and
age
, we need to create data that has all the combinations
of the values for the two variables.
ndf <- with(nhanes,
expand.grid(
age = seq(min(age), max(age), length = 50),
gender = levels(gender),
smoke = "never",
SBP = median(SBP, na.rm = TRUE),
creat = median(creat, na.rm = TRUE),
albu = median(albu, na.rm = TRUE)
)
)
To avoid having to write nhanes$
for each covariate we
can use the function with().
It evaluates an expression
(i.e., the expand.grid(...)
in our case) in an environment
constructed from data, i.e., nhanes
here.
nhanes$...
that is perfectly fine as well.
We can then calculate the fitted values and add them as covariates to the dataset;
In the following, I show multiple different solutions how to obtain the effect plots.
plot()
or matplot()
Because the fitted values for males and females are in the same
column of ndf
when we plot them as lines (either in
matplot()
or with plot()
and
lines()
) we will get a zigzag pattern:
To avoid this, we could either plot it with dots instead of lines
(but with matplot()
it is not possible to use different
colour for males and females). With plot()
this is
possible, but we need to plot the fitted values, lower and upper bound
of the confidence interval separately:
par(mfrow = c(1, 2))
matplot(x = ndf$age, y = ndf[, c("fit", "lwr", "upr")],
type = "p", pch = c(19, 1, 1), col = "black")
plot(fit ~ age, data = ndf, ylim = range(lwr, upr), col = gender, pch = 19)
points(lwr ~ age, data = ndf, col = gender)
points(upr ~ age, data = ndf, col = gender)
Alternatively, we could convert the data to have the values for males and females in separate columns:
ndf_wide <- reshape(ndf, direction = "wide",
v.names = c("fit", "lwr", "upr"),
idvar = "age", timevar = "gender")
head(ndf_wide)
## age smoke SBP creat albu fit.male lwr.male upr.male fit.female lwr.female upr.female
## 1 18.00000 never 120 0.84 4.3 -0.3029198 -0.3481019 -0.2577378 -0.4435248 -0.4924835 -0.3945661
## 2 19.26531 never 120 0.84 4.3 -0.3006183 -0.3446811 -0.2565555 -0.4408493 -0.4885696 -0.3931290
## 3 20.53061 never 120 0.84 4.3 -0.2983168 -0.3412853 -0.2553483 -0.4381738 -0.4846732 -0.3916744
## 4 21.79592 never 120 0.84 4.3 -0.2960152 -0.3379163 -0.2541142 -0.4354983 -0.4807956 -0.3902009
## 5 23.06122 never 120 0.84 4.3 -0.2937137 -0.3345762 -0.2528512 -0.4328227 -0.4769385 -0.3887070
## 6 24.32653 never 120 0.84 4.3 -0.2914121 -0.3312674 -0.2515569 -0.4301472 -0.4731035 -0.3871909
So that we could then use matplot()
:
matplot(x = ndf_wide$age,
y = ndf_wide[, grep("male|female", names(ndf_wide))],
xlab = "age", ylab = "expected value of log(bili)",
type = "l", lty = c(1, 2, 2), col = rep(c("blue", "red"), each = 3))
With the data in wide format, we could also use plot()
and create each line separately with the help of
lines()
.
When we use ggplot2 we can plot the data in the original (long) format:
The ggeffects package can help us to create such a plot without the need to create the data ourselves:
## Model has log-transformed response. Predictions are on log-scale.
In the interpretation of the model, we had to refer to the effect of
gender
“at zero years of age”, which has no meaningful
interpretation since we only have adults in the data.
Because there seems to be no interaction between gender
and age
(the effect plots showed lines that are virtually
parallel, i.e., the estimated interaction is so small that it is not
clinically relevant) the “at zero years of age” isn’t that relevant
here.
But in other examples, it may be relevant to be able to say what the expected difference in the response is for certain covariate values.
Calculate the difference in the expected response between females and
males for the minimum, median and maximum age
observed in
the data.
There is two ways we can calculate this difference.
We could predict()
the fitted values at the desired ages
and take their difference.
To do this, we create a new data.frame
for prediction in
which we set age to the minimum, median and maximum of the
age
in the nhanes
data and then calculate
fitted values with predict()
:
ndf <- with(nhanes,
expand.grid(
age = c(min(age), median(age), max(age)),
gender = levels(gender),
smoke = "never",
SBP = median(SBP, na.rm = TRUE),
creat = median(creat, na.rm = TRUE),
albu = median(albu, na.rm = TRUE)
)
)
ndf$fit <- predict(lm4, newdata = ndf)
## age gender smoke SBP creat albu fit
## 1 18 male never 120 0.84 4.3 -0.3029198
## 2 47 male never 120 0.84 4.3 -0.2501700
## 3 80 male never 120 0.84 4.3 -0.1901444
## 4 18 female never 120 0.84 4.3 -0.4435248
## 5 47 female never 120 0.84 4.3 -0.3822037
## 6 80 female never 120 0.84 4.3 -0.3124246
To be able to calculate the differences it is convenient to convert
ndf
so that the fitted values for males and females are
given in separate columns. We can do this using the function
reshape
:
ndf_wide <- reshape(ndf, direction = "wide", v.names = "fit",
idvar = "age", timevar = "gender")
ndf_wide
## age smoke SBP creat albu fit.male fit.female
## 1 18 never 120 0.84 4.3 -0.3029198 -0.4435248
## 2 47 never 120 0.84 4.3 -0.2501700 -0.3822037
## 3 80 never 120 0.84 4.3 -0.1901444 -0.3124246
We can now take the difference between the two columns:
## [1] -0.1406050 -0.1320337 -0.1222802
We can also calculate the difference directly from the parameter estimates.
The relevant part of the model summary was:
## Estimate Std. Error t value Pr(>|t|)
## genderfemale -0.145925042 0.0445309694 -3.2769339 0.001064956
## age 0.001818959 0.0006141296 2.9618494 0.003089180
## genderfemale:age 0.000295560 0.0008007182 0.3691187 0.712073122
The difference is the effect of gender
plus the
age
times the interaction term:
Instead of the interaction between gender
and
age
, re-fit the model with interactions between
gender
and the spline for creat
, and between
gender
and the spline for albu
.
Create an effect plot to help you with the interpretation.
The new model:
lm5 <- lm(log(bili) ~ SBP + smoke + age + gender *
(ns(creat, df = 3, B = quantile(creat, c(0.025, 0.975), na.rm = TRUE)) +
ns(albu, df = 3, B = quantile(albu, c(0.01, 0.99), na.rm = TRUE))),
data = nhanes, contrasts = list(smoke = "contr.treatment"))
##
## Call:
## lm(formula = log(bili) ~ SBP + smoke + age + gender * (ns(creat,
## df = 3, B = quantile(creat, c(0.025, 0.975), na.rm = TRUE)) +
## ns(albu, df = 3, B = quantile(albu, c(0.01, 0.99), na.rm = TRUE))),
## data = nhanes, contrasts = list(smoke = "contr.treatment"))
##
## Residuals:
## Min 1Q Median 3Q Max
## -1.81817 -0.20039 -0.00437 0.19154 1.94836
##
## Coefficients:
## Estimate Std. Error t value Pr(>|t|)
## (Intercept) -0.4957812 0.1458443 -3.399 0.000687 ***
## SBP -0.0009418 0.0004289 -2.196 0.028213 *
## smokeformer -0.0295538 0.0176337 -1.676 0.093876 .
## smokecurrent -0.1059635 0.0179034 -5.919 3.73e-09 ***
## age 0.0020465 0.0004773 4.288 1.88e-05 ***
## genderfemale -0.1425123 0.1404511 -1.015 0.310367
## ns(creat, df = 3, B = quantile(creat, c(0.025, 0.975), na.rm = TRUE))1 0.1554122 0.0634964 2.448 0.014456 *
## ns(creat, df = 3, B = quantile(creat, c(0.025, 0.975), na.rm = TRUE))2 0.3371666 0.2162565 1.559 0.119109
## ns(creat, df = 3, B = quantile(creat, c(0.025, 0.975), na.rm = TRUE))3 0.1217016 0.0515367 2.361 0.018286 *
## ns(albu, df = 3, B = quantile(albu, c(0.01, 0.99), na.rm = TRUE))1 0.1971162 0.0508439 3.877 0.000109 ***
## ns(albu, df = 3, B = quantile(albu, c(0.01, 0.99), na.rm = TRUE))2 0.1593285 0.1648436 0.967 0.333874
## ns(albu, df = 3, B = quantile(albu, c(0.01, 0.99), na.rm = TRUE))3 0.3071743 0.0488327 6.290 3.78e-10 ***
## genderfemale:ns(creat, df = 3, B = quantile(creat, c(0.025, 0.975), na.rm = TRUE))1 -0.1822389 0.0747034 -2.439 0.014783 *
## genderfemale:ns(creat, df = 3, B = quantile(creat, c(0.025, 0.975), na.rm = TRUE))2 -0.1813105 0.2240910 -0.809 0.418545
## genderfemale:ns(creat, df = 3, B = quantile(creat, c(0.025, 0.975), na.rm = TRUE))3 -0.1410623 0.0631180 -2.235 0.025520 *
## genderfemale:ns(albu, df = 3, B = quantile(albu, c(0.01, 0.99), na.rm = TRUE))1 0.0560519 0.0649681 0.863 0.388359
## genderfemale:ns(albu, df = 3, B = quantile(albu, c(0.01, 0.99), na.rm = TRUE))2 0.1876493 0.1850707 1.014 0.310721
## genderfemale:ns(albu, df = 3, B = quantile(albu, c(0.01, 0.99), na.rm = TRUE))3 0.0350688 0.0776279 0.452 0.651487
## ---
## Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
##
## Residual standard error: 0.3242 on 2304 degrees of freedom
## (443 observations deleted due to missingness)
## Multiple R-squared: 0.1596, Adjusted R-squared: 0.1534
## F-statistic: 25.74 on 17 and 2304 DF, p-value: < 2.2e-16
We can use, for example, ggpredict()
to create the
effect plot:
## Model has log-transformed response. Predictions are on log-scale.
We notice that for males the confidence band is extremely wide for
large values of creat
, and for males also for very small
values.
The crossing lines indicate that there is a relevant difference in
the effect of creat
for males and females. Or the other way
around: that there is a difference in the effect of gender
across values of creat
.
## Model has log-transformed response. Predictions are on log-scale.
For albu
, the confidence intervals are also getting
wider at the edges of the plot, but it is not as extreme as for
creat
. We see that the curves for males and females run
relatively parallel, indicating that there is no (strong)
interaction.
Explore why we get such a wide confidence band for small values of
creat
for males but not for females (and generally for
large values of creat
).
Find out for which range of creat
we have enough
information to get a meaningful result/effect plot.
creat
for males and females
separately.
Apparently, there is a difference in the data between males and
females. So far, we only explored the distribution of creat
overall to find out if and where to set the boundary knots, but not per
category of gender
.
par(mfrow = c(1, 2), mar = c(3, 3, 2, 1), mgp = c(2, 0.6, 0))
hist(lm2$model$creat[lm2$model$gender == "male"], breaks = 200,
main = "males", xlab = "creat")
hist(lm2$model$creat[lm2$model$gender == "female"], breaks = 200,
main = "females", xlab = "creat")
There are not many observations larger than 2 (but we already knew
that) and because the x-axis also includes large values, we can’t see
well what is going on for the smaller values of creat
.
To see this better, we can restrict the range of the x-axis by
setting xlim = c(0, 2)
:
par(mfrow = c(1, 2), mar = c(3, 3, 2, 1), mgp = c(2, 0.6, 0))
hist(lm2$model$creat[lm2$model$gender == "male"], breaks = 200,
main = "males", xlab = "creat", xlim = c(0, 2))
hist(lm2$model$creat[lm2$model$gender == "female"], breaks = 200,
main = "females", xlab = "creat", xlim = c(0, 2))
In this plot we see that males do not have any values of
creat < 0.5
, but females have.
The wide confidence intervals in the effect plots are due to
insufficient information for the smallest and highest values of
creat
. We could restrict the effect plots to areas where we
have at least a certain number of observations (within a relatively
small interval), for example, 5. (I’d say this is the absolute minimum
required, in other applications you might also need more observations to
get a reasonable precise fitted value.)
Using the histogram, we can guess for which range of
creat
we have at least 5 observations per interval. This is
easier when we add horizontal lines at the value 5:
par(mfrow = c(1, 2), mar = c(3, 3, 2, 1), mgp = c(2, 0.6, 0))
hist(lm2$model$creat[lm2$model$gender == "male"], breaks = 200,
main = "males", xlab = "creat", xlim = c(0, 2))
abline(h = 5, lty = 2)
hist(lm2$model$creat[lm2$model$gender == "female"], breaks = 200,
main = "females", xlab = "creat", xlim = c(0, 2))
abline(h = 5, lty = 2)
For males, we could use the range from 0.6 to 1.5, and for females the range from 0.4 until 1.3.
We can also extract the information which of the intervals contain at least 5 subjects from the histogram.
For the purpose of visualization we can add a horizontal line at 5,
and to be able to use the information from the histograms we assign them
names (hm
and hf
), so that we can then work
with these objects.
par(mfrow = c(1, 2), mar = c(3, 3, 2, 1), mgp = c(2, 0.6, 0))
hm <- hist(lm2$model$creat[lm2$model$gender == "male"], breaks = 200,
main = "males", xlab = "creat", xlim = c(0, 2))
abline(h = 5, lty = 2)
hf <- hist(lm2$model$creat[lm2$model$gender == "female"], breaks = 200,
main = "females", xlab = "creat", xlim = c(0, 2))
abline(h = 5, lty = 2)
The histograms return a list
with (among others)
elements mids
(the mid points of the intervals) and
counts
(the number of observations per interval). We can
use this to find suitable ranges for creat
for males and
females.
Here, we take the range of mid points for intervals that contain at least 10 observations. This is, of course, an arbitrary choice, and it depends on the number of intervals in our data whether this choice makes sense or not.
## [1] 0.625 1.375
## [1] 0.475 1.125
Create an effect plot for creat
and gender
that uses a more appropriate (and gender specific) range of values of
creat
for males and females.
Create this plot yourself, i.e., not using visreg, ggeffects or the effects package.
Using the bounds determined in the previous step, we can create data for calculation of the fitted values. We could either create separate datasets for males and females and then combine them, or create a combined dataset and then exclude unwanted rows:
ndf <- with(nhanes,
expand.grid(
age = median(age, na.rm = TRUE),
gender = levels(gender),
smoke = "never",
SBP = median(SBP, na.rm = TRUE),
creat = round(seq(0.45, 1.4, by = 0.05), 2),
albu = median(albu, na.rm = TRUE)
)
)
ndf <- subset(ndf, (creat >= 0.6 | gender == "female") &
(creat <= 1.15 | gender == "male"))
Here we created ndf
with the larger range of values (the
lowest and highest value to be used for either males or females) and now
have to exclude certain rows to prevent their data from being
plotted.
To exclude the values at the lower end of the range for males, we can
use the condition that creat
should be at least 0.6 OR that
gender
should be “female”, i.e., we exclude rows with
values lower than 0.6 for males.
creat
for females, where we use the restriction that either
creat
should be smaller or equal to 1.15 OR
gender
should be “male”.
Then we combine the data with the fitted values:
And can now plot them:
ggplot(ndf, aes(x = creat, y = fit, color = gender, fill = gender)) +
geom_line() +
geom_ribbon(aes(ymin = lwr, ymax = upr),
alpha = 0.2, color = NA) +
scale_color_manual(values = c("blue", "red"),
aesthetics = c("color", "fill"))
Restricting what is shown in the plot to the areas with sufficient data can help to prevent confusion about the interpretation.
We could, of course, also create the plot using plot()
or matplot()
as we did before, but the
ggplot2 solution is easier here.
Describe the effect plot.
Give the estimated values of the fitted values and confidence
intervals for relevant values of creat
in this
description.
bili
rather than the
expected values of log(bili)
.
To get an interpretation of the plot that is more meaningful from a
clinical point of view, we should use the plot for bili
rather than log(bili)
.
The plot for log(bili)
should be used to investigate
whether there is a need for splines, but in our example here, there is
very little difference between the two plots. This is because the fitted
values are relatively small and not widely spread.
The y-axis on the plot ranges from approximately -0.6 to -0.2. The
exp()
transformation for these values shows an almost
linear curve:
plot(seq(-0.6, -0.2, 0.01), exp(seq(-0.6, -0.2, 0.01)))
lines(x = c(-0.6, -0.2), y = exp(c(-0.6, -0.2)), lty = 2)
This would be different if our fitted values were 10 times as large:
In that case we would see much more differences between the plots for the response on the original scale and the log-transformed response.ggpubr::ggarrange(
ggplot(ndf, aes(x = creat, y = fit, color = gender, fill = gender)) +
geom_line() +
geom_ribbon(aes(ymin = lwr, ymax = upr), alpha = 0.2, color = NA) +
scale_color_manual(values = c("blue", "red"),
aesthetics = c("color", "fill")) +
ylab("expected value of log(bili)"),
ggplot(ndf, aes(x = creat, y = exp(fit), color = gender, fill = gender)) +
geom_line() +
geom_ribbon(aes(ymin = exp(lwr), ymax = exp(upr)), alpha = 0.2, color = NA) +
scale_color_manual(values = c("blue", "red"),
aesthetics = c("color", "fill")) +
ylab("expected value of bili"),
common.legend = TRUE
)
ggarrange()
from the package
ggpubr allows us to plot multiple
ggplot2 objects in a grid, i.e., what
par(mfrow(1, 2))
would do for base R graphics.
Some values we could use in the description are:
ndf$fit_exp <- exp(ndf$fit)
ndf$lwr_exp <- exp(ndf$lwr)
ndf$upr_exp <- exp(ndf$upr)
subset(ndf, gender == "female" & creat %in% c(0.55, 0.75, 1.15))
## age gender smoke SBP creat albu fit lwr upr fit_exp lwr_exp upr_exp
## 6 47 female never 120 0.55 4.3 -0.4559922 -0.4976694 -0.4143151 0.6338188 0.6079459 0.6607927
## 14 47 female never 120 0.75 4.3 -0.3758135 -0.4086156 -0.3430115 0.6867304 0.6645697 0.7096300
## 30 47 female never 120 1.15 4.3 -0.4397020 -0.5005517 -0.3788523 0.6442284 0.6061961 0.6846467
## age gender smoke SBP creat albu fit lwr upr fit_exp lwr_exp upr_exp
## 7 47 male never 120 0.6 4.3 -0.3752777 -0.4976481 -0.2529073 0.6870984 0.6079588 0.7765399
## 19 47 male never 120 0.9 4.3 -0.2540724 -0.2850398 -0.2231051 0.7756356 0.7519843 0.8000308
## 39 47 male never 120 1.4 4.3 -0.2379764 -0.2779690 -0.1979839 0.7882213 0.7573203 0.8203831
A possible description of the results for creat
and
gender
would be:
The plot shows that males and females have differently shaped
non-linear effects of creat
on log(bili)
(and
bili
).
For females, we see that bili
increases for increasing
values of creat
, from 0.634 (95% CI [0.608, 0.661]) for a
creat
of 0.55 until approximately 0.687 (95% CI [0.665,
0.710]) for a creat
of 0.75, after which higher values of
creat
result in a decrease in the expected
bili
to 0.644 (95% CI [0.606, 0.685]) for a
creat
of 1.15.
For males, the expected bili
is increasing from 0.687
(95% CI [0.608, 0.675]) for a value of creat
of 0.6, to
0.776 (95% CI [0.752, 0.694]) for a value of creat
of
approximately 0.9, however, for small values of creat
there
is much uncertainty. The expected bili
remains relatively
stable for large values of creat
(0.788, 95% CI [0.757,]
for a creat
of 1.4).