In this practical, a number of R packages are used. The packages used (with the versions that were used to generate the solutions) are:
survival (version: 3.2.7)
memisc (version: 0.99.27.3)
ggplot2 (version: 3.3.3)
For this practical, we will use the heart and retinopathy data sets from the survival package. More details about the data sets can be found in:
https://stat.ethz.ch/R-manual/R-devel/library/survival/html/heart.html
https://stat.ethz.ch/R-manual/R-devel/library/survival/html/retinopathy.html
Before starting with any statistical analysis, it is important to transform and explore your data set.
Using the heart data set:
The variable age is equal to age - 48. Let’s bring age back to the original scale. Do not overwrite the variable age, but create a new variable with the name age_orig.
Convert surgery into a factor with levels 0: no and 1: yes.
Use the function factor(…) to convert a numerical variable to a factor.
library(survival)
heart$age_orig <- heart$age + 48
heart$surgery <- factor(heart$surgery, levels = c(0, 1), labels = c("no", "yes"))
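A quick sanity check of both transformations, as a sketch: it works on an untouched copy of the data obtained with survival::heart (so it does not depend on whether the code above has already been run), and the column name surgery_f is only an illustrative name, not part of the task.

```r
# Untouched copy of the data; survival ships with standard R installations
h <- survival::heart

h$age_orig <- h$age + 48
h$surgery_f <- factor(h$surgery, levels = c(0, 1), labels = c("no", "yes"))  # illustrative name

# The difference should be 48 years for every row (prints TRUE)
all.equal(h$age_orig - h$age, rep(48, nrow(h)))

# Original 0/1 codes against the new labels; unmatched codes would show up as NA
table(original = h$surgery, factor = h$surgery_f, useNA = "ifany")
```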
Categorize the variable age from the retinopathy data set as young: [minimum age, mean age) and old: [mean age, maximum age]. Give this variable the name ageCat. Print the first 6 rows of the data set retinopathy.
To dichotomize a numerical variable, apply the function as.numeric(…) to a logical comparison (TRUE becomes 1, FALSE becomes 0). Use the function factor(…) to convert a variable into a factor.
retinopathy$ageCat <- as.numeric(retinopathy$age >= mean(retinopathy$age))
retinopathy$ageCat <- factor(retinopathy$ageCat, levels = c(0, 1), labels = c("young", "old"))
head(retinopathy)
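The same two groups can also be created in one step with cut(). In the sketch below (the names ageCat_task and ageCat_cut are illustrative), right = FALSE makes the intervals closed on the left, [a, b), and include.lowest = TRUE, which for right = FALSE closes the highest interval on the right, keeps the maximum age in the old group.

```r
r <- survival::retinopathy
m <- mean(r$age)

# Approach from the task: logical comparison -> 0/1 -> factor
ageCat_task <- factor(as.numeric(r$age >= m), levels = c(0, 1),
                      labels = c("young", "old"))

# cut(): intervals [min, mean) and [mean, max]
ageCat_cut <- cut(r$age, breaks = c(min(r$age), m, max(r$age)),
                  labels = c("young", "old"), right = FALSE, include.lowest = TRUE)

# Both approaches should agree for every row (only the diagonal is filled)
table(task = ageCat_task, cut = ageCat_cut)
```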
Categorize futime from the data set retinopathy as follows:
short: [minimum futime, 25)
medium: [25, 45)
long: [45, maximum futime]
Give this variable the name futimeCut. Print the first 6 rows of the data.
Create a variable that is identical to the futime variable (use the name futimeCut). Then use indexing to select the correct subset and set it to the new category.
E.g. you can create the short category as:
retinopathy$futimeCut <- retinopathy$futime
retinopathy$futimeCut[retinopathy$futime < 25] <- "short"
Now continue with the other categories.
retinopathy$futimeCut <- retinopathy$futime
retinopathy$futimeCut[retinopathy$futime < 25] <- "short"
retinopathy$futimeCut[retinopathy$futime >= 25 & retinopathy$futime < 45] <- "medium"
retinopathy$futimeCut[retinopathy$futime >= 45] <- "long"
head(retinopathy)
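Two details of the indexing approach are worth checking. First, the result is a character vector, so converting it to a factor with explicitly ordered levels gives short, medium, long instead of the alphabetical order. Second, the sketch below starts from NA instead of a copy of futime (a small deviation from the task), so any row not covered by one of the conditions would remain NA rather than silently keep its numeric value.

```r
r <- survival::retinopathy

# Indexing approach, starting from NA so uncovered rows are easy to spot
futimeCut <- rep(NA_character_, nrow(r))
futimeCut[r$futime < 25] <- "short"
futimeCut[r$futime >= 25 & r$futime < 45] <- "medium"
futimeCut[r$futime >= 45] <- "long"
class(futimeCut)  # "character"

# Explicit levels keep the natural order instead of the alphabetical default
futimeCut <- factor(futimeCut, levels = c("short", "medium", "long"))
table(futimeCut, useNA = "ifany")

# Range of futime within each category, to check the cut-offs
tapply(r$futime, futimeCut, range)
```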
Create 2 vectors of size 50 as follows:
Sex: takes 2 values, 0 and 1.
Age: takes values from 20 to 80.
Convert the Sex variable into a factor with levels 0: female and 1: male.
Create AgeCat as a dichotomous variable, with Age <= 50 coded as 0 and 1 otherwise.
Convert the AgeCat variable into a factor with levels 0: young and 1: old.
Standardize the Age variable by \(\frac{Age - \text{mean}(Age)}{\text{sd}(Age)}\).
To sample a numerical or categorical variable, use the function sample(…). To convert a numerical variable to a categorical one, use the function factor(…). To dichotomize a numerical variable, apply the function as.numeric(…) to a logical comparison.
Sex <- sample(0:1, 50, replace = TRUE)
Age <- sample(20:80, 50, replace = TRUE)
Sex <- factor(Sex, levels = c(0:1), labels = c("female", "male"))
AgeCat <- as.numeric(Age > 50)
AgeCat <- factor(AgeCat, levels = c(0:1), labels = c("young", "old"))
Age <- (Age - mean(Age))/sd(Age)
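Because sample() draws random numbers, the vectors, and therefore counts such as the 20 females and 30 males reported further below, change on every run. A sketch that fixes the seed first (the value 123 is arbitrary) and compares the manual standardization with base R's scale(); the names Age_std and Age_scale are illustrative. AgeCat must be derived before Age is replaced by its standardized version, as in the solution above.

```r
set.seed(123)  # arbitrary seed, only for reproducibility

Sex <- sample(0:1, 50, replace = TRUE)
Age <- sample(20:80, 50, replace = TRUE)
Sex <- factor(Sex, levels = 0:1, labels = c("female", "male"))

# Dichotomize while Age is still on the original scale
AgeCat <- factor(as.numeric(Age > 50), levels = 0:1, labels = c("young", "old"))

# Manual standardization and the equivalent built-in scale()
Age_std <- (Age - mean(Age)) / sd(Age)
Age_scale <- as.numeric(scale(Age))
all.equal(Age_std, Age_scale)

# Mean is 0 up to floating-point error, standard deviation is 1
round(c(mean = mean(Age_std), sd = sd(Age_std)), 10)
```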
Create a data frame with the name DF as follows:
Combine the variables Sex, Age and AgeCat from the previous Task.
Name the columns Gender, StandardizedAge and DichotomousAge.
DF <- data.frame(Sex, Age, AgeCat)
DF <- data.frame("Gender" = Sex, "StandardizedAge" = Age, "DichotomousAge" = AgeCat)
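Instead of calling data.frame() twice, the columns can also be renamed in place with names(). The sketch below regenerates the vectors from the previous Task with an arbitrary seed so that it runs on its own.

```r
set.seed(123)  # arbitrary seed so the example is reproducible
Sex <- factor(sample(0:1, 50, replace = TRUE), levels = 0:1, labels = c("female", "male"))
Age <- sample(20:80, 50, replace = TRUE)
AgeCat <- factor(as.numeric(Age > 50), levels = 0:1, labels = c("young", "old"))
Age <- (Age - mean(Age)) / sd(Age)

# Build the data frame first, then rename its columns in place
DF <- data.frame(Sex, Age, AgeCat)
names(DF) <- c("Gender", "StandardizedAge", "DichotomousAge")

# Column types: two factors and one numeric column
str(DF)
```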
Create 2 vectors of size 150 as follows:
Treatment: takes 2 values, 1 and 2.
Weight: takes values from 50 to 100.
Convert the Treatment variable into a factor with levels 1: no and 2: yes.
Transform the Weight variable by Weight * 1000.
Create a data frame with Treatment and Weight.
To sample a numerical or categorical variable, use the function sample(…). To convert a numerical variable to a categorical one, use the function factor(…).
Treatment <- sample(1:2, 150, replace = TRUE)
Weight <- sample(50:100, 150, replace = TRUE)
Treatment <- factor(Treatment, levels = c(1:2), labels = c("no", "yes"))
Weight <- Weight * 1000
data.frame(Treatment, Weight)
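The solution prints the data frame but does not store it. A sketch that keeps it under the illustrative name DF2 and summarizes Weight per Treatment group with base R's aggregate() and tapply(); the seed is arbitrary.

```r
set.seed(1)  # arbitrary seed for reproducibility
Treatment <- factor(sample(1:2, 150, replace = TRUE), levels = 1:2, labels = c("no", "yes"))
Weight <- sample(50:100, 150, replace = TRUE) * 1000

DF2 <- data.frame(Treatment, Weight)  # illustrative name
dim(DF2)  # 150 rows, 2 columns

# Mean and range of Weight within each Treatment group
aggregate(Weight ~ Treatment, data = DF2, FUN = mean)
tapply(DF2$Weight, DF2$Treatment, range)
```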
Create a list called my_list with the following elements:
let: the letters a to i.
sex: a factor taking the values males and females, of length 50.
mat: the matrix \(\begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}\).
To obtain letters, index the built-in character vector letters (e.g. letters[1:3] gives "a", "b", "c"). To sample a numerical or categorical variable, use the function sample(…). To convert a numerical variable to a categorical one, use the function factor(…).
let <- letters[1:9]
sex <- sample(1:2, 50, replace = TRUE)
sex <- factor(sex, levels = 1:2, labels = c("males", "females"))
mat <- matrix(1:4, 2, 2, byrow = TRUE)
my_list <- list(let = let, sex = sex, mat = mat)
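Once the list exists, its elements can be retrieved by name. A short sketch (the values in the comments follow from the definitions above):

```r
let <- letters[1:9]
sex <- factor(sample(1:2, 50, replace = TRUE), levels = 1:2, labels = c("males", "females"))
mat <- matrix(1:4, 2, 2, byrow = TRUE)
my_list <- list(let = let, sex = sex, mat = mat)

# Overview of the three elements and their types
str(my_list)

# Access elements by name with $ or [[ ]]
my_list$let[9]          # "i"
my_list[["mat"]][2, 1]  # 3, because byrow = TRUE fills the first row with 1, 2
length(my_list$sex)     # 50

# Single brackets return a sub-list, not the element itself
class(my_list["mat"])   # "list"
```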
Let’s obtain some descriptive statistics.
Obtain the mean and standard deviation of the variable age using the heart data set.
Use the functions mean(…) and sd(…).
mean(heart$age)
## [1] -2.484027
sd(heart$age)
## [1] 9.419999
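Since age in the heart data is stored as age - 48, the mean on the original scale is the mean above plus 48 (about 45.52 years), while the standard deviation is unaffected by adding a constant. A minimal check, using survival::heart directly:

```r
h <- survival::heart
age_orig <- h$age + 48

# Adding a constant shifts the mean by that constant ...
all.equal(mean(age_orig), mean(h$age) + 48)
# ... and leaves the standard deviation unchanged
all.equal(sd(age_orig), sd(h$age))
```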
Using the retinopathy data set:
Obtain the median and the interquartile range of age.
Obtain the percentages of each type.
Check whether there are missing values in age.
Use the functions median(…) and IQR(…) to obtain the median and the interquartile range. Load the package memisc and use the function percent(…) to obtain the percentages. To check whether there are missing values, use sum(is.na(…)).
median(retinopathy$age)
## [1] 16
IQR(retinopathy$age)
## [1] 20
library(memisc)
percent(retinopathy$type)
## juvenile adult N
## 57.86802 42.13198 394.00000
sum(is.na(retinopathy$age)) # alternatively: any(is.na(retinopathy$age))
## [1] 0
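Both results can be cross-checked with base R alone. In the sketch below, IQR() is compared with the difference of the 75th and 25th percentiles from quantile() (both use the same default quantile definition), and prop.table() gives the percentages per type without memisc; unlike percent(), it does not report N.

```r
r <- survival::retinopathy

# IQR() is the distance between the 75th and 25th percentiles
q <- quantile(r$age, c(0.25, 0.75))
all.equal(IQR(r$age), unname(q[2] - q[1]))

# Percentages per type without memisc: counts divided by the total, times 100
pct <- prop.table(table(r$type)) * 100
round(pct, 2)
sum(pct)
```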
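Using the data frame DF from the previous exercise (Task 5):
Obtain the mean of StandardizedAge.
Obtain the standard deviation of StandardizedAge.
Calculate the frequencies of Gender.
Calculate the frequencies of DichotomousAge.
Calculate the joint frequencies of Gender and DichotomousAge (crosstab table).
Obtain the dimensions of DF.
To calculate the frequencies, use the functions length(…) or table(…). To obtain the dimensions, use the function dim(…).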
mean(DF$StandardizedAge)
## [1] -1.423926e-16
sd(DF$StandardizedAge)
## [1] 1
length(DF$Gender[DF$Gender == "female"])
## [1] 20
length(DF$Gender[DF$Gender == "male"])
## [1] 30
table(DF$Gender)
##
## female male
## 20 30
table(DF$Gender, DF$DichotomousAge)
##
## young old
## female 15 5
## male 13 17
dim(DF)
## [1] 50 3
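The mean of StandardizedAge printed above (-1.423926e-16) is zero up to floating-point rounding. The solution does not show the frequencies of DichotomousAge on their own; the sketch below adds them, together with margins and row percentages for the crosstab. It rebuilds a DF with the same structure from an arbitrary seed, so its counts will not match the output above.

```r
set.seed(123)  # arbitrary seed; counts will differ from the solution above
Gender <- factor(sample(0:1, 50, replace = TRUE), levels = 0:1, labels = c("female", "male"))
Age <- sample(20:80, 50, replace = TRUE)
DichotomousAge <- factor(as.numeric(Age > 50), levels = 0:1, labels = c("young", "old"))
DF <- data.frame(Gender, StandardizedAge = (Age - mean(Age)) / sd(Age), DichotomousAge)

# Frequencies of the dichotomous age variable
table(DF$DichotomousAge)

# Crosstab with row and column totals
tab <- table(DF$Gender, DF$DichotomousAge)
addmargins(tab)

# Row percentages: distribution of age group within each gender
round(prop.table(tab, margin = 1) * 100, 1)
```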
Let’s visualize the data.
Using the heart data set:
Make a scatterplot of age and year.
Use Age as the label for the x-axis and Year of acceptance for the y-axis.
Color the points by transplant status and add a legend.
Use the function plot(…, xlab, ylab, col). Use the function legend(…) to add a legend to the plot.
plot(heart$age, heart$year)
plot(heart$age, heart$year, xlab = "Age", ylab = "Year of acceptance")
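# Index a color vector by the transplant factor (level 0 = no, level 1 = yes) so the plotted colors
# match the legend exactly; with col = heart$transplant, R uses palette color 2, which since R 4.0 is not pure "red"
plot(heart$age, heart$year, xlab = "Age", ylab = "Year of acceptance", col = c("black", "red")[heart$transplant])
legend(-40, 6, c("no", "yes"), col = c("black", "red"), pch = 1)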
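As an alternative to the coordinates (-40, 6), legend() also accepts a keyword position such as "topleft", which keeps working if the axis ranges change. A sketch, assuming transplant has levels 0 (no) and 1 (yes); wrapping it in factor() makes the color indexing use the integer codes 1 and 2 even if the column were stored as numbers.

```r
h <- survival::heart
cols <- c("black", "red")

# Colors indexed by the factor's integer codes: level "0" -> black, level "1" -> red
point_cols <- cols[factor(h$transplant)]
plot(h$age, h$year, xlab = "Age", ylab = "Year of acceptance", col = point_cols)

# A keyword position avoids hard-coding coordinates
legend("topleft", legend = c("no", "yes"), col = cols, pch = 1, title = "Transplant")
```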
Using the retinopathy data set:
Make a boxplot of age per status.
Add colors to the boxes.
Use the function boxplot(…).
boxplot(retinopathy$age ~ retinopathy$status)
boxplot(retinopathy$age ~ retinopathy$status, col = c("blue", "green"))
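boxplot() also accepts a data argument with the formula interface, and it invisibly returns the statistics it draws. A sketch (the axis labels are chosen for illustration):

```r
r <- survival::retinopathy

# Formula interface with a data argument; the returned list holds the plotted statistics
b <- boxplot(age ~ status, data = r, col = c("blue", "green"),
             xlab = "status", ylab = "age")

# Rows of b$stats: lower whisker, lower hinge, median, upper hinge, upper whisker
b$stats
b$n  # number of observations per status group
```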
Using the retinopathy data set:
Plot a smoothed curve of the association of age with risk.
Plot the density of age per type group.
Use the ggplot2 package and the functions geom_smooth(…) and geom_density(…).
library(ggplot2)
ggplot(retinopathy, aes(age, risk)) +
geom_smooth(colour='black', span = 0.4)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(retinopathy, aes(age, fill = type)) +
geom_density(alpha = 0.25)
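For comparison, roughly the same two summaries can be computed in base R. The sketch below fits loess() with the same span of 0.4 (without the confidence band that geom_smooth() draws) and estimates a density of age per type with density(); as far as we know, its defaults (Gaussian kernel, bw.nrd0 bandwidth) are also ggplot2's defaults for geom_density(), although the plotted x-range may differ.

```r
r <- survival::retinopathy

# Loess fit of risk on age with the same span as in geom_smooth()
fit <- loess(risk ~ age, data = r, span = 0.4)
ages <- seq(min(r$age), max(r$age), length.out = 100)
pred <- predict(fit, newdata = data.frame(age = ages))

# Kernel density of age within each type group
dens <- lapply(split(r$age, r$type), density)

op <- par(mfrow = c(1, 2))
plot(ages, pred, type = "l", xlab = "age", ylab = "risk")
plot(dens$juvenile, main = "", xlab = "age",
     ylim = range(sapply(dens, function(d) range(d$y))))
lines(dens$adult, lty = 2)
legend("topright", legend = names(dens), lty = 1:2)
par(op)  # restore the previous plotting layout
```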
© Eleni-Rosalina Andrinopoulou