In this practical, a number of R packages are used. The packages used (with versions that were used to generate the solutions) are:
- survival (version: 3.2.7)
- memisc (version: 0.99.27.3)
- ggplot2 (version: 3.3.3)

For this practical, we will use the heart and retinopathy data sets from the survival package. More details about the data sets can be found in:
https://stat.ethz.ch/R-manual/R-devel/library/survival/html/heart.html
https://stat.ethz.ch/R-manual/R-devel/library/survival/html/retinopathy.html
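As a minimal sketch, the data sets could be made available and inspected as follows. This assumes that the survival package is installed and that its data sets become accessible once the package is attached; the calls to packageVersion(…) and str(…) are only there for inspection and are not part of the original tasks.

```r
# Attach the survival package; heart and retinopathy ship with it
library(survival)

# Check which version is installed (the solutions were generated with 3.2.7)
packageVersion("survival")

# Inspect the structure of both data sets before transforming them
str(heart)
str(retinopathy)
```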
Before starting with any statistical analysis it is important to transform and explore your data set.
Using the heart data set:
- The variable age is equal to age - 48. Let’s bring age back to the original scale. Do not overwrite the variable age, but create a new variable with the name age_orig.
- Convert the variable surgery into a factor with levels 0: no and 1: yes.

Use the function factor(…) to convert a numerical variable to a factor.
heart$age_orig <- heart$age + 48
heart$surgery <- factor(heart$surgery, levels = c(0, 1), labels = c("no", "yes"))

Categorize the variable age from the retinopathy data set as young: [minimum age until mean age) and old: [mean age until maximum age]. Give this variable the name ageCat. Print the first 6 rows of the data set retinopathy.

To dichotomize a numerical variable, apply the function as.numeric(…) to a logical comparison. Use the function factor(…) to convert a variable into a factor.
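The two steps of the hint can be sketched on a small toy vector. The values below are hypothetical and are not taken from the retinopathy data; the names x, x_bin and x_cat are only illustrative.

```r
# Toy ages (hypothetical values)
x <- c(5, 12, 20, 31, 47)

# The comparison returns TRUE/FALSE; as.numeric() turns that into 1/0
x_bin <- as.numeric(x >= mean(x))   # mean(x) is 23

# Attach readable labels: 0 becomes "young", 1 becomes "old"
x_cat <- factor(x_bin, levels = c(0, 1), labels = c("young", "old"))

table(x_cat)
```

Using >= places a value that is exactly equal to the mean in the upper category, which corresponds to a lower interval that is open on the right.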
retinopathy$ageCat <- as.numeric(retinopathy$age >= mean(retinopathy$age))
retinopathy$ageCat <- factor(retinopathy$ageCat, levels = c(0, 1), labels = c("young", "old"))
head(retinopathy)

Categorize futime from data set retinopathy as follows:
- short: [minimum futime until 25).
- medium: [25 until 45).
- long: [45 until maximum futime].

Give this variable the name futimeCut. Print the first 6 rows of the data.

Create a variable that is identical to the futime variable (use the name futimeCut). Then use indexing to select the correct subset and set it to the new categorical variable.
E.g. you can create the short category as:
retinopathy$futimeCut <- retinopathy$futime
retinopathy$futimeCut[retinopathy$futime < 25] <- "short"
Now continue with the other categories.
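As an alternative to step-by-step indexing, base R's cut(…) can assign all categories in one call. The sketch below uses a hypothetical toy vector rather than the retinopathy data, and the names futime_toy and futime_cat are illustrative. Setting right = FALSE makes the intervals closed on the left, which matches [25 until 45).

```r
# Toy follow-up times (hypothetical values)
futime_toy <- c(5, 25, 44.9, 45, 60)

# Breaks at 25 and 45; -Inf and Inf cover everything below and above
futime_cat <- cut(futime_toy,
                  breaks = c(-Inf, 25, 45, Inf),
                  labels = c("short", "medium", "long"),
                  right = FALSE)

data.frame(futime_toy, futime_cat)
```

Unlike the indexing approach, which produces a character vector, cut(…) returns a factor whose levels follow the order of the breaks.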
retinopathy$futimeCut <- retinopathy$futime
retinopathy$futimeCut[retinopathy$futime < 25] <- "short"
retinopathy$futimeCut[retinopathy$futime >= 25 & retinopathy$futime < 45] <- "medium"
retinopathy$futimeCut[retinopathy$futime >= 45] <- "long"
head(retinopathy)

Create 2 vectors of size 50 as follows:
- Sex: takes 2 values 0 and 1.
- Age: takes values from 20 till 80.
- Convert the Sex variable into a factor with levels 0: female and 1: male.
- Create the variable AgeCat as dichotomous with Age <= 50 to be 0 and 1 otherwise.
- Convert the AgeCat variable into a factor with levels 0: young and 1: old.
- Standardize the Age variable by \(\frac{Age - mean(Age)}{sd(Age)}\).

To sample a numerical or categorical variable use the function sample(…). To convert a numerical variable to a categorical one use the function factor(…). To dichotomize a numerical variable, apply the function as.numeric(…) to a logical comparison.
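Sampling and standardization can be sketched on a short toy vector. Note that set.seed(…) and scale(…) are additions for illustration and are not used in the original solution; the seed value is arbitrary, and the names a and a_std are hypothetical.

```r
# Fix the random seed so that the sampled values can be reproduced
set.seed(123)

# Draw 10 values from 20..80 with replacement
a <- sample(20:80, 10, replace = TRUE)

# Standardize by hand: subtract the mean, divide by the standard deviation
a_std <- (a - mean(a)) / sd(a)

# The result has mean 0 and standard deviation 1 (up to rounding error)
all.equal(mean(a_std), 0)
all.equal(sd(a_std), 1)

# scale() performs the same computation and returns a one-column matrix
all.equal(a_std, as.numeric(scale(a)))
```

Without a fixed seed, every run of sample(…) produces different values, so any frequencies computed from them will differ between runs.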
Sex <- sample(0:1, 50, replace = TRUE)
Age <- sample(20:80, 50, replace = TRUE)
Sex <- factor(Sex, levels = c(0:1), labels = c("female", "male"))
AgeCat <- as.numeric(Age > 50)
AgeCat <- factor(AgeCat, levels = c(0:1), labels = c("young", "old"))
Age <- (Age - mean(Age))/sd(Age)

Create a data frame with the name DF as follows:
- Include the variables Sex, Age, AgeCat from the previous task.
- Name the columns Gender, StandardizedAge, DichotomousAge.

DF <- data.frame(Sex, Age, AgeCat)
DF <- data.frame("Gender" = Sex, "StandardizedAge" = Age, "DichotomousAge" = AgeCat)

Create 2 vectors of size 150 as follows:
- Treatment: takes 2 values 1 and 2.
- Weight: takes values from 50 till 100.
- Convert the Treatment variable into a factor with levels 1: no and 2: yes.
- Transform the Weight variable by Weight * 1000.
- Create a data frame with the variables Treatment and Weight.

To sample a numerical or categorical variable use the function sample(…). To convert a numerical variable to a categorical one use the function factor(…).
Treatment <- sample(1:2, 150, replace = TRUE)
Weight <- sample(50:100, 150, replace = TRUE)
Treatment <- factor(Treatment, levels = c(1:2), labels = c("no", "yes"))
Weight <- Weight * 1000
data.frame(Treatment, Weight)

Create a list called my_list with the following:
- let: a to i.
- sex: factor taking the values males and females and length 50.
- mat: matrix
| 1 | 2 |
| 3 | 4 |
To obtain letters, index the built-in constant letters (e.g. letters[1:3]). To sample a numerical or categorical variable use the function sample(…). To convert a numerical variable to a categorical one use the function factor(…).
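A short sketch of how a named list is built and accessed, using hypothetical toy elements (the name toy_list and its contents are only illustrative):

```r
# A small list with three differently typed elements (toy values)
toy_list <- list(let = letters[1:3],
                 num = c(10, 20),
                 mat = matrix(1:4, 2, 2, byrow = TRUE))

names(toy_list)          # names of the elements
toy_list$let             # access an element by name
toy_list[["mat"]][2, 1]  # second row, first column of the matrix
```

With byrow = TRUE the matrix is filled row by row, so the first row holds 1 and 2 and the second row holds 3 and 4.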
let <- letters[1:9]
sex <- sample(1:2, 50, replace = TRUE)
sex <- factor(sex, levels = 1:2, labels = c("males", "females"))
mat <- matrix(1:4, 2, 2, byrow = TRUE)
my_list <- list(let = let, sex = sex, mat = mat)

Let’s obtain some descriptive statistics.
Obtain the mean and standard deviation for the variable age using the heart data set.
Use the functions mean(…) and sd(…).
mean(heart$age)
## [1] -2.484027
sd(heart$age)
## [1] 9.419999
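Because age in the heart data is stored as age - 48, the mean above is on the shifted scale. The following sketch, which assumes the survival package is installed, illustrates that adding a constant shifts the mean by that constant and leaves the standard deviation unchanged.

```r
library(survival)

# Mean on the original scale: shift the stored mean back by 48
mean(heart$age) + 48

# Same result as averaging the back-transformed values
all.equal(mean(heart$age + 48), mean(heart$age) + 48)

# The standard deviation is unaffected by adding a constant
all.equal(sd(heart$age + 48), sd(heart$age))
```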
Using the retinopathy data set:
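- Obtain the median and the interquartile range for the variable age.
- Obtain the percentages for the variable type.
- Check whether there are missing values in the variable age.

Use the functions median(…) and IQR(…) to obtain the median and the interquartile range. Load the package memisc and use the function percent(…) in order to obtain the percentages. To check whether there are missing values use sum(is.na(…)).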
median(retinopathy$age)
## [1] 16
IQR(retinopathy$age)
## [1] 20
library(memisc)
percent(retinopathy$type)
##  juvenile     adult         N
##  57.86802  42.13198 394.00000
sum(is.na(retinopathy$age)) # any(is.na(retinopathy$age))
## [1] 0
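The percentages can also be obtained without memisc. Below is a minimal base R sketch on a hypothetical toy factor (type_toy is an illustrative name), which additionally shows how a missing value is handled:

```r
# Toy factor with one missing value (hypothetical data)
type_toy <- factor(c("juvenile", "adult", "juvenile", NA, "juvenile"))

# table() drops NA by default; prop.table() converts counts to proportions
pct <- prop.table(table(type_toy)) * 100
pct

# Count the missing values explicitly
sum(is.na(type_toy))
```

Since the missing value is excluded from the table, the percentages are computed over the 4 observed values only.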
Using the data frame DF from the exercise before (Task 5):
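- Obtain the mean of the variable StandardizedAge.
- Obtain the standard deviation of the variable StandardizedAge.
- Obtain the frequencies of the variable Gender.
- Obtain the frequencies of the variable DichotomousAge.
- Obtain the frequencies of Gender and DichotomousAge (crosstab table).
- Obtain the dimensions of the data frame.

To calculate the frequencies, use the functions length(…) or table(…). To obtain the dimensions use the function dim(…).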
mean(DF$StandardizedAge)
## [1] -1.423926e-16
sd(DF$StandardizedAge)
## [1] 1
length(DF$Gender[DF$Gender == "female"])
## [1] 20
length(DF$Gender[DF$Gender == "male"])
## [1] 30
table(DF$Gender)
##
## female   male
##     20     30
table(DF$Gender, DF$DichotomousAge)
##
##          young old
##   female    15   5
##   male      13  17
dim(DF)
## [1] 50 3
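Because DF was built with sample(…), the counts shown above will differ between runs. The sketch below therefore uses a small fixed toy data frame (hypothetical values and names) to show two common extensions of a crosstab: totals via addmargins(…) and row percentages via prop.table(…).

```r
# Toy data frame (hypothetical values)
toy <- data.frame(Gender = c("female", "male", "male", "female", "male"),
                  AgeGroup = c("young", "old", "young", "young", "old"))

tab <- table(toy$Gender, toy$AgeGroup)

# Add row and column totals to the crosstab
addmargins(tab)

# Row percentages: each row sums to 100
prop.table(tab, margin = 1) * 100
```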
Let’s visualize the data.
Using the heart data set:
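- Create a scatterplot of the variables age and year.
- Use the labels Age for the x-axis and Year of acceptance for the y-axis.
- Colour the points by the variable transplant and add a legend.

Use the function plot(…, xlab, ylab, col). Use the function legend(…) to add a legend to the plot.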
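A minimal sketch of a coloured scatterplot with a legend, on hypothetical toy data. When a factor is passed to col, its integer codes are looked up in the active colour palette; mapping the levels to colour names explicitly, as assumed here, keeps the points and the legend in agreement regardless of the palette.

```r
# Toy data (hypothetical values)
x <- c(1, 2, 3, 4, 5, 6)
y <- c(2, 1, 4, 3, 6, 5)
grp <- factor(c("no", "yes", "no", "yes", "no", "yes"))

# Map each factor level to a colour explicitly
cols <- c("black", "red")[as.integer(grp)]

plot(x, y, xlab = "x", ylab = "y", col = cols)

# A keyword such as "topleft" places the legend without coordinates
legend("topleft", legend = levels(grp), col = c("black", "red"), pch = 1)
```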
plot(heart$age, heart$year)
plot(heart$age, heart$year, xlab = "Age", ylab = "Year of acceptance")
plot(heart$age, heart$year, xlab = "Age", ylab = "Year of acceptance", col = heart$transplant)
legend(-40, 6, c("no", "yes"), col = c("black", "red"), pch = 1)

Using the retinopathy data set:
- Create a boxplot of the variable age per status.
- Give the boxes different colours.

Use the function boxplot(…).
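The formula interface of boxplot(…) can be sketched on hypothetical toy data. Passing the data frame through the data argument, as assumed here, avoids repeating its name on both sides of the formula.

```r
# Toy data (hypothetical values)
toy <- data.frame(value = c(1, 2, 3, 10, 11, 12),
                  group = c(0, 0, 0, 1, 1, 1))

# One box per level of group, each with its own colour
b <- boxplot(value ~ group, data = toy, col = c("blue", "green"))

# boxplot() invisibly returns the plotted statistics, one column per group
b$names
b$stats[3, ]   # the third row holds the group medians
```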
boxplot(retinopathy$age ~ retinopathy$status)
boxplot(retinopathy$age ~ retinopathy$status, col = c("blue", "green"))

Using the retinopathy data set:
- Create a smoothed curve of the variable age with risk.
- Create a density plot of the variable age per type group.

Use the ggplot2 package and the functions: geom_smooth(…) and geom_density(…).
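Both geoms can be sketched on simulated toy data. The data frame toy and the plot names below are hypothetical, and the seed is arbitrary. Stating method and formula explicitly in geom_smooth(…) is a choice made here to avoid the informational message that is otherwise printed.

```r
library(ggplot2)

# Toy data (hypothetical values, generated with a fixed seed)
set.seed(1)
toy <- data.frame(x = runif(100, 0, 10),
                  g = rep(c("a", "b"), each = 50))
toy$y <- sin(toy$x) + rnorm(100, sd = 0.3)

# Smoothed trend of y against x; a smaller span follows the data more closely
p_smooth <- ggplot(toy, aes(x, y)) +
  geom_smooth(method = "loess", formula = y ~ x, span = 0.4, colour = "black")

# Overlapping densities per group; alpha makes the fills transparent
p_density <- ggplot(toy, aes(x, fill = g)) +
  geom_density(alpha = 0.25)

print(p_smooth)
print(p_density)
```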
library(ggplot2)
ggplot(retinopathy, aes(age, risk)) +
  geom_smooth(colour = "black", span = 0.4)
## `geom_smooth()` using method = 'loess' and formula 'y ~ x'
ggplot(retinopathy, aes(age, fill = type)) +
  geom_density(alpha = 0.25)

© Eleni-Rosalina Andrinopoulou