In this practical, a number of R packages are used. The packages used (with versions that were used to generate the solutions) are:
survival
(version: 3.2.7)For this practical, we will use the heart and retinopathy data sets from the survival
package. More details about the data sets can be found in:
https://stat.ethz.ch/R-manual/R-devel/library/survival/html/heart.html
https://stat.ethz.ch/R-manual/R-devel/library/survival/html/retinopathy.html
Examine the following code in R.
<- function(k){
h if (k <= 20){
3 * k
else {
} 2 * k
} }
Investigate what this code takes as an input (e.g. scalar, vector, matrix, data.frame, list) and what the output will be in every case.
<- function(k){
h if (k <= 20){
3 * k
else {
} 2 * k
}
}<- 10
scalar_test <- c(1:10)
vec_test <- matrix(1:6, 3, 3)
mat_test <- data.frame(v1 = 1:10, v2 = sample(0:1, 10, replace = TRUE))
df_test <- list(l1 = 1:30, l2 = c("m", "f"), l3 = c(TRUE, TRUE, FALSE, FALSE, FALSE))
list_test # h(scalar_test)
# h(vec_test)
# h(mat_test)
# h(df_test)
# h(list_test)
Examine the following code in R.
<- function(k){
h for(i in 1:length(k)){
<- if (k[i] <= 20){
k[i]3 * k[i]
else {
} 2 * k[i]
}
}
k }
Investigate what this code takes as an input (e.g. scalar, vector, matrix, data.frame, list) and what the output will be in every case.
<- function(k){
h for(i in 1:length(k)){
<- if (k[i] <= 20){
k[i]3 * k[i]
else {
} 2 * k[i]
}
}
k
}<- 10
scalar_test <- c(1:10)
vec_test <- matrix(1:6, 3, 3)
mat_test <- data.frame(v1 = 1:10, v2 = sample(0:1, 10, replace = TRUE))
df_test <- list(l1 = 1:30, l2 = c("m", "f"), l3 = c(TRUE, TRUE, FALSE, FALSE, FALSE))
list_test # h(scalar_test)
# h(vec_test)
# h(mat_test)
# h(df_test)
# h(list_test)
Define a function (with the name mysummary
) that takes one parameter (let’s call it x
) and investigates if it is a data frame, a factor, or a numeric variable and returns the message “This is a data.frame” if x
is a data frame, “This is a factor” is x
is a factor and “This is numeric” if x
is numeric.
Use an if statement. When you want to display results from the function you can use the print(…) function.
<- function(x){
mysummary if(is.data.frame(x)){
print('This is a data.frame')
else if(is.factor(x)){
}print('This is a factor')
else if(is.numeric(x)){
}print('This is numeric')
}
}<- data.frame(v1 = 1:10, v2 = sample(0:1, 10, replace = TRUE))
df_test <- as.factor(c("y", "n", "n", "n"))
vec1_test <- 1:10
vec2_test mysummary(df_test)
## [1] "This is a data.frame"
mysummary(vec1_test)
## [1] "This is a factor"
mysummary(vec2_test)
## [1] "This is numeric"
Extend the previous function mysummary
to take again one parameter (let’s call it x
) but to return also some descriptive statistics. We therefore modify the function in such a way that a frequency table is given for factors and the mean and standard deviation are given for numeric data. For a data frame, return only an extra message that indicates that this is not implemented yet.
Use the functions table(…), mean(…) and sd(…).
<- function(x){
mysummary if(is.data.frame(x)){
print('This is a data.frame')
print('Not yet implemented')
else if(is.factor(x)){
}print('This is a factor')
print(table(x))
else if(is.numeric(x)){
}print('This is numeric')
print('mean')
print(mean(x))
print('sd')
print(sd(x))
}
}<- data.frame(v1 = 1:10, v2 = sample(0:1, 10, replace = TRUE))
df_test <- as.factor(c("y", "n", "n", "n"))
vec1_test <- 1:10
vec2_test mysummary(df_test)
## [1] "This is a data.frame"
## [1] "Not yet implemented"
mysummary(vec1_test)
## [1] "This is a factor"
## x
## n y
## 3 1
mysummary(vec2_test)
## [1] "This is numeric"
## [1] "mean"
## [1] 5.5
## [1] "sd"
## [1] 3.02765
The previous function mysummary
still does not do anything useful when we apply it on the whole data.frame. Extend the function to take again one parameter (let’s call it x
) but to return also some descriptive statistics for the data frame.
This can be solved in the following way: whenever the function is called using a data.frame as argument, we use a for loop to loop over its columns. Within this loop we let the function call itself but now using the column as argument.
<- function(x){
mysummary if(is.data.frame(x)){
print('This is a data.frame')
for(i in 1:ncol(x)){
print(names(x)[i])
mysummary(x[,i])
} else if(is.factor(x)){
}print('This is a factor')
print(table(x))
else if(is.numeric(x)){
}print('This is numeric')
print('mean')
print(mean(x))
print('sd')
print(sd(x))
}
}<- data.frame(v1 = 1:10, v2 = sample(0:1, 10, replace = TRUE))
df1_test <- data.frame(v1 = 1:10, v2 = sample(c("no", "yes"), 10, replace = TRUE))
df2_test mysummary(df1_test)
## [1] "This is a data.frame"
## [1] "v1"
## [1] "This is numeric"
## [1] "mean"
## [1] 5.5
## [1] "sd"
## [1] 3.02765
## [1] "v2"
## [1] "This is numeric"
## [1] "mean"
## [1] 0.6
## [1] "sd"
## [1] 0.5163978
mysummary(df2_test)
## [1] "This is a data.frame"
## [1] "v1"
## [1] "This is numeric"
## [1] "mean"
## [1] 5.5
## [1] "sd"
## [1] 3.02765
## [1] "v2"
Create a function with the name std_num
that takes as input a data frame (let’s call it DF
), standardizes all the numerical variables and returns the new data frame. Apply this function to the heart data set.
Use an if statement and a for loop. Note that you will have to transform some variables to factors first.
$event <- as.factor(heart$event)
heart$surgery <- as.factor(heart$surgery)
heart$transplant <- as.factor(heart$transplant)
heart$id <- as.factor(heart$id)
heart
<- function(DF){
std_num for (j in 1:dim(DF)[2]){
if (is.numeric(DF[,j])){
<- (DF[,j] - mean(DF[,j]))/sd(DF[,j])
DF[,j]
}
}
DF
}std_num(heart)
Create a for loop that goes through the columns of the retinopathy data set and plots each column. If the column is a numerical variable then use a density plot, if the column is a categorical variable then use a barchart. Create a plot with multiple panels.
Use an if statement. Use the function par(mfrow = …) to create multiple plots. Note that you will have to transform some variables to factors first.
$trt <- as.factor(retinopathy$trt)
retinopathy$status <- as.factor(retinopathy$status)
retinopathy
par(mfrow = c(3, 3))
for (j in 1:dim(retinopathy)[2]){
if (is.numeric(retinopathy[,j])){
plot(density(retinopathy[,j]), col = rgb(0,0,1,0.5))
polygon(density(retinopathy[,j]), col = rgb(0,0,1,0.5), border = "blue")
else if (is.factor(retinopathy[,j])){
} plot(retinopathy[,j])
} }
Create a function that applies the previous code in any data set. The name of the function will be plot_summary
and it will take as input a data frame (let’s call it dt
) and the dimension of the panel that includes the plots. In particular, dim_row
represents the rows and dim_col
the columns. The function will return a plot for each column. If the column is a numerical variable then use a density plot, if the column is categorical variable then use a barchart. Apply this function to the retinopathy data set.
Use an if statement and a for loop. Use the function par(mfrow = …) to create multiple plots. Note that you will have to transform some variables to factors first.
$trt <- as.factor(retinopathy$trt)
retinopathy$status <- as.factor(retinopathy$status)
retinopathy
<- function(dt, dim_row, dim_col){
plot_summary par(mfrow = c(dim_row, dim_col))
for (j in 1:dim(dt)[2]){
if (is.numeric(dt[,j])){
plot(density(dt[,j]), col = rgb(0,0,1,0.5))
polygon(density(dt[,j]), col = rgb(0,0,1,0.5), border = "blue")
else if (is.factor(dt[,j])){
} plot(dt[,j])
}
}
}plot_summary(retinopathy, 3, 3)
Extend the previous function plot_summary
to include also plot titles indicating the variable. Apply this function to the retinopathy data set.
Use the paste0(…) function.
$trt <- as.factor(retinopathy$trt)
retinopathy$status <- as.factor(retinopathy$status)
retinopathy
<- function(dt, dim_row, dim_col){
plot_summary par(mfrow = c(dim_row, dim_col))
for (j in 1:dim(dt)[2]){
if (is.numeric(dt[,j])){
plot(density(dt[,j]), col = rgb(0,0,1,0.5), main = paste0(colnames(dt)[j]))
polygon(density(dt[,j]), col = rgb(0,0,1,0.5), border = "blue")
else if (is.factor(dt[,j])){
} plot(dt[,j], main = paste0(colnames(dt)[j]))
}
}
}plot_summary(retinopathy, 3, 3)
For each data set (heart and retinopathy), select only the observations that report an event.
Note that the variable name of the events is diffent for the two data sets.
$event == 1, ] heart[heart
$status == 1, ] retinopathy[retinopathy
Create a function with the name dt_events
that takes as input a list with data sets (let’s call it dt_list
) and a vector indicating the name of the event variable for each data set (let’s call it event_name
). The function will return a list consisting of each data set but including only the observations/rows that had an event. Create a list that consists of the data sets heart and retinopathy and apply the function.
Use a for loop. Note that the variable name of the events is different for the two data sets.
<- function(dt_lists, event_name){
dt_events <- list()
new_dt for (k in 1:length(dt_lists)) {
<- dt_lists[[k]]
dt <- dt[dt[[event_name[k]]] == 1, ]
new_dt[[k]]
}print(new_dt)
}
dt_events(dt_lists = list(heart, retinopathy), event_name = c("event", "status"))
## [[1]]
## start stop event age year surgery transplant id
## 1 0.0 50 1 -17.15537303 0.12320329 0 0 1
## 2 0.0 6 1 3.83572895 0.25462012 0 0 2
## 4 1.0 16 1 6.29705681 0.26557153 0 1 3
## 6 36.0 39 1 -7.73716632 0.49007529 0 1 4
## 7 0.0 18 1 -27.21423682 0.60780287 0 0 5
## 8 0.0 3 1 6.59548255 0.70088980 0 0 6
## 10 51.0 675 1 2.86926762 0.78028747 0 1 7
## 11 0.0 40 1 -2.65023956 0.83504449 0 0 8
## 12 0.0 85 1 -0.83778234 0.85694730 0 0 9
## 14 12.0 58 1 -5.49760438 0.86242300 0 1 10
## 16 26.0 153 1 -0.01916496 0.87337440 0 1 11
## 17 0.0 8 1 5.19370294 0.96372348 0 0 12
## 19 17.0 81 1 6.57357974 0.96919918 0 1 13
## 21 37.0 1387 1 6.01232033 0.97193703 0 1 14
## 22 0.0 1 1 5.81519507 0.99110198 1 0 15
## 24 28.0 308 1 1.44832307 1.07049966 0 1 16
## 25 0.0 36 1 -27.66872005 1.07597536 0 0 17
## 27 20.0 43 1 8.84873374 1.08692676 0 1 18
## 28 0.0 37 1 11.12388775 1.13347023 0 0 19
## 30 18.0 28 1 7.27994524 1.33059548 0 1 20
## 32 8.0 1032 1 -4.65708419 1.33880903 0 1 21
## 34 12.0 51 1 -5.21560575 1.46201232 0 1 22
## 36 3.0 733 1 10.35728953 1.52772074 0 1 23
## 38 83.0 219 1 3.80013689 1.56605065 0 1 24
## 42 0.0 263 1 -39.21423682 1.59069131 0 0 27
## 44 71.0 72 1 6.02327173 1.68377823 0 1 28
## 45 0.0 35 1 2.43394935 1.78507871 0 0 29
## 47 16.0 852 1 -3.08829569 1.88364134 0 1 30
## 48 0.0 16 1 6.88569473 1.89459274 0 0 31
## 50 17.0 77 1 16.40793977 1.91101985 0 1 32
## 55 0.0 12 1 -4.53388090 2.30800821 0 0 35
## 57 46.0 100 1 0.92539357 2.50787132 0 1 36
## 59 19.0 66 1 13.50034223 2.56536619 0 1 37
## 61 4.5 5 1 -6.52977413 2.59274470 0 1 38
## 63 2.0 53 1 2.51882272 2.63381246 0 1 39
## 68 0.0 3 1 -11.55920602 2.88843258 0 0 42
## 69 0.0 2 1 -4.60780287 3.05817933 1 0 43
## 70 0.0 40 1 -5.42094456 3.16495551 1 0 44
## 72 1.0 45 1 -11.81656400 3.26351814 0 1 45
## 74 2.0 996 1 0.61054073 3.27720739 1 1 46
## 76 21.0 72 1 -0.90075291 3.34017796 0 1 47
## 77 0.0 9 1 8.03559206 3.34839151 0 0 48
## 81 83.0 980 1 -2.11362081 3.37577002 1 1 50
## 83 32.0 285 1 0.73374401 3.47707050 0 1 51
## 84 0.0 102 1 -6.75154004 3.56468172 0 0 52
## 86 41.0 188 1 -0.65708419 3.75085558 0 1 53
## 87 0.0 3 1 -0.20807666 3.75085558 0 0 54
## 89 10.0 61 1 4.45448323 3.85489391 0 1 55
## 92 0.0 149 1 -6.73511294 3.95071869 0 0 57
## 94 21.0 343 1 0.01642710 3.97809719 1 1 58
## 98 3.0 68 1 1.05407255 4.13141684 0 1 60
## 99 0.0 2 1 4.56399726 4.17522245 0 0 61
## 100 0.0 69 1 -8.64613279 4.18891170 0 0 62
## 104 33.0 584 1 0.81587953 4.33675565 1 1 64
## 106 12.0 78 1 3.29363450 4.42984257 0 1 65
## 107 0.0 32 1 5.21286790 4.46817248 0 0 66
## 109 57.0 285 1 -28.44900753 4.47638604 0 1 67
## 111 3.0 68 1 -2.75975359 4.51745380 0 1 68
## 115 5.0 30 1 5.00205339 4.71184120 0 1 70
## 121 27.0 90 1 8.33127995 4.94729637 0 1 73
## 123 5.0 17 1 -18.83367556 4.96646133 0 1 74
## 124 0.0 2 1 4.18069815 4.99657769 0 0 75
## 127 0.0 21 1 -6.88843258 5.01574264 0 0 77
## 131 67.0 96 1 5.78234086 5.16632444 0 1 79
## 138 32.0 80 1 5.30869268 5.31690623 0 1 83
## 140 37.0 334 1 -5.28131417 5.33333333 0 1 84
## 141 0.0 5 1 -0.01916496 5.35249829 0 0 85
## 145 60.0 110 1 -1.74674880 5.47022587 0 1 87
## 149 139.0 207 1 3.04722793 5.51129363 0 1 89
## 151 160.0 186 1 4.03285421 5.51403149 1 1 90
## 152 0.0 340 1 -0.40520192 5.53319644 0 0 91
## 158 4.0 165 1 -4.15879535 5.95482546 1 1 94
## 160 2.0 16 1 -7.71800137 5.97672827 0 1 95
## 167 0.0 21 1 1.83436003 6.23408624 0 0 99
## 172 0.0 6 1 -8.68446270 -0.04928131 0 0 103
##
## [[2]]
## id laser eye age type trt futime status risk
## 4 14 argon right 12 juvenile 0 31.30 1 6
## 10 29 xenon left 13 juvenile 0 0.30 1 10
## 12 46 xenon right 12 juvenile 0 54.27 1 9
## 14 49 argon right 8 juvenile 0 10.80 1 6
## 20 71 argon right 21 adult 0 13.83 1 9
## 21 100 argon left 23 adult 1 46.43 1 9
## 24 112 argon right 44 adult 0 7.90 1 12
## 27 127 xenon right 48 adult 1 30.83 1 6
## 28 127 xenon right 48 adult 0 38.57 1 10
## 30 133 argon right 26 adult 0 14.10 1 9
## 31 150 argon left 10 juvenile 1 20.17 1 9
## 32 150 argon left 10 juvenile 0 6.90 1 10
## 34 167 argon left 23 adult 0 41.40 1 9
## 44 214 xenon right 45 adult 0 0.60 1 12
## 45 220 argon right 11 juvenile 1 10.27 1 10
## 46 220 argon right 11 juvenile 0 1.63 1 10
## 49 255 xenon left 10 juvenile 1 5.67 1 12
## 50 255 xenon left 10 juvenile 0 13.83 1 9
## 52 264 argon right 12 juvenile 0 29.97 1 11
## 54 266 argon right 36 adult 0 26.37 1 11
## 55 284 argon left 53 adult 1 5.77 1 10
## 56 284 argon left 53 adult 0 1.33 1 12
## 57 295 xenon left 10 juvenile 1 5.90 1 11
## 58 295 xenon left 10 juvenile 0 35.53 1 11
## 59 300 xenon left 25 adult 1 25.63 1 10
## 60 300 xenon left 25 adult 0 21.90 1 9
## 61 302 argon left 14 juvenile 1 33.90 1 9
## 62 302 argon left 14 juvenile 0 14.80 1 9
## 63 315 argon left 16 juvenile 1 1.73 1 10
## 64 315 argon left 16 juvenile 0 6.20 1 8
## 66 324 xenon left 38 adult 0 22.00 1 8
## 69 335 argon left 10 juvenile 1 30.20 1 9
## 70 335 argon left 10 juvenile 0 22.00 1 11
## 73 349 xenon right 44 adult 1 25.80 1 11
## 74 349 xenon right 44 adult 0 13.87 1 8
## 75 357 argon left 21 adult 1 5.73 1 9
## 76 357 argon left 21 adult 0 48.30 1 9
## 79 385 argon right 13 juvenile 1 1.90 1 10
## 81 396 argon left 40 adult 1 9.90 1 10
## 82 396 argon left 40 adult 0 9.90 1 9
## 86 409 argon left 48 adult 0 2.67 1 12
## 88 419 xenon right 42 adult 0 13.83 1 10
## 90 429 xenon right 24 adult 0 4.27 1 10
## 92 433 argon left 55 adult 0 13.90 1 10
## 97 468 argon right 6 juvenile 1 1.70 1 10
## 98 468 argon right 6 juvenile 0 1.70 1 6
## 99 480 xenon right 19 juvenile 1 1.77 1 9
## 100 480 xenon right 19 juvenile 0 43.03 1 12
## 105 503 xenon left 27 adult 1 8.30 1 9
## 106 503 xenon left 27 adult 0 8.30 1 10
## 108 515 argon right 43 adult 0 18.43 1 9
## 112 538 argon left 45 adult 0 31.63 1 9
## 115 550 xenon left 3 juvenile 1 18.70 1 10
## 116 550 xenon left 3 juvenile 0 6.53 1 11
## 120 557 argon right 13 juvenile 0 22.23 1 8
## 122 561 xenon right 15 juvenile 0 14.00 1 9
## 123 568 argon left 10 juvenile 1 42.17 1 9
## 124 568 argon left 10 juvenile 0 42.17 1 10
## 126 572 xenon left 6 juvenile 0 5.33 1 9
## 128 576 xenon left 17 juvenile 0 59.80 1 10
## 130 581 argon left 37 adult 0 5.83 1 11
## 132 606 xenon right 18 juvenile 0 2.17 1 12
## 133 610 xenon left 13 juvenile 1 14.30 1 9
## 134 610 xenon left 13 juvenile 0 48.43 1 8
## 141 631 argon left 11 juvenile 1 13.33 1 12
## 142 631 argon left 11 juvenile 0 9.60 1 10
## 147 653 xenon left 15 juvenile 1 14.27 1 9
## 148 653 xenon left 15 juvenile 0 7.60 1 12
## 149 662 argon left 7 juvenile 1 34.57 1 12
## 150 662 argon left 7 juvenile 0 1.80 1 12
## 152 664 argon left 2 juvenile 0 4.30 1 12
## 153 683 xenon left 22 adult 1 4.10 1 8
## 154 683 xenon left 22 adult 0 12.20 1 6
## 160 706 xenon right 27 adult 0 12.73 1 10
## 162 717 xenon right 53 adult 0 54.10 1 11
## 164 722 xenon right 10 juvenile 0 9.40 1 12
## 165 731 xenon left 13 juvenile 1 21.57 1 12
## 166 731 xenon left 13 juvenile 0 9.90 1 10
## 181 778 xenon left 25 adult 1 26.23 1 8
## 184 780 argon right 15 juvenile 0 18.03 1 11
## 189 804 xenon right 23 adult 1 7.07 1 10
## 191 810 xenon left 13 juvenile 1 13.77 1 10
## 192 810 xenon left 13 juvenile 0 13.77 1 12
## 194 815 xenon left 45 adult 0 9.63 1 12
## 198 834 argon left 8 juvenile 0 1.50 1 12
## 199 838 xenon left 30 adult 1 33.63 1 9
## 200 838 xenon left 30 adult 0 33.63 1 9
## 203 866 argon left 39 adult 1 63.33 1 11
## 204 866 argon left 39 adult 0 27.60 1 10
## 205 887 argon right 26 adult 1 38.47 1 10
## 206 887 argon right 26 adult 0 1.63 1 10
## 210 910 xenon left 34 adult 0 25.30 1 10
## 212 920 xenon left 10 juvenile 0 46.20 1 8
## 214 925 argon right 40 adult 0 1.70 1 10
## 221 949 xenon right 13 juvenile 1 10.33 1 9
## 222 949 xenon right 13 juvenile 0 0.83 1 10
## 223 952 argon left 9 juvenile 1 6.13 1 12
## 226 962 argon left 5 juvenile 0 25.93 1 9
## 230 971 argon right 23 adult 0 19.40 1 10
## 232 978 argon left 2 juvenile 0 21.97 1 10
## 235 987 xenon left 7 juvenile 1 26.20 1 10
## 238 1002 argon left 13 juvenile 0 18.03 1 11
## 239 1017 xenon left 50 adult 1 13.83 1 8
## 240 1017 xenon left 50 adult 0 1.57 1 9
## 242 1029 xenon right 20 adult 0 13.37 1 12
## 243 1034 argon left 15 juvenile 1 11.07 1 11
## 244 1034 argon left 15 juvenile 0 1.97 1 9
## 246 1037 xenon right 30 adult 0 22.20 1 9
## 251 1074 xenon left 4 juvenile 1 6.10 1 10
## 253 1098 argon left 3 juvenile 1 2.10 1 10
## 254 1098 argon left 3 juvenile 0 11.30 1 11
## [ reached 'max' / getOption("max.print") -- omitted 44 rows ]
For each data set (heart and retinopathy), obtain the mean age per event status.
tapply(heart$age, heart$event, mean)
## 0 1
## -3.421947 -1.270983
tapply(retinopathy$age, retinopathy$status, mean)
## 0 1
## 20.35146 21.44516
Create a function with the name summary_list
that takes as input a list with data sets (let’s call it dt_list
), a vector with the names of the continuous variables (let’s call it dt_cont
) and a vector with the names of the categorical variables (let’s call it dt_cat
) and returns the mean of the continuous variable per group (categorical variable) for each data set. Apply the function to the heart and retinopathy data sets. In particular, use the continuous variables year
(from the heart data set) and risk
(from the retinopathy data set) and the categorical variables surgery
(from the heart data set) and type
(from the retinopathy data set).
Use a for loop. Use the print(…) function to print the results.
<- function(dt_list, dt_cont, dt_cat){
summary_list for (i in 1:length(dt_list)){
<- dt_list[[i]]
dt print(tapply(dt[[dt_cont[i]]], dt[[dt_cat[i]]], mean))
}
}
summary_list(dt_list = list(heart, retinopathy), dt_cont = c("year", "risk"), dt_cat = c("surgery", "type"))
## 0 1
## 3.320975 4.105738
## juvenile adult
## 9.745614 9.632530
Extend the previous by including an extra input argument (let’s call it calc
) which will indicate each time the measure that we want to calculate e.g, the mean. So that the user can decide which function to use.
<- function(dt_list, dt_cont, dt_cat, calc){
summary_list for (i in 1:length(dt_list)){
<- dt_list[[i]]
dt print(tapply(dt[[dt_cont[i]]], dt[[dt_cat[i]]], calc))
}
}
summary_list(dt_list = list(heart, retinopathy), dt_cont = c("year", "risk"), dt_cat = c("surgery", "type"), calc = median)
## 0 1
## 3.564682 3.978097
## juvenile adult
## 10 9
© Eleni-Rosalina Andrinopoulou