Preface

R packages

This practical uses the following software (the versions listed are those used to generate the solutions):

  • R version 4.0.3 (2020-10-10)
  • survival (version: 3.2.7)

Datasets

For this practical, we will use the heart and retinopathy data sets from the survival package. More details about the data sets can be found at:

https://stat.ethz.ch/R-manual/R-devel/library/survival/html/heart.html

https://stat.ethz.ch/R-manual/R-devel/library/survival/html/retinopathy.html

The Apply Family

apply

Task 1

  • Obtain the mean of the columns start, stop, event, age, year, surgery of the heart data set.
  • Obtain the mean of the columns age, futime, risk of the retinopathy data set.

Solution 1

library(survival)  # provides the heart and retinopathy data sets
apply(heart[, c("start", "stop", "event", "age", "year", "surgery")], 2, mean)
##       start        stop       event         age        year     surgery 
##  15.5145349 201.2936047   0.4360465  -2.4840266   3.4532894   0.1686047
apply(retinopathy[, c("age", "futime", "risk")], 2, mean)
##      age   futime     risk 
## 20.78173 35.57929  9.69797
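With MARGIN = 2, apply() works column by column. It first converts a data frame to a matrix, so all selected columns should be numeric. For a simple column mean, base R's colMeans() gives the same result and is usually faster. The sketch below uses a small made-up matrix (the name m is illustrative only) rather than the heart data:

```r
# Small illustrative matrix (not from the heart or retinopathy data)
m <- cbind(x = c(1, 2, 3), y = c(10, 20, 60))

# MARGIN = 2 applies mean() to each column
col_means_apply <- apply(m, 2, mean)

# colMeans() is the vectorised base R equivalent
col_means_builtin <- colMeans(m)

print(col_means_apply)
```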

Task 2

Create the matrix dataset1 <- cbind(A = 1:30, B = sample(1:100, 30)) and compute the row sums of dataset1.

Solution 2

dataset1 <- cbind(A = 1:30, B = sample(1:100, 30))
apply(dataset1, 1, sum)
##  [1]  67  97  83  81   6  44  10  37  34  55  54 101  82  72 115 108  52  42  69  53  28  30  79  37
## [25] 113  94  46  51 104  72
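Because sample() draws at random, the row sums above will differ from run to run. Calling set.seed() first makes the result reproducible. The seed value 123 below is an arbitrary choice. The sketch also checks that apply(dataset1, 1, sum) agrees with the vectorised rowSums():

```r
# Fix the random number generator so sample() gives the same draw each time
set.seed(123)
dataset1 <- cbind(A = 1:30, B = sample(1:100, 30))

# MARGIN = 1 applies sum() to each row
row_sums_apply <- apply(dataset1, 1, sum)

# rowSums() is the vectorised base R equivalent
row_sums_builtin <- rowSums(dataset1)

# Re-running with the same seed reproduces the same matrix
set.seed(123)
dataset1_again <- cbind(A = 1:30, B = sample(1:100, 30))
```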

lapply

Task 1

Create the function DerivativeFunction <- function(x) { log10(x) + 10 } and apply it to dataset1 <- cbind(A = 1:30, B = sample(1:100, 30)). The output should be a list.

Solution 1

DerivativeFunction <- function(x) { log10(x) + 10 }
dataset1 <- cbind(A = 1:30, B = sample(1:100, 30))
lapply(dataset1, DerivativeFunction)
## [[1]]
## [1] 10
## 
## [[2]]
## [1] 10.30103
## 
## [[3]]
## [1] 10.47712
## 
## [[4]]
## [1] 10.60206
## 
## [[5]]
## [1] 10.69897
## 
## [[6]]
## [1] 10.77815
## 
## [[7]]
## [1] 10.8451
## 
## [[8]]
## [1] 10.90309
## 
## [[9]]
## [1] 10.95424
## 
## [[10]]
## [1] 11
## 
## [[11]]
## [1] 11.04139
## 
## [[12]]
## [1] 11.07918
## 
## [[13]]
## [1] 11.11394
## 
## [[14]]
## [1] 11.14613
## 
## [[15]]
## [1] 11.17609
## 
## [[16]]
## [1] 11.20412
## 
## [[17]]
## [1] 11.23045
## 
## [[18]]
## [1] 11.25527
## 
## [[19]]
## [1] 11.27875
## 
## [[20]]
## [1] 11.30103
## 
## [[21]]
## [1] 11.32222
## 
## [[22]]
## [1] 11.34242
## 
## [[23]]
## [1] 11.36173
## 
## [[24]]
## [1] 11.38021
## 
## [[25]]
## [1] 11.39794
## 
## [[26]]
## [1] 11.41497
## 
## [[27]]
## [1] 11.43136
## 
## [[28]]
## [1] 11.44716
## 
## [[29]]
## [1] 11.4624
## 
## [[30]]
## [1] 11.47712
## 
## [[31]]
## [1] 10
## 
## [[32]]
## [1] 11.6721
## 
## [[33]]
## [1] 11.9345
## 
## [[34]]
## [1] 11.49136
## 
## [[35]]
## [1] 11.77815
## 
## [[36]]
## [1] 11.20412
## 
## [[37]]
## [1] 11.8451
## 
## [[38]]
## [1] 11.94448
## 
## [[39]]
## [1] 11.74036
## 
## [[40]]
## [1] 11.41497
## 
## [[41]]
## [1] 11
## 
## [[42]]
## [1] 11.44716
## 
## [[43]]
## [1] 11.81291
## 
## [[44]]
## [1] 11.23045
## 
## [[45]]
## [1] 11.60206
## 
## [[46]]
## [1] 11.17609
## 
## [[47]]
## [1] 11.64345
## 
## [[48]]
## [1] 11.69897
## 
## [[49]]
## [1] 11.88649
## 
## [[50]]
## [1] 10.69897
## 
## [[51]]
## [1] 11.68124
## 
## [[52]]
## [1] 11.30103
## 
## [[53]]
## [1] 11.716
## 
## [[54]]
## [1] 11.14613
## 
## [[55]]
## [1] 11.11394
## 
## [[56]]
## [1] 11.94939
## 
## [[57]]
## [1] 11.61278
## 
## [[58]]
## [1] 11.98227
## 
## [[59]]
## [1] 11.99123
## 
## [[60]]
## [1] 11.43136
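A matrix is a vector with a dim attribute, so lapply() treats dataset1 as 60 separate values (30 rows by 2 columns) and returns a list of length 60, as shown above. If you want one list element per column instead, you can first convert the matrix to a data frame, whose columns are list elements. Here is a minimal sketch with a small made-up matrix:

```r
DerivativeFunction <- function(x) { log10(x) + 10 }

# Small illustrative matrix: 3 rows x 2 columns = 6 values
m <- cbind(A = c(1, 10, 100), B = c(1000, 1, 10))

# lapply() on a matrix iterates over each of the 6 individual values
per_value <- lapply(m, DerivativeFunction)

# lapply() on a data frame iterates over its columns
per_column <- lapply(as.data.frame(m), DerivativeFunction)

str(per_column)
```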

Task 2

  • Create a list that consists of the variables age and year from the heart data set and the variable risk from the retinopathy data set. Name this list list1.
  • Obtain the median of each element of the list. The output should be a list.

Solution 2

list1 <- list(heart$age, heart$year, retinopathy$risk)
lapply(list1, median)
## [[1]]
## [1] -0.1136208
## 
## [[2]]
## [1] 3.750856
## 
## [[3]]
## [1] 10
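Because list1 is created without names, the result can only be indexed by position ([[1]], [[2]], [[3]]). If you name the list elements, lapply() carries those names through to its output, which makes the result easier to read. The sketch below uses made-up vectors as stand-ins for heart$age, heart$year and retinopathy$risk:

```r
# Illustrative stand-ins for the heart and retinopathy variables
age  <- c(-5.2, 0.4, 3.1, 10.0)
year <- c(1.5, 3.8, 4.2)
risk <- c(6, 9, 10, 11, 12)

# Named list: the names are kept in the output of lapply()
list1_named <- list(age = age, year = year, risk = risk)
medians <- lapply(list1_named, median)

medians$risk
```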

sapply

Task 1

Create the function Function2 <- function(x) { exp(x) + 0.1 } and apply it to dataset2 <- cbind(A = c(1:10), B = rnorm(10, 0, 1)). The output should be simplified.

Solution 1

Function2 <- function(x) { exp(x) + 0.1 }
dataset2 <- cbind(A = c(1:10), B = rnorm(10, 0, 1))
sapply(dataset2, Function2)
##  [1] 2.818282e+00 7.489056e+00 2.018554e+01 5.469815e+01 1.485132e+02 4.035288e+02 1.096733e+03
##  [8] 2.981058e+03 8.103184e+03 2.202657e+04 2.496550e-01 1.347263e+00 4.684401e+00 1.493833e+00
## [15] 1.703530e+00 3.206571e-01 1.308962e+00 3.383961e+00 5.384803e-01 3.022336e+00
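As with lapply(), sapply() treats the matrix dataset2 as 20 separate values. The simplified result is therefore a plain vector of length 20: the first 10 values come from column A and the last 10 from column B. Because exp() is vectorised, calling Function2(dataset2) directly returns the same values while keeping the matrix shape. The sketch below uses a small fixed matrix in place of rnorm():

```r
Function2 <- function(x) { exp(x) + 0.1 }

# Small illustrative matrix (fixed values instead of rnorm())
m <- cbind(A = 1:3, B = c(0, -1, 2))

# sapply() simplifies the per-value results to a vector of length 6
as_vector <- sapply(m, Function2)

# Calling the vectorised function directly keeps the 3 x 2 matrix shape
as_matrix <- Function2(m)
```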

Task 2

  • Create a list that consists of the variable transplant from the heart data set and the variable status from the retinopathy data set. Name this list list2.
  • Obtain the percentage of 0 cases in each element of the list, as a simplified output (a vector).
  • Obtain the percentage of 1 cases in each element of the list, as a simplified output (a vector).

Solution 2

list2 <- list(heart$transplant, retinopathy$status)
# Percentage of 0 cases in each element (x == 0 also works for a factor with levels "0" and "1")
sapply(list2, function(x) { mean(x == 0) * 100 })
## [1] 59.88372 60.65990
# Percentage of 1 cases in each element
sapply(list2, function(x) { mean(x == 1) * 100 })
## [1] 40.11628 39.34010
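When the function passed to sapply() returns more than one value, sapply() simplifies the result to a matrix with one column per list element. The sketch below uses a hypothetical helper, pct_01(), to return the percentages of 0 and 1 cases in a single call. The data are made up, and one element is a factor to show that the comparison also works for factor variables:

```r
# Hypothetical helper: percentage of 0 and 1 cases in a vector or factor
pct_01 <- function(x) {
  c("0" = mean(x == 0) * 100, "1" = mean(x == 1) * 100)
}

# Made-up stand-ins for the list2 elements
list_demo <- list(factor(c(0, 0, 1, 1)), c(0, 1, 1, 1, 1))

# Each call returns a length-2 vector, so sapply() returns a 2 x 2 matrix
pct_matrix <- sapply(list_demo, pct_01)
pct_matrix
```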

Task 3

  • Do you remember the practical Control_Flow_and_Functions: Writing your own function (Tasks 1 and 2)? Now create the same function (called summary_df) again, but without using a for loop. Apply the function to the retinopathy data set.

Use the functions summary_continuous() and summary_categorical().
summary_continuous <- function(x) {
  paste0(round(mean(x), 1), " (", round(sd(x), 1), ")")
}

summary_categorical <- function(x) {
  tab <- prop.table(table(x))
  paste0(round(tab * 100, 1), "% ", names(tab), collapse = ", ")
}

Solution 3

summary_continuous <- function(x) {
  paste0(round(mean(x), 1), " (", round(sd(x), 1), ")")
}

summary_categorical <- function(x) {
  tab <- prop.table(table(x))
  paste0(round(tab * 100, 1), "% ", names(tab), collapse = ", ")
}


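summary_df <- function(dat) {
  # Summarise factor columns as the percentage of each level;
  # dat[vec] (rather than dat[, vec]) keeps a data frame even if only one column is selected
  vec_categorical <- sapply(dat, is.factor)
  print(sapply(dat[vec_categorical], summary_categorical))
  # Summarise numeric columns as mean (SD)
  vec_continuous <- sapply(dat, is.numeric)
  print(sapply(dat[vec_continuous], summary_continuous))
}

# In the output below, trt and status appear as categorical, which requires them to be stored as factors
summary_df(dat = retinopathy)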
##                         laser                           eye                          type 
##    "50.8% xenon, 49.2% argon"     "45.2% right, 54.8% left" "57.9% juvenile, 42.1% adult" 
##                           trt                        status 
##                "50% 0, 50% 1"            "60.7% 0, 39.3% 1" 
##              id             age          futime            risk 
## "873.2 (495.5)"   "20.8 (14.8)"   "35.6 (21.4)"     "9.7 (1.5)"

tapply

Task 1

  • Obtain the median year per transplant group using the heart data set.
  • Obtain the median futime per status group using the retinopathy data set.

Solution 1

tapply(heart$year, heart$transplant, median)
##       0       1 
## 3.47707 3.92334
tapply(retinopathy$futime, retinopathy$status, median)
##     0     1 
## 48.53 13.83
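tapply() can be read as "split the values by group, then apply the function to each group". The sketch below, on made-up values, shows that it gives the same medians as an explicit split() followed by sapply():

```r
# Made-up values and a grouping variable
time_demo   <- c(10, 50, 20, 40, 30, 60)
status_demo <- c(1, 0, 1, 0, 1, 0)

# tapply() splits time_demo by status_demo and applies median() to each group
med_tapply <- tapply(time_demo, status_demo, median)

# The same computation written as split() followed by sapply()
med_split <- sapply(split(time_demo, status_demo), median)
```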

Task 2

  • Apply the function Fun1 <- function(x) { mean(x)/(length(x) - 2) } to year, grouped by transplant and surgery, using the heart data set.
  • Obtain the mean futime for each combination of status, type and trt using the retinopathy data set.

Solution 2

Fun1 <- function(x) { mean(x)/(length(x) - 2) }
tapply(heart$year, list(heart$transplant, heart$surgery), Fun1)
##            0         1
## 0 0.03820181 0.2818764
## 1 0.06362334 0.3910915
tapply(retinopathy$futime, list(retinopathy$status, retinopathy$type, retinopathy$trt), mean)
## , , 0
## 
##   juvenile    adult
## 0 45.22127 48.42273
## 1 18.65137 19.25160
## 
## , , 1
## 
##   juvenile    adult
## 0 45.62218 47.92323
## 1 16.66944 21.32833
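When given a list of grouping factors, tapply() returns an array with one cell per combination of groups. Each cell holds the function applied to the values in that combination. The sketch below checks one cell by hand on made-up data with three observations per cell. Note that Fun1 divides by length(x) - 2, so a group with exactly two observations would divide by zero and give Inf, -Inf or NaN:

```r
Fun1 <- function(x) { mean(x)/(length(x) - 2) }

# Made-up data with two grouping variables (3 observations per cell)
y  <- c(1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12)
g1 <- rep(c("a", "b"), each = 6)
g2 <- rep(c("u", "v"), times = 6)

# One cell per combination of g1 (rows) and g2 (columns)
cell_values <- tapply(y, list(g1, g2), Fun1)

# Manual check of the cell g1 = "a", g2 = "u" (values 1, 3, 5)
manual <- Fun1(y[g1 == "a" & g2 == "u"])
```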
 

© Eleni-Rosalina Andrinopoulou