Load packages

If you are using these packages for the first time, you will first have to install them.

# install.packages("survival") 
# install.packages("memisc") 

If you have already installed these packages for your current version of R, you only have to load them.

library(survival)
library(memisc)

Get and view the data

Load a data set from a package.
You can use the double colon operator (::) to return the pbc object from the package survival. We store this data set in an object with the name pbc.

pbc <- survival::pbc
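
As a small illustration of the :: operator, the sketch below calls functions from packages that ship with R without attaching them first. The vector values is made up for this example and is not part of the pbc data.

```r
# Hypothetical toy vector, not part of the pbc data
values <- c(2, 4, 4, 5, 10)

# pkg::name looks up `name` inside `pkg` without calling library(pkg)
med <- stats::median(values)
first_two <- utils::head(values, n = 2)

med        # 4
first_two  # 2 4
```

Writing the package name explicitly also makes it clear where a function comes from when two loaded packages export the same name.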

Print the first 6 rows of the data set using the function head().

head(pbc)

View the data set.

view(pbc)

Common research questions that can be answered in R

What is the average age? To obtain that we can use the function mean().

mean(pbc$age)
## [1] 50.74155

What is the average serum bilirubin?

mean(pbc$bili)
## [1] 3.220813

What is the average serum cholesterol?

mean(pbc$chol)
## [1] NA

The previous code returns NA because that variable contains some missing values. If we carefully check the help page of the function mean(), we will see that there is an argument that handles missing values. In particular, if we set na.rm equal to TRUE, R will only use the observed values to calculate the mean.

mean(pbc$chol, na.rm = TRUE)
## [1] 369.5106
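
To see the effect of na.rm in isolation, here is a minimal sketch on a made-up vector (chol_toy is a hypothetical name, not a column of pbc). It also counts how many values are missing, which is worth knowing before trusting the mean.

```r
# Hypothetical toy vector with two missing values
chol_toy <- c(200, NA, 250, 300, NA)

# Without na.rm the result is NA
mean_with_na <- mean(chol_toy)

# Count the missing values, then average only the observed ones
n_missing <- sum(is.na(chol_toy))
mean_observed <- mean(chol_toy, na.rm = TRUE)

mean_with_na   # NA
n_missing      # 2
mean_observed  # 250
```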

What is the percentage of females?
We can use the function percent() to answer this question. The package memisc should be loaded first.

percent(pbc$sex)
##         m         f         N 
##  10.52632  89.47368 418.00000
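
If memisc is not available, percentages can also be obtained with base R. The sketch below is one possible alternative on a made-up factor (sex_toy is hypothetical); it is not a description of how percent() works internally, and unlike percent() it does not report the total N.

```r
# Hypothetical toy factor, not the pbc data
sex_toy <- factor(c("f", "f", "m", "f"), levels = c("m", "f"))

# table() counts each level, prop.table() turns counts into proportions
pct <- prop.table(table(sex_toy)) * 100

pct  # m: 25, f: 75
```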

We obtained some results in R by answering the aforementioned questions. However, we did not save anything.
For example, if we type “Hello” in R, we get the following:

"Hello"
## [1] "Hello"

In order to obtain this word again, we will have to retype it. An alternative approach is to assign the string “Hello” to a new variable named hi as follows.

hi <- "Hello"

Then we can print this word whenever we type hi.

hi
## [1] "Hello"

Make sure that you have defined the object before you use it.
E.g. number and x will not be found since we did not define them. We can only call them after we have defined them.

# number
number <- 10
number
## [1] 10
# x
x <- 1
x
## [1] 1
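
The following sketch shows what happens when an object is used before it is defined. The name total_toy is hypothetical, and we assume it does not already exist in your session. tryCatch() is used only so that the error message is captured instead of stopping the script.

```r
# `total_toy` is a hypothetical name, assumed not to be defined yet
defined_before <- exists("total_toy")

# Using an undefined object raises an error; tryCatch() captures its message
msg <- tryCatch(total_toy, error = function(e) conditionMessage(e))

# After the assignment the object can be found
total_toy <- 10
defined_after <- exists("total_toy")

defined_before  # FALSE
msg             # the "object not found" error message
defined_after   # TRUE
```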

Things to remember!

  • = is different from ==, e.g. x == 3 asks R whether x is equal to 3. A single = assigns a value, like <-.
  • R is case sensitive, e.g. pbc$Age returns NULL instead of the ages because the column is named age, not Age.
    We need to check the names first using the function names().

names(pbc)
##  [1] "id"       "time"     "status"   "trt"      "age"      "sex"      "ascites"  "hepato"   "spiders" 
## [10] "edema"    "bili"     "chol"     "albumin"  "copper"   "alk.phos" "ast"      "trig"     "platelet"
## [19] "protime"  "stage"

The correct name is age and not Age.

pbc$age
##   [1] 58.76523 56.44627 70.07255 54.74059 38.10541 66.25873 55.53457 53.05681 42.50787 70.55989 53.71389
##  [12] 59.13758 45.68925 56.22177 64.64613 40.44353 52.18344 53.93018 49.56057 59.95346 64.18891 56.27652
##  [23] 55.96715 44.52019 45.07324 52.02464 54.43943 44.94730 63.87680 41.38535 41.55236 53.99589 51.28268
##  [34] 52.06023 48.61875 56.41068 61.72758 36.62697 55.39220 46.66940 33.63450 33.69473 48.87064 37.58248
##  [45] 41.79329 45.79877 47.42779 49.13621 61.15264 53.50856 52.08761 50.54073 67.40862 39.19781 65.76318
##  [56] 33.61807 53.57153 44.56947 40.39425 58.38193 43.89870 60.70637 46.62834 62.90760 40.20260 46.45311
##  [67] 51.28816 32.61328 49.33881 56.39973 48.84600 32.49281 38.49418 51.92060 43.51814 51.94251 49.82615
##  [78] 47.94524 46.51608 67.41136 63.26352 67.31006 56.01369 55.83025 47.21697 52.75838 37.27858 41.39357
##  [89] 52.44353 33.47570 45.60712 76.70910 36.53388 53.91650 46.39014 48.84600 71.89322 28.88433 48.46817
## [100] 51.46886 44.95003 56.56947 48.96372 43.01711 34.03970 68.50924 62.52156 50.35729 44.06297 38.91034
## [111] 41.15264 55.45791 51.23340 52.82683 42.63929 61.07050 49.65640 48.85421 54.25599 35.15127 67.90691
## [122] 55.43600 45.82067 52.88980 47.18138 53.59890 44.10404 41.94935 63.61396 44.22724 62.00137 40.55305
## [133] 62.64476 42.33539 42.96783 55.96167 62.86105 51.24983 46.76249 54.07529 47.03628 55.72621 46.10267
## [144] 52.28747 51.20055 33.86448 75.01164 30.86379 61.80424 34.98700 55.04175 69.94114 49.60438 69.37714
## [155] 43.55647 59.40862 48.75838 36.49281 45.76044 57.37166 42.74333 58.81725 53.49760 43.41410 53.30595
## [166] 41.35524 60.95825 47.75359 35.49076 48.66256 52.66804 49.86995 30.27515 55.56742 52.15332 41.60986
## [177] 55.45243 70.00411 43.94251 42.56810 44.56947 56.94456 40.26010 37.60712 48.36140 70.83641 35.79192
## [188] 62.62286 50.64750 54.52704 52.69268 52.72005 56.77207 44.39699 29.55510 57.04038 44.62697 35.79740
## [199] 40.71732 32.23272 41.09240 61.63997 37.05681 62.57906 48.97741 61.99042 72.77207 61.29500 52.62423
## [210] 49.76318 52.91444 47.26352 50.20397 69.34702 41.16906 59.16496 36.07940 34.59548 42.71321 63.63039
## [221] 56.62971 46.26420 61.24298 38.62012 38.77070 56.69541 58.95140 36.92266 62.41478 34.60917 58.33539
## [232] 50.18207 42.68583 34.37919 33.18275 38.38193 59.76181 66.41205 46.78987 56.07940 41.37440 64.57221
## [243] 67.48802 44.82957 45.77139 32.95003 41.22108 55.41684 47.98084 40.79124 56.97467 68.46270 78.43943
## [254] 39.85763 35.31006 31.44422 58.26420 51.48802 59.96988 74.52430 52.36413 42.78713 34.87474 44.13963
## [265] 46.38193 56.30938 70.90760 55.39493 45.08419 26.27789 50.47228 38.39836 47.41958 47.98084 38.31622
## [276] 50.10815 35.08830 32.50376 56.15332 46.15469 65.88364 33.94387 62.86105 48.56400 46.34908 38.85284
## [287] 58.64750 48.93634 67.57290 65.98494 40.90075 50.24504 57.19644 60.53662 35.35113 31.38125 55.98631
## [298] 52.72553 38.09172 58.17112 45.21013 37.79877 60.65982 35.53457 43.06639 56.39151 30.57358 61.18275
## [309] 58.29979 62.33265 37.99863 33.15264 60.00000 64.99932 54.00137 75.00068 62.00137 43.00068 46.00137
## [320] 44.00000 60.99932 64.00000 40.00000 63.00068 34.00137 52.00000 48.99932 54.00137 63.00068 54.00137
## [331] 46.00137 52.99932 56.00000 56.00000 55.00068 64.99932 56.00000 47.00068 60.00000 52.99932 54.00137
## [342] 50.00137 48.00000 36.00000 48.00000 70.00137 51.00068 52.00000 54.00137 48.00000 66.00137 52.99932
## [353] 62.00137 59.00068 39.00068 67.00068 58.00137 64.00000 46.00137 64.00000 40.99932 48.99932 44.00000
## [364] 59.00068 63.00068 60.99932 64.00000 48.99932 42.00137 50.00137 51.00068 36.99932 62.00137 51.00068
## [375] 52.00000 44.00000 32.99932 60.00000 63.00068 32.99932 40.99932 51.00068 36.99932 59.00068 55.00068
## [386] 54.00137 48.99932 40.00000 67.00068 68.00000 40.99932 68.99932 52.00000 56.99932 36.00000 50.00137
## [397] 64.00000 62.00137 42.00137 44.00000 68.99932 52.00000 66.00137 40.00000 52.00000 46.00137 54.00137
## [408] 51.00068 43.00068 39.00068 51.00068 67.00068 35.00068 67.00068 39.00068 56.99932 58.00137 52.99932
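
The two reminders above can be tried out on a small made-up example. The objects y and toy are hypothetical and are used only for illustration.

```r
# A single = assigns, like <-
y = 3

# A double == compares and returns TRUE or FALSE
is_three <- y == 3
is_four <- y == 4

# Hypothetical toy data frame with a lower-case column name
toy <- data.frame(age = c(50, 60))

# Names are case sensitive: "age" exists, "Age" does not
has_age <- "age" %in% names(toy)
has_Age <- "Age" %in% names(toy)

# For a data frame, $ returns NULL when the column does not exist
wrong_case <- toy$Age

is_three    # TRUE
is_four     # FALSE
has_age     # TRUE
has_Age     # FALSE
wrong_case  # NULL
```

Because a misspelled column name yields NULL rather than an error, checking names() first is a cheap way to avoid silent mistakes.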

Data checks

Check if an object contains missing values. To do that we can use the function is.na().

is.na(x)
## [1] FALSE

The function head() can be used to print the first 6 elements (or rows) of an object.

head(is.na(pbc))
##         id  time status   trt   age   sex ascites hepato spiders edema  bili  chol albumin copper
## [1,] FALSE FALSE  FALSE FALSE FALSE FALSE   FALSE  FALSE   FALSE FALSE FALSE FALSE   FALSE  FALSE
## [2,] FALSE FALSE  FALSE FALSE FALSE FALSE   FALSE  FALSE   FALSE FALSE FALSE FALSE   FALSE  FALSE
## [3,] FALSE FALSE  FALSE FALSE FALSE FALSE   FALSE  FALSE   FALSE FALSE FALSE FALSE   FALSE  FALSE
## [4,] FALSE FALSE  FALSE FALSE FALSE FALSE   FALSE  FALSE   FALSE FALSE FALSE FALSE   FALSE  FALSE
## [5,] FALSE FALSE  FALSE FALSE FALSE FALSE   FALSE  FALSE   FALSE FALSE FALSE FALSE   FALSE  FALSE
## [6,] FALSE FALSE  FALSE FALSE FALSE FALSE   FALSE  FALSE   FALSE FALSE FALSE FALSE   FALSE  FALSE
##      alk.phos   ast  trig platelet protime stage
## [1,]    FALSE FALSE FALSE    FALSE   FALSE FALSE
## [2,]    FALSE FALSE FALSE    FALSE   FALSE FALSE
## [3,]    FALSE FALSE FALSE    FALSE   FALSE FALSE
## [4,]    FALSE FALSE FALSE    FALSE   FALSE FALSE
## [5,]    FALSE FALSE FALSE    FALSE   FALSE FALSE
## [6,]    FALSE FALSE FALSE     TRUE   FALSE FALSE
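
The number of elements that head() returns can be changed with its n argument, and tail() does the same from the end. A minimal sketch on a made-up vector (letters_toy is a hypothetical name built from R's built-in letters constant):

```r
# Hypothetical toy vector: the first ten lower-case letters
letters_toy <- letters[1:10]

first_six <- head(letters_toy)          # default is n = 6
first_three <- head(letters_toy, n = 3)
last_two <- tail(letters_toy, n = 2)

first_six    # "a" "b" "c" "d" "e" "f"
first_three  # "a" "b" "c"
last_two     # "i" "j"
```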

In order to get a summary of the missing values, use the function table().

table(is.na(pbc))
## 
## FALSE  TRUE 
##  7327  1033

The same checks can be applied to a single variable, e.g. age.

is.na(pbc$age)
##   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [417] FALSE FALSE
table(is.na(pbc$age)) 
## 
## FALSE 
##   418
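
A single table over the whole data set does not tell us which variables the missing values belong to. One common way to get a count per column is colSums() on the result of is.na(). The sketch below uses a made-up data frame (toy is hypothetical) so that the counts are easy to verify by eye.

```r
# Hypothetical toy data frame with missing values in two columns
toy <- data.frame(
  age  = c(50, 61, 47),
  chol = c(NA, 240, NA),
  trig = c(172, NA, 88)
)

# is.na() gives a logical matrix; colSums() counts the TRUEs per column
missing_per_column <- colSums(is.na(toy))

# anyNA() answers the yes/no question for the whole object
any_missing <- anyNA(toy)

missing_per_column  # age: 0, chol: 2, trig: 1
any_missing         # TRUE
```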

Use the is.infinite() function to check for infinite values.

is.infinite(pbc$age)
##   [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
##  [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [417] FALSE FALSE
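
Printing one logical value per observation is hard to read for long vectors, so it often helps to wrap the check in any(). The sketch below uses a made-up vector (vals_toy is hypothetical) to show how is.infinite(), is.na() and is.finite() treat Inf, NA and NaN differently.

```r
# Hypothetical toy vector: a number, Inf, NA, -Inf and NaN (0 / 0)
vals_toy <- c(1, Inf, NA, -Inf, 0 / 0)

inf_flags <- is.infinite(vals_toy)  # TRUE only for Inf and -Inf
na_flags <- is.na(vals_toy)         # TRUE for NA and NaN
fin_flags <- is.finite(vals_toy)    # TRUE only for ordinary numbers

# A single TRUE/FALSE answer for the whole vector
has_infinite <- any(inf_flags)

inf_flags     # FALSE  TRUE FALSE  TRUE FALSE
na_flags      # FALSE FALSE  TRUE FALSE  TRUE
fin_flags     #  TRUE FALSE FALSE FALSE FALSE
has_infinite  # TRUE
```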