If you are using these packages for the first time, you will first have to install them (uncomment the lines below).
# install.packages("survival")
# install.packages("memisc")
If you have already installed these packages in your current version of R, you only have to load them.
library(survival)
library(memisc)
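The install-then-load steps above can also be combined so that a package is installed only when it is missing. The following is a minimal sketch; the helper name ensure_package() is hypothetical and not part of any package, and the demonstration call uses stats, which ships with R, so nothing is downloaded.

```r
# Hypothetical helper (an assumption for illustration): install a package
# only if it cannot be found, then report whether it is available.
ensure_package <- function(pkg) {
  if (!requireNamespace(pkg, quietly = TRUE)) {
    install.packages(pkg)  # only reached when the package is missing
  }
  requireNamespace(pkg, quietly = TRUE)
}

# "stats" is part of every R installation, so this returns TRUE immediately.
stats_available <- ensure_package("stats")
stats_available
```

The same call with "survival" or "memisc" would install the package if needed; library() is still required afterwards to attach it.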
Load a data set from a package.
You can use the double colon operator (::) to access the pbc object from the package survival. We store this data set in an object with the name pbc.
pbc <- survival::pbc
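As a small illustration of the :: operator, the sketch below calls functions from the stats and utils packages, which ship with every R installation. The object names med and first_letters are chosen here only for the example.

```r
# pkg::name fetches an exported object from a package,
# even if the package has not been attached with library().
med <- stats::median(c(1, 3, 10))
med

# The same pattern works for any exported function or data set.
first_letters <- utils::head(letters, 3)
first_letters
```

Writing the package name explicitly also makes it clear where a function comes from when two loaded packages export the same name.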
Print the first 6 rows of the data set using the function head().
head(pbc)
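The number of rows that head() prints can be changed with its argument n, and tail() does the same for the last rows. A minimal sketch on a small made-up data frame (toy is an assumption for illustration, not part of pbc):

```r
# A toy data frame with 10 rows, used only for illustration.
toy <- data.frame(id = 1:10, value = (1:10)^2)

first3 <- head(toy, n = 3)  # first 3 rows instead of the default 6
last2 <- tail(toy, n = 2)   # last 2 rows
dims <- dim(toy)            # number of rows and columns

first3
last2
dims
```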
View the data set.
view(pbc)
What is the average age? To obtain it, we can use the function mean().
mean(pbc$age)
## [1] 50.74155
What is the average serum bilirubin?
mean(pbc$bili)
## [1] 3.220813
What is the average serum cholesterol?
mean(pbc$chol)
## [1] NA
The previous code returned NA because this variable contains some missing values. If we carefully check the help page of the function mean(), we will see that there is an argument that handles missing values. In particular, if we set na.rm equal to TRUE, R will use only the observed values to calculate the mean.
mean(pbc$chol, na.rm = TRUE)
## [1] 369.5106
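The effect of na.rm is easier to see on a short vector. The sketch below uses made-up values (chol_toy is an assumption for illustration) and also counts how many values are missing.

```r
# Toy vector with two missing values, used only for illustration.
chol_toy <- c(200, NA, 300, 250, NA)

mean_with_na <- mean(chol_toy)               # NA: a missing value propagates
n_missing <- sum(is.na(chol_toy))            # number of missing values
mean_observed <- mean(chol_toy, na.rm = TRUE) # mean of the 3 observed values

mean_with_na
n_missing
mean_observed
```

Checking the number of missing values first is useful, because the mean with na.rm = TRUE is based on fewer observations than the length of the vector.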
What is the percentage of females?
We can use the function percent() to answer this question. The package memisc should be loaded first.
percent(pbc$sex)
## m f N
## 10.52632 89.47368 418.00000
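If memisc is not available, similar percentages can be obtained with base R by combining table() and prop.table(). This is an alternative to percent(), not what the text above uses; the factor sex_toy is made up for illustration.

```r
# Toy factor with 1 male and 3 females, used only for illustration.
sex_toy <- factor(c("f", "f", "m", "f"), levels = c("m", "f"))

counts <- table(sex_toy)         # frequency of each level
pct <- 100 * prop.table(counts)  # convert frequencies to percentages

counts
pct
```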
We obtained some results in R by answering the questions above. However, we did not save anything.
For example, if I type “Hello” in R, I get the following:
"Hello"
## [1] "Hello"
In order to obtain this word again, we will have to retype it. An alternative approach is to assign the string “Hello” to a new variable named hi as follows.
hi <- "Hello"
Then we can print this word whenever we type hi.
hi
## [1] "Hello"
Make sure that you have defined an object before you use it.
E.g. number and x will not be found if we have not defined them. We can only call them after we have defined them.
# number
number <- 10
number
## [1] 10
# x
x <- 1
x
## [1] 1
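Whether an object has already been defined can be checked with exists(), which takes the object name as a string. A minimal sketch, assuming that no object called my_number exists in the session yet (the name is made up for illustration):

```r
# Before the assignment the name cannot be found.
defined_before <- exists("my_number")

my_number <- 10

# After the assignment the name is found.
defined_after <- exists("my_number")

defined_before
defined_after
```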
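= is different from ==, e.g. x == 3 is asking R a question (is x equal to 3?). The single = is equivalent to <- when it is used for assignment.
pbc$Age will not return the ages because there is a typo in the variable name. We can list the variable names with names().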
names(pbc)
## [1] "id" "time" "status" "trt" "age" "sex" "ascites" "hepato" "spiders"
## [10] "edema" "bili" "chol" "albumin" "copper" "alk.phos" "ast" "trig" "platelet"
## [19] "protime" "stage"
The correct name is age and not Age.
pbc$age
## [1] 58.76523 56.44627 70.07255 54.74059 38.10541 66.25873 55.53457 53.05681 42.50787 70.55989 53.71389
## [12] 59.13758 45.68925 56.22177 64.64613 40.44353 52.18344 53.93018 49.56057 59.95346 64.18891 56.27652
## [23] 55.96715 44.52019 45.07324 52.02464 54.43943 44.94730 63.87680 41.38535 41.55236 53.99589 51.28268
## [34] 52.06023 48.61875 56.41068 61.72758 36.62697 55.39220 46.66940 33.63450 33.69473 48.87064 37.58248
## [45] 41.79329 45.79877 47.42779 49.13621 61.15264 53.50856 52.08761 50.54073 67.40862 39.19781 65.76318
## [56] 33.61807 53.57153 44.56947 40.39425 58.38193 43.89870 60.70637 46.62834 62.90760 40.20260 46.45311
## [67] 51.28816 32.61328 49.33881 56.39973 48.84600 32.49281 38.49418 51.92060 43.51814 51.94251 49.82615
## [78] 47.94524 46.51608 67.41136 63.26352 67.31006 56.01369 55.83025 47.21697 52.75838 37.27858 41.39357
## [89] 52.44353 33.47570 45.60712 76.70910 36.53388 53.91650 46.39014 48.84600 71.89322 28.88433 48.46817
## [100] 51.46886 44.95003 56.56947 48.96372 43.01711 34.03970 68.50924 62.52156 50.35729 44.06297 38.91034
## [111] 41.15264 55.45791 51.23340 52.82683 42.63929 61.07050 49.65640 48.85421 54.25599 35.15127 67.90691
## [122] 55.43600 45.82067 52.88980 47.18138 53.59890 44.10404 41.94935 63.61396 44.22724 62.00137 40.55305
## [133] 62.64476 42.33539 42.96783 55.96167 62.86105 51.24983 46.76249 54.07529 47.03628 55.72621 46.10267
## [144] 52.28747 51.20055 33.86448 75.01164 30.86379 61.80424 34.98700 55.04175 69.94114 49.60438 69.37714
## [155] 43.55647 59.40862 48.75838 36.49281 45.76044 57.37166 42.74333 58.81725 53.49760 43.41410 53.30595
## [166] 41.35524 60.95825 47.75359 35.49076 48.66256 52.66804 49.86995 30.27515 55.56742 52.15332 41.60986
## [177] 55.45243 70.00411 43.94251 42.56810 44.56947 56.94456 40.26010 37.60712 48.36140 70.83641 35.79192
## [188] 62.62286 50.64750 54.52704 52.69268 52.72005 56.77207 44.39699 29.55510 57.04038 44.62697 35.79740
## [199] 40.71732 32.23272 41.09240 61.63997 37.05681 62.57906 48.97741 61.99042 72.77207 61.29500 52.62423
## [210] 49.76318 52.91444 47.26352 50.20397 69.34702 41.16906 59.16496 36.07940 34.59548 42.71321 63.63039
## [221] 56.62971 46.26420 61.24298 38.62012 38.77070 56.69541 58.95140 36.92266 62.41478 34.60917 58.33539
## [232] 50.18207 42.68583 34.37919 33.18275 38.38193 59.76181 66.41205 46.78987 56.07940 41.37440 64.57221
## [243] 67.48802 44.82957 45.77139 32.95003 41.22108 55.41684 47.98084 40.79124 56.97467 68.46270 78.43943
## [254] 39.85763 35.31006 31.44422 58.26420 51.48802 59.96988 74.52430 52.36413 42.78713 34.87474 44.13963
## [265] 46.38193 56.30938 70.90760 55.39493 45.08419 26.27789 50.47228 38.39836 47.41958 47.98084 38.31622
## [276] 50.10815 35.08830 32.50376 56.15332 46.15469 65.88364 33.94387 62.86105 48.56400 46.34908 38.85284
## [287] 58.64750 48.93634 67.57290 65.98494 40.90075 50.24504 57.19644 60.53662 35.35113 31.38125 55.98631
## [298] 52.72553 38.09172 58.17112 45.21013 37.79877 60.65982 35.53457 43.06639 56.39151 30.57358 61.18275
## [309] 58.29979 62.33265 37.99863 33.15264 60.00000 64.99932 54.00137 75.00068 62.00137 43.00068 46.00137
## [320] 44.00000 60.99932 64.00000 40.00000 63.00068 34.00137 52.00000 48.99932 54.00137 63.00068 54.00137
## [331] 46.00137 52.99932 56.00000 56.00000 55.00068 64.99932 56.00000 47.00068 60.00000 52.99932 54.00137
## [342] 50.00137 48.00000 36.00000 48.00000 70.00137 51.00068 52.00000 54.00137 48.00000 66.00137 52.99932
## [353] 62.00137 59.00068 39.00068 67.00068 58.00137 64.00000 46.00137 64.00000 40.99932 48.99932 44.00000
## [364] 59.00068 63.00068 60.99932 64.00000 48.99932 42.00137 50.00137 51.00068 36.99932 62.00137 51.00068
## [375] 52.00000 44.00000 32.99932 60.00000 63.00068 32.99932 40.99932 51.00068 36.99932 59.00068 55.00068
## [386] 54.00137 48.99932 40.00000 67.00068 68.00000 40.99932 68.99932 52.00000 56.99932 36.00000 50.00137
## [397] 64.00000 62.00137 42.00137 44.00000 68.99932 52.00000 66.00137 40.00000 52.00000 46.00137 54.00137
## [408] 51.00068 43.00068 39.00068 51.00068 67.00068 35.00068 67.00068 39.00068 56.99932 58.00137 52.99932
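The two points above, assignment versus comparison and case-sensitive variable names, can be sketched on small made-up objects (a, b and toy_df are assumptions for illustration):

```r
a <- 3   # assignment with the arrow
b = 3    # assignment with a single equals sign, same effect here

same <- (a == b)     # comparison: are a and b equal?
is_four <- (a == 4)  # comparison: is a equal to 4?

# Column names are case sensitive.
toy_df <- data.frame(age = c(50, 60))
has_Age <- "Age" %in% names(toy_df)  # no column with a capital A
has_age <- "age" %in% names(toy_df)  # the correct name
wrong <- toy_df$Age                  # NULL, because the column does not exist

same
is_four
has_Age
has_age
wrong
```

Note that a misspelled column name after $ gives NULL rather than an error, so the typo can go unnoticed until a later step fails.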
Check whether an object contains missing values. To do that, we can use the function is.na().
is.na(x)
## [1] FALSE
The function head() can be used to print the first 6 elements (or rows) of an object.
head(is.na(pbc))
## id time status trt age sex ascites hepato spiders edema bili chol albumin copper
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## alk.phos ast trig platelet protime stage
## [1,] FALSE FALSE FALSE FALSE FALSE FALSE
## [2,] FALSE FALSE FALSE FALSE FALSE FALSE
## [3,] FALSE FALSE FALSE FALSE FALSE FALSE
## [4,] FALSE FALSE FALSE FALSE FALSE FALSE
## [5,] FALSE FALSE FALSE FALSE FALSE FALSE
## [6,] FALSE FALSE FALSE TRUE FALSE FALSE
In order to get a summary of the missing values, use the function table().
table(is.na(pbc))
##
## FALSE TRUE
## 7327 1033
is.na(pbc$age)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [417] FALSE FALSE
table(is.na(pbc$age))
##
## FALSE
## 418
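To see which variables contain the missing values, the logical matrix returned by is.na() can be summed by column. A minimal sketch on a made-up data frame (toy_na is an assumption for illustration); the same calls should work on pbc.

```r
# Toy data frame with missing values in two of its three columns.
toy_na <- data.frame(a = c(1, NA, 3), b = c(NA, NA, 6), c = 1:3)

na_per_column <- colSums(is.na(toy_na))  # missing values in each column
na_total <- sum(is.na(toy_na))           # missing values in the whole data set
has_na <- anyNA(toy_na)                  # TRUE if at least one value is missing

na_per_column
na_total
has_na
```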
Use the is.infinite() function to check for infinite values.
is.infinite(pbc$age)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [49] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [65] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [161] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [177] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [209] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [273] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [321] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [337] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [353] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [369] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [385] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [401] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [417] FALSE FALSE
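Since age contains no infinite values, the behaviour of is.infinite() is easier to see on a made-up vector. The sketch below (v is an assumption for illustration) also shows that missing values are neither infinite nor finite.

```r
# Toy vector mixing ordinary, infinite and missing values.
v <- c(1, Inf, -Inf, NA, NaN, 1 / 0)

inf_flags <- is.infinite(v)  # TRUE only for Inf and -Inf
fin_flags <- is.finite(v)    # TRUE only for ordinary numbers
n_inf <- sum(inf_flags)      # number of infinite values

inf_flags
fin_flags
n_inf
```

As with is.na(), wrapping the result in table() or sum() gives a compact summary instead of one value per observation.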