If you are using these packages for the first time, you first have to install them.
# install.packages("survival")
# install.packages("memisc")
If you have already installed these packages in your current version of R, you only have to load them.
library(survival)
library(memisc)
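A script that other people will run can check which packages are available before loading them. The following is a minimal sketch, assuming the two package names used above; the object names `pkgs`, `installed`, and `missing_pkgs` are made up for illustration.

```r
# Packages this practical relies on
pkgs <- c("survival", "memisc")

# requireNamespace() returns TRUE if a package can be loaded, FALSE otherwise
installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)

# Names of the packages that still need to be installed
missing_pkgs <- pkgs[!installed]
missing_pkgs

# Uncomment to install only what is missing
# if (length(missing_pkgs) > 0) install.packages(missing_pkgs)
```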
Load a data set from a package.
You can use the double colon operator (::) to access the pbc object from the survival package. We store this data set in an object named pbc.
pbc <- survival::pbc
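The `::` operator works the same way for any installed package and does not require a prior `library()` call. As a small illustration outside the exercise, the sketch below pulls the built-in `mtcars` data set from the `datasets` package, chosen only because it ships with every R installation; the name `cars_df` is made up.

```r
# Access an object by its fully qualified name: package::object
cars_df <- datasets::mtcars

# Functions can be qualified in the same way
utils::head(cars_df, n = 3)

# mtcars has 32 rows and 11 columns
dim(cars_df)
```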
What are the mean and standard deviation of the variable age in the pbc data set?
mean(pbc$age)
## [1] 50.74155
mean(pbc$age, na.rm = TRUE)
## [1] 50.74155
sd(pbc$age)
## [1] 10.44721
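In the output above, `na.rm = TRUE` makes no difference because age has no missing values. A minimal sketch with a made-up vector `x` (not taken from pbc) shows what the argument does when missing values are present:

```r
# Hypothetical vector with one missing value
x <- c(2, 4, NA, 6)

mean(x)                # NA: a missing value propagates by default
mean(x, na.rm = TRUE)  # 4: mean of 2, 4 and 6
sd(x, na.rm = TRUE)    # 2: standard deviation of 2, 4 and 6
```

Like `mean()`, `sd()` returns NA for this vector unless `na.rm = TRUE` is supplied.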
What are the mean and standard deviation of the variable time in the pbc data set?
mean(pbc$time)
## [1] 1917.782
mean(pbc$time, na.rm = TRUE)
## [1] 1917.782
sd(pbc$time)
## [1] 1104.673
What are the median and interquartile range of the variable age in the pbc data set?
median(pbc$age)
## [1] 51.00068
IQR(pbc$age)
## [1] 15.40862
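The interquartile range is the distance between the 25th and 75th percentiles. A short sketch with a made-up vector `x` shows how `IQR()` relates to `quantile()`, using R's default quantile method:

```r
# Hypothetical data: the integers 1 to 9
x <- c(1, 2, 3, 4, 5, 6, 7, 8, 9)

# 25th and 75th percentiles
q <- quantile(x, probs = c(0.25, 0.75))
q                     # 25% = 3, 75% = 7

# The IQR is the difference between the two
unname(q[2] - q[1])   # 4
IQR(x)                # 4

median(x)             # 5
```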
What are the percentages of placebo and treatment patients in the pbc data set?
To use the percent() function, you need to load the memisc package.
percent(pbc$trt)
## 1 2 N
## 50.64103 49.35897 312.00000
What are the percentages of females and males in the pbc data set?
percent(pbc$sex)
## m f N
## 10.52632 89.47368 418.00000
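Similar percentages can be computed in base R, without memisc, by combining `table()` and `prop.table()`. The sketch below uses a made-up factor `sex` rather than the pbc data; `counts` and `pct` are illustrative names.

```r
# Hypothetical factor with one male and three females
sex <- factor(c("m", "f", "f", "f"), levels = c("m", "f"))

# Frequency table; table() leaves out missing values by default
counts <- table(sex)

# Convert the counts to percentages
pct <- prop.table(counts) * 100
pct          # m: 25, f: 75

# Number of non-missing observations
sum(counts)  # 4
```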
What are the mean and standard deviation of the variable age in the pbc data set for males?
First, we can create a new data frame with the name pbc_males that contains only the male patients.
pbc_males <- pbc[pbc$sex == "m", ]
Then we can obtain the mean and standard deviation of age in this new data frame.
mean(pbc_males$age)
## [1] 55.71072
sd(pbc_males$age)
## [1] 10.9778
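When the same summary is needed for every group, `tapply()` avoids creating one subset per group. This is an alternative to the subsetting approach above, sketched on a small made-up data frame `toy` whose column names mirror pbc but whose values are invented:

```r
# Hypothetical data: two males and two females
toy <- data.frame(
  age = c(50, 60, 40, 70),
  sex = factor(c("m", "f", "m", "f"))
)

# Mean age within each level of sex
group_means <- tapply(toy$age, toy$sex, mean)
group_means   # f: 65, m: 45

# Standard deviation of age within each level of sex
group_sds <- tapply(toy$age, toy$sex, sd)
group_sds
```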
What are the mean and standard deviation of serum bilirubin in the pbc data set?
mean(pbc$bili)
## [1] 3.220813
sd(pbc$bili)
## [1] 4.407506
Check if there are any missing values using the is.na() function.
is.na(pbc$chol)
## [1] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [17] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [33] FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE FALSE FALSE TRUE FALSE FALSE FALSE
## [49] TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [65] FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [81] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE
## [97] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [113] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE
## [129] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [145] FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [161] FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE TRUE FALSE TRUE
## [177] FALSE TRUE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE
## [193] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE
## [209] FALSE FALSE TRUE FALSE FALSE FALSE FALSE TRUE FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE
## [225] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [241] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [257] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [273] FALSE TRUE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE
## [289] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE FALSE FALSE FALSE FALSE
## [305] FALSE FALSE FALSE FALSE FALSE FALSE FALSE FALSE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [321] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [337] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [353] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [369] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [385] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [401] TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE TRUE
## [417] TRUE TRUE
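A full printout of logical values is hard to read for a long vector. Summaries of the logical vector are usually more useful; the sketch below uses a made-up vector `chol_values` (not the pbc data):

```r
# Hypothetical cholesterol measurements with two missing values
chol_values <- c(261, NA, 176, NA, 279)

anyNA(chol_values)         # TRUE: at least one value is missing
sum(is.na(chol_values))    # 2: number of missing values
mean(is.na(chol_values))   # 0.4: proportion of missing values
which(is.na(chol_values))  # 2 4: positions of the missing values
```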
Obtain the complete cases for the variable serum cholesterol of the pbc data set. We can use the function complete.cases() in the row index to select the rows where this variable is not missing.
DF <- pbc[complete.cases(pbc$chol), ]
Obtain the dimensions of a matrix or data frame. We can use the function dim().
dim(pbc)
## [1] 418 20
dim(DF)
## [1] 284 20
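Note that `complete.cases()` applied to a single column checks only that column, whereas applied to a whole data frame it flags every row with a missing value in any column. A sketch on a made-up data frame `toy` (column names borrowed from pbc, values invented):

```r
# Hypothetical data: row 2 misses both values, row 3 misses only trig
toy <- data.frame(
  chol = c(261, NA, 176),
  trig = c(172, NA, NA)
)

# Keep rows where chol is observed, whatever the other columns contain
chol_complete <- toy[complete.cases(toy$chol), ]
nrow(chol_complete)  # 2

# Keep rows where every column is observed
all_complete <- toy[complete.cases(toy), ]
nrow(all_complete)   # 1
```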
Obtain the rows of the pbc data set where the serum cholesterol variable has missing values.
We can use a logical vector in the row index to select the rows where the value is missing.
pbc_chol_na <- pbc[is.na(pbc$chol), ]
head(pbc_chol_na)
## id time status trt age sex ascites hepato spiders edema bili chol albumin copper alk.phos ast
## 14 14 1217 2 2 56.22177 m 1 1 0 1 0.8 NA 2.27 43 728.0 71.00
## 40 40 4467 0 1 46.66940 f 0 0 0 0 1.3 NA 3.34 105 11046.6 104.49
## 41 41 1350 2 1 33.63450 f 0 1 0 0 6.8 NA 3.26 96 1215.0 151.90
## 42 42 4453 0 2 33.69473 f 0 1 1 0 2.1 NA 3.54 122 8778.0 56.76
## 45 45 4025 0 2 41.79329 f 0 0 0 0 0.6 NA 3.93 19 1826.0 71.30
## 49 49 708 2 2 61.15264 f 0 1 0 0 0.8 NA 3.82 58 678.0 97.65
## trig platelet protime stage
## 14 NA 156 11.0 4
## 40 NA 358 11.0 4
## 41 NA 226 11.7 4
## 42 NA 344 11.0 4
## 45 NA 474 10.9 2
## 49 NA 233 11.0 4
Obtain the dimensions of the subset of the pbc data set where the serum cholesterol variable has missing values.
dim(pbc_chol_na)
## [1] 134 20
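The rows with and without a cholesterol value together make up the full data set (284 + 134 = 418). The same partition can be produced in one step with `split()`; the sketch below uses a made-up data frame `toy`, and `parts` is an illustrative name.

```r
# Hypothetical data with two missing cholesterol values
toy <- data.frame(id = 1:5, chol = c(261, NA, 176, NA, 279))

# Split the rows by whether chol is missing; the list elements are
# named "FALSE" (observed) and "TRUE" (missing)
parts <- split(toy, is.na(toy$chol))

nrow(parts[["FALSE"]])  # 3 rows with an observed value
nrow(parts[["TRUE"]])   # 2 rows with a missing value

# The two parts add up to the original number of rows
nrow(parts[["FALSE"]]) + nrow(parts[["TRUE"]]) == nrow(toy)  # TRUE
```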
Outliers: e.g. let’s assume that patients with serum bilirubin values > 25 are outliers.
Select the serum bilirubin outliers.
pbc_out_bili <- pbc[pbc$bili > 25, ]
head(pbc_out_bili)
## id time status trt age sex ascites hepato spiders edema bili chol albumin copper alk.phos
## 144 144 943 2 2 52.28747 f 0 1 0 0.5 28.0 556 3.26 152 3896
## 156 156 853 2 2 59.40862 f 0 1 0 0.0 25.5 358 3.52 219 2468
## ast trig platelet protime stage
## 144 198.4 171 335 10.0 3
## 156 201.5 205 151 11.5 2
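The comparison works directly here because bili has no missing values (its mean above was computed without `na.rm`). For a variable that does contain missing values, a logical row index returns an extra all-NA row for every missing comparison; wrapping the condition in `which()` avoids that. A sketch on a made-up data frame `toy`:

```r
# Hypothetical data with one missing bilirubin value
toy <- data.frame(id = 1:4, bili = c(1.0, 30.0, NA, 26.0))

# The logical index contains an NA, which produces a row of NAs
with_na_rows <- toy[toy$bili > 25, ]
nrow(with_na_rows)  # 3: two matches plus one all-NA row

# which() keeps only the positions where the condition is TRUE
clean <- toy[which(toy$bili > 25), ]
nrow(clean)         # 2
clean$id            # 2 4
```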