# make data:
x <- rnorm(250, mean = 20.5, sd = 4)
Test if the mean of x is equal to 0:
t.test(x)
##
## One Sample t-test
##
## data: x
## t = 79.583, df = 249, p-value < 2.2e-16
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## 19.97675 20.99062
## sample estimates:
## mean of x
## 20.48368
Test if the mean of x is equal to 21:
t.test(x, mu = 21)
##
## One Sample t-test
##
## data: x
## t = -2.006, df = 249, p-value = 0.04594
## alternative hypothesis: true mean is not equal to 21
## 95 percent confidence interval:
## 19.97675 20.99062
## sample estimates:
## mean of x
## 20.48368
By default, a two-sided test is performed. To do a one-sided test, the argument alternative can be set to 'less' or 'greater':
t.test(x, mu = 21, alternative = 'less')
##
## One Sample t-test
##
## data: x
## t = -2.006, df = 249, p-value = 0.02297
## alternative hypothesis: true mean is less than 21
## 95 percent confidence interval:
## -Inf 20.90863
## sample estimates:
## mean of x
## 20.48368
Note that the lower limit of the confidence interval became -Inf because we are performing a one-sided test, and that the upper bound is reduced compared to the two-sided test. The output we see is produced by the print() function, which is called automatically. To access the elements of the test result separately, we first store it in an object (here also changing the confidence level to 97.5%):
testres <- t.test(x, mu = 21, alternative = 'less', conf.level = 0.975)
names(testres)
## [1] "statistic" "parameter" "p.value" "conf.int" "estimate" "null.value" "stderr"
## [8] "alternative" "method" "data.name"
class(testres)
## [1] "htest"
is.list(testres)
## [1] TRUE
This means we can treat testres as a list:
testres$p.value
## [1] 0.02296972
testres$conf.int
## [1] -Inf 20.99062
## attr(,"conf.level")
## [1] 0.975
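The one-sided test in the other direction works the same way; only the alternative changes (a quick check, output not shown):
t.test(x, mu = 21, alternative = 'greater')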
To compare the means of two samples, we generate a second sample:
y <- rnorm(250, mean = 20, sd = 2)
By default, the two-sample test assumes that the samples have unequal variances (a Welch test) and that they are independent. How can you know this? It is written in the help file!
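To check this yourself, the help page can be opened directly from the R console:
# open the documentation for t.test()
?t.test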
t.test(x, y)
##
## Welch Two Sample t-test
##
## data: x and y
## t = 1.6151, df = 373.63, p-value = 0.1071
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1017967 1.0381395
## sample estimates:
## mean of x mean of y
## 20.48368 20.01551
If we are willing to assume equal variances, we can set var.equal = TRUE to get the classical two-sample t-test:
t.test(x, y, var.equal = TRUE)
##
## Two Sample t-test
##
## data: x and y
## t = 1.6151, df = 498, p-value = 0.1069
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1013345 1.0376773
## sample estimates:
## mean of x mean of y
## 20.48368 20.01551
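If you are unsure whether assuming equal variances is reasonable, the two variances can be compared formally with var.test(), an F test from the stats package (shown here only as a quick sketch; output not shown):
# F test comparing the variances of x and y
var.test(x, y)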
A t-test for a paired sample can be performed by setting the argument paired = TRUE:
t.test(x, y, paired = TRUE)
##
## Paired t-test
##
## data: x and y
## t = 1.6089, df = 249, p-value = 0.1089
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1049527 1.0412955
## sample estimates:
## mean of the differences
## 0.4681714
Again, we can change mu to test whether the difference is equal to that value instead of zero. The paired test is equivalent to performing a one-sample t-test on the differences x - y:
t.test(x - y)
##
## One Sample t-test
##
## data: x - y
## t = 1.6089, df = 249, p-value = 0.1089
## alternative hypothesis: true mean is not equal to 0
## 95 percent confidence interval:
## -0.1049527 1.0412955
## sample estimates:
## mean of x
## 0.4681714
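We can confirm this equivalence directly, for example by comparing the two p-values (a small check, output not shown):
# the paired test and the one-sample test on the differences give the same p-value
all.equal(t.test(x, y, paired = TRUE)$p.value, t.test(x - y)$p.value)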
It is also possible to specify the test using a formula. This can be more convenient when we have the data in a data.frame:
dat <- data.frame(value = c(x, y),
                  group = rep(c('x', 'y'), c(length(x), length(y))))
summary(dat)
## value group
## Min. : 7.567 Length:500
## 1st Qu.:18.374 Class :character
## Median :20.069 Mode :character
## Mean :20.250
## 3rd Qu.:22.118
## Max. :32.675
t.test(value ~ group, data = dat)
##
## Welch Two Sample t-test
##
## data: value by group
## t = 1.6151, df = 373.63, p-value = 0.1071
## alternative hypothesis: true difference in means between group x and group y is not equal to 0
## 95 percent confidence interval:
## -0.1017967 1.0381395
## sample estimates:
## mean in group x mean in group y
## 20.48368 20.01551
or, for the paired version:
t.test(value ~ group, data = dat, paired = TRUE)
##
## Paired t-test
##
## data: value by group
## t = 1.6089, df = 249, p-value = 0.1089
## alternative hypothesis: true difference in means is not equal to 0
## 95 percent confidence interval:
## -0.1049527 1.0412955
## sample estimates:
## mean of the differences
## 0.4681714
A non-parametric alternative to the one-sample t-test is the Wilcoxon signed rank test. Test if the location of x is equal to 20:
wilcox.test(x, mu = 20)
##
## Wilcoxon signed rank test with continuity correction
##
## data: x
## V = 17470, p-value = 0.1195
## alternative hypothesis: true location is not equal to 20
The function wilcox.test() has similar arguments to t.test() (x, y, alternative, mu, paired, conf.level). Note that confidence intervals are only returned if conf.int = TRUE:
wilcox.test(x, conf.int = TRUE, mu = 20)
##
## Wilcoxon signed rank test with continuity correction
##
## data: x
## V = 17470, p-value = 0.1195
## alternative hypothesis: true location is not equal to 20
## 95 percent confidence interval:
## 19.89621 20.89644
## sample estimates:
## (pseudo)median
## 20.38424
The additional argument exact controls whether exact p-values and confidence intervals are calculated or whether the normal approximation is used. In the latter case, the argument correct determines whether a continuity correction is applied.
wilcox.test(x, conf.int = TRUE, exact = TRUE, mu = 20)
##
## Wilcoxon signed rank exact test
##
## data: x
## V = 17470, p-value = 0.1196
## alternative hypothesis: true location is not equal to 20
## 95 percent confidence interval:
## 19.89624 20.89541
## sample estimates:
## (pseudo)median
## 20.38429
wilcox.test(x, y, correct = TRUE, conf.int = TRUE)
##
## Wilcoxon rank sum test with continuity correction
##
## data: x and y
## W = 33034, p-value = 0.2696
## alternative hypothesis: true location shift is not equal to 0
## 95 percent confidence interval:
## -0.2416028 0.8716494
## sample estimates:
## difference in location
## 0.3054734
As with t.test(), we can also use a formula specification, and the test output is of the same class, i.e., the p-value etc. can be accessed like elements of a list.
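For example, using the dat data.frame created above (a brief sketch; output not shown, and wres is just an illustrative object name):
wres <- wilcox.test(value ~ group, data = dat, conf.int = TRUE)
wres$p.value
wres$conf.int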
The Kruskal-Wallis test is an extension of the Wilcoxon rank sum test to more than two groups. We can provide the data as a list():
x <- list(x1 = rnorm(100, mean = 12, sd = 1.5),
          x2 = rnorm(100, mean = 11, sd = 2),
          x3 = rnorm(100, mean = 11.5, sd = 1.8))
kruskal.test(x)
##
## Kruskal-Wallis rank sum test
##
## data: x
## Kruskal-Wallis chi-squared = 16.595, df = 2, p-value = 0.0002492
or as a data.frame with one group (sample) per column:
kruskal.test(as.data.frame(x))
##
## Kruskal-Wallis rank sum test
##
## data: as.data.frame(x)
## Kruskal-Wallis chi-squared = 16.595, df = 2, p-value = 0.0002492
or using the formula specification:
dat <- data.frame(x = c(x$x1, x$x2, x$x3),
                  group = rep(c('x1', 'x2', 'x3'), each = 100))
head(dat)
## x group
## 1 13.47717 x1
## 2 10.16289 x1
## 3 13.06459 x1
## 4 11.83617 x1
## 5 14.67391 x1
## 6 11.63483 x1
kruskal.test(x ~ group, data = dat)
##
## Kruskal-Wallis rank sum test
##
## data: x by group
## Kruskal-Wallis chi-squared = 16.595, df = 2, p-value = 0.0002492
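As with the other tests, the result is an object of class htest, so individual elements can be extracted in the same way (a brief sketch; output not shown, and kres is just an illustrative name):
kres <- kruskal.test(x ~ group, data = dat)
kres$p.value
kres$statistic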
While we can calculate a correlation matrix for a whole set of variables (e.g., with cor()), the function cor.test() can only handle two variables at a time:
cor.test(x = swiss$Fertility, y = swiss$Agriculture)
##
## Pearson's product-moment correlation
##
## data: swiss$Fertility and swiss$Agriculture
## t = 2.5316, df = 45, p-value = 0.01492
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.07334947 0.58130587
## sample estimates:
## cor
## 0.3530792
Using the argument method we can choose between "pearson", "kendall", and "spearman". Other arguments we already know from other tests are alternative, conf.level, and exact (the latter only for "kendall" and "spearman"). We additionally have the argument continuity, which allows us to use a continuity correction for method = "kendall" and method = "spearman".
cor.test(x = swiss$Fertility, y = swiss$Agriculture, method = "kendall",
         exact = FALSE, continuity = TRUE)
##
## Kendall's rank correlation tau
##
## data: swiss$Fertility and swiss$Agriculture
## z = 1.77, p-value = 0.07673
## alternative hypothesis: true tau is not equal to 0
## sample estimates:
## tau
## 0.1795465
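Spearman's rank correlation is requested analogously; as in the Kendall example, exact = FALSE uses the asymptotic approximation (output not shown):
cor.test(x = swiss$Fertility, y = swiss$Agriculture, method = "spearman",
         exact = FALSE)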
It is possible to use a formula specification:
cor.test(~ Fertility + Agriculture, data = swiss)
##
## Pearson's product-moment correlation
##
## data: Fertility and Agriculture
## t = 2.5316, df = 45, p-value = 0.01492
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## 0.07334947 0.58130587
## sample estimates:
## cor
## 0.3530792
This allows us to use the argument subset to select only part of the data:
cor.test(~ Fertility + Agriculture, data = swiss,
         subset = Infant.Mortality > 18)
##
## Pearson's product-moment correlation
##
## data: Fertility and Agriculture
## t = 1.2838, df = 34, p-value = 0.2079
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.122145 0.507691
## sample estimates:
## cor
## 0.2150193
Testing multiple hypotheses, each at a given significance level, results in an overall probability of at least one false positive that is larger than that significance level. We then need to correct for multiple testing. This can be done easily using the function p.adjust():
pval1 <- t.test(swiss$Fertility, mu = 50)$p.value
pval2 <- t.test(swiss$Agriculture, mu = 50)$p.value
pval3 <- t.test(swiss$Examination, mu = 30)$p.value
# combine the unadjusted p-values in one vector
(pvals <- c(Fert = pval1, Agri = pval2, Exam = pval3))
## Fert Agri Exam
## 1.525636e-14 8.430616e-01 2.864543e-15
# Adjustment using the Bonferroni method:
p_adj_Bonf <- p.adjust(pvals, method = 'bonferroni')
# Adjustment using the Benjamini-Hochberg method
p_adj_BH <- p.adjust(pvals, method = "BH")
cbind(pvals, p_adj_Bonf, p_adj_BH)
## pvals p_adj_Bonf p_adj_BH
## Fert 1.525636e-14 4.576907e-14 2.288453e-14
## Agri 8.430616e-01 1.000000e+00 8.430616e-01
## Exam 2.864543e-15 8.593628e-15 8.593628e-15
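To see which hypotheses remain significant at the 5% level after adjustment, the adjusted p-values can simply be compared to the threshold (a small sketch; output not shown):
# named logical vectors indicating significance after adjustment
p_adj_Bonf < 0.05
p_adj_BH < 0.05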
The available correction methods are listed in the character vector p.adjust.methods:
p.adjust.methods
## [1] "holm" "hochberg" "hommel" "bonferroni" "BH" "BY" "fdr"
## [8] "none"