Bartlett’s Test in R Programming
Last Updated: 25 Aug, 2020
In statistics, Bartlett's test is used to test whether k samples are drawn from populations with equal variances. Equal variance across populations is called homoscedasticity or homogeneity of variances. Some statistical tests, for example the ANOVA test, assume that variances are equal across groups or samples, and Bartlett's test can be used to verify that assumption. Bartlett's test lets us compare the variances of two or more samples to decide whether they are drawn from populations with equal variance. It is appropriate for normally distributed data. There are several tests for the equality (homogeneity) of variance across groups, including:
- F-test
- Bartlett’s test
- Levene’s test
- Fligner-Killeen test
All of these tests are straightforward to perform in R. In this article, let's perform Bartlett's test in R.
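As a quick illustration (not part of this article's examples), each of these tests has a corresponding R function: var.test() for the F-test, bartlett.test(), fligner.test(), and leveneTest() from the car package. The sketch below applies them to the built-in InsectSprays data purely for demonstration:
R
# Illustrative sketch: homogeneity-of-variance tests in R,
# applied to the built-in InsectSprays data set
# F-test compares exactly two variances
var.test(InsectSprays$count[InsectSprays$spray == "A"],
         InsectSprays$count[InsectSprays$spray == "B"])
# Bartlett's test (stats package)
bartlett.test(count ~ spray, data = InsectSprays)
# Fligner-Killeen test (stats package)
fligner.test(count ~ spray, data = InsectSprays)
# Levene's test (requires the car package)
if (requireNamespace("car", quietly = TRUE)) {
  car::leveneTest(count ~ spray, data = InsectSprays)
}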
Statistical Hypotheses for Bartlett’s test
A hypothesis is a statement about the given problem. Hypothesis testing is a statistical method that is used in making a statistical decision using experimental data. Hypothesis testing is basically an assumption that we make about a population parameter. It evaluates two mutually exclusive statements about a population to determine which statement is best supported by the sample data. To know more about the statistical hypothesis please refer to Understanding Hypothesis Testing. For Bartlett’s test the statistical hypotheses are:
- Null hypothesis: all population variances are equal
- Alternative hypothesis: at least two of the population variances differ
Implementation in R
R provides the bartlett.test() function, available in the stats package, to compute Bartlett's test. The syntax of this function is given below:
Syntax:
bartlett.test(formula, dataset)
Parameters:
formula: a formula of the form values ~ groups
dataset: a matrix or data frame
Returns:
statistic: Bartlett’s K-squared test statistic
parameter: the degrees of freedom of the approximate chi-squared distribution of the test statistic.
p.value: the p-value of the test
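For illustration (this example is not in the original article), the components listed above can be pulled from the object returned by bartlett.test(); here the built-in InsectSprays data set is used:
R
# Run Bartlett's test and access the returned components
res <- bartlett.test(count ~ spray, data = InsectSprays)
res$statistic   # Bartlett's K-squared test statistic
res$parameter   # degrees of freedom of the chi-squared approximation
res$p.value     # p-value of the test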
Two cases can arise depending on the format of the data, and the call to bartlett.test() differs between them.
If the data are in stacked form: the values of all samples are stored in one variable, and a second variable specifies which sample each value belongs to. In this case, use the following command:
bartlett.test(values ~ groups, dataset)
where:
values: the name of the variable containing the data values
groups: the name of the variable that specifies which sample each value belongs to
If the data are in unstacked form: each sample is stored in a separate variable. In this case, nest the variable names inside the list() function, as shown below:
bartlett.test(list(dataset$sample1, dataset$sample2, dataset$sample3))
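As a minimal sketch of the unstacked case (the data frame and the column names sample1, sample2, and sample3 are hypothetical, made up only for illustration):
R
# Hypothetical unstacked data: each sample stored in its own column
set.seed(1)
dataset <- data.frame(sample1 = rnorm(10, mean = 5, sd = 1.0),
                      sample2 = rnorm(10, mean = 5, sd = 1.5),
                      sample3 = rnorm(10, mean = 5, sd = 0.8))
# Nest the columns inside list() because there is no grouping variable
bartlett.test(list(dataset$sample1, dataset$sample2, dataset$sample3))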
Examples of Bartlett's test
Bartlett’s test with one independent variable:
Consider R's built-in PlantGrowth data set, which gives the dried weight of three groups of ten batches of plants, where each group of ten batches received a different treatment. The weight variable gives the weight of the batch, and the group variable gives the treatment received: ctrl, trt1, or trt2. To view the data set, type the command below:
R
# View the built-in PlantGrowth data set
print(PlantGrowth)
Output:
weight group
1 4.17 ctrl
2 5.58 ctrl
3 5.18 ctrl
4 6.11 ctrl
5 4.50 ctrl
6 4.61 ctrl
7 5.17 ctrl
8 4.53 ctrl
9 5.33 ctrl
10 5.14 ctrl
11 4.81 trt1
12 4.17 trt1
13 4.41 trt1
14 3.59 trt1
15 5.87 trt1
16 3.83 trt1
17 6.03 trt1
18 4.89 trt1
19 4.32 trt1
20 4.69 trt1
21 6.31 trt2
22 5.12 trt2
23 5.54 trt2
24 5.50 trt2
25 5.37 trt2
26 5.29 trt2
27 4.92 trt2
28 6.15 trt2
29 5.80 trt2
30 5.26 trt2
Suppose one wants to use Bartlett's test to determine whether the variance in weight is the same for all treatment groups at a significance level of 0.05. Here let's consider only one independent variable. To perform the test, use the command below:
R
# R program to illustrate
# Bartlett’s test
# Using bartlett.test()
result <- bartlett.test(weight ~ group, data = PlantGrowth)
# print the result
print(result)
Output:
Bartlett test of homogeneity of variances
data: weight by group
Bartlett's K-squared = 2.8786, df = 2, p-value = 0.2371
Explanation:
From the output, it can be seen that the p-value of 0.2371 is not less than the significance level of 0.05. This means the null hypothesis that the variance is the same for all treatment groups cannot be rejected. We conclude that there is no evidence to suggest that the variance in plant growth differs across the three treatment groups.
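As a sanity check (not part of the original example), Bartlett's K-squared can be recomputed by hand from the group sizes and group variances; the result should match the value reported by bartlett.test() above:
R
# Recompute Bartlett's K-squared for PlantGrowth by hand
x  <- PlantGrowth$weight
g  <- PlantGrowth$group
ni <- tapply(x, g, length)   # group sizes
vi <- tapply(x, g, var)      # group variances
k  <- length(ni)             # number of groups
N  <- sum(ni)                # total number of observations
sp2 <- sum((ni - 1) * vi) / (N - k)                      # pooled variance
num <- (N - k) * log(sp2) - sum((ni - 1) * log(vi))
den <- 1 + (sum(1 / (ni - 1)) - 1 / (N - k)) / (3 * (k - 1))
K2  <- num / den                                         # Bartlett's K-squared
pchisq(K2, df = k - 1, lower.tail = FALSE)               # p-value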
Bartlett’s test with multiple independent variables:
To perform the test with multiple independent variables, the interaction() function must be used to collapse the factors into a single variable containing all combinations of the factors. Here, let's use R's built-in ToothGrowth data set.
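Before running the test, it can help to see what interaction() produces; the quick check below (illustrative only, not part of the original example) shows that supp and dose are collapsed into a single factor with six levels, which is why the test output reports df = 5:
R
# Collapse supp and dose into one grouping factor
grp <- interaction(ToothGrowth$supp, ToothGrowth$dose)
levels(grp)   # six supp/dose combinations
table(grp)    # number of observations per combination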
R
# R program to illustrate
# Bartlett’s test
# Print the first 10 rows
# of the data set
print(head(ToothGrowth, 10))
# Applying bartlett.test()
result <- bartlett.test(len ~ interaction(supp, dose),
                        data = ToothGrowth)
# Print the result
print(result)
Output:
len supp dose
1 4.2 VC 0.5
2 11.5 VC 0.5
3 7.3 VC 0.5
4 5.8 VC 0.5
5 6.4 VC 0.5
6 10.0 VC 0.5
7 11.2 VC 0.5
8 11.2 VC 0.5
9 5.2 VC 0.5
10 7.0 VC 0.5
Bartlett test of homogeneity of variances
data: len by interaction(supp, dose)
Bartlett's K-squared = 6.9273, df = 5, p-value = 0.2261
Explanation:
Here as well, the p-value of 0.2261 is not less than the significance level of 0.05, so the null hypothesis cannot be rejected: there is no evidence that the variance in tooth length differs across the combinations of supp and dose.