How to Perform Dunn’s Test in Python
Last Updated: 29 Mar, 2024
Dunn's test is a statistical procedure used for multiple comparisons following a Kruskal-Wallis test. Here's a breakdown of what it does and when it's used:
Dunn’s Test
Dunn’s Test is used after the Kruskal-Wallis one-way analysis of variance by ranks to identify which groups differ from each other. It determines whether the difference between the medians of various groups is statistically significant. Dunn’s Test adjusts for multiple comparisons, making it suitable for analyzing data with several groups.
Dunn’s Test is a non-parametric statistical test used for comparing multiple groups to each other. It's particularly useful when analyzing data with unequal sample sizes or when the assumption of normality is violated.
What is the Kruskal-Wallis test?
The Kruskal-Wallis test is a non-parametric statistical test used to determine whether there are statistically significant differences between three or more independent groups. If the Kruskal-Wallis test indicates significant differences, Dunn's test can be applied post-hoc to identify which specific pairs of groups differ significantly from each other. Dunn's test is tailored for pairwise comparisons following a significant result in the Kruskal-Wallis test, providing insights into specific group differences.
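The omnibus step described above can be sketched as follows. This is a minimal illustration, assuming SciPy is installed; the three lists are hypothetical measurements made up for the example, not data from this article.

```python
from scipy.stats import kruskal

# Hypothetical measurements for three independent groups (illustrative only)
group_a = [2.1, 2.4, 1.9, 2.8, 2.3]
group_b = [3.0, 3.4, 2.9, 3.8, 3.1]
group_c = [4.2, 4.0, 4.6, 3.9, 4.4]

# Kruskal-Wallis H test across all groups at once
statistic, p_value = kruskal(group_a, group_b, group_c)
print(f"H = {statistic:.3f}, p = {p_value:.4f}")

# Only move on to Dunn's test when the omnibus result is significant
alpha = 0.05
run_posthoc = p_value < alpha
print("Run Dunn's test next:", run_posthoc)
```

A significant result here only says that at least one group differs; it does not say which one, which is the question Dunn's test addresses.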
Key points about Dunn's test
- Purpose: Dunn's test is used to identify which specific groups differ from each other when there are statistically significant differences detected between groups in the omnibus test.
- Non-parametric: Like the Kruskal-Wallis and Friedman tests, Dunn's test is non-parametric, meaning it does not rely on assumptions about the distribution of the data.
- Procedure: Dunn's test ranks all observations together and then compares every pair of groups. For each pair it computes the difference in mean ranks, converts it to a z-statistic, and adjusts the resulting p-values for multiple comparisons using methods such as the Bonferroni correction.
- Interpretation: If the adjusted p-value for a pairwise comparison is below a predetermined significance level (e.g., 0.05), it indicates that the difference between those two groups is statistically significant.
Overall, Dunn's test provides a valuable tool for identifying specific group differences in situations where traditional parametric tests are not appropriate or when dealing with ranked data. It helps researchers gain deeper insights into the relationships between multiple groups in their data.
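To make the procedure in the key points concrete, here is a rough sketch of the calculation using only the Python standard library. It assumes the usual form of Dunn's z-statistic with a tie correction and a Bonferroni adjustment; the helper names (average_ranks, dunn_bonferroni) and the sample data are hypothetical, and in practice the library function shown later in this article should be preferred.

```python
from itertools import combinations
from math import sqrt
from statistics import NormalDist


def average_ranks(values):
    """Rank values from 1 to N; tied values share the average of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    tie_sizes = []
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1  # positions are 0-based, ranks 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        tie_sizes.append(end - start + 1)
        start = end + 1
    return ranks, tie_sizes


def dunn_bonferroni(groups):
    """Return {(i, j): Bonferroni-adjusted p-value} for every pair of groups."""
    pooled = [x for g in groups for x in g]
    n_total = len(pooled)
    ranks, tie_sizes = average_ranks(pooled)

    # Mean rank of each group, taken from the pooled ranking
    mean_ranks, pos = [], 0
    for g in groups:
        mean_ranks.append(sum(ranks[pos:pos + len(g)]) / len(g))
        pos += len(g)

    # Variance term, reduced by the correction for tied observations
    tie_term = sum(t**3 - t for t in tie_sizes) / (12 * (n_total - 1))
    base_var = n_total * (n_total + 1) / 12 - tie_term

    n_pairs = len(groups) * (len(groups) - 1) // 2
    adjusted = {}
    for i, j in combinations(range(len(groups)), 2):
        se = sqrt(base_var * (1 / len(groups[i]) + 1 / len(groups[j])))
        z = (mean_ranks[i] - mean_ranks[j]) / se
        p_raw = 2 * (1 - NormalDist().cdf(abs(z)))  # two-sided p-value
        adjusted[(i, j)] = min(1.0, p_raw * n_pairs)  # Bonferroni, capped at 1
    return adjusted


# Hypothetical measurements for three groups (illustrative only)
groups = [[1, 2, 3, 4, 5], [6, 7, 8, 9, 10], [11, 12, 13, 14, 15]]
adjusted = dunn_bonferroni(groups)
for pair, p in adjusted.items():
    print(pair, round(p, 4))
```

The keys of the returned dictionary are zero-based positions of the groups in the input list, so (0, 2) is the comparison between the first and third groups.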
In Python, the scikit-posthocs library provides an efficient way to conduct Dunn’s Test. This article will guide you through the process of performing Dunn’s Test in Python, step by step.
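Syntax to install the scikit-posthocs library (the leading ! runs the command from a Jupyter or Colab notebook cell; omit it in a terminal):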
! pip install scikit-posthocs
posthoc_dunn() Function:
Syntax:
scikit_posthocs.posthoc_dunn(a, val_col: str = None, group_col: str = None, p_adjust: str = None, sort: bool = True)
Parameters:
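- a : an array-like object, a pandas DataFrame, or a Series containing the data.
- val_col : name of the DataFrame column that holds the measured values (the dependent variable). Used when a is a DataFrame.
- group_col : name of the DataFrame column that holds the group labels (the grouping or predictor variable). Used when a is a DataFrame.
- p_adjust : method used to adjust the p-values for multiple comparisons, given as a string. Possible values include:
  - 'bonferroni'
  - 'hommel'
  - 'holm-sidak'
  - 'holm'
  - 'simes-hochberg', and more.
Returns: a table (pandas DataFrame) of pairwise p-values.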
Hypotheses:
This is a hypothesis test. For each pair of groups, the two hypotheses are as follows:
- Null hypothesis: The two groups have the same median.
- Alternative hypothesis: The two groups have different medians.
1. Import Necessary Libraries
Import the required libraries for data manipulation and Dunn’s Test:
Python3
# Importing necessary packages and modules
import pandas as pd
import scikit_posthocs as sp
from sklearn.datasets import load_iris
2. Load Your Dataset
Load your dataset into a pandas DataFrame. Ensure your data is structured appropriately for comparison:
Python3
# Load the dataset
iris_dataset = load_iris(as_frame=True)
dataset = iris_dataset.frame
print(dataset.head())
Output:
sepal length (cm) sepal width (cm) petal length (cm) petal width (cm) \
0 5.1 3.5 1.4 0.2
1 4.9 3.0 1.4 0.2
2 4.7 3.2 1.3 0.2
3 4.6 3.1 1.5 0.2
4 5.0 3.6 1.4 0.2
target
0 0
1 0
2 0
3 0
4 0
3. Prepare Your Data
Extract the data you want to compare. In this example, let's compare sepal widths among different species:
Python3
# Data containing sepal width of the three species
data = [dataset[dataset['target'] == 0]['sepal width (cm)'],
        dataset[dataset['target'] == 1]['sepal width (cm)'],
        dataset[dataset['target'] == 2]['sepal width (cm)']]
4. Perform Dunn’s Test
Python3
# Using the posthoc_dunn() function
p_values = sp.posthoc_dunn(data, p_adjust='holm')
print(p_values)
Output:
1 2 3
1 1.000000e+00 2.047087e-14 1.536598e-07
2 2.047087e-14 1.000000e+00 1.580934e-02
3 1.536598e-07 1.580934e-02 1.000000e+00
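The rows and columns are numbered 1 to 3 in the order the samples were passed, so groups 1, 2, and 3 correspond to targets 0, 1, and 2 in the dataset.
- For the difference between groups 1 and 2, the adjusted p-value is 2.047087e-14
- For the difference between groups 1 and 3, the adjusted p-value is 1.536598e-07
- For the difference between groups 2 and 3, the adjusted p-value is 1.580934e-02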
5. Compare with significance level
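Let's assume our significance level is 0.05. An adjusted p-value below this level indicates a statistically significant difference between the two groups, so we compare every entry of p_values with 0.05.
Python3
# Flag the pairwise comparisons whose adjusted p-value is below 0.05
significant = p_values < 0.05
print(significant)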
Output:
1 2 3
1 False True True
2 True False True
3 True True False
This indicates that:
- Group 1 is significantly different from Group 2 and Group 3.
- Group 2 is significantly different from Group 1 and Group 3.
- Group 3 is significantly different from Group 1 and Group 2.
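The same conclusion can be pulled out of the p-value table programmatically. The sketch below re-creates the table from the output printed in step 4 (so it runs on its own) and lists each pair once; the variable names are illustrative.

```python
from itertools import combinations

import pandas as pd

# Adjusted p-values copied from the Dunn's test output shown in step 4
p_values = pd.DataFrame(
    [[1.000000e+00, 2.047087e-14, 1.536598e-07],
     [2.047087e-14, 1.000000e+00, 1.580934e-02],
     [1.536598e-07, 1.580934e-02, 1.000000e+00]],
    index=[1, 2, 3],
    columns=[1, 2, 3],
)

alpha = 0.05

# Visit each unordered pair once and keep those below the significance level
significant_pairs = [
    (a, b, p_values.loc[a, b])
    for a, b in combinations(p_values.index, 2)
    if p_values.loc[a, b] < alpha
]

for a, b, p in significant_pairs:
    print(f"Group {a} vs Group {b}: adjusted p = {p:.3e}")
```

Listing pairs this way avoids reading the symmetric table twice and skips the diagonal, where each group is compared with itself.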
Conclusion
Performing Dunn’s Test in Python using the scikit-posthocs library is straightforward and efficient. By following the steps outlined in this article, you can accurately assess the differences between multiple groups in your dataset. Dunn’s Test is a valuable tool for post hoc analysis, providing insights into group comparisons beyond traditional statistical methods.