15 Statistical Hypothesis Tests in Python (Cheat Sheet)
In this post, you will discover a cheat sheet for the most popular statistical hypothesis tests for a machine learning project with
examples using the Python API.
Note, when it comes to assumptions such as the expected distribution of data or sample size, the results of a given test are likely
to degrade gracefully rather than become immediately unusable if an assumption is violated.
Generally, data samples need to be representative of the domain and large enough to expose their distribution to analysis.
In some cases, the data can be corrected to meet the assumptions, such as correcting a nearly normal distribution to be normal
by removing outliers, or using a correction to the degrees of freedom in a statistical test when samples have differing variance, to
name two examples.
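The degrees-of-freedom correction for samples with differing variance is what Welch’s t-test applies. Here is a minimal sketch, assuming NumPy and SciPy are installed and using synthetic samples with very different spreads. It compares the standard Student’s t-test with Welch’s version through the `equal_var` parameter of `scipy.stats.ttest_ind`.

```python
# Sketch: Student's t-test vs. Welch's t-test on synthetic data.
# Welch's version corrects the degrees of freedom for unequal variances.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
data1 = rng.normal(loc=50, scale=1, size=20)   # small spread
data2 = rng.normal(loc=52, scale=10, size=60)  # large spread

# Student's t-test pools the variances (assumes they are equal)
stat_student, p_student = ttest_ind(data1, data2)
# Welch's t-test keeps separate variances and adjusts the degrees of freedom
stat_welch, p_welch = ttest_ind(data1, data2, equal_var=False)

print('Student: stat=%.3f, p=%.3f' % (stat_student, p_student))
print('Welch:   stat=%.3f, p=%.3f' % (stat_welch, p_welch))
```

With equal sample sizes the two t statistics coincide and only the degrees of freedom (and therefore the p-values) differ. With unequal sizes, as here, the statistics themselves differ too.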
Finally, there may be multiple tests for a given concern, e.g. normality. We cannot get crisp answers to questions with statistics;
instead, we get probabilistic answers. As such, we can arrive at different answers to the same question by considering the
question in different ways. Hence the need for multiple different tests for some questions we may have about data.
https://machinelearningmastery.com/statistical-hypothesis-tests-in-python-cheat-sheet/ 1/11
8/18/2018 15 Statistical Hypothesis Tests in Python (Cheat Sheet)
1. Normality Tests
2. Correlation Tests
3. Parametric Statistical Hypothesis Tests
4. Nonparametric Statistical Hypothesis Tests
1. Normality Tests
This section lists statistical tests that you can use to check if your data has a Gaussian distribution.
Shapiro-Wilk Test
Tests whether a data sample has a Gaussian distribution.
Assumptions
Interpretation
Python Code
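Here is a minimal sketch with `scipy.stats.shapiro`. It assumes NumPy-generated synthetic samples and a significance level of 0.05. The null hypothesis is that the sample was drawn from a Gaussian distribution, so p > 0.05 is read as "probably Gaussian". The helper `looks_gaussian` is hypothetical and exists only for this example.

```python
# Shapiro-Wilk normality test on two synthetic samples (alpha = 0.05 assumed)
import numpy as np
from scipy.stats import shapiro

def looks_gaussian(data, alpha=0.05):
    """Hypothetical helper: H0 is 'the sample is Gaussian'."""
    stat, p = shapiro(data)
    return stat, p, p > alpha

rng = np.random.default_rng(1)
gaussian = rng.normal(loc=0.0, scale=1.0, size=100)
skewed = rng.exponential(scale=1.0, size=100)

for name, data in [('gaussian', gaussian), ('skewed', skewed)]:
    stat, p, ok = looks_gaussian(data)
    verdict = 'Probably Gaussian' if ok else 'Probably not Gaussian'
    print('%s: stat=%.3f, p=%.3f -> %s' % (name, stat, p, verdict))
```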
More Information
scipy.stats.shapiro
Shapiro-Wilk test on Wikipedia
Assumptions
Interpretation
Python Code
Anderson-Darling Test
Interpretation
Python Code
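In the SciPy releases this sketch assumes, `scipy.stats.anderson` reports the statistic together with critical values at fixed significance levels (15%, 10%, 5%, 2.5% and 1% for `dist='norm'`) rather than a single p-value. The example uses a deliberately skewed synthetic sample and compares the statistic against each critical value. The list name `verdicts` is only illustrative.

```python
# Anderson-Darling normality test on a synthetic, clearly skewed sample
import numpy as np
from scipy.stats import anderson

rng = np.random.default_rng(1)
data = rng.exponential(scale=1.0, size=200)

result = anderson(data, dist='norm')
print('stat=%.3f' % result.statistic)

# Compare the statistic with the critical value at each significance level
verdicts = []
for sl, cv in zip(result.significance_level, result.critical_values):
    gaussian = result.statistic < cv
    verdicts.append(gaussian)
    label = 'Probably Gaussian' if gaussian else 'Probably not Gaussian'
    print('%.1f%%: critical=%.3f -> %s' % (sl, cv, label))
```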
More Information
scipy.stats.anderson
Anderson-Darling test on Wikipedia
2. Correlation Tests
This section lists statistical tests that you can use to check if two samples are related.
Pearson’s Correlation Coefficient
Assumptions
Interpretation
Python Code
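Here is a minimal sketch with `scipy.stats.pearsonr` on synthetic data, where the second sample is built as a noisy linear function of the first. The null hypothesis tested is that there is no linear correlation, and the sketch assumes a significance level of 0.05.

```python
# Pearson's correlation coefficient on synthetic, linearly related samples
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
data1 = rng.normal(size=100)
data2 = 2.0 * data1 + rng.normal(scale=0.5, size=100)  # noisy linear relation

stat, p = pearsonr(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
if p > 0.05:
    print('No evidence of a linear correlation')
else:
    print('Probably correlated')
```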
More Information
scipy.stats.pearsonr
Pearson’s correlation coefficient on Wikipedia
Assumptions
Kendall’s Rank Correlation
Assumptions
Interpretation
Python Code
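Kendall’s tau works on ranks, so it picks up monotonic relationships that are not linear. Here is a minimal sketch with `scipy.stats.kendalltau`. It assumes synthetic data in which the second sample is an exponential function of the first plus a little noise, and a significance level of 0.05.

```python
# Kendall's rank correlation on synthetic, monotonically related samples
import numpy as np
from scipy.stats import kendalltau

rng = np.random.default_rng(1)
data1 = rng.normal(size=100)
# Monotonic but non-linear relationship plus a little noise
data2 = np.exp(data1) + rng.normal(scale=0.1, size=100)

stat, p = kendalltau(data1, data2)
print('stat=%.3f, p=%.3f' % (stat, p))
print('Probably dependent' if p <= 0.05 else 'No evidence of dependence')
```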
More Information
scipy.stats.kendalltau
Kendall rank correlation coefficient on Wikipedia
Chi-Squared Test
Tests whether two categorical variables are related or independent.
Assumptions
Interpretation
H0: the two samples are independent.
H1: there is a dependency between the samples.
Python Code
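Here is a minimal sketch with `scipy.stats.chi2_contingency`. It uses a hypothetical 2×3 table of counts, where rows and columns are the levels of two categorical variables and the numbers are invented for illustration. Besides the statistic and p-value, the function returns the degrees of freedom and the counts expected under independence.

```python
# Chi-squared test of independence on a hypothetical contingency table
import numpy as np
from scipy.stats import chi2_contingency

# Rows: levels of variable A; columns: levels of variable B (invented counts)
table = np.array([[10, 20, 30],
                  [30, 20, 10]])

stat, p, dof, expected = chi2_contingency(table)
print('stat=%.3f, p=%.5f, dof=%d' % (stat, p, dof))
print(expected)  # counts expected if the two variables were independent
if p > 0.05:
    print('Probably independent')
else:
    print('Probably dependent')
```

For this table every expected count is 20 (row total 60 × column total 40 / grand total 120). The statistic is therefore 4 × (10² / 20) = 20 with (2 − 1) × (3 − 1) = 2 degrees of freedom.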
More Information
scipy.stats.chi2_contingency
Chi-Squared test on Wikipedia
3. Parametric Statistical Hypothesis Tests
Student’s t-test
Assumptions
Observations in each sample are independent and identically distributed (iid).
Observations in each sample are normally distributed.
Observations in each sample have the same variance.
Interpretation
Python Code
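Here is a minimal sketch with `scipy.stats.ttest_ind` on two synthetic Gaussian samples with equal variance and different means, which matches the assumptions listed above. The null hypothesis is that the two sample means are equal, and the sketch assumes a significance level of 0.05.

```python
# Student's t-test for two independent samples (synthetic data)
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(1)
data1 = rng.normal(loc=0.0, scale=1.0, size=50)
data2 = rng.normal(loc=2.0, scale=1.0, size=50)

stat, p = ttest_ind(data1, data2)  # equal_var=True by default
print('stat=%.3f, p=%.3f' % (stat, p))
print('Probably equal means' if p > 0.05 else 'Probably different means')
```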
More Information
scipy.stats.ttest_ind
Student’s t-test on Wikipedia
Assumptions
Interpretation
Python Code
More Information
Student’s t-test on Wikipedia
Analysis of Variance Test (ANOVA)
Assumptions
Interpretation
H0: the means of the samples are equal.
H1: one or more of the means of the samples are unequal.
Python Code
```python
from scipy.stats import f_oneway
# three independent samples (illustrative placeholder values)
data1, data2, data3 = [2.1, 2.5, 1.9, 2.3], [2.8, 3.1, 2.9, 3.3], [2.2, 2.0, 2.4, 2.6]
stat, p = f_oneway(data1, data2, data3)
print('stat=%.3f, p=%.3f' % (stat, p))
print('Probably equal means' if p > 0.05 else 'Probably unequal means')
```
More Information
scipy.stats.f_oneway
Analysis of variance on Wikipedia
Assumptions
Interpretation
Python Code
More Information
4. Nonparametric Statistical Hypothesis Tests
Mann-Whitney U Test
Assumptions
Interpretation
H0: the distributions of both samples are equal.
H1: the distributions of both samples are not equal.
Python Code
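Here is a minimal sketch with `scipy.stats.mannwhitneyu` on two synthetic, non-Gaussian (exponential) samples, one shifted relative to the other. The sketch passes `alternative='two-sided'` explicitly because older SciPy releases used a different default, and it assumes a significance level of 0.05.

```python
# Mann-Whitney U test for two independent samples (synthetic, non-Gaussian data)
import numpy as np
from scipy.stats import mannwhitneyu

rng = np.random.default_rng(1)
data1 = rng.exponential(scale=1.0, size=50)
data2 = rng.exponential(scale=1.0, size=50) + 2.0  # shifted copy of the distribution

# Two-sided test: H0 is that both samples come from the same distribution
stat, p = mannwhitneyu(data1, data2, alternative='two-sided')
print('stat=%.3f, p=%.3f' % (stat, p))
print('Probably the same distribution' if p > 0.05 else 'Probably different distributions')
```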
More Information
scipy.stats.mannwhitneyu
Mann-Whitney U test on Wikipedia
Wilcoxon Signed-Rank Test
Interpretation
H0: the distributions of both samples are equal.
H1: the distributions of both samples are not equal.
Python Code
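The Wilcoxon signed-rank test compares paired samples, so element i of each array must refer to the same subject. Here is a minimal sketch with `scipy.stats.wilcoxon`. It assumes hypothetical "before" and "after" measurements generated with NumPy and a significance level of 0.05.

```python
# Wilcoxon signed-rank test for two paired samples (synthetic data)
import numpy as np
from scipy.stats import wilcoxon

rng = np.random.default_rng(1)
before = rng.normal(loc=10.0, scale=2.0, size=30)
after = before + rng.normal(loc=1.0, scale=0.5, size=30)  # paired with `before`

stat, p = wilcoxon(before, after)
print('stat=%.3f, p=%.3f' % (stat, p))
print('Probably the same distribution' if p > 0.05 else 'Probably different distributions')
```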
More Information
scipy.stats.wilcoxon
Wilcoxon signed-rank test on Wikipedia
Kruskal-Wallis H Test
Tests whether the distributions of two or more independent samples are equal or not.
Assumptions
Interpretation
Python Code
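Here is a minimal sketch with `scipy.stats.kruskal`, which accepts two or more independent samples as separate arguments. The three synthetic samples below are generated with NumPy, and one is deliberately shifted. A significance level of 0.05 is assumed.

```python
# Kruskal-Wallis H test for three independent samples (synthetic data)
import numpy as np
from scipy.stats import kruskal

rng = np.random.default_rng(1)
data1 = rng.normal(loc=0.0, size=40)
data2 = rng.normal(loc=0.0, size=40)
data3 = rng.normal(loc=3.0, size=40)  # shifted sample

stat, p = kruskal(data1, data2, data3)
print('stat=%.3f, p=%.3f' % (stat, p))
print('Probably the same distribution' if p > 0.05 else 'Probably different distributions')
```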
More Information
scipy.stats.kruskal
Kruskal-Wallis one-way analysis of variance on Wikipedia
Friedman Test
Tests whether the distributions of two or more paired samples are equal or not.
Assumptions
Interpretation
Python Code
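`scipy.stats.friedmanchisquare` expects at least three paired samples of equal length, where position i in each sample refers to the same subject. Here is a minimal sketch with synthetic data. It uses a hypothetical per-subject baseline shared by all three samples, with the second and third samples shifted upward, and assumes a significance level of 0.05.

```python
# Friedman test for three paired samples (synthetic data)
import numpy as np
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(1)
subject = rng.normal(loc=0.0, scale=5.0, size=20)  # per-subject baseline
data1 = subject + rng.normal(scale=0.5, size=20)
data2 = subject + rng.normal(scale=0.5, size=20) + 1.0
data3 = subject + rng.normal(scale=0.5, size=20) + 2.0

stat, p = friedmanchisquare(data1, data2, data3)
print('stat=%.3f, p=%.3f' % (stat, p))
print('Probably the same distribution' if p > 0.05 else 'Probably different distributions')
```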
Further Reading
This section provides more resources on the topic if you are looking to go deeper.
Summary
In this tutorial, you discovered the key statistical hypothesis tests that you may need to use in a machine learning project.
Specifically, you learned:
The types of tests to use in different circumstances, such as normality checking, relationships between variables, and differences between samples.
The key assumptions for each test and how to interpret the test result.
How to implement the test using the Python API.
Did I miss an important statistical test or key assumption for one of the listed tests?
Let me know in the comments below.
About Jason Brownlee
Jason Brownlee, Ph.D. is a machine learning specialist who teaches developers how to get results with modern machine
learning methods via hands-on tutorials.
Jonathan Dunne August 17, 2018 at 7:17 am #
Hi, the list looks good. A few omissions: Fisher’s exact test and Barnard’s test (potentially more power than Fisher’s exact
test).
One note on the Anderson-Darling test: the use of p-values to determine goodness of fit (GoF) has been discouraged in some fields.
Jason Brownlee August 17, 2018 at 7:43 am #
Indeed, I think it was a journal of psychology that has adopted “estimation statistics” instead of hypothesis tests in reporting
results.