Use Pandas to Calculate Statistics in Python
Last Updated :
23 Jun, 2021
Pandas reduces many statistical operations in Python to single-line commands. This post covers some of the most common and useful ones, using the Titanic survival dataset to demonstrate each operation.
Python3
# Import Pandas Library
import pandas as pd
# Load Titanic Dataset as Dataframe
dataset = pd.read_csv('train.csv')
# Show the dataset
# head() shows the first
# 5 rows by default
dataset.head()
Output: (table of the first five rows of the dataset)

1. Mean:
Calculates the mean (average) value using the DataFrame/Series.mean() method.
Syntax: DataFrame/Series.mean(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Parameters:
- axis: {index (0), columns (1)}
     Specify the axis for the function to be applied on.
- skipna: This parameter takes a bool value, default value is True.
     It excludes null values when computing the result.
- level: This parameter takes an int value or level name, default value is None.
     If the axis is a MultiIndex, count along a particular level, collapsing into a Series.
- numeric_only: This parameter takes a bool value, default value is None.
     Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data values. Not implemented for Series.
- **kwargs: Additional keyword arguments to be passed to the function.
Returns: Mean of the Series or DataFrame: a scalar for a Series and a Series for a DataFrame (a Series or DataFrame, respectively, if level is specified).
Code:
Python3
# Calculate the Mean
# of 'Age' column
mean = dataset['Age'].mean()
# Print mean
print(mean)
Output:
29.69911764705882
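The skipna and axis parameters described above are easier to see on a tiny DataFrame. The following is a minimal sketch; the toy values are made up for illustration and are not taken from train.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data (not from the Titanic file); one Age value is missing
df = pd.DataFrame({"Age": [20.0, 30.0, np.nan, 40.0],
                   "Fare": [10.0, 20.0, 30.0, 40.0]})

# Default skipna=True ignores the NaN: (20 + 30 + 40) / 3
mean_skip = df["Age"].mean()

# skipna=False lets the NaN propagate into the result
mean_keep = df["Age"].mean(skipna=False)

# axis=0 (the default for a DataFrame) gives one mean per column
col_means = df.mean(axis=0)

# axis=1 gives one mean per row, still skipping NaN inside each row
row_means = df.mean(axis=1)

print(mean_skip)   # 30.0
print(mean_keep)   # nan
print(col_means)
print(row_means)
```

Note that the third row has only one non-null value, so its row mean is simply that value.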
2. Median:
Calculates the median value using the DataFrame/Series.median() method.
Syntax: DataFrame/Series.median(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Parameters:
- axis: {index (0), columns (1)}
     Specify the axis for the function to be applied on.
- skipna: This parameter takes a bool value, default value is True.
     It excludes null values when computing the result.
- level: This parameter takes an int value or level name, default value is None.
     If the axis is a MultiIndex, count along a particular level, collapsing into a Series.
- numeric_only: This parameter takes a bool value, default value is None.
     Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data.
- **kwargs: Additional keyword arguments to be passed to the function.
Returns: Median of the Series or DataFrame: a scalar for a Series and a Series for a DataFrame (a Series or DataFrame, respectively, if level is specified).
Code:
Python3
# Calculate Median of 'Fare' column
median = dataset['Fare'].median()
# Print median
print(median)
Output:
14.4542
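One reason to look at the median alongside the mean is that a single extreme value moves the mean far more than the median. The sketch below uses made-up fare values (an assumption for illustration, not rows from the dataset) to show the difference.

```python
import pandas as pd

# Hypothetical fares with one very large value
fares = pd.Series([7.0, 8.0, 10.0, 12.0, 500.0])

# The median is the middle value of the sorted data
median_fare = fares.median()

# The mean is pulled upward by the outlier: 537 / 5
mean_fare = fares.mean()

# With an even number of values, the median is the
# average of the two middle values: (2 + 3) / 2
even_median = pd.Series([1, 2, 3, 4]).median()

print(median_fare)   # 10.0
print(mean_fare)
print(even_median)   # 2.5
```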
3. Mode:
Calculates the mode (most frequent value) using the DataFrame/Series.mode() method.
Syntax: DataFrame/Series.mode(self, axis=0, numeric_only=False, dropna=True)
Parameters:
- axis: {index (0), columns (1)}
     The axis to iterate over while searching for the mode value:
     0 value or ‘index’ : get mode of each column
     1 value or ‘columns’ : get mode of each row.
- numeric_only: This parameter takes a bool value, default value is False.
     If True, only apply to numeric columns.
- dropna: This parameter takes a bool value, default value is True.
     Don’t consider counts of NaN/None values.
Returns: The most frequent value(s). The result can hold more than one value when several values tie for the highest frequency.
Code:
Python3
# Calculate Mode of 'Sex' column
mode = dataset['Sex'].mode()
# Print mode
print(mode)
Output:
0    male
dtype: object
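The output above is a Series rather than a single value because mode() can return several values when there is a tie. A minimal sketch with made-up data shows both the tie case and the effect of dropna:

```python
import numpy as np
import pandas as pd

# Hypothetical data: 'a' and 'b' both appear twice
letters = pd.Series(["a", "b", "a", "b", "c"])

# Both tied values are returned, in sorted order
modes = letters.mode()

# Pick the first mode when a single value is needed
first_mode = modes.iloc[0]

# NaN appears twice here, more often than any real value
with_nan = pd.Series([1.0, np.nan, np.nan, 2.0])

# dropna=True (default) ignores NaN, so 1.0 and 2.0 tie
mode_dropna = with_nan.mode()

# dropna=False lets NaN count as a candidate
mode_keepna = with_nan.mode(dropna=False)

print(modes.tolist())        # ['a', 'b']
print(first_mode)            # a
print(mode_dropna.tolist())  # [1.0, 2.0]
print(mode_keepna.tolist())
```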
4. Count:
Calculates the count (frequency) of non-null values using the DataFrame/Series.count() method.
Syntax: DataFrame/Series.count(self, axis=0, level=None, numeric_only=False)
Parameters:
- axis: {0 or ‘index’, 1 or ‘columns’}, default value is 0
     If the value is 0 or ‘index’, counts are generated for each column. If the value is 1 or ‘columns’, counts are generated for each row.
- level: (optional) This parameter takes an int or str value.
     If the axis is a MultiIndex type, count along a particular level, collapsing into a DataFrame. A str specifies the level name.
- numeric_only: This parameter takes a bool value, default value is False.
     Include only float, int or boolean data.
Returns: For each column/row, the number of non-null entries. If level is specified, returns a DataFrame.
Code:
Python3
# Calculate Count of 'Ticket' column
count = dataset['Ticket'].count()
# Print count
print(count)
Output:
891
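Because count() only counts non-null entries, it can differ from the number of rows. The sketch below uses a small made-up DataFrame (the values are assumptions, not rows from train.csv) to contrast count() with len().

```python
import numpy as np
import pandas as pd

# Hypothetical data: two of the three Cabin values are missing
df = pd.DataFrame({"Ticket": ["A1", "B2", "C3"],
                   "Cabin": ["C85", None, np.nan]})

# len() counts rows, regardless of missing values
n_rows = len(df)

# count() skips None and NaN
cabin_count = df["Cabin"].count()

# On a DataFrame, axis=0 (default) counts per column
per_column = df.count()

# axis=1 counts the non-null entries in each row
per_row = df.count(axis=1)

print(n_rows)        # 3
print(cabin_count)   # 1
print(per_column)
print(per_row)
```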
5. Standard Deviation:
Calculates the standard deviation of values using the DataFrame/Series.std() method.
Syntax: DataFrame/Series.std(self, axis=None, skipna=None, level=None, ddof=1, numeric_only=None, **kwargs)
Parameters:
- axis: {index (0), columns (1)}
     Specify the axis for the function to be applied on.
- skipna: This parameter takes a bool value, default value is True.
     Exclude NA/null values. If an entire row/column has NA values, the result will be NA.
- level: This parameter takes an int value or level name, default value is None.
     If the axis is a MultiIndex, count along a particular level, collapsing into a Series.
- ddof: This parameter takes an int value, default value is 1.
     Delta Degrees of Freedom. The divisor used in calculations is N - ddof, where N represents the number of elements.
- numeric_only: This parameter takes a bool value, default value is None.
     Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data. Not implemented for Series.
Returns: Standard deviation of the Series or DataFrame.
Code:
Python3
# Calculate Standard Deviation
# of 'Fare' column
std = dataset['Fare'].std()
# Print standard deviation
print(std)
Output:
49.693428597180905
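The ddof parameter decides whether the result is the sample standard deviation (divide by N - 1, the pandas default) or the population standard deviation (divide by N). A minimal sketch on made-up numbers, chosen so the arithmetic is easy to check by hand:

```python
import math

import numpy as np
import pandas as pd

# Hypothetical values: mean is 5, squared deviations sum to 32
values = pd.Series([2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0])

# ddof=1 (default): sqrt(32 / (8 - 1))
sample_std = values.std()

# ddof=0: sqrt(32 / 8) = 2.0
population_std = values.std(ddof=0)

# NumPy's std defaults to ddof=0, so it matches the
# population figure rather than the pandas default
numpy_std = np.std(values.to_numpy())

print(sample_std)
print(population_std)
print(numpy_std)
```

The difference between the two shrinks as N grows, which is why it is barely visible on a column with hundreds of rows.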
6. Max:
Calculates the maximum value using the DataFrame/Series.max() method.
Syntax: DataFrame/Series.max(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Parameters:
- axis: {index (0), columns (1)}
     Specify the axis for the function to be applied on.
- skipna: bool, default True
     It excludes null values when computing the result.
- level: int or level name, default None
     If the axis is a MultiIndex type, count along a particular level, collapsing into a Series.
- numeric_only: bool, default None
     Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data.
- **kwargs: Additional keyword arguments to be passed to the function.
Returns: Maximum value of the Series or DataFrame: a scalar for a Series and a Series for a DataFrame (a Series or DataFrame, respectively, if level is specified).
Code:
Python3
# Calculate Maximum value in 'Age' column
maxValue = dataset['Age'].max()
# Print maxValue
print(maxValue)
Output:
80.0
7. Min:
Calculates the minimum value using the DataFrame/Series.min() method.
Syntax: DataFrame/Series.min(self, axis=None, skipna=None, level=None, numeric_only=None, **kwargs)
Parameters:
- axis: {index (0), columns (1)}
     Specify the axis for the function to be applied on.
- skipna: bool, default True
     It excludes null values when computing the result.
- level: int or level name, default None
     If the axis is a MultiIndex type, count along a particular level, collapsing into a Series.
- numeric_only: bool, default None
     Include only float, int, boolean columns. If None, will attempt to use everything, then use only numeric data.
- **kwargs: Additional keyword arguments to be passed to the function.
Returns: Minimum value of the Series or DataFrame: a scalar for a Series and a Series for a DataFrame (a Series or DataFrame, respectively, if level is specified).
Code:
Python3
# Calculate Minimum value in 'Fare' column
minValue = dataset['Fare'].min()
# Print minValue
print(minValue)
Output:
0.0
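The max() and min() methods share the same axis and skipna behaviour. The following sketch, on a small made-up DataFrame (the values are illustrative assumptions), shows both methods on a column, on the whole frame, and across rows.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with one missing Age
df = pd.DataFrame({"Age": [22.0, np.nan, 80.0, 0.42],
                   "Fare": [7.25, 71.28, 30.0, 0.0]})

# Series: a single scalar, NaN is skipped by default
oldest = df["Age"].max()
youngest = df["Age"].min()

# skipna=False returns NaN as soon as any value is missing
max_keep_nan = df["Age"].max(skipna=False)

# DataFrame: one maximum per column
col_max = df.max()

# axis=1: the smallest value in each row
row_min = df.min(axis=1)

print(oldest)        # 80.0
print(youngest)      # 0.42
print(max_keep_nan)  # nan
print(col_max)
print(row_min)
```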
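8. Describe:
Summarizes general descriptive statistics using the DataFrame/Series.describe() method.
Syntax: DataFrame/Series.describe(self, percentiles=None, include=None, exclude=None)
Parameters:
- percentiles: list-like of numbers, optional
     The percentiles to include in the output, given as values between 0 and 1. The default is [.25, .5, .75].
- include: ‘all’, list-like of dtypes or None (default), optional
     The data types to include in the result. ‘all’ includes every column; None limits the result to numeric columns.
- exclude: list-like of dtypes or None (default), optional
     The data types to omit from the result. None excludes nothing.
Returns: Summary statistics of the Series or DataFrame provided.
Code: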
Python3
# Statistical summary
dataset.describe()
Output: (table of count, mean, std, min, 25%, 50%, 75% and max for each numeric column)
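The percentiles and include parameters change which rows and columns describe() reports. The sketch below uses a small made-up DataFrame (an assumption for illustration) with one numeric and one text column.

```python
import pandas as pd

# Hypothetical toy data: one numeric column, one text column
df = pd.DataFrame({"Age": [20.0, 30.0, 40.0, 50.0],
                   "Sex": ["male", "female", "male", "male"]})

# Default: numeric columns only, with the standard rows
summary = df.describe()

# Request the 10th and 90th percentiles
custom = df.describe(percentiles=[0.1, 0.9])

# include='all' also summarizes the text column,
# adding the rows unique, top and freq
everything = df.describe(include="all")

print(summary)
print(custom)
print(everything)
```

In the include='all' result, numeric rows such as mean are NaN for the text column, and unique, top and freq are NaN for the numeric column.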