Finding the Quantile and Decile Ranks of a Pandas DataFrame column
Last Updated: 20 Dec, 2021
A quantile is a cut point that divides a sample into equal-sized, adjacent subgroups.
The median is a quantile: it is placed in a probability distribution so that exactly half of the data lies below it and half lies above it. Because the median cuts a distribution into two equal areas, it is sometimes called the 2-quantile.
Quartiles are also quantiles; they divide the distribution into four equal parts.
Percentiles are quantiles that divide a distribution into 100 equal parts and deciles are quantiles that divide a distribution into 10 equal parts.
We can use the following formula to estimate the position of the ith observation, that is, the observation that sits at a given quantile:
ith observation = q (n + 1)
where q is the quantile, i.e. the proportion of values that fall below the ith value you are looking for, and
n is the number of items in the data set.
So, to find the Quantile rank, q should be 0.25, since we want to divide the data set into 4 equal parts and rank the values from 0 to 3 according to the quartile they fall in.
Similarly, for the Decile rank, q should be 0.1, since we want the data set to be divided into 10 equal parts.
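As a rough illustration of the formula above, the following sketch computes the positions of the three quartile cut points. The helper name observation_position and the data set size of 11 items are made up for this example; neither comes from pandas.

```python
def observation_position(q, n):
    """Estimate the 1-based position of the observation at quantile q
    in a sorted data set of n items, using position = q * (n + 1)."""
    return q * (n + 1)


n = 11  # hypothetical number of items in the data set

# Positions of the three quartile cut points (q = 0.25, 0.5, 0.75)
quartile_positions = [observation_position(q, n) for q in (0.25, 0.5, 0.75)]
print(quartile_positions)  # [3.0, 6.0, 9.0]
```

When the result is not a whole number, the position falls between two observations and some rounding or interpolation rule is needed; the formula alone does not decide which one.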
Before moving to Pandas, let us try the above concept on an example to understand how the Quantile and Decile Ranks are calculated.
Sample question : Find the number in the following set of data where 25 percent of values fall below it, and 75 percent fall above.
Data : 32, 47, 55, 62, 74, 77, 86
Step 1: Order the data from smallest to largest. The data in the question is already in ascending order.
Step 2: Count how many observations you have in your data set. This particular data set has 7 items.
Step 3: Convert any percentage to a decimal for “q”. We are looking for the number where 25 percent of the values fall below it, so convert that to 0.25.
Step 4: Insert your values into the formula:
Answer:
ith observation = q (n + 1)
ith observation = 0.25 (7 + 1) = 2
The ith observation is at position 2. The 2nd number in the set is 47, which is the number where 25 percent of the values fall below it. We can then start ranking our numbers from 0 to 3, since we are finding the Quantile Rank. The approach for the Decile Rank is similar; the only difference is that the value of q is 0.1.
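The four steps above can be sketched in plain Python as follows. This is only a minimal illustration that assumes the computed position is a whole number, as it is here; the variable names are chosen for this example.

```python
data = [32, 47, 55, 62, 74, 77, 86]

# Step 1: order the data from smallest to largest
ordered = sorted(data)

# Step 2: count the observations
n = len(ordered)

# Step 3: express the percentage as a decimal
q = 0.25

# Step 4: insert the values into the formula
position = q * (n + 1)

# Positions count from 1, while Python list indexes count from 0
value = ordered[int(position) - 1]

print(position, value)  # 2.0 47
```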
Now let us see how we can quickly achieve the same in Pandas.
Code for Creating a DataFrame:
python3
# Import pandas
import pandas as pd

# Create a DataFrame
df1 = {'Name': ['George', 'Andrea', 'John', 'Helen',
                'Ravi', 'Julia', 'Justin'],
       'EnglishScore': [62, 47, 55, 74, 32, 77, 86]}
df1 = pd.DataFrame(df1, columns=['Name', 'EnglishScore'])

# Sort the DataFrame in ascending order of EnglishScore
df1.sort_values(by=['EnglishScore'], inplace=True)
If we print the above DataFrame, we get the result below:
[Image: DataFrame sorted by EnglishScore]
Now we can find the Quantile Rank using the pandas function qcut(). We pass it the column to be ranked, the parameter q, which is the number of quantiles (10 for deciles, 4 for quartiles, etc.), and labels = False to return the bins as integers.
Following is the code for the Quantile Rank:
python3
# Rank each row (0-3) by the quartile its EnglishScore falls in
df1['QuantileRank'] = pd.qcut(df1['EnglishScore'],
                              q=4, labels=False)
Now if we print the DataFrame, we can see the new column QuantileRank ranking our data based on the EnglishScore column.
[Image: DataFrame with the QuantileRank column]
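To see where these ranks come from, one option is to ask qcut() for the bin edges it used by passing retbins = True. The sketch below rebuilds the same scores so that it runs on its own; the variable names scores, ranks and edges are chosen for this illustration. Note that pandas interpolates linearly between neighbouring values by default, so the first quartile edge comes out as 51 rather than the 47 obtained by hand with the q (n + 1) formula.

```python
import pandas as pd

# Same scores as in the example, already in ascending order
scores = pd.Series([32, 47, 55, 62, 74, 77, 86], name='EnglishScore')

# retbins=True additionally returns the bin edges used for the ranking
ranks, edges = pd.qcut(scores, q=4, labels=False, retbins=True)

print(edges)           # the five edges of the four quartile bins
print(ranks.tolist())  # rank (0-3) of each score
```

Each bin is closed on the right, so a score that is exactly equal to an edge, such as 62, is placed in the lower of the two bins.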
Similarly, to calculate the Decile Rank we set q = 10:
python3
# Rank each row (0-9) by the decile its EnglishScore falls in
df1['DecileRank'] = pd.qcut(df1['EnglishScore'],
                            q=10, labels=False)
Now if we print our DataFrame we get the following output.
[Image: DataFrame with the QuantileRank and DecileRank columns]
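With only 7 rows there are fewer observations than deciles, so not every rank from 0 to 9 can appear in the DecileRank column. A small sketch, assuming the same scores as in the example (the names scores, decile_rank and unused are illustrative), that lists which decile ranks stay empty:

```python
import pandas as pd

# Same scores as in the example, already in ascending order
scores = pd.Series([32, 47, 55, 62, 74, 77, 86], name='EnglishScore')

# Decile rank (0-9) of each score
decile_rank = pd.qcut(scores, q=10, labels=False)

# Ranks in 0-9 that no score was assigned to
unused = sorted(set(range(10)) - set(decile_rank))

print(decile_rank.tolist())
print(unused)
```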
That is how we can use the Pandas qcut() function to calculate various quantile ranks on a column.
The entire code for the above example is given below.
python3
# Import pandas
import pandas as pd

# Create a DataFrame
df1 = {'Name': ['George', 'Andrea', 'John', 'Helen',
                'Ravi', 'Julia', 'Justin'],
       'EnglishScore': [62, 47, 55, 74, 32, 77, 86]}
df1 = pd.DataFrame(df1, columns=['Name', 'EnglishScore'])

# Sort the DataFrame in ascending order of EnglishScore.
# Sorting is only for better readability of the data.
df1.sort_values(by=['EnglishScore'], inplace=True)

# Calculate the Quantile Rank
df1['QuantileRank'] = pd.qcut(df1['EnglishScore'], q=4, labels=False)

# Calculate the Decile Rank
df1['DecileRank'] = pd.qcut(df1['EnglishScore'], q=10, labels=False)

# Print the DataFrame
print(df1)