Finding the Quantile and Decile Ranks of a Pandas DataFrame column
Last Updated: 20 Dec, 2021
A quantile is a cut point that divides a sample into equal-sized, adjacent subgroups.
The median is a quantile: it is placed in a probability distribution so that exactly half of the data is below the median and half of the data is above it. The median cuts a distribution into two equal areas, so it is sometimes called the 2-quantile.
Quartiles are also quantiles; they divide the distribution into four equal parts.
Percentiles are quantiles that divide a distribution into 100 equal parts and deciles are quantiles that divide a distribution into 10 equal parts.
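As a quick illustration (a sketch, not part of the original walkthrough), pandas' `Series.quantile()` returns these cut points directly. The series `scores` holds the sample data used later in this article. Note that `Series.quantile()` interpolates linearly between neighbouring values by default (its `interpolation` parameter), which is not the same convention as the q(n + 1) rule described below.

```python
import pandas as pd

# The sample scores used later in this article
scores = pd.Series([32, 47, 55, 62, 74, 77, 86])

# The median is the 2-quantile: the single cut point at proportion 0.5
print(scores.quantile(0.5) == scores.median())  # True

# A k-quantile split uses k - 1 cut points at proportions 1/k, 2/k, ..., (k-1)/k
for name, k in [("quartiles", 4), ("deciles", 10), ("percentiles", 100)]:
    cut_points = scores.quantile([j / k for j in range(1, k)])
    print(name, len(cut_points))  # 3, 9 and 99 cut points respectively
```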
We can use the following formula to estimate the position i of the observation that corresponds to a given quantile:
ith observation = q(n + 1)
where q is the quantile, that is, the proportion of values below the ith observation you are looking for, and n is the number of items in the data set.
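The formula can be written as a small helper. This is a sketch: the function name `quantile_value` is hypothetical and positions are 1-based, as in the formula. The formula only gives a position, so two behaviours here are assumptions. When q(n + 1) is not a whole number, the helper interpolates linearly between the two neighbouring observations. When the position falls outside 1..n, it clamps to the first or last observation.

```python
def quantile_value(data, q):
    """Estimate the value below which a proportion q of the data lies,
    using the 1-based position i = q * (n + 1)."""
    values = sorted(data)          # the formula assumes ordered data
    n = len(values)
    i = q * (n + 1)                # 1-based position of the observation
    # Assumption: clamp to the first/last observation outside 1..n
    if i <= 1:
        return values[0]
    if i >= n:
        return values[-1]
    lower = int(i)                 # whole part of the position
    frac = i - lower               # fractional part, used to interpolate
    # values[lower - 1] is the lower-th observation (1-based)
    return values[lower - 1] + frac * (values[lower] - values[lower - 1])


print(quantile_value([32, 47, 55, 62, 74, 77, 86], 0.25))  # 47.0
```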
So to find the Quantile rank, q takes the values 0.25, 0.5 and 0.75, because we want to divide our data set into 4 equal parts (quartiles) and rank the values from 0 to 3 according to the quartile they fall in.
Similarly, for the Decile rank q goes up in steps of 0.1 (0.1, 0.2, ..., 0.9), because we want our data set to be divided into 10 equal parts, ranked from 0 to 9.
Before moving to Pandas, let us try the above concept on an example to understand how the Quantile and Decile Ranks are calculated.
Sample question: Find the number in the following set of data where 25 percent of the values fall below it and 75 percent fall above it.
Data: 32, 47, 55, 62, 74, 77, 86
Step 1: Order the data from smallest to largest. The data in the question is already in ascending order.
Step 2: Count how many observations you have in your data set. This particular data set has 7 items.
Step 3: Convert any percentage to a decimal for "q". We are looking for the number where 25 percent of the values fall below it, so convert that to 0.25.
Step 4: Insert your values into the formula:
ith observation = q(n + 1)
ith observation = 0.25(7 + 1) = 2
The ith observation is at position 2. The 2nd number in the set is 47, which is the number where 25 percent of the values fall below it. Repeating the calculation for q = 0.5 and q = 0.75 gives the remaining cut points, and we can then rank our numbers from 0 to 3, since we are finding the Quantile Rank. The Decile Rank is found the same way, except that q goes up in steps of 0.1.
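The hand calculation above can be extended to rank every value. This sketch applies the q(n + 1) rule at q = 0.25, 0.5 and 0.75 to get three cut points. It then gives each score the rank (0 to 3) of the quarter it falls in. Two choices here are assumptions. First, a value equal to a cut point is treated as belonging to the lower group. Second, the positions are assumed to be whole numbers, which holds for this data set. Because pandas' `qcut()` interpolates its cut points, its ranks can differ from this rule for borderline values.

```python
from bisect import bisect_left

data = sorted([32, 47, 55, 62, 74, 77, 86])   # Step 1: order the data
n = len(data)                                  # Step 2: count the observations

# Steps 3-4: position i = q(n + 1) for each quartile cut point (1-based)
positions = [q * (n + 1) for q in (0.25, 0.5, 0.75)]
# Assumes whole-number positions (true here: 2, 4 and 6)
cut_points = [data[int(i) - 1] for i in positions]

# A value's rank is the number of cut points strictly below it,
# so a value equal to a cut point stays in the lower group.
ranks = {x: bisect_left(cut_points, x) for x in data}

print(positions)   # [2.0, 4.0, 6.0]
print(cut_points)  # [47, 62, 77]
print(ranks)       # {32: 0, 47: 0, 55: 1, 62: 1, 74: 2, 77: 2, 86: 3}
```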
Now let us look at how to compute these ranks quickly in Pandas.
Code for Creating a DataFrame:
```python
# Import pandas
import pandas as pd

# Create a DataFrame of names and English scores
data = {'Name': ['George', 'Andrea', 'John', 'Helen',
                 'Ravi', 'Julia', 'Justin'],
        'EnglishScore': [62, 47, 55, 74, 32, 77, 86]}
df1 = pd.DataFrame(data, columns=['Name', 'EnglishScore'])

# Sort the DataFrame in ascending order of EnglishScore
df1.sort_values(by=['EnglishScore'], inplace=True)
```
If we print the above DataFrame, the rows appear in ascending order of EnglishScore, from Ravi (32) to Justin (86). sort_values() keeps the original index labels.
Now we can find the Quantile Rank with the pandas function qcut(). We pass it the column to be ranked, the parameter q, which is the number of quantiles (10 for deciles, 4 for quartiles, etc.), and labels=False so that the bins are returned as integers rather than as intervals.
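To see which cut points `qcut()` actually uses, the function can also return the bin edges through its `retbins` parameter. Here is a short sketch on the same scores held in a standalone series (the name `scores` is illustrative). pandas computes the edges by linear interpolation, so they need not match cut points worked out by hand with the q(n + 1) rule.

```python
import pandas as pd

scores = pd.Series([32, 47, 55, 62, 74, 77, 86])

# Default labels: each value is mapped to the interval (bin) it falls in
intervals = pd.qcut(scores, q=4)

# labels=False returns integer bin numbers instead;
# retbins=True additionally returns the bin edges that were used
ranks, edges = pd.qcut(scores, q=4, labels=False, retbins=True)

print(intervals.cat.categories)  # the four quartile intervals
print(edges.tolist())            # [32.0, 51.0, 62.0, 75.5, 86.0]
print(ranks.tolist())            # [0, 0, 1, 1, 2, 3, 3]
```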
The following code computes the Quantile Rank:
```python
# Assign each score to one of 4 quartile bins, numbered 0 to 3
df1['QuantileRank'] = pd.qcut(df1['EnglishScore'],
                              q=4, labels=False)
```
If we now print the DataFrame, the new QuantileRank column ranks each row by its EnglishScore: Ravi and Andrea get 0, John and George get 1, Helen gets 2, and Julia and Justin get 3.
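If readable group names are preferred over integers, `qcut()` also accepts a list of labels, one per bin. In the sketch below the label strings are arbitrary choices, and the scores are held in an illustrative standalone series.

```python
import pandas as pd

scores = pd.Series([32, 47, 55, 62, 74, 77, 86])

# One label per quartile bin; the names themselves are arbitrary
quartile_names = pd.qcut(scores, q=4, labels=["Q1", "Q2", "Q3", "Q4"])

print(quartile_names.tolist())  # ['Q1', 'Q1', 'Q2', 'Q2', 'Q3', 'Q4', 'Q4']
```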
Similarly, to calculate the Decile Rank we set q = 10:
```python
# Assign each score to one of 10 decile bins, numbered 0 to 9
df1['DecileRank'] = pd.qcut(df1['EnglishScore'],
                            q=10, labels=False)
```
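Printing the DataFrame again shows the new DecileRank column: Ravi gets 0, Andrea 1, John 3, George 4, Helen 6, Julia 8 and Justin 9. With only seven scores, some deciles (2, 5 and 7) contain no score at all.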
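If a column has many repeated values, some quantile edges can coincide. In that case `qcut()` raises a `ValueError` ("Bin edges must be unique"). Its `duplicates="drop"` parameter merges such edges instead, at the cost of returning fewer than q groups. The sketch below uses made-up scores with many ties (the series `tied` is hypothetical).

```python
import pandas as pd

# Hypothetical scores with many ties: several quartile edges equal 50
tied = pd.Series([50, 50, 50, 50, 50, 60, 70, 80])

try:
    pd.qcut(tied, q=4, labels=False)
except ValueError as err:
    print("qcut failed:", err)

# Merge the duplicate edges instead of raising an error
ranks = pd.qcut(tied, q=4, labels=False, duplicates="drop")
print(ranks.tolist())  # [0, 0, 0, 0, 0, 0, 1, 1]
```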
That is how we can use the pandas qcut() function to calculate various quantile ranks of a column.
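Besides an integer, `qcut()` also accepts `q` as a list of proportions between 0 and 1. This makes explicit the steps of 0.25 (quartiles) and 0.1 (deciles) discussed earlier. The sketch below, on an illustrative standalone series, checks that the integer and list forms give the same quartile ranks.

```python
import pandas as pd

scores = pd.Series([32, 47, 55, 62, 74, 77, 86])

# Quartiles: proportions 0, 0.25, 0.5, 0.75, 1 (0 and 1 are the min and max)
by_count = pd.qcut(scores, q=4, labels=False)
by_list = pd.qcut(scores, q=[0, 0.25, 0.5, 0.75, 1], labels=False)

# Deciles: proportions 0, 0.1, ..., 1 built in steps of 0.1
decile_edges = [k / 10 for k in range(11)]
deciles_by_list = pd.qcut(scores, q=decile_edges, labels=False)

print(by_count.equals(by_list))  # True
print(deciles_by_list.tolist())
```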
The entire code for the above example is given below.
```python
import pandas as pd

# Create a DataFrame of names and English scores
data = {'Name': ['George', 'Andrea', 'John', 'Helen',
                 'Ravi', 'Julia', 'Justin'],
        'EnglishScore': [62, 47, 55, 74, 32, 77, 86]}
df1 = pd.DataFrame(data, columns=['Name', 'EnglishScore'])

# Sort the DataFrame in ascending order of EnglishScore
# (only for readability; qcut() does not need sorted data)
df1.sort_values(by=['EnglishScore'], inplace=True)

# Quantile (quartile) Rank: 4 bins numbered 0 to 3
df1['QuantileRank'] = pd.qcut(df1['EnglishScore'], q=4, labels=False)

# Decile Rank: 10 bins numbered 0 to 9
df1['DecileRank'] = pd.qcut(df1['EnglishScore'], q=10, labels=False)

# Print the DataFrame
print(df1)
```