Sort the PySpark DataFrame columns by Ascending or Descending order
Last Updated: 06 Jun, 2021
In this article, we sort the rows of a PySpark DataFrame by the values in one or more columns, in ascending or descending order, using the sort() and orderBy() functions.
Let's create a sample dataframe.
Python3
# importing module
import pyspark
# importing sparksession from
# pyspark.sql module
from pyspark.sql import SparkSession
# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()
# list of employee data
data = [["1", "sravan", "company 1"],
        ["2", "ojaswi", "company 1"],
        ["3", "rohith", "company 2"],
        ["4", "sridevi", "company 1"],
        ["1", "sravan", "company 1"],
        ["4", "sridevi", "company 1"]]
# specify column names
columns = ['Employee_ID', 'Employee NAME', 'Company']
# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)
# display data in the dataframe
dataframe.show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          1|       sravan|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
Using sort() function
The sort() function sorts the rows of a DataFrame by one or more columns.
Syntax: dataframe.sort(['column name'], ascending=True).show()
Example 1: Using sort() with one column
Sort the data by Employee NAME in ascending order:
Python3
# sort the dataframe based on
# employee name column in ascending order
dataframe.sort(['Employee NAME'], ascending=True).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
Sort the data by Employee NAME in descending order:
Syntax: dataframe.sort(['column name'], ascending = False).show()
Code:
Python3
# sort the dataframe based on
# employee name column in descending order
dataframe.sort(['Employee NAME'], ascending=False).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
+-----------+-------------+---------+
Example 2: Using sort() with multiple columns
We are going to sort the dataframe based on employee id and employee name in ascending order.
Python3
# sort the dataframe based on employee ID
# and employee Name columns in ascending order
dataframe.sort(['Employee_ID', 'Employee NAME'], ascending=True).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
We are going to sort the dataframe based on employee ID, employee name, and company in descending order. A single ascending=False applies to every listed column.
Python3
# sort the dataframe based on employee ID,
# employee name and company columns in descending order
dataframe.sort(['Employee_ID', 'Employee NAME', 'Company'],
               ascending=False).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+
Example 3: Sort using the asc() method
The asc() method of the Column class returns a sort expression that orders the rows by that column in ascending order.
Python3
dataframe.sort(dataframe.Employee_ID.asc()).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
Example 4: Sort using the desc() method
The desc() method of the Column class returns a sort expression that orders the rows by that column in descending order.
Python3
dataframe.sort(dataframe.Employee_ID.desc()).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+
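Using orderBy() function
The orderBy() function sorts a DataFrame by one or more columns, in ascending order by default. In PySpark it is an alias for sort(), so both accept the same arguments.
Syntax: dataframe.orderBy(*cols, ascending=True)
Parameters:
- cols → Column names (or a list of column names) to sort by.
- ascending → Boolean, or a list of booleans with one entry per column. True sorts in ascending order, False in descending order. Defaults to True.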
Example 1: Sorting by one column
Sort the dataframe by Employee_ID in ascending order:
Python3
# sort the dataframe based on
# Employee ID in ascending order
dataframe.orderBy(['Employee_ID'], ascending=True).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+
Sort the dataframe by Employee_ID in descending order:
Python3
# sort the dataframe based on
# Employee ID in descending order
dataframe.orderBy(['Employee_ID'], ascending=False).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+
Example 2: Sorting by multiple columns
Sort the dataframe based on employee ID and employee Name columns in descending order using orderBy().
Python3
# sort the dataframe based on employee ID
# and employee Name columns in descending order
dataframe.orderBy(['Employee_ID', 'Employee NAME'],
                  ascending=False).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
|          3|       rohith|company 2|
|          2|       ojaswi|company 1|
|          1|       sravan|company 1|
|          1|       sravan|company 1|
+-----------+-------------+---------+
Sort the dataframe based on employee ID and employee Name columns in ascending order:
Python3
# sort the dataframe based on employee ID
# and employee Name columns in ascending order
dataframe.orderBy(['Employee_ID', 'Employee NAME'],
                  ascending=True).show()
Output:
+-----------+-------------+---------+
|Employee_ID|Employee NAME|  Company|
+-----------+-------------+---------+
|          1|       sravan|company 1|
|          1|       sravan|company 1|
|          2|       ojaswi|company 1|
|          3|       rohith|company 2|
|          4|      sridevi|company 1|
|          4|      sridevi|company 1|
+-----------+-------------+---------+