How to select and order multiple columns in PySpark DataFrame?

Last Updated : 06 Jun, 2021

In this article, we will discuss how to select and order multiple columns from a DataFrame using PySpark in Python. For this, we use the sort() and orderBy() functions together with the select() function.

Methods Used

select(): This method selects a subset of the DataFrame's columns and returns a new DataFrame containing only those columns.

Syntax: dataframe.select(['column1', 'column2', 'column n']).show()

sort(): This method sorts the rows of the DataFrame and returns a new, sorted DataFrame. It sorts in ascending order by default.

Syntax: dataframe.sort(['column1', 'column2', 'column n'], ascending=True).show()

orderBy(): This method is an alias for sort() and behaves the same way. It also sorts in ascending order by default.

Syntax: dataframe.orderBy(['column1', 'column2', 'column n'], ascending=True).show()

Let's create a sample dataframe

```python
# importing module
import pyspark

# importing sparksession from
# pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students data
data = [["1", "sravan", "vignan"],
        ["2", "ojaswi", "vvit"],
        ["3", "rohith", "vvit"],
        ["4", "sridevi", "vignan"],
        ["1", "sravan", "vignan"],
        ["5", "gnanesh", "iit"]]

# specify column names
columns = ['student ID', 'student NAME', 'college']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

print("Actual data in dataframe")

# show dataframe
dataframe.show()
```

Output: the six rows of the sample data under the columns student ID, student NAME and college, including the duplicated record for student ID 1.

Selecting multiple columns and order by using sort() method

```python
# show dataframe by sorting the dataframe
# based on two columns in ascending
# order using sort() function
dataframe.select(['student ID', 'student NAME']).sort(
    ['student ID', 'student NAME'], ascending=True).show()
```
Output: the student ID and student NAME columns, with rows ordered by student ID and then by student NAME, both ascending.

```python
# show dataframe by sorting the dataframe
# based on three columns in desc order
# using sort() function
dataframe.select(['student ID', 'student NAME', 'college']).sort(
    ['student ID', 'student NAME', 'college'], ascending=False).show()
```

Output: all three columns, with rows ordered by student ID, then student NAME, then college, all descending.

Selecting multiple columns and order by using orderBy() method

```python
# show dataframe by sorting the dataframe
# based on three columns in desc
# order using orderBy() function
dataframe.select(['student ID', 'student NAME', 'college']).orderBy(
    ['student ID', 'student NAME', 'college'], ascending=False).show()
```

Output: the same result as the descending sort() example above, since orderBy() orders the rows in the same way.

```python
# show dataframe by sorting the dataframe
# based on two columns in asc
# order using orderBy() function
dataframe.select(['student NAME', 'college']).orderBy(
    ['student NAME', 'college'], ascending=True).show()
```

Output: the student NAME and college columns, with rows ordered by student NAME and then by college, both ascending.

Author: gottumukkalabobby

Article Tags : Python, Python-Pyspark
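The examples above use a single direction for every column. In PySpark, the ascending parameter of sort() and orderBy() can also be a list of booleans, one per sort column, for example dataframe.sort(['college', 'student NAME'], ascending=[True, False]). To make clear what such a multi-column ordering means, here is a minimal plain-Python sketch that reproduces the same idea on the sample rows without starting a Spark session. The helper order_rows is hypothetical and written only for this illustration; it models the ordering rule and is not how Spark implements sorting.

```python
# Plain-Python model of multi-column ordering; no Spark session is needed.
# Rows reuse the article's sample data: (student ID, student NAME, college).
rows = [
    ("1", "sravan", "vignan"),
    ("2", "ojaswi", "vvit"),
    ("3", "rohith", "vvit"),
    ("4", "sridevi", "vignan"),
    ("1", "sravan", "vignan"),
    ("5", "gnanesh", "iit"),
]


def order_rows(rows, key_indexes, ascending):
    """Hypothetical helper: order tuples by several fields,
    each field with its own direction (True = ascending)."""
    ordered = list(rows)
    # Python's sort is stable, so sorting by the least significant key
    # first and the most significant key last gives a multi-key ordering.
    for index, asc in reversed(list(zip(key_indexes, ascending))):
        ordered.sort(key=lambda row: row[index], reverse=not asc)
    return ordered


# college ascending, then student NAME descending within each college
by_college = order_rows(rows, key_indexes=[2, 1], ascending=[True, False])
for row in by_college:
    print(row)
```

The first key decides the overall order, and each later key only breaks ties among rows that are equal on all earlier keys. Note also that student ID is stored as a string in the sample data, so it is compared as text, not as a number, both in this sketch and in the DataFrame examples.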