How to select a range of rows from a dataframe in PySpark?
Last Updated: 18 Jul, 2022
In this article, we are going to select a range of rows from a PySpark dataframe.
This can be done in the following ways:
- Using filter().
- Using where().
- Using SQL expression.
Creating Dataframe for demonstration:
Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# list of students data
data = [["1", "sravan", "vignan", 67, 89],
        ["2", "ojaswi", "vvit", 78, 89],
        ["3", "rohith", "vvit", 100, 80],
        ["4", "sridevi", "vignan", 78, 80],
        ["1", "sravan", "vignan", 89, 98],
        ["5", "gnanesh", "iit", 94, 98]]

# specify column names
columns = ['student ID', 'student NAME',
           'college', 'subject1', 'subject2']

# creating a dataframe from the lists of data
dataframe = spark.createDataFrame(data, columns)

# display dataframe
dataframe.show()
Output:
+----------+------------+-------+--------+--------+
|student ID|student NAME|college|subject1|subject2|
+----------+------------+-------+--------+--------+
|         1|      sravan| vignan|      67|      89|
|         2|      ojaswi|   vvit|      78|      89|
|         3|      rohith|   vvit|     100|      80|
|         4|     sridevi| vignan|      78|      80|
|         1|      sravan| vignan|      89|      98|
|         5|     gnanesh|    iit|      94|      98|
+----------+------------+-------+--------+--------+
Method 1: Using filter()
This function is used to filter the dataframe by selecting the records based on the given condition.
Syntax: dataframe.filter(condition)
Example: Python code to select rows where the subject1 column is between 23 and 78.
Python3
# select rows where subject1 marks
# are between 23 and 78
dataframe.filter(
    dataframe.subject1.between(23, 78)).show()
Output:
+----------+------------+-------+--------+--------+
|student ID|student NAME|college|subject1|subject2|
+----------+------------+-------+--------+--------+
|         1|      sravan| vignan|      67|      89|
|         2|      ojaswi|   vvit|      78|      89|
|         4|     sridevi| vignan|      78|      80|
+----------+------------+-------+--------+--------+
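between() is inclusive on both ends, so rows with subject1 equal to exactly 23 or 78 are kept as well. As a minimal equivalent sketch, the same range filter can be written with col() and explicit comparison operators:
Python3
# equivalent range filter: between(23, 78) behaves like an
# inclusive >= / <= pair combined with & on Column objects
from pyspark.sql.functions import col

dataframe.filter((col("subject1") >= 23) &
                 (col("subject1") <= 78)).show()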
Method 2: Using where()
where() is an alias for filter(): it also selects the records that satisfy the given condition.
Syntax: dataframe.where(condition)
Example 1: Python program to select rows where subject1 is between 85 and 100.
Python3
# select dataframe between
# 85 and 100 in subject1 column
dataframe.where(
    dataframe.subject1.between(85, 100)).show()
Output:
+----------+------------+-------+--------+--------+
|student ID|student NAME|college|subject1|subject2|
+----------+------------+-------+--------+--------+
|         3|      rohith|   vvit|     100|      80|
|         1|      sravan| vignan|      89|      98|
|         5|     gnanesh|    iit|      94|      98|
+----------+------------+-------+--------+--------+
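The complement of a range can be selected by negating the condition with the ~ operator; a short sketch:
Python3
# select rows whose subject1 marks fall outside the 85-100 range
dataframe.where(~dataframe.subject1.between(85, 100)).show()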
Example 2: Select rows in the dataframe by the college column
Python3
# select dataframe in college column
# for vvit
dataframe.where(
    dataframe.college.between("vvit", "vvit")).collect()
Output:
[Row(student ID='2', student NAME='ojaswi', college='vvit', subject1=78, subject2=89),
Row(student ID='3', student NAME='rohith', college='vvit', subject1=100, subject2=80)]
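Note that a string range whose lower and upper bounds are identical is just an equality test, so the same rows can be selected more directly:
Python3
# between("vvit", "vvit") with identical bounds is
# equivalent to a plain equality comparison
dataframe.where(dataframe.college == "vvit").collect()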
Method 3: Using SQL Expression
By running a SQL query with the BETWEEN operator against a temporary view of the dataframe, we can get the same range of rows.
Syntax: spark.sql("SELECT * FROM my_view WHERE column_name BETWEEN value1 AND value2")
Example 1: Python program to select rows from the dataframe based on the subject1 column
Python3
# create a temporary view for the dataframe
dataframe.createOrReplaceTempView("my_view")

# select rows where subject1 is between 23 and 78
spark.sql(
    "SELECT * FROM my_view WHERE subject1 BETWEEN 23 AND 78").collect()
Output:
[Row(student ID='1', student NAME='sravan', college='vignan', subject1=67, subject2=89),
Row(student ID='2', student NAME='ojaswi', college='vvit', subject1=78, subject2=89),
Row(student ID='4', student NAME='sridevi', college='vignan', subject1=78, subject2=80)]
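filter() and where() also accept the same SQL predicate as a plain string, so registering a view is not strictly required; a small sketch of this shortcut:
Python3
# pass the BETWEEN predicate directly as a SQL string,
# without creating a temporary view first
dataframe.filter("subject1 BETWEEN 23 AND 78").collect()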
Example 2: Select rows based on the student ID column
Python3
# the view was already created above; re-creating it is harmless
dataframe.createOrReplaceTempView("my_view")

# select rows where student ID is between 1 and 3; the column
# name contains a space, so it is quoted with backticks
spark.sql(
    "SELECT * FROM my_view WHERE `student ID` BETWEEN 1 AND 3").collect()
Output:
[Row(student ID='1', student NAME='sravan', college='vignan', subject1=67, subject2=89),
Row(student ID='2', student NAME='ojaswi', college='vvit', subject1=78, subject2=89),
Row(student ID='3', student NAME='rohith', college='vvit', subject1=100, subject2=80),
Row(student ID='1', student NAME='sravan', college='vignan', subject1=89, subject2=98)]
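All of the examples above select rows by a range of column values. If you instead need a positional range of rows (say, the 2nd through the 4th row under some ordering), one common sketch numbers the rows with row_number() over a window and filters on that number. The ordering column and bounds below are illustrative assumptions, not part of the original examples:
Python3
from pyspark.sql.functions import row_number
from pyspark.sql.window import Window

# assign a 1-based position to each row; ordering by "student ID"
# here is an illustrative choice, and an unpartitioned window
# moves all rows to a single partition
w = Window.orderBy("student ID")
numbered = dataframe.withColumn("row_num", row_number().over(w))

# keep positions 2 through 4, then drop the helper column
numbered.filter(numbered.row_num.between(2, 4)).drop("row_num").show()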