How to get a value from the Row object in PySpark Dataframe?
Last Updated: 04 Jan, 2022
In this article, we are going to learn how to get a value from the Row object in PySpark DataFrame.
Method 1: Using the __getitem__() magic method
We will create a Spark DataFrame with at least one row using createDataFrame(). We then take a Row object from the list of Row objects returned by DataFrame.collect(), and use the __getitem__() magic method to get the value for a particular column name. Given below is the syntax.
Syntax: Row.__getitem__('Column_Name')
Returns: the value corresponding to the column name in the Row object
Python
# library imports
import pyspark
from pyspark.sql import SparkSession
from pyspark.sql import Row

# Session creation
random_value_session = SparkSession.builder.appName(
    'Random_Value_Session'
).getOrCreate()

# Data filled in our DataFrame
# 5 rows below
rows = [['All England Open', 'March', 'Super 1000'],
        ['Malaysia Open', 'January', 'Super 750'],
        ['Korea Open', 'April', 'Super 500'],
        ['Hylo Open', 'November', 'Super 100'],
        ['Spain Masters', 'March', 'Super 300']]

# Columns of our DataFrame
columns = ['Tournament', 'Month', 'Level']

# DataFrame creation
dataframe = random_value_session.createDataFrame(rows,
                                                 columns)

# Showing the DataFrame
dataframe.show()

# Getting a list of rows using collect()
row_list = dataframe.collect()

# Printing the first Row object
# from which data is extracted
print(row_list[0])

# Using the __getitem__() magic method to get
# the value corresponding to a particular column name
print(row_list[0].__getitem__('Level'))
print(row_list[0].__getitem__('Tournament'))
print(row_list[0].__getitem__('Level'))
print(row_list[0].__getitem__('Month'))
Output:
+----------------+--------+----------+
| Tournament| Month| Level|
+----------------+--------+----------+
|All England Open| March|Super 1000|
| Malaysia Open| January| Super 750|
| Korea Open| April| Super 500|
| Hylo Open|November| Super 100|
| Spain Masters| March| Super 300|
+----------------+--------+----------+
Row(Tournament='All England Open', Month='March', Level='Super 1000')
Super 1000
All England Open
Super 1000
March
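In everyday code you rarely call __getitem__() directly: Python invokes it for you whenever you use square brackets, so row['Level'] and row.__getitem__('Level') are equivalent. A quick sketch, reusing the row_list collected above:
Python
# Square-bracket access calls __getitem__() under the hood,
# so both lines print the same value
print(row_list[0]['Level'])              # Super 1000
print(row_list[0].__getitem__('Level'))  # Super 1000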
Method 2: Using the asDict() method
We will create a Spark DataFrame with at least one row using createDataFrame(). We then take a Row object from the list of Row objects returned by DataFrame.collect(), and use the asDict() method to get a dictionary where column names are keys and their row values are dictionary values. Given below is the syntax:
Syntax: Row.asDict(recursive)
Parameters:
recursive: bool: if True, nested Rows are also returned as dictionaries. The default value is False.
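The recursive parameter only matters when a Row contains nested Rows. A minimal sketch, using hypothetical nested data rather than the tournament DataFrame below:
Python
from pyspark.sql import Row

# A hypothetical Row nesting another Row inside it
outer = Row(Event='Malaysia Open',
            Schedule=Row(Month='January', Week=2))

# By default the nested Row is left as-is
print(outer.asDict())
# {'Event': 'Malaysia Open', 'Schedule': Row(Month='January', Week=2)}

# With recursive=True the nested Row becomes a dictionary too
print(outer.asDict(recursive=True))
# {'Event': 'Malaysia Open', 'Schedule': {'Month': 'January', 'Week': 2}}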
We can then easily get the value from the dictionary using DictionaryName['key_name'].
Python
# library imports are done here
import pyspark
from pyspark.sql import SparkSession

# Session creation
random_value_session = SparkSession.builder.appName(
    'Random_Value_Session'
).getOrCreate()

# Data filled in our DataFrame
rows = [['French Open', 'October', 'Super 750'],
        ['Macau Open', 'November', 'Super 300'],
        ['India Open', 'January', 'Super 500'],
        ['Odisha Open', 'January', 'Super 100'],
        ['China Open', 'November', 'Super 1000']]

# DataFrame columns
columns = ['Tournament', 'Month', 'Level']

# DataFrame creation
dataframe = random_value_session.createDataFrame(rows,
                                                 columns)

# Showing the DataFrame
dataframe.show()

# Getting a list of rows using collect()
row_list = dataframe.collect()

# Printing the second Row object
# from which we will read data
print(row_list[1])
print()

# Printing the dictionary to make
# things clearer
print(row_list[1].asDict())
print()

# Using asDict() to convert the Row object into a
# dictionary where the column names are keys, then
# using those keys to get the respective values
print(row_list[1].asDict()['Tournament'])
print(row_list[1].asDict()['Month'])
print(row_list[1].asDict()['Level'])
Output:
+-----------+--------+----------+
| Tournament| Month| Level|
+-----------+--------+----------+
|French Open| October| Super 750|
| Macau Open|November| Super 300|
| India Open| January| Super 500|
|Odisha Open| January| Super 100|
| China Open|November|Super 1000|
+-----------+--------+----------+
Row(Tournament='Macau Open', Month='November', Level='Super 300')
{'Tournament': 'Macau Open', 'Month': 'November', 'Level': 'Super 300'}
Macau Open
November
Super 300
Method 3: Treating the Row object like a list
Here we will treat a Row object like a Python list and read its values by position. We will create a Spark DataFrame with at least one row using createDataFrame(), then take a Row object from the list of Row objects returned by DataFrame.collect(). Since a Row can be indexed by position just like a list, we simply use:
Syntax: RowObject[index]
Returns: the value at that index in the Row object, following the column order.
Python
# library imports are done here
import pyspark
from pyspark.sql import SparkSession

# Session creation
random_value_session = SparkSession.builder.appName(
    'Random_Value_Session'
).getOrCreate()

# Data filled in our DataFrame
rows = [['Denmark Open', 'October', 'Super 1000'],
        ['Indonesia Open', 'June', 'Super 1000'],
        ['Korea Open', 'April', 'Super 500'],
        ['Japan Open', 'August', 'Super 750'],
        ['Akita Masters', 'July', 'Super 100']]

# DataFrame columns
columns = ['Tournament', 'Month', 'Level']

# DataFrame creation
dataframe = random_value_session.createDataFrame(rows,
                                                 columns)

# Showing the DataFrame
dataframe.show()

# Getting a list of rows using collect()
row_list = dataframe.collect()

# Let's take the third Row object
row_object = row_list[2]

# Treating it like a Python list, we can get the
# value at index 0, which belongs to the first
# column, 'Tournament'
print(row_object[0])

# A few more examples
print(row_list[4][0])
print(row_list[3][1])
print(row_list[4][2])
Output:
+--------------+-------+----------+
| Tournament| Month| Level|
+--------------+-------+----------+
| Denmark Open|October|Super 1000|
|Indonesia Open| June|Super 1000|
| Korea Open| April| Super 500|
| Japan Open| August| Super 750|
| Akita Masters| July| Super 100|
+--------------+-------+----------+
Korea Open
Akita Masters
August
Super 100
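Since a Row is implemented as a tuple subclass, it can also be converted to a real Python list or unpacked like any other sequence. A short sketch, reusing row_object (the third row) from above:
Python
# A Row is tuple-like, so sequence operations work on it
print(list(row_object))   # ['Korea Open', 'April', 'Super 500']

# Unpacking into separate variables also works
tournament, month, level = row_object
print(month)              # April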