
PySpark - Converting JSON to DataFrame

Last Updated : 29 Jun, 2021

In this article, we are going to convert JSON data stored in a file into a PySpark DataFrame.

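Method 1: Using pandas.read_json()

pandas.read_json() reads a JSON file into a pandas DataFrame, which spark.createDataFrame() then converts into a PySpark DataFrame. Because pandas loads the whole file into memory on the driver, this method is best suited to small files.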

Syntax: pandas.read_json("file_name.json")

The demonstration uses a JSON file named student.json:
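The contents of the demonstration file are not reproduced here. As a stand-in, the sketch below writes a small student.json using only the standard library. The records and the column names (id, name, marks) are hypothetical and chosen purely for illustration; the layout is a single JSON array of objects, which pandas.read_json() parses into one row per object.

```python
import json
from pathlib import Path

# Hypothetical records: the column names and values are illustrative only,
# not the contents of the article's original file.
students = [
    {"id": 1, "name": "student_a", "marks": 85},
    {"id": 2, "name": "student_b", "marks": 72},
    {"id": 3, "name": "student_c", "marks": 91},
]

# Write a single JSON array of objects to student.json.
student_path = Path("student.json")
student_path.write_text(json.dumps(students, indent=2), encoding="utf-8")

# Read the file back with the standard library to confirm it is valid JSON.
loaded = json.loads(student_path.read_text(encoding="utf-8"))
print(len(loaded), "records written to", student_path)
```

Running this once in the working directory gives the code in this method a student.json to read.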

Code:

Python3
# import pandas to read json file
import pandas as pd

# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()


# read student.json into a pandas DataFrame,
# then convert it to a PySpark DataFrame
dataframe = spark.createDataFrame(pd.read_json('student.json'))

# display the dataframe (Pyspark dataframe)
dataframe.show()

Output:

Method 2: Using spark.read.json()

spark.read.json() reads JSON data from a file and returns it as a PySpark DataFrame. By default, Spark expects the JSON Lines layout, with one complete JSON object per line.

Syntax: spark.read.json('file_name.json')
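By default spark.read.json() treats every line of the file as a separate JSON record, so a pretty-printed file whose records span several lines needs the multiLine option. The helper below is a minimal sketch of that variant; the function name read_multiline_json and its app_name parameter are assumptions made for illustration, not part of the PySpark API.

```python
def read_multiline_json(path, app_name="sparkdf"):
    """Read a JSON file whose records span several lines into a PySpark DataFrame."""
    # Imported inside the function so that defining the helper
    # does not by itself require a Spark installation.
    from pyspark.sql import SparkSession

    # Reuse the running session if there is one, otherwise create it.
    spark = SparkSession.builder.appName(app_name).getOrCreate()

    # multiLine=True makes Spark parse each file as a whole JSON document
    # instead of expecting one record per line.
    return spark.read.json(path, multiLine=True)
```

With PySpark installed, calling read_multiline_json('file_name.json').show() would display the records of a file that holds a single pretty-printed JSON array or object.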

The demonstration uses a JSON file named college.json:
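The contents of the demonstration file are not reproduced here. As a stand-in, the sketch below writes a small college.json in the JSON Lines layout (one object per line), which is what spark.read.json() expects by default. The records and the field names (college, city, students) are hypothetical and serve only as an illustration.

```python
import json
from pathlib import Path

# Hypothetical records: the field names and values are illustrative only,
# not the contents of the article's original file.
colleges = [
    {"college": "college_a", "city": "city_x", "students": 1200},
    {"college": "college_b", "city": "city_y", "students": 950},
    {"college": "college_c", "city": "city_z", "students": 1430},
]

# Write one compact JSON object per line (JSON Lines).
college_path = Path("college.json")
with college_path.open("w", encoding="utf-8") as handle:
    for record in colleges:
        handle.write(json.dumps(record) + "\n")

# Parse each line independently to confirm every line is a complete object.
lines = college_path.read_text(encoding="utf-8").splitlines()
parsed = [json.loads(line) for line in lines]
print(len(parsed), "records written to", college_path)
```

Running this once in the working directory gives the code in this method a college.json to read.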

Code:

Python3
# importing module
import pyspark

# importing sparksession from pyspark.sql module
from pyspark.sql import SparkSession

# creating sparksession and giving an app name
spark = SparkSession.builder.appName('sparkdf').getOrCreate()

# read college.json into a PySpark DataFrame
data = spark.read.json('college.json')

# display the dataframe
data.show()

Output:

