DataFrame Operations

This PySpark script creates a SparkSession and defines a Row template named Personal with the fields ID, Name, Age, and Area of Interest. It builds four sample rows, collects them in a list, and creates a DataFrame from that list. It then computes descriptive statistics for the Age column with describe() and writes them to a Parquet directory named Age. Finally, it selects ID, Name, and Age, sorts the rows by Name in descending order, displays the result, and writes it to a second Parquet directory named NameSorted. All values, including Age, are stored as strings.

Uploaded by

Arpita Das
Copyright
© All Rights Reserved

# Import only the names the script uses, in one place.
from pyspark.sql import Row, SparkSession

# Create (or reuse) a SparkSession for this job.
spark = (
    SparkSession.builder
    .appName("Data Frame Personal")
    # Placeholder option; it can be dropped.
    .config("spark.some.config.option", "some-value")
    .getOrCreate()
)
# Row template: calling Personal(...) fills these fields positionally.
Personal = Row("ID", "Name", "Age", "Area of Interest")

# Sample records. Every value, including Age, is stored as a string.
data1 = Personal("1", "Jack", "22", "Data Science")
data2 = Personal("2", "Luke", "21", "Data Analytics")
data3 = Personal("3", "Leo", "24", "Micro Services")
data4 = Personal("4", "Mark", "21", "Data Analytics")

PersonalData = [data1, data2, data3, data4]
df = spark.createDataFrame(PersonalData)
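# Summary statistics (count, mean, stddev, min, max) for the Age column.
Result1 = df.describe("Age")
# coalesce(1) keeps the output to a single part file inside the "Age" directory.
Result1.coalesce(1).write.parquet("Age")

# Keep three columns and sort by Name in descending order.
Result2 = df.select("ID", "Name", "Age").orderBy("Name", ascending=False)
Result2.show()
Result2.coalesce(1).write.parquet("NameSorted")
# Note: write.parquet raises an error if the target directory already exists.
# df.show()  # uncomment to inspect the full DataFrame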
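
To sanity-check the two results without a Spark cluster, the same calculations can be sketched in plain Python. This is only an illustration, not part of the original script. The names rows, summary, and sorted_rows are hypothetical. It assumes that describe() casts the string ages to numbers for mean and stddev, that it reports the sample standard deviation, and that it compares min and max as strings because the Age column holds strings.

```python
import statistics

# Same sample values as the listing above; ages stay strings, as in the original.
rows = [
    {"ID": "1", "Name": "Jack", "Age": "22"},
    {"ID": "2", "Name": "Luke", "Age": "21"},
    {"ID": "3", "Name": "Leo", "Age": "24"},
    {"ID": "4", "Name": "Mark", "Age": "21"},
]

ages_text = [r["Age"] for r in rows]
# Assumption: numeric cast is applied before computing mean and stddev.
ages_num = [float(a) for a in ages_text]

summary = {
    "count": len(ages_text),
    "mean": statistics.mean(ages_num),
    "stddev": statistics.stdev(ages_num),  # sample standard deviation (n - 1)
    "min": min(ages_text),                 # string comparison
    "max": max(ages_text),                 # string comparison
}

# Equivalent of orderBy("Name", ascending=False) on the selected columns.
sorted_rows = sorted(rows, key=lambda r: r["Name"], reverse=True)

for stat, value in summary.items():
    print(stat, value)
for r in sorted_rows:
    print(r["ID"], r["Name"], r["Age"])
```

If the Spark output differs from these values, the most likely cause is the string typing of Age; building the rows with integer ages would make min and max numeric comparisons as well.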
