Exploratory Data Analysis (EDA)
1. Data Loading
3. Data Cleaning
4. Data Transformation
By: Lakshmanan M
● SQL Query: spark.sql('SELECT * FROM view_name WHERE condition')
6. Statistical Analysis
9. Column Operations
● Date Formatting: df.withColumn('formatted_date', date_format('dateColumn', 'yyyyMMdd'))
● Date Arithmetic: df.withColumn('date_plus_days', date_add(df['date'], 5))
15. Joins
● Left Outer Join: df1.join(df2, df1['id'] == df2['id'], 'left_outer')
● Right Outer Join: df1.join(df2, df1['id'] == df2['id'], 'right_outer')
● Perform SQL Queries: spark.sql('SELECT * FROM temp_table WHERE condition')
● Set Operations (Union, Intersect, Minus): df1.union(df2); df1.intersect(df2); df1.subtract(df2)
34. Advanced File Formats
● Reading Data from Multiple Paths: df = spark.read.format('format').option('option', 'value').load(['path1', 'path2'])
● Writing Data to Different Formats: df.write.format('format').save('path', mode='overwrite')
Presented by Lakshmanan M
Thank you very much!