Spark SQL

These notes cover Spark DataFrames and Datasets in Scala: creating a DataFrame from local data, renaming columns, sorting, describing a Dataset, building Datasets from JSON and Parquet files, and reading a CSV file.


1) DATAFRAME:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

val spark = SparkSession.builder.master("local").appName("Spark session in Fresco").getOrCreate()
val langPercentDF = spark.createDataFrame(List(("Scala", 35), ("Python", 30), ("R", 15), ("Java", 20)))
langPercentDF.show()
val lpDF = langPercentDF.withColumnRenamed("_1", "language").withColumnRenamed("_2", "percent")
lpDF.orderBy(desc("percent")).show(false)
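
The two withColumnRenamed calls can be collapsed with toDF, which assigns all column names in one step. A minimal sketch, assuming the same langPercentDF as above:

// Equivalent to the two withColumnRenamed calls: name both columns at once
val lpDF2 = langPercentDF.toDF("language", "percent")
lpDF2.orderBy(desc("percent")).show(false)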

2) DATASET:
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions.desc

val spark = SparkSession.builder.master("local").appName("Spark session in Fresco").getOrCreate()
val numDS = spark.range(5, 100, 5)
numDS.show()
numDS.orderBy(desc("id")).show(5)
numDS.describe().show()
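
spark.range produces a Dataset of longs with a single column named "id", so SQL-style expressions work on it directly. A small sketch, assuming the numDS defined above:

// Keep only multiples of 10, filtering with a SQL expression on the "id" column
numDS.filter("id % 10 = 0").show()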

3) CREATE DATASET from JSON


{"name":"Rahul","age":"35"}
{"name":"Sachin","age":"46"}
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder.master("local").appName("Spark session in
Fresco").getOrCreate()
val peopleDS = spark.read.json("/projects/People.json")
peopleDS.show()
case class Person (name: String, age: String)
val personDS = spark.read.json("/projects/People.json").as[Person]
personDS.show()
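
Once the Dataset is typed, fields can be used in ordinary Scala lambdas instead of column expressions. A minimal sketch, assuming personDS from above and spark.implicits._ in scope:

// Typed transformation: project just the names, checked at compile time
personDS.map(_.name).show()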

{"name":"Rahul","age":"35"}
{"name":"Sachin","age":"46"}
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder.master("local").appName("Spark session in
Fresco").getOrCreate()
val peopleDS = spark.read.json("/projects/People.json")
peopleDS.show()
case class Person(name:String,age:String)
object Main
{
def main(args: Array[String])
{
var Person1 = Person("35", "Rahul")
var Person2 = Person("46", "Sachin")
println("Age of the Person1 is " + Person1.age);
println("Name of the Person1 is " + Person1.name);
println("Age of the Person2 is " + Person2.age);
println("Name of the Person2 is " + Person2.name);
}
}
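
Tying the two together: a local Seq of Person values can be turned into a typed Dataset with toDS. A sketch, assuming an active spark session and spark.implicits._ in scope:

// Build a typed Dataset from in-memory case class instances
val localDS = Seq(Person("Rahul", "35"), Person("Sachin", "46")).toDS()
localDS.show()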

4) PARQUET
{"name":"Rahul","age":"35"}
{"name":"Sachin","age":"46"}
import org.apache.spark.sql.SparkSession
val spark = SparkSession.builder.master("local").appName("Spark session in
Fresco").getOrCreate()
val peopleDS = spark.read.json("/projects/People.json")
peopleDS.show()
val peoplePAR = peopleDS.write.parquet("/projects/challenge/data.parquet")
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val data = sqlContext.read.parquet("/projects/challenge/data.parquet")
data.show()
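
Re-running the write fails if the target path already exists, because the default save mode is error-if-exists. A minimal sketch, assuming the same peopleDS:

// Overwrite any existing data at the target path instead of failing
peopleDS.write.mode("overwrite").parquet("/projects/challenge/data.parquet")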

5) CSV Files
git clone https://github.com/frescoplaylab/Census.git
import org.apache.spark.sql.SparkSession

val spark = SparkSession.builder().master("local[1]").appName("Spark Session in Frescoplay").getOrCreate()
val dfs = spark.read.format("csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .option("mode", "DROPMALFORMED")  // silently drop rows that fail to parse
  .load("/projects/challenge/Census/demography.csv")

// TotalPopulation is assumed to be another DataFrame, loaded elsewhere in the
// challenge, that shares the "Total Population" column used as the join key
val joined = dfs.join(TotalPopulation, "Total Population")
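
Before joining, it is worth checking what inferSchema actually produced. A quick sketch using only the DataFrame loaded above:

// Print the inferred column names and types, then the row count
dfs.printSchema()
println(dfs.count())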
