Pysparkdump

The document contains questions and answers related to Apache Spark. It covers topics like RDDs, DataFrames, caching, partitioning, dynamic allocation, and Spark Streaming. Many questions focus on transformations and actions that can be performed on RDDs and DataFrames as well as features and capabilities of the Spark platform.

Uploaded by

VIDHYA HK

1) Sfpd RDD

Ans; val pairs = sfpd.map(x => x.parallelize)

2) repartition(5) is equivalent to coalesce(5, shuffle=True). T or F?

Ans; TRUE

3) Which statement is true for running Spark on Hadoop YARN?

Ans; There are two deploy modes: client and cluster.

4) What is dynamic allocation?

Ans; Dynamic allocation lets Spark release executors back to the cluster resource pool once they have been idle for a specified period of time, and request executors again when tasks are waiting.

5) Accumulators can be incremented and read from Spark workers. T or F?

Ans; FALSE (tasks on the workers can only add to an accumulator; only the driver can read its value)

6) The keys transformation returns an RDD with ordered keys from a key-value pair RDD. T or F?

Ans; FALSE (keys() returns the keys in the same order as the pairs; it does not sort them)

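7) groupByKey is less efficient than reduceByKey. T or F?

Ans; TRUE (reduceByKey combines values within each partition before the shuffle, so less data crosses the network than with groupByKey, which ships every value)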

8) Which partitioner class is used to order keys according to the sort order for their type?

Ans; RangePartitioner

9) The primary machine learning API for Spark is now the ____-based API.

Ans; DataFrame (spark.ml)

10) An existing RDD, unhcrRDD, contains refugee?

Ans; val country = unhcrRDD.map(x => (x(0), x(3))).reduceByKey((a, b) => a + b)

11) The number of stages in a job is the number of RDDs in the DAG, but the scheduler can truncate the lineage. When?

Ans; When an RDD is cached or persisted

12) Combining a set of filtered edges and filtered vertices from a graph creates what structure?

Ans; A subgraph

13) Which RDD function returns the max, min, count, mean and standard deviation?

Ans; stats()

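14) Are Spark broadcast variables the same as variables set in the driver program in PySpark?

Ans; No. A broadcast variable is sent to each executor once and cached there read-only, whereas a plain driver variable used in a closure is serialized and shipped with every task.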

15) Which of the following in Scala will give the top 10 resolutions, assuming sfpdDF is a DataFrame registered as the table sfpd?

Ans; sqlContext.sql("SELECT resolution, count(incidentnum) AS inccount FROM sfpd GROUP BY resolution ORDER BY inccount DESC LIMIT 10")

sfpdDF.groupBy("resolution").count.sort……….show(10)

16) Given the pair RDD country that contains tuples (country, count), which Scala statement gets the country with the lowest number of refugees?

Ans; val low = country.map(x => (x._2, x._1)).sortByKey().first

17) Which parameters are required for windowed operations such as reduceByKeyAndWindow?

Ans; Window length and sliding interval

18) What are some of the things you can monitor in the Spark Web UI?

Ans; All of the above

19) Which of the following is not a feature of Spark?

Ans; It is cost efficient

20) How do you enable dynamic allocation?

Ans; spark.dynamicAllocation.enabled=true

21) Which of the below removes the broadcast variable bvar from memory?

Ans; bvar.unpersist()

22) A DataFrame can be created from an existing RDD. In which case would you create the DataFrame by inferring the schema using case classes?

Ans; If all your users are going to need the dataset parsed in the same way

23) Internally, a DStream is?

Ans; A continuous sequence of RDDs, one per batch interval

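24) What does the MEMORY_AND_DISK_SER storage level mean for an RDD?

Ans; In memory, on disk, serialized (partitions are kept serialized in memory, and those that do not fit are spilled to disk)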

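25) Which partitions hinder Spark performance: small or large?

Ans; Both small and large (too many tiny partitions add task-scheduling overhead; too few large ones limit parallelism and increase memory pressure)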

26) Which DataFrame method is used to remove a column from the resulting DataFrame?

Ans; drop()

27) What is the difference between foreach and map?

Ans; foreach is an action and map is a transformation

28) What is the difference between take(1) and first()?

Ans; take(1) returns an array containing one element of the RDD, while first() returns the element itself, not wrapped in an array

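29) Caching can use disk if memory is not available. T or F?

Ans; TRUE, when the storage level includes disk, such as MEMORY_AND_DISK (the default RDD cache() level, MEMORY_ONLY, recomputes partitions that do not fit instead)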

30) Spark SQL translates commands into code. What processes that code?

Ans; The executor nodes

31) Which of the following is true for a Spark application on Hadoop YARN?

Ans; There are two deploy modes: client mode and cluster mode.

32) Apache Spark has APIs in?

Ans; All of the above

33) PySpark is a cluster computing framework that runs on a cluster of commodity hardware and performs data unification. T or F?

Ans; TRUE

34) Which function is used to call a program written in shell script/Perl from PySpark?

Ans; pipe()

35) ___ leverages Spark Core's fast scheduling capability to perform streaming analytics.

Ans; Spark Streaming

36) We can create a DataFrame using

Ans; All of the above

37) Which DStream output operation is used to write output to the console?

Ans; pprint() (in Scala, print())

38) Which of the following is not a feature of Spark?

Ans; It is cost efficient

39) Some ways of improving the performance of your Spark application include?

Ans; All of the above

40) In which Spark release was the Dataset API introduced?

Ans; Spark 1.6
