Mock Interview Questions - Please add yours
PySpark Theory
Spark Architecture
Resilient Distributed Dataset (RDD) vs DataFrame vs Dataset
What is Lazy Evaluation?
AQE (Adaptive Query Execution)
Checkpointing in spark
Lineage Graph vs DAG
Spark job Optimization techniques
Caching and Persistence (Memory and Disk) with diff levels
Catalyst Optimizer
Joins and Join Strategies (Broadcast Join, Sort Merge Join)
Spark Submit
DAG Visualization
Serialization (Kryo vs. Java Serialization)
how to debug a failed PySpark job
PySpark Coding
Word count program using RDD and using DF
Explicit schema definition -StructType()...
Spark-Submit command
nth highest salary using window functions
SparkSession creation
DataFrame from a CSV file with corrupted records
How to perform a groupBy
Handle missing values with example
rename columns in a DataFrame
PySpark Scenarios
There are two tables - Sales and Products (with similar columns inside). Find the total and average sales per product
How can you handle skewed data
How to retry for failed jobs
How can you remove null records
Duplicate records - as Bad records
Python (Theory)
Memory management in python
Monkey Patching
OOPs
Exception Handling with all blocks
Tuple vs Set vs Frozenset
Some dictionary methods
All data types
Python (Coding)
List flattening
Eg. list1 = [12, 3, ['string', 2, 3, 4, 5], (456,)]
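A minimal recursive flattening sketch for nested lists/tuples like the example above:

```python
def flatten(items):
    """Recursively flatten nested lists/tuples into one flat list."""
    flat = []
    for item in items:
        if isinstance(item, (list, tuple)):
            flat.extend(flatten(item))  # recurse into nested containers
        else:
            flat.append(item)
    return flat

list1 = [12, 3, ['string', 2, 3, 4, 5], (456,)]
print(flatten(list1))  # [12, 3, 'string', 2, 3, 4, 5, 456]
```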
List [10, 2, 3, 4, 5, 6, 7], op > 70 (maximum product of two elements)
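One way to get the maximum pairwise product; the answer is either the two largest values or (with negatives) the two smallest:

```python
def max_pair_product(nums):
    """Largest product of any two distinct elements."""
    a = sorted(nums)
    # either the two largest, or the two most-negative values
    return max(a[-1] * a[-2], a[0] * a[1])

print(max_pair_product([10, 2, 3, 4, 5, 6, 7]))  # 70 (10 * 7)
```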
List comprehension examples
Dict comprehension with examples
Decorators
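A minimal decorator sketch (the `log_calls` name and call-recording behaviour are just one illustration of the pattern):

```python
import functools

def log_calls(func):
    """Decorator that records the arguments of each call to the wrapped function."""
    calls = []
    @functools.wraps(func)  # preserves func.__name__, docstring, etc.
    def wrapper(*args, **kwargs):
        calls.append(args)
        return func(*args, **kwargs)
    wrapper.calls = calls
    return wrapper

@log_calls
def add(x, y):
    return x + y

add(1, 2)
add(3, 4)
print(add.calls)     # [(1, 2), (3, 4)]
print(add.__name__)  # 'add' - kept intact by functools.wraps
```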
Generators
Example of inheritance
Try except block with finally and its execution
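A small sketch showing the execution order of all the blocks - `else` runs only when no exception occurred, `finally` runs either way:

```python
def divide(a, b):
    """Demonstrates try / except / else / finally execution order."""
    order = []
    try:
        order.append('try')
        result = a / b
    except ZeroDivisionError:
        order.append('except')
        result = None
    else:
        order.append('else')     # only when the try block raised nothing
    finally:
        order.append('finally')  # always runs, exception or not
    return result, order

print(divide(10, 2))  # (5.0, ['try', 'else', 'finally'])
print(divide(10, 0))  # (None, ['try', 'except', 'finally'])
```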
String reversing
Fibonacci
Prime number
Even odd
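A compact pass over these classics - string reversal, Fibonacci (written as a generator), a prime check, and even/odd - as one sketch:

```python
import itertools

def fibonacci():
    """Infinite Fibonacci sequence as a generator."""
    a, b = 0, 1
    while True:
        yield a
        a, b = b, a + b

def is_prime(n):
    """Trial division up to sqrt(n)."""
    if n < 2:
        return False
    return all(n % d for d in range(2, int(n ** 0.5) + 1))

print('spark'[::-1])                                      # 'kraps' - slice-based reversal
print(list(itertools.islice(fibonacci(), 8)))             # [0, 1, 1, 2, 3, 5, 8, 13]
print([n for n in range(20) if is_prime(n)])              # [2, 3, 5, 7, 11, 13, 17, 19]
print(['even' if n % 2 == 0 else 'odd' for n in (4, 7)])  # ['even', 'odd']
```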
Duplicate removal without changing order
ip > (4, 5, 6, 2, 7, 9, 4, 3, 2, 4, 2)  op > (4, 5, 6, 2, 7, 9, 3)
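One idiomatic way to drop duplicates while keeping first-seen order, relying on dicts preserving insertion order (Python 3.7+):

```python
def dedupe(seq):
    """Remove duplicates, keeping the first occurrence of each value in order."""
    return tuple(dict.fromkeys(seq))

ip = (4, 5, 6, 2, 7, 9, 4, 3, 2, 4, 2)
print(dedupe(ip))  # (4, 5, 6, 2, 7, 9, 3)
```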
Project explanation
ODC discussion questions
1) 2 lists:
l1 = ['rohit', 'dhoni', 'sachin']
l2 = [10, 20, 45]
output = [('rohit', 10), ('dhoni', 20), ('sachin', 45)]
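Pairing the two lists element-by-element is a one-liner with `zip`:

```python
l1 = ['rohit', 'dhoni', 'sachin']
l2 = [10, 20, 45]
output = list(zip(l1, l2))  # pairs up elements positionally
print(output)  # [('rohit', 10), ('dhoni', 20), ('sachin', 45)]
```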
2) list1 = ['mohini', 'jojo', 'gaurav', 'ajay']
find the count of elements having the same length
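Counting elements that share the same length can be done with `collections.Counter` keyed on `len`:

```python
from collections import Counter

list1 = ['mohini', 'jojo', 'gaurav', 'ajay']
length_counts = Counter(len(word) for word in list1)  # length -> how many words
print(dict(length_counts))  # {6: 2, 4: 2}
```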
3)pivot syntax
4) separate first name, second name, third name using regex
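One way to split a three-part name with a regex. The sample value and pattern are assumptions (the exact input format was not given), and the interview may instead expect SQL regexp functions or PySpark's `regexp_extract`:

```python
import re

full_name = 'john michael doe'  # hypothetical sample input
# three word groups separated by whitespace
first, second, third = re.match(r'(\w+)\s+(\w+)\s+(\w+)', full_name).groups()
print(first, second, third)  # john michael doe
```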
5)pyspark args and pyspark variable
6)cdc
7)database deadlock
8) SELECT department, AVG(salary) FROM [Link]
GROUP BY department
HAVING AVG(salary) > (SELECT AVG(salary) FROM [Link])

SELECT [Link], AVG([Link]) FROM [Link] A
INNER JOIN [Link] B ON ([Link] = [Link])
GROUP BY [Link]
HAVING AVG([Link]) > AVG([Link])

Which of the above queries is faster?
9) how to implement normalization in practical?(in sql and spark)
10)which schema did we use in airline ,how do we choose a schema for a project?
11) If we have already cached a DataFrame, how do we know that it is cached?
Ans: df.is_cached for a DataFrame, and [Link].is_cached for a table
12)spark submit using 3 ways
13)where is spark submit command saved?
14) are cache and persist actions?
15) is Oracle SQL OLTP or OLAP?
16) We have a sales table with a transaction column where transactions are recorded hourly. We have to fetch the time from that column
17)by default which join happens in spark?
18) garbage collector in Spark - how does it impact application execution?
19 ) types of connectors
20)diff between jdbc and odbc?
21)how to read excel file in spark
22)datawarehouse architecture
23) how to read json,csv in python
24)how do we know if the data is skewed
25) how are tasks discarded in speculative execution (practically)
26) scd and scd2 codes
27)how to check the size of the data in the partition
28) sql,python architecture
29) diff between sql,python,spark
30) self scenarios,like decorator,generator
31)why do we select the cores between 3-5 in spark ?
32) python optimization techniques
33) what is deadlock
34)what is GIL?
35) __init__ and __new__ - diff between them
36) diff between self and __init__
37) how to read a text file in Spark other than via RDD?
38) memory management in sql
39) why do we store bad data
40) recursive function in SQL
41) scd2 use case
42) database deadlock
43) dynamic schema change in sql
44) Given a login table, get the user ids who have logged in for 3 consecutive days:

user_id  login_date
1        2024-11-29
1        2024-11-30
1        2024-12-01
2        2024-11-29
2        2024-11-30
2        2024-12-01
3        2024-11-29
3        2024-11-30
3        2024-12-02

Output:
user_id
1
2
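In SQL or PySpark this is usually solved with window functions (e.g. LAG, or row_number-minus-date grouping); as a plain-Python sketch of the underlying logic, using sample data shaped like the question's table:

```python
from datetime import date, timedelta
from collections import defaultdict

# sample rows: (user_id, login_date)
logins = [
    (1, date(2024, 11, 29)), (1, date(2024, 11, 30)), (1, date(2024, 12, 1)),
    (2, date(2024, 11, 29)), (2, date(2024, 11, 30)), (2, date(2024, 12, 1)),
    (3, date(2024, 11, 29)), (3, date(2024, 11, 30)), (3, date(2024, 12, 2)),
]

def users_with_streak(rows, days=3):
    """Return user ids that logged in on `days` consecutive calendar days."""
    by_user = defaultdict(set)
    for user_id, d in rows:
        by_user[user_id].add(d)
    result = []
    for user_id, dates in by_user.items():
        # a streak starts at d if the following days-1 dates all exist
        if any(all(d + timedelta(i) in dates for i in range(days)) for d in dates):
            result.append(user_id)
    return sorted(result)

print(users_with_streak(logins))  # [1, 2]
```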
45)500gb data – how will we optimize?
46)parquet file and avro – format ( how is data stored)
47)five algorithms of data compression
48) Mask a bank account number using 2 approaches, in SQL and Spark
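The question asks for SQL and Spark approaches; here is just the masking logic itself (keep the last four digits - the account format is an assumption) in plain Python:

```python
def mask_account(acct, visible=4):
    """Replace all but the last `visible` characters with asterisks."""
    return '*' * (len(acct) - visible) + acct[-visible:]

print(mask_account('123456789012'))  # '********9012'
```

In Spark the same idea can be expressed with string functions such as `substring` and `lpad`, or a regex replacement; in SQL likewise.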
49)how to explain spark architecture using spark submit
50) File compressing types
51) if we have a 5-column table with data, and the next day's incremental load changes the data and has 10 columns, how will we handle the data
52) if we have two 100 GB tables, how will you use a broadcast join on one of them
53)python execution plan
54)PEP -8
55) what are .py and .pyc files
56)monkey patching
57)stack and heap memory in python
58)procedure bottleneck
59)how will you handle OOM in executor and driver?
60)deadlock in python,sql,spark
61)database logging
62)how will you maintain python code
63) monotonically increasing id in pyspark
64)map partition in spark
65)naming convention in sql
66)sql memory management
67)GIL
68) synchronous and asynchronous
69) what is the diff between a stored procedure and a trigger
70) name mangling in python
71) itertools
72)lineage graph
73)mapreduce
74)accumulator
75)file system - hdfs
76)sliding window option
77)temp view , global view
78)vertices and edges