Day77

Scenario-Based Interview Question

Ganesh. R
Problem Statement

Assume you have an events table from Facebook app analytics. Write a query to
calculate the app's click-through rate (CTR) for 2022, rounding the result to
2 decimal places.

Definition and note:
Click-through rate (CTR), as a percentage = 100.0 * Number of clicks / Number of impressions.
To avoid integer division, multiply by 100.0, not 100.
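The integer-division caveat can be checked in plain Python. A minimal sketch with made-up counts (a real query would aggregate the click and impression rows from the events table):

```python
# Hypothetical counts for illustration only; in practice these come
# from counting 'click' and 'impression' events in the events table.
clicks = 187
impressions = 1024

# Integer-style arithmetic truncates the fractional part of the percentage
truncated = 100 * clicks // impressions

# Multiplying by 100.0 forces float division; then round to 2 decimal places
ctr = round(100.0 * clicks / impressions, 2)

print(truncated)  # 18
print(ctr)        # 18.26
```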
Input Table Data

# Define the employee data
data = [
    (1, "Emma Thompson", 3800, 1, 6),
    (2, "Daniel Rodriguez", 2230, 1, 7),
    (3, "Olivia Smith", 7000, 1, 8),
    (4, "Noah Johnson", 6800, 2, 9),
    (5, "Sophia Martinez", 1750, 1, 11),
    (6, "Liam Brown", 13000, 3, None),
    (7, "Ava Garcia", 12500, 3, None),
    (8, "William Davis", 6800, 2, None),
    (9, "Isabella Wilson", 11000, 3, None),
    (10, "James Anderson", 4000, 1, 11),
    (11, "Mia Taylor", 10800, 3, None),
    (12, "Benjamin Hernandez", 9500, 3, 8),
    (13, "Charlotte Miller", 7000, 2, 6),
    (14, "Logan Moore", 8000, 2, 6),
    (15, "Amelia Lee", 4000, 1, 7),
]

# Define the schema
columns = ["employee_id", "name", "salary", "department_id", "manager_id"]

# Create the DataFrame
employee_df = spark.createDataFrame(data, schema=columns)

# Display the DataFrame
employee_df.display()

# Define the department data
department_data = [
    (1, "Data Analytics"),
    (2, "Data Science"),
    (3, "Data Engineering"),
]

# Define the schema for the department DataFrame
department_columns = ["department_id", "department_name"]

# Create the DataFrame for departments
department_df = spark.createDataFrame(department_data, schema=department_columns)

# Show the DataFrame
department_df.display()
Output Table

department_name    name              salary
Data Analytics     Olivia Smith      7000
Data Analytics     Amelia Lee        4000
Data Analytics     James Anderson    4000
Data Analytics     Emma Thompson     3800
Data Engineering   Liam Brown        13000
Data Engineering   Ava Garcia        12500
Data Engineering   Isabella Wilson   11000
Data Science       Logan Moore       8000
Data Science       Charlotte Miller  7000
Data Science       Noah Johnson      6800
Data Science       William Davis     6800
Problem Statement:

As part of an ongoing analysis of salary distribution within the company, your manager has
requested a report identifying high earners in each department. A 'high earner' within a
department is defined as an employee with a salary ranking among the top three salaries within
that department.

You're tasked with identifying these high earners across all departments. Write a query to
display the employee's name along with their department name and salary. Sort the results by
department name in ascending order, then by salary in descending order. If multiple employees
have the same salary, order them alphabetically by name.

Note: Ensure to utilize the appropriate ranking window function to handle duplicate salaries
effectively.

As of June 18th, we have removed the requirement for unique salaries and revised the sorting
order for the results.
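The note about the "appropriate ranking window function" matters because the three ranking functions treat duplicate salaries differently. A pure-Python sketch of the tie-handling semantics on a small, hypothetical salary list (not Spark's actual implementation, just the logic):

```python
vals = [8000, 7000, 7000, 6800]  # sorted descending; two employees tie at 7000

# ROW_NUMBER: every row gets a distinct number, so one of the tied
# employees would be arbitrarily pushed out of a "top 3" cutoff.
row_number = list(range(1, len(vals) + 1))

# RANK: ties share a rank, but the following rank is skipped,
# so 6800 ranks 4th and would miss a "ranking <= 3" filter.
rank = [1 + sum(v > x for v in vals) for x in vals]

# DENSE_RANK: ties share a rank and no ranks are skipped, so the
# filter keeps exactly the top three distinct salary levels.
distinct = sorted(set(vals), reverse=True)
dense_rank = [distinct.index(x) + 1 for x in vals]

print(row_number)  # [1, 2, 3, 4]
print(rank)        # [1, 2, 2, 4]
print(dense_rank)  # [1, 2, 2, 3]
```

This is why the solution below uses DENSE_RANK() rather than RANK() or ROW_NUMBER().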


employee_df.createOrReplaceTempView("employee")
department_df.createOrReplaceTempView("department")

%sql
WITH ranked_salary AS (
SELECT
name,
salary,
department_id,
DENSE_RANK() OVER (
PARTITION BY department_id
ORDER BY
salary DESC
) AS ranking
FROM
employee
)
SELECT
d.department_name,
s.name,
s.salary
FROM
ranked_salary AS s
INNER JOIN department AS d ON s.department_id = d.department_id
WHERE
s.ranking <= 3
ORDER BY
d.department_name ASC,
s.salary DESC,
s.name ASC;

from pyspark.sql import functions as F
from pyspark.sql import Window

# Define a window specification for ranking
window_spec = Window.partitionBy("department_id").orderBy(F.desc("salary"))

# Create a ranked DataFrame
ranked_salary_df = employee_df.withColumn("ranking", F.dense_rank().over(window_spec))

# Join the ranked DataFrame with the department DataFrame,
# keep the top three salary levels, and order the output
result_df = (
    ranked_salary_df.join(department_df, "department_id")
    .filter(ranked_salary_df.ranking <= 3)
    .select(
        department_df.department_name,
        ranked_salary_df.name,
        ranked_salary_df.salary,
    )
    .orderBy(
        department_df.department_name.asc(),
        ranked_salary_df.salary.desc(),
        ranked_salary_df.name.asc(),
    )
)

# Show the result
result_df.display()

Explanation:

Window Specification: A Window is defined to partition the data by department_id and order it
by salary in descending order.

Ranking: The dense_rank() function is used to calculate rankings for each employee within their
department.

Joining: The join() method combines the ranked employee DataFrame with the department
DataFrame on the department_id.

Filtering and Selecting: The filter condition restricts the results to only include the top 3 salaries
in each department, and specific columns are selected for the final output.

Ordering: Finally, the results are ordered according to the department name, salary, and
employee name as required.

You can run this PySpark code in your environment to get the top three employees by salary for
each department along with the department names.
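As a sanity check, the same pipeline can be replayed in plain Python against the sample rows, with no Spark session required (a sketch of the logic, not of Spark's execution):

```python
from collections import defaultdict

# Sample rows: (employee_id, name, salary, department_id, manager_id)
employees = [
    (1, "Emma Thompson", 3800, 1, 6), (2, "Daniel Rodriguez", 2230, 1, 7),
    (3, "Olivia Smith", 7000, 1, 8), (4, "Noah Johnson", 6800, 2, 9),
    (5, "Sophia Martinez", 1750, 1, 11), (6, "Liam Brown", 13000, 3, None),
    (7, "Ava Garcia", 12500, 3, None), (8, "William Davis", 6800, 2, None),
    (9, "Isabella Wilson", 11000, 3, None), (10, "James Anderson", 4000, 1, 11),
    (11, "Mia Taylor", 10800, 3, None), (12, "Benjamin Hernandez", 9500, 3, 8),
    (13, "Charlotte Miller", 7000, 2, 6), (14, "Logan Moore", 8000, 2, 6),
    (15, "Amelia Lee", 4000, 1, 7),
]
departments = {1: "Data Analytics", 2: "Data Science", 3: "Data Engineering"}

# Group salaries by department (the PARTITION BY step)
by_dept = defaultdict(list)
for _, name, salary, dept_id, _ in employees:
    by_dept[dept_id].append((name, salary))

# Dense-rank each department's salaries and keep ranks 1-3
result = []
for dept_id, rows in by_dept.items():
    distinct = sorted({s for _, s in rows}, reverse=True)
    rank_of = {s: i + 1 for i, s in enumerate(distinct)}
    for name, salary in rows:
        if rank_of[salary] <= 3:
            result.append((departments[dept_id], name, salary))

# Department ascending, salary descending, name ascending
result.sort(key=lambda r: (r[0], -r[2], r[1]))
for row in result:
    print(row)
```

The printed rows match the output table above: four Data Analytics rows (the tie at 4000 keeps both Amelia Lee and James Anderson), three Data Engineering rows, and four Data Science rows.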
IF YOU FOUND THIS POST
USEFUL, PLEASE SAVE IT.

Ganesh. R

THANK YOU
For Your Support

I appreciate your support on my account; I will never stop sharing knowledge.

rganesh203 (Ganesh R)