
Lab_Exam 2021BCS0021

This document is the final lab exam for Big Data and Scalable Computing, held on November 7, 2024, for a total of 15 marks. Its tasks are join operations in Pig on the provided text files, a join on multiple keys, a word count over a text file in Spark, and displaying the contents of a CSV file in Spark. The time allotted is 30 minutes for writing and 1 hour for execution.

Uploaded by Vikas Kushwaha
Copyright © All Rights Reserved

Final Lab Exam Big Data and Scalable Computing

Group A

Date: 07-11-2024    Marks: 15 (8M written + 7M execution)

Time: 12 PM to 2 PM (30 min for writing, 1 hr for execution)

Name: Vikas Kushwaha

Roll No: 2021BCS0021

Task 1: Perform join operations in Pig on the given files (3M)

• Self-Join on File 1
• Inner join
• Outer join (Left, Right, Full)

File 1

customers.txt

id, name, age, city, amount

1,Ramesh,32,Ahmedabad,2000.00
2,Khilan,25,Delhi,1500.00
3,kaushik,23,Kota,2000.00
4,Chaitali,25,Mumbai,6500.00
5,Hardik,27,Bhopal,8500.00
6,Komal,22,MP,4500.00
7,Muffy,24,Indore,10000.00

File 2

orders.txt

order_id, date, cust_id, amount

102,2009-10-08 00:00:00,3,3000
100,2009-10-08 00:00:00,3,1500
101,2009-11-20 00:00:00,2,1560
103,2008-05-20 00:00:00,4,2060

SELF JOIN
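To join a relation with itself, Pig Latin needs it loaded under two aliases. One way to write this is `customers1 = LOAD 'customers.txt' USING PigStorage(',') AS (id:int, name:chararray, age:int, city:chararray, amount:double);`, the same LOAD again as `customers2`, and then `JOIN customers1 BY id, customers2 BY id;`. The field types in that schema are an assumption. Pig output cannot be shown here, so the plain-Python sketch below (not Pig code) models the same hash join on id over the rows of File 1. Joined on its own key, every customer pairs only with itself, which gives 7 rows of 10 fields.

```python
# Plain-Python model of a Pig self-join on customers.txt (this is not Pig Latin).
# Rows are copied from File 1; the column types are assumed.
CUSTOMERS_TXT = """\
1,Ramesh,32,Ahmedabad,2000.00
2,Khilan,25,Delhi,1500.00
3,kaushik,23,Kota,2000.00
4,Chaitali,25,Mumbai,6500.00
5,Hardik,27,Bhopal,8500.00
6,Komal,22,MP,4500.00
7,Muffy,24,Indore,10000.00
"""


def load_customers(text):
    """Split comma-separated lines into (id, name, age, city, amount) tuples."""
    rows = []
    for line in text.strip().splitlines():
        cid, name, age, city, amount = line.split(",")
        rows.append((int(cid), name, int(age), city, float(amount)))
    return rows


def self_join_on_id(rows):
    """Join the relation with a copy of itself where id == id."""
    index = {}
    for row in rows:  # hash index over the second copy, keyed by id
        index.setdefault(row[0], []).append(row)
    # Probe with every row of the first copy and concatenate matching tuples.
    return [left + right for left in rows for right in index.get(left[0], [])]


customers = load_customers(CUSTOMERS_TXT)
joined = self_join_on_id(customers)
for row in joined:
    print(row)
```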

INNER JOIN
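The inner join keeps only customers that have at least one order. In Pig Latin this could be written as `JOIN customers BY id, orders BY cust_id;`, with the field name cust_id taken from the File 2 header. Both files have an amount column, and Pig tells them apart after the join as `customers::amount` and `orders::amount`. The plain-Python sketch below is only an illustrative model, not Pig. It indexes the orders by cust_id and then probes that index with each customer. Customers 2, 3 and 4 match, and customer 3 has two orders, so 4 rows are expected. Pig does not guarantee row order, so the results should be compared as sets.

```python
# Plain-Python model of a Pig inner join: customers BY id, orders BY cust_id.
# Rows are copied from File 1 and File 2; column types are assumed.
CUSTOMERS_TXT = """\
1,Ramesh,32,Ahmedabad,2000.00
2,Khilan,25,Delhi,1500.00
3,kaushik,23,Kota,2000.00
4,Chaitali,25,Mumbai,6500.00
5,Hardik,27,Bhopal,8500.00
6,Komal,22,MP,4500.00
7,Muffy,24,Indore,10000.00
"""
ORDERS_TXT = """\
102,2009-10-08 00:00:00,3,3000
100,2009-10-08 00:00:00,3,1500
101,2009-11-20 00:00:00,2,1560
103,2008-05-20 00:00:00,4,2060
"""


def parse(text, types):
    """Split each comma-separated line and cast every field with the matching type."""
    return [
        tuple(cast(value) for cast, value in zip(types, line.split(",")))
        for line in text.strip().splitlines()
    ]


def inner_join(left, left_key, right, right_key):
    """Return left + right tuples whose key columns are equal (hash join)."""
    index = {}
    for row in right:
        index.setdefault(row[right_key], []).append(row)
    return [l_row + r_row for l_row in left for r_row in index.get(l_row[left_key], [])]


customers = parse(CUSTOMERS_TXT, (int, str, int, str, float))  # id, name, age, city, amount
orders = parse(ORDERS_TXT, (int, str, int, int))  # order_id, date, cust_id, amount
joined = inner_join(customers, 0, orders, 2)
for row in joined:
    print(row)
```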

Outer join (Left, Right, Full)
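Pig selects the three outer-join variants by writing `LEFT OUTER`, `RIGHT OUTER` or `FULL OUTER` after the first relation's key, for example `JOIN customers BY id LEFT OUTER, orders BY cust_id;`. The function below is a hedged plain-Python model of these semantics, not Pig itself. It pads the unmatched side with None, which plays the role of Pig's null. In the given data, customers 1, 5, 6 and 7 have no orders, so the left and full outer joins each return 8 rows. Every order has a matching customer, so the right outer join returns the same 4 rows as the inner join.

```python
# Plain-Python model of Pig LEFT/RIGHT/FULL OUTER joins (this is not Pig Latin).
# Rows are copied from File 1 and File 2; column types are assumed.
CUSTOMERS_TXT = """\
1,Ramesh,32,Ahmedabad,2000.00
2,Khilan,25,Delhi,1500.00
3,kaushik,23,Kota,2000.00
4,Chaitali,25,Mumbai,6500.00
5,Hardik,27,Bhopal,8500.00
6,Komal,22,MP,4500.00
7,Muffy,24,Indore,10000.00
"""
ORDERS_TXT = """\
102,2009-10-08 00:00:00,3,3000
100,2009-10-08 00:00:00,3,1500
101,2009-11-20 00:00:00,2,1560
103,2008-05-20 00:00:00,4,2060
"""


def parse(text, types):
    """Split each comma-separated line and cast every field with the matching type."""
    return [
        tuple(cast(value) for cast, value in zip(types, line.split(",")))
        for line in text.strip().splitlines()
    ]


def outer_join(left, left_key, right, right_key, how):
    """how is 'left', 'right' or 'full'; the unmatched side is padded with None."""
    if how not in ("left", "right", "full"):
        raise ValueError("how must be 'left', 'right' or 'full'")
    left_nulls = (None,) * len(left[0])
    right_nulls = (None,) * len(right[0])
    index = {}  # key -> positions of right rows carrying that key
    for pos, row in enumerate(right):
        index.setdefault(row[right_key], []).append(pos)
    result, matched = [], set()
    for l_row in left:
        positions = index.get(l_row[left_key], [])
        for pos in positions:
            result.append(l_row + right[pos])
            matched.add(pos)
        if not positions and how in ("left", "full"):
            result.append(l_row + right_nulls)  # customer without orders
    if how in ("right", "full"):
        for pos, r_row in enumerate(right):
            if pos not in matched:
                result.append(left_nulls + r_row)  # order without a customer
    return result


customers = parse(CUSTOMERS_TXT, (int, str, int, str, float))  # id, name, age, city, amount
orders = parse(ORDERS_TXT, (int, str, int, int))  # order_id, date, cust_id, amount
for how in ("left", "right", "full"):
    rows = outer_join(customers, 0, orders, 2, how)
    print(how, len(rows))
    for row in rows:
        print("  ", row)
```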


Task 2: Perform a JOIN operation using multiple keys (id, jobid) (1M)

employee.txt

id, firstname, lastname, age, post, jobid

1,Rajiv,Reddy,21,programmer,113
2,Siddarth,Battacharya,22,programmer,113
3,Rajesh,Khanna,22,programmer,113
4,Preethi,Agarwal,21,programmer,113
5,Trupthi,Mohanthy,23,programmer,113
6,Archana,Mishra,23,programmer,113
7,Komal,Nayak,24,teamlead,112

employee_contact.txt

id, mobileno, mail, city, jobid

1,9848022337,[email protected],Hyderabad,113
2,9848022338,[email protected],Kolkata,113
3,9848022339,[email protected],Delhi,113
004,9848022330,[email protected],Pune,113
005,9848022336,[email protected],Bhuwaneshwar,113
006,9848022335,[email protected],Chennai,113
007,9848022334,[email protected],trivendram,112
008,9848022333,[email protected],Chennai,111
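To join on several keys, Pig Latin lists them in parentheses on both sides, for example `JOIN employee BY (id, jobid), employee_contact BY (id, jobid);`. Some ids in employee_contact.txt are zero-padded (004 to 008). They only match employee.txt if id is loaded as a number. Loaded as chararray, "004" is not equal to "4". The sketch below models this in plain Python, not Pig, and casts both ids with int(). The mail addresses are obfuscated in the source, so the string "redacted" is a placeholder. mobileno is kept as a string because ten-digit numbers do not fit in Pig's 32-bit int. All seven employees match, and contact 008 (jobid 111) has no partner.

```python
# Plain-Python model of a Pig join on two keys: BY (id, jobid) on both sides.
# Rows are copied from the two files; "redacted" is a placeholder for the obfuscated mail field.
EMPLOYEE_TXT = """\
1,Rajiv,Reddy,21,programmer,113
2,Siddarth,Battacharya,22,programmer,113
3,Rajesh,Khanna,22,programmer,113
4,Preethi,Agarwal,21,programmer,113
5,Trupthi,Mohanthy,23,programmer,113
6,Archana,Mishra,23,programmer,113
7,Komal,Nayak,24,teamlead,112
"""
CONTACT_TXT = """\
1,9848022337,redacted,Hyderabad,113
2,9848022338,redacted,Kolkata,113
3,9848022339,redacted,Delhi,113
004,9848022330,redacted,Pune,113
005,9848022336,redacted,Bhuwaneshwar,113
006,9848022335,redacted,Chennai,113
007,9848022334,redacted,trivendram,112
008,9848022333,redacted,Chennai,111
"""


def parse(text, types):
    """Split each comma-separated line and cast every field with the matching type."""
    return [
        tuple(cast(value) for cast, value in zip(types, line.split(",")))
        for line in text.strip().splitlines()
    ]


def join_on_keys(left, left_cols, right, right_cols):
    """Hash join where the key is the tuple of values in the given columns."""
    index = {}
    for row in right:
        index.setdefault(tuple(row[c] for c in right_cols), []).append(row)
    return [
        l_row + r_row
        for l_row in left
        for r_row in index.get(tuple(l_row[c] for c in left_cols), [])
    ]


# int() turns "004" into 4, so padded and unpadded ids compare equal.
employees = parse(EMPLOYEE_TXT, (int, str, str, int, str, int))  # id, firstname, lastname, age, post, jobid
contacts = parse(CONTACT_TXT, (int, str, str, str, int))  # id, mobileno, mail, city, jobid
joined = join_on_keys(employees, (0, 5), contacts, (0, 4))
for row in joined:
    print(row)
```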
Task 3: Read any text file in Spark and display the count of each word in the file (3M)
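The usual Spark word count is a flatMap, map, reduceByKey pipeline. In PySpark it can be written as `sc.textFile(path).flatMap(lambda line: line.split()).map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b).collect()`, where sc is an existing SparkContext and path is the chosen text file. So that the logic can run without a Spark installation, the sketch below reproduces the same three stages in plain Python over a small hypothetical sample. It assumes words are split on whitespace, with no lower-casing and no punctuation stripping.

```python
import io


def word_count(lines):
    """Count words with the same stages as Spark's flatMap -> map -> reduceByKey."""
    # flatMap: each line becomes zero or more whitespace-separated words
    words = (word for line in lines for word in line.split())
    # map: every word becomes a (word, 1) pair
    pairs = ((word, 1) for word in words)
    # reduceByKey: add up the 1s for each distinct word
    counts = {}
    for word, one in pairs:
        counts[word] = counts.get(word, 0) + one
    return counts


# Hypothetical sample; a file object from open(path) can be passed the same way.
sample = io.StringIO("big data and scalable computing\nbig data with spark\nspark counts words\n")
counts = word_count(sample)
for word, n in sorted(counts.items()):
    print(word, n)
```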

Task 4: Read any CSV file in Spark and show all of its contents (1M)
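In PySpark the file can be loaded with `spark.read.csv(path, header=True, inferSchema=True)`, where spark is an existing SparkSession. By default `DataFrame.show()` prints only the first 20 rows and truncates long values. To display every row in full, use `df.show(df.count(), truncate=False)`. The plain-Python sketch below involves no Spark. It uses the csv module to print every row as a similar bordered table. The sample reuses the customers data from Task 1 with its header line.

```python
import csv
import io


def show_all(csv_file):
    """Print every row of a CSV file as a bordered table and return (header, rows)."""
    reader = csv.reader(csv_file)
    header = next(reader)
    rows = list(reader)
    # Column width = longest cell in that column, header included.
    widths = [max(len(cell) for cell in column) for column in zip(header, *rows)]
    border = "+" + "+".join("-" * w for w in widths) + "+"

    def fmt(cells):
        return "|" + "|".join(cell.rjust(w) for cell, w in zip(cells, widths)) + "|"

    print(border)
    print(fmt(header))
    print(border)
    for row in rows:
        print(fmt(row))
    print(border)
    return header, rows


# Sample built from the Task 1 customers data; open(path, newline="") can be passed instead.
SAMPLE = """id,name,age,city,amount
1,Ramesh,32,Ahmedabad,2000.00
2,Khilan,25,Delhi,1500.00
3,kaushik,23,Kota,2000.00
4,Chaitali,25,Mumbai,6500.00
5,Hardik,27,Bhopal,8500.00
6,Komal,22,MP,4500.00
7,Muffy,24,Indore,10000.00
"""
header, rows = show_all(io.StringIO(SAMPLE))
```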
