Spring 2024 - CS441 - 2

Virtual University of Pakistan SEMESTER SPRING 2024

CS441 – Big Data Concepts


Assignment No.2 (Graded)

Maximum Marks: 20
Due Date: 25 June 2024

Instructions

The purpose of this assignment is to give you hands-on practice. It is expected that students
will solve the assignment themselves. The following rules will apply during the evaluation
of the assignment.

● Cheating from any source will result in zero marks in the assignment.
● No credit will be given if the submitted assignment does not open or the file is corrupted.
● No assignment after the due date will be accepted.
● Students can submit only HTML, images, and plain text in this inline mode. You
may also insert an image file or a table.
● The DOC/PDF file upload option is not available for inline assignment
submission.
Assignment Upload Instructions
Follow the given instructions to submit the inline assignment.

Students can copy/paste their Hive Code into the submission interface below.

Lectures Covered
This assignment covers the content of Week 10.

Objective & Learning Outcome


The objective of this assignment is to implement HiveQL code.

After completing the assignment, the student will be able to implement and execute programs
in Hive Query Language.

Question No. 1 (20 Marks)

Suppose we are managing a music streaming service. We have two tables:

1. Songs: Contains information about the songs available on the platform.
2. Plays: Contains information about the plays or streams of songs by users.

Table 1: Songs

Song_id  Title   Artist    Genre    Release_date
1        Song A  Artist X  Pop      2024-01-01
2        Song B  Artist Y  Rock     2024-02-15
3        Song C  Artist Z  Hip-Hop  2024-03-10

Table 2: Plays

Play_id  User_id  Song_id  Play_date   Duration
101      1        1        2024-05-01  180
102      2        2        2024-05-03  200
103      1        3        2024-05-06  220
104      3        1        2024-05-07  180
105      2        3        2024-05-10  220

Consider the given tables and write HiveQL code for the following operations:

1. Create a Database with your own VU ID.

2. Create a Table named “Songs table”. Your table should consist of five fields/columns
mentioned in given Table 1. Use the appropriate data type for each field/column.

3. Create a Table named “Plays table”. Your table should consist of five fields/columns
mentioned in given Table 2. Use the appropriate data type for each field/column.

4. Insert the given data into the Songs and Plays tables as shown in Table 1 & Table 2.
You can store the given data in a text file and then load it into the tables.

5. Display all data from both tables.

6. Find the most played song in each genre and display the song title, artist, genre, and the
number of times it was played.
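
The six operations above could be approached roughly as in the following HiveQL sketch. It is an illustration under stated assumptions, not a model answer. The database name bc000000000 is a hypothetical stand-in for a VU ID, and the file paths /tmp/songs.txt and /tmp/plays.txt are hypothetical. The tables are named songs and plays because Hive table names are assumed here not to allow spaces. The data files are assumed to hold one comma-separated row per line, for example 1,Song A,Artist X,Pop,2024-01-01.

```sql
-- Step 1: database named after a VU ID (bc000000000 is a hypothetical placeholder).
CREATE DATABASE IF NOT EXISTS bc000000000;
USE bc000000000;

-- Step 2: Songs table; fields in the text file are assumed to be comma-separated.
CREATE TABLE IF NOT EXISTS songs (
  song_id      INT,
  title        STRING,
  artist       STRING,
  genre        STRING,
  release_date DATE
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Step 3: Plays table; the unit of duration is not stated in the assignment.
CREATE TABLE IF NOT EXISTS plays (
  play_id   INT,
  user_id   INT,
  song_id   INT,
  play_date DATE,
  duration  INT
)
ROW FORMAT DELIMITED
FIELDS TERMINATED BY ','
STORED AS TEXTFILE;

-- Step 4: load the rows of Table 1 and Table 2 from local text files
-- (both paths are hypothetical).
LOAD DATA LOCAL INPATH '/tmp/songs.txt' OVERWRITE INTO TABLE songs;
LOAD DATA LOCAL INPATH '/tmp/plays.txt' OVERWRITE INTO TABLE plays;

-- Step 5: display all data from both tables.
SELECT * FROM songs;
SELECT * FROM plays;

-- Step 6: count plays per song, rank songs within each genre by that count,
-- and keep the top-ranked song(s) of every genre.
SELECT title, artist, genre, play_count
FROM (
  SELECT title, artist, genre, play_count,
         RANK() OVER (PARTITION BY genre ORDER BY play_count DESC) AS rnk
  FROM (
    SELECT s.song_id, s.title, s.artist, s.genre,
           COUNT(*) AS play_count
    FROM songs s
    JOIN plays p ON s.song_id = p.song_id
    GROUP BY s.song_id, s.title, s.artist, s.genre
  ) counted
) ranked
WHERE rnk = 1;
```

With the rows given in Table 1 and Table 2, each genre contains a single song, so the last query should return Song A (Pop, 2 plays), Song B (Rock, 1 play), and Song C (Hip-Hop, 2 plays). RANK() is used so that songs tied for first place in a genre are all shown; ROW_NUMBER() would instead keep exactly one row per genre.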

Note: Plagiarism will be checked for the solution provided. Marks will be
awarded based on your answer and plagiarism report.

For any queries about the assignment, send an email to [email protected]
