Big Data Analytics 0th Lecture
INT421
Module Name: Big Data Analytics
Poll 1
How familiar are you with Big Data and Hadoop, on a scale of 1 to 5?
23/05/19
Course Assessment Model
Marks breakup
• Attendance: 5
• CA (Assignment (case based) + Test + Test): 25
• MTT: 20
• ETE: 50
Total: 100
Detail of Academic Tasks
• AT1: Assignment (case based)
• AT2: Class Test
• AT3: Class Test
CO2 Design and implement a Hive data management system for a given business case, including creating a database, internal and external tables, and performing various operations on the tables such as sorting, distributing and clustering, in order to efficiently manage large volumes of structured data.
CO3 Compare and contrast the Spark and MapReduce frameworks, and evaluate the benefits of using Spark for big data processing tasks.
CO4 Apply the concepts of paired RDDs and structured APIs such as DataFrames and Datasets in Apache Spark, and perform various operations such as data manipulation, window functions and descriptive statistics.
CO5 Evaluate the importance of optimizing Spark jobs for efficient performance, and analyze Spark jobs to identify potential performance bottlenecks such as disk IO, network IO and shuffles.
CO6 Design and implement an optimized Spark job that utilizes best practices for working with Apache Spark in a production environment.
Today’s Agenda
Introduction to Big Data Analytics
What is Hadoop?
What is Hive?
What is Spark?
What is HDFS?
What is Mapper?
What is Reducer?
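As a preview of the Mapper and Reducer questions above, here is a minimal sketch of the word-count pattern in plain Python. It does not use Hadoop; the function names (mapper, shuffle, reducer, word_count) are illustrative assumptions, and the shuffle step only imitates the grouping that the framework would normally perform between the two phases.

```python
from collections import defaultdict


def mapper(line):
    """Emit a (word, 1) pair for every word in one input line."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key; a stand-in for the framework's shuffle step."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reducer(word, counts):
    """Combine all the counts collected for one word into a single total."""
    return word, sum(counts)


def word_count(lines):
    """Run map, shuffle and reduce in sequence on an in-memory list of lines."""
    pairs = (pair for line in lines for pair in mapper(line))
    grouped = shuffle(pairs)
    return dict(reducer(word, counts) for word, counts in grouped.items())
```

Calling word_count on a small list of strings returns a dictionary of word totals. In a real cluster the mapper and reducer would run on many machines, with the input lines read from HDFS.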
#LifeKoKaroLift
Thank You!
Kindly follow the steps provided in the videos below to download and install Hadoop, Hive and Derby.
https://www.youtube.com/watch?v=knAS0w-jiUk&ab_channel=IvyProSchool
https://www.youtube.com/watch?v=CRX6OOUFxyQ&ab_channel=UnboxingBigData