#LifeKoKaroLift

Lecture 1
Big Data Analytics (INT421)

Module Name: Big Data Analytics
Course: Hadoop, Hive, Spark
Lecture On: Basics of Big Data Analytics - Day 1
Instructor: Sneha Sharma
Poll 1

How familiar are you with Big Data and Hadoop, on a scale of 1-5?

• Practice in teams of 4 students
• Industry expert mentoring to learn better
• Get personalised feedback for improvements
Course Assessment Model

Marks break-up:
• Attendance: 5
• CA (Assignment (Case Based) + Test + Test): 25
• MTT: 20
• ETE: 50
• Total: 100
Detail of Academic Tasks
• AT1: Assignment (case based)
• AT2: Class Test
• AT3: Class Test

(AT1 is compulsory; the better of AT2 and AT3 will be considered.)
Course Outcomes

CO1: Evaluate the features and limitations of Hadoop's HDFS (Hadoop Distributed File System), and compare it with conventional data processing systems, in order to make informed decisions about selecting the appropriate data storage and processing system for specific business needs.

CO2: Design and implement a Hive data management system for a given business case, including creating a database and internal and external tables, and performing operations on the tables such as sorting, distributing, and clustering, in order to efficiently manage large volumes of structured data.

CO3: Compare and contrast the Spark and MapReduce frameworks, and evaluate the benefits of using Spark for big data processing tasks.

CO4: Apply the concepts of paired RDDs and structured APIs such as DataFrames and Datasets in Apache Spark, and perform operations such as data manipulation, window functions, and descriptive statistics.

CO5: Evaluate the importance of optimizing Spark jobs for efficient performance, and analyze Spark jobs to identify potential performance bottlenecks such as disk I/O, network I/O, and shuffles.

CO6: Design and implement an optimized Spark job that utilizes best practices for working with Apache Spark in a production environment.
Today’s Agenda

● Big Data Analytics
● Big Data Processing
● Introduction to Hadoop
● Hadoop Architecture
● Hadoop Ecosystem
Introduction to Big Data Analytics

WHAT IS BIG DATA?

• Big data refers to a vast amount of structured, semi-structured, and unstructured data that is generated at high velocity and volume.
• The term "big data" is not defined by a specific size but rather by the scale and complexity of the data involved.
• Sources: social media, IoT devices, web and mobile applications
• Types of data: structured, semi-structured, unstructured

The primary characteristics of big data are often referred to as the "3Vs" (volume, velocity, variety), commonly extended to "5Vs" with veracity and value:

• Volume - the sheer amount of data
• Velocity - the speed at which data is generated
• Variety - the types of data
• Veracity - the quality and accuracy of the data
• Value - the usefulness of the data
Big Data Processing

A typical big data pipeline moves through these stages (a minimal sketch follows below):
• Data collection
• Data storage
• Data preprocessing
• Data processing
• Data analysis
• Data visualization
• Decision making

Value: The ultimate goal of big data analytics is to extract value and derive actionable insights from the data. Despite dealing with large volumes of data with varying velocity and variety, the primary focus is on discovering meaningful patterns and trends that can lead to data-driven decisions, better business strategies, improved operational efficiency, and other valuable outcomes.
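To make the stages concrete, here is a minimal, plain-Python sketch of the same pipeline on a toy in-memory dataset. The data, threshold, and variable names are illustrative assumptions; at real scale these steps would run on HDFS, Hive, or Spark, which the following slides introduce.

```python
# A toy walk-through of the pipeline stages above (illustrative only).
raw = ["12", "7", " 19", "", "oops", "4"]        # data collection

stored = list(raw)                                # data storage (in memory here)

cleaned = []                                      # data preprocessing
for value in stored:
    value = value.strip()
    if value.isdigit():                           # drop empty/corrupt records
        cleaned.append(int(value))

total = sum(cleaned)                              # data processing
average = total / len(cleaned)                    # data analysis

print(f"n={len(cleaned)} total={total} avg={average:.1f}")  # reporting

if average > 10:                                  # decision making (assumed threshold)
    print("Average exceeds threshold; investigate further.")
```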
Introduction to Hadoop

What is Hadoop?

Hadoop is an open-source, Java-based framework that manages the storage and processing of large amounts of data for applications. Hadoop uses distributed storage and parallel processing to handle big data. It provides a software framework for distributed storage and processing of big data using the MapReduce programming model.
Introduction to Hive

What is Hive?

Apache Hive is a data warehouse software project built on top of the Hadoop ecosystem. It provides an SQL-like interface to query and analyze large datasets stored in Hadoop's distributed file system (HDFS) or other compatible storage systems.

Hive uses a language called HiveQL, which is similar to SQL, to allow users to express data queries, transformations, and analyses in a familiar syntax. HiveQL statements are compiled into MapReduce jobs, which are then executed on the Hadoop cluster to process the data.
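As a minimal sketch of what "an SQL-like interface" means in practice, the snippet below submits a HiveQL query from Python via the PyHive library. It assumes HiveServer2 is running on localhost:10000 and that a table named page_views exists; both are illustrative assumptions, not part of the course setup.

```python
# Minimal sketch: querying Hive from Python over HiveServer2 (assumed setup).
from pyhive import hive  # pip install pyhive

conn = hive.connect(host="localhost", port=10000)  # HiveServer2 default port
cursor = conn.cursor()

# HiveQL looks like SQL; Hive compiles a statement like this into
# MapReduce (or Tez/Spark) jobs that run on the cluster.
cursor.execute("""
    SELECT user_id, COUNT(*) AS views
    FROM page_views
    GROUP BY user_id
    ORDER BY views DESC
    LIMIT 10
""")

for user_id, views in cursor.fetchall():
    print(user_id, views)

cursor.close()
conn.close()
```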
Introduction to Spark

What is Spark?

Apache Spark is a lightning-fast cluster computing technology, designed for fast computation. It builds on the Hadoop MapReduce model and extends it to efficiently support more types of computation, including interactive queries and stream processing. The main feature of Spark is its in-memory cluster computing, which increases the processing speed of an application.
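A minimal PySpark sketch of the classic word count, run locally. It assumes a local Spark installation and an input file named input.txt (both illustrative assumptions). Note that the intermediate RDDs stay in memory rather than being written to disk between steps, which is the in-memory computing the slide describes.

```python
# Word count with PySpark, running on the local machine (assumed setup).
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCount").master("local[*]").getOrCreate()
sc = spark.sparkContext

counts = (
    sc.textFile("input.txt")                  # read lines as an RDD
      .flatMap(lambda line: line.split())     # split lines into words
      .map(lambda word: (word, 1))            # pair each word with a count of 1
      .reduceByKey(lambda a, b: a + b)        # sum the counts per word
)

for word, count in counts.take(10):
    print(word, count)

spark.stop()
```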
HDFS

What is HDFS?

HDFS is a distributed file system that handles large data sets running on commodity hardware. It is used to scale a single Apache Hadoop cluster to hundreds (and even thousands) of nodes. HDFS is one of the major components of Apache Hadoop, the others being MapReduce and YARN.
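A minimal sketch of day-to-day HDFS usage: shelling out from Python to the standard `hdfs dfs` command-line tool. It assumes a running Hadoop installation with `hdfs` on the PATH; the paths and file names are illustrative.

```python
# Basic HDFS file operations via the `hdfs dfs` CLI (assumed Hadoop install).
import subprocess

def hdfs(*args):
    """Run an `hdfs dfs` subcommand and raise if it fails."""
    subprocess.run(["hdfs", "dfs", *args], check=True)

hdfs("-mkdir", "-p", "/user/student/demo")        # create a directory
hdfs("-put", "input.txt", "/user/student/demo/")  # copy a local file into HDFS
hdfs("-ls", "/user/student/demo")                 # list the directory contents
```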
Mapper

What is Mapper?

MapReduce is a programming model that is mainly divided into two phases, the Map phase and the Reduce phase. It is designed to process data in parallel across multiple machines (nodes). Hadoop Java programs consist of a Mapper class and a Reducer class, along with a driver class. The Hadoop Mapper is a function or task that processes all input records from a file and generates output that serves as input for the Reducer. It produces this output by emitting new key-value pairs.
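The course uses Java's Mapper class, but a minimal way to see the idea is a word-count mapper written for Hadoop Streaming, which lets the Map phase be an ordinary Python script reading stdin and emitting key-value pairs on stdout. This is an illustrative sketch, not the Java Mapper itself.

```python
#!/usr/bin/env python3
# mapper.py -- word-count Map phase for Hadoop Streaming (illustrative).
import sys

# Each input line arrives on stdin; emit one "word<TAB>1" pair per word.
for line in sys.stdin:
    for word in line.split():
        print(f"{word}\t1")
```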
Reducer

What is Reducer?

The Reducer processes the output of the mapper. After processing the data, it produces a new set of output, which HDFS finally stores.

The Hadoop Reducer takes the set of intermediate key-value pairs produced by the mapper as input and runs a Reducer function on each of them. This (key, value) data can be aggregated, filtered, and combined in a number of ways for a wide range of processing.
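The matching word-count reducer for the Hadoop Streaming sketch above. Hadoop sorts the mapper output by key before the Reduce phase, so all lines for one word arrive together; the script sums counts until the key changes. Again, this is an illustrative Python sketch of the Reducer's role, not the course's Java Reducer class.

```python
#!/usr/bin/env python3
# reducer.py -- word-count Reduce phase for Hadoop Streaming (illustrative).
import sys

current_word, current_count = None, 0

for line in sys.stdin:
    word, count = line.rsplit("\t", 1)   # input lines look like "word<TAB>1"
    if word == current_word:
        current_count += int(count)
    else:
        if current_word is not None:
            print(f"{current_word}\t{current_count}")
        current_word, current_count = word, int(count)

if current_word is not None:             # flush the final key
    print(f"{current_word}\t{current_count}")
```

The two scripts are typically wired together with the Hadoop Streaming jar, along the lines of `hadoop jar .../hadoop-streaming-*.jar -files mapper.py,reducer.py -mapper mapper.py -reducer reducer.py -input <in> -output <out>` (the jar's path varies by installation).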
Key Takeaway

● Big Data Analytics
● Hive in Hadoop
● Spark
● Hadoop
● Hadoop Distributed File System (HDFS)
● Mapper
● Reducer
#LifeKoKaroLift

Thank You!

Kindly follow the steps provided in the videos below to download and install Hadoop, Hive, and Derby:

https://www.youtube.com/watch?v=knAS0w-jiUk&ab_channel=IvyProSchool

https://www.youtube.com/watch?v=CRX6OOUFxyQ&ab_channel=UnboxingBigData
