
Unit 2: Apache Hadoop

(10-mark answers for each topic)

1. Introduction to Apache Hadoop

• Definition: Hadoop is an open-source framework for distributed storage and
processing of Big Data using simple programming models.

• Key Features:

1. Handles large-scale data storage (HDFS).

2. Processes data in parallel using MapReduce.

3. Scalable and fault-tolerant.

• Applications:

o Social media analysis (Facebook, Twitter).

o Fraud detection in banking.

o Search indexing (e.g., Yahoo!; Hadoop itself was inspired by Google's
MapReduce and GFS papers).

2. System Principle

• Core Concept: Hadoop distributes data across multiple nodes and processes it
in parallel, ensuring high efficiency.

• Key Components:

1. HDFS: Hadoop Distributed File System for storage.

2. MapReduce: Programming model for processing.

3. YARN: Resource management and job scheduling.

3. Hadoop Architecture

• Layers:

1. Storage Layer (HDFS): Manages large data storage across clusters.

2. Processing Layer (MapReduce): Executes parallel computations on data.

3. Resource Layer (YARN): Allocates system resources efficiently.


• Diagram:
(Diagram of Hadoop architecture with HDFS, MapReduce, and YARN interactions
will be included.)

4. Hadoop Distributed File System (HDFS)

• Overview: HDFS is designed for storing large datasets across multiple nodes.

• Features:

1. Data Blocks: Files are split into blocks (default size: 128 MB).

2. Replication: Data is replicated across nodes for fault tolerance.

3. Write Once, Read Many: Files are written once and not modified in place;
HDFS is optimized for high-throughput sequential reads.
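The block and replication mechanics above can be shown with a small calculation (a sketch using the standard defaults of 128 MB blocks and a replication factor of 3; the function name is illustrative):

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024   # default HDFS block size: 128 MB
REPLICATION = 3                  # default replication factor

def hdfs_storage(file_size_bytes):
    """Return (number of blocks, total raw bytes stored across all replicas)."""
    blocks = math.ceil(file_size_bytes / BLOCK_SIZE)
    # HDFS stores each block REPLICATION times; the last block may be partial,
    # so the raw footprint is the file size times the replication factor.
    return blocks, file_size_bytes * REPLICATION

# A 300 MB file splits into 3 blocks (128 + 128 + 44 MB) and
# occupies 900 MB of raw cluster storage.
print(hdfs_storage(300 * 1024 * 1024))
```

This is why HDFS suits large files: per-block metadata overhead on the NameNode is amortized over 128 MB chunks rather than small files.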

5. Hadoop MapReduce

• Definition: A programming model for parallel data processing.

• How it Works:

1. Input Split: Data is divided into chunks.

2. Map Phase: Processes data in key-value pairs.

3. Reduce Phase: Aggregates and produces the final output.

• Advantages:

o High scalability and efficiency.

o Works on commodity hardware.
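The phases above can be walked through with a toy in-memory simulation (plain Python, not the Hadoop API; the sample data and the max-temperature-per-year task are illustrative):

```python
from collections import defaultdict

# Sample records: "year,temperature". Task: maximum temperature per year.
records = ["1950,22", "1950,30", "1951,18", "1951,25"]

# 1. Input split: data is divided into chunks for parallel mappers.
chunks = [records[:2], records[2:]]

# 2. Map phase: each chunk is turned into (key, value) pairs.
mapped = []
for chunk in chunks:
    for line in chunk:
        year, temp = line.split(",")
        mapped.append((year, int(temp)))

# Shuffle: Hadoop groups values by key between the map and reduce phases.
grouped = defaultdict(list)
for key, value in mapped:
    grouped[key].append(value)

# 3. Reduce phase: aggregate each key's values into the final output.
result = {year: max(temps) for year, temps in grouped.items()}
print(result)  # {'1950': 30, '1951': 25}
```

In a real job the chunks live on different nodes and the mappers run in parallel; the shuffle step, implicit here, is what Hadoop performs across the network.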

6. YARN (Yet Another Resource Negotiator)

• Definition: A resource management layer in Hadoop for job scheduling.

• Components:

1. Resource Manager: Allocates resources for applications.

2. Node Manager: Monitors individual nodes and reports to the Resource
Manager.

• Advantages:

o Increases cluster utilization.


o Supports multiple workloads (MapReduce, Spark).

7. Hadoop Installation and Modes

• Installation:

1. Download and install Hadoop.

2. Configure HDFS and MapReduce settings.

3. Start Hadoop services.

• Modes:

1. Standalone Mode: Single node for testing.

2. Pseudo-Distributed Mode: Simulates a cluster on one machine.

3. Fully Distributed Mode: Real cluster with multiple nodes.

8. Hadoop Commands

• HDFS Commands:

1. hdfs dfs -ls: List files in HDFS.

2. hdfs dfs -put: Upload files to HDFS.

3. hdfs dfs -get: Download files from HDFS.

• YARN Commands:

1. yarn application -list: View running applications.

2. yarn logs -applicationId <app-id>: View the logs of an application.

9. Moving Data In and Out of Hadoop

• Using HDFS:

o Upload data using commands like hdfs dfs -put.

o Retrieve processed data using hdfs dfs -get.

• Integration Tools: Sqoop for transferring data between Hadoop and relational
databases.
10. Hadoop Programming

• Overview: Writing applications in Java (the native API) or Python (via
Hadoop Streaming) to process data using MapReduce.

• Example Program: Word Count application in Hadoop.
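A minimal Word Count can be sketched as a pair of Python functions (a simplified sketch: `mapper` and `reducer` here are ordinary functions operating in memory, whereas a real Hadoop Streaming job would read lines from stdin and be submitted with the hadoop-streaming jar):

```python
from collections import defaultdict

def mapper(lines):
    """Map phase: emit a (word, 1) pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield word.lower(), 1

def reducer(pairs):
    """Reduce phase: sum the counts emitted for each word."""
    counts = defaultdict(int)
    for word, n in pairs:
        counts[word] += n
    return dict(counts)

text = ["Hadoop stores data", "Hadoop processes data"]
print(sorted(reducer(mapper(text)).items()))
# [('data', 2), ('hadoop', 2), ('processes', 1), ('stores', 1)]
```

The same split of logic carries over to the Java API, where `Mapper.map()` emits intermediate key-value pairs and `Reducer.reduce()` receives all values grouped by key.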
