UNIT-II
Hadoop
Scalability
The primary benefit of Hadoop is its scalability: one can easily scale the cluster by adding more nodes. There are two types of scalability in Hadoop:
• Vertical
• Horizontal
Vertical scalability
It is also referred to as “scale up”. In vertical scaling, you increase the hardware capacity of an individual machine: for example, you add more RAM or CPU to your existing system to make it more robust and powerful.
Horizontal scalability
It is also referred to as “scale out” and is basically the addition of more machines, i.e. growing the cluster. In horizontal scaling, instead of increasing the hardware capacity of individual machines, you add more nodes to the existing cluster, and most importantly, you can add more machines without stopping the system. Therefore there is no downtime of any kind while scaling out; in the end you simply have more machines working in parallel to meet your requirements.
Hadoop Streaming
The Hadoop MapReduce framework is written in Java and natively supports writing map/reduce programs in Java only. However, Hadoop also provides an API for writing MapReduce programs in languages other than Java.
• Hadoop Streaming is the utility that allows us to create and run MapReduce jobs with any script or executable as the mapper or the reducer.
• It uses Unix streams as the interface between Hadoop and our MapReduce program, so any language that can read from standard input and write to standard output can be used to write the MapReduce program.
• Hadoop Streaming supports running both Java and non-Java MapReduce jobs on the Hadoop cluster. It supports the Python, Perl, R, PHP, and C++ programming languages.
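For illustration, a minimal Python mapper for a streaming word-count job might look like the sketch below. This example is not from the original notes; the file name mapper.py and the whitespace tokenization are assumptions.

    # mapper.py - minimal word-count mapper sketch for Hadoop Streaming
    # Reads raw text lines from standard input and emits "word<TAB>1" lines
    # on standard output, which Streaming turns back into (key, value) pairs.
    import sys

    for line in sys.stdin:
        for word in line.strip().split():
            print(word + "\t1")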
Syntax for Hadoop Streaming
You can use the syntax below to run MapReduce code written in a language other than Java to process data with the Hadoop MapReduce framework.
$HADOOP_HOME/bin/hadoop jar $HADOOP_HOME/hadoop-streaming.jar \
  -input myInputDirs \
  -output myOutputDir \
  -mapper /bin/cat \
  -reducer /usr/bin/wc
Parameter               Description
-input myInputDirs      Input location for the mapper
-output myOutputDir     Output location for the reducer
-mapper /bin/cat        Mapper executable
-reducer /usr/bin/wc    Reducer executable
How Streaming Works
Let us now see how Hadoop Streaming works.
• The mapper and the reducer (in the above example) are the
scripts that read the input line-by-line from stdin and emit the
output to stdout.
• The utility creates a Map/Reduce job and submits the job to an
appropriate cluster and monitors the job progress until its
completion.
• When a script is specified for mappers, then each mapper task
launches the script as a separate process when the mapper is
initialized.
• The mapper task converts its inputs (key, value pairs) into lines and pushes the lines to the standard input of the process. Meanwhile, the mapper collects the line-oriented outputs from the standard output of the process and converts each line into a (key, value) pair, which is collected as the result of the mapper.
• When a script is specified for reducers, each reducer task launches the script as a separate process when the reducer is initialized.
• As the reducer task runs, it converts its input key/value pairs into lines and feeds the lines to the standard input of the process. Meanwhile, the reducer gathers the line-oriented outputs from the stdout of the process and converts each line collected into a key/value pair, which is then collected as the result of the reducer.
• For both the mapper and the reducer, the prefix of a line up to the first tab character is the key, and the rest of the line (excluding the tab character) is the value. If there is no tab character in the line, the entire line is considered the key and the value is null. This can be customized by setting the -inputformat command option for the mapper and the -outputformat option for the reducer.
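To make the tab-separated key/value convention concrete, a matching word-count reducer sketch in Python could look as follows. Again, this is an illustrative example rather than part of the original notes; it assumes the mapper emits "word<TAB>1" lines and that the framework has already sorted the mapper output by key.

    # reducer.py - minimal word-count reducer sketch for Hadoop Streaming
    # Input lines arrive as "word<TAB>count", grouped and sorted by key,
    # so counts for the same word can be summed in a single pass.
    import sys

    current_word = None
    current_count = 0

    for line in sys.stdin:
        word, _, count = line.rstrip("\n").partition("\t")
        count = int(count) if count else 1
        if word == current_word:
            current_count += count
        else:
            if current_word is not None:
                print(current_word + "\t" + str(current_count))
            current_word = word
            current_count = count

    if current_word is not None:
        print(current_word + "\t" + str(current_count))

Such scripts would be passed to the streaming jar with the -mapper and -reducer options (and typically shipped to the cluster with the -file option), in the same way as /bin/cat and /usr/bin/wc in the syntax example above.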
Hadoop Pipes
• Hadoop Pipes is the name of the C++ interface to
Hadoop MapReduce.
• Unlike Streaming, Pipes does not use standard input and output to communicate with the map and reduce code.
• Instead, Pipes uses sockets as the channel over which the task tracker communicates with the process running the C++ map or reduce function.
In many ways, the approach is similar to Hadoop Streaming, but Pipes uses Writable serialization to convert the types into bytes that are sent to the process via the socket.