What Is Hadoop - Introduction, Architecture, Ecosystem, Components

Uploaded by

Ahmed Mohamed
Copyright
© All Rights Reserved
What is Hadoop? Introduction, Architecture, Ecosystem, Components
What is Hadoop?
Apache Hadoop is an open-source software framework used to develop data
processing applications which are executed in a distributed computing
environment.

Applications built using Hadoop run on large data sets distributed across
clusters of commodity computers. Commodity computers are cheap and widely
available, which makes them useful for achieving greater computational power at
low cost.

Just as data on a personal computer resides in a local file system, data in
Hadoop resides in a distributed file system called the Hadoop Distributed File
System (HDFS). The processing model is based on the 'Data Locality' concept,
wherein computational logic is sent to the cluster nodes (servers) that contain
the data. This computational logic is nothing but a compiled version of a
program written in a high-level language such as Java. Such a program processes
data stored in HDFS.
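
The 'Data Locality' idea can be illustrated with a small sketch. This is not Hadoop's scheduler; it is a toy model in Python in which the block ids, the node names, and the functions pick_node and schedule are all hypothetical. It only shows the principle that a task is sent to a node that already stores the block it needs.

```python
# Hypothetical block-to-node map: which nodes hold a copy of each data block.
BLOCK_LOCATIONS = {
    "block-1": ["node-a", "node-b"],
    "block-2": ["node-b", "node-c"],
    "block-3": ["node-c", "node-a"],
}


def pick_node(block_id, busy_nodes):
    """Return a node that already stores the block and is not busy, else None."""
    for node in BLOCK_LOCATIONS[block_id]:
        if node not in busy_nodes:
            return node
    return None  # no local node is free; this toy model does not fall back to remote nodes


def schedule(block_ids, busy_nodes=frozenset()):
    """Assign each block's processing task to a node that holds the block."""
    return {block_id: pick_node(block_id, busy_nodes) for block_id in block_ids}


plan = schedule(["block-1", "block-2", "block-3"], busy_nodes={"node-a"})
print(plan)  # {'block-1': 'node-b', 'block-2': 'node-b', 'block-3': 'node-c'}
```

Because node-a is marked busy, block-1 is processed on node-b, which also holds a copy. In every case the program travels to the data, and the data itself is never moved.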

Do you know? A computer cluster consists of a set of multiple processing units
(storage disk + processor) which are connected to each other and act as a
single system.

In this tutorial, you will learn:

• Hadoop EcoSystem and Components
• Hadoop Architecture
• Features Of 'Hadoop'
• Network Topology In Hadoop

Hadoop EcoSystem and Components
The diagram below shows the various components of the Hadoop ecosystem:


[Figure: components of the Hadoop ecosystem]

Apache Hadoop consists of two sub-projects:

1. Hadoop MapReduce: MapReduce is a computational model and software
framework for writing applications that run on Hadoop. These MapReduce
programs are capable of processing enormous amounts of data in parallel on
large clusters of computation nodes.
2. HDFS (Hadoop Distributed File System): HDFS takes care of the storage part of
Hadoop applications. MapReduce applications consume data from HDFS.
HDFS creates multiple replicas of data blocks and distributes them on
compute nodes in a cluster. This distribution enables reliable and extremely
rapid computations.
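
To make the MapReduce model concrete, here is a minimal word-count sketch in plain Python. It does not use the Hadoop API and runs on a single machine. The function names map_phase, shuffle, reduce_phase and word_count are chosen for illustration only. On a real cluster the map and reduce steps would run in parallel on many nodes, reading their input from HDFS.

```python
from collections import defaultdict


def map_phase(line):
    """Map step: emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group all emitted values by key, between the map and reduce steps."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Reduce step: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle and reduce in sequence over the input lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)  # {'hadoop': 2, 'stores': 1, 'data': 2, 'processes': 1}
```

Each call to map_phase depends only on its own line, and each call to reduce_phase depends only on its own key. That independence is what allows the work to be spread across a cluster.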

Although Hadoop is best known for MapReduce and its distributed file system
(HDFS), the term is also used for a family of related projects that fall under
the umbrella of distributed computing and large-scale data processing. Other
Hadoop-related projects at Apache include Hive, HBase, Mahout, Sqoop, Flume,
and ZooKeeper.

Hadoop Architecture

[Figure: High Level Hadoop Architecture]

Hadoop has a master-slave architecture, using HDFS for data storage and
MapReduce for distributed data processing.

NameNode:

The NameNode represents every file and directory used in the namespace.

DataNode:

The DataNode helps you manage the state of an HDFS node and allows you to
interact with the blocks.

MasterNode:
The master node allows you to conduct parallel processing of data using Hadoop
MapReduce.

Slave node:

The slave nodes are the additional machines in the Hadoop cluster which allow
you to store data and conduct complex calculations. Moreover, every slave node
comes with a Task Tracker and a DataNode. These allow you to synchronize the
processes with the Job Tracker and the NameNode respectively.

In Hadoop, a master or slave system can be set up in the cloud or on-premises.
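
The split between the NameNode (namespace) and the DataNodes (blocks) can be sketched as follows. This is a toy in-memory model, not the HDFS client API. The class names mirror the roles described above, but every method name, block id and file path here is an assumption made for illustration.

```python
class DataNode:
    """Stores block contents; stands in for one slave machine."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}  # block id -> bytes

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Holds only the namespace: which blocks make up a file and where they live."""

    def __init__(self):
        self.namespace = {}  # file path -> list of (block id, DataNode name)

    def add_file(self, path, placements):
        self.namespace[path] = list(placements)

    def locate(self, path):
        return self.namespace[path]


def read_file(name_node, data_nodes, path):
    """Ask the NameNode where the blocks are, then fetch them from the DataNodes."""
    return b"".join(
        data_nodes[node_name].read_block(block_id)
        for block_id, node_name in name_node.locate(path)
    )


nodes = {"dn1": DataNode("dn1"), "dn2": DataNode("dn2")}
nodes["dn1"].write_block("b1", b"hello ")
nodes["dn2"].write_block("b2", b"hadoop")

master = NameNode()
master.add_file("/logs/a.txt", [("b1", "dn1"), ("b2", "dn2")])
print(read_file(master, nodes, "/logs/a.txt"))  # b'hello hadoop'
```

Note that the NameNode object never touches file contents. It only answers the question of where the blocks are, which is the role the text assigns to it.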

Features Of 'Hadoop'
• Suitable for Big Data Analysis

As Big Data tends to be distributed and unstructured in nature, Hadoop clusters
are well suited for analysis of Big Data. Since it is the processing logic (not
the actual data) that flows to the computing nodes, less network bandwidth is
consumed. This is called the data locality concept, and it helps increase the
efficiency of Hadoop-based applications.

• Scalability


Hadoop clusters can easily be scaled by adding cluster nodes, which allows for
the growth of Big Data. Also, scaling does not require modifications to
application logic.

• Fault Tolerance

The Hadoop ecosystem has a provision to replicate the input data onto other
cluster nodes. That way, in the event of a cluster node failure, data processing
can still proceed by using data stored on another cluster node.
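
A small sketch shows why replication keeps data available when a node fails. The round-robin placement used here is an assumption chosen for simplicity and is not how HDFS actually places replicas. The function names place_replicas and readable_blocks are hypothetical as well.

```python
def place_replicas(block_ids, nodes, replication=3):
    """Assign each block to `replication` distinct nodes, round-robin (toy policy)."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for i, block_id in enumerate(block_ids):
        placement[block_id] = [nodes[(i + k) % len(nodes)] for k in range(replication)]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    return {
        block_id
        for block_id, holders in placement.items()
        if any(node not in failed_nodes for node in holders)
    }


cluster = ["n1", "n2", "n3", "n4"]
placement = place_replicas(["b1", "b2", "b3"], cluster, replication=2)
print(placement)  # {'b1': ['n1', 'n2'], 'b2': ['n2', 'n3'], 'b3': ['n3', 'n4']}

survivors = readable_blocks(placement, failed_nodes={"n2"})
print(sorted(survivors))  # ['b1', 'b2', 'b3']
```

With two replicas per block, losing node n2 leaves every block readable. Losing both n2 and n3 would make b2 unavailable, which is why a higher replication factor tolerates more simultaneous failures.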

Network Topology In Hadoop


The topology (arrangement) of the network affects the performance of the Hadoop
cluster as the size of the cluster grows. In addition to performance, one also
needs to care about high availability and the handling of failures. To achieve
this, Hadoop cluster formation makes use of network topology.


Typically, network bandwidth is an important factor to consider while forming
any network. However, as measuring bandwidth can be difficult, Hadoop represents
the network as a tree, and the distance between nodes of this tree (number of
hops) is considered an important factor in the formation of a Hadoop cluster.
Here, the distance between two nodes is equal to the sum of their distances to
their closest common ancestor.

A Hadoop cluster consists of data centers, racks, and the nodes which actually
execute jobs. Here, a data center consists of racks and a rack consists of
nodes. The network bandwidth available to processes varies depending upon the
location of the processes. That is, the available bandwidth decreases as we move
down the following list:

• Processes on the same node
• Different nodes on the same rack
• Nodes on different racks of the same data center
• Nodes in different data centers
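
The distance rule and the four cases above can be worked through in a short sketch. It assumes each node is written as a path of the form /data-center/rack/node. That notation, the example names d1, r1, n1 and the function distance are illustrative assumptions, not part of the tutorial.

```python
def distance(a, b):
    """Sum of the hops from each node up to their closest common ancestor.

    Nodes are written as paths: /data-center/rack/node (assumed notation).
    """
    path_a = a.strip("/").split("/")
    path_b = b.strip("/").split("/")
    common = 0
    for x, y in zip(path_a, path_b):
        if x != y:
            break
        common += 1
    return (len(path_a) - common) + (len(path_b) - common)


print(distance("/d1/r1/n1", "/d1/r1/n1"))  # 0: processes on the same node
print(distance("/d1/r1/n1", "/d1/r1/n2"))  # 2: different nodes on the same rack
print(distance("/d1/r1/n1", "/d1/r2/n3"))  # 4: different racks, same data center
print(distance("/d1/r1/n1", "/d2/r3/n4"))  # 6: different data centers
```

The distance grows in the same order in which the available bandwidth shrinks, so hop count serves as a simple stand-in for bandwidth.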


© Copyright - Guru99 2020

