What Is Hadoop - Introduction, Architecture, Ecosystem, Components
What is Hadoop?
Apache Hadoop is an open source software framework for the distributed storage
and processing of large data sets.
Applications built using Hadoop run on large data sets distributed across
clusters of commodity computers. Commodity computers are cheap and widely
available, which makes them useful for achieving greater computational power at
low cost.
Although Hadoop is best known for MapReduce and its distributed file system,
HDFS, the term is also used for a family of related projects that fall under the
umbrella of distributed computing and large-scale data processing. Other
Hadoop-related projects at Apache (/apache.html) include Hive, HBase, Mahout,
Sqoop, Flume, and ZooKeeper.
Hadoop Architecture
Hadoop has a master-slave architecture for data storage and distributed data
processing, using HDFS for storage and MapReduce for processing.
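To make the MapReduce processing model concrete, here is a minimal sketch in
plain Python. It does not use Hadoop itself; the function names (map_phase,
shuffle, reduce_phase, word_count) are hypothetical, and everything runs in a
single process, whereas a real job would spread the map and reduce steps across
cluster nodes.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group emitted values by key, the step that sits between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Combine all values collected for one key into a single result."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence over the input lines."""
    mapped = [pair for line in lines for pair in map_phase(line)]
    grouped = shuffle(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    # Each string stands in for one input split handled by a separate mapper.
    splits = ["big data on hadoop", "hadoop stores big files"]
    print(word_count(splits))
```

Because each call to map_phase depends only on its own line, the map step can in
principle run on many machines at once; only the shuffle needs to bring values
for the same key together.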
NameNode:
The NameNode tracks every file and directory in the HDFS namespace.
DataNode:
A DataNode manages the state of an HDFS node and lets you interact with the
blocks stored on it.
MasterNode:
The master node lets you process data in parallel using Hadoop MapReduce.
Slave node:
The slave nodes are the additional machines in the Hadoop cluster that store
data and carry out complex calculations. Every slave node runs a Task Tracker
and a DataNode, which synchronize its processes with the Job Tracker and the
NameNode respectively.
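The split between NameNode and DataNode can be sketched as a toy model in
Python. This is an illustration under stated assumptions, not Hadoop's
implementation: the classes, the write_file and read_file helpers, the 4-byte
block size, and the round-robin placement are all hypothetical and chosen only
to keep the example small.

```python
BLOCK_SIZE = 4  # bytes; deliberately tiny, real HDFS blocks are far larger


class DataNode:
    """Stores raw block contents, keyed by block id."""

    def __init__(self, name):
        self.name = name
        self.blocks = {}

    def write_block(self, block_id, data):
        self.blocks[block_id] = data

    def read_block(self, block_id):
        return self.blocks[block_id]


class NameNode:
    """Holds metadata only: which blocks form a file and where each one lives."""

    def __init__(self, datanode_names):
        self.datanode_names = list(datanode_names)
        self.namespace = {}  # path -> ordered list of (block_id, datanode name)
        self._next_block_id = 0

    def allocate(self, path, num_blocks):
        locations = []
        for _ in range(num_blocks):
            block_id = self._next_block_id
            self._next_block_id += 1
            # Round-robin placement is an assumption made for this sketch.
            node_name = self.datanode_names[block_id % len(self.datanode_names)]
            locations.append((block_id, node_name))
        self.namespace[path] = locations
        return locations

    def locate(self, path):
        return self.namespace[path]


def write_file(namenode, datanodes, path, data):
    """Client side: ask the NameNode where blocks go, then send data to DataNodes."""
    chunks = [data[i:i + BLOCK_SIZE] for i in range(0, len(data), BLOCK_SIZE)]
    locations = namenode.allocate(path, len(chunks))
    for (block_id, node_name), chunk in zip(locations, chunks):
        datanodes[node_name].write_block(block_id, chunk)


def read_file(namenode, datanodes, path):
    """Client side: look up block locations, then fetch each block in order."""
    return b"".join(
        datanodes[node_name].read_block(block_id)
        for block_id, node_name in namenode.locate(path)
    )


if __name__ == "__main__":
    nodes = {name: DataNode(name) for name in ("dn1", "dn2", "dn3")}
    namenode = NameNode(nodes)
    write_file(namenode, nodes, "/logs/a.txt", b"hello hadoop")
    print(namenode.locate("/logs/a.txt"))
    print(read_file(namenode, nodes, "/logs/a.txt"))
```

Note that the file contents never pass through the NameNode in this sketch; it
only answers the question of where blocks are, while the DataNodes hold the
bytes.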
Features of Hadoop
• Suitable for Big Data Analysis
• Scalability
Hadoop clusters can be scaled out by adding cluster nodes, which allows for the
growth of Big Data. Scaling does not require modifications to application
logic.
• Fault Tolerance
The Hadoop ecosystem replicates input data onto other cluster nodes. That way,
if a cluster node fails, data processing can still proceed using the copy
stored on another node.
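The effect of replication on fault tolerance can be illustrated with a short
Python sketch. The helper names (place_replicas, readable_blocks) and the simple
rotation used for placement are assumptions made for this example; HDFS uses its
own placement policy. The replication factor of 3 matches the HDFS default.

```python
REPLICATION = 3  # the HDFS default replication factor


def place_replicas(block_ids, node_names, replication=REPLICATION):
    """Assign each block to `replication` distinct nodes.

    Uses a simple rotation over the node list, which is an illustrative
    stand-in for a real placement policy.
    """
    if replication > len(node_names):
        raise ValueError("need at least as many nodes as replicas")
    placement = {}
    for i, block_id in enumerate(block_ids):
        placement[block_id] = [
            node_names[(i + r) % len(node_names)] for r in range(replication)
        ]
    return placement


def readable_blocks(placement, failed_nodes):
    """Return the blocks that still have at least one replica on a live node."""
    failed = set(failed_nodes)
    return {
        block_id
        for block_id, nodes in placement.items()
        if any(node not in failed for node in nodes)
    }


if __name__ == "__main__":
    placement = place_replicas(["b0", "b1", "b2", "b3"], ["n1", "n2", "n3", "n4"])
    # With three replicas per block, losing two nodes leaves every block readable.
    print(sorted(readable_blocks(placement, ["n1", "n2"])))
```

A block becomes unreadable only when every node holding one of its replicas has
failed, which is why processing can continue after a single node failure.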
A Hadoop cluster consists of a data center, racks, and the nodes that actually
execute jobs. A data center consists of racks, and a rack consists of nodes.
The network bandwidth available to processes varies with their location. That
is, the available bandwidth decreases as we move away
from-