DM Hadoop Architecture
Hadoop was originally inspired by papers published by Google. Today, many big-brand companies, such as Facebook, Yahoo, Netflix, and eBay, use Hadoop in their organizations to deal with big data. The Hadoop architecture mainly consists of four components:
● MapReduce
● HDFS (Hadoop Distributed File System)
● YARN (Yet Another Resource Negotiator)
● Hadoop Common or Common Utilities

MapReduce
MapReduce is a programming model built on top of the YARN framework. Its major feature is performing distributed processing in parallel across a Hadoop cluster, which is what makes Hadoop so fast: when you are dealing with big data, serial processing is no longer of any use. MapReduce has two main tasks, divided phase-wise:
● The Map() function breaks the data blocks into tuples, which are nothing but key-value pairs.
● The Reduce() function then combines those tuples by their key values and performs operations such as sorting and summation on them, forming a smaller set of output tuples (a complete word-count sketch follows the Reduce Task description below).
Map Task:
● RecordReader: breaks the input into records and provides the key-value pairs for the Map() function; the key is actually a record's locational information (its offset in the input) and the value is the data associated with it.
● Map: a user-defined function that processes the tuples obtained from the RecordReader and emits zero or more intermediate key-value pairs.
● Combiner: an optional local reducer that groups the intermediate data in the Map workflow before it is sent across the network.
● Partitioner: fetches the intermediate key-value pairs generated by the mappers and decides which reducer each pair is sent to, typically by taking the key's hash code modulo the number of reducers (see the sketch below).
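
The partitioning rule above can be made concrete with a small custom partitioner written against the Hadoop MapReduce Java API. This is a minimal sketch (the class name KeyHashPartitioner is illustrative) that mirrors the behaviour of Hadoop's built-in HashPartitioner:

import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Partitioner;

// Routes each intermediate (key, value) pair to a reducer by taking the
// key's hash code modulo the number of reducers, as described above.
public class KeyHashPartitioner extends Partitioner<Text, IntWritable> {
  @Override
  public int getPartition(Text key, IntWritable value, int numPartitions) {
    // Mask off the sign bit so the result is never negative
    return (key.hashCode() & Integer.MAX_VALUE) % numPartitions;
  }
}

A job would opt into it with job.setPartitionerClass(KeyHashPartitioner.class); by default, Hadoop applies the same hash-modulus rule.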
Reduce Task:
● Shuffle and Sort: the reducer's work starts with this step. During shuffling, intermediate key-value pairs are transferred from the mappers to the reducers, and the system sorts them by their key values. Shuffling begins as soon as some of the map tasks are done, which makes this a faster process: it does not wait for all of the mappers to complete their work.
● Reduce: the main task of the Reduce phase is to gather the tuples generated by Map and then perform sorting and aggregation on them, grouped by key.
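
To tie the Map and Reduce phases together, here is a minimal sketch of the classic word-count job written against the Hadoop MapReduce Java API: the mapper emits a (word, 1) tuple per token, the reducer (also reused as a combiner, i.e. a local reducer) sums the counts per key, and the driver wires the job together. Input and output paths are taken from the command line.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

  // Map(): break each input record into tokens and emit a (word, 1) pair
  public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
    private final static IntWritable ONE = new IntWritable(1);
    private final Text word = new Text();

    public void map(Object key, Text value, Context context)
        throws IOException, InterruptedException {
      StringTokenizer itr = new StringTokenizer(value.toString());
      while (itr.hasMoreTokens()) {
        word.set(itr.nextToken());
        context.write(word, ONE);
      }
    }
  }

  // Reduce(): sum all counts received for one key (one word)
  public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    private final IntWritable result = new IntWritable();

    public void reduce(Text key, Iterable<IntWritable> values, Context context)
        throws IOException, InterruptedException {
      int sum = 0;
      for (IntWritable val : values) {
        sum += val.get();
      }
      result.set(sum);
      context.write(key, result);
    }
  }

  public static void main(String[] args) throws Exception {
    Job job = Job.getInstance(new Configuration(), "word count");
    job.setJarByClass(WordCount.class);
    job.setMapperClass(TokenizerMapper.class);
    job.setCombinerClass(IntSumReducer.class); // optional local reducer
    job.setReducerClass(IntSumReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(IntWritable.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    System.exit(job.waitForCompletion(true) ? 0 : 1);
  }
}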
HDFS
HDFS (Hadoop Distributed File System) is the storage layer of a Hadoop cluster, designed to provide fault tolerance and high availability. It consists of two kinds of nodes:
● NameNode (Master)
● DataNode (Slave)
● The NameNode is mainly used for storing metadata, i.e. the data about the data. Metadata includes, for example, the transaction logs that keep track of user activity in a Hadoop cluster.
● DataNodes work as slaves and are mainly used for storing the actual data in a Hadoop cluster; the number of DataNodes can range from one to 500 or even more.
File Block in HDFS: data in HDFS is always stored in terms of blocks. A single file is divided into multiple blocks of 128 MB each, which is the default size; you can also change it manually. For example, a 400 MB file is stored as four blocks: three of 128 MB and one of 16 MB.
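
The default block size is controlled by the dfs.blocksize property in hdfs-site.xml, and it can also be overridden per file from the Java API. The following is a minimal sketch using the FileSystem.create() overload that takes an explicit block size; the path /tmp/example.dat is illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class CustomBlockSize {
  public static void main(String[] args) throws Exception {
    FileSystem fs = FileSystem.get(new Configuration());

    long blockSize = 256L * 1024 * 1024; // 256 MB instead of the 128 MB default
    short replication = 3;               // standard HDFS replication factor
    int bufferSize = 4096;

    // Every block of this particular file will be 256 MB
    try (FSDataOutputStream out = fs.create(
        new Path("/tmp/example.dat"), true, bufferSize, replication, blockSize)) {
      out.writeBytes("data stored with a per-file block size\n");
    }
  }
}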
YARN
YARN (Yet Another Resource Negotiator) is the framework on which MapReduce runs; it performs two main operations for the cluster, job scheduling and resource management.
Features of YARN:
● Multi-tenancy
● Scalability
● Cluster utilization
● Compatibility
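
As a small illustration of YARN's resource-management role, the sketch below uses the YarnClient API to print the capacity and current usage of every running NodeManager; it assumes it is run on a machine with a valid YARN client configuration on the classpath:

import org.apache.hadoop.yarn.api.records.NodeReport;
import org.apache.hadoop.yarn.api.records.NodeState;
import org.apache.hadoop.yarn.client.api.YarnClient;
import org.apache.hadoop.yarn.conf.YarnConfiguration;

public class ClusterReport {
  public static void main(String[] args) throws Exception {
    YarnClient yarn = YarnClient.createYarnClient();
    yarn.init(new YarnConfiguration());
    yarn.start();

    // One NodeReport per live NodeManager: total capability vs. resources in use
    for (NodeReport node : yarn.getNodeReports(NodeState.RUNNING)) {
      System.out.println(node.getNodeId()
          + " capability=" + node.getCapability()
          + " used=" + node.getUsed());
    }
    yarn.stop();
  }
}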
Hadoop Common
Hadoop Common, or the common utilities, is nothing but the set of Java libraries and Java files needed by all the other components present in a Hadoop cluster. These utilities are used by HDFS, YARN, and MapReduce for running the cluster. Hadoop Common works on the assumption that hardware failure in a Hadoop cluster is common, so failures need to be handled automatically, in software, by the Hadoop framework.