Hadoop YARN Architecture

Notes

Uploaded by

Priya Elango

YARN stands for “Yet Another Resource Negotiator”. It was introduced in Hadoop 2.0 to
remove the bottleneck of the Job Tracker that was present in Hadoop 1.0. YARN was
described as a “Redesigned Resource Manager” at the time of its launch, but it has since
evolved into what is often described as a large-scale distributed operating system for
Big Data processing.

The YARN architecture separates the resource management layer from the processing layer.
In Hadoop 1.0, the Job Tracker handled both resource management and job scheduling and
monitoring. In YARN, that responsibility is split between the Resource Manager and a
per-application Application Master.

YARN also allows different data processing engines, such as graph processing, interactive
processing, stream processing and batch processing, to run on and process data stored in
HDFS (Hadoop Distributed File System), which makes the cluster more efficient to use.
Through its various components, it can dynamically allocate resources and schedule
application processing.
For large-volume data processing, the available resources must be managed properly so
that every application can make use of them.
YARN Features: YARN gained popularity because of the following features:

 Scalability: The scheduler in the Resource Manager allows Hadoop to scale to and
manage clusters of thousands of nodes.
 Compatibility: YARN supports existing MapReduce applications without disruption,
making it compatible with Hadoop 1.0 as well.
 Cluster Utilization: YARN supports dynamic utilization of the cluster in Hadoop,
which enables optimized cluster utilization.
 Multi-tenancy: It allows multiple processing engines to access the same cluster,
giving organizations the benefit of multi-tenancy.

Hadoop YARN Architecture

The main components of YARN architecture include:

 Client: It submits MapReduce jobs.
 Resource Manager: It is the master daemon of YARN and is responsible for resource
assignment and management among all the applications. Whenever it receives a processing
request, it forwards it to the corresponding Node Manager and allocates resources for
the completion of the request accordingly. It has two major components:
o Scheduler: It allocates resources to applications based on their requirements and the
resources available. It is a pure scheduler, meaning it does not perform other tasks
such as monitoring or tracking, and it does not guarantee a restart if a task fails.
The YARN scheduler supports plugins such as the Capacity Scheduler and the Fair
Scheduler to partition the cluster resources.
o Application Manager: It is responsible for accepting the application and negotiating
the first container, in which the Application Master runs. It also restarts the
Application Master container if it fails.
 Node Manager: It takes care of an individual node in the Hadoop cluster and manages
the applications and workflow on that particular node.
o Its primary job is to stay in sync with the Resource Manager. It registers with the
Resource Manager and sends heartbeats with the health status of the node.
o It monitors resource usage, performs log management and kills containers based on
directions from the Resource Manager.
o It is also responsible for creating the container process and starting it at the
request of the Application Master.
 Application Master: An application is a single job submitted to a framework, and each
application has its own Application Master.
o The Application Master is responsible for negotiating resources with the Resource
Manager, and for tracking the status and monitoring the progress of its application.
o It asks the Node Manager to launch a container by sending a Container Launch Context
(CLC), which includes everything the application needs to run.
o Once the application has started, it periodically sends a health report to the
Resource Manager.
 Container: It is a collection of physical resources such as RAM, CPU cores and disk
on a single node.
o A container is launched using a Container Launch Context (CLC), a record that
contains information such as environment variables, security tokens and dependencies.
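
To make the scheduler's role concrete, here is a minimal Python sketch of a "pure" scheduler that only grants containers and partitions cluster memory between queues, loosely in the spirit of the Capacity Scheduler. It is an illustration under simplifying assumptions, not YARN's actual implementation or API: the class names (Node, Container, ToyScheduler), the queue names and the first-fit placement rule are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Free resources that a Node Manager reports for its node."""
    name: str
    free_mem_mb: int
    free_vcores: int


@dataclass
class Container:
    """A slice of one node's resources granted to an application."""
    node: str
    mem_mb: int
    vcores: int


@dataclass
class ToyScheduler:
    """Grants containers only; it does not monitor or restart tasks."""
    nodes: list
    queue_share: dict  # hypothetical: queue name -> fraction of cluster memory
    used_mem_mb: dict = field(default_factory=dict)

    def __post_init__(self):
        # Total cluster memory is fixed at start-up in this sketch.
        self.total_mem_mb = sum(n.free_mem_mb for n in self.nodes)

    def allocate(self, queue, mem_mb, vcores):
        """Return a Container, or None if the queue or the cluster is full."""
        limit = self.queue_share[queue] * self.total_mem_mb
        used = self.used_mem_mb.get(queue, 0)
        if used + mem_mb > limit:
            return None  # the queue has reached its configured share
        for node in self.nodes:  # first-fit placement (an assumption)
            if node.free_mem_mb >= mem_mb and node.free_vcores >= vcores:
                node.free_mem_mb -= mem_mb
                node.free_vcores -= vcores
                self.used_mem_mb[queue] = used + mem_mb
                return Container(node.name, mem_mb, vcores)
        return None  # no single node can fit the request


scheduler = ToyScheduler(
    nodes=[Node("node1", 8192, 4), Node("node2", 8192, 4)],
    queue_share={"batch": 0.75, "adhoc": 0.25},
)
first = scheduler.allocate("adhoc", 4096, 2)
second = scheduler.allocate("adhoc", 4096, 2)
print(first)
print(second)
```

The first request fits within the "adhoc" queue's quarter share of cluster memory and is granted; the second would exceed that share, so the sketch returns None even though the nodes still have free memory.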
Application workflow in Hadoop YARN:

1. The client submits an application
2. The Resource Manager allocates a container to start the Application Master
3. The Application Master registers itself with the Resource Manager
4. The Application Master negotiates containers from the Resource Manager
5. The Application Master notifies the Node Manager to launch containers
6. Application code is executed in the container
7. The client contacts the Resource Manager/Application Master to monitor the
application’s status
8. Once the processing is complete, the Application Master un-registers with the
Resource Manager
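
The eight steps above can be traced with a small Python simulation. This is only a sketch: the classes are toy stand-ins rather than Hadoop's real client API, the container names and the "task.sh" command are hypothetical, and step 7 (status monitoring) is shown as a single poll at the end rather than running concurrently.

```python
class ResourceManager:
    """Toy Resource Manager that records each workflow event in order."""

    def __init__(self):
        self.log = []
        self.registered = set()  # applications whose master is registered

    def submit(self, app_id, num_containers):
        self.log.append("1 client submits application")
        self.log.append("2 RM allocates container for Application Master")
        master = ApplicationMaster(app_id, self)
        master.run(num_containers)
        return master

    def grant(self, count):
        # Hypothetical naming scheme for the granted containers.
        return [f"container_{i}" for i in range(1, count + 1)]


class NodeManager:
    """Toy Node Manager that launches containers on request."""

    def launch(self, container, launch_context, log):
        log.append(f"5 NM launches {container}")
        log.append(f"6 {container} runs {launch_context['command']}")


class ApplicationMaster:
    """Toy per-application master that drives steps 3 to 8."""

    def __init__(self, app_id, rm):
        self.app_id = app_id
        self.rm = rm
        self.state = "NEW"

    def run(self, num_containers):
        rm = self.rm
        rm.registered.add(self.app_id)
        rm.log.append("3 AM registers with RM")
        containers = rm.grant(num_containers)
        rm.log.append(f"4 AM negotiates {len(containers)} containers")
        # Simplified launch context; a real one also carries environment
        # variables, security tokens and dependencies.
        context = {"command": "task.sh", "env": {}}
        node_manager = NodeManager()
        for container in containers:
            node_manager.launch(container, context, rm.log)
        self.state = "FINISHED"
        rm.log.append("7 client polls status: FINISHED")
        rm.registered.discard(self.app_id)
        rm.log.append("8 AM unregisters from RM")


rm = ResourceManager()
master = rm.submit("app_0001", num_containers=2)
print("\n".join(rm.log))
```

Printing the log shows the events in the same order as the numbered list, with steps 5 and 6 repeated once per container.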
Advantages:
 Flexibility: YARN offers flexibility to run various types of distributed processing
systems such as Apache Spark, Apache Flink, Apache Storm, and others. It allows
multiple processing engines to run simultaneously on a single Hadoop cluster.
 Resource Management: YARN provides an efficient way of managing resources in
the Hadoop cluster. It allows administrators to allocate and monitor the resources
required by each application in a cluster, such as CPU, memory, and disk space.
 Scalability: YARN is designed to be highly scalable and can handle thousands of
nodes in a cluster. It can scale up or down based on the requirements of the
applications running on the cluster.
 Improved Performance: YARN offers better performance by providing a centralized
resource management system. It ensures that the resources are optimally utilized, and
applications are efficiently scheduled on the available resources.
 Security: YARN supports security features such as Kerberos authentication, access
control lists, and encrypted data transmission. These help keep the data stored and
processed on the Hadoop cluster secure.
Disadvantages:
 Complexity: YARN adds complexity to the Hadoop ecosystem. It requires additional
configurations and settings, which can be difficult for users who are not familiar with
YARN.
 Overhead: YARN introduces additional overhead, which can slow down the
performance of the Hadoop cluster. This overhead is required for managing resources
and scheduling applications.
 Latency: YARN introduces additional latency in the Hadoop ecosystem. This latency
can be caused by resource allocation, application scheduling, and communication
between components.
 Single Point of Failure: The Resource Manager can be a single point of failure in the
Hadoop cluster. If it fails, the cluster can no longer schedule work. To avoid this,
administrators need to set up a standby Resource Manager for high availability.
 Limited Support: YARN has limited support for non-Java programming languages.
Although it supports multiple processing engines, some engines have limited
language support, which can limit the usability of YARN in certain environments.
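
The high-availability point in the list above can be illustrated with a short Python sketch of a client falling back from a failed Resource Manager to a standby one. The class and function names are hypothetical, and the sketch ignores how a real deployment elects the active instance and recovers application state.

```python
class ToyResourceManager:
    """Stand-in for a Resource Manager instance that may be down."""

    def __init__(self, name):
        self.name = name
        self.alive = True

    def submit(self, app_id):
        if not self.alive:
            raise ConnectionError(f"{self.name} is down")
        return f"{app_id} accepted by {self.name}"


def submit_with_failover(managers, app_id):
    """Try each Resource Manager in order and use the first that answers."""
    for manager in managers:
        try:
            return manager.submit(app_id)
        except ConnectionError:
            continue  # fall through to the next (standby) instance
    raise RuntimeError("no Resource Manager available")


active = ToyResourceManager("rm1")
standby = ToyResourceManager("rm2")
before = submit_with_failover([active, standby], "app_0001")
active.alive = False  # simulate the active instance failing
after = submit_with_failover([active, standby], "app_0002")
print(before)
print(after)
```

With only one instance in the list, the second submission would raise an error instead, which is the single point of failure described in that item.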
