How Does MapReduce Complete a Task?
Last Updated: 09 Aug, 2019
When the application master receives a notification that the last task for a job is complete, it changes the status of the job to "successful". When the Job polls for status, it learns that the job has completed successfully, so it prints a message to tell the user and then returns from the waitForCompletion() method. Job statistics and counters are printed at this point. The application master also sends an HTTP job notification if it is configured to do so; clients wishing to receive callbacks can set this up with the mapreduce.job.end-notification.url property. Finally, on job completion, the application master and the task containers clean up their working state, so the OutputCommitter's commitJob() method is called and the intermediate output is deleted. Job information is archived by the job history server to enable later interrogation by users if desired.
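As a rough sketch of what this looks like from the client side (the class name, paths, and notification URL below are illustrative placeholders, not values from the article), a driver can register an end-of-job callback URL and then block in waitForCompletion() until the job finishes:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class JobCompletionExample {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Hypothetical callback endpoint: if this property is set, the application
        // master sends an HTTP notification to it when the job finishes.
        // Hadoop substitutes $jobId and $jobStatus in the URL.
        conf.set("mapreduce.job.end-notification.url",
                 "http://example.com/jobdone?jobid=$jobId&status=$jobStatus");

        Job job = Job.getInstance(conf, "completion-demo");
        job.setJarByClass(JobCompletionExample.class);
        // No mapper or reducer is set, so the identity Mapper and Reducer are used;
        // with the default TextInputFormat the keys are LongWritable offsets and
        // the values are Text lines.
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(Text.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        // waitForCompletion(true) polls the job's status, prints progress and, once
        // the last task has completed successfully, prints the counters and returns true.
        boolean success = job.waitForCompletion(true);
        System.exit(success ? 0 : 1);
    }
}
```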
What happens in case of failures?
In the real world, user code can be full of bugs, processes can crash, and machines can fail. The biggest benefit of using Hadoop is its ability to handle such failures and still allow the job to complete successfully. Any of the following
components can fail:
- Application master
- Node manager
- Resource manager
- Task

The most common of these is task failure, and the most common occurrence of it is a runtime exception thrown by user code in a map or reduce task. If this happens, the task JVM reports the error back to its parent application master before it exits, and the error ultimately makes it into the user logs. The application master marks the task attempt as failed and frees up the container so its resources are available for another task.

For Streaming tasks, the task is marked as failed if the Streaming process exits with a nonzero exit code; this behaviour is governed by the stream.non.zero.exit.is.failure property (the default is true).

Another failure mode is the sudden exit of the task JVM, perhaps because of a JVM bug that causes it to exit under a particular set of circumstances exposed by the MapReduce user code. The node manager notices that the process has exited and informs the application master, which marks the attempt as failed.

Hanging tasks are dealt with differently. The application master notices that it hasn't received a progress update for some time and proceeds to mark the task as failed; the task JVM process is then killed automatically. The timeout period after which a task is considered failed is normally 10 minutes, and it can be configured on a per-job basis by setting the mapreduce.task.timeout property to a value in milliseconds. Setting the timeout to zero disables the timeout, so long-running tasks are never marked as failed. In that case a hanging task will never free up its container, and the cluster may slow down over time as a result, so this approach should be avoided; making sure that a task reports progress periodically should suffice.
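As an illustrative sketch of the "report progress periodically" approach (the mapper class and the expensiveLookup() helper below are hypothetical, not part of the original text), a task that does lengthy work per record can call context.progress() so the framework knows it is still alive; writing output records also counts as progress:

```java
import java.io.IOException;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Sketch of a mapper that performs slow work per token. Calling context.progress()
// signals liveness, so the task is not killed by the mapreduce.task.timeout mechanism.
public class SlowButAliveMapper extends Mapper<LongWritable, Text, Text, LongWritable> {
    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        for (String token : value.toString().split("\\s+")) {
            long result = expensiveLookup(token); // hypothetical slow step (e.g. a remote call)
            context.progress();                   // report progress to the framework
            context.write(new Text(token), new LongWritable(result));
        }
    }

    // Stand-in for whatever expensive per-record work the real task would do.
    private long expensiveLookup(String token) {
        return token.length();
    }
}
```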
The application master reschedules the execution of a task after it is notified of a failed task attempt, and it tries to avoid rescheduling the task on a node manager where it has previously failed. If a task fails four times, it will not be retried again. This value is configurable: the maximum number of attempts to run a task is controlled by the mapreduce.map.maxattempts property for map tasks and the mapreduce.reduce.maxattempts property for reduce tasks. By default, the whole job fails if any task fails four times. For some applications it is undesirable to abort the job if a few tasks fail, because it may still be possible to use the results of the job despite some failures. The maximum percentage of tasks that are allowed to fail without triggering job failure can be set for the job, using the mapreduce.map.failures.maxpercent and mapreduce.reduce.failures.maxpercent properties to control map tasks and reduce tasks independently.

A task attempt being killed is different from it failing. A task attempt may be killed because it is a speculative duplicate, or because the node manager it was running on failed. Killed task attempts do not count against the number of attempts to run the task, as configured by mapreduce.map.maxattempts and mapreduce.reduce.maxattempts.
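As a minimal sketch of how these limits might be adjusted per job (the specific values and the class name are illustrative assumptions, not recommendations from the article), the properties can be set on the job's Configuration before submission:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.mapreduce.Job;

public class FailureTolerantJob {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();

        // Per-job timeout for hanging tasks, in milliseconds (10 minutes here,
        // which restates the normal default).
        conf.setLong("mapreduce.task.timeout", 600_000L);

        // Maximum attempts before a task is declared failed (default is 4 for both).
        conf.setInt("mapreduce.map.maxattempts", 4);
        conf.setInt("mapreduce.reduce.maxattempts", 4);

        // Example: allow up to 5% of map or reduce tasks to fail without failing the job.
        conf.setInt("mapreduce.map.failures.maxpercent", 5);
        conf.setInt("mapreduce.reduce.failures.maxpercent", 5);

        Job job = Job.getInstance(conf, "failure-tolerant-demo");
        job.setJarByClass(FailureTolerantJob.class);
        // ... mapper, reducer and input/output paths would be set here ...
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```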