Hadoop Overview

This document provides an overview of Hadoop and its core components, HDFS and MapReduce. Hadoop is an open-source framework for distributed storage and processing of large datasets on clusters of commodity hardware. HDFS is a distributed file system that stores data across cluster nodes and provides high aggregate bandwidth. MapReduce lets applications split work across processors located near the data and recover automatically from failures. The document also outlines the master-slave architecture of HDFS (the NameNode and DataNodes) and how MapReduce distributes tasks through a map phase and a reduce phase.

Uploaded by Sunil D Patil
Copyright © All Rights Reserved

HADOOP OVERVIEW

This session covers:

1. A brief overview of Hadoop and its architecture
2. A brief overview of Map-Reduce and its architecture
3. A demo on a pseudo-distributed Hadoop cluster with one example
4. The Hue interface (discussion of the Hadoop exercises, for those who
have access to the Automotive Hadoop Platform)

This session is not about:

1. Implementing MapReduce through Java programming
2. An in-depth discussion of the architecture of columnar
databases, etc.
Hadoop

• Apache Hadoop is a framework that allows for the distributed
processing of large data sets across clusters of commodity
computers using a simple programming model.
• It is an open-source data management platform with scale-out
storage and distributed processing.
Hadoop
Hadoop is a framework for running applications on large clusters
built of commodity hardware. The Hadoop framework
transparently provides applications both reliability and data
motion. Hadoop implements a computational paradigm named
Map/Reduce, where the application is divided into many small
fragments of work, each of which may be executed or re-executed
on any node in the cluster. In addition, it provides a distributed
file system (HDFS) that stores data on the compute nodes,
providing very high aggregate bandwidth across the cluster. Both
Map/Reduce and the distributed file system are designed so that
node failures are automatically handled by the framework.
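To make the idea of small, re-executable fragments concrete, here is a minimal Python sketch. It is a single-process toy model, not Hadoop code: `run_fragments`, the node names, and the fixed set of failed nodes are all hypothetical. Each fragment is tried on a node chosen round-robin; if that node is in the failed set, the same fragment is simply re-executed on the next node.

```python
from typing import Callable, Dict, List, Set, Tuple


def run_fragments(
    fragments: Dict[str, List[int]],
    nodes: List[str],
    failed_nodes: Set[str],
    work: Callable[[List[int]], int],
) -> Tuple[Dict[str, int], Dict[str, List[str]]]:
    """Run every fragment; re-execute it on another node if its node has failed."""
    results: Dict[str, int] = {}
    history: Dict[str, List[str]] = {}  # fragment -> nodes it was attempted on
    for i, (frag_id, data) in enumerate(sorted(fragments.items())):
        tried: List[str] = []
        for k in range(len(nodes)):
            node = nodes[(i + k) % len(nodes)]  # round-robin starting point per fragment
            tried.append(node)
            if node in failed_nodes:
                continue  # simulated node failure: re-execute this fragment elsewhere
            results[frag_id] = work(data)
            break
        else:
            raise RuntimeError(f"no live node could run {frag_id}")
        history[frag_id] = tried
    return results, history


# Hypothetical example: three fragments, and node2 fails during the job.
results, history = run_fragments(
    {"f0": [1, 2], "f1": [3, 4], "f2": [5, 6]},
    ["node1", "node2", "node3"],
    {"node2"},
    sum,
)
print(results)  # {'f0': 3, 'f1': 7, 'f2': 11}
print(history)  # {'f0': ['node1'], 'f1': ['node2', 'node3'], 'f2': ['node3']}
```

Because each fragment is a pure function of its input, re-running it on a different node yields the same result, which is what makes re-execution a safe recovery strategy.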

Hadoop Key Characteristics

• Reliable
• Flexible
• Economical
• Scalable

Hadoop Core Components
Hadoop is a system for large-scale data processing.
It has two main components:

• HDFS – Hadoop Distributed File System (storage)
  - Distributed across “nodes”
  - Natively redundant
  - NameNode tracks locations
• MapReduce (processing)
  - Splits a task across processors “near” the data and assembles results
  - Self-healing, high bandwidth
  - Clustered storage
  - JobTracker manages the TaskTrackers
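Running work “near” the data is known as data locality: the JobTracker prefers to run a map task on a node that already holds a replica of the task's input block. A minimal Python sketch of that preference follows; it is a simplified greedy model, not the actual JobTracker scheduling algorithm, and `assign_map_tasks`, the block IDs, and the node names are hypothetical.

```python
from typing import Dict, List


def assign_map_tasks(
    block_locations: Dict[str, List[str]],
    trackers: List[str],
    slots_per_tracker: int = 1,
) -> Dict[str, str]:
    """Greedy data-local assignment of one map task per input block."""
    free = {t: slots_per_tracker for t in trackers}
    assignment: Dict[str, str] = {}
    pending: List[str] = []
    # Pass 1: run each task on a node that already stores a replica of its block.
    for block, hosts in sorted(block_locations.items()):
        local = next((h for h in hosts if free.get(h, 0) > 0), None)
        if local is None:
            pending.append(block)
        else:
            assignment[block] = local
            free[local] -= 1
    # Pass 2: remaining tasks go to any tracker with a free slot (input read over the network).
    for block in pending:
        remote = next((t for t in trackers if free[t] > 0), None)
        if remote is None:
            raise RuntimeError("not enough slots; a real scheduler would queue the task")
        assignment[block] = remote
        free[remote] -= 1
    return assignment


# Hypothetical block -> node locations, as the NameNode might report them.
locations = {"blk_1": ["nodeA", "nodeB"], "blk_2": ["nodeA"], "blk_3": ["nodeC"]}
plan = assign_map_tasks(locations, ["nodeA", "nodeB", "nodeC"])
print(plan)  # {'blk_1': 'nodeA', 'blk_3': 'nodeC', 'blk_2': 'nodeB'}
```

In this example `blk_2` ends up non-local because the greedy first pass gave nodeA's only slot to `blk_1`; placing `blk_1` on nodeB instead would have made all three tasks local. A production scheduler also weighs rack locality, which this sketch ignores.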

HDFS
Hadoop's Distributed File System is designed to reliably store very
large files across machines in a large cluster. It is inspired by the
Google File System. Hadoop DFS stores each file as a sequence of
blocks; all blocks in a file except the last block are the same size.
Blocks belonging to a file are replicated for fault tolerance. The
block size and replication factor are configurable per file. Files in
HDFS are "write once" and have strictly one writer at any time.
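As a worked illustration of this block layout, the sketch below splits a file size into block sizes and computes the raw storage consumed under replication. The 128 MB block size and replication factor of 3 are example values (common defaults in Hadoop 2.x and later; Hadoop 1.x defaulted to 64 MB blocks), and both function names are hypothetical.

```python
from typing import List

MB = 1024 * 1024


def split_into_blocks(file_size: int, block_size: int) -> List[int]:
    """Block sizes for a file: every block is full-size except possibly the last."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    full_blocks, remainder = divmod(file_size, block_size)
    sizes = [block_size] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block holds only the leftover bytes
    return sizes


def raw_storage(file_size: int, replication: int) -> int:
    """Bytes consumed across the cluster when every block is stored `replication` times."""
    return file_size * replication


blocks = split_into_blocks(300 * MB, 128 * MB)
print([b // MB for b in blocks])       # [128, 128, 44]
print(raw_storage(300 * MB, 3) // MB)  # 900
```

A short last block occupies only its actual size on disk, not a full block, so the 44 MB tail above costs 44 MB per replica.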

Goals of the Hadoop Distributed File System

• Store large data sets
• Cope with hardware failure
• Emphasize streaming data access
Main Components of HDFS
• NameNode:
  - master of the system
  - maintains and manages the blocks that are present on the DataNodes

• DataNodes:
  - slaves that are deployed on each machine and provide the actual storage
  - responsible for serving read and write requests from clients
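The division of labour between the NameNode (metadata) and the DataNodes (storage) can be sketched as a toy in-memory model. Everything here is hypothetical: `BlockMapSketch` is not an HDFS class, and replicas are placed round-robin for simplicity, whereas real HDFS uses a rack-aware placement policy.

```python
from typing import Dict, List


class BlockMapSketch:
    """Toy NameNode metadata: which blocks make up a file, and where each replica lives."""

    def __init__(self, datanodes: List[str], replication: int = 3) -> None:
        if replication > len(datanodes):
            raise ValueError("replication factor exceeds the number of DataNodes")
        self.datanodes = datanodes
        self.replication = replication
        self.files: Dict[str, List[str]] = {}      # path -> ordered block IDs
        self.block_map: Dict[str, List[str]] = {}  # block ID -> DataNodes holding a replica
        self._next_block = 0
        self._next_node = 0

    def add_file(self, path: str, num_blocks: int) -> List[str]:
        blocks: List[str] = []
        n = len(self.datanodes)
        for _ in range(num_blocks):
            block_id = f"blk_{self._next_block}"
            self._next_block += 1
            # Round-robin: `replication` consecutive, distinct DataNodes per block.
            self.block_map[block_id] = [
                self.datanodes[(self._next_node + r) % n] for r in range(self.replication)
            ]
            self._next_node = (self._next_node + 1) % n
            blocks.append(block_id)
        self.files[path] = blocks
        return blocks

    def datanode_lost(self, node: str) -> List[str]:
        """Forget a dead DataNode; return the blocks now below the replication target."""
        under_replicated: List[str] = []
        for block_id, holders in self.block_map.items():
            if node in holders:
                holders.remove(node)
                under_replicated.append(block_id)
        return under_replicated


nn = BlockMapSketch(["dn1", "dn2", "dn3", "dn4"], replication=3)
nn.add_file("/data/log.txt", num_blocks=2)
print(nn.block_map)             # {'blk_0': ['dn1', 'dn2', 'dn3'], 'blk_1': ['dn2', 'dn3', 'dn4']}
print(nn.datanode_lost("dn1"))  # ['blk_0']
```

In real HDFS the NameNode would then schedule new copies of the under-replicated blocks on healthy DataNodes to restore the replication factor.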
Map Reduce
The Hadoop Map/Reduce framework harnesses a cluster of machines and
executes user defined Map/Reduce jobs across the nodes in the cluster. A
Map/Reduce computation has two phases, a map phase and a reduce phase. The
input to the computation is a data set of key/value pairs.
Tasks in each phase are executed in a fault-tolerant manner: if node(s) fail in the
middle of a computation, the tasks assigned to them are re-distributed among
the remaining nodes. Having many map and reduce tasks enables good load
balancing and allows failed tasks to be re-run with small runtime overhead.
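The classic way to illustrate the two phases is word count. The single-process Python sketch below mimics the data flow (map, then group by key, then reduce); it is not the Hadoop Java API, and the function names are hypothetical. As a simplification, the input key is the line index rather than the byte offset that Hadoop's default text input supplies.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(line_no: int, line: str) -> Iterator[Tuple[str, int]]:
    """Map: one input (key, value) pair -> zero or more intermediate (word, 1) pairs."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key (done by the framework between the phases)."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(word: str, counts: List[int]) -> Tuple[str, int]:
    """Reduce: all values for one key -> a single output pair."""
    return word, sum(counts)


def word_count(lines: List[str]) -> Dict[str, int]:
    intermediate = (kv for i, line in enumerate(lines) for kv in map_phase(i, line))
    grouped = shuffle(intermediate)
    return dict(reduce_phase(w, c) for w, c in sorted(grouped.items()))


counts = word_count(["the quick brown fox", "the lazy dog"])
print(counts)  # {'brown': 1, 'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

Because each `map_phase` call and each `reduce_phase` call depends only on its own input, they can run as many independent tasks, and any task that fails can simply be re-run.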

Goals of Hadoop Map/Reduce

• Process large data sets
• Cope with hardware failure
• High throughput

Ref: http://labs.google.com/papers/mapreduce.html
Architecture
Like Hadoop Map/Reduce, HDFS follows a master/slave
architecture. An HDFS installation consists of a single NameNode, a
master server that manages the file system namespace and
regulates access to files by clients. In addition, there are a number
of DataNodes, one per node in the cluster, which manage storage
attached to the nodes that they run on. The NameNode makes
filesystem namespace operations like opening, closing, renaming,
etc. of files and directories available via an RPC interface. It also
determines the mapping of blocks to DataNodes. The DataNodes are
responsible for serving read and write requests from filesystem
clients; they also perform block creation, deletion, and replication
upon instruction from the NameNode.
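The read path implied by this architecture (ask the NameNode where a file's blocks are, then fetch the bytes directly from DataNodes) can be sketched as follows. This is a hypothetical in-memory model: `ToyNameNode`, `ToyDataNode`, and `read_file` are illustrative names, and plain method calls stand in for RPCs; none of this is the HDFS client API.

```python
from typing import Dict, List, Tuple


class ToyDataNode:
    """Stores raw block bytes and serves reads."""

    def __init__(self, blocks: Dict[str, bytes]) -> None:
        self.blocks = blocks

    def read_block(self, block_id: str) -> bytes:
        return self.blocks[block_id]


class ToyNameNode:
    """Holds only metadata: the namespace and the block -> DataNode mapping."""

    def __init__(self) -> None:
        self.namespace: Dict[str, List[str]] = {}  # path -> ordered block IDs
        self.locations: Dict[str, List[str]] = {}  # block ID -> DataNode names

    def rename(self, src: str, dst: str) -> None:
        # A namespace operation: only metadata changes, no block data moves.
        self.namespace[dst] = self.namespace.pop(src)

    def get_block_locations(self, path: str) -> List[Tuple[str, List[str]]]:
        return [(b, self.locations[b]) for b in self.namespace[path]]


def read_file(path: str, namenode: ToyNameNode, datanodes: Dict[str, ToyDataNode]) -> bytes:
    data = bytearray()
    for block_id, hosts in namenode.get_block_locations(path):
        for host in hosts:  # try replicas in order; skip DataNodes that are down
            node = datanodes.get(host)
            if node is not None and block_id in node.blocks:
                data += node.read_block(block_id)  # bytes flow DataNode -> client directly
                break
        else:
            raise OSError(f"no live replica for {block_id}")
    return bytes(data)


nn = ToyNameNode()
nn.namespace["/a.txt"] = ["blk_0", "blk_1"]
nn.locations = {"blk_0": ["dn1", "dn2"], "blk_1": ["dn2"]}
dn1 = ToyDataNode({"blk_0": b"hello "})
dn2 = ToyDataNode({"blk_0": b"hello ", "blk_1": b"world"})

nn.rename("/a.txt", "/b.txt")
print(read_file("/b.txt", nn, {"dn1": dn1, "dn2": dn2}))  # b'hello world'
print(read_file("/b.txt", nn, {"dn2": dn2}))              # b'hello world' (dn1 down, replica on dn2)
```

Keeping the NameNode on the metadata path only, while block bytes travel straight from DataNodes to the client, is what lets a single master coordinate a large cluster without becoming a data bottleneck.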
Simple Hadoop Architecture
(Small cluster)
Map-Reduce Architecture
Map-Reduce Process
Hadoop Ecosystem
Hadoop Demo
Thank You
