HDFS MAP REDUCE
2. Data Node
In HDFS, multiple DataNodes exist; each manages the storage
attached to the node it runs on. They are used to store
users' data on the HDFS cluster.
Architecture of HDFS
3. HDFS Client
In the Hadoop Distributed File System, user applications access the file
system through the HDFS client. Like other file systems, HDFS
supports operations to read, write, and delete files, and
operations to create and delete directories.
4. HDFS Blocks
In general, a user's data is stored in HDFS in terms of blocks. Files
in the file system are divided into one or more segments called blocks.
The default size of an HDFS block is 64 MB, which can be increased as
per need.
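The division of a file into fixed-size blocks can be sketched as follows. This is an illustration of the idea, not HDFS code; the function name and the 200 MB example file are assumptions.

```python
# Default HDFS block size mentioned above: 64 MB.
BLOCK_SIZE = 64 * 1024 * 1024

def split_into_blocks(file_size_bytes, block_size=BLOCK_SIZE):
    """Return (offset, length) pairs for each block of a file."""
    blocks = []
    offset = 0
    while offset < file_size_bytes:
        # The last block may be smaller than the block size.
        length = min(block_size, file_size_bytes - offset)
        blocks.append((offset, length))
        offset += length
    return blocks

# A 200 MB file yields three full 64 MB blocks plus one 8 MB block.
print(len(split_into_blocks(200 * 1024 * 1024)))  # → 4
```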
Map Reduce
MapReduce is a programming model provided by Hadoop
that allows expressing distributed computations on huge amounts of
data. It provides easy scaling of data processing over multiple
computational nodes or clusters.
In the MapReduce model, the data-processing primitives used are
called the mapper and the reducer.
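The mapper and reducer primitives can be sketched with the classic word-count example. This is a minimal single-machine illustration of the model, not Hadoop's Java API; the function names and the simulated shuffle step are assumptions for the sketch.

```python
from collections import defaultdict

def mapper(line):
    """The map primitive: emit (word, 1) for every word in a line."""
    for word in line.split():
        yield word.lower(), 1

def reducer(word, counts):
    """The reduce primitive: aggregate all values sharing a key."""
    return word, sum(counts)

def run_job(lines):
    """Simulate the shuffle phase that groups mapper output by key,
    then apply the reducer to each group."""
    grouped = defaultdict(list)
    for line in lines:
        for word, count in mapper(line):
            grouped[word].append(count)
    return dict(reducer(w, c) for w, c in grouped.items())

print(run_job(["the cat", "the dog"]))  # → {'the': 2, 'cat': 1, 'dog': 1}
```

In a real cluster, mapper invocations run in parallel on many nodes and the framework performs the shuffle; the structure of the two primitives is the same.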
Features of MapReduce
• Synchronization: MapReduce supports the execution of
concurrent tasks. When concurrent tasks are executed, they
need synchronization.
Synchronization is provided by reading the state of
each MapReduce operation during execution and using
shared variables for it.
• Data locality: In MapReduce, although the data resides on different
clusters, it appears local to the user's application. To obtain the
best performance, the code and the data of an application should reside
on the same machine.
The unit of work in MapReduce is a job. During the map phase, the input
data is divided into input splits for analysis, where each split is an
independent task. These tasks run in parallel across the Hadoop cluster.
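The division of a job's input into independent splits that run in parallel can be sketched as below. This is an illustration, not the Hadoop framework itself; the split size, the record-counting task, and a thread pool standing in for cluster nodes are all assumptions.

```python
from concurrent.futures import ThreadPoolExecutor

def make_splits(records, split_size):
    """Divide the input records into fixed-size independent splits."""
    return [records[i:i + split_size]
            for i in range(0, len(records), split_size)]

def map_task(split):
    """A per-split task; here it simply counts its records."""
    return len(split)

def run_parallel(records, split_size=2):
    splits = make_splits(records, split_size)
    # Each split is an independent task, so they can run in parallel.
    with ThreadPoolExecutor() as pool:
        return list(pool.map(map_task, splits))

print(run_parallel(["r1", "r2", "r3", "r4", "r5"]))  # → [2, 2, 1]
```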
Working of MapReduce Framework
Google App Engine
Google App Engine (GAE) is a Platform-as-a-Service (PaaS) cloud
computing model that supports many programming languages. GAE
is a scalable runtime environment mostly devoted to executing Web
applications.