Part 02 - Big Data Solutions

The document discusses big data solutions and Hadoop. It describes how the traditional enterprise approach has limitations in processing large, scalable data. Google developed MapReduce to divide tasks across many computers. Hadoop, an open source project, was then created using MapReduce. Hadoop allows distributed processing of large datasets across computer clusters. It scales from single servers to thousands of machines and provides fault tolerance. Key components are HDFS for distributed storage, and MapReduce for parallel processing.


Big Data Solutions

Tushar B. Kute,
http://tusharkute.com
Traditional Enterprise Approach

Limitation
• This approach works well for applications that process modest volumes of data, i.e., data that a standard database server can store, or up to the limit of the processor handling it.
• When the data grows to huge, ever-increasing volumes, however, pushing everything through a single database server becomes a bottleneck and a tedious task.
Google's Solution

• Google solved this problem using an algorithm called MapReduce.
• This algorithm divides the task into small parts, assigns them to many computers, and collects the results from them; when integrated, the partial results form the final result dataset.
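
To make the divide-and-collect idea concrete, here is a minimal single-machine sketch in Java: it splits a word count into chunks, processes the chunks on separate threads, and merges the partial results. This is only an illustration of the pattern (threads standing in for computers, not Google's implementation); the class name and sample data are made up for the example.

import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

public class SplitAndCollect {
    // "Map"-like step: count the words in one chunk of the input.
    static Map<String, Integer> countChunk(List<String> words) {
        Map<String, Integer> counts = new HashMap<>();
        for (String w : words) counts.merge(w, 1, Integer::sum);
        return counts;
    }

    public static void main(String[] args) throws Exception {
        // The task divided into small parts (hypothetical sample data).
        List<List<String>> chunks = List.of(
                List.of("big", "data", "big"),
                List.of("data", "hadoop", "big"));

        // Assign each part to a worker (here: a thread instead of a computer).
        ExecutorService pool = Executors.newFixedThreadPool(chunks.size());
        List<Future<Map<String, Integer>>> parts = new ArrayList<>();
        for (List<String> chunk : chunks) {
            parts.add(pool.submit(() -> countChunk(chunk)));
        }

        // Collect and integrate the partial results into the result dataset.
        Map<String, Integer> total = new HashMap<>();
        for (Future<Map<String, Integer>> part : parts) {
            part.get().forEach((word, count) -> total.merge(word, count, Integer::sum));
        }
        pool.shutdown();

        System.out.println(total);  // e.g. {big=3, data=2, hadoop=1} (order may vary)
    }
}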
Google's Solution – MapReduce
Hadoop

• Using the solution provided by Google, Doug Cutting and his team developed an open source project called HADOOP.
• Hadoop runs applications using the MapReduce algorithm, where the data is processed in parallel on different machines.
• In short, Hadoop is used to develop applications that can perform complete statistical analysis on huge amounts of data.
What is Hadoop?

• Hadoop is an Apache open source framework written in Java that allows distributed processing of large datasets across clusters of computers using simple programming models.
• The Hadoop framework application works in an environment that provides distributed storage and computation across clusters of computers.
• Hadoop is designed to scale up from a single server to thousands of machines, each offering local computation and storage.
Hadoop Architecture
What is MapReduce?

• MapReduce is a parallel programming model for writing distributed applications, devised at Google for efficient processing of large amounts of data (multi-terabyte datasets) on large clusters (thousands of nodes) of commodity hardware in a reliable, fault-tolerant manner.
• MapReduce programs run on Hadoop, which is an Apache open-source framework.
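
As a concrete illustration, below is a sketch along the lines of the classic WordCount example from the Hadoop MapReduce documentation: the mapper emits (word, 1) pairs, Hadoop shuffles and sorts them by key between the two phases, and the reducer sums the counts per word. Input and output paths come from the command line; treat it as a starting point rather than a complete, tuned job.

import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount {

    // Mapper: emits (word, 1) for every word in its input split.
    public static class TokenizerMapper extends Mapper<Object, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private final Text word = new Text();

        public void map(Object key, Text value, Context context)
                throws IOException, InterruptedException {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens()) {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Reducer: receives all counts for one word (after the shuffle/sort) and sums them.
    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private final IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable val : values) sum += val.get();
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "word count");
        job.setJarByClass(WordCount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));   // input directory in HDFS
        FileOutputFormat.setOutputPath(job, new Path(args[1])); // output directory in HDFS
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}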
Hadoop Distributed File System

• The Hadoop Distributed File System (HDFS) is based on the Google File System (GFS) and provides a distributed file system that is designed to run on commodity hardware.
• It has many similarities with existing distributed file systems. However, the differences from other distributed file systems are significant.
• It is highly fault-tolerant and is designed to be deployed on low-cost hardware.
• It provides high-throughput access to application data and is suitable for applications with large datasets.
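
Below is a minimal sketch of how an application reads and writes HDFS through the Java FileSystem API, assuming the client's classpath carries the cluster's core-site.xml/hdfs-site.xml and that the path /tmp/hello.txt is just a made-up example. The application sees an ordinary file stream while HDFS handles block placement and replication underneath.

import java.nio.charset.StandardCharsets;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class HdfsHello {
    public static void main(String[] args) throws Exception {
        // Reads core-site.xml / hdfs-site.xml from the classpath to find the NameNode.
        Configuration conf = new Configuration();
        FileSystem fs = FileSystem.get(conf);

        Path file = new Path("/tmp/hello.txt");  // hypothetical path

        // Write a small file; HDFS splits larger files into blocks and
        // replicates them across DataNodes transparently.
        try (FSDataOutputStream out = fs.create(file, true)) {
            out.write("Hello HDFS".getBytes(StandardCharsets.UTF_8));
        }

        // Read the file back as a stream.
        try (FSDataInputStream in = fs.open(file)) {
            byte[] buf = new byte[64];
            int n = in.read(buf);
            System.out.println(new String(buf, 0, n, StandardCharsets.UTF_8));
        }
    }
}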
Other Hadoop Modules

• Apart from the two core components mentioned above (HDFS and MapReduce), the Hadoop framework also includes the following two modules:
– Hadoop Common: Java libraries and utilities required by the other Hadoop modules.
– Hadoop YARN: a framework for job scheduling and cluster resource management.
How does Hadoop work?

• It is quite expensive to build bigger servers with heavy configurations to handle large-scale processing. As an alternative, you can tie together many commodity, single-CPU computers into a single functional distributed system; practically, the clustered machines can read the dataset in parallel and provide much higher throughput.
• Moreover, this is cheaper than one high-end server. So the first motivation for using Hadoop is that it runs across clusters of low-cost machines.
How does Hadoop work?

• Hadoop runs code across a cluster of computers. This process includes the following core tasks that Hadoop performs (a short sketch of the related block-size and replication settings follows the list):
– Data is initially divided into directories and files. Files are divided into uniform-sized blocks of 128 MB or 64 MB (preferably 128 MB).
– These files are then distributed across various cluster nodes for further processing.
– HDFS, sitting on top of the local file system, supervises the processing.
– Blocks are replicated to handle hardware failure.
– Checking that the code was executed successfully.
– Performing the sort that takes place between the map and reduce stages.
– Sending the sorted data to a certain computer.
– Writing the debugging logs for each job.
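
As a small sketch of the block-size and replication behaviour listed above: the standard HDFS properties dfs.blocksize and dfs.replication are normally set cluster-wide in hdfs-site.xml, but they can also be overridden from a Java client as below (the file path is a made-up example).

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class BlockSettingsSketch {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Client-side overrides of the cluster defaults.
        conf.setLong("dfs.blocksize", 128L * 1024 * 1024);  // 128 MB blocks
        conf.setInt("dfs.replication", 3);                  // keep 3 copies of each block

        FileSystem fs = FileSystem.get(conf);
        // Files larger than one block are split into blocks automatically,
        // and each block is replicated across DataNodes for fault tolerance.
        try (FSDataOutputStream out = fs.create(new Path("/tmp/sample.dat"))) {
            out.write(new byte[1024]);
        }
    }
}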
Advantages of Hadoop

• The Hadoop framework allows the user to quickly write and test distributed systems. It is efficient, and it automatically distributes the data and work across the machines, in turn utilizing the underlying parallelism of the CPU cores.
• Hadoop does not rely on hardware to provide fault tolerance and high availability (FTHA); rather, the Hadoop library itself has been designed to detect and handle failures at the application layer.
• Servers can be added to or removed from the cluster dynamically, and Hadoop continues to operate without interruption.
• Another big advantage of Hadoop is that, apart from being open source, it is compatible with all platforms since it is Java based.
Thank you
This presentation was created using LibreOffice Impress 4.2.7.2 and can be used freely as per the GNU General Public License.

Web Resources:
http://mitu.co.in
http://tusharkute.com

Blogs:
http://digitallocha.blogspot.in
http://kyamputar.blogspot.in

[email protected]
