Big Data With Hadoop & Spark - Introduction
Big Data with Hadoop & Spark
Sandeep Giri - Founder, Software Engineer
Learn to process Big Data with Hadoop, Spark & related technologies.
Learn by doing.
Problem Statement
Evaluation
ETL: Extract, Transform, Load
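To make ETL concrete, here is a minimal sketch in plain Python; the file name orders.csv and its columns (name, amount, status) are hypothetical, not from the slides.

# Minimal ETL sketch: Extract rows from a CSV, Transform them, Load the result.
import csv

def extract(path):
    # Extract: read raw records from the source file
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: normalise names and keep only completed orders
    return [
        {"name": r["name"].strip().title(), "amount": float(r["amount"])}
        for r in rows
        if r.get("status") == "completed"
    ]

def load(rows, path):
    # Load: write the cleaned records to the target file
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "amount"])
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    load(transform(extract("orders.csv")), "orders_clean.csv")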
1. Groups of networked computers
2. that interact with each other
3. to achieve a common goal.
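As a toy sketch of this idea (local processes standing in for networked computers), the job below is split across workers whose partial results are combined into one answer:

# Several workers each compute part of a job; the partial results are
# combined into a single answer (the common goal). Local processes here
# stand in for networked machines.
from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    # Each worker handles only its own slice of the data
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]        # split the work 4 ways
    with ProcessPoolExecutor(max_workers=4) as pool:
        total = sum(pool.map(partial_sum, chunks)) # combine partial results
    print(total)                                   # same answer as sum(data)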
1.1259 × 10^15
600 TB
• Distributed Architecture Needed
• Structured / Unstructured
• Around 6 hours
• Yes. Most of the existing systems can’t handle it.
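As a back-of-the-envelope check of the 600 TB and "around 6 hours" figures: assuming a single disk streams roughly 100 MB/s and a few hundred machines read in parallel (both assumptions, not from the slides), one machine would need weeks while the cluster finishes in about 6 hours. A minimal sketch:

# Back-of-the-envelope check (disk speed and machine count are assumptions,
# used only to illustrate why a distributed architecture is needed).
data_bytes    = 600e12                       # 600 TB
disk_rate     = 100e6                        # ~100 MB/s per disk
one_machine_s = data_bytes / disk_rate       # ≈ 6,000,000 s
print(one_machine_s / 86400)                 # ≈ 69 days on a single machine
print(one_machine_s / 280 / 3600)            # ≈ 6 hours across ~280 machines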
• Devices: Smart Phones (4.6 billion mobile-phones)
• Connectivity: WiFi, 4G, NFC, GPS (1 - 2 billion people accessing the internet)
• Applications: Social Networks, Internet of Things
1. CPU Speed
4. Network
• Compute Engine
• NoSQL Datastore
• Resource Manager
• File Storage
Apache Spark
• Really fast MapReduce
• 100x faster than Hadoop MapReduce in memory,
• 10x faster on disk.
• Builds on similar paradigms as MapReduce
• Integrated with Hadoop
Spark Core - A fast and general engine for large-scale data processing.
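As a minimal sketch of the MapReduce-style paradigm the slides refer to, here is a word count in PySpark; the HDFS input path is hypothetical.

# Word count in PySpark: flatMap/map play the "map" role, reduceByKey the
# "reduce" role. The input path hdfs:///data/input.txt is hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCount").getOrCreate()
sc = spark.sparkContext

counts = (
    sc.textFile("hdfs:///data/input.txt")       # read lines from HDFS
      .flatMap(lambda line: line.split())       # emit individual words
      .map(lambda word: (word, 1))              # pair each word with 1
      .reduceByKey(lambda a, b: a + b)          # sum the counts per word
)
print(counts.take(10))
spark.stop()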
• Languages: Scala, Java, Python, R
• Libraries: Spark SQL, Dataframes, Streaming, MLlib, GraphX
• Spark Core
• Storage: HDFS, HBase, Hive, Tachyon, Cassandra
• Resource/cluster managers
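A small sketch of the library layer sitting on top of Spark Core, querying the same data through the DataFrame API and Spark SQL; the file name people.csv and its columns are hypothetical.

# Same data queried two ways: DataFrame API and Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("EcosystemDemo").getOrCreate()

df = spark.read.csv("people.csv", header=True, inferSchema=True)

# DataFrame API
df.groupBy("city").count().show()

# Spark SQL on the same data, via a temporary view
df.createOrReplaceTempView("people")
spark.sql("SELECT city, COUNT(*) AS n FROM people GROUP BY city").show()

spark.stop()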