Big Data With Hadoop & Spark - Introduction
Big Data with Hadoop & Spark
Sandeep Giri - Founder, Software Engineer
Learn to process Big Data with Hadoop, Spark & related technologies.
Learn by doing.
Problem Statement
Evaluation
ETL: Extract, Transform, Load
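To make ETL concrete, here is a minimal sketch in plain Python; the file name orders.csv and its columns (name, amount, status) are hypothetical, not from the slides.

# Minimal ETL sketch: Extract rows from a CSV, Transform them, Load the result.
import csv

def extract(path):
    # Extract: read raw records from the source file
    with open(path, newline="") as f:
        return list(csv.DictReader(f))

def transform(rows):
    # Transform: normalise names and keep only completed orders
    return [
        {"name": r["name"].strip().title(), "amount": float(r["amount"])}
        for r in rows
        if r.get("status") == "completed"
    ]

def load(rows, path):
    # Load: write the cleaned records to the target file
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=["name", "amount"])
        writer.writeheader()
        writer.writerows(rows)

if __name__ == "__main__":
    load(transform(extract("orders.csv")), "orders_clean.csv")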
1. Groups of networked computers
2. that interact with each other
3. to achieve a common goal.
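As a toy sketch of this idea (local processes standing in for networked computers), the job below is split across workers whose partial results are combined into one answer:

# Several workers each compute part of a job; the partial results are
# combined into a single answer (the common goal). Local processes here
# stand in for networked machines.
from concurrent.futures import ProcessPoolExecutor

def partial_sum(chunk):
    # Each worker handles only its own slice of the data
    return sum(chunk)

if __name__ == "__main__":
    data = list(range(1_000_000))
    chunks = [data[i::4] for i in range(4)]        # split the work 4 ways
    with ProcessPoolExecutor(max_workers=4) as pool:
        total = sum(pool.map(partial_sum, chunks)) # combine partial results
    print(total)                                   # same answer as sum(data)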
1.1259 × 10^15
600 TB
• Distributed Architecture Needed
• Structured / Unstructured
• Around 6 hours
• Yes. Most of the existing systems can’t handle it.
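As a back-of-the-envelope check of the 600 TB and "around 6 hours" figures: assuming a single disk streams roughly 100 MB/s and a few hundred machines read in parallel (both assumptions, not from the slides), one machine would need weeks while the cluster finishes in about 6 hours. A minimal sketch:

# Back-of-the-envelope check (disk speed and machine count are assumptions,
# used only to illustrate why a distributed architecture is needed).
data_bytes    = 600e12                       # 600 TB
disk_rate     = 100e6                        # ~100 MB/s per disk
one_machine_s = data_bytes / disk_rate       # ≈ 6,000,000 s
print(one_machine_s / 86400)                 # ≈ 69 days on a single machine
print(one_machine_s / 280 / 3600)            # ≈ 6 hours across ~280 machines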
• Devices: Smart Phones (4.6 billion mobile-phones)
• Connectivity: WiFi, 4G, NFC, GPS (1 - 2 billion people accessing the internet)
• Applications: Social Networks, Internet of Things
1. CPU Speed
4. Network
• Compute Engine
• NoSQL Datastore
• Resource Manager
• File Storage
Apache Spark
• Really fast MapReduce
• 100x faster than Hadoop MapReduce in memory,
• 10x faster on disk.
• Builds on similar paradigms as MapReduce
• Integrated with Hadoop
Spark Core - A fast and general engine for large-scale data processing.
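As a minimal sketch of the MapReduce-style paradigm the slides refer to, here is a word count in PySpark; the HDFS input path is hypothetical.

# Word count in PySpark: flatMap/map play the "map" role, reduceByKey the
# "reduce" role. The input path hdfs:///data/input.txt is hypothetical.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("WordCount").getOrCreate()
sc = spark.sparkContext

counts = (
    sc.textFile("hdfs:///data/input.txt")       # read lines from HDFS
      .flatMap(lambda line: line.split())       # emit individual words
      .map(lambda word: (word, 1))              # pair each word with 1
      .reduceByKey(lambda a, b: a + b)          # sum the counts per word
)
print(counts.take(10))
spark.stop()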
• Languages: Scala, Java, Python, R
• Libraries: Spark SQL, Dataframes, Streaming, MLlib, GraphX
• Spark Core
• Storage: HDFS, HBase, Hive, Tachyon, Cassandra
• Resource/cluster managers
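A small sketch of the library layer sitting on top of Spark Core, querying the same data through the DataFrame API and Spark SQL; the file name people.csv and its columns are hypothetical.

# Same data queried two ways: DataFrame API and Spark SQL.
from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("EcosystemDemo").getOrCreate()

df = spark.read.csv("people.csv", header=True, inferSchema=True)

# DataFrame API
df.groupBy("city").count().show()

# Spark SQL on the same data, via a temporary view
df.createOrReplaceTempView("people")
spark.sql("SELECT city, COUNT(*) AS n FROM people GROUP BY city").show()

spark.stop()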