Big Data Management
1. WHAT IS BIG DATA?
When does data become BIG?
2,500,000,000,000,000,000
What is Big Data?
Volume: Size
Veracity
Value
3. MANAGEMENT
FRAMEWORKS / TOOLS
STORAGE - HADOOP
PROCESSING – HADOOP
HADOOP - HDFS
[Figure: HDFS example with a 400 MB file, the infrastructure, a Master Node, and TASK A divided into parts A1 to A4]
RESULT = A1 + A2 + A3 + A4 = OUTPUT_TASK_A
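As a minimal sketch of how a 400 MB file might be cut into blocks and the partial results recombined, the Python below assumes a 128 MB block size (a common HDFS default in recent Hadoop releases; the slide does not state one). `split_into_blocks` and `combine` are hypothetical helpers for illustration, not Hadoop APIs. Under that assumption the file yields four blocks, which would be consistent with the four parts A1 to A4.

```python
def split_into_blocks(file_size_mb, block_size_mb=128):
    """Return the sizes (in MB) of the blocks a file would be cut into.

    block_size_mb=128 is an assumption; the slide does not state a block size.
    """
    if file_size_mb < 0 or block_size_mb <= 0:
        raise ValueError("file size must be >= 0 and block size must be > 0")
    blocks = []
    remaining = file_size_mb
    while remaining > 0:
        size = min(block_size_mb, remaining)  # last block may be smaller
        blocks.append(size)
        remaining -= size
    return blocks


def combine(partial_results):
    """Combine per-block partial results, as in RESULT = A1 + A2 + A3 + A4."""
    return sum(partial_results)


blocks = split_into_blocks(400)
print(blocks)           # [128, 128, 128, 16]
print(combine(blocks))  # 400
```

Here each block's "result" is simply its size, so the combined total recovers the original 400 MB; a real job would compute something per block and merge those outputs instead.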
HADOOP – MAP REDUCE PROCESSING/PROGRAMMING
INPUT
Malaysia, Saudi Arabia, Comoros. Bangladesh, Algeria, Malaysia. Comoros. Algeria, Saudi Arabia
SPLIT
1: Malaysia, Saudi Arabia, Comoros
2: Bangladesh, Algeria, Malaysia
3: Comoros, Algeria, Saudi Arabia
MAP
1: (Malaysia, 1), (Saudi Arabia, 1), (Comoros, 1)
2: (Bangladesh, 1), (Algeria, 1), (Malaysia, 1)
3: (Comoros, 1), (Algeria, 1), (Saudi Arabia, 1)
SHUFFLE & SORT
Malaysia: 1, 1
Saudi Arabia: 1, 1
Comoros: 1, 1
Bangladesh: 1
Algeria: 1, 1
REDUCE
Malaysia, 2
Saudi Arabia, 2
Comoros, 2
Bangladesh, 1
Algeria, 2
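The word count above can be sketched in plain Python to show what each stage contributes. This is an in-memory illustration only, not Hadoop code; the function names `map_phase`, `shuffle_and_sort`, and `reduce_phase` are hypothetical labels for the stages on the slide.

```python
from collections import defaultdict

# The three input splits from the slide.
SPLITS = [
    "Malaysia, Saudi Arabia, Comoros",
    "Bangladesh, Algeria, Malaysia",
    "Comoros, Algeria, Saudi Arabia",
]


def map_phase(split):
    """Emit a (country, 1) pair for every country name in one split."""
    return [(name.strip(), 1) for name in split.split(",") if name.strip()]


def shuffle_and_sort(mapped_pairs):
    """Group the emitted values by key, with keys in sorted order."""
    groups = defaultdict(list)
    for key, value in mapped_pairs:
        groups[key].append(value)
    return dict(sorted(groups.items()))


def reduce_phase(groups):
    """Sum the grouped values to get one count per country."""
    return {key: sum(values) for key, values in groups.items()}


mapped = [pair for split in SPLITS for pair in map_phase(split)]
counts = reduce_phase(shuffle_and_sort(mapped))
print(counts)
```

The keys come out alphabetically in this sketch, whereas the slide lists them in order of first appearance; the counts are the same either way.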
HADOOP - YARN
[Figure: YARN architecture showing clients (CLIENT B, CLIENT C), Node Manager, App Master, and Container]
Core Hadoop
Query Engines
CORE HADOOP
PIG [Procedural Language Platform]
A runtime engine
A compiler producing sequences of MapReduce jobs
Parsing, validation, and compilation into a sequence of MapReduce jobs
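Example
A = LOAD 'myfile' AS (x, y, z);          -- load tuples with fields x, y, z
B = FILTER A BY x > 0;                   -- keep rows where x is positive
C = GROUP B BY x;                        -- one group per distinct x
D = FOREACH C GENERATE group, COUNT(B);  -- count the rows in each group
STORE D INTO 'output';                   -- write the counts out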
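To make the intent of the Pig script concrete (filter, group by x, count per group), here is a rough Python equivalent. It is a sketch under assumptions: the `rows` data is a hypothetical stand-in for the contents of 'myfile', and the result is kept in memory rather than stored.

```python
from collections import Counter

# Hypothetical stand-in for the contents of 'myfile': tuples of (x, y, z).
rows = [(1, "a", 10), (2, "b", 20), (1, "c", 30), (-3, "d", 40), (0, "e", 50)]

# FILTER: keep only the rows where x > 0
filtered = [row for row in rows if row[0] > 0]

# GROUP BY x, then COUNT the rows in each group
counts = Counter(row[0] for row in filtered)

# STORE: here the result is kept in a sorted list instead of written to 'output'
result = sorted(counts.items())
print(result)  # [(1, 2), (2, 1)]
```

Pig would compile the same steps into one or more MapReduce jobs; the Python version runs on a single machine and is only meant to show the data flow.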
Data Model
Nested Model
HIVE
Data Flow
Data Modeling
Tables
Same as in an RDBMS
Partitions
Divisions of the same table's data, determined by a partition key
Buckets
Smaller subdivisions of partitions for efficient querying
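The relationship between partitions and buckets can be sketched in Python: rows are first grouped by a partition column, then each partition is subdivided into a fixed number of buckets by hashing another column. Everything here is illustrative: the column names, the sample rows, the bucket count, and the use of CRC32 are assumptions (Hive uses its own hash function), and `bucket_for` and `layout` are hypothetical helpers, not Hive APIs.

```python
import zlib
from collections import defaultdict

NUM_BUCKETS = 4  # hypothetical bucket count


def bucket_for(key, num_buckets=NUM_BUCKETS):
    """Pick a bucket from a stable hash of the key.

    CRC32 is an illustrative choice only; Hive uses its own hash function.
    """
    return zlib.crc32(str(key).encode("utf-8")) % num_buckets


def layout(rows, partition_col, bucket_col):
    """Group rows first by partition value, then by bucket number."""
    table = defaultdict(lambda: defaultdict(list))
    for row in rows:
        table[row[partition_col]][bucket_for(row[bucket_col])].append(row)
    return table


# Hypothetical rows, not taken from the slides.
rows = [
    {"country": "Malaysia", "user_id": 1},
    {"country": "Malaysia", "user_id": 2},
    {"country": "Algeria", "user_id": 3},
]
table = layout(rows, partition_col="country", bucket_col="user_id")
for partition, buckets in sorted(table.items()):
    for bucket, members in sorted(buckets.items()):
        print(partition, bucket, members)
```

With a layout like this, a query that filters on the partition column only needs to look at the matching partition, and a lookup on the bucketed column only needs one bucket inside it.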
Example
APACHE AMBARI
Provisioning:
Monitoring:
Leverages the Ambari Alert Framework for system alerting and notifies you
when your attention is needed (e.g., a node goes down or remaining disk space
is low).
Architecture
MESOS – Another Resource Negotiator
Example
MESOS vs YARN
                 MESOS              YARN
Fault Tolerance
Security         Trusted Entities   Multiple Layers
SPARK
Runs Everywhere: Spark runs on Hadoop, Apache Mesos, Kubernetes, standalone, or in the
cloud. It can access diverse data sources.
Architecture
Standalone
Mesos
YARN
Kubernetes
BARAKALLAH FEEKUM!
Any questions?