Course Topics
Week 1 – Introduction to HDFS
Week 2 – Setting Up a Hadoop Cluster
Week 3 – MapReduce Basics, Types and Formats
Week 4 – PIG
Week 5 – HIVE
Week 6 – HBASE
Week 7 – ZOOKEEPER
Week 8 – SQOOP
What are we going to learn today?
• Problems in the real world
• Traditional RDBMS fallacies
• The advent of HBase
• HBase Architecture
• Hands-on creation and updating of an HBase table on the shell
• Multiple ways of loading data into HBase (shell, Java client, MapReduce, Avro, Thrift, REST API)
Problems in the Real World
LinkedIn
Revolutionizing education
Ad targeting
So, what is common?
• Huge Data
• Fast Random access
• Structured Data
• Variable Schema
• Need for compression
• Need for distribution (sharding)
How a Traditional RDBMS Would Solve It

Users table: Id, Name, Sex, Age
Followers table: User_id, Follower_id, Type

Contd.

Users table: Id, Name, Sex, Age
Connections table: User_id, Connection_id, Type
Characteristics of a Probable Solution
• Distributed database
• Sorted data
• Sparse data store
• Automatic sharding
History of HBase
2006 – Google publishes the BigTable paper
2006 – HBase development starts
2008 – Microsoft buys Powerset
2010 – Facebook adopts HBase for its messaging system
Facebook Messaging System
• Facebook monitored their usage and figured out what they really needed.
• What they needed was a system that could handle two types of data patterns:
  – A short set of temporal data that tends to be volatile
  – An ever-growing set of data that rarely gets accessed
real-time, distributed, linearly scalable, robust, BigData, open-source, key-value, column-oriented
HBase Definition
HBase is a key/value store. Specifically, it is a sparse, consistent, distributed, multidimensional, sorted map.
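As a rough illustration (plain Java, not the HBase API), that definition can be modeled as nested sorted maps: row key, column family, column qualifier, and timestamp together address a value, and missing cells simply do not exist.

  import java.util.Comparator;
  import java.util.NavigableMap;
  import java.util.TreeMap;

  // Illustrative model only: HBase as a sorted, multidimensional map.
  // row key -> column family -> column qualifier -> timestamp -> value
  public class SortedMultiMapSketch {
      NavigableMap<String, NavigableMap<String, NavigableMap<String, NavigableMap<Long, String>>>> table =
          new TreeMap<>(); // every level is kept sorted

      void put(String row, String family, String qualifier, long ts, String value) {
          table.computeIfAbsent(row, r -> new TreeMap<>())
               .computeIfAbsent(family, f -> new TreeMap<>())
               // timestamps sorted descending, so the newest version comes first
               .computeIfAbsent(qualifier, q -> new TreeMap<>(Comparator.reverseOrder()))
               .put(ts, value);
          // Cells that are never written take no space at all: "sparse".
      }
  }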
More HBase Implementations
• Facebook uses HBase to power their Messages (http://sites.computer.org/debull/A12june/facebook.pdf).
• A number of applications, including infrastructure and people search, rely on HBase internally for data generation.
• "We use HBase as a real-time data storage and analytics platform."
• One adopter uses HBase to store document fingerprints for detecting near-duplicates, on a cluster of a few nodes running HDFS, MapReduce, and HBase.
• Another uses an HBase cluster containing over a billion anonymized clinical records.
• Another uses HBase as a foundation for cloud-scale storage for a variety of applications.
Referred – http://wiki.apache.org/hadoop/Hbase/PoweredBy
Data Model

Versions of Data

Row key       | Personal_data          | Demographic
(Person's ID) | Name    | Address      | Birth Date | Gender
1             | Harry   | BTM Layout   | 1988-10-31 | M
2             | Dhawan  |              | 1956-09-16 | M
3             | Sana    | Whitefield   | 1989-12-03 | F
…             | …       | …            | …          | …
500,000,000   | Vineet  | Delhi        | 1964-01-07 | M
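As a hedged sketch of how this table could be declared (names taken from the slide; the classic HBaseAdmin API of the HBase 0.9x era is assumed):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.HColumnDescriptor;
  import org.apache.hadoop.hbase.HTableDescriptor;
  import org.apache.hadoop.hbase.client.HBaseAdmin;

  public class CreatePersonsTable {
      public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          HBaseAdmin admin = new HBaseAdmin(conf);

          // One table, two column families, as in the slide's data model.
          HTableDescriptor desc = new HTableDescriptor("Persons");
          HColumnDescriptor personal = new HColumnDescriptor("Personal_data");
          personal.setMaxVersions(3); // keep several versions of each cell
          desc.addFamily(personal);
          desc.addFamily(new HColumnDescriptor("Demographic"));

          admin.createTable(desc);
          admin.close();
      }
  }

Note that only the column families are fixed up front; individual columns (qualifiers) are created on the fly by each write.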
Physical Storage

[Figure: the logical table is stored physically as sorted key-value pairs, grouped by column family.
Family1 (Personal data) – Row 1: Col1 (Name) -> "H. Houdini", Col1 (Address) -> "Budapest"; Row 2: Col5 (Address) -> "D. Copper"
Family2 (Demographic) – Row 1: Col3 (Birth date) -> "1926-10-31", Col3 (Gender) -> "M"; Row 2: Col3 (Birth date) -> val3; Row 3: Col4 (Gender) -> val4
Cells with no value are simply not stored, which is what makes the store sparse.]
What Does It Look Like?

What It Means

Row Key
• Unique for each row
• Identifies each row

Column Family / Column Qualifier
• Fewer families give faster access
• Families are fixed; column qualifiers are not

Values
• Various versions of values are maintained
• A scan shows only the most recent version
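A hedged sketch of how versions surface in reads (classic HTable client assumed, against the illustrative 'Persons' table above): a plain get returns only the newest version unless more are requested.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class VersionedGet {
      public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          HTable table = new HTable(conf, "Persons"); // illustrative table name

          Get get = new Get(Bytes.toBytes("1"));
          get.addColumn(Bytes.toBytes("Personal_data"), Bytes.toBytes("Address"));
          get.setMaxVersions(3); // by default only the latest version is returned

          Result result = table.get(get);
          // getValue() yields the newest version; older ones are in result.getMap()
          System.out.println(Bytes.toString(
              result.getValue(Bytes.toBytes("Personal_data"), Bytes.toBytes("Address"))));
          table.close();
      }
  }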
Three Major Components
Data Distribution

[Figure: logical view – all rows of the table (A1, A2, A22, A3, …, K4, …, 090, …, Z30, Z55) are kept sorted by row key and partitioned into regions by key range:
Region: Null -> A3
Region: A3 -> F34
Region: F34 -> K80
Region: K80 -> 095
Region: 095 -> Null
The regions are spread across the region servers.]
HBase Components

[Figure: the Master coordinates the cluster through ZooKeeper, which tracks region state (/hbase/region1, /hbase/region2, …). RegionServers buffer recent writes in an in-memory MemStore and persist data to HDFS as HFiles, protected by a write-ahead log (WAL).]
HBase Components
• A table is made of regions
• Region – a range of rows stored together
  - a single shard, the unit used for scaling
  - dynamically split if too big
  - merged if too small
• Region servers – each serves one or more regions
  - a region is served by only one region server
• Master server – the daemon responsible for managing the HBase cluster
• HBase stores its data in HDFS
  - relies on HDFS's high availability and fault tolerance
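Because region locations are resolved through ZooKeeper, a client only needs the quorum address to find everything else. A minimal sketch (the quorum hostnames are made up; classic HTable client assumed):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ConnectToCluster {
      public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          // Hypothetical ZooKeeper quorum; the client discovers the master
          // and region servers from here, never from a hard-coded server list.
          conf.set("hbase.zookeeper.quorum", "zk1.example.com,zk2.example.com,zk3.example.com");

          HTable table = new HTable(conf, "test");
          System.out.println("Connected to table: " + Bytes.toString(table.getTableName()));
          table.close();
      }
  }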
HBase Storage Architecture
HBase Storage, Simplified

[Figure: three management nodes, each running ZooKeeper – one also hosting the HBase Master, one the HDFS NameNode, and one the HDFS Secondary NameNode. Worker nodes each run an HBase RegionServer and an HDFS DataNode, and scale horizontally to N machines.]
Different Types of Regions

Root/Meta Table
• The ROOT table tracks the regions of the META table; META tracks the regions of all user tables.
• Each row in the ROOT and META tables is approximately 1 KB in size. At the default region size of 256 MB, one META region can therefore address about 2^18 (roughly 262,000) user regions.
Compactions

Row key            Time Stamp   Column "contents:"   Column "anchor:"
"com.apache.www"   t12          "<html>…"
                   t11          "<html>…"
                   t10                               "anchor:apache.com" -> "APACHE"
"com.cnn.www"      t9                                "anchor:cnnsi.com" -> "CNN"
                   t8                                "anchor:my.look.ca" -> "CNN.com"
                   t6           "<html>…"
                   t5           "<html>…"
                   t3           "<html>…"
(HStore1 – the store files for one column family; compactions merge many small HFiles into fewer, larger ones.)
Region Splits

(The same webtable example as above: when the store grows too large, the region is split at a row-key boundary – here between "com.apache.www" and "com.cnn.www" – into two daughter regions.)
HBase Client API
Scanner and Filters
Search: Get

Get value from table where key='com.apache.www' AND label='anchor:apache.com'

Row key            Time Stamp   Column "anchor:"
"com.apache.www"   t12
                   t11
                   t10          "anchor:apache.com" -> "APACHE"
"com.cnn.www"      t9           "anchor:cnnsi.com" -> "CNN"
                   t8           "anchor:my.look.ca" -> "CNN.com"
                   t6
                   t5
                   t3
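A hedged sketch of this lookup with the classic Java client (the table name 'webtable' is an assumption; the deck does not name it):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Get;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.util.Bytes;

  public class GetExample {
      public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          HTable table = new HTable(conf, "webtable"); // assumed table name

          // key = 'com.apache.www' AND label = 'anchor:apache.com'
          Get get = new Get(Bytes.toBytes("com.apache.www"));
          get.addColumn(Bytes.toBytes("anchor"), Bytes.toBytes("apache.com"));

          Result result = table.get(get);
          System.out.println(Bytes.toString(
              result.getValue(Bytes.toBytes("anchor"), Bytes.toBytes("apache.com")))); // "APACHE"
          table.close();
      }
  }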
Search: Scanner

Select value from table where anchor='cnnsi.com'

Row key            Time Stamp   Column "anchor:"
"com.apache.www"   t12
                   t11
                   t10          "anchor:apache.com" -> "APACHE"
"com.cnn.www"      t9           "anchor:cnnsi.com" -> "CNN"
                   t8           "anchor:my.look.ca" -> "CNN.com"
                   t6
                   t5
                   t3
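A hedged sketch of the same query as a scan (again assuming a table named 'webtable'): restricting the scan to the anchor:cnnsi.com column returns only rows that have that cell.

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Result;
  import org.apache.hadoop.hbase.client.ResultScanner;
  import org.apache.hadoop.hbase.client.Scan;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ScanExample {
      public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          HTable table = new HTable(conf, "webtable"); // assumed table name

          // Scan all rows, returning only the anchor:cnnsi.com column.
          Scan scan = new Scan();
          scan.addColumn(Bytes.toBytes("anchor"), Bytes.toBytes("cnnsi.com"));

          ResultScanner scanner = table.getScanner(scan);
          for (Result row : scanner) {
              System.out.println(Bytes.toString(row.getRow()) + " -> " + Bytes.toString(
                  row.getValue(Bytes.toBytes("anchor"), Bytes.toBytes("cnnsi.com"))));
          }
          scanner.close();
          table.close();
      }
  }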
HBase API
• get(row)
• put(row, Map<column, value>)
• scan(key range, filter)
• increment(row, columns)
• checkAndPut, delete, etc.
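A hedged sketch exercising these operations with the classic HTable client (row, family, and qualifier names are made up to match the shell example on the next slide):

  import org.apache.hadoop.conf.Configuration;
  import org.apache.hadoop.hbase.HBaseConfiguration;
  import org.apache.hadoop.hbase.client.Delete;
  import org.apache.hadoop.hbase.client.HTable;
  import org.apache.hadoop.hbase.client.Put;
  import org.apache.hadoop.hbase.util.Bytes;

  public class ApiExample {
      public static void main(String[] args) throws Exception {
          Configuration conf = HBaseConfiguration.create();
          HTable table = new HTable(conf, "test");
          byte[] row = Bytes.toBytes("row1");
          byte[] cf = Bytes.toBytes("cf");

          // put(row, column -> value)
          Put put = new Put(row);
          put.add(cf, Bytes.toBytes("a"), Bytes.toBytes("value1"));
          table.put(put);

          // Atomic counter increment on one column.
          table.incrementColumnValue(row, cf, Bytes.toBytes("hits"), 1L);

          // checkAndPut: write only if the current value still matches.
          Put update = new Put(row);
          update.add(cf, Bytes.toBytes("a"), Bytes.toBytes("value2"));
          table.checkAndPut(row, cf, Bytes.toBytes("a"), Bytes.toBytes("value1"), update);

          // delete the whole row
          table.delete(new Delete(row));
          table.close();
      }
  }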
HBase Shell
• hbase(main):003:0> create 'test', 'cf'
0 row(s) in 1.2200 seconds
• hbase(main):004:0> put 'test', 'row1', 'cf:a', 'value1'
0 row(s) in 0.0560 seconds
• hbase(main):005:0> put 'test', 'row2', 'cf:b', 'value2'
0 row(s) in 0.0370 seconds
• hbase(main):006:0> put 'test', 'row3', 'cf:c', 'value3'
0 row(s) in 0.0450 seconds
HBase Shell Contd.
• hbase(main):007:0> scan 'test'
ROW     COLUMN+CELL
 row1   column=cf:a, timestamp=1288380727188, value=value1
 row2   column=cf:b, timestamp=1288380738440, value=value2
 row3   column=cf:c, timestamp=1288380747365, value=value3
3 row(s) in 0.0590 seconds
Thank You
See You in Class Next Week