Lecture 8 - Big Data (Hadoop)
Lecture 8
Big Data
“I have been surprised and delighted over the years about how
many people are interested in working with data. There’s
definitely a new geek in town. And in 2015, this geek is a data
geek.”
Christian Chabot, founder and CEO - Tableau
• It's about applying new tools to do more analytics on more data for more
people.
Glen Mules – Big Data University
Big Data - Definition
Bill Howe, UW
Big Data Scenario: Netflix
Big Data Scenario: Amazon
Big Data Characteristics: 3 V’s
• Volume
The size of the data
Terabyte = 10^12 bytes
Exabyte = 10^18 bytes
Zettabyte = 10^21 bytes
Brontobyte = 10^27 bytes
• Velocity
The speed at which new data is generated
• Variety
The diversity of sources, formats, quality, structures
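To get a feel for the Volume scale, the unit sizes above can be compared with a few lines of Python. This is only a sketch: the function names are made up for illustration, and it assumes the decimal (power-of-ten) definitions given on the slide.

```python
# Decimal byte units from the slide, stored as powers of ten.
UNIT_EXPONENTS = {
    "terabyte": 12,
    "exabyte": 18,
    "zettabyte": 21,
    "brontobyte": 27,  # informal name, not an official SI prefix
}


def bytes_in(unit: str) -> int:
    """Return the number of bytes in one of the named units."""
    return 10 ** UNIT_EXPONENTS[unit]


def ratio(larger: str, smaller: str) -> int:
    """Return how many 'smaller' units fit into one 'larger' unit."""
    return bytes_in(larger) // bytes_in(smaller)


if __name__ == "__main__":
    # One zettabyte holds 10^9 (a billion) terabytes.
    print(ratio("zettabyte", "terabyte"))
    # One brontobyte holds 10^6 (a million) zettabytes.
    print(ratio("brontobyte", "zettabyte"))
```

Python integers have arbitrary precision, so the 10^27 value is computed exactly.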
They could also be 4 V’s
Integrate and govern all data sources:
Integration, Data Quality, Security, Lifecycle Management, MDM
• Integration
• Analytics
• Visualization
• Development
• Workload Optimization
• Cassandra and HBase: non-relational (NoSQL) databases that can be used with Hadoop
• Hive: a data warehouse layer for Hadoop, queried with HiveQL, a language similar to SQL
• Mahout: a machine learning library; that is, it assists with filtering
data for analysis and exploration
• Pig: a data-flow language (Pig Latin) and execution framework for parallel computation
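To make the term "data-flow" concrete, here is a minimal sketch in plain Python (not Pig Latin, and not running on Hadoop) of the load, filter, group, count style of pipeline that such tools express. The records, field names, and function names are invented for illustration only.

```python
from collections import Counter

# Hypothetical sample records of the form (user, action).
RECORDS = [
    ("ann", "view"),
    ("bob", "buy"),
    ("ann", "buy"),
    ("cat", "view"),
    ("ann", "buy"),
]


def load(records):
    """Step 1: stream the records one at a time."""
    yield from records


def keep_action(stream, action):
    """Step 2: keep only the records with the given action."""
    return (record for record in stream if record[1] == action)


def count_by_user(stream):
    """Step 3: group the remaining records by user and count them."""
    return Counter(user for user, _ in stream)


# Each step consumes the output of the previous one.
purchases = count_by_user(keep_action(load(RECORDS), "buy"))

if __name__ == "__main__":
    for user, total in purchases.items():
        print(user, total)
```

Each step only depends on the stream it receives, which is the property that lets a framework split the same pipeline across many machines. A real Pig or Hive job would express these steps in its own language and run them in parallel.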