BIG DATA
[Link] Rose
Associate Professor
PSG College of Technology
Coimbatore
Big Data Definition
No single standard definition
Big Data is data whose scale, diversity,
and complexity require new architecture,
techniques, algorithms, and analytics to
manage it and extract value and hidden
knowledge from it
First Coined
Large set of data that is almost impossible to manage and process using
traditional business intelligence tools
How long has big data
existed?
Now - Big Data Everywhere!
Analogy to Telescope and
Microscope
Informally
"How would I be able to perform this function
if I had to do it at 100X its current scale?"
Manage
Analyze
Summarize
Visualize
Discover knowledge
Information privacy
In a timely manner and in a scalable fashion
The limit is not the availability of data or the ability to collect it
Characteristics of Big Data:
1-Scale (Volume)
Data Volume
44x increase from 2009 to 2020
From 0.8 zettabytes (ZB) to 35 ZB
Data volume is increasing exponentially
Exponential increase in collected/generated data
1 zettabyte = one sextillion (10^21) bytes ≈ 2^70 bytes
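The volume figures above can be checked with a few lines of arithmetic. This is a minimal sketch that uses only the numbers quoted on the slide; the constant yearly growth rate is an assumption made for illustration, not a claim from the source.

```python
# Figures quoted on the slide: 0.8 ZB in 2009, a 44x increase by 2020.
START_ZB = 0.8
GROWTH_FACTOR = 44
YEARS = 2020 - 2009  # 11 years

projected_zb = START_ZB * GROWTH_FACTOR  # about 35 ZB

# Assumption: growth compounds at a constant yearly rate.
annual_rate = GROWTH_FACTOR ** (1 / YEARS) - 1

# Decimal vs. binary reading of the unit.
ZETTABYTE = 10 ** 21  # one sextillion bytes
TWO_POW_70 = 2 ** 70  # the binary approximation
ratio = TWO_POW_70 / ZETTABYTE

print(f"Projected 2020 volume: {projected_zb:.1f} ZB")
print(f"Implied annual growth: {annual_rate:.0%}")
print(f"2^70 / 10^21 = {ratio:.3f}")
```

Under the constant-rate assumption, a 44x increase over 11 years works out to roughly 41% growth per year. The last line shows that 2^70 is about 18% larger than 10^21, so the two values are approximations of each other rather than equal.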
Characteristics of Big Data:
2-Complexity (Variety)
Various formats, types, and
structures
Text, numerical, images, audio,
video, sequences, time series, social
media data, multi-dimensional arrays, etc.
Static data vs. streaming data
A single application can be
generating/collecting many types of
data
To extract knowledge all these
types of data need to be linked
together
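The idea of linking different data types can be sketched as follows. All records, field names, and the `link_by_customer` helper are hypothetical; the sketch only assumes that each record carries a shared key (here `customer_id`).

```python
from collections import defaultdict

# Hypothetical records from one application, in three different shapes.
purchases = [  # structured rows
    {"customer_id": 7, "item": "shoes", "amount": 59.0},
    {"customer_id": 9, "item": "book", "amount": 12.5},
]
posts = [  # free text
    {"customer_id": 7, "text": "Love the new running shoes!"},
]
locations = [  # time series of (timestamp, latitude, longitude)
    {"customer_id": 7, "track": [(1000, 11.02, 76.96), (1060, 11.03, 76.97)]},
    {"customer_id": 9, "track": [(1000, 13.08, 80.27)]},
]


def link_by_customer(purchases, posts, locations):
    """Group every record under the customer it belongs to."""
    profiles = defaultdict(lambda: {"purchases": [], "posts": [], "tracks": []})
    for row in purchases:
        profiles[row["customer_id"]]["purchases"].append(row)
    for row in posts:
        profiles[row["customer_id"]]["posts"].append(row["text"])
    for row in locations:
        profiles[row["customer_id"]]["tracks"].append(row["track"])
    return dict(profiles)


profiles = link_by_customer(purchases, posts, locations)
print(profiles[7]["posts"])
```

Real data rarely shares a clean key like this; matching records across sources is usually the hard part.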
Characteristics of Big Data:
3-Speed (Velocity)
Data is being generated fast and needs to be
processed fast
Online Data Analytics
Late decisions mean missed opportunities
Examples
E-Promotions: based on your current location, your purchase
history, and what you like, send promotions right now for the store next
to you
Healthcare monitoring: sensors monitor your activities and
body; any abnormal measurement requires an immediate reaction
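The healthcare example can be sketched as a check that runs on each reading as it arrives, instead of on a stored batch. The thresholds, the `abnormal_readings` name, and the sample readings are all hypothetical.

```python
from typing import Iterable, Iterator, Tuple

# Hypothetical normal range for heart rate in beats per minute.
LOW_BPM, HIGH_BPM = 50, 120


def abnormal_readings(
    stream: Iterable[Tuple[int, int]],
) -> Iterator[Tuple[int, int]]:
    """Yield (timestamp, bpm) for each reading outside the normal range.

    Each reading is checked the moment it arrives; nothing is buffered.
    """
    for timestamp, bpm in stream:
        if bpm < LOW_BPM or bpm > HIGH_BPM:
            yield timestamp, bpm


readings = [(0, 72), (1, 75), (2, 134), (3, 80), (4, 45)]
alerts = list(abnormal_readings(readings))
print(alerts)
```

Because the function is a generator, it would work unchanged on an unbounded stream, which is the point of velocity: the reaction happens per event, not after the data has been collected.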
Big Data: 3Vs
Purchase Funnel
Some Make it 4Vs
Harnessing Big Data
OLTP: Online Transaction Processing (DBMSs)
OLAP: Online Analytical Processing (Data Warehousing)
RTAP: Real-Time Analytics Processing (Big Data Architecture & Technology)
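The three processing styles can be contrasted in one small sketch. It uses Python's built-in `sqlite3` module as a stand-in for a DBMS; the `orders` table, the `place_order` helper, and the sample data are hypothetical, and a real RTAP system would use a dedicated streaming engine rather than an in-process dictionary.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (id INTEGER PRIMARY KEY, store TEXT, amount REAL)"
)

# RTAP-style state: a running total per store, updated per event.
running_total = defaultdict(float)


def place_order(store: str, amount: float) -> None:
    # OLTP: one short transaction that touches a single row.
    with conn:
        conn.execute(
            "INSERT INTO orders (store, amount) VALUES (?, ?)", (store, amount)
        )
    # RTAP: fold the event into the live aggregate as it arrives.
    running_total[store] += amount


for store, amount in [("A", 10.0), ("B", 5.0), ("A", 2.5)]:
    place_order(store, amount)

# OLAP: an analytical query that scans the accumulated history.
olap_totals = dict(
    conn.execute("SELECT store, SUM(amount) FROM orders GROUP BY store")
)
print(olap_totals, dict(running_total))
```

Both aggregates end up with the same totals. The difference is when the work happens: the OLAP query recomputes from stored history on demand, while the RTAP total is already current after every event.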
Who's Generating Big Data
Mobile devices
(tracking all objects all the time)
Social media and networks
(all of us are generating data)
Scientific instruments
(collecting all sorts of data)
Sensor technology and networks
(measuring all kinds of data)
Progress and innovation are no longer hindered by the ability to collect data,
but by the ability to manage, analyze, summarize, visualize, and discover
knowledge from the collected data in a timely manner and in a scalable fashion
The Model Has Changed
The Model of Generating/Consuming Data has
Changed
Old Model: few companies are generating data, all others are consuming data
New Model: all of us are generating data, and all of us are
consuming data
What's Driving Big Data
Optimizations and predictive analytics
Complex statistical analysis
All types of data, and many sources
Very large datasets
More of a real-time
Ad-hoc querying and reporting
Data mining techniques
Structured data, typical sources
Small to mid-size datasets
Value of Big Data Analytics
Big data is more real-time in
nature than traditional DW
applications
Traditional DW architectures
(e.g. Exadata, Teradata) are
not well-suited for big data
apps
Shared nothing, massively
parallel processing, scale out
architectures are well-suited
for big data apps
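A shared-nothing, scale-out design can be sketched in a few lines. Each node owns its own partition of the data and computes a local result; only the small per-node results are merged. The `Node` class, the `owner` and `run_cluster` helpers, and the sales data are hypothetical, and the nodes here are plain objects in one process, whereas a real system would run them on separate machines.

```python
import zlib
from collections import Counter


def owner(key: str, num_nodes: int) -> int:
    """Pick the node that owns a key, using a stable hash."""
    return zlib.crc32(key.encode("utf-8")) % num_nodes


class Node:
    """One worker with private storage; it shares nothing with other nodes."""

    def __init__(self) -> None:
        self.rows = []

    def store(self, key: str, value: int) -> None:
        self.rows.append((key, value))

    def local_totals(self) -> Counter:
        totals = Counter()
        for key, value in self.rows:
            totals[key] += value
        return totals


def run_cluster(records, num_nodes: int) -> dict:
    nodes = [Node() for _ in range(num_nodes)]
    # Partition: each record goes to exactly one node.
    for key, value in records:
        nodes[owner(key, num_nodes)].store(key, value)
    # Each node aggregates its own rows; the coordinator merges the results.
    merged = Counter()
    for node in nodes:
        merged.update(node.local_totals())
    return dict(merged)


SALES = [("chennai", 120), ("coimbatore", 80), ("chennai", 30),
         ("madurai", 50), ("coimbatore", 20)]
print(run_cluster(SALES, num_nodes=4))
```

The result is the same for any number of nodes, which is what makes scaling out possible: adding nodes changes how the work is divided, not the answer.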
Shared Nothing
Architecture
Scale up vs Scale out
Architecture
Challenges in Handling Big Data
The Bottleneck is in technology
New architecture, algorithms, techniques are needed
Also in technical skills
Experts in using the new technology and dealing with big
data
What Technology Do We Have
For Big Data?
Big Data Technology
Data Driven Decision Making
Approach to business governance that values decisions
that can be backed up with verifiable data
A study from the MIT Center for Digital Business:
organizations driven most by data-based decision making
had 4% higher productivity rates and 6% higher profits
Challenge
Integrating massive amounts of information from different
areas of the business and combining it to derive
actionable data in real time can be easier said than done