Chapter 1
Intelligence
Course description
Course materials
1. Vijayan Sugumaran, Arun Kumar Sangaiah, Arunkumar Thangavelu,
“Computational intelligence applications in business and big data
analytics”, Taylor & Francis, (2017)
2. Daniel O'Reilly, Python for Data Science: The Ultimate Step-by-Step
Guide to Python Programming. Discover How to Master Big Data
Analysis and Understand Machine Learning, ISBN: 979-8719424248,
(2021).
3. Michael Minelli, Michele Chambers, Ambiga Dhiraj, “Big data, big analytics
- emerging business intelligence and analytic trends for today’s
businesses”, John Wiley & Sons, (2013)
4. Steve Williams, “Business intelligence strategy and big data analytics: a
general management perspective”, Elsevier, (2016)
5. David Dietrich, Barry Heller, Beibei Yang, “Data science and big data
analytics”, Wiley, (2015)
6. Oracle, “Data Mining Concepts”, 18c, E83730-03, (2018)
7. https://www.ibm.com/analytics/hadoop/big-data-analytics
Assessment methods
Content
Week Content
1 Big data overview
Lecturer 1
What’s Big Data?
Big data is the term for a collection of data sets so large and
complex that it becomes difficult to process using on-hand
database management tools or traditional data processing
applications.
The challenges include capture, curation, storage, search, sharing,
transfer, analysis, and visualization.
The trend to larger data sets is due to the additional information
derivable from analysis of a single large set of related data, as
compared to separate smaller sets with the same total amount of
data, allowing correlations to be found to “spot business trends,
determine quality of research, prevent diseases, link legal citations,
combat crime, and determine real-time roadway traffic
conditions.”
Why is Big Data needed?
Scalability — Scale Up & Scale Out
● Scale out
● Use more resources to distribute workload in parallel
● Higher data access latency is typically incurred
● Scale up
● Efficiently use the resources
● Architecture-aware algorithm design
www.stanford.edu/~cdel/2014.asplos.quasar.pdf
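As a rough illustration of the scale-out bullet above (splitting a workload across workers that run in parallel), here is a minimal Python sketch. The names `partition` and `scale_out_sum` and the worker count are hypothetical, and threads on one machine only stand in for the separate nodes a real scale-out deployment would use.

```python
from concurrent.futures import ThreadPoolExecutor


def partition(data, n_parts):
    """Split data into n_parts contiguous chunks of near-equal size."""
    size, extra = divmod(len(data), n_parts)
    chunks, start = [], 0
    for i in range(n_parts):
        end = start + size + (1 if i < extra else 0)
        chunks.append(data[start:end])
        start = end
    return chunks


def scale_out_sum(data, n_workers=4):
    """Sum data by giving each worker one chunk, then combining partial sums."""
    chunks = partition(data, n_workers)
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_sums = list(pool.map(sum, chunks))  # one partial result per worker
    return sum(partial_sums)                        # combine step


values = list(range(1_000_000))
total = scale_out_sum(values, n_workers=4)
```

The combine step is where the extra data-access latency mentioned on the slide tends to appear: partial results have to travel back from the workers before the final answer exists.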
Techniques towards Big Data
• Massive Parallelism
• Huge Data Volumes Storage
• Data Distribution
• High-Speed Networks
• High-Performance Computing
• Task and Thread Management
• Data Mining and Analytics
• Data Retrieval
• Machine Learning
• Data Visualization
Why Big Data now?
• High-Volume
• High-Velocity
• High-Variety
➔ Artificial Intelligence
Volume (Scale)
Data Volume
44x increase from 2009 to 2020
From 0.8 zettabytes (ZB) to 35 ZB
Data volume is increasing exponentially
Exponential increase in
collected/generated data
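To make "exponential" concrete, the slide's own figures (0.8 ZB in 2009, 35 ZB in 2020) can be turned into an implied yearly growth rate. The sketch below assumes steady compound growth over the period, which is a simplifying assumption and not something the slide states.

```python
import math

start_zb, end_zb = 0.8, 35.0      # figures quoted on the slide
years = 2020 - 2009               # 11-year span

growth_factor = end_zb / start_zb                 # overall multiple (about 44x)
annual_rate = growth_factor ** (1 / years) - 1    # implied compound yearly growth
doubling_years = math.log(2) / math.log(1 + annual_rate)

print(f"{growth_factor:.1f}x overall, {annual_rate:.0%} per year, "
      f"doubling every {doubling_years:.1f} years")
```

Under that assumption the volume grows by roughly 41% per year, which means it doubles about every two years.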
• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones world wide
• 100s of millions of GPS enabled devices sold annually
• 2+ billion people on the Web by end 2011
• 76 million smart meters in 2009… 200M by 2014
Maximilien Brice, © CERN
CERN’s Large Hadron Collider (LHC) generates 15 PB a year
The Earthscope
Variety (Complexity)
Streaming Data
You can only scan the data once
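The one-scan constraint means an algorithm has to update its answer as each item arrives, without storing or revisiting the stream. As one possible illustration, the sketch below keeps a running mean and variance using Welford's method. The `RunningStats` class and the `sensor_stream` generator are hypothetical names introduced for this example.

```python
class RunningStats:
    """Single-pass mean and variance (Welford's method)."""

    def __init__(self):
        self.count = 0
        self.mean = 0.0
        self._m2 = 0.0  # sum of squared deviations from the current mean

    def update(self, x):
        """Fold one new value into the running statistics."""
        self.count += 1
        delta = x - self.mean
        self.mean += delta / self.count
        self._m2 += delta * (x - self.mean)

    @property
    def variance(self):
        """Population variance of the values seen so far."""
        return self._m2 / self.count if self.count else 0.0


def sensor_stream():
    """Hypothetical stream; each reading is yielded once and never stored."""
    for reading in [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]:
        yield reading


stats = RunningStats()
for value in sensor_stream():
    stats.update(value)
```

Memory use stays constant no matter how long the stream runs, which is the property a one-pass algorithm needs.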
[Figure labels: Customer; Social Media; Banking; Finance; Gaming; Entertain; Purchase; Our Known History]
Velocity (Speed)
Real-time/Fast Data
Mobile devices
(tracking all objects all the time)
Real-Time Analytics/Decision Requirement
Customer: Influence Behavior
• Product Recommendations that are Relevant & Compelling
• Learning why Customers Switch to competitors and their offers; in time to Counter
• Friend Invitations to join a Game or Activity that expands business
• Improving the Marketing Effectiveness of a Promotion while it is still in Play
• Preventing Fraud as it is Occurring & preventing more proactively
Harnessing Big Data
Old Model: Few companies are generating data, all others are consuming data
New Model: all of us are generating data, and all of us are consuming
data
What’s driving Big Data
[Figure axes: Scale, Speed]
• Interactive Business Intelligence & In-memory RDBMS: QlikView, Tableau, HANA
• Big Data, Real Time & Single View: Graph Databases
• BI Reporting, OLAP & Data warehouse: Business Objects, SAS, Informatica, Cognos, other SQL Reporting Tools
• Big Data, Batch Processing & Distributed Data Store: Hadoop/Spark; HBase/Cassandra
Big Data Analytics
Big Data Technology
Cloud Computing
wikipedia:Cloud Computing
Benefits
Types of Cloud Computing
Infrastructure as a Service (IaaS)
Storage-as-a-service
Database-as-a-service
Information-as-a-service
Process-as-a-service
Application-as-a-service
Platform-as-a-service
Integration-as-a-service
Security-as-a-service
Management/Governance-as-a-service
Testing-as-a-service
Infrastructure-as-a-service
Key Ingredients in Cloud Computing
Everything as a Service
The Obligatory Timeline Slide
(Mike Culver @ AWS)
[Timeline figure labels: COBOL, Edsel, ARPANET, Internet, Amazon.com]
AWS
What does the Azure platform offer to developers?
AppEngine (Google): Python, BigTable, Other API’s
• Higher-level functionality (e.g., automatic scaling)
• More restrictive (e.g., respond to URL only)
• Proprietary lock-in
EC2/S3: VMs, Flat File Storage
• Lower-level functionality
• More flexible
• Coarser billing model
Human brain is a graph/network of 100B nodes and 700T edges.
memory
• Graph Database:
• Large-Scale
Native Store
Why you want to take this class
Big Data Examples -- Application Use Cases
1. Expertise Location
2. Recommendation
3. Commerce
4. Financial Analysis
5. Social Media Monitoring
6. Telco Customer Analysis
7. Healthcare Analysis
8. Data Exploration and Visualization
9. Personalized Search
10. Anomaly Detection
11. Fraud Detection
12. Cybersecurity
13. Sensor Monitoring (Smarter another Planet)
14. Cellular Network Monitoring
15. Cloud Monitoring
16. Code Life Cycle Management
17. Traffic Navigation
18. Image and Video Semantic Understanding
19. Genomic Medicine
20. Brain Network Analysis
21. Data Curation
22. Near Earth Object Analysis
Enhancing: Graph Visualizations
[Figure labels: user, item]
Use Case 1: Social Network Analysis in Enterprise for Productivity
Production Live System used by IBM GBS since 2009 – verified ~$100M contribution
15,000 contributors in 76 countries; 92,000 annual unique IBM users
Dynamic networks of 400,000+ IBMers:
• 25,000,000+ emails & SameTime messages (incl. content features)
• 1,000,000+ Learning clicks; 14M KnowledgeView, SalesOne, …, access data
• 1,000,000+ Lotus Connections (blogs, file sharing, bookmark) data
• 200,000 people’s consulting project & earning data
Graph analytics: Shortest Paths, Centralities, Graph Search, Social Capital, Bridges and Hubs, Expertise Search, Graph Recomm.
– On BusinessWeek four times, including being the Top Story of Week, April 2009
– Helped IBM earn the 2012 Most Admired Knowledge Enterprise Award
– Wharton School study: $7,010 gain per user per year using the tool
– In 2012, contributing about 1/3 of GBS Practitioner Portal $228.5 million savings and benefits
– APQC (WW leader in Knowledge Practice) April 2013: “The Industry Leader and Best Practice in Expertise Location”
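Two of the graph analytics named in this use case, shortest paths and centralities, can be sketched on a toy communication graph. The people, edges, and function names below are hypothetical, and this is a minimal illustration, not the production system's method.

```python
from collections import deque

# Hypothetical undirected "who communicates with whom" graph.
graph = {
    "ana": {"ben", "cho"},
    "ben": {"ana", "cho", "dev"},
    "cho": {"ana", "ben"},
    "dev": {"ben", "eli"},
    "eli": {"dev"},
}


def shortest_path(graph, source, target):
    """Breadth-first search; returns one shortest path or None."""
    parents = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:       # walk back through the parents
                path.append(node)
                node = parents[node]
            return path[::-1]
        for neighbor in sorted(graph[node]):
            if neighbor not in parents:
                parents[neighbor] = node
                queue.append(neighbor)
    return None


def degree_centrality(graph):
    """Fraction of the other nodes each node is directly connected to."""
    n = len(graph)
    return {node: len(nbrs) / (n - 1) for node, nbrs in graph.items()}


path = shortest_path(graph, "ana", "eli")
centrality = degree_centrality(graph)
most_central = max(centrality, key=centrality.get)
```

In an expertise-location setting, a short path suggests whom to ask for an introduction, and a high-centrality person is a likely hub.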
Use Case 3: Customer Behavior Sequence Analytics
Markov Network, Latent Network, Bayesian Network
▪ Data Source:
– Relationships among 7594 companies, data mined from NYT, 1981 to 2009
Use Case 5: Social Media Monitoring
Category 2: Data Exploration
Enhancing: Graph Communities
[Figure labels: headache, chill, migraine, high fever, stomachache, cough]
Use Case 8: Visualization for Navigation and Exploration
http://systemg.ibm.com/apps/whisper/index.html
SocialHelix: Visualization of Sentiment Divergence in Social Media
[Figure labels: query context; ranking and re-ranking; interest / social network based content recommendations; Info-Socio networks; Graph analysis]
Category 3: Security
Network Info Flow
Ponzi scheme Detection
Ego Net Features:
• Normal: (1) Clique-like, (2) Two-way links
• Attacker: Near-Star
Detecting DoS attack
Graph Visualizations
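The ego-net features named above (clique-like with two-way links for normal nodes, near-star for attackers) can be turned into two numbers per node. The definitions of `neighbor_density` and `reciprocity` below are assumptions chosen for illustration; the slide does not say how the features are computed or which thresholds are used.

```python
from itertools import combinations


def ego_net_features(edges, ego):
    """Return (neighbor_density, reciprocity) for the ego's 1-hop network.

    edges: set of directed (src, dst) pairs.
    neighbor_density: share of neighbor pairs linked in either direction
        (near 1 for clique-like ego nets, near 0 for star-like ones).
    reciprocity: share of the ego's neighbors linked in both directions.
    """
    neighbors = {d for s, d in edges if s == ego} | {s for s, d in edges if d == ego}
    neighbors.discard(ego)
    if not neighbors:
        return 0.0, 0.0
    pairs = list(combinations(sorted(neighbors), 2))
    linked = sum(1 for a, b in pairs if (a, b) in edges or (b, a) in edges)
    density = linked / len(pairs) if pairs else 0.0
    two_way = sum(1 for n in neighbors if (ego, n) in edges and (n, ego) in edges)
    return density, two_way / len(neighbors)


# Hypothetical traffic: "u" talks both ways inside a tight group,
# "x" sends one-way traffic to targets that never talk to each other.
edges = {
    ("u", "a"), ("a", "u"), ("u", "b"), ("b", "u"), ("u", "c"), ("c", "u"),
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("x", "t1"), ("x", "t2"), ("x", "t3"), ("x", "t4"),
}

normal = ego_net_features(edges, "u")
suspect = ego_net_features(edges, "x")
```

Here the normal node scores high on both features and the star-shaped node scores zero on both; a real detector would need thresholds tuned on labeled traffic.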
[Figure labels]
Inputs: Emails, Instant Messaging, Web Access, Executed Processes, Printing, Copying, Log On/Off, Feed subscription, Database access
Capture: Social sensors, Click streams capturer
Analysis: Graph analysis, Behavior analysis, Semantics analysis, Psychological analysis, Multimodality Analysis
Output: Detection, Prediction & Exploration Interface
Use Case 11: Fraud Detection for Bank
Network Info Flow
Ego Net Features:
• Normal: (1) Clique-like, (2) Two-way links
• Attacker: Near-Star
Detecting DoS attack
Category 4: Operations Analysis
Cloud Service Placement
• Network KPIs and Server KPIs → Graph Matching, Bayesian Network
• KPI time series (e.g., server performance/load, network performance/load) → Causality analyzer → graph varying over time
• Node: KPI (a time series)
• Edge: (potential) pairwise relationship (e.g., causality)
Graph Visualizations
Bayesian Network
• 3 timesteps
• 63 variables
• 3.9 avg states
• 4.0 avg indegree
• 16,858 CPT entries
Junction Tree
• 67 cliques
• 873,064 PT entries in cliques
Use Case 14: Cellular Network Analytics in Telco Operation
Goal: Efficiently and uniquely identify internal state of
Cellular/Telco networks (e.g., performance and load of
network elements/links) using probes between monitors
placed at selected network elements & endhosts
[Figure: network load level report]
KPI time series (e.g., server performance/load, network performance/load) → Causality analyzer → graph varying over time
• Node: KPI (a time series)
• Edge: (potential) pairwise relationship (e.g., causality)
Select KPI pairs (sampling) → Test link existence → Estimate unsampled links based on history → Overall graph
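The three-step pipeline above (sample KPI pairs, test each sampled pair for a link, fall back on history for the rest) can be sketched as follows. The slides do not say which link test is used, so a lagged Pearson correlation stands in here purely as a placeholder; it is not a causality test. All function names, KPI names, and the threshold are hypothetical.

```python
import math
import random


def pearson(xs, ys):
    """Pearson correlation; returns 0.0 if either series is constant."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy) if sx and sy else 0.0


def link_exists(cause, effect, lag=1, threshold=0.8):
    """Placeholder test (lag >= 1): does `cause`, delayed by `lag`, track `effect`?"""
    return abs(pearson(cause[:-lag], effect[lag:])) >= threshold


def overall_graph(kpis, history, sample_size, rng, lag=1, threshold=0.8):
    """Test a random sample of ordered KPI pairs; reuse history for the rest."""
    pairs = [(a, b) for a in kpis for b in kpis if a != b]
    sampled = set(rng.sample(pairs, min(sample_size, len(pairs))))
    graph = set()
    for pair in pairs:
        if pair in sampled:
            a, b = pair
            if link_exists(kpis[a], kpis[b], lag, threshold):
                graph.add(pair)
        elif pair in history:          # unsampled: keep the last known state
            graph.add(pair)
    return graph


load = [1, 3, 2, 5, 4, 6, 3, 7, 5, 8]
kpis = {
    "server_load": load,
    "net_latency": [0] + [2 * x + 1 for x in load[:-1]],  # follows load by one step
    "fan_speed": [5] * len(load),                          # constant, unrelated
}
graph = overall_graph(kpis, history=set(), sample_size=6, rng=random.Random(0))
```

Sampling keeps the cost below testing every pair at every time step, at the price of relying on possibly stale history for the pairs that were skipped.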
Category 5: Data Warehouse Augmentation
[Figure: graph objects on one side and the traditional (relational) model on the other, linked by “Convert from Graph DB” and “Convert to Graph DB model”]
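One way to picture the conversion between the relational model and a graph model: rows become vertices and foreign keys become edges. The tables, the `PLACED` edge label, and both function names below are hypothetical; this is a sketch of the general idea, not a description of any particular graph database.

```python
# Hypothetical relational tables as lists of row dicts.
customers = [{"id": 1, "name": "Ana"}, {"id": 2, "name": "Ben"}]
orders = [
    {"id": 10, "customer_id": 1, "item": "book"},
    {"id": 11, "customer_id": 1, "item": "pen"},
    {"id": 12, "customer_id": 2, "item": "lamp"},
]


def to_graph(customers, orders):
    """Rows become vertices; the customer_id foreign key becomes a PLACED edge."""
    vertices, edges = {}, []
    for row in customers:
        vertices[("customer", row["id"])] = {"name": row["name"]}
    for row in orders:
        vertices[("order", row["id"])] = {"item": row["item"]}
        edges.append((("customer", row["customer_id"]), "PLACED", ("order", row["id"])))
    return vertices, edges


def to_relational(vertices, edges):
    """Inverse mapping: rebuild the two tables from vertices and edges."""
    owner = {dst: src for src, label, dst in edges if label == "PLACED"}
    customers, orders = [], []
    for (kind, key), props in sorted(vertices.items()):
        if kind == "customer":
            customers.append({"id": key, **props})
        else:
            orders.append({"id": key, "customer_id": owner[(kind, key)][1], **props})
    return customers, orders


vertices, edges = to_graph(customers, orders)
round_trip = to_relational(vertices, edges)
```

Once the data is in graph form, relationship queries become edge traversals instead of joins, which is the usual motivation for augmenting a warehouse this way.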
Use Case 17: Smart Navigation Utilizing Real-time Road Information
Goal: Enable unprecedented level of accuracy in traffic scheduling (for a fleet of
transportation vehicles) and navigation of individual cars utilizing the dynamic
real-time information of changing road condition and predictive analysis on the data
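A simple way to see how real-time road information changes navigation is to recompute the fastest route after a travel-time update. The sketch below uses Dijkstra's algorithm on a hypothetical four-node road network; the slide does not state which routing method the actual system uses, and the predictive-analysis part is not modeled here.

```python
import heapq


def fastest_route(roads, source, target):
    """Dijkstra's algorithm over travel times; returns (minutes, path)."""
    best = {source: 0.0}
    parents = {source: None}
    heap = [(0.0, source)]
    while heap:
        cost, node = heapq.heappop(heap)
        if node == target:
            path = []
            while node is not None:       # walk back through the parents
                path.append(node)
                node = parents[node]
            return cost, path[::-1]
        if cost > best.get(node, float("inf")):
            continue                      # stale queue entry
        for neighbor, minutes in roads.get(node, {}).items():
            new_cost = cost + minutes
            if new_cost < best.get(neighbor, float("inf")):
                best[neighbor] = new_cost
                parents[neighbor] = node
                heapq.heappush(heap, (new_cost, neighbor))
    return float("inf"), []


# Hypothetical road network: travel times in minutes.
roads = {
    "depot": {"A": 5, "B": 9},
    "A": {"customer": 6},
    "B": {"customer": 4},
    "customer": {},
}

before = fastest_route(roads, "depot", "customer")
roads["A"]["customer"] = 20   # real-time update: congestion on A -> customer
after = fastest_route(roads, "depot", "customer")
```

Before the update the route goes through A (11 minutes); after it, the route switches to B (13 minutes). For a fleet, the same re-planning would run for each vehicle as conditions change.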
Use Case 18: Graph Analysis for Image and Video Analysis
[Figure: Vertex Correspondence and Attribute Transformation between ARG s and ARG t]
Use Case 19: Graph Matching for Genomic Medicine
Use Case 21: Understanding Brain Network