
Chapter 1

This document provides an overview of a course on big data analytics and business intelligence. It describes the course objectives, materials, assessment methods, and content including topics covered in each week. The document also includes introductory content on defining big data, its characteristics, computing resources needed and techniques used for big data.

Uploaded by

SANG VÕ NGỌC
Copyright © All Rights Reserved

Big Data Analytics and Business Intelligence

Course description

• The main goal of this course is to help students understand and solve big data problems and apply big data analytics to business intelligence.

• The main objective of the course: to cover the concepts of big data, and the algorithms and data analytics technologies on big data that support business.

Course materials
1. Vijayan Sugumaran, Arun Kumar Sangaiah, Arunkumar Thangavelu, “Computational intelligence applications in business and big data analytics”, Taylor & Francis, (2017)
2. Daniel O'Reilly, “Python for Data Science: The Ultimate Step-by-Step Guide to Python Programming. Discover How to Master Big Data Analysis and Understand Machine Learning”, ISBN: 979-8719424248, (2021)
3. Michael Minelli, Michele Chambers, Ambiga Dhiraj, “Big data, big analytics: emerging business intelligence and analytic trends for today’s businesses”, John Wiley & Sons, (2013)
4. Steve Williams, “Business intelligence strategy and big data analytics: a general management perspective”, Elsevier, (2016)
5. David Dietrich, Barry Heller, Beibei Yang, “Data science and big data analytics”, Wiley, (2015)
6. Oracle, “Data Mining Concepts”, 18c, E83730-03, (2018)
7. https://www.ibm.com/analytics/hadoop/big-data-analytics

Assessment methods

• Mini project: 35%
• Presentation: 15%
• Final exam: 50%

Content

Week            Content
1               Big data overview
2, 3            Basics in big data analytics
4, 5            Big data analytics
6               Business intelligence system
7, 8            Big data analytics supports business
9, 10, 11, 12   Student group reports on major subject matter topics

Lecture 1

BIG DATA OVERVIEW

What’s Big Data?

There is no single definition; here is one from Wikipedia:

Big data is the term for a collection of data sets so large and
complex that it becomes difficult to process using on-hand
database management tools or traditional data processing
applications.
The challenges include capture, curation, storage, search, sharing,
transfer, analysis, and visualization.
The trend to larger data sets is due to the additional information
derivable from analysis of a single large set of related data, as
compared to separate smaller sets with the same total amount of
data, allowing correlations to be found to “spot business trends,
determine quality of research, prevent diseases, link legal citations,
combat crime, and determine real-time roadway traffic
conditions.”

Definition and Characteristics of Big Data

“Big data is high-volume, high-velocity and high-variety information assets that demand cost-effective, innovative forms of information processing for enhanced insight and decision making.” – Gartner

which was derived from:

“While enterprises struggle to consolidate systems and collapse redundant databases to enable greater operational, analytical, and collaborative consistencies, changing economic conditions have made this job more difficult. E-commerce, in particular, has exploded data management challenges along three dimensions: volumes, velocity and variety. In 2001/02, IT organizations must compile a variety of approaches to have at their disposal for dealing with each.” – Doug Laney

Why is Big Data needed?

[Figure source: “Big Data Analytics”, David Loshin]

Key Computing Resources for Big Data

• Processing capability: CPU, processor, or node
• Memory
• Storage
• Network

“Big Data Analytics”, David Loshin


Scalability — Scale Up & Scale Out

● Scale out
  – Use more resources to distribute the workload in parallel
  – Higher data access latency is typically incurred
● Scale up
  – Use the available resources efficiently
  – Architecture-aware algorithm design

Example: Resource utilization for a large production cluster at a Twitter data center (www.stanford.edu/~cdel/2014.asplos.quasar.pdf)

• For independent data, scale up may have no obvious advantage over scale out
• For linked data, utilize scale up as much as possible before scaling out
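The two rules of thumb above can be illustrated with a minimal sketch. It assumes simple modulo (hash) partitioning of integer ids across machines; the function names and the ring-shaped example graph are hypothetical and not taken from the cited Quasar study. Independent records split cleanly, while for linked data every edge whose endpoints land on different machines turns into network traffic during a traversal.

```python
from collections import Counter


def shard_of(key: int, num_machines: int) -> int:
    """Hypothetical modulo (hash) partitioning: map a record or vertex id to a machine."""
    return key % num_machines


def shard_sizes(keys, num_machines):
    """Independent data: each machine just processes its own shard, no cross-machine traffic."""
    return Counter(shard_of(k, num_machines) for k in keys)


def cut_edge_fraction(edges, num_machines):
    """Linked data: fraction of edges whose endpoints live on different machines.

    Every cut edge means a network hop (higher latency) when the graph is traversed.
    """
    cut = sum(1 for u, v in edges if shard_of(u, num_machines) != shard_of(v, num_machines))
    return cut / len(edges)


# A ring of 1,000 vertices: vertex i is linked to vertex i + 1.
ring = [(i, (i + 1) % 1000) for i in range(1000)]

print(shard_sizes(range(1000), 4))   # four equal shards of 250 records
print(cut_edge_fraction(ring, 1))    # 0.0: one scaled-up machine keeps every edge local
print(cut_edge_fraction(ring, 4))    # 1.0: naive scale-out cuts every edge
```

Smarter partitioners that keep neighbouring vertices together reduce the cut, but for most real graphs they cannot remove it, which is why linked data favours exhausting scale up first.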

Contrasting Approaches in Adopting High-Performance Capabilities

“Big Data Analytics”, David Loshin


Techniques towards Big Data

• Massive Parallelism
• Huge Data Volumes Storage
• Data Distribution
• High-Speed Networks
• High-Performance Computing
• Task and Thread Management
• Data Mining and Analytics
• Data Retrieval
• Machine Learning
• Data Visualization

➔ These techniques have existed for years, even decades. Why is Big Data hot now?

Why Big Data now?

• More data are being collected and stored
• Open source code
• Commodity hardware / Cloud

➔ High-Volume, High-Velocity, High-Variety

➔ Artificial Intelligence

Big Data: 3V’s

16

16

8
Volume (Scale)

Data Volume
• 44x increase from 2009 to 2020
• From 0.8 zettabytes to 35 zettabytes
• Data volume is increasing exponentially

[Figure: Exponential increase in collected/generated data]
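As a worked check on the slide's numbers, assuming steady year-over-year growth between 2009 and 2020 (11 years), the 44x increase corresponds to a compound annual growth rate g of roughly 41%:

```latex
\frac{35\ \text{ZB}}{0.8\ \text{ZB}} = 43.75 \approx 44,
\qquad
g = 43.75^{1/11} - 1 = e^{\ln(43.75)/11} - 1 \approx e^{0.3435} - 1 \approx 0.41
```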

• 12+ TBs of tweet data every day
• 25+ TBs of log data every day
• ? TBs of data every day
• 30 billion RFID tags today (1.3B in 2005)
• 4.6 billion camera phones world wide
• 100s of millions of GPS enabled devices sold annually
• 2+ billion people on the Web by end 2011
• 76 million smart meters in 2009… 200M by 2014

Maximilien Brice, © CERN
CERN’s Large Hadron Collider (LHC) generates 15 PB a year

The Earthscope

• The Earthscope is the world's largest science project. Designed to track North America's geological evolution, this observatory records data over 3.8 million square miles, amassing 67 terabytes of data. It analyzes seismic slips in the San Andreas fault, sure, but also the plume of magma underneath Yellowstone and much, much more.
(http://www.msnbc.msn.com/id/44363598/ns/technology_and_science-future_of_technology/#.TmetOdQ--uI)
Variety (Complexity)

• Relational Data (Tables/Transaction/Legacy Data)
• Text Data (Web)
• Semi-structured Data (XML)
• Graph Data
  – Social Network, Semantic Web (RDF), …
• Streaming Data
  – You can only scan the data once
• A single application can be generating/collecting many types of data
• Big Public Data (online, weather, finance, etc.)
• To extract knowledge, all these types of data need to be linked together

A Single View to the Customer

[Figure: a single view of the customer, linking Social Media, Banking/Finance, Gaming, Entertainment, Purchase, and Our Known History]
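To make the single view concrete, here is a minimal sketch, assuming every source carries a shared customer id: it links a relational export (CSV), a semi-structured feed (JSON) and an XML document into one record per customer using only the Python standard library. The sample data, field names and the `single_view` function are hypothetical; real pipelines also need entity resolution when ids do not match across sources.

```python
import csv
import io
import json
import xml.etree.ElementTree as ET

# Hypothetical sample inputs: one relational export (CSV), one semi-structured
# feed (JSON) and one XML document, all keyed by the same customer id.
purchases_csv = """customer_id,item,amount
c1,laptop,1200
c2,phone,650
c1,mouse,25
"""
social_json = '[{"customer_id": "c1", "followers": 340}, {"customer_id": "c2", "followers": 12}]'
banking_xml = """<accounts>
  <account customer_id="c1" balance="5400"/>
  <account customer_id="c2" balance="880"/>
</accounts>"""


def single_view():
    """Link the three sources into one record per customer id."""
    view = {}
    for row in csv.DictReader(io.StringIO(purchases_csv)):
        cust = view.setdefault(row["customer_id"], {"purchases": []})
        cust["purchases"].append((row["item"], float(row["amount"])))
    for doc in json.loads(social_json):
        view.setdefault(doc["customer_id"], {"purchases": []})["followers"] = doc["followers"]
    for acct in ET.fromstring(banking_xml).iter("account"):
        cust = view.setdefault(acct.get("customer_id"), {"purchases": []})
        cust["balance"] = float(acct.get("balance"))
    return view


customers = single_view()
print(customers["c1"])
# {'purchases': [('laptop', 1200.0), ('mouse', 25.0)], 'followers': 340, 'balance': 5400.0}
```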
Velocity (Speed)

Data is being generated fast and needs to be processed fast
• Online Data Analytics
• Late decisions ➔ missing opportunities
Examples
• E-Promotions: based on your current location, your purchase history, and what you like ➔ send promotions right now for the store next to you
• Healthcare monitoring: sensors monitoring your activities and body ➔ any abnormal measurements require immediate reaction
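The healthcare-monitoring example can be sketched as a single-pass detector, since streaming data can only be scanned once. This is a minimal sketch, assuming a simple z-score rule over a running mean and variance (Welford's algorithm); the threshold, warm-up length, class name and heart-rate readings are hypothetical choices, not a clinical method.

```python
import math


class StreamingMonitor:
    """Single-pass monitor: each reading is seen once and never stored.

    Keeps a running mean and variance (Welford's algorithm) and flags readings
    that lie more than `threshold` standard deviations from the running mean.
    """

    def __init__(self, threshold=3.0, warmup=5):
        self.threshold = threshold
        self.warmup = warmup      # readings to absorb before raising any alarm
        self.n = 0
        self.mean = 0.0
        self.m2 = 0.0             # running sum of squared deviations from the mean

    @property
    def variance(self):
        return self.m2 / (self.n - 1) if self.n > 1 else 0.0

    def update(self, x):
        """Return True if x looks abnormal compared with earlier readings, then absorb it."""
        abnormal = False
        if self.n >= self.warmup:
            std = math.sqrt(self.variance)
            abnormal = std > 0 and abs(x - self.mean) > self.threshold * std
        self.n += 1
        delta = x - self.mean
        self.mean += delta / self.n
        self.m2 += delta * (x - self.mean)
        return abnormal


# Hypothetical heart-rate readings (beats per minute) arriving one at a time.
heart_rate = [72, 75, 71, 74, 73, 76, 72, 140, 74]
monitor = StreamingMonitor()
alerts = [x for x in heart_rate if monitor.update(x)]
print(alerts)   # [140]
```

The monitor's memory use is constant no matter how long the stream runs, which is the property the velocity dimension demands.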

Real-time/Fast Data

• Mobile devices (tracking all objects all the time)
• Social media and networks (all of us are generating data)
• Scientific instruments (collecting all sorts of data)
• Sensor technology and networks (measuring all kinds of data)

Progress and innovation are no longer hindered by the ability to collect data, but by the ability to manage, analyze, summarize, visualize, and discover knowledge from the collected data in a timely manner and in a scalable fashion.

Real-Time Analytics/Decision Requirement

• Product Recommendations that are Relevant & Compelling
• Influence Behavior
• Learning why Customers Switch to competitors and their offers; in time to Counter
• Friend Invitations to join a Game or Activity that expands business
• Improving the Marketing Effectiveness of a Promotion while it is still in Play
• Preventing Fraud as it is Occurring & preventing more proactively

Some Make it 4V’s

Harnessing Big Data

• OLTP: Online Transaction Processing (DBMSs)
• OLAP: Online Analytical Processing (Data Warehousing)
• RTAP: Real-Time Analytics Processing (Big Data Architecture & technology)
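The difference between the first two workloads can be sketched with Python's built-in `sqlite3` module, assuming a hypothetical `sales` table: OLTP issues many small, individually committed writes, while OLAP runs read-only aggregate scans over the accumulated history. SQLite is used only for illustration; it is neither a data warehouse nor an RTAP engine.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (id INTEGER PRIMARY KEY, region TEXT, product TEXT, amount REAL)")

# OLTP-style workload: many small writes, each sale committed as its own transaction.
orders = [("North", "laptop", 1200.0), ("South", "phone", 650.0),
          ("North", "phone", 700.0), ("South", "laptop", 1100.0),
          ("North", "laptop", 1250.0)]
for region, product, amount in orders:
    with conn:  # commits on success, rolls back on error
        conn.execute("INSERT INTO sales (region, product, amount) VALUES (?, ?, ?)",
                     (region, product, amount))

# OLAP-style workload: one read-only aggregate scan over the whole history.
rollup = conn.execute(
    "SELECT region, product, COUNT(*), SUM(amount) FROM sales "
    "GROUP BY region, product ORDER BY region, product").fetchall()
for row in rollup:
    print(row)
# ('North', 'laptop', 2, 2450.0)
# ('North', 'phone', 1, 700.0)
# ('South', 'laptop', 1, 1100.0)
# ('South', 'phone', 1, 650.0)
```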

The Model Has Changed…

The Model of Generating/Consuming Data has Changed

Old Model: Few companies are generating data, all others are consuming data

New Model: all of us are generating data, and all of us are consuming
data

What’s driving Big Data

Big Data:
- Optimizations and predictive analytics
- Complex statistical analysis
- All types of data, and many sources
- Very large datasets
- More real-time

Traditional BI:
- Ad-hoc querying and reporting
- Data mining techniques
- Structured data, typical sources
- Small to mid-size datasets

The evolution of business intelligence

[Figure: the evolution of business intelligence from the 1990’s to the 2010’s, plotted along Speed and Scale]
• BI Reporting, OLAP & Data warehouse: Business Objects, SAS, Informatica, Cognos, other SQL Reporting Tools
• Interactive Business Intelligence & In-memory RDBMS (Speed): QlikView, Tableau, HANA
• Big Data: Batch Processing & Distributed Data Store (Scale): Hadoop/Spark; HBase/Cassandra
• Big Data: Real Time & Single View (Speed and Scale): Graph Databases
Big Data Analytics

• Big data is more real-time in nature than traditional DW applications
• Traditional DW architectures (e.g. Exadata, Teradata) are not well-suited for big data apps
• Shared-nothing, massively parallel processing, scale-out architectures are well-suited for big data apps
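A minimal sketch of the shared-nothing, massively parallel pattern, written as a MapReduce-style word count in plain Python: each node counts words in its own shard, a shuffle routes each word to exactly one reducer, and reducers sum the partial counts. The shards and function names are hypothetical, and everything runs in one process here; frameworks such as Hadoop distribute the same three phases across machines.

```python
from collections import Counter, defaultdict


def map_phase(shard):
    """Runs independently on each node's local shard: no shared memory or disk."""
    return Counter(word.lower() for line in shard for word in line.split())


def shuffle(partials, num_reducers):
    """Route every word to exactly one reducer (hash partitioning stands in for the network)."""
    buckets = [defaultdict(list) for _ in range(num_reducers)]
    for partial in partials:
        for word, count in partial.items():
            buckets[hash(word) % num_reducers][word].append(count)
    return buckets


def reduce_phase(bucket):
    """Each reducer sums the partial counts for the words it owns."""
    return {word: sum(counts) for word, counts in bucket.items()}


# Two hypothetical nodes, each holding its own shard of text.
shards = [["big data big analytics"], ["data warehouse", "big insight"]]

partials = [map_phase(shard) for shard in shards]
result = {}
for bucket in shuffle(partials, num_reducers=2):
    result.update(reduce_phase(bucket))

print(sorted(result.items()))
# [('analytics', 1), ('big', 3), ('data', 2), ('insight', 1), ('warehouse', 1)]
```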
Big Data Technology


Cloud Computing

• IT resources provided as a service
  – Compute, storage, databases, queues
• Clouds leverage economies of scale of commodity hardware
  – Cheap storage, high-bandwidth networks & multicore processors
  – Geographically distributed data centers
• Offerings from Microsoft, Amazon, Google, …
[Figure source: Wikipedia, “Cloud Computing”]

Benefits

• Cost & management
  – Economies of scale, “out-sourced” resource management
• Reduced time to deployment
  – Ease of assembly, works “out of the box”
• Scaling
  – On-demand provisioning, co-locate data and compute
• Reliability
  – Massive, redundant, shared resources
• Sustainability
  – Hardware not owned
Types of Cloud Computing

• Public Cloud: computing infrastructure is hosted at the vendor’s premises.
• Private Cloud: computing architecture is dedicated to the customer and is not shared with other organisations.
• Hybrid Cloud: organisations host some critical, secure applications in private clouds; less critical applications are hosted in the public cloud.
• Cloud bursting: the organisation uses its own infrastructure for normal usage, but the cloud is used for peak loads.
• Community Cloud
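Cloud bursting reduces to a simple capacity calculation. The sketch below assumes a fixed private capacity and identical cloud instances, and every figure as well as the `plan_bursting` function are hypothetical: for each hour it rents only enough public-cloud instances to cover the demand above what the organisation's own infrastructure can serve.

```python
import math


def plan_bursting(hourly_demand, private_capacity, cloud_instance_capacity):
    """Cloud bursting: serve load on the organisation's own infrastructure first and
    rent public-cloud instances only for the overflow above private capacity."""
    plan = []
    for demand in hourly_demand:
        overflow = max(0, demand - private_capacity)
        plan.append(math.ceil(overflow / cloud_instance_capacity))
    return plan


# Hypothetical hourly load (requests/second) with a peak in the middle of the day.
demand = [300, 450, 900, 1200, 600]
print(plan_bursting(demand, private_capacity=800, cloud_instance_capacity=150))
# [0, 0, 1, 3, 0]
```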

Classification of Cloud Computing based on Service Provided

• Infrastructure as a Service (IaaS)
  – Offering hardware-related services using the principles of cloud computing. These could include storage services (database or disk storage) or virtual servers.
  – Amazon EC2, Amazon S3, Rackspace Cloud Servers and Flexiscale.
• Platform as a Service (PaaS)
  – Offering a development platform on the cloud.
  – Google App Engine, Microsoft Azure, Salesforce.com’s Force.com.
• Software as a Service (SaaS)
  – Including a complete software offering on the cloud. Users can access a software application hosted by the cloud vendor on a pay-per-use basis. This is a well-established sector.
  – Salesforce.com’s offering in the online Customer Relationship Management (CRM) space, Google’s Gmail and Microsoft’s Hotmail, Google Docs.
Infrastructure as a Service (IaaS)


More Refined Categorization

Storage-as-a-service
Database-as-a-service
Information-as-a-service
Process-as-a-service
Application-as-a-service
Platform-as-a-service
Integration-as-a-service
Security-as-a-service
Management/Governance-as-a-service
Testing-as-a-service
Infrastructure-as-a-service

Source: InfoWorld Cloud Computing Deep Dive
Key Ingredients in Cloud Computing

• Service-Oriented Architecture (SOA)
• Utility Computing (on demand)
• Virtualization (P2P Network)
• SaaS (Software as a Service)
• PaaS (Platform as a Service)
• IaaS (Infrastructure as a Service)
• Web Services in Cloud

Enabling Technology: Virtualization

[Figure: Traditional Stack vs Virtualized Stack]
• Traditional Stack: App, App, App on one Operating System, on Hardware
• Virtualized Stack: each App on its own OS; the OSes run on a Hypervisor, on Hardware
Everything as a Service

• Utility computing = Infrastructure as a Service (IaaS)
  – Why buy machines when you can rent cycles?
  – Examples: Amazon’s EC2, Rackspace
• Platform as a Service (PaaS)
  – Give me a nice API and take care of the maintenance, upgrades, …
  – Example: Google App Engine
• Software as a Service (SaaS)
  – Just run it for me!
  – Examples: Gmail, Salesforce

Cloud versus cloud

• Amazon Elastic Compute Cloud
• Google App Engine
• Microsoft Azure
• GoGrid
• AppNexus
The Obligatory Timeline Slide
(Mike Culver @ AWS)

[Figure: timeline with the labels COBOL, Edsel, ARPANET, Internet, Amazon.com, Darkness, Web Awareness, Dot-Com Bubble, Web as a Platform, Web 2.0, Web Services, Resources Eliminated, Web Scale Computing]

AWS

• Elastic Compute Cloud – EC2 (IaaS)
• Simple Storage Service – S3 (IaaS)
• Elastic Block Store – EBS (IaaS)
• SimpleDB (SDB) (PaaS)
• Simple Queue Service – SQS (PaaS)
• CloudFront (S3-based Content Delivery Network – PaaS)
• Consistent AWS Web Services API
What does the Azure platform offer to developers?

Google’s AppEngine vs Amazon’s EC2

AppEngine: Python, BigTable, other APIs
  – Higher-level functionality (e.g., automatic scaling)
  – More restrictive (e.g., respond to URL only)
  – Proprietary lock-in
EC2/S3: VMs, flat file storage
  – Lower-level functionality
  – More flexible
  – Coarser billing model
The human brain is a graph/network of 100B nodes and 700T edges.

[Figure labels: recognition, perception, comprehension, sensors, strategy, representation, memory]
• Machine Cognition: Robot Cognition Tools; Feeling
• Machine Reasoning: Bayesian Networks Tools; Game Theory Tools
• Machine Learning: Machine Learning Tools; Deep Learning Tools
• Graph Analytics: Network Analysis; Matching and Search; Flow Prediction
• Graph Visualization: Dynamic Graph; Big Graph
• Graph Database: Large-Scale Native Store

Big Data AI Platform Example: Graphen Ardi

Why you want to take this class

• Key differentiator of this class: focusing on building a full-spectrum understanding of the latest Big Data Analytics technologies and using them to build real-world industry solutions.
• Sapphire Big Data Analytics Open Source Applications: creating Big Data open source toolsets for various industries (and disciplines)
• Dataset and Use Cases: Welcome!!

5 Example Big Data Use Case Categories

• Big Data Exploration: find, visualize, understand all of the big data to improve decision making
• Enhanced 360º View of the Customer: extend existing customer views (MDM, CRM, etc.) by incorporating additional internal and external information sources
• Security/Intelligence Extension: lower risk, detect fraud and monitor cyber security in real-time
• Operations Analysis: analyze a variety of machine data for improved business results
• Data Warehouse Augmentation: integrate big data and data warehouse capabilities to increase operational efficiency
Big Data Examples -- Application Use Cases
1. Expertise Location
2. Recommendation
3. Commerce
4. Financial Analysis
5. Social Media Monitoring
6. Telco Customer Analysis
7. Healthcare Analysis
8. Data Exploration and Visualization
9. Personalized Search
10. Anomaly Detection
11. Fraud Detection
12. Cybersecurity
13. Sensor Monitoring (Smarter another Planet)
14. Cellular Network Monitoring
15. Cloud Monitoring
16. Code Life Cycle Management
17. Traffic Navigation
18. Image and Video Semantic Understanding
19. Genomic Medicine
20. Brain Network Analysis
21. Data Curation
22. Near Earth Object Analysis

Category 1: 360º View


[Figure: recommendation on a user–item graph]

Enhancing:
• Graph Visualizations
• Communities, Centralities, Ego Net Features
• Graph Search, Graph Query, Graph Matching
• Network Info Flow, Shortest Paths, Graph Sampling
• Bayesian Networks, Latent Net Inference, Markov Networks
• Middleware and Database
Use Case 1: Social Network Analysis in Enterprise for Productivity
Production live system used by IBM GBS since 2009 – verified ~$100M contribution
• 15,000 contributors in 76 countries; 92,000 annual unique IBM users
• 25,000,000+ emails & SameTime messages (incl. content features)
• 1,000,000+ Learning clicks; 14M KnowledgeView, SalesOne, …, access data
• 1,000,000+ Lotus Connections (blogs, file sharing, bookmark) data
• 200,000 people’s consulting project & earning data
• Dynamic networks of 400,000+ IBMers
– On BusinessWeek four times, including being the Top Story of the Week, April 2009
– Helped IBM earn the 2012 Most Admired Knowledge Enterprise Award
– Wharton School study: $7,010 gain per user per year using the tool
– In 2012, contributing about 1/3 of the GBS Practitioner Portal’s $228.5 million savings and benefits
– APQC (WW leader in Knowledge Practice), April 2013: “The Industry Leader and Best Practice in Expertise Location”

[Figure labels: Shortest Paths, Centralities, Graph Search, Social Capital, Bridges and Hubs, Expertise Search, Graph Recomm.]
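Two of the graph analytics named on this slide, centralities and shortest paths, can be sketched in a few lines. This is a minimal sketch, assuming a small hypothetical collaboration graph: normalized degree centrality picks out a hub, and breadth-first search finds the shortest chain of colleagues to an expert. It is not the IBM system's implementation.

```python
from collections import deque

# Hypothetical employee collaboration graph (e.g., who emails or messages whom).
edges = [("ana", "bao"), ("bao", "chen"), ("chen", "dara"),
         ("bao", "eli"), ("eli", "dara"), ("bao", "fay")]

graph = {}
for u, v in edges:                      # undirected adjacency sets
    graph.setdefault(u, set()).add(v)
    graph.setdefault(v, set()).add(u)


def degree_centrality(g):
    """Normalized degree: share of other people each person is directly connected to."""
    n = len(g)
    return {node: len(nbrs) / (n - 1) for node, nbrs in g.items()}


def shortest_path(g, source, target):
    """Breadth-first search: fewest introductions needed to reach an expert."""
    parent = {source: None}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        if node == target:
            path = []
            while node is not None:
                path.append(node)
                node = parent[node]
            return path[::-1]
        for nbr in sorted(g[node]):      # sorted only to make the output deterministic
            if nbr not in parent:
                parent[nbr] = node
                queue.append(nbr)
    return None


centrality = degree_centrality(graph)
hub = max(centrality, key=centrality.get)
print(hub, centrality[hub])                  # bao 0.8
print(shortest_path(graph, "ana", "dara"))   # ['ana', 'bao', 'chen', 'dara']
```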

Use Case 2: Personalized Recommendation

Use Case 3: Customer Behavior Sequence Analytics
Models: Markov Network, Latent Network, Bayesian Network

• Behavior Pattern Detection
• Help Needed Detection

[Figure: example behavior sequence: login → browsing → search → comparing → checkout]
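Behavior pattern detection over sequences such as login → browsing → search → comparing → checkout can be sketched with a first-order Markov chain, a simpler stand-in for the Markov, latent and Bayesian network models named on the slide. The sessions are hypothetical, and so is the help-needed rule (flag a customer who repeats the same step more than twice in a row).

```python
from collections import Counter, defaultdict

# Hypothetical clickstream sessions built from the states shown on the slide.
sessions = [
    ["login", "browsing", "search", "comparing", "checkout"],
    ["login", "search", "comparing", "checkout"],
    ["login", "browsing", "search", "search", "search", "comparing"],
    ["login", "browsing", "browsing", "search", "comparing", "checkout"],
]


def transition_probabilities(seqs):
    """First-order Markov chain: estimate P(next state | current state) from counts."""
    counts = defaultdict(Counter)
    for seq in seqs:
        for current, nxt in zip(seq, seq[1:]):
            counts[current][nxt] += 1
    return {state: {nxt: c / sum(nexts.values()) for nxt, c in nexts.items()}
            for state, nexts in counts.items()}


def help_needed(seq, max_repeats=2):
    """Hypothetical rule: staying in the same state more than max_repeats times in a row
    (e.g., searching again and again) suggests a customer who may need help."""
    run = 1
    for prev, cur in zip(seq, seq[1:]):
        run = run + 1 if cur == prev else 1
        if run > max_repeats:
            return True
    return False


probs = transition_probabilities(sessions)
print(probs["login"])                      # {'browsing': 0.75, 'search': 0.25}
print([help_needed(s) for s in sessions])  # [False, False, True, False]
```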

Use Case 4: Graph Analytics for Financial Analysis


Goal: Injecting network graph effects into financial analysis: estimating company performance considering correlated companies, network properties and evolutions, causal parameter analysis, etc.

[Figure panels: IBM 2003, IBM 2009]

▪ Data source:
– Relationships among 7594 companies, data mined from the NYT, 1981 ~ 2009
▪ Targets: 20 Fortune companies’ normalized profits
▪ Goal: learn from the previous 5 years, and predict the next year
▪ Model: Support Vector Regression (RBF kernel)
▪ Network features: s (current-year network feature), t (temporal network feature), d (delta value of network feature)
▪ Financial feature: p (historical profits and …)
▪ Result: profit prediction by joint network and financial analysis outperforms network-only by 130% and financial-only by 33%.
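One way to write the model described on the slide, assuming each company-year is represented by a single feature vector built from the network features s, t, d and the financial feature p, with the next year's normalized profit as the target: ε-insensitive support vector regression with an RBF kernel predicts

```latex
\mathbf{x}_i = \left[\, s_i,\; t_i,\; d_i,\; p_i \,\right], \qquad
K(\mathbf{x}, \mathbf{x}') = \exp\!\left(-\gamma \,\lVert \mathbf{x} - \mathbf{x}' \rVert^2\right), \qquad
\hat{y}_i = \sum_{j=1}^{n} (\alpha_j - \alpha_j^{*})\, K(\mathbf{x}_j, \mathbf{x}_i) + b
```

where γ > 0 sets the kernel width and the coefficients α_j, α_j^* and b are learned from the training years. The exact feature construction used in the study is not given on the slide.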
Use Case 5: Social Media Monitoring

[Figure: social media monitoring dashboard with monitoring categories, a monitoring filter, real-time translation, live tweets, sentiment, graph zooming/panning, and top retweets]

Use Case 6: Customer Social Analysis for Telco


Goal: Extract customer social network behaviors from Call Detail Records (CDRs) to enable data monetization for Telco.

▪ Applications enabled by the extracted social profiles
– Personalized advertisement (beyond the scope of traditional campaigns in Telco)
– High value customer identification and targeting
– Viral marketing campaign
▪ Approach
– Construct social graphs from CDRs based on {caller, callee, call time, call duration}
– Extract customer social features (e.g. influence, communities, etc.) from the constructed social graph as customer social profiles
– Build analytics applications (e.g. personalized advertisement) based on the extracted customer social profiles

[Figure: CDR → System G Analysis, BigInsights (Degree Centrality, Weakly Connected Component, Maximal Cliques, Community Detection, PageRank, K-core) → Customer Profiles (influence, community, etc.) → Applications (Personalized Advertisement, High Value Customer Identification & Targeting, Viral Marketing Campaign)]

PoCs with Chinese and Indian Telecomm companies
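The first two Approach steps can be sketched as follows, assuming edges are weighted by total call duration (one reasonable choice, not stated on the slide): build a directed call graph from {caller, callee, call time, call duration} records, then compute weighted PageRank, one of the features listed in the figure, as an influence score. The CDRs are hypothetical.

```python
from collections import defaultdict

# Hypothetical CDRs: (caller, callee, call time, call duration in seconds).
cdrs = [
    ("A", "B", "2013-05-01T09:00", 120),
    ("A", "C", "2013-05-01T09:30", 300),
    ("B", "C", "2013-05-01T10:00", 60),
    ("C", "A", "2013-05-01T11:00", 240),
    ("D", "C", "2013-05-01T12:00", 30),
]


def build_call_graph(records):
    """Directed social graph: edge weight = total talk time from caller to callee."""
    weights = defaultdict(float)
    for caller, callee, _call_time, duration in records:
        weights[(caller, callee)] += duration
    return dict(weights)


def pagerank(weights, damping=0.85, iterations=100):
    """Weighted PageRank by power iteration; the rank of nodes with no outgoing
    calls is spread uniformly over all nodes."""
    nodes = sorted({n for edge in weights for n in edge})
    out_total = {n: 0.0 for n in nodes}
    for (u, _v), w in weights.items():
        out_total[u] += w
    rank = {n: 1 / len(nodes) for n in nodes}
    for _ in range(iterations):
        dangling = sum(rank[n] for n in nodes if out_total[n] == 0)
        base = (1 - damping) / len(nodes) + damping * dangling / len(nodes)
        new = {n: base for n in nodes}
        for (u, v), w in weights.items():
            new[v] += damping * rank[u] * w / out_total[u]
        rank = new
    return rank


scores = pagerank(build_call_graph(cdrs))
print(max(scores, key=scores.get))   # C
```

The resulting scores would become one column of a customer social profile, alongside features such as degree or community membership.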
Category 2: Data Exploration

Enhancing:
• Visualizations: Huge Network Visualization, Network Propagation Visualization, I2 3D Network Visualization, Geo Network Visualization, Graphical Model Visualization
• Communities, Centralities, Ego Net Features
• Graph Search, Graph Query, Graph Matching
• Network Info Flow, Shortest Paths, Graph Sampling
• Bayesian Networks, Latent Net Inference, Markov Networks
• Middleware and Database

Use Case 7: Graph Analytics and Visualization


[Figure: graph matching, where a query graph is matched against a graph of symptoms (headache, chill, migraine, high fever, stomachache, cough) to return matches; graph communities]
Use Case 8: Visualization for Navigation and Exploration

• Whisper: Tracing the information diffusion in Social Media
  http://systemg.ibm.com/apps/whisper/index.html
• SocialHelix: Visualization of Sentiment Divergence in Social Media

Use Case 9: Graph Search

[Figure: a query goes to an existing search engine (index, ranking); Graph Search re-ranks the results using graph analysis of info-socio networks and the query context, giving improved search results and interest/social-network-based content recommendations]
Category 3: Security
[Figure panels: Ponzi scheme detection (Network Info Flow); detecting a DoS attack (Ego Net Features). Normal: (1) clique-like, (2) two-way links. Attacker: near-star.]

• Graph Visualizations
• Communities, Centralities, Ego Net Features
• Graph Search, Graph Query, Graph Matching
• Network Info Flow, Shortest Paths, Graph Sampling
• Bayesian Networks, Latent Net Inference, Markov Networks
• Middleware and Database

Use Case 10: Anomaly Detection at Multiple Scales

Based on Presidential Executive Order 13587

Goal: A system for detecting and predicting abnormal behaviors in an organization, through large-scale social network & cognitive analytics and data mining, to decrease insider threats such as espionage, sabotage, colleague-shooting, suicide, etc.

• “Enterprise Information Leakage Impacted economy and jobs”, Feb 2013
• “What's emerged is a multibillion dollar detective industry”, npr, Jan 10, 2013

[Figure: data sources (Emails, Instant Messaging, Web Access, Click streams, Executed Processes, Feed subscription, Printing, Copying, Database access, Log On/Off) are collected by social sensors and a capturer, analyzed by graph, behavior, semantics, and psychological analysis, and fed to Multimodality Analysis & Detection, Prediction, and an Exploration Interface]

Infrastructure + ~490 Analytics
Use Case 11: Fraud Detection for Bank
Network Info Flow, Ego Net Features

Ponzi scheme detection
• Normal: (1) clique-like, (2) two-way links
• Attacker: near-star
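The normal-versus-attacker contrast can be captured with two simple ego-net features. In this minimal sketch, density measures how many of an account's contacts are linked to each other (clique-like when close to 1) and reciprocity measures how many contacts link back (two-way links); a near-star has both close to 0. The graphs are hypothetical, and the slide gives no decision thresholds.

```python
from itertools import combinations


def ego_features(edges, ego):
    """Ego-net features for one account in a directed transaction/contact graph.

    density:     fraction of neighbour pairs that are themselves linked (either direction)
    reciprocity: fraction of the ego's contacts that also link back (two-way links)
    """
    out_nbrs = {v for u, v in edges if u == ego}
    in_nbrs = {u for u, v in edges if v == ego}
    nbrs = out_nbrs | in_nbrs
    linked = {frozenset(e) for e in edges}
    pairs = list(combinations(sorted(nbrs), 2))
    density = sum(frozenset(p) in linked for p in pairs) / len(pairs) if pairs else 0.0
    reciprocity = len(out_nbrs & in_nbrs) / len(nbrs) if nbrs else 0.0
    return density, reciprocity


# Normal user: a small clique of friends who all link to each other both ways.
friends = ["u", "a", "b", "c"]
normal = [(x, y) for x in friends for y in friends if x != y]

# Hypothetical attacker: one hub sends to many accounts that never reply or interconnect.
attacker = [("hub", f"t{i}") for i in range(6)]

print(ego_features(normal, "u"))      # (1.0, 1.0): clique-like, two-way links
print(ego_features(attacker, "hub"))  # (0.0, 0.0): near-star, one-way links
```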

Use Case 12: Detecting Cyber Attacks


Network Info Flow, Ego Net Features

Detecting a DoS attack
Category 4: Operations Analysis
[Figure: cloud service placement and causality analysis. Network KPIs and server KPIs form a graph (analyzed with Graph Matching and a Bayesian Network) whose nodes are KPI time series (e.g., server performance/load, network performance/load) and whose edges are (potential) pairwise relationships (e.g., causality) found by a causality analyzer, varying over time]

• Graph Visualizations
• Communities, Centralities, Ego Net Features
• Graph Search, Graph Query, Graph Matching
• Network Info Flow, Shortest Paths, Graph Sampling
• Bayesian Networks, Latent Net Inference, Markov Networks
• Middleware and Database

Use Case 13: Smarter another Planet


Goal: The Atmospheric Radiation Measurement (ARM) climate research facility provides 24x7 continuous field observations of cloud, aerosol and radiative processes. Graphical models can automate the validation with improved efficiency and performance.

Approach: A Bayesian network (BN) is built to represent the dependence among sensors and is replicated across timesteps. BN parameters are learned from over 15 years of ARM climate data to support distributed climate sensor validation. Inference validates sensors in the connected instruments.

• Bayesian Network: 3 timesteps × 63 variables; 3.9 avg states; 4.0 avg indegree; 16,858 CPT entries
• Junction Tree: 67 cliques; 873,064 PT entries in cliques
Use Case 14: Cellular Network Analytics in Telco Operation
Goal: Efficiently and uniquely identify the internal state of cellular/Telco networks (e.g., performance and load of network elements/links) using probes between monitors placed at selected network elements & end hosts

▪ Applied graph analytics to telco network analytics based on CDRs (call detail records): estimate the traffic load on the CSP network with low monitoring overhead
(1) CDRs, already collected for billing purposes, contain information about voice/data calls
(2) Traditional NMS* and EMS** typically lack end-to-end visibility and topology across vendors
(3) Employ graph algorithms to analyze network elements that are not reported in the usage data from CDR information
▪ Approach
– A cellular network comprises a hierarchy of network elements
– Map CDRs onto the network topology and infer the load on each network element using graph analysis
– Estimate network load and localize potential problems

[Figure: CDR + network topology → graph analysis → network load level report]
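Point (3) and the Approach can be sketched for a tree-shaped hierarchy: each call is attributed to its serving cell and then to every element above it, so elements that never appear in the CDRs still get a load estimate. The topology, element names and CDRs are hypothetical; real networks also involve handovers, data sessions and meshed links.

```python
from collections import defaultdict

# Hypothetical three-level topology, child -> parent: cell -> controller -> core.
parent = {
    "cell-1": "ctrl-A", "cell-2": "ctrl-A", "cell-3": "ctrl-B",
    "ctrl-A": "core", "ctrl-B": "core",
}

# Hypothetical CDRs reduced to (serving cell, call duration in seconds).
cdrs = [("cell-1", 120), ("cell-2", 300), ("cell-1", 60), ("cell-3", 900)]


def element_load(records, topology):
    """Walk each call up the hierarchy and add its duration to every element on the path,
    so controllers and the core get a load estimate although CDRs never name them."""
    load = defaultdict(int)
    for cell, duration in records:
        node = cell
        while node is not None:
            load[node] += duration
            node = topology.get(node)
    return dict(load)


load = element_load(cdrs, parent)
print(load["ctrl-A"], load["ctrl-B"], load["core"])   # 480 900 1380
```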

Use Case 15: Monitoring Large Cloud


Goal: Monitoring technology that can track the time-varying state (e.g., causality relationships between KPIs) of a large Cloud when the processing power of the monitoring system cannot keep up with the scale of the system & the rate of change
• Causality relationships (e.g., Granger causality) are crucial in performance monitoring & root cause analysis
• Challenge: it is easy to test pairwise relationships, but hard to test multivariate relationships (e.g., among a large number of KPIs)

[Figure: a causality analyzer turns KPI time series (e.g., server performance/load, network performance/load) into a graph of network KPIs and server KPIs; each node is a KPI (a time series), each edge a (potential) pairwise relationship (e.g., causality), and the graph varies over time]

Our approach: probabilistic monitoring via sampling & estimation
• Basic analytics engine (e.g., pairwise Granger causality)
• Link sampling & estimation: select KPI pairs (sampling) → test link existence → estimate unsampled links based on history → overall graph
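The sampling-and-estimation loop can be sketched as follows. The slide's engine runs pairwise Granger causality tests; to stay dependency-free, this sketch swaps in a lagged Pearson correlation as the pairwise test, and it estimates unsampled links by simply reusing the last known result. The KPI series, threshold and function names are hypothetical.

```python
import random


def lagged_correlation(x, y, lag=1):
    """Stand-in pairwise test (NOT a Granger causality test): Pearson correlation
    between x at time t and y at time t + lag."""
    a, b = x[:-lag], y[lag:]
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((p - ma) * (q - mb) for p, q in zip(a, b))
    va = sum((p - ma) ** 2 for p in a)
    vb = sum((q - mb) ** 2 for q in b)
    return cov / (va * vb) ** 0.5 if va and vb else 0.0


def monitor_round(kpis, history, sample_size, rng, threshold=0.8):
    """One round: test only a random sample of KPI pairs and, for every unsampled
    pair, reuse the last known result (the simplest history-based estimate)."""
    pairs = [(a, b) for a in kpis for b in kpis if a != b]
    sampled = set(rng.sample(pairs, sample_size))
    graph = {}
    for a, b in pairs:
        if (a, b) in sampled:
            graph[(a, b)] = abs(lagged_correlation(kpis[a], kpis[b])) >= threshold
        else:
            graph[(a, b)] = history.get((a, b), False)
    return graph


# Hypothetical KPI series: latency follows CPU load one step later.
cpu = [1, 3, 2, 5, 4, 6, 5, 8]
latency = [0] + [2 * v for v in cpu[:-1]]
disk = [5, 5, 6, 5, 6, 5, 6, 5]
kpis = {"cpu": cpu, "latency": latency, "disk": disk}

rng = random.Random(0)
n_pairs = len(kpis) * (len(kpis) - 1)
first = monitor_round(kpis, {}, sample_size=n_pairs, rng=rng)  # bootstrap: test every pair once
later = monitor_round(kpis, first, sample_size=2, rng=rng)      # then test only 2 of 6 pairs per round
print(first[("cpu", "latency")])   # True
```

Each later round costs 2 tests instead of 6; the price is that unsampled links may be stale until they are sampled again.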
Category 5: Data Warehouse Augmentation


Use Case 16: Code Life Cycle Improvement

[Figure: in the traditional (relational) model, graph objects are converted to the relational model and back; in the graph DB model, graph objects are stored directly in the graph DB]

● Advantages of working directly with a graph DB for graph applications
(1) Smaller and simpler code
(2) Flexible schema → easy schema evolution
(3) Code is easier and faster to write, debug and manage
(4) Code and data are easier to transfer and maintain
Use Case 17: Smart Navigation Utilizing Real-time Road Information
Goal: Enable an unprecedented level of accuracy in traffic scheduling (for a fleet of transportation vehicles) and navigation of individual cars, utilizing dynamic real-time information about changing road conditions and predictive analysis on the data

• Dynamic graph algorithms implemented in System G provide highly efficient graph query computation (e.g. shortest path computation) on time-varying graphs (orders of magnitude improvement over existing solutions)

• High-throughput real-time predictive analytics on graphs makes it possible to estimate the future traffic condition on the route, to make sure that the decision taken now is optimal overall

Our approach: querying over a dynamic graph + predictive analytics on graph properties
[Figure: real-time updates → graph store; dynamic graph query problem → query & response; predictive analytics for graphs → predictive results]
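The shortest-path query on a time-varying road graph can be sketched with Dijkstra's algorithm, assuming edge weights are current travel times that are overwritten when real-time updates arrive. This sketch simply recomputes the route from scratch after each update; the dynamic graph algorithms described on the slide aim to avoid exactly that cost. The road network and times are hypothetical.

```python
import heapq


def dijkstra(graph, source, target):
    """Shortest travel time from source to target; graph[u][v] = current travel time (minutes)."""
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    visited = set()
    while heap:
        d, u = heapq.heappop(heap)
        if u in visited:
            continue
        visited.add(u)
        if u == target:
            break
        for v, w in graph.get(u, {}).items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    if target not in dist:
        return None, float("inf")
    path = [target]
    while path[-1] != source:
        path.append(prev[path[-1]])
    return path[::-1], dist[target]


# Hypothetical road network: travel times in minutes.
roads = {
    "home": {"A": 5, "B": 10},
    "A": {"office": 10},
    "B": {"office": 6},
}
print(dijkstra(roads, "home", "office"))   # (['home', 'A', 'office'], 15)

roads["A"]["office"] = 25                  # real-time update: congestion reported on A -> office
print(dijkstra(roads, "home", "office"))   # (['home', 'B', 'office'], 16)
```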

Use Case 18: Graph Analysis for Image and Video Analysis

[Figure: matching two attributed relational graphs, ARG s and ARG t, through vertex correspondence and attribute transformation]
Use Case 19: Graph Matching for Genomic Medicine


Use Case 20: Data Curation for Enterprise Data Management

Use Case 21: Understanding Brain Network


Use Case 22: Planet Security


• Big Data on Large-Scale Sky Monitoring
