
Graph Data Science - Vipin Kumar

This document discusses graph data science and analysis. It describes what graphs are from a mathematical perspective and how they can be used to represent complex systems and relationships within data. The document outlines typical graph analysis workflows and discusses various graph algorithms and metrics that are commonly used, such as centrality measures, pattern detection, and community detection.
Copyright
© All Rights Reserved

Graph Data Science

- VIPIN KUMAR
About Me –
Vipin Kumar ~ 14 years in Data Science
Origins of Graph Theory
Networks represent complex systems and the connections within them.
By analyzing the structure of this representation, you can answer questions and make
predictions about how the system works or how individuals behave within it.
Graphs are a mathematical representation of complex systems.
Graphs have a history dating back to 1736. The origins of graph theory hail from the city
of Königsberg, which included two large islands connected to each other and the two
mainland portions of the city by seven bridges.
The puzzle was to create a walk through the city,
crossing each bridge once and only once.
Leonhard Euler solved that puzzle by asking
whether it was possible to visit all four areas of a
city connected by seven bridges,
while only crossing each bridge once.
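It wasn’t: each of the four areas is reached by an odd number of bridges, and such a
walk can exist only if at most two areas have an odd number of bridges.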
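Euler's answer can be checked with a simple degree count. The sketch below is an illustration, not part of the original talk. The labels A to D for the four land areas are assumed, and the check relies on the known result that a connected multigraph has a walk using every edge exactly once only if zero or two of its nodes have odd degree.

```python
from collections import Counter

# Seven bridges as the edge list of a multigraph.
# Labels are assumed for illustration: A = central island,
# B = north bank, C = south bank, D = second island.
BRIDGES = [
    ("A", "B"), ("A", "B"),
    ("A", "C"), ("A", "C"),
    ("A", "D"), ("B", "D"), ("C", "D"),
]

def degrees(edges):
    """Count how many bridge ends touch each land area."""
    deg = Counter()
    for u, v in edges:
        deg[u] += 1
        deg[v] += 1
    return deg

def has_euler_walk(edges):
    """Degree test for a walk that uses every edge exactly once.

    Assumes the graph is connected, which holds for the bridge layout above.
    """
    odd = sum(1 for d in degrees(edges).values() if d % 2 == 1)
    return odd in (0, 2)

print(dict(degrees(BRIDGES)))
print(has_euler_walk(BRIDGES))
```

All four areas have odd degree here, so the test fails and no such walk exists.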
What constitutes a graph?

Audience Question
Key Components of Graphs / Representations
Nodes / Entities

Edges / Relationships

Properties of Nodes

Attributes of Edges – can include weights

The structure you set up guides and limits the questions you can ask of the data (representations).
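The four components above can be sketched as a small in-memory property graph. This is a minimal illustration only: the class names, the field names and the sample people and company are all hypothetical and do not come from any particular graph database.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    node_id: str
    label: str                                       # kind of entity, e.g. "Person"
    properties: dict = field(default_factory=dict)   # properties of the node

@dataclass
class Edge:
    source: str
    target: str
    rel_type: str                                    # kind of relationship, e.g. "KNOWS"
    attributes: dict = field(default_factory=dict)   # may include a weight

class PropertyGraph:
    """Hypothetical minimal container for nodes and directed edges."""

    def __init__(self):
        self.nodes = {}
        self.edges = []

    def add_node(self, node_id, label, **properties):
        self.nodes[node_id] = Node(node_id, label, properties)

    def add_edge(self, source, target, rel_type, **attributes):
        if source not in self.nodes or target not in self.nodes:
            raise KeyError("both endpoints must be added as nodes first")
        self.edges.append(Edge(source, target, rel_type, attributes))

    def neighbors(self, node_id):
        """Targets of the edges that leave node_id, in insertion order."""
        return [e.target for e in self.edges if e.source == node_id]

g = PropertyGraph()
g.add_node("alice", "Person", age=34)
g.add_node("bob", "Person", age=29)
g.add_node("acme", "Company")
g.add_edge("alice", "bob", "KNOWS", weight=0.8)
g.add_edge("alice", "acme", "WORKS_AT", since=2019)
print(g.neighbors("alice"))
```

Whether "acme" is modeled as a node or as a property on a person is exactly the kind of structural choice that decides which questions are easy to ask later.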
Some Types of Graphs
Game of Thrones Graph

Social Media Graph

Similarly: telephone networks, global road networks, etc.


Graph Data Science
The point of graph data science is to leverage relationships in
data. Most data scientists work with data in tabular formats.
However, to get better insights and to answer questions that
depend on connections between records, a graph is key.
Panama Papers analysis by ICIJ
1. ICIJ is a network of around 200 journalists in more than 65 countries who work together
on cross-border investigations into issues of global concern that expose systemic
problems in society.
2. The leak held around 2.6 TB of data on offshore accounts of corporations and individuals, and
on tax evasion arrangements between lawyers and corporations.
3. At that volume the data is impossible to analyze in the traditional sense, which also
raises the question of how collaboration on it can happen.
Typical Analysis Flow (Documents to Info)
1. Acquire documents
2. Classify documents
a. Scan / OCR
b. Extract document metadata
3. Whiteboard domain
a. Determine entities and their relationships
b. Determine potential entity and relationship properties
c. Determine sources for those entities and their properties
4. Work out analyzers, rules, parsers and named entity recognition for documents
5. Parse and store document metadata and document and entity relationships
a. Parse by author, named entities, dates, sources and classification
6. Infer entity relationships
7. Compute similarities, transitive cover and triangles
8. Analyze data using graph queries and visualizations
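Step 7 of the flow above can be sketched on a toy graph. The example below is a hedged illustration: it assumes Jaccard overlap of neighbor sets as the similarity measure (the slides do not say which one was used), and the officer and company names are made up.

```python
from itertools import combinations

# Hypothetical entity relationships, as they might look after steps 5 and 6.
EDGES = [
    ("officer_1", "company_a"), ("officer_1", "company_b"),
    ("officer_2", "company_a"), ("officer_2", "company_b"),
    ("officer_2", "company_c"), ("company_a", "company_b"),
]

def build_adjacency(edges):
    """Undirected adjacency sets from an edge list."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    return adj

def jaccard(adj, a, b):
    """Shared neighbors divided by all neighbors of either node."""
    union = adj[a] | adj[b]
    return len(adj[a] & adj[b]) / len(union) if union else 0.0

def triangles(adj):
    """Every set of three mutually connected nodes, as sorted tuples."""
    found = set()
    for node, nbrs in adj.items():
        for u, v in combinations(sorted(nbrs), 2):
            if v in adj[u]:
                found.add(tuple(sorted((node, u, v))))
    return found

adj = build_adjacency(EDGES)
print(jaccard(adj, "officer_1", "officer_2"))
print(sorted(triangles(adj)))
```

Two officers who share most of their companies get a high similarity score, and each triangle marks a tightly linked group worth a closer look in step 8.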
Graph Data Model (Iterative Process)
Data modeling is a journey into lands of the unknown.
A journey of exploration, discovery, and mapping.
Every capable discoverer knows that there are certain things you must be cognizant of.

The priorities are:

1) Structure – Nodes / Relationships / Refactoring or not
2) Content – Properties / Attributes
3) Purpose

Applications: social networks, drug discovery, financial fraud detection, insurance fraud
detection, anomaly detection
What would you ask of a graph?
Audience Question

How would you analyze a graph?

What Can a Graph Answer?
Data scientists tackle many types of questions when using graph data science (GDS) to evaluate interdependencies,
infer meaning, and predict behavior. At the most abstract level, these questions fall into a few broad areas:
Graph Queries
1. How do things travel (move) through a network? Understanding how things move through a network involves deep path
analysis to find propagation pathways, such as the route of diseases or network failures. It can also be used to optimize for the best
possible route or for flow constraints.
2. What are the most influential points? Identifying influencers involves uncovering the structurally well-placed nodes that
represent the control points in a network. These influencers can act as fast dissemination points, bridges between less connected
groups, or bottlenecks. Influencers can accelerate or slow the flow of items through networks from finances to opinions. The
concept of highly connected and influential nodes in a graph is referred to as centrality. Centrality algorithms are essential for
understanding influence in a network.
3. What are the groups and interactions? Detecting communities requires grouping and partitioning nodes based on the number
and strength of interactions. This method is the primary way to presume group affinity, although neighbor likeness can also be a
factor. Link prediction is about inferring future (or unseen) connections based on network structure. Heuristic Link Prediction
algorithms are often used to predict behavior. In addition to community detection algorithms, similarity algorithms are also used
to understand groupings.
4. What patterns are significant? Uncovering network patterns reveals similarities and can also be used for general exploration.
For example, you may look for a known relationship pattern between a few nodes or compare attributes of all your nodes to find
similarities. Or perhaps you want to evaluate the entire structure of a network, with its intricate hierarchies, and correlate its
patterns with the social behavior you are investigating. Aggregating related but ambiguous information in large datasets is a
common activity that relies on finding similar and related information. Finding patterns may employ simple queries or various
types of algorithms.
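Two of the question types above can be sketched in a few lines. The example is an illustration under stated assumptions: breadth-first search stands in for path analysis on an unweighted graph, and a common-neighbors count stands in for a heuristic link prediction score. The five-node graph is made up.

```python
from collections import deque

# Hypothetical undirected graph as adjacency sets.
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

def shortest_path(graph, start, goal):
    """Breadth-first search; returns one shortest path or None."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nbr in sorted(graph[node]):   # sorted for a repeatable result
            if nbr not in previous:
                previous[nbr] = node
                queue.append(nbr)
    return None

def common_neighbor_scores(graph):
    """Score each unconnected pair by how many neighbors it shares."""
    scores = {}
    nodes = sorted(graph)
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if v not in graph[u]:
                shared = len(graph[u] & graph[v])
                if shared:
                    scores[(u, v)] = shared
    return scores

print(shortest_path(GRAPH, "a", "e"))
print(common_neighbor_scores(GRAPH))
```

The pair with the highest score is the most likely missing or future link under this heuristic. Weighted routes or flow constraints would need a different algorithm, such as Dijkstra's.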
Graph Algorithms
PageRank
Eigenvector Centrality
Connected Components
Minimum Spanning Tree
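Two of the algorithms listed above, connected components and minimum spanning tree, can share one helper structure. The sketch below assumes a union-find (disjoint set) approach with Kruskal's method for the spanning tree. It is one common way to implement them, not necessarily the one shown in the talk, and the weighted edge list is made up.

```python
class DisjointSet:
    """Union-find with path halving."""

    def __init__(self, items):
        self.parent = {x: x for x in items}

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        """Merge the sets of a and b; False if they were already joined."""
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return False
        self.parent[rb] = ra
        return True

def connected_components(nodes, edges):
    """Group nodes that can reach each other."""
    ds = DisjointSet(nodes)
    for u, v, _ in edges:
        ds.union(u, v)
    groups = {}
    for n in nodes:
        groups.setdefault(ds.find(n), set()).add(n)
    return list(groups.values())

def minimum_spanning_forest(nodes, edges):
    """Kruskal: take edges by increasing weight, skip any that close a cycle."""
    ds = DisjointSet(nodes)
    return [(u, v, w) for u, v, w in sorted(edges, key=lambda e: e[2])
            if ds.union(u, v)]

# Hypothetical weighted, undirected edges: (node, node, weight).
NODES = ["a", "b", "c", "d", "e", "f"]
EDGES = [("a", "b", 4), ("a", "c", 1), ("b", "c", 2), ("c", "d", 5), ("e", "f", 3)]

print(connected_components(NODES, EDGES))
print(minimum_spanning_forest(NODES, EDGES))
```

Because the sample graph has two components, the result is a spanning forest: one tree per component.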
Centrality
•Degree centrality: a node with a higher degree (more edges) has higher centrality.
•Eigenvector centrality: in addition to a node’s own degree, the centralities of its
neighbor nodes are considered. As a result, the eigenvector corresponding to the largest
eigenvalue of the adjacency matrix gives the centrality of each node in the network.
(PageRank builds on the same idea: a link from an important site counts for more.)
•Betweenness centrality: the number of shortest paths between pairs of other nodes that
pass through the i-th node is considered the i-th node’s betweenness centrality.
•Closeness centrality: the inverse of the total shortest-path length from the i-th node to
all other nodes in the network is considered the i-th node’s closeness centrality. With
this definition, for example, this centrality can be applied to the task of choosing a
suitable evacuation site in a city.
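Three of the measures above can be sketched directly from their definitions. The code is a minimal illustration for a small, connected, undirected graph. The normalization by n - 1 and the use of power iteration on A + I (which has the same eigenvectors as the adjacency matrix A) are assumptions made for this example, and betweenness is left out to keep it short.

```python
import math
from collections import deque

# Hypothetical connected, undirected graph as adjacency sets.
GRAPH = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c", "e"},
    "e": {"d"},
}

def degree_centrality(graph):
    """Degree divided by the largest possible degree, n - 1."""
    n = len(graph)
    return {v: len(nbrs) / (n - 1) for v, nbrs in graph.items()}

def bfs_distances(graph, source):
    """Hop counts from source to every reachable node."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        node = queue.popleft()
        for nbr in graph[node]:
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)
    return dist

def closeness_centrality(graph):
    """(n - 1) divided by the sum of shortest-path lengths; assumes connectivity."""
    n = len(graph)
    result = {}
    for v in graph:
        total = sum(bfs_distances(graph, v).values())
        result[v] = (n - 1) / total if total else 0.0
    return result

def eigenvector_centrality(graph, iterations=200, tol=1e-10):
    """Power iteration on A + I, normalized to unit length each round."""
    x = {v: 1.0 for v in graph}
    for _ in range(iterations):
        new = {v: x[v] + sum(x[u] for u in graph[v]) for v in graph}
        norm = math.sqrt(sum(val * val for val in new.values()))
        new = {v: val / norm for v, val in new.items()}
        if max(abs(new[v] - x[v]) for v in graph) < tol:
            return new
        x = new
    return x

print(degree_centrality(GRAPH))
print(closeness_centrality(GRAPH))
print(eigenvector_centrality(GRAPH))
```

In this graph "b", "c" and "d" tie on degree and closeness, while eigenvector centrality ranks "d" below the other two because one of its neighbors, "e", is poorly connected.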
Ad-hoc Topics
Are all graphs created equal?
Social Network Analysis
- Six Degrees of Separation

Suggested Readings –
https://www.cs.cornell.edu/home/kleinber/networks-book/
http://networksciencebook.com/
