5 Graph Data Science Basics Everyone Should Know
1. What is a graph?
Before you can understand graph data science, you need to understand graphs. At its most
fundamental, a graph is simply a different way of structuring data. Instead of rows and columns,
as in a traditional relational database table or dataframe, graphs use nodes (nouns) and
relationships (verbs) as their primary structure.
In a graph, nouns – people, places, things, organizations – are nodes. The relationships between
them are verbs: friends, works for, likes, and so on.
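As a minimal sketch of this structure, the nouns and verbs can be held in plain Python objects. The names (`Alice`, `Acme`, `WORKS_FOR`) and the `outgoing` helper are hypothetical, chosen only for illustration; a graph database would store and index the same information differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    start: str  # name of the node the relationship starts at
    verb: str   # the relationship type, e.g. WORKS_FOR
    end: str    # name of the node the relationship points to


# Nodes are the "nouns"; each one carries a label describing what it is.
nodes = {
    "Alice": {"label": "Person"},
    "Bob": {"label": "Person"},
    "Acme": {"label": "Organization"},
}

# Relationships are the "verbs" that connect two nodes.
relationships = [
    Relationship("Alice", "FRIENDS_WITH", "Bob"),
    Relationship("Alice", "WORKS_FOR", "Acme"),
    Relationship("Bob", "LIKES", "Acme"),
]


def outgoing(node):
    """Return (verb, target) pairs for relationships that start at `node`."""
    return [(r.verb, r.end) for r in relationships if r.start == node]


print(outgoing("Alice"))  # [('FRIENDS_WITH', 'Bob'), ('WORKS_FOR', 'Acme')]
```

Unlike a row in a table, each relationship here is a first-class record, so following a connection is a direct lookup rather than a join.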
Relationships (edges/links) connect nodes.
Graph data science brings together graph statistics, analytics, and ML to put data in context and
answer pressing questions.
Graph statistics, queries, and visualization drive exploration and insights. Graph statistics
provide basic measures about a graph, such as the number of nodes and the distribution of
relationships. Graph queries can answer questions at any depth of connection, whether 6 or 600
degrees of separation. Graph visualization empowers data experts to see their data and
explore patterns that bear further investigation.
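The statistics and query ideas above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a product API: the friendship graph and the function names `graph_statistics` and `degrees_of_separation` are hypothetical, and the graph is assumed to be undirected and small enough to hold in memory.

```python
from collections import Counter, deque

# Hypothetical undirected graph stored as node -> set of neighbors.
graph = {
    "Ann": {"Ben", "Cat"},
    "Ben": {"Ann", "Dan"},
    "Cat": {"Ann"},
    "Dan": {"Ben", "Eve"},
    "Eve": {"Dan"},
}


def graph_statistics(g):
    """Return node count, relationship count, and the degree distribution."""
    degrees = {node: len(neighbors) for node, neighbors in g.items()}
    node_count = len(g)
    relationship_count = sum(degrees.values()) // 2  # each edge is stored twice
    degree_distribution = Counter(degrees.values())  # degree -> number of nodes
    return node_count, relationship_count, degree_distribution


def degrees_of_separation(g, start, goal):
    """Breadth-first search: fewest hops from start to goal, or None."""
    seen = {start}
    queue = deque([(start, 0)])
    while queue:
        node, hops = queue.popleft()
        if node == goal:
            return hops
        for neighbor in g[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append((neighbor, hops + 1))
    return None  # the two nodes are not connected


print(graph_statistics(graph))
print(degrees_of_separation(graph, "Cat", "Eve"))  # 4
```

Breadth-first search visits nodes in order of distance, so the first time it reaches the goal it has found the smallest number of hops.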
Graph analytics builds on graph statistics by answering specific questions and gaining
insights from connections in existing or historical data. Graph queries and algorithms are
typically applied together in “recipes” during graph analytics, and the results are used
directly for analysis.
Graph-enhanced ML is the application of graph data and analytics results to train ML models
or support probabilistic decisions within an AI system. Graph statistics and analytics are
often used together to answer certain types of questions about complex systems, and the
resulting insights are then applied to improve ML.
Link prediction fills in the blanks in your data and predicts changes in your graph’s structure.
Link prediction is a common machine learning task applied to graphs: training a model to learn
where relationships should exist between pairs of nodes in a graph. You can think of link
prediction as building a model to predict missing relationships in your dataset or relationships
that are likely to form in the future. With graph data science, you can train supervised ML models
based on the relationships and node properties in your graph to predict the existence – and
probability – of relationships.
Node embedding transforms the topology and features of your graph into a low-dimensional vector
representation of each node. These vectors, also called embeddings, can be used for exploratory
data analysis, similarity measurements, and ML. Node embeddings can aggregate information about
a node’s position in the graph, its local neighbors, its centrality and influence, and in some
cases, other numeric node properties.
Node classification models predict the class of nodes in your graph. A class can be a binary
indicator, like whether a user account is engaged in fraud, or a multivalued indicator, such as
which market segment a customer belongs to. Node classification models can be trained to predict
which class nodes (including any new nodes) belong to. Node classification can incorporate a
broad range of input features, including the network structure of your graph and properties from
your source data.
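To make the link prediction idea concrete, here is a minimal sketch that scores unconnected pairs of nodes by how many neighbors they share. Note the substitution: this is the common-neighbors heuristic, a simple unsupervised stand-in, not the supervised ML model the text describes. The graph and the `predict_links` function are hypothetical.

```python
from itertools import combinations

# Hypothetical undirected graph stored as node -> set of neighbors.
graph = {
    "Ann": {"Ben", "Cat", "Dan"},
    "Ben": {"Ann", "Cat"},
    "Cat": {"Ann", "Ben", "Eve"},
    "Dan": {"Ann", "Eve"},
    "Eve": {"Cat", "Dan"},
}


def predict_links(g):
    """Score every unconnected pair by its number of shared neighbors."""
    scores = {}
    for a, b in combinations(sorted(g), 2):
        if b in g[a]:
            continue  # the relationship already exists
        shared = len(g[a] & g[b])
        if shared:
            scores[(a, b)] = shared
    # Highest score first; ties are broken alphabetically for stable output.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


for pair, score in predict_links(graph):
    print(pair, score)
```

In a supervised setting, a score like this would typically be one input feature among several, with known relationships serving as training labels.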
[Figure: four questions: "Which parts of my graph are connected to each other?", "Which nodes are
most similar?", "Where will connections form next?", and "What's the label for this node?"
Techniques listed include centrality and embeddings.]
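Two of these ideas, centrality and node similarity, can be sketched with basic set arithmetic. The choice of measures is an assumption made for illustration: degree centrality and Jaccard similarity are among the simplest options, and the source does not say which measures it has in mind. The graph and function names are hypothetical.

```python
# Hypothetical undirected graph stored as node -> set of neighbors.
graph = {
    "Ann": {"Ben", "Cat", "Dan"},
    "Ben": {"Ann", "Cat"},
    "Cat": {"Ann", "Ben"},
    "Dan": {"Ann"},
}


def degree_centrality(g):
    """Fraction of the other nodes each node is directly connected to."""
    others = len(g) - 1
    return {node: len(neighbors) / others for node, neighbors in g.items()}


def jaccard_similarity(g, a, b):
    """Shared neighbors divided by all neighbors of either node."""
    union = g[a] | g[b]
    return len(g[a] & g[b]) / len(union) if union else 0.0


print(degree_centrality(graph)["Ann"])         # 1.0
print(jaccard_similarity(graph, "Ben", "Dan"))  # 0.5
```

Ann is connected to every other node, so her degree centrality is 1.0. Ben and Dan share one neighbor (Ann) out of two distinct neighbors in total, which gives a similarity of 0.5.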
While use cases for graph data science span industries and lines of business – from life sciences to
manufacturing – a few use cases are rapidly becoming the most popular among data scientists.
IT
• Network Monitoring
• Cybersecurity
• DevOps
HR
• Training
• Upskilling & Retention
• Promotions
Customer 360
Across the globe, businesses try to better understand their customers and improve lifetime value
(LTV). With graph data science, customer knowledge can become more accurate and complete
through entity resolution. This process looks at all the database entries and identifies duplicates.
Creating a complete, master database entry for each customer instead of having multiple,
incomplete entries improves LTV and deepens customer knowledge, allowing for optimized
marketing programs and offers.
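Entity resolution as described above can be sketched by treating each database entry as a node and linking entries that share an identifying value. This is a simplified illustration, assuming exact matches on email or phone are enough to call two entries duplicates; real pipelines usually add fuzzy matching. The records, field names, and `resolve_entities` function are all hypothetical.

```python
# Hypothetical customer entries; ids 1 to 3 describe the same person.
records = [
    {"id": 1, "name": "Jane Doe", "email": "jane@example.com", "phone": None},
    {"id": 2, "name": "J. Doe", "email": "jane@example.com", "phone": "555-0100"},
    {"id": 3, "name": "Jane D.", "email": None, "phone": "555-0100"},
    {"id": 4, "name": "Sam Roe", "email": "sam@example.com", "phone": None},
]


def resolve_entities(rows, keys=("email", "phone")):
    """Group record ids that share any non-empty identifying value."""
    parent = {row["id"]: row["id"] for row in rows}  # union-find forest

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving keeps trees shallow
            x = parent[x]
        return x

    first_seen = {}  # (key, value) -> id of the first record with that value
    for row in rows:
        for key in keys:
            value = row.get(key)
            if not value:
                continue  # missing values never link records
            if (key, value) in first_seen:
                parent[find(row["id"])] = find(first_seen[(key, value)])
            else:
                first_seen[(key, value)] = row["id"]

    groups = {}
    for row in rows:
        groups.setdefault(find(row["id"]), []).append(row["id"])
    return sorted(groups.values())


print(resolve_entities(records))  # [[1, 2, 3], [4]]
```

Records 1 and 3 share no field directly, but both connect to record 2, so they land in the same group. Following chains of connections like this is what makes a graph a natural fit for the task.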
Recommendation engines
Recommendation engines became well known through Netflix and online shopping experiences.
However, they have uses across the business. From product development to human resources,
where they help retain employees by recommending upskilling and training, recommendation
engines power some of the most important parts of a business.
Data scientists are typically the primary users of graph data science
tools because they are practitioners of data science with deep
knowledge of algorithms and models.
Organizations of all sizes, all industries, and within all departments are using graph data science
to make recommendations, identify anomalies and find fraudsters, improve customer knowledge,
and optimize supply chains.
Neo4j is the world’s leading graph data platform. We help organizations – including Comcast, ICIJ,
NASA, UBS, and Volvo Cars – capture the rich context of the real world that exists in their data
to solve challenges of any size and scale. Our customers transform their industries by curbing
financial fraud and cybercrime, optimizing global networks, accelerating breakthrough research,
and providing better recommendations. Neo4j delivers real-time transaction processing, advanced
AI/ML, intuitive data visualization, and more. Find us at neo4j.com and follow us at @Neo4j.
Questions about Neo4j? Contact us around the globe:
[email protected]
neo4j.com/contact-us
© 2022 Neo4j, Inc.