Neo4j

Graph Data Modeling

Departament de Ciències de la Computació
Graph Data Modeling

1. Introduction to Graph Data Modeling

2. Designing the Initial Graph Data Model

3. Graph Data Modeling Core Principles

4. Common Graph Structures

5. Refactoring and Evolving a Graph Data Model

Bases de Dades no Relacionals. Neo4j

What is Graph Data Modeling?

Graph data modeling is a collaborative effort by stakeholders, including developers.

Stakeholders include business analysts, architects, managers, project leaders, and others.

The application domain is analyzed by stakeholders and developers

▪ They develop a data model
▪ Stakeholders must understand the domain and provide answers

Neo4j is a full-featured graph database

▪ It includes tools used to create property graphs
▪ Applications retrieve data for business use cases by traversing the graph

Neo4j Property Graph Model

• Nodes (Entities)

• Relationships

• Properties

• Labels

Graph Traversal

MATCH (r:Residence)<-[:OWNS]-(p:Person)
WHERE r.address = '475 Broad Street'
RETURN p
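
The traversal above assumes data shaped roughly as follows. This is only a minimal sketch: the `name` property and its value are made up for illustration, while the labels, the `OWNS` type, and the address come from the query itself.

```cypher
// One Person node and one Residence node, each with a label and properties,
// connected by a directed OWNS relationship.
CREATE (p:Person {name: 'Alex'})                    // 'Alex' is a hypothetical value
CREATE (r:Residence {address: '475 Broad Street'})
CREATE (p)-[:OWNS]->(r)
```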

2. Designing the Initial Graph Data Model

Designing the Initial Data Model

1. Understand the domain

2. Create high-level sample data

3. Define specific questions for the application

4. Identify entities

5. Identify connections between entities

6. Test the questions against the model

7. Test scalability

Identify Entities from Questions

Entities are the nouns in the application questions:

▪ What ingredients are used in a recipe?

▪ Who is married to this person?

o The generic nouns often become labels in the model

o Use domain knowledge when deciding how to further group or differentiate entities
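
As a sketch of how the nouns in the two sample questions could become labels, assuming a `name` property and sample values that are not part of the slides:

```cypher
// Nouns from the questions (recipe, ingredient, person) become node labels.
CREATE (:Recipe {name: 'Pancakes'})
CREATE (:Ingredient {name: 'Flour'})
CREATE (:Person {name: 'Alex'})
CREATE (:Person {name: 'Sam'})
```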

Define Properties

Two purposes for properties:

1. Unique identification
2. Answering application questions
Any other property is decoration and should not be added

Properties are used for:

– Anchoring (where to begin the query)
– Traversing the graph (navigation)
– Returning data from the query
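
The three uses can appear in a single query. The sketch below assumes a hypothetical recipe model; the `USES` type and the `name`, `optional`, and `quantity` properties are illustrative only.

```cypher
// Anchoring: the name property selects the starting Recipe node.
MATCH (r:Recipe {name: 'Pancakes'})-[u:USES]->(i:Ingredient)
// Traversing: a property decides which paths are kept.
WHERE u.optional = false
// Returning: properties handed back to the application.
RETURN i.name, u.quantity
```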

Identify Connections Between Entities

Connections are the verbs in the application questions:

▪ What ingredients are used in a recipe?

▪ Who is married to this person?
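
A sketch of how the verbs in these questions might turn into relationship types. The type names `USES` and `MARRIED_TO`, and the sample property values, are assumptions made for illustration.

```cypher
// "used in" becomes a USES relationship type.
MATCH (:Recipe {name: 'Pancakes'})-[:USES]->(i:Ingredient)
RETURN i.name;

// "is married to" becomes MARRIED_TO; the pattern has no arrow,
// so the relationship is matched in either direction.
MATCH (:Person {name: 'Alex'})-[:MARRIED_TO]-(spouse:Person)
RETURN spouse.name;
```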

Naming Relationships

▪ Stakeholders must agree on a name (type) for the relationship

▪ Avoid names that could be construed as nouns (e.g. email)

Do not do this: Instead do this:
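
The slide's own figure is not reproduced here. As an illustrative assumption of the noun-versus-verb point, compare a type named `EMAIL` with one named `EMAILED` (both names and the sample people are hypothetical):

```cypher
// Ambiguous: EMAIL reads like a noun (an address? a message?).
// CREATE (a)-[:EMAIL]->(b)

// Clearer: a verb states what one person did to the other.
MATCH (a:Person {name: 'Alex'}), (b:Person {name: 'Sam'})
CREATE (a)-[:EMAILED]->(b)
```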

Direction and Type

Direction and type are required for relationships

Select direction and type based on expected questions:

1. What episode follows ‘The Ark in Space’? (NEXT)

2. What episode came before ‘Genesis of the Daleks’? (PREVIOUS)
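
The two questions could be written as follows. This is a sketch that assumes an `Episode` label and a `title` property; only the `NEXT` and `PREVIOUS` types come from the slide.

```cypher
// Question 1: follow NEXT forward from the known episode.
MATCH (:Episode {title: 'The Ark in Space'})-[:NEXT]->(e:Episode)
RETURN e.title;

// Question 2: follow PREVIOUS from the known episode.
MATCH (:Episode {title: 'Genesis of the Daleks'})-[:PREVIOUS]->(e:Episode)
RETURN e.title;
```

Cypher can also traverse a stored `NEXT` relationship against its direction with `<-[:NEXT]-`, so whether to store `PREVIOUS` as well is a modeling decision.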

Node Fanout

Before: a single Person node holds every property

Person
firstName: 'Patrick'
lastName: 'Scott'
age: 34
addr1: 'Flat 3B'
addr2: '83 Landor St'
city: 'Axebridge'
postalCode: 'DF3 0AS'

After: the address properties fan out to a Residence node, connected to Person by :LIVES_AT

Person
firstName: 'Patrick'
lastName: 'Scott'
age: 34

Residence
addr1: 'Flat 3B'
addr2: '83 Landor St'
city: 'Axebridge'
postalCode: 'DF3 0AS'
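
A minimal sketch of the fanned-out form in Cypher, using the property values shown on the slide. The direction of `LIVES_AT` (from Person to Residence) is an assumption.

```cypher
// The address properties live on their own Residence node.
CREATE (p:Person {firstName: 'Patrick', lastName: 'Scott', age: 34})
CREATE (r:Residence {addr1: 'Flat 3B', addr2: '83 Landor St',
                     city: 'Axebridge', postalCode: 'DF3 0AS'})
CREATE (p)-[:LIVES_AT]->(r)   // direction assumed
```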

How Much Node Fanout?

3. Graph Data Modeling Core Principles

Graph Modeling Core Principles

● Nodes
○ Uniqueness
○ Fanout
● Relationships
○ Naming best practices
○ Semantic redundancy
○ Types vs. Properties
● Properties
● Data object accessibility

Node Best Practices
Uniqueness of Nodes: Before
Notes:
▪ Country nodes are considered super nodes (a node with lots of fan-in or fan-out)
▪ Be careful when using them in a design
▪ Be aware of queries that might select all paths in or out of a super node

Node Best Practices
Uniqueness of Nodes: After
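
One way to keep such nodes unique is a uniqueness constraint combined with `MERGE`. The sketch below assumes Neo4j 4.4 or later for the constraint syntax (older releases use a different form); the `name` property, the `LIVES_IN` type, and the sample values are hypothetical.

```cypher
// Reject a second Country node with the same name.
CREATE CONSTRAINT country_name_unique IF NOT EXISTS
FOR (c:Country) REQUIRE c.name IS UNIQUE;

// MERGE reuses the existing Country node instead of creating a duplicate.
MATCH (p:Person {name: 'Alex'})
MERGE (c:Country {name: 'Sweden'})
MERGE (p)-[:LIVES_IN]->(c);
```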

Complex Data
Use Fanout Judiciously for Complex Data
▪ Reduce property duplication
▪ Reduce gather-and-inspect

20
Best Practices for Modeling Relationships

Data models should address:

• Using specific relationship types

• Using types vs. properties

• Reducing symmetric relationships

Using Specific Relationship Types

But Not Too Specific

Do Not Use Symmetric Relationships
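
A sketch of the idea, assuming a hypothetical `MARRIED_TO` type: the relationship is stored once, and the query leaves the direction out.

```cypher
// Store the relationship once, in one direction...
MATCH (a:Person {name: 'Alex'}), (b:Person {name: 'Sam'})
MERGE (a)-[:MARRIED_TO]->(b);

// ...and query it without a direction, so it matches from either side.
MATCH (:Person {name: 'Sam'})-[:MARRIED_TO]-(spouse:Person)
RETURN spouse.name;
```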

Semantics of Symmetry are Important

Using Types vs. Properties
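
To illustrate the trade-off, the sketch below answers the same question twice, once with a relationship property and once with a relationship type. The `Movie` label, the `RATED` and `RATED_5` types, and the `stars` property are hypothetical.

```cypher
// Property version: every RATED relationship must be read to check stars.
MATCH (:Person {name: 'Alex'})-[r:RATED]->(m:Movie)
WHERE r.stars = 5
RETURN m.title;

// Type version: only relationships of the wanted type are followed.
MATCH (:Person {name: 'Alex'})-[:RATED_5]->(m:Movie)
RETURN m.title;
```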

Property Best Practices

▪ Property lookups have a cost

▪ Parsing a complex property adds more cost

▪ Anchors and properties used for traversal should be as simple as possible

▪ Identifiers, outputs, and decoration are OK as complex values
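
A sketch of the difference between a complex and a simple anchor property. The `fullAddress` property is a hypothetical composite value; `postalCode` is the property used in the fanout example.

```cypher
// Costly: a complex value must be searched for every candidate node.
MATCH (r:Residence)
WHERE r.fullAddress CONTAINS 'DF3 0AS'
RETURN r;

// Cheaper: a simple property is compared directly and is easy to index.
MATCH (r:Residence {postalCode: 'DF3 0AS'})
RETURN r;
```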

Best practices for Data Accessibility

For each query, how much work must Neo4j do to evaluate if the
traversal represents a “good” or a “bad” path?

Hierarchy of Accessibility
For each data object, how much work must Neo4j do to evaluate if this is a “good” path or a “bad” one?

Most accessible (least processing required)
1. Anchor node label; anchor node properties (indexed)
2. Relationship type
3. Anchor node properties (non-indexed)
4. Downstream node labels
5. Relationship properties; downstream node properties
Least accessible (most processing required)
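
The levels of the hierarchy can be mapped onto one query. This is a sketch only: it assumes index syntax from recent Neo4j releases, and the `since` relationship property is hypothetical.

```cypher
// An index moves the anchor property to the most accessible level.
CREATE INDEX residence_address IF NOT EXISTS
FOR (r:Residence) ON (r.address);

MATCH (r:Residence {address: '475 Broad Street'})  // 1. anchor label + indexed property
      <-[o:OWNS]-                                  // 2. relationship type
      (p:Person)                                   // 4. downstream node label
WHERE o.since > 2015                               // 5. relationship property
RETURN p;
```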

4. Common Graph Structures

Common Graph Structures

● Intermediate node

● Linked list

● Timeline tree

● Multiple structures in a single model

Intermediate Nodes

Create intermediate nodes when you need to:

▪ Connect more than two nodes in a single context

▪ Relate something to a relationship
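
As a sketch of the first case, a hypothetical `Employment` node can tie a person, a company, and a role together in one context. All labels, types, and values below are assumptions made for illustration.

```cypher
// A single relationship connects only two nodes; the intermediate
// Employment node connects three in one context.
CREATE (p:Person {name: 'Alex'})
CREATE (c:Company {name: 'Acme'})
CREATE (r:Role {title: 'Engineer'})
CREATE (e:Employment {startYear: 2019})
CREATE (p)-[:WORKED]->(e)
CREATE (e)-[:AT_COMPANY]->(c)
CREATE (e)-[:IN_ROLE]->(r)
```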

Intermediate Nodes

Intermediate Nodes: Sharing Context

Intermediate Nodes: Sharing Data

Intermediate Nodes: Organizing Data

Linked Lists

Do NOT

Interleaved Linked List

Head and Tail of Linked List
Some possible use cases:
▪ Add episodes as they are broadcast
▪ Maintain pointer to first and last episodes
▪ Find all broadcast episodes
▪ Find latest broadcast episode
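
A sketch of these use cases, assuming a hypothetical `Series` node that holds `FIRST` and `LAST` pointers into a chain of `Episode` nodes linked by `NEXT`. The series name and episode title are made up.

```cypher
// Add a newly broadcast episode and move the LAST pointer to it.
MATCH (s:Series {name: 'Example Series'})-[old:LAST]->(prev:Episode)
CREATE (e:Episode {title: 'New Episode'})
CREATE (prev)-[:NEXT]->(e)
CREATE (s)-[:LAST]->(e)
DELETE old;

// Find the latest broadcast episode: one hop from the series node.
MATCH (:Series {name: 'Example Series'})-[:LAST]->(e:Episode)
RETURN e.title;

// Find all broadcast episodes: start at FIRST and follow NEXT.
MATCH (:Series {name: 'Example Series'})-[:FIRST]->(first:Episode)
MATCH (first)-[:NEXT*0..]->(e:Episode)
RETURN e.title;
```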

Timeline Tree
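
The slide's figure is not reproduced here. A timeline tree is commonly built as Year, Month, and Day nodes, with events attached to the Day level. The sketch below assumes that layout; every label, type, property, and date in it is hypothetical.

```cypher
// Walk down the tree from year to month to day, then collect the
// episodes attached to that day.
MATCH (:Year {value: 1975})-[:HAS_MONTH]->(:Month {value: 1})
      -[:HAS_DAY]->(d:Day {value: 25})<-[:BROADCAST_ON]-(e:Episode)
RETURN e.title
```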

Using Multiple Structures

Using the Timeline Tree

Using Intermediate Nodes

Using Linked Lists

5. Refactoring and Evolving a Graph Data Model

What is Refactoring?
Important: Your model depends on your data and your queries

Refactoring is the process of:

– Changing the data structure
– Without altering its semantic meaning

Refactoring often involves moving data from one structure to another

Sometimes refactoring involves adding data from other sources

The most common type of refactoring is to:
– Restructure the graph to use a property value
– Use that property value to create a label, a node, or a relationship

Hierarchy of Accessibility (reminder)
For each data object, how much work must Neo4j do to evaluate if this is a “good” path or a “bad” one?

Most accessible (least processing required)
1. Anchor node label; anchor node properties (indexed)
2. Relationship type
3. Anchor node properties (non-indexed)
4. Downstream node labels
5. Relationship properties; downstream node properties
Least accessible (most processing required)

Why Refactor?

Data models can be optimized for:

– Query performance
– Model simplicity & intuitiveness
– Query simplicity (i.e., simpler Cypher strings)
– Easier data updates

Note: Improving behavior in one of these areas frequently involves sacrifices in others

Another important reason to refactor is to accommodate new application questions in the same model

Goal: Eliminate Duplicate Data in Properties

Refactor Example: Extracting Nodes From Properties
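
A sketch of this kind of refactor, assuming Person nodes that carry a hypothetical `country` property. The `Country` label and the `LIVES_IN` type are also assumptions.

```cypher
// Turn a repeated property value into one shared node per value.
MATCH (p:Person)
WHERE p.country IS NOT NULL
MERGE (c:Country {name: p.country})   // one node per distinct value
MERGE (p)-[:LIVES_IN]->(c)
REMOVE p.country                      // drop the duplicated property
```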

Goal: Use Labels Instead of Property Values

Refactor Example: Turn Property Values
into Labels for Nodes
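
A sketch of this refactor, assuming hypothetical Movie nodes with a `genre` property whose value becomes a label:

```cypher
// Nodes with genre 'Comedy' gain a Comedy label and lose the property.
MATCH (m:Movie)
WHERE m.genre = 'Comedy'
SET m:Comedy
REMOVE m.genre
```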

Goal: Use Nodes Instead of Properties for Relationships

Possible dense node

Refactor: Extract Nodes from Relationship Properties

5. Refactoring and Evolving a Graph Data Model: Example

Refactoring example: Modeling airline flights

Leonardo DiCaprio as Frank Abagnale in the Steven Spielberg movie “Catch Me If You Can”

Credit: Max De Marzi https://maxdemarzi.com/2015/08/26/modeling-airline-flights-in-neo4j/

Refactoring example: Modeling airline flights
Important: Your model depends on your data and your queries

Our data → Airports and Flights between them

Ask yourself:
• What are the entities?

• What are the connections between the entities?

• What properties do we need?

Initial Question for Our Model

▪ What flights will take me from Malmo to New York on Friday?

Initial Model

Question: What flights will take me from Malmo to New York on Friday?

Comment: The concept of a Flight is expressed as a relationship

The model can answer this question, so the model seems fine
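
With a flight stored as a relationship, the question might be asked as below. This is a sketch under assumptions: the `FLIES_TO` type, the `city`, `date`, `code`, and `departs` properties, and the specific date are all hypothetical.

```cypher
// Anchor on the two airports, then filter the flight relationships by date.
MATCH (:Airport {city: 'Malmo'})-[f:FLIES_TO]->(:Airport {city: 'New York'})
WHERE f.date = date('2024-05-17')   // hypothetical travel date
RETURN f.code, f.departs
```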

Initial Model

Question 1: What flights will take me from Malmo to New York on Friday?

Question 2: Mom is on flight AY189. When will she land?

Comment: To find flight AY189, the query must check every flight relationship in the graph,
because without a relationship index there is no node to anchor on. This query is very inefficient!
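
A sketch of Question 2 against this model shows the problem. The `FLIES_TO` type and the `code`, `arrives`, and `city` properties are assumed names.

```cypher
// No airport is known up front, so every Airport node and every
// FLIES_TO relationship is a candidate until its code is checked.
MATCH (:Airport)-[f:FLIES_TO {code: 'AY189'}]->(dest:Airport)
RETURN dest.city, f.arrives
```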

Initial Model

Question 1: What flights will take me from Malmo to New York on Friday?

More questions:
• What if we want to connect Customers or Staff to a flight? → Not possible!
• What if a flight was rerouted to another Airport due to weather? → Not possible!

Given some of the queries we imagine for our data, a flight really should be a node

Refactor: Create Intermediate Flight Nodes
Question 1: What flights will take me from Malmo to New York on Friday?

Question 2: Mom is on flight AY189. When will she land?

Adding Flight nodes lets queries anchor on flight data, reducing traversal
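
A sketch of both questions once a flight is a node. The `ORIGIN` and `DESTINATION` types, the property names, and the date are assumptions made for illustration.

```cypher
// Question 2: anchor directly on the Flight node by its code.
MATCH (f:Flight {code: 'AY189'})-[:DESTINATION]->(dest:Airport)
RETURN dest.city, f.arrives;

// Question 1: from Malmo, inspect each departing flight's date and destination.
MATCH (:Airport {city: 'Malmo'})<-[:ORIGIN]-(f:Flight)
      -[:DESTINATION]->(:Airport {city: 'New York'})
WHERE f.date = date('2024-05-17')   // hypothetical travel date
RETURN f.code;
```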

Note: Airlines are required to publish flight plans 12 months in advance.

How much work must Neo4j do to answer Question 1?
• Neo4j must check every flight leaving Malmo, then consult the flight data. Then it checks which of those flights land in the desired place!
How can we elevate the flight date for better efficiency?

Refactor: Create AirportDay Intermediate Nodes
Question 1: What flights will take me from Malmo to New York on Friday?

Question 2: Mom is on flight AY189. When will she land?

Adding the Intermediate node AirportDay:

▪ It reduces the number of relationships on Airport nodes, since there are fewer days than flights

▪ We still need to check every AirportDay to find the right date, but the traversals are reduced
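
A sketch of Question 1 with AirportDay nodes. Only `AirportDay` and `HAS_DAY` appear in the slides; the `ORIGIN` and `DESTINATION` types, the property names, and the date are assumptions. For simplicity it also assumes the flight lands on the same calendar day.

```cypher
// Pick the right day first, then look only at that day's flights.
MATCH (:Airport {city: 'Malmo'})-[:HAS_DAY]->(d:AirportDay)
WHERE d.date = date('2024-05-17')   // hypothetical travel date
MATCH (d)<-[:ORIGIN]-(f:Flight)-[:DESTINATION]->(:AirportDay)
      <-[:HAS_DAY]-(:Airport {city: 'New York'})
RETURN f.code
```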

If the model changes, we must check that older queries still work. What about Q1 and Q2? Both are still OK.
But how can we reduce wasted traversal even further for dates?

Possible Refactor: Change Relationship Type to Date
Question 1: What flights will take me from Malmo to New York on Friday?

Question 2: Mom is on flight AY189. When will she land?

We make date a relationship type

▪ It hardly changes the model, but performance improves. Now, we can traverse only to the
relevant AirportDay. And Q2 is unaffected.
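
A sketch of Question 1 when the date is encoded in the relationship type. The type name `ON_2024_05_17`, the `ORIGIN` and `DESTINATION` types, and the property names are hypothetical; the query also assumes arrival on the same day.

```cypher
// The date-specific type leads straight to the one relevant AirportDay,
// so no other day nodes are visited.
MATCH (:Airport {city: 'Malmo'})-[:ON_2024_05_17]->(d:AirportDay)
MATCH (d)<-[:ORIGIN]-(f:Flight)-[:DESTINATION]->(:AirportDay)
      <-[:ON_2024_05_17]-(:Airport {city: 'New York'})
RETURN f.code
```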

Comment: Are Airport nodes necessary? If we remove them, then:
• We could remove a modest number of Airport nodes and many HAS_DAY relationships

Possible Refactor: Remove Airport Nodes
Question 1: What flights will take me from Malmo to New York on Friday?

Question 2: Mom is on flight AY189. When will she land?

We remove Airport nodes → the model is less intuitive but more efficient

Comment: But what if no direct flight is available? How do we find an itinerary (connecting flights)?
The query must check each flight and its destinations, and second-order destinations... → Inefficient!!

Refactor: Add Destination Intermediate Nodes
Question 1: What flights will take me from Malmo to New York on Friday?

Question 2: Mom is on flight AY189. When will she land?

Adding the intermediate node Destination → queries on destination are efficient

The scope of the graph grows proportionally to the number of Destinations served by an airport,
not the number of Flights. Airports have multiple flights per destination (at different times of day)

Comment: Does this refactor affect Q2? No
