neo4j_sessio11_graphDataModeling
Non-Relational Databases. Neo4j
Neo4j Property Graph Model
• Nodes (Entities)
• Relationships
• Properties
• Labels
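The four building blocks can be seen together in a minimal Cypher sketch. The person's name and the `since` property are hypothetical; the `Person` and `Residence` labels, the `OWNS` type and the address are chosen to match the traversal query that follows:

```cypher
// Two nodes (entities), each with a label and properties,
// joined by one typed, directed relationship that has its own property.
// 'Ann Smith' and since: 2015 are hypothetical values.
CREATE (p:Person {name: 'Ann Smith'})
CREATE (r:Residence {address: '475 Broad Street'})
CREATE (p)-[:OWNS {since: 2015}]->(r);
```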
Graph Traversal
MATCH (r:Residence)<-[:OWNS]-(p:Person)
WHERE r.address = '475 Broad Street'
RETURN p
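This traversal starts (anchors) at `Residence` nodes filtered by `address`. A sketch, assuming Neo4j 4.x/5 syntax, of an index that lets the planner find that anchor with an index seek instead of scanning every `Residence` node; the index name `residence_address` and the `$address` parameter are hypothetical:

```cypher
// Index the anchor property so the MATCH below can start with an index seek.
// The index name is a hypothetical choice.
CREATE INDEX residence_address IF NOT EXISTS
FOR (r:Residence) ON (r.address);

// Same traversal as above, with the address passed as a parameter.
MATCH (r:Residence)<-[:OWNS]-(p:Person)
WHERE r.address = $address
RETURN p;
```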
Graph Data Modeling
4. Identify entities
7. Test scalability
Identify Entities from Questions
Define Properties
Identify Connections Between Entities
Naming Relationships
Direction and Type
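A hypothetical illustration of the naming conventions recommended in the Neo4j Cypher manual (labels in UpperCamelCase, relationship types in UPPER_SNAKE_CASE, property keys in lowerCamelCase) and of direction: each relationship is stored with exactly one direction, but a query may traverse it from either end. The `Company` label, the `WORKS_FOR` type and the values are assumptions:

```cypher
// Label: Person, Company. Type: WORKS_FOR. Properties: firstName, name.
// Hypothetical data; the relationship is stored in one direction ...
CREATE (:Person {firstName: 'Patrick'})-[:WORKS_FOR]->(:Company {name: 'Acme'});

// ... but can be traversed starting from the other end.
MATCH (c:Company {name: 'Acme'})<-[:WORKS_FOR]-(p:Person)
RETURN p.firstName;
```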
Node Fanout
firstName: 'Patrick'
lastName: 'Scott'
age: 34
addr1: 'Flat 3B'
addr2: '83 Landor St'
city: 'Axebridge'
postalCode: 'DF3 0AS'
Person
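A sketch of fanning out the address properties of the `Person` node above into a node of its own. The `Address` label and the `LIVES_AT` type are hypothetical; `MERGE` on the address fields lets several people at the same address share one node instead of duplicating the values:

```cypher
// Move Patrick Scott's address properties into a (possibly shared) Address node.
// Address and LIVES_AT are hypothetical names.
MATCH (p:Person {firstName: 'Patrick', lastName: 'Scott'})
MERGE (a:Address {addr1: p.addr1, addr2: p.addr2,
                  city: p.city, postalCode: p.postalCode})
MERGE (p)-[:LIVES_AT]->(a)
REMOVE p.addr1, p.addr2, p.city, p.postalCode;
```

`MERGE` fails if any of the merged properties is null, so nodes with incomplete addresses would need separate handling.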
How Much Node Fanout?
Graph Data Modeling
● Nodes
○ Uniqueness
○ Fanout
● Relationships
○ Naming best practices
○ Semantic redundancy
○ Types vs. Properties
● Properties
● Data object accessibility
Node Best Practices
Uniqueness of Nodes: Before
Notes:
▪ Country nodes are considered super nodes (a node with lots of fan-in or fan-out)
▪ Be careful when using them in a design
▪ Be aware of queries that might select all paths in or out of a super node
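A sketch, assuming Neo4j 5 constraint syntax, of keeping exactly one node per country and of spotting super nodes by counting relationships. The constraint name and the country value are hypothetical:

```cypher
// One node per country: the uniqueness constraint plus MERGE make repeated
// imports reuse the existing node instead of creating a duplicate.
CREATE CONSTRAINT country_name IF NOT EXISTS
FOR (c:Country) REQUIRE c.name IS UNIQUE;

MERGE (c:Country {name: 'Spain'});

// Find candidate super nodes: countries with the most relationships.
MATCH (c:Country)--()
RETURN c.name AS country, count(*) AS degree
ORDER BY degree DESC
LIMIT 10;
```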
Node Best Practices
Uniqueness of Nodes: After
Complex Data
Use Fanout Judiciously for Complex Data
▪ Reduce property duplication
▪ Reduce gather-and-inspect
Best Practices for Modeling Relationships
But Not Too Specific
Do Not Use Symmetric Relationships
Semantics of Symmetry are Important
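Because Cypher can match a relationship while ignoring its direction, a single relationship is enough to record a symmetric fact. A minimal sketch with hypothetical names (`MARRIED_TO`, 'Alice', 'Bob'):

```cypher
// Store the symmetric fact once, in an arbitrary direction ...
CREATE (:Person {name: 'Alice'})-[:MARRIED_TO]->(:Person {name: 'Bob'});

// ... and query it without an arrow, so it is found from either side.
MATCH (:Person {name: 'Bob'})-[:MARRIED_TO]-(spouse:Person)
RETURN spouse.name;
```

Storing both directions would double the relationships and make undirected queries return each pair twice. This only applies when the meaning really is symmetric: for a type such as a hypothetical `FOLLOWS`, A following B says nothing about B following A, so each direction is a separate fact.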
Using Types vs. Properties
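A hypothetical comparison of the two options for distinguishing a home address: a property on a generic relationship versus a dedicated relationship type. The `HAS_ADDRESS`, `HOME_ADDRESS` and `type` names are assumptions:

```cypher
// Property-based: each HAS_ADDRESS relationship must have its 'type'
// property read before the traversal can decide whether to follow it.
MATCH (p:Person {firstName: 'Patrick'})-[r:HAS_ADDRESS]->(a:Address)
WHERE r.type = 'home'
RETURN a;

// Type-based: only HOME_ADDRESS relationships are followed at all.
MATCH (p:Person {firstName: 'Patrick'})-[:HOME_ADDRESS]->(a:Address)
RETURN a;
```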
Property Best Practices
Best practices for Data Accessibility
For each query, how much work must Neo4j do to evaluate if the traversal represents a “good” or a “bad” path?
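One way to measure that work is to prefix a query with `PROFILE`, which runs it and reports the rows and database hits (db hits) of each operator in the execution plan. A sketch reusing the traversal from the Graph Traversal slide:

```cypher
// Run the query and report per-operator rows and db hits.
PROFILE
MATCH (r:Residence)<-[:OWNS]-(p:Person)
WHERE r.address = '475 Broad Street'
RETURN p;
```

Comparing the profile before and after a modeling change (for example, adding an index or replacing a property with a relationship type) shows whether the change actually reduced the work.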
Hierarchy of Accessibility
For each data object, how much work must Neo4j do to evaluate if this is a “good” path or a “bad” one?
● Intermediate node
● Linked list
● Timeline tree
Intermediate Nodes
Intermediate Nodes
Intermediate Nodes: Sharing Context
Intermediate Nodes: Sharing Data
Intermediate Nodes: Organizing Data
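A hypothetical employment example of an intermediate node: a direct `WORKS_FOR` relationship can only hold properties, whereas an `Employment` node can itself be connected to other nodes, such as the `Role` held, and can share that context. All labels, types and values here are assumptions:

```cypher
// Instead of (:Person)-[:WORKS_FOR {from, role}]->(:Company),
// turn the employment into a node that other nodes can connect to.
// Employment, Role, HAS_EMPLOYMENT, AT and AS are hypothetical names.
MATCH (p:Person {firstName: 'Patrick', lastName: 'Scott'})
MERGE (c:Company {name: 'Acme'})
MERGE (r:Role {title: 'Engineer'})
CREATE (e:Employment {from: date('2019-04-01')})
CREATE (p)-[:HAS_EMPLOYMENT]->(e),
       (e)-[:AT]->(c),
       (e)-[:AS]->(r);
```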
Linked Lists
Do NOT
Interleaved Linked List
Head and Tail of Linked List
Some possible use cases:
▪ Add episodes as they are broadcast
▪ Maintain pointer to first and last episodes
▪ Find all broadcast episodes
▪ Find latest broadcast episode
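The four use cases can be sketched in Cypher, assuming a hypothetical model in which a `Show` node has `FIRST` and `LAST` relationships to the head and tail `Episode` and episodes are chained with `NEXT`; parameter names are also assumptions:

```cypher
// Add an episode as it is broadcast and move the LAST pointer.
MATCH (s:Show {title: $show})
OPTIONAL MATCH (s)-[oldLast:LAST]->(oldTail:Episode)
CREATE (e:Episode {number: $number, title: $title})
// Empty list: the new episode is also the head.
FOREACH (ignored IN CASE WHEN oldTail IS NULL THEN [1] ELSE [] END |
  CREATE (s)-[:FIRST]->(e))
// Non-empty list: link the old tail to the new episode.
FOREACH (ignored IN CASE WHEN oldTail IS NULL THEN [] ELSE [1] END |
  CREATE (oldTail)-[:NEXT]->(e))
DELETE oldLast
CREATE (s)-[:LAST]->(e);

// Latest broadcast episode: one hop from the show, no list walk needed.
MATCH (:Show {title: $show})-[:LAST]->(latest:Episode)
RETURN latest;

// All broadcast episodes, in order, walking NEXT from the head.
MATCH (:Show {title: $show})-[:FIRST]->(firstEp:Episode)
MATCH path = (firstEp)-[:NEXT*0..]->(e:Episode)
RETURN e.number, e.title
ORDER BY length(path);
```

When the list is empty, `oldLast` is null and `DELETE` of a null value does nothing.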
Timeline Tree
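A minimal timeline tree sketch, assuming hypothetical `Year`, `Month` and `Day` labels, `HAS_MONTH`, `HAS_DAY` and `ON` types, and an `Event` node standing for any dated data object:

```cypher
// Build (or reuse) the Year -> Month -> Day path for a date and attach an event.
MERGE (y:Year {value: 2024})
MERGE (y)-[:HAS_MONTH]->(m:Month {value: 3})
MERGE (m)-[:HAS_DAY]->(d:Day {value: 15})
MERGE (ev:Event {id: 'event-1'})
MERGE (ev)-[:ON]->(d);

// All events in March 2024: walk down the tree instead of scanning every event.
MATCH (:Year {value: 2024})-[:HAS_MONTH]->(:Month {value: 3})
      -[:HAS_DAY]->(:Day)<-[:ON]-(ev:Event)
RETURN ev;
```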
Using Multiple Structures
Using the Timeline Tree
Using Intermediate Nodes
Using Linked Lists
Graph Data Modeling
Hierarchy of Accessibility (reminder)
For each data object, how much work must Neo4j do to evaluate if this is a “good” path or a “bad” one?
Goal: Eliminate Duplicate Data in Properties
Refactor Example: Extracting Nodes From Properties
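A sketch of this refactor, assuming a hypothetical starting point where each `Movie` stores a list property `languages`; the `Language` label and `IN_LANGUAGE` type are also assumptions. Each distinct language becomes one shared node instead of a string repeated on many movies:

```cypher
// Before (hypothetical): (:Movie {title: '...', languages: ['English', 'French']})
MATCH (m:Movie)
WHERE m.languages IS NOT NULL
UNWIND m.languages AS language
MERGE (l:Language {name: language})
MERGE (m)-[:IN_LANGUAGE]->(l)
WITH DISTINCT m
REMOVE m.languages;
```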
Goal: Use Labels Instead of Property Values
Refactor Example: Turn Property Values into Labels for Nodes
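A sketch, assuming hypothetical `Product` nodes with a `status` property, of promoting one property value to a label so that later queries can anchor on the label instead of filtering on the property:

```cypher
// Hypothetical: products carry status: 'discontinued'. Promote it to a label.
MATCH (p:Product)
WHERE p.status = 'discontinued'
SET p:Discontinued
REMOVE p.status;

// Queries can now start from the label directly.
MATCH (p:Discontinued)
RETURN p;
```

In Neo4j versions without dynamic labels, a label cannot be taken from a parameter or property value, so this needs one statement per value (or a procedure such as APOC's `apoc.create.addLabels`).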
Goal: Use Nodes Instead of Relationship Properties
Refactor: Extract Nodes from Relationship Properties
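A sketch of this refactor, assuming a hypothetical starting point where `ACTED_IN` relationships carry a `roles` list; the `Role` label and the `PLAYED` and `IN` types are also assumptions. Once a role is a node, other nodes can be connected to it, which a relationship property does not allow:

```cypher
// Before (hypothetical): (:Person)-[:ACTED_IN {roles: ['...']}]->(:Movie)
MATCH (a:Person)-[r:ACTED_IN]->(m:Movie)
WHERE r.roles IS NOT NULL
UNWIND r.roles AS roleName
MERGE (a)-[:PLAYED]->(role:Role {name: roleName})-[:IN]->(m)
WITH DISTINCT r
REMOVE r.roles;
```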
Graph Data Modeling: Example
Leonardo DiCaprio as Frank Abagnale in the Steven Spielberg movie “Catch Me If You Can”
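One possible model of this sentence, in a style similar to Neo4j's sample movie dataset (`ACTED_IN` with a `roles` list, and `DIRECTED`); the choice of labels, types and property names is an assumption. Applying the "Extract Nodes from Relationship Properties" refactor would instead turn the role into a node of its own:

```cypher
// Entities become nodes; "as Frank Abagnale" is stored on the ACTED_IN relationship.
MERGE (leo:Person {name: 'Leonardo DiCaprio'})
MERGE (steven:Person {name: 'Steven Spielberg'})
MERGE (movie:Movie {title: 'Catch Me If You Can'})
MERGE (leo)-[:ACTED_IN {roles: ['Frank Abagnale']}]->(movie)
MERGE (steven)-[:DIRECTED]->(movie);
```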
Refactoring example: Modeling airline flights
Important: Your model depends on your data and your queries
Initial Model
Question: What flights will take me from Malmo to New York on Friday?
The model can answer this question, so the model seems fine
Initial Model
Question 1: What flights will take me from Malmo to New York on Friday?
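Comment: To find flight AY189, we need to traverse every relationship in the graph, because a query cannot anchor on a relationship the way it anchors on a labeled, indexed node (relationship property indexes only exist from Neo4j 4.3 onwards). This query is very inefficient!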
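A sketch of the initial model, assuming the flight is stored as a relationship between two `Airport` nodes. The `FLIGHT` type, the property names and the codes 'MMX' (Malmö) and 'JFK' are illustrative assumptions; 15 March 2024 was a Friday:

```cypher
// Initial model (hypothetical): a flight is a relationship between airports.
CREATE (:Airport {code: 'MMX'})
       -[:FLIGHT {number: 'AY189', date: date('2024-03-15')}]->
       (:Airport {code: 'JFK'});

// Question 1 works: it anchors on the origin airport.
MATCH (:Airport {code: 'MMX'})-[f:FLIGHT]->(:Airport {code: 'JFK'})
WHERE f.date = date('2024-03-15')
RETURN f.number;

// Looking up a flight by number has no node to anchor on: without a
// relationship index, every FLIGHT relationship must be inspected.
MATCH (:Airport)-[f:FLIGHT {number: 'AY189'}]->(:Airport)
RETURN f;
```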
Initial Model
Question 1: What flights will take me from Malmo to New York on Friday?
More questions:
• What if we want to connect Customers or Staff to a flight? → Not possible: a relationship can only connect two nodes, never another relationship.
• What if a flight was rerouted to another Airport due to weather? → Not possible!
Given some of the queries we imagine for our data, a flight really should be a node.
Refactor: Create Intermediate Flight Nodes
Question 1: What flights will take me from Malmo to New York on Friday?
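A sketch of the refactored model, assuming hypothetical `Flight` nodes linked to airports with `ORIGIN` and `DESTINATION` relationships. Because the flight is now a node, it can be indexed and anchored on, and customers or staff could be connected to it:

```cypher
// Refactored (hypothetical names): the flight is a node between two airports.
MATCH (origin:Airport {code: 'MMX'}), (dest:Airport {code: 'JFK'})
CREATE (f:Flight {number: 'AY189', date: date('2024-03-15')})
CREATE (origin)<-[:ORIGIN]-(f)-[:DESTINATION]->(dest);

// Question 1: anchor on the origin airport and expand to its flights.
MATCH (:Airport {code: 'MMX'})<-[:ORIGIN]-(f:Flight)
      -[:DESTINATION]->(:Airport {code: 'JFK'})
WHERE f.date = date('2024-03-15')
RETURN f.number;
```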
Refactor: Create Intermediate Flight Nodes
Question 1: What flights will take me from Malmo to New York on Friday?
▪ We still need to check every AirportDay to find the right date, but the traversals are reduced
Refactor: Create AirportDay Intermediate Nodes
Question 1: What flights will take me from Malmo to New York on Friday?
If the model changes, we must check whether older queries still work. What about Q1 and Q2? Both still work.
But how can we reduce wasted traversals even further for dates?
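A sketch of Question 1 against the AirportDay model, assuming a hypothetical structure in which each airport has one `AirportDay` node per date (`HAS_DAY`) and flights connect day nodes with `ORIGIN` and `DESTINATION`. The origin airport's day nodes are still checked one by one, but only the flights of the matching day are expanded:

```cypher
// Assumed structure (hypothetical names):
// (:Airport)-[:HAS_DAY]->(:AirportDay {date})<-[:ORIGIN]-(:Flight)
//   -[:DESTINATION]->(:AirportDay)<-[:HAS_DAY]-(:Airport)
MATCH (:Airport {code: 'MMX'})-[:HAS_DAY]->(d:AirportDay)
WHERE d.date = date('2024-03-15')
MATCH (d)<-[:ORIGIN]-(f:Flight)-[:DESTINATION]->(:AirportDay)
      <-[:HAS_DAY]-(:Airport {code: 'JFK'})
RETURN f.number;
```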
Possible Refactor: Change Relationship Type to Date
Question 1: What flights will take me from Malmo to New York on Friday?
Possible Refactor: Change Relationship Type to Date
Question 1: What flights will take me from Malmo to New York on Friday?
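A sketch of the date-as-type idea, assuming a hypothetical model where the airport's relationship to its day node is typed with the date itself. The traversal then follows only relationships of that single type, with no property check per day:

```cypher
// Hypothetical: one relationship type per date, escaped with backticks.
MATCH (:Airport {code: 'MMX'})-[:`2024-03-15`]->(d:AirportDay)
MATCH (d)<-[:ORIGIN]-(f:Flight)
RETURN f.number;
```

In Neo4j versions without dynamic relationship types, a type cannot be passed as a query parameter, so the application has to build the type name into the query text.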
Possible Refactor: Remove Airport Nodes
Question 1: What flights will take me from Malmo to New York on Friday?
Refactor: Add Destination Intermediate Nodes
Question 1: What flights will take me from Malmo to New York on Friday?
The fan-out of each traversal grows with the number of Destinations served by an airport, not with the number of Flights, because airports have multiple flights per destination (at different times of day).
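A sketch of Question 1 with Destination intermediate nodes, assuming a hypothetical structure in which each `AirportDay` links to one `Destination` node per destination served (`HAS_DEST`) and each destination node links to that day's flights (`HAS_FLIGHT`). From the day node only the destination nodes are inspected, then only the flights to the chosen destination:

```cypher
// Assumed structure (hypothetical names):
// (:Airport)-[:HAS_DAY]->(:AirportDay {date})
//   -[:HAS_DEST]->(:Destination {code})-[:HAS_FLIGHT]->(:Flight)
MATCH (:Airport {code: 'MMX'})-[:HAS_DAY]->(d:AirportDay)
WHERE d.date = date('2024-03-15')
MATCH (d)-[:HAS_DEST]->(:Destination {code: 'JFK'})-[:HAS_FLIGHT]->(f:Flight)
RETURN f.number;
```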