Data-Modeling
Is just as important with relational data!
SQL: Normalize everything (ORM)
NoSQL: Embed as 1 piece
Contoso Restaurant Menu
Non-relational modeling: each menu item is a self-contained document
{
"ID": 1,
"ItemName": "hamburger",
"ItemDescription": "cheeseburger, no cheese",
"CategoryId": 5,
"Category": "sandwiches",
"CategoryDescription": "2 pieces of bread + filling"
}
Number 1 question…
“Where are my joins?”
But! We can model our data in a way to get the same functionality of a join,
without the tradeoff
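A minimal sketch of the idea, assuming two hypothetical relational tables (menu_items and categories) held in memory. It copies the category fields into each item at write time, so reading one document needs no join. The table and function names are illustrative only.

```python
# Hypothetical normalized rows, as two relational tables might hold them.
categories = {
    5: {"Category": "sandwiches",
        "CategoryDescription": "2 pieces of bread + filling"},
}
menu_items = [
    {"ID": 1, "ItemName": "hamburger",
     "ItemDescription": "cheeseburger, no cheese", "CategoryId": 5},
]


def embed_category(item, categories):
    """Return a self-contained document: the item plus its category fields."""
    doc = dict(item)                            # copy, leave the source row intact
    doc.update(categories[item["CategoryId"]])  # the "join", done once at write time
    return doc


documents = [embed_category(item, categories) for item in menu_items]
```

The lookup that a join would do on every read is paid once, when the document is written.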
Modeling Challenges
{
"id": "Order1",
"customer": "Customer1",
"orderDate": "2018-09-26",
"itemsOrdered": [
{"ID": 1, "ItemName": "hamburger", "Price": 9.50, "Qty": 1},
{"ID": 2, "ItemName": "cheeseburger", "Price": 9.50, "Qty": 499}
]
}
1:1 relationship: every customer has one email, phone, and loyalty number
When To Embed #4, #5
Similar rate of updates – does the data change at the same (slower)
pace? -> Minimize writes
1:few relationships
{
"id": "1",
"name": "Alice",
"email": "[email protected]", //Email, addresses don’t change too often
"addresses": [
{"street": "1 Contoso Way", "city": "Seattle"},
{"street": "15 Fabrikam Lane", "city": "Orlando"}
]
}
When To Embed - Summary
• Data from entities is queried together
• Child data is dependent on a parent
• 1:1 relationship
• Similar rate of updates – does the data change at the same pace
• 1:few – the set of values is bounded
{
"menuID": 1,
"menuName": "Lunch menu",
"items": [
{"ID": 1, "ItemName": "hamburger", "ItemDescription":...},
{"ID": 2, "ItemName": "cheeseburger", "ItemDescription":...}
]
}
Reference
{
"menuID": 1,
"menuName": "Lunch menu",
"items": [
{"ID": 1},
{"ID": 2}
]
}
{"ID": 1, "ItemName": "hamburger", "ItemDescription":...}
{"ID": 2, "ItemName": "cheeseburger", "ItemDescription":...}
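A rough sketch of what referencing means for the reader of the data: the menu holds only item IDs, so the application does a second lookup to get the item details. The in-memory dictionary below stands in for that second read; resolve_items and items_by_id are hypothetical names, not part of any SDK.

```python
# The menu document stores references (IDs) only.
menu = {"menuID": 1, "menuName": "Lunch menu", "items": [{"ID": 1}, {"ID": 2}]}

# Stand-in for the separately stored item documents, keyed by ID.
items_by_id = {
    1: {"ID": 1, "ItemName": "hamburger"},
    2: {"ID": 2, "ItemName": "cheeseburger"},
}


def resolve_items(menu, items_by_id):
    """Follow each reference in the menu to its full item document."""
    return [items_by_id[ref["ID"]] for ref in menu["items"]]


full_items = resolve_items(menu, items_by_id)
```

The extra lookup is the cost of referencing. The benefit is that an item changes in one place only.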
When To Reference #1
1 : many (unbounded relationship)
Embedding doesn’t make sense:
- Too many writes to same document
- 2 MB document limit
{
"id": "1",
"name": "Alice",
"email": "[email protected]",
"Orders": [
{
"id": "Order1",
"orderDate": "2018-09-18",
"itemsOrdered": [
{"ID": 1, "ItemName": "hamburger", "Price": 9.50, "Qty": 1},
{"ID": 2, "ItemName": "cheeseburger", "Price": 9.50, "Qty": 499}]
},
...
{
"id": "OrderNfinity",
"orderDate": "2018-09-20",
"itemsOrdered": [
{"ID": 1, "ItemName": "hamburger", "Price": 9.50, "Qty": 1}]
}]
}
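To make the size limit concrete, here is a small sketch that counts how many embedded orders fit before the serialized customer document passes a byte limit. It measures the UTF-8 length of Python's default json.dumps output, which is only an approximation of how a real store measures document size. The helper names and the shape of each order are assumptions.

```python
import json

MAX_DOC_BYTES = 2 * 1024 * 1024  # the 2 MB limit named on the slide


def make_order(n):
    """Build one hypothetical order with a single line item."""
    return {"id": f"Order{n}", "orderDate": "2018-09-18",
            "itemsOrdered": [{"ID": 1, "ItemName": "hamburger",
                              "Price": 9.50, "Qty": 1}]}


def encoded_size(obj):
    """Size in bytes of the JSON text for obj."""
    return len(json.dumps(obj).encode("utf-8"))


def orders_until_limit(limit=MAX_DOC_BYTES):
    """Return how many orders can be embedded before the document exceeds limit."""
    customer = {"id": "1", "name": "Alice", "Orders": []}
    size = encoded_size(customer)
    count = 0
    while True:
        separator = 0 if count == 0 else 2  # ", " between array elements
        next_size = size + separator + encoded_size(make_order(count + 1))
        if next_size > limit:
            return count
        size = next_size
        count += 1
```

Calling orders_until_limit() gives a finite number for any limit, while the number of orders a customer can place is unbounded. That mismatch is the reason to reference here.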
When To Reference #2
Data changes at different rates #2
Ability to query across multiple entity types with a single network request.
{
"id": "Andrew",
"type": "Person",
"familyId": "Liu",
"worksOn": "Azure Cosmos DB"
}
{
"id": "Ralph",
"type": "Cat",
"familyId": "Liu",
"fur": {
"length": "short",
"color": "brown"
}
}
Approach: Introduce Type Property
We can query both types of documents without needing a JOIN simply by running a query without a filter on type:
SELECT * FROM c WHERE c.familyId = "Liu"
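An in-memory stand-in for that query, as a sketch only: a list of dictionaries plays the role of the container, and query_by_family is a hypothetical helper, not a Cosmos DB SDK call. The third document ("Sam", family "Smith") is made up to show that other families are filtered out.

```python
container = [
    {"id": "Andrew", "type": "Person", "familyId": "Liu",
     "worksOn": "Azure Cosmos DB"},
    {"id": "Ralph", "type": "Cat", "familyId": "Liu",
     "fur": {"length": "short", "color": "brown"}},
    {"id": "Sam", "type": "Person", "familyId": "Smith"},  # made-up extra document
]


def query_by_family(docs, family_id):
    """Mimic: SELECT * FROM c WHERE c.familyId = <family_id> (no filter on type)."""
    return [d for d in docs if d.get("familyId") == family_id]


liu_family = query_by_family(container, "Liu")
people_only = [d for d in liu_family if d["type"] == "Person"]
```

One pass returns both the Person and the Cat. The type property lets the caller tell them apart afterwards.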
Handle any data with no schema or indexing required
{
"locations": [
{
"country": "Germany",
"city": "Berlin"
},
{
"country": "France",
"city": "Paris"
}
],
"headquarter": "Belgium",
"exports": [
{ "city": "Moscow" },
{ "city": "Athens" }
]
}
[Figure: the document as a tree. Root branches: locations, headquarter, exports. locations/0 has country (Germany) and city (Berlin); locations/1 has country (France) and city (Paris); headquarter has Belgium; exports/0 has city (Moscow); exports/1 has city (Athens).]
Indexing JSON Documents
{
"locations": [
{
"country": "Germany",
"city": "Bonn",
"revenue": 200
}
],
"headquarter": "Italy",
"exports": [
{
"city": "Berlin",
"dealers": [
{ "name": "Hans" }
]
},
{ "city": "Athens" }
]
}
[Figure: the document as a tree. Root branches: locations, headquarter, exports. locations/0 has country (Germany), city (Bonn), and revenue (200); headquarter has Italy; exports/0 has city (Berlin) and dealers/0/name (Hans); exports/1 has city (Athens).]
Indexing JSON Documents
[Figure: the two document trees shown side by side and combined (+). Left: the Belgium document. Right: the Italy document.]
Inverted Index
[Figure: the merged tree as an inverted index. Each node is labeled with the set of document IDs that contain it, for example {1, 2} Germany, {1} Belgium, {2} Italy, {2} Bonn, {2} Hans.]
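A simplified sketch of the inverted-index idea, not the engine's actual implementation: each document is walked as a tree, and every path prefix (including the leaf value) is mapped to the set of document IDs that contain it. The document IDs 1 and 2 and the function names are assumptions for illustration.

```python
from collections import defaultdict

doc1 = {"locations": [{"country": "Germany", "city": "Berlin"},
                      {"country": "France", "city": "Paris"}],
        "headquarter": "Belgium",
        "exports": [{"city": "Moscow"}, {"city": "Athens"}]}
doc2 = {"locations": [{"country": "Germany", "city": "Bonn", "revenue": 200}],
        "headquarter": "Italy",
        "exports": [{"city": "Berlin", "dealers": [{"name": "Hans"}]},
                    {"city": "Athens"}]}


def paths(node, prefix=()):
    """Yield every root-to-leaf path; the leaf value is the last element."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from paths(value, prefix + (key,))
    elif isinstance(node, list):
        for i, value in enumerate(node):
            yield from paths(value, prefix + (i,))
    else:
        yield prefix + (node,)


def build_inverted_index(docs):
    """Map each path prefix to the set of IDs of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for path in paths(doc):
            for end in range(1, len(path) + 1):  # interior nodes and the leaf
                index[path[:end]].add(doc_id)
    return index


index = build_inverted_index({1: doc1, 2: doc2})
```

A lookup such as index[("headquarter", "Italy")] then answers an equality filter by returning document IDs directly, without scanning the documents.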
{
"indexingMode": "consistent",
Equality queries:
SELECT * FROM container c WHERE c.property = 'value'
Range queries:
SELECT * FROM container c WHERE c.property > 'value' (works for >, <, >=, <=, !=)
ORDER BY queries:
SELECT * FROM container c ORDER BY c.property
JOIN queries:
SELECT child FROM container c JOIN child IN c.properties WHERE child = 'value'
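The JOIN here is a self-join within each document: it pairs a document with the elements of its own array, not with other documents. A plain-Python sketch of that behavior, using a made-up list of documents in place of the container:

```python
# Made-up documents; "properties" is the array the JOIN iterates over.
container = [
    {"id": "a", "properties": ["value", "other"]},
    {"id": "b", "properties": ["other"]},
    {"id": "c", "properties": ["value"]},
]


def join_children(docs, wanted):
    """Mimic: SELECT child FROM container c JOIN child IN c.properties
    WHERE child = <wanted>."""
    results = []
    for doc in docs:                              # FROM container c
        for child in doc.get("properties", []):   # JOIN child IN c.properties
            if child == wanted:                   # WHERE child = 'value'
                results.append({"child": child})  # SELECT child
    return results


matches = join_children(container, "value")
```

Each matching array element produces one result row, so a document with several matching elements appears several times.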
Spatial Indexes
These must be added and are needed for geospatial queries:
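As a hedged illustration, an indexing policy with a spatial index might look like the dictionary below. The field names (spatialIndexes, path, types) follow the Azure Cosmos DB indexing policy format as I understand it. The path /location/* is a hypothetical property name, so check the current documentation before relying on this.

```python
import json

# Sketch of an indexing policy that adds a spatial index.
# "/location/*" is a hypothetical path; replace it with the GeoJSON property in use.
indexing_policy = {
    "indexingMode": "consistent",
    "automatic": True,
    "includedPaths": [{"path": "/*"}],
    "excludedPaths": [],
    "spatialIndexes": [
        {
            "path": "/location/*",
            "types": ["Point", "Polygon", "MultiPolygon", "LineString"],
        }
    ],
}

# Serialize to the JSON form in which a policy is usually supplied.
policy_json = json.dumps(indexing_policy, indent=2)
```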
[Figure: a collection whose indexing policy changes from Policy v1 at time t0 to Policy v2 at time t1.]
Best Practices