Lesson 3 Unstructured Data
References:
Wikipedia
SearchDataManagement
3pillarglobal
October 18 1
By the end of this lesson, you should know:
• How to model a NoSQL document database.
NoSQL data modelling vs. relational modelling
• NoSQL data modelling often starts from the application-specific queries, whereas relational modelling starts from the data:
• Relational modelling is typically driven by the structure of the available data. The main design theme is “What answers do I have?”
• NoSQL data modelling is typically driven by application-specific access patterns, i.e. the types of queries to be supported. The main design theme is “What questions do I have?”
Modelling techniques
• Referencing documents.
• Embedding documents.
• Denormalisation.
• Heterogeneous collection.
Referencing documents
• You can reference another document using its document key. This is
similar to normalisation in a relational database.
• Referencing enables document databases to cache, store and retrieve
the documents independently.
• Provides better write speed/performance.
• Reading may require more round trips to the server.
Example
{
  sessionId : session1,
  sessionName : Document modelling,
  speakers : [{ id: 1 },{ id: 2 }]
}
Each entry in speakers is a reference to one of the following documents:
{
  _id : 1,
  name : Ryan,
  thumbnailUrl : ….,
  shortProfile : ….
}
{
  _id : 2,
  name : David,
  thumbnailUrl : ….,
  shortProfile : ….
}
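The referencing example can be sketched in Python with plain dictionaries standing in for two collections. This is only an illustration under stated assumptions: `find_by_key` is a hypothetical helper (not a real database API) in which one call represents one round trip to the server, and the elided `thumbnailUrl` and `shortProfile` fields are left out.

```python
# In-memory stand-ins for two collections; a real document database
# would be reached over the network instead.
speakers = {
    1: {"_id": 1, "name": "Ryan"},
    2: {"_id": 2, "name": "David"},
}
sessions = {
    "session1": {
        "sessionId": "session1",
        "sessionName": "Document modelling",
        "speakers": [{"id": 1}, {"id": 2}],  # references, not copies
    },
}

round_trips = 0


def find_by_key(collection, key):
    """Hypothetical helper: one call stands for one round trip."""
    global round_trips
    round_trips += 1
    return collection[key]


def load_session_with_speakers(session_id):
    # First read the session, then follow each reference separately.
    session = find_by_key(sessions, session_id)
    resolved = [find_by_key(speakers, ref["id"]) for ref in session["speakers"]]
    return {**session, "speakers": resolved}


result = load_session_with_speakers("session1")
speaker_names = [s["name"] for s in result["speakers"]]
```

Loading one session with its two speakers takes three lookups here, which is the extra reading cost that referencing trades for independent storage and cheaper writes.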
Referencing documents can be beneficial for:
• 1-to-many relationships (unbounded).
• Many-to-many relationships.
• Related data changes with differing volatility (speed of change or
update).
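As a rough sketch of the many-to-many case, each side can hold a list of keys pointing at the other side. The field names `sessionIds` and `speakerIds` and the second session are assumptions made for illustration; they do not come from the slides.

```python
# Each speaker can give many sessions and each session can have many speakers.
speakers = {
    1: {"_id": 1, "name": "Ryan", "sessionIds": ["session1", "session2"]},
    2: {"_id": 2, "name": "David", "sessionIds": ["session1"]},
}
sessions = {
    "session1": {
        "sessionId": "session1",
        "sessionName": "Document modelling",
        "speakerIds": [1, 2],
    },
    "session2": {
        "sessionId": "session2",
        "sessionName": "Hypothetical second session",  # illustrative only
        "speakerIds": [1],
    },
}


def speakers_for_session(session_id):
    """Follow the references from a session to its speakers."""
    return [speakers[i]["name"] for i in sessions[session_id]["speakerIds"]]


def sessions_for_speaker(speaker_id):
    """Follow the references from a speaker to their sessions."""
    return [sessions[s]["sessionName"] for s in speakers[speaker_id]["sessionIds"]]
```

Because each speaker is stored once, a change to a speaker document does not need to be repeated in every session that refers to it.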
1-to-many relationships (unbounded) [figures omitted]
Many-to-many relationships [figures omitted]
Related data changes with differing volatility [figure omitted; labels: Lower volatility, Greater volatility]
Embedding documents
• You can embed a document in another document by simply defining
an attribute to be an embedded document.
• Embedding enables document databases to cache, store and retrieve
the complex document with embedded documents as a single piece.
• Eliminates the need to retrieve two separate documents and join
them.
• Provides better read speed/performance.
Example
{
  sessionId : session1,
  sessionName : Document modelling,
  speakers : [
    { _id : 1,
      name : Ryan,
      thumbnailUrl : ….,
      shortProfile : ….
    },
    { _id : 2,
      name : David,
      thumbnailUrl : ….,
      shortProfile : ….
    }]
}
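A minimal Python sketch of the embedded version, again using a dictionary as a stand-in for a collection. `find_by_key` is a hypothetical helper where one call represents one read, and the elided `thumbnailUrl` and `shortProfile` fields are omitted.

```python
# The speakers live inside the session document itself.
sessions = {
    "session1": {
        "sessionId": "session1",
        "sessionName": "Document modelling",
        "speakers": [
            {"_id": 1, "name": "Ryan"},
            {"_id": 2, "name": "David"},
        ],
    },
}

reads = 0


def find_by_key(collection, key):
    """Hypothetical helper: one call stands for one read."""
    global reads
    reads += 1
    return collection[key]


# A single read returns the session together with its speakers.
session = find_by_key(sessions, "session1")
speaker_names = [s["name"] for s in session["speakers"]]
```

The session and its speakers arrive in one read, with no second lookup or join step.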
Embedding can be advantageous when:
• Two data items are often queried together.
• One data item is dependent on another.
• The relationship is 1:1.
• The data items have similar volatility (speed of change or update).
Two data items are often queried together [figure omitted]
One data item is dependent on another [figure omitted; label: Dependent on Order]
1:1 relationship [figure omitted]
Similar volatility [figure omitted]
Normalised [figure omitted; label: Query: two reads are needed]
Denormalised [figure omitted]
Normalisation vs. Denormalisation
• Normalised:
• Requires multiple reads.
• Doesn’t align with instances.
• Provides faster write speed.
• Denormalised:
• Requires updates in multiple places.
• Provides faster read speed.
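The write-side trade-off can be illustrated by renaming a speaker in both layouts. This is a sketch under assumptions: the second session, the `speakerIds` field, and the new name are hypothetical, and each function simply counts how many documents it had to modify.

```python
# Normalised: sessions hold only speaker keys, the speaker is stored once.
speakers = {1: {"_id": 1, "name": "Ryan"}}
sessions_normalised = {
    "session1": {"sessionName": "Document modelling", "speakerIds": [1]},
    "session2": {"sessionName": "Hypothetical second session", "speakerIds": [1]},
}

# Denormalised: every session carries its own copy of the speaker.
sessions_denormalised = {
    "session1": {
        "sessionName": "Document modelling",
        "speakers": [{"_id": 1, "name": "Ryan"}],
    },
    "session2": {
        "sessionName": "Hypothetical second session",
        "speakers": [{"_id": 1, "name": "Ryan"}],
    },
}


def rename_normalised(speaker_id, new_name):
    """Update the single speaker document; returns documents modified."""
    speakers[speaker_id]["name"] = new_name
    return 1


def rename_denormalised(speaker_id, new_name):
    """Update every session that embeds the speaker; returns documents modified."""
    touched = 0
    for session in sessions_denormalised.values():
        changed = False
        for speaker in session["speakers"]:
            if speaker["_id"] == speaker_id:
                speaker["name"] = new_name
                changed = True
        if changed:
            touched += 1
    return touched


normalised_writes = rename_normalised(1, "Ryan B.")
denormalised_writes = rename_denormalised(1, "Ryan B.")
```

The normalised layout changes one document, while the denormalised layout has to change every session that embeds the speaker.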
Homogeneous collections
• One collection per data type.
• Speaker
• Session
• Room
• However, fetching all three types then requires three different queries over three different collections.
Heterogeneous collections
• Multiple types in a single collection.
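One common way to sketch a heterogeneous collection is to give each document a `type` field and a shared key to filter on. The `type`, `roomName`, and shared `sessionId` fields, the sample documents, and the `find` helper are all assumptions for illustration, not part of the slides or of any particular database API.

```python
# One collection holding sessions, speakers and rooms side by side.
conference = [
    {"type": "session", "sessionId": "session1", "sessionName": "Document modelling"},
    {"type": "speaker", "sessionId": "session1", "name": "Ryan"},
    {"type": "speaker", "sessionId": "session1", "name": "David"},
    {"type": "room", "sessionId": "session1", "roomName": "Room A"},
    {"type": "speaker", "sessionId": "session2", "name": "Hypothetical speaker"},
]


def find(collection, **criteria):
    """Hypothetical query helper: one call stands for one query."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# A single query returns every document type related to session1.
everything_for_session1 = find(conference, sessionId="session1")
types_found = sorted({doc["type"] for doc in everything_for_session1})

# The type field still allows narrowing to one kind of document.
session1_speakers = [
    doc["name"] for doc in find(conference, sessionId="session1", type="speaker")
]
```

With homogeneous collections the same result would take one query each against the speaker, session, and room collections.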