Lesson 3 Unstructured Data

isp610 notes for student uitm


Unstructured Data

ISP610 BUSINESS DATA ANALYTICS


Prepared by: Ruhaila Maskat (PhD)

References:
Wikipedia
SearchDataManagement
3pillarglobal

October 18 1
By the end of this lesson, you should know:
• How to model a document NoSQL database.

NoSQL data modelling vs. relational modelling
• NoSQL data modelling often starts from the application-specific queries, whereas relational modelling starts from the data:
• Relational modelling is typically driven by the structure of the available data. The main design theme is “What answers do I have?”
• NoSQL data modelling is typically driven by application-specific access patterns, i.e. the types of queries to be supported. The main design theme is “What questions do I have?”

Modelling techniques
• Referencing documents.
• Embedding documents.
• Denormalisation.
• Heterogeneous collection.

Referencing documents
• You can reference another document using its document key. This is similar to normalisation in a relational database.
• Referencing enables document databases to cache, store and retrieve the documents independently.
• Provides better write performance.
• Reading may require more round trips to the server.

Example

{
  sessionId : session1,
  sessionName : Document modelling,
  speakers : [{ id: 1 },{ id: 2 }]
}

Each id in speakers is a reference to a separate speaker document:

{
  _id : 1,
  name : Ryan,
  thumbnailUrl : ….,
  shortProfile : ….
}

{
  _id : 2,
  name : David,
  thumbnailUrl : ….,
  shortProfile : ….
}
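
As a minimal sketch of what referencing implies for reads, the Python snippet below models the session and speaker documents as plain dictionaries. The in-memory `sessions` and `speakers` stores, the `fetch` helper and the `round_trips` counter are assumptions made for illustration; they stand in for a real document database and are not any product's API. The elided `thumbnailUrl` and `shortProfile` fields are left out.

```python
# Hypothetical in-memory stand-ins for two collections, keyed by document key.
speakers = {
    1: {"_id": 1, "name": "Ryan"},
    2: {"_id": 2, "name": "David"},
}
sessions = {
    "session1": {
        "sessionId": "session1",
        "sessionName": "Document modelling",
        "speakers": [{"id": 1}, {"id": 2}],  # references, not copies
    }
}

round_trips = 0  # counts simulated requests to the server


def fetch(collection, key):
    """Simulate one round trip that retrieves a single document by key."""
    global round_trips
    round_trips += 1
    return collection[key]


def session_with_speakers(session_id):
    """Load a session, then resolve each speaker reference separately."""
    session = fetch(sessions, session_id)
    names = [fetch(speakers, ref["id"])["name"] for ref in session["speakers"]]
    return session["sessionName"], names


title, names = session_with_speakers("session1")
print(title, names, round_trips)
```

In this sketch, reading one session with two referenced speakers costs one fetch for the session plus one per speaker, while a change to a speaker would touch only that speaker's document.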
Referencing documents can be beneficial for:
• 1-to-many relationships (unbounded).
• Many-to-many relationships.
• Related data that changes with differing volatility (speed of change or update).

1-to-many relationships (unbounded)


Many-to-many relationships

Not efficient: requires two references, the first to the speaker documents and the second to the session documents.

Reference by session, or reference by speaker.

More efficient: requires only one reference.
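
A rough sketch of the single-reference approach, assuming the references are stored on the session side only. The sample data and the helper names `speakers_of` and `sessions_of` are hypothetical. Both directions of the many-to-many relationship can still be answered from that one set of references.

```python
# Hypothetical data: only sessions hold references to speakers.
sessions = [
    {"sessionId": "session1", "speakers": [1, 2]},
    {"sessionId": "session2", "speakers": [2]},
]


def speakers_of(session_id):
    """Follow the stored references: session -> speaker ids."""
    for session in sessions:
        if session["sessionId"] == session_id:
            return session["speakers"]
    return []


def sessions_of(speaker_id):
    """Answer the reverse question by filtering on the same references."""
    return [s["sessionId"] for s in sessions if speaker_id in s["speakers"]]


print(speakers_of("session1"))
print(sessions_of(2))
```

Because the relationship is recorded in one place, adding a speaker to a session means writing one document rather than keeping two reference lists in step.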
Related data changes with differing volatility

Lower volatility

Greater volatility
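
One way to picture differing volatility, as a sketch only: keep the rarely changing data in one document and the frequently changing data in a separate, referenced document. The field names (`statsId`, `views`) and the `record_view` helper are invented for illustration and do not come from the slides.

```python
# Hypothetical documents: field names are illustrative, not from the slides.
profiles = {1: {"_id": 1, "name": "Ryan", "statsId": "stats1"}}  # lower volatility
stats = {"stats1": {"_id": "stats1", "views": 0}}  # greater volatility


def record_view(profile_id):
    """Update only the small, volatile document; the profile is untouched."""
    stats_id = profiles[profile_id]["statsId"]
    stats[stats_id]["views"] += 1
    return stats[stats_id]["views"]
```

Each call to `record_view(1)` rewrites only the small stats document, so frequent updates never force the larger, stable profile document to be rewritten.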

Embedding documents
• You can embed a document in another document by simply defining
an attribute to be an embedded document.
• Embedding enables document databases to cache, store and retrieve
the complex document with embedded documents as a single piece.
• Eliminates the need to retrieve two separate documents and join
them.
• Provides better read speed/performance.

Example

{
  sessionId : session1,
  sessionName : Document modelling,
  speakers : [
    { _id : 1,
      name : Ryan,
      thumbnailUrl : ….,
      shortProfile : ….
    },
    { _id : 2,
      name : David,
      thumbnailUrl : ….,
      shortProfile : ….
    }]
}
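
The embedded layout above can be sketched in Python as a single dictionary per session. The in-memory `sessions` store and the `speaker_names` helper are assumptions for illustration, and the elided `thumbnailUrl` and `shortProfile` values are omitted rather than guessed.

```python
# Hypothetical in-memory stand-in for a sessions collection.
sessions = {
    "session1": {
        "sessionId": "session1",
        "sessionName": "Document modelling",
        "speakers": [  # embedded documents, stored inside the session
            {"_id": 1, "name": "Ryan"},
            {"_id": 2, "name": "David"},
        ],
    }
}


def speaker_names(session_id):
    """One lookup returns the session together with its embedded speakers."""
    session = sessions[session_id]
    return [speaker["name"] for speaker in session["speakers"]]


print(speaker_names("session1"))
```

A single lookup is enough here because the speakers travel with the session; no second retrieval or join is needed.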
Embedding can be advantageous when:
• Two data items are often queried together.
• One data item is dependent on another.
• The relationship is 1:1.
• The data items have similar volatility (speed of change or update).

Two data items are often queried together

One data item is dependent on another

Dependent on Order

1:1 relationship

Similar volatility

Both email and socialIds do not change very often.

Normalised

Query: two reads are needed.

Denormalised

Embeds the speaker into the session with summary information. Further information about a speaker is loaded only when it is needed.
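
A minimal sketch of this pattern, assuming the summary consists of the speaker's `_id` and `name` only. The stores, the placeholder profile text and the helper names `session_listing` and `speaker_details` are hypothetical.

```python
# Hypothetical stores; the profile text is a placeholder, not real data.
speakers = {
    1: {"_id": 1, "name": "Ryan", "shortProfile": "(full profile text)"},
}
sessions = {
    "session1": {
        "sessionId": "session1",
        "sessionName": "Document modelling",
        # Summary copy of the speaker: enough to render a session listing.
        "speakers": [{"_id": 1, "name": "Ryan"}],
    }
}


def session_listing(session_id):
    """Single read: the embedded summaries are enough for a listing."""
    session = sessions[session_id]
    return session["sessionName"], [s["name"] for s in session["speakers"]]


def speaker_details(speaker_id):
    """Second read, performed only when the full speaker is requested."""
    return speakers[speaker_id]
```

Calling `session_listing("session1")` never touches the speaker store; `speaker_details(1)` is the extra read that happens only on demand.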

Normalisation vs. Denormalisation
• Normalised:
• Requires multiple reads.
• Doesn’t align with instances.
• Provides faster write speed.

• Denormalised:
• Requires updates in multiple places.
• Provides faster read speed.
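
To make the "updates in multiple places" cost concrete, here is a small sketch under the assumption that each session embeds a copy of the speaker's name. The data and the `rename_speaker` helper are hypothetical.

```python
# Hypothetical denormalised data: each session embeds a copy of the name.
speakers = {1: {"_id": 1, "name": "Ryan"}}
sessions = {
    "session1": {"sessionId": "session1", "speakers": [{"_id": 1, "name": "Ryan"}]},
    "session2": {"sessionId": "session2", "speakers": [{"_id": 1, "name": "Ryan"}]},
}


def rename_speaker(speaker_id, new_name):
    """Rename a speaker everywhere the name is stored; return documents written."""
    speakers[speaker_id]["name"] = new_name
    writes = 1  # the speaker document itself
    for session in sessions.values():
        copies = [s for s in session["speakers"] if s["_id"] == speaker_id]
        for copy in copies:
            copy["name"] = new_name
        if copies:
            writes += 1  # one write per session document holding a copy
    return writes
```

With two sessions embedding the same speaker, one rename writes three documents. In a normalised layout the same change would write only the speaker document, at the price of an extra read whenever a session is displayed.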

Homogeneous collections
• One collection per data type.
• Speaker
• Session
• Room

• However, this requires three different queries over three different collections.

Heterogeneous collections
• Multiple types in a single collection.
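
A heterogeneous collection can be sketched as one list holding documents of several types. The `type` discriminator field, the shared `sessionId` field, the sample values and the `everything_for` helper are all assumptions made for this illustration.

```python
# Hypothetical single collection mixing three document types.
collection = [
    {"type": "speaker", "name": "Ryan", "sessionId": "session1"},
    {"type": "session", "sessionId": "session1", "sessionName": "Document modelling"},
    {"type": "room", "roomName": "Room A", "sessionId": "session1"},
    {"type": "session", "sessionId": "session2", "sessionName": "Another session"},
]


def everything_for(session_id):
    """One query over one collection returns documents of every type."""
    return [doc for doc in collection if doc.get("sessionId") == session_id]


docs = everything_for("session1")
print([doc["type"] for doc in docs])
```

The application then branches on the `type` field to decide how to handle each returned document, instead of issuing one query per collection.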
