6 Document databases
Databases
Document Databases
Data Modelling and Partitioning
Prof. Pietro Ducange
Some considerations
A document database could theoretically implement a third
normal form schema.
Image extracted from “Guy Harrison, Next Generation Databases, Apress, 2015”
JSON Databases: An example
Document databases usually
adopt a reduced number of
collections for modeling data.
The solution above allows the user to retrieve a film and all its actors in a single
operation.
In a complex design this could lead to issues and possibly inconsistencies if any of the
“actor” attributes need to be changed.
Moreover, some JSON databases impose a limit on the maximum size
of a single document.
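The embedding trade-off described above can be sketched in plain Python, with dictionaries standing in for JSON documents. The film titles, actor names, and field names (`actors`, `actorId`) are hypothetical placeholders, not taken from the original figure.

```python
import copy
import json

# Hypothetical actor sub-document, embedded (copied) into every film it appears in.
actor = {"actorId": 1, "name": "Actor One"}

# Hypothetical "films" collection: each film embeds its actors.
films = {
    10: {"_id": 10, "title": "Film A", "actors": [copy.deepcopy(actor)]},
    11: {"_id": 11, "title": "Film B", "actors": [copy.deepcopy(actor)]},
}


def get_film_with_actors(film_id):
    """A single lookup returns the film together with all its embedded actors."""
    return films[film_id]


def document_size_bytes(doc):
    """Size of the JSON-serialized document, to compare against a size limit."""
    return len(json.dumps(doc).encode("utf-8"))


# Updating the actor inside one film does not touch the copy in the other film,
# so the two documents now disagree about the same actor.
films[10]["actors"][0]["name"] = "Actor One (renamed)"
stale_name = films[11]["actors"][0]["name"]
```

Here `stale_name` still holds the old value, which is the inconsistency risk mentioned above, and `document_size_bytes` grows with every embedded actor.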
Data Modeling: Document Linking
In the solution above, an array of actor IDs has been embedded into the
film document.
The IDs can be used to retrieve the documents of the actors (in another
collection) who appear in a film.
The basic pattern is that the one entity in a one-to-many relation is the primary
document, and the many entities are represented as an array of embedded
documents.
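The linking approach (an array of actor IDs stored in the film document) can be sketched as follows. This is a minimal illustration with in-memory dictionaries; the collection contents and the field name `actorIds` are assumptions.

```python
# Hypothetical "actors" collection: each actor is stored exactly once.
actors = {
    1: {"_id": 1, "name": "Actor One"},
    2: {"_id": 2, "name": "Actor Two"},
}

# Hypothetical "films" collection: films hold only the IDs of their actors.
films = {
    10: {"_id": 10, "title": "Film A", "actorIds": [1, 2]},
    11: {"_id": 11, "title": "Film B", "actorIds": [1]},
}


def actors_of(film_id):
    """Resolve the links: one lookup for the film, then one per linked actor."""
    film = films[film_id]
    return [actors[actor_id] for actor_id in film["actorIds"]]


# An actor attribute is changed in one place and every film sees the new value.
actors[1]["name"] = "Actor One (renamed)"
names_in_film_b = [a["name"] for a in actors_of(11)]
```

Compared with embedding, the update touches a single document, but reading a film with its actors now takes several lookups instead of one.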
Data Modelling:
Many to Many Relationships
Let us consider an example application in which:
• A student can be enrolled in many courses
• A course can have many students enrolled in it
We can model this situation considering the following two collections:
We have to take care when updating data in this kind of relationship. Indeed, the
DBMS will not control the referential integrity as in relational DBMSs.
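As a rough sketch of this situation, the two collections can hold arrays of references to each other, and the application itself must keep both sides aligned. The field names `courseIds` and `studentIds` and the helper functions are hypothetical.

```python
# Hypothetical collections: each side stores the IDs of the other side.
students = {
    "s1": {"_id": "s1", "name": "Student One", "courseIds": []},
    "s2": {"_id": "s2", "name": "Student Two", "courseIds": []},
}
courses = {
    "c1": {"_id": "c1", "title": "Databases", "studentIds": []},
}


def enroll(student_id, course_id):
    """Update BOTH documents; the database will not do this for us."""
    if student_id not in students or course_id not in courses:
        raise KeyError("unknown student or course")
    if course_id not in students[student_id]["courseIds"]:
        students[student_id]["courseIds"].append(course_id)
    if student_id not in courses[course_id]["studentIds"]:
        courses[course_id]["studentIds"].append(student_id)


def dangling_references():
    """Application-level integrity check: list (student_id, course_id) mismatches."""
    problems = []
    for s in students.values():
        for c_id in s["courseIds"]:
            if c_id not in courses or s["_id"] not in courses[c_id]["studentIds"]:
                problems.append((s["_id"], c_id))
    for c in courses.values():
        for s_id in c["studentIds"]:
            if s_id not in students or c["_id"] not in students[s_id]["courseIds"]:
                problems.append((s_id, c["_id"]))
    return problems


enroll("s1", "c1")
enroll("s2", "c1")
before_delete = dangling_references()

# Deleting a student without cleaning up the course leaves a dangling reference.
del students["s2"]
after_delete = dangling_references()
```

After the careless delete, `after_delete` reports the reference from course `c1` to the missing student `s2`, which a relational DBMS would have rejected or cascaded.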
Modeling Hierarchies (I)
We may consider generating a new document to add to the DB for each data
transmission.
At the end of the day, the DB will include 200 new documents for each truck
(assuming 20 transmissions per hour over 10 working hours).
Data Modeling: An Example (II)
An alternative solution may be to use embedded documents as follows:
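The two designs can be compared with a short sketch. The truck ID, date, and reading fields (`time`, `fuel_consumption_rate`) are hypothetical; only the figures of 20 transmissions per hour and 10 working hours come from the text.

```python
from datetime import datetime, timedelta

TRANSMISSIONS_PER_HOUR = 20
WORKING_HOURS = 10


def make_readings(start):
    """Generate one working day of hypothetical readings, one every 3 minutes."""
    interval = timedelta(minutes=60 // TRANSMISSIONS_PER_HOUR)
    count = TRANSMISSIONS_PER_HOUR * WORKING_HOURS
    return [
        {"time": (start + i * interval).isoformat(), "fuel_consumption_rate": 14}
        for i in range(count)
    ]


def one_document_per_transmission(truck_id, date, readings):
    """First design: truck ID and date are repeated in every document."""
    return [{"truck_id": truck_id, "date": date, **r} for r in readings]


def one_document_per_day(truck_id, date, readings):
    """Alternative design: one document per truck per day with embedded readings."""
    return {"truck_id": truck_id, "date": date, "operational_data": readings}


readings = make_readings(datetime(2024, 1, 15, 8, 0))
flat_docs = one_document_per_transmission("truck-1", "2024-01-15", readings)
daily_doc = one_document_per_day("truck-1", "2024-01-15", readings)
```

The first design produces `len(flat_docs)` separate documents per truck per day, while the second stores the same readings in a single document that grows by one array element at each transmission.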
Indexes, like the index of a book, are a structured set of information that maps
from one attribute to related information.
In general, indexes are special data structures that store a small portion of the
collection’s data set in an easy-to-traverse form.
The index stores the value of a specific field or set of fields, ordered by the
value of the field.
The ordering of the index entries supports efficient equality matches and
range-based query operations.
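To illustrate why ordering helps, here is a minimal sketch of a single-field index kept as a sorted list and searched with binary search. Real document databases typically use B-tree variants; the `FieldIndex` class and the sample documents are assumptions made for illustration.

```python
from bisect import bisect_left, bisect_right

# Hypothetical collection of film documents.
documents = {
    1: {"_id": 1, "title": "Film A", "year": 1999},
    2: {"_id": 2, "title": "Film B", "year": 2005},
    3: {"_id": 3, "title": "Film C", "year": 1999},
    4: {"_id": 4, "title": "Film D", "year": 2012},
}


class FieldIndex:
    """Stores only (field value, document ID) pairs, ordered by field value."""

    def __init__(self, docs, field):
        entries = sorted((doc[field], doc_id) for doc_id, doc in docs.items())
        self.keys = [key for key, _ in entries]
        self.doc_ids = [doc_id for _, doc_id in entries]

    def range(self, low, high):
        """IDs of documents with low <= value <= high, found by binary search."""
        start = bisect_left(self.keys, low)
        end = bisect_right(self.keys, high)
        return self.doc_ids[start:end]

    def equal(self, value):
        """An equality match is a range whose two bounds coincide."""
        return self.range(value, value)


year_index = FieldIndex(documents, "year")
films_1999 = year_index.equal(1999)
films_2000_2012 = year_index.range(2000, 2012)
```

Both queries locate their results with two binary searches over the sorted keys instead of scanning every document in the collection.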
An Example of Index
Read-Heavy Applications
In the figure we show the classical scheme of a read-heavy application (business
intelligence and analytics applications):
In this kind of application, the use of several indexes allows the user to access
the database quickly. For example, indexes can be defined to easily retrieve
documents describing objects related to a specific geographic region or to a
specific type.
Write-Heavy Applications
The example of the truck information transmission (every three minutes) is a
typical write-heavy application.
The higher the number of indexes adopted, the higher the amount of time
required to complete a write operation.
Indeed, all the indexes must be updated (and created at the beginning).
Reducing the number of indexes allows us to obtain systems with fast write
operation responses. On the other hand, we have to accept slower
read operations.
In conclusion, the number and the type of indexes to adopt must be identified as
a trade-off solution.
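The trade-off can be made concrete with a toy collection that counts the work done on writes and reads. The `IndexedCollection` class, its counters, and the sample fields (`region`, `type`) are hypothetical and only approximate how a real engine behaves.

```python
class IndexedCollection:
    """Toy collection that counts index updates on writes and documents examined on reads."""

    def __init__(self, indexed_fields):
        self.documents = {}
        self.indexes = {field: {} for field in indexed_fields}
        self.index_updates = 0

    def insert(self, doc):
        self.documents[doc["_id"]] = doc
        # Every index must be updated before the write is complete.
        for field, index in self.indexes.items():
            index.setdefault(doc.get(field), []).append(doc["_id"])
            self.index_updates += 1

    def find(self, field, value):
        """Return (matching documents, number of documents examined)."""
        if field in self.indexes:
            ids = self.indexes[field].get(value, [])
            return [self.documents[i] for i in ids], len(ids)
        # No index on this field: fall back to a full collection scan.
        matches = [d for d in self.documents.values() if d.get(field) == value]
        return matches, len(self.documents)


REGIONS = ["north", "south", "east", "west"]
read_heavy = IndexedCollection(["region", "type"])
write_heavy = IndexedCollection([])

for i in range(100):
    doc = {"_id": i, "region": REGIONS[i % 4], "type": "truck" if i % 2 == 0 else "van"}
    read_heavy.insert(dict(doc))
    write_heavy.insert(dict(doc))

indexed_hits, indexed_examined = read_heavy.find("region", "north")
scan_hits, scan_examined = write_heavy.find("region", "north")
```

With two indexes, the 100 inserts cost 200 extra index updates but the query examines only the matching documents; with no indexes, writes do no extra work and the same query scans the whole collection.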
Transaction Processing Systems
These systems are designed for fast write operations and targeted reads, as
shown in the figure below:
Image extracted from: “Dan Sullivan, NoSQL For Mere Mortals, Addison-Wesley, 2015”
Advantages of Sharding
Examples of shard keys may be:
• Unique document ID
• Name
• Date, such as creation date
• Category or type
• Geographical region
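As one possible illustration of how a shard key is used, the sketch below assigns documents to shards by hashing the key value. Hash-based partitioning is an assumption here (range-based and list-based schemes are also common), and the function names and sample documents are hypothetical.

```python
import hashlib


def shard_for(shard_key_value, num_shards):
    """Map a shard key value to a shard number using a stable hash."""
    digest = hashlib.md5(str(shard_key_value).encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards


def distribute(docs, shard_key, num_shards):
    """Place each document on the shard chosen by its shard key value."""
    shards = [[] for _ in range(num_shards)]
    for doc in docs:
        shards[shard_for(doc[shard_key], num_shards)].append(doc)
    return shards


# Hypothetical documents, sharded on their unique document ID.
docs = [{"_id": i, "region": "north" if i % 2 == 0 else "south"} for i in range(1000)]
shards = distribute(docs, "_id", 4)
shard_sizes = [len(s) for s in shards]
```

A key with many distinct values, such as the document ID, tends to spread documents across all shards, whereas a key with few distinct values, such as the region in this sample, can only ever populate as many shards as it has values.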