Prefer Embedding: Document Schema Design Cheatsheet
MongoDB’s document model provides the ultimate flexibility in
schema design. These best practices for structuring data promote
optimal usability, performance, and efficiency for growing systems.
Tip: Always analyze your application workloads, data size, and read/write load before you
dig into schema design – and only add complexity when the rewards outweigh the costs.
Prefer Embedding
Data that is accessed together should be stored together.
Sub-documents allow you to cleanly separate sections of related fields within a document, and allow you to make atomic updates without resorting to multi-document transactions.
db.user:
{_id: "abc",
 email: "[email protected]",
 preferences: {alerts: [{name: "morning",
                         frequency: "daily",
                         time: {h: 6, m: 0}}, …],
               colors: {bg: "#cccccc", …}}
}
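A minimal sketch of the atomic sub-document update described above. It only builds the filter and update documents that one might pass to db.user.updateOne(filter, update); the helper name buildBackgroundUpdate and the new color value are hypothetical, and no database connection is assumed.

```javascript
// Hypothetical helper: builds the arguments for a single-document update
// that changes one nested field through dot notation.
function buildBackgroundUpdate(userId, newColor) {
  return {
    filter: { _id: userId },
    // Dot notation targets only preferences.colors.bg; sibling fields
    // such as preferences.alerts are left untouched.
    update: { $set: { "preferences.colors.bg": newColor } },
  };
}

const bgUpdate = buildBackgroundUpdate("abc", "#ffffff");
// In mongosh this could be used as:
//   db.user.updateOne(bgUpdate.filter, bgUpdate.update)
```

Because the write touches a single document, it needs no multi-document transaction.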
Arrays of subdocuments allow you to include lists of related information, like addresses or accounts.
db.business:
{_id: "def",
 name: "Bake and Go",
 addresses: [
   {street: "40 Elm", state: "NY"},
   {street: "101 Main St", state: "VT"}]
}
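As a rough illustration of how an embedded array can be searched, the sketch below mimics in plain JavaScript what a query on a field of the array elements does. The function name findByState is hypothetical, and the in-memory array stands in for the db.business collection.

```javascript
// In-memory stand-in for the db.business collection.
const businesses = [
  {
    _id: "def",
    name: "Bake and Go",
    addresses: [
      { street: "40 Elm", state: "NY" },
      { street: "101 Main St", state: "VT" },
    ],
  },
];

// mongosh equivalent: db.business.find({ "addresses.state": state })
// A document matches when at least one array element has that state.
function findByState(docs, state) {
  return docs.filter((doc) => doc.addresses.some((a) => a.state === state));
}

const vtNames = findByState(businesses, "VT").map((doc) => doc.name);
```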
If your subdocuments (or arrays of them) are objects you frequently use outside of their parent documents, consider moving them to their own collection.
db.manufacturer:
{_id: "ghi",
 name: "Swaab Automotive",
 type: "auto",
 models: [{name: "X",
           year: 2018,
           sku: "ABCDEF-123Z"}, …]
}
Now we can access models independently from their parent manufacturers. This is especially useful if we are either reading from or writing to the model documents at high volume.
db.model:
{_id: "jkl",
 name: "X",
 year: 2018,
 sku: "ABCDEF-123Z",
 manufacturer_id: "ghi"
}
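One way to picture the move from embedded models to a separate collection is a small transformation that turns each embedded model into a standalone document carrying a reference to its parent. This is a sketch only: extractModels and the makeId callback are hypothetical names, and how the new _id values are generated is an assumption.

```javascript
// Hypothetical helper: converts embedded models into standalone documents
// that point back to their manufacturer through manufacturer_id.
function extractModels(manufacturer, makeId) {
  return manufacturer.models.map((model, index) => ({
    _id: makeId(index),
    ...model,
    manufacturer_id: manufacturer._id,
  }));
}

const manufacturer = {
  _id: "ghi",
  name: "Swaab Automotive",
  type: "auto",
  models: [{ name: "X", year: 2018, sku: "ABCDEF-123Z" }],
};

// Illustrative id generator; a real migration would likely use ObjectIds.
const modelDocs = extractModels(manufacturer, (i) => `model-${i}`);
// mongosh: db.model.insertMany(modelDocs)
//          db.model.find({ manufacturer_id: "ghi" })
```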
Any list that gets added to continuously shouldn't be embedded; it should be its own collection.
db.user:
{_id: "mno",
 username: "frankysezhey",
 login_times: [{d: "1/1/2020", t: "00:14:09"},
               {…}]
}
Collections are much better places for lists that can grow, and you can use indexes for fast querying by related ID.
db.login_audit:
{_id: "pqr",
 time: {d: "1/1/2020", t: "00:14:09"},
 user_id: "mno"
}
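The sketch below contrasts with pushing onto an embedded array: each login becomes its own small document, and lookups go through user_id. The in-memory array stands in for db.login_audit, and the names recordLogin, loginsForUser, and the generated _id format are hypothetical.

```javascript
// In-memory stand-in for the db.login_audit collection.
const loginAudit = [];
let nextLoginId = 1;

// Each login is a new document, so the user document never grows.
// mongosh: db.login_audit.insertOne({ time: { d, t }, user_id: userId })
function recordLogin(userId, d, t) {
  const doc = { _id: `login-${nextLoginId++}`, time: { d, t }, user_id: userId };
  loginAudit.push(doc);
  return doc;
}

// mongosh: db.login_audit.createIndex({ user_id: 1 })
//          db.login_audit.find({ user_id: userId })
function loginsForUser(userId) {
  return loginAudit.filter((doc) => doc.user_id === userId);
}

recordLogin("mno", "1/1/2020", "00:14:09");
recordLogin("mno", "1/2/2020", "08:30:00");
recordLogin("other", "1/2/2020", "09:00:00");
```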
Embrace Duplication
What's used together should be stored together: 2nd edition.
Avoid unnecessary application joins by duplicating commonly accessed fields, at the cost of some added complexity to keep them up to date (AKA the "extended reference" pattern).
db.user:
{_id: "stu",
 email: "[email protected]"
}
db.post:
{_id: "vwx",
 text: "Hello World",
 user_id: "stu",
 user_email: "[email protected]"}
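A small sketch of the extended reference idea: when a post is created, the author's email is copied next to the reference so that rendering the post needs no second lookup. The helper name buildPost and the sample email value are hypothetical.

```javascript
// Hypothetical helper: builds a post document that stores both the
// reference (user_id) and a duplicated, frequently read field (user_email).
function buildPost(postId, text, user) {
  return {
    _id: postId,
    text,
    user_id: user._id,
    user_email: user.email, // duplicated from the canonical user document
  };
}

// Illustrative user; the email value is a placeholder.
const author = { _id: "stu", email: "stu@example.com" };
const newPost = buildPost("vwx", "Hello World", author);
// mongosh: db.post.insertOne(newPost)
```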
In cases where you don't want duplicated data to get out of sync, it's up to you to propagate any changes made to the canonical object to its copies. Check out Change Streams as a great mechanism for this.
db.user:
{_id: "stu",
 email: "[email protected]"
}
db.post:
{_id: "vwx",
 text: "Hello World",
 user_id: "stu",
 user_email: "[email protected]"}
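As a hedged sketch of the propagation step, the function below maps a change event on the user collection to the arguments for an update of the duplicated field in posts. The event fields used (operationType, documentKey, updateDescription.updatedFields) follow the shape of an "update" change event; the helper name postUpdateFromUserChange and the sample event are hypothetical, and opening the stream itself is left as a comment.

```javascript
// Hypothetical helper: returns the arguments for
// db.post.updateMany(filter, update), or null when the event does not
// change the duplicated email field.
function postUpdateFromUserChange(event) {
  if (event.operationType !== "update") return null;
  const changed = event.updateDescription.updatedFields;
  if (!("email" in changed)) return null;
  return {
    filter: { user_id: event.documentKey._id },
    update: { $set: { user_email: changed.email } },
  };
}

// Sample event shaped like an "update" change event; values are placeholders.
const sampleEvent = {
  operationType: "update",
  documentKey: { _id: "stu" },
  updateDescription: { updatedFields: { email: "new@example.com" }, removedFields: [] },
};

const propagation = postUpdateFromUserChange(sampleEvent);
// mongosh: events would come from a cursor opened with db.user.watch(), and
// each non-null result applied with db.post.updateMany(filter, update).
```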
Depending on your access patterns, you can retrieve related documents via sub-queries from an array of keys (e.g. db.user.friend_ids) or query against a reference (e.g. db.post.user_id). Application-level joins, when necessary, can perform nearly as well as in-database joins.
db.user:
{_id: "stu",
 email: "[email protected]",
 friend_ids: ["aaa", "bbb"]
}
db.post:
{_id: "vwx",
 text: "Hello World",
 user_id: "stu"
}
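The two access paths above can be sketched as an application-level join over in-memory arrays standing in for db.user and db.post. The function names friendsOf and postsBy are hypothetical; the mongosh equivalents appear in comments.

```javascript
// In-memory stand-ins for the db.user and db.post collections.
const users = [
  { _id: "stu", friend_ids: ["aaa", "bbb"] },
  { _id: "aaa", friend_ids: [] },
  { _id: "bbb", friend_ids: [] },
];
const posts = [{ _id: "vwx", text: "Hello World", user_id: "stu" }];

// Sub-query from an array of keys: first fetch the user, then fetch all
// friends in one query.
// mongosh: const u = db.user.findOne({ _id: userId });
//          db.user.find({ _id: { $in: u.friend_ids } })
function friendsOf(userId) {
  const user = users.find((u) => u._id === userId);
  if (!user) return [];
  const ids = new Set(user.friend_ids);
  return users.filter((u) => ids.has(u._id));
}

// Query against a reference.
// mongosh: db.post.find({ user_id: userId })
function postsBy(userId) {
  return posts.filter((p) => p.user_id === userId);
}
```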
When changing references on either side of a bidirectional relationship, you need to remember to update the other end, too. Bidirectional relationships are another good use case for Change Streams.
db.user:
{_id: "aaa",
 email: "[email protected]",
 friend_ids: ["ccc", "bbb"]
}
{_id: "bbb",
 email: "[email protected]",
 friend_ids: ["eee", "aaa"]
}
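A minimal sketch of keeping both ends in step when a friendship is added: one update per side, each using $addToSet so that repeating the operation does not create duplicates. The helper name buildFriendshipUpdates and the id "ddd" are hypothetical.

```javascript
// Hypothetical helper: returns one update per side of the relationship.
function buildFriendshipUpdates(idA, idB) {
  return [
    { filter: { _id: idA }, update: { $addToSet: { friend_ids: idB } } },
    { filter: { _id: idB }, update: { $addToSet: { friend_ids: idA } } },
  ];
}

const friendshipOps = buildFriendshipUpdates("aaa", "ddd");
// mongosh: friendshipOps.forEach((op) => db.user.updateOne(op.filter, op.update))
```

The two writes are separate single-document updates, so they are not atomic as a pair unless they run inside a transaction.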