
Prefer Embedding: Document Schema Design Cheatsheet

1) MongoDB document schemas should embed related data to optimize performance and usability. However, only embed data that is frequently accessed together, and don't embed unbounded lists.
2) Consider creating references between collections for one-to-many relationships instead of embedding when the referenced data is frequently accessed on its own.
3) Duplicating commonly accessed data across collections can improve performance but requires maintaining consistency between duplicates.

Uploaded by

Ulises Carreon
Copyright
© All Rights Reserved

Document Schema Design Cheatsheet
MongoDB’s document model provides the ultimate flexibility in
schema design. These best practices for structuring data promote
optimal usability, performance, and efficiency for growing systems.

Tip: Always analyze your application workloads, data size, and read/write load before you
dig into schema design – and only add complexity when the rewards outweigh the costs.

Prefer Embedding
Data that is accessed together should be stored together.

Use structure to organize data within a document

db.user:
{_id: "abc",
 email: "[email protected]",
 preferences: {alerts: [{name: "morning",
                         frequency: "daily",
                         time: {h: 6, m: 0}}, …],
               colors: {bg: "#cccccc", …}}
}

Sub-documents allow you to cleanly separate sections of related fields
within a document, and allow you to make atomic updates without resorting
to multi-document transactions.
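As a rough illustration of why sub-documents keep an update local to one document, the sketch below addresses a nested field by a dot-separated path on a plain Python dict, similar in spirit to a $set with dot notation. It runs entirely in memory and does not talk to a database; set_path is a hypothetical helper written for this example, and the email field is omitted.

```python
# Sketch only: set_path is a hypothetical helper, not a MongoDB driver call.
def set_path(doc, dotted_path, value):
    """Set a nested field addressed by a dot-separated path, in place."""
    *parents, leaf = dotted_path.split(".")
    target = doc
    for key in parents:
        # Create intermediate sub-documents if they do not exist yet.
        target = target.setdefault(key, {})
    target[leaf] = value
    return doc


user = {
    "_id": "abc",
    "preferences": {
        "alerts": [
            {"name": "morning", "frequency": "daily", "time": {"h": 6, "m": 0}}
        ],
        "colors": {"bg": "#cccccc"},
    },
}

# Change one nested field; every other part of the document is untouched.
set_path(user, "preferences.colors.bg", "#000000")
```

Because the whole preference tree lives in one document, a change like this touches a single document rather than rows spread across several tables.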

Include (bounded) arrays of related information

db.business:
{_id: "def",
 name: "Bake and Go",
 addresses: [
   {street: "40 Elm", state: "NY"},
   {street: "101 Main St", state: "VT"}]
}

Arrays of subdocuments allow you to include lists of related information,
like addresses or accounts.
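One way to keep an embedded array bounded is to enforce the limit wherever the application appends to it. The following is a minimal in-memory sketch of that idea; add_address, the limit of 5, and the third address are all assumptions made up for this example.

```python
MAX_ADDRESSES = 5  # assumed cap, chosen only for this example


def add_address(business, address, limit=MAX_ADDRESSES):
    """Append an address, refusing to grow the array past `limit`."""
    addresses = business.setdefault("addresses", [])
    if len(addresses) >= limit:
        raise ValueError(f"address list is capped at {limit} entries")
    addresses.append(address)
    return business


business = {
    "_id": "def",
    "name": "Bake and Go",
    "addresses": [
        {"street": "40 Elm", "state": "NY"},
        {"street": "101 Main St", "state": "VT"},
    ],
}

# The third address is invented for illustration.
add_address(business, {"street": "7 Oak Ave", "state": "NH"})
```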

Finding the optimal degree of embedding can take practice, and the right
answer may evolve with your application. Start with these rules of thumb, and
don’t be afraid to iterate; MongoDB is designed to embrace change.

Know when not to Embed
What’s used apart should be stored apart.

Move frequently accessed embedded objects to their own collections

db.manufacturer:
{_id: "ghi",
 name: "Swaab Automotive",
 type: "auto",
 models: [{name: "X",
           year: 2018,
           sku: "ABCDEF-123Z"}, …]
}

If your subdocuments (or arrays of them) are objects you frequently use
outside of their parent documents, consider moving them to their own
collection.

db.model:
{_id: "jkl",
 name: "X",
 year: 2018,
 sku: "ABCDEF-123Z",
 manufacturer_id: "ghi"
}

Now we can access models independently from their parent manufacturers.
This is especially useful if we are either reading from or writing to the
model documents at high volume.
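The move from embedded models to a separate collection can be sketched as a small transformation. This is an in-memory illustration on plain dicts, not a migration script; split_models is a hypothetical helper, and the new model documents are left without an _id on the assumption that one would be assigned when they are inserted.

```python
def split_models(manufacturer):
    """Return (manufacturer without models, standalone model documents)."""
    slim = {k: v for k, v in manufacturer.items() if k != "models"}
    models = [
        # Each model gains a back-reference to its parent manufacturer.
        {**model, "manufacturer_id": manufacturer["_id"]}
        for model in manufacturer.get("models", [])
    ]
    return slim, models


manufacturer = {
    "_id": "ghi",
    "name": "Swaab Automotive",
    "type": "auto",
    "models": [{"name": "X", "year": 2018, "sku": "ABCDEF-123Z"}],
}

slim, models = split_models(manufacturer)
```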

Don’t embed lists that grow without bounds

db.user:
{_id: "mno",
 username: "frankysezhey",
 login_times: [{d: "1/1/2020", t: "00:14:09"},
               {…}]
}

Any list that gets added to continuously shouldn’t be embedded;
it should be its own collection.

db.login_audit:
{_id: "pqr",
 time: {d: "1/1/2020", t: "00:14:09"},
 user_id: "mno"
}

Collections are much better places for lists that can grow, and you can use
indexes for fast querying by related ID.
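To make the "index by related ID" idea concrete, here is a small in-memory sketch that groups login entries by user_id so one user's logins can be looked up directly. It only imitates what a database index on user_id provides; build_user_index and every entry except "pqr" are made up for this example.

```python
from collections import defaultdict

# One document per login event; only "pqr" comes from the cheatsheet.
login_audit = [
    {"_id": "pqr", "time": {"d": "1/1/2020", "t": "00:14:09"}, "user_id": "mno"},
    {"_id": "pqs", "time": {"d": "1/2/2020", "t": "08:30:00"}, "user_id": "mno"},
    {"_id": "pqt", "time": {"d": "1/2/2020", "t": "09:05:41"}, "user_id": "zzz"},
]


def build_user_index(entries):
    """Group entries by user_id, a stand-in for an index on that field."""
    index = defaultdict(list)
    for entry in entries:
        index[entry["user_id"]].append(entry)
    return index


logins_by_user = build_user_index(login_audit)
mno_logins = logins_by_user["mno"]
```

The user document itself never grows, no matter how many login events accumulate.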

When in doubt, it’s usually good to start by embedding data
in objects, especially as you’re getting used to document schemas.
You can always factor sub-documents out into collections later.

Embrace Duplication
What’s used together, should be stored together: 2nd edition.

Store useful data where it’s commonly accessed

db.user:
{_id: "stu",
 email: "[email protected]"
}

db.post:
{_id: "vwx",
 text: "Hello World",
 user_id: "stu",
 user_email: "[email protected]"}

Avoid unnecessary application joins by duplicating commonly accessed fields,
at the cost of some added complexity to keep them up to date. (AKA the
"extended reference" pattern)
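A minimal sketch of the extended reference idea: when a post is created, the author's email is copied onto it so that reading the post needs no second lookup. new_post is a hypothetical helper, and the email address is a placeholder because the original value is obscured in the source.

```python
def new_post(post_id, text, user):
    """Build a post that carries both a reference and a copied field."""
    return {
        "_id": post_id,
        "text": text,
        "user_id": user["_id"],       # reference to the canonical user
        "user_email": user["email"],  # duplicated for read convenience
    }


user = {"_id": "stu", "email": "stu@example.com"}  # placeholder address
post = new_post("vwx", "Hello World", user)
```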

Mind your data consistency

db.user:
{_id: "stu",
 email: "[email protected]"
}

db.post:
{_id: "vwx",
 text: "Hello World",
 user_id: "stu",
 user_email: "[email protected]"}

In cases where you don’t want duplicated data to get out of sync, it’s up to
you to propagate any changes made to the canonical object to its copies.
Check out Change Streams as a great mechanism for this.
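The propagation step can be sketched as follows. This is a plain in-memory function standing in for whatever mechanism actually performs the update; it does not use Change Streams. change_email, the second post, and the email addresses are assumptions made for this example.

```python
def change_email(user, posts, new_email):
    """Update the canonical user, then every duplicated copy on posts."""
    user["email"] = new_email
    updated = 0
    for post in posts:
        if post["user_id"] == user["_id"]:
            post["user_email"] = new_email
            updated += 1
    return updated


user = {"_id": "stu", "email": "old@example.com"}  # placeholder addresses
posts = [
    {"_id": "vwx", "text": "Hello World", "user_id": "stu",
     "user_email": "old@example.com"},
    # A post by a different (made-up) user must be left alone.
    {"_id": "yza", "text": "Other", "user_id": "aaa",
     "user_email": "aaa@example.com"},
]

count = change_email(user, posts, "new@example.com")
```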

Duplication, or storing the same pieces of data in multiple documents
in your database, is a powerful technique. Always know what’s
canonical, and keep your data consistency rules in mind.

Don’t be Scared to Relate
There’s more than one way to relate.

Use arrays of ids or backreferences for one-to-many relationships

db.user:
{_id: "stu",
 email: "[email protected]",
 friend_ids: ["aaa", "bbb"]
}

db.post:
{_id: "vwx",
 text: "Hello World",
 user_id: "stu"
}

Depending on your access patterns, you can retrieve related documents via
sub-queries from an array of keys (e.g. db.user.friend_ids) or query against
a reference (e.g. db.post.user_id). Application-level joins, when necessary,
can perform nearly as well as in-database joins.
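Both lookup directions can be sketched with plain Python collections: following an array of ids outward from a user, and querying posts by their back-reference. This is an in-memory stand-in for an application-level join; friends_of, posts_by, the second post, and the omission of email fields are assumptions for this example.

```python
# Users keyed by _id, standing in for lookups on the user collection.
users = {
    "stu": {"_id": "stu", "friend_ids": ["aaa", "bbb"]},
    "aaa": {"_id": "aaa", "friend_ids": ["stu"]},
    "bbb": {"_id": "bbb", "friend_ids": ["stu"]},
}
posts = [
    {"_id": "vwx", "text": "Hello World", "user_id": "stu"},
    {"_id": "yza", "text": "Other", "user_id": "aaa"},  # made-up post
]


def friends_of(user_id):
    """Follow the array of ids stored on the user document."""
    return [users[fid] for fid in users[user_id]["friend_ids"] if fid in users]


def posts_by(user_id):
    """Query against the back-reference stored on each post."""
    return [post for post in posts if post["user_id"] == user_id]


stu_friends = friends_of("stu")
stu_posts = posts_by("stu")
```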

Remember upkeep on bidirectional relationships

db.user:
{_id: "aaa",
 email: "[email protected]",
 friend_ids: ["ccc", "bbb"]
}
{_id: "bbb",
 email: "[email protected]",
 friend_ids: ["eee", "aaa"]
}

When changing references on either side of a bidirectional relationship, you
need to remember to update the other end, too. Bidirectional relationships
are another good use case for Change Streams.
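The upkeep described above amounts to always writing both ends of the relationship together. The sketch below shows that on in-memory dicts; add_friend is a hypothetical helper, the "eee" document's contents are assumed, and email fields are left out.

```python
def add_friend(users, a, b):
    """Record the friendship on both documents, without duplicates."""
    for src, dst in ((a, b), (b, a)):
        ids = users[src].setdefault("friend_ids", [])
        if dst not in ids:
            ids.append(dst)


users = {
    "aaa": {"_id": "aaa", "friend_ids": ["ccc", "bbb"]},
    "bbb": {"_id": "bbb", "friend_ids": ["eee", "aaa"]},
    "eee": {"_id": "eee", "friend_ids": ["bbb"]},  # assumed contents
}

# Linking aaa and eee updates both sides in one call.
add_friend(users, "aaa", "eee")
```

Routing every change through one function like this is a simple way to avoid forgetting the other end.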

Just because MongoDB isn’t a traditional “relational” database
doesn’t mean it can’t do relations. By strategically denormalizing
and optimizing your usage, you can have your cake, and eat it too.

Have more questions about schema design?
Want to virtually meet some like-minded developers?

Join the MongoDB community: developer.mongodb.com/community
