advanced-developer-student-workbook
MongoDB, Inc.
Contents
1 Advanced Schema Design
1.1 Case Study: Time Series Data
1.2 Case Study: Content Management System
1.3 Case Study: Social Network
1.4 Case Study: Shopping Cart
1.5 Lab: Data Model for an E-Commerce Site
2 Application Engineering
2.1 MongoMart Introduction
2.2 Java Driver Labs (MongoMart)
2.3 Python Driver Labs (MongoMart)
1 Advanced Schema Design
• Case Study: Time Series Data
• Case Study: Content Management System (CMS)
• Case Study: Social Network
• Case Study: Shopping Cart
• Lab: Data Model for an E-Commerce Site (schema design group exercise)
Learning Objectives
Translating the Relational Design to MongoDB Documents
The RDBMS row for client “1234”, recording 50k database operations at 2015-05-29 (23:06:37), becomes the following document:
{
"clientid": 1234,
"metric": "op_counter",
"value": 50000,
"timestamp": ISODate("2015-05-29T23:06:37.000Z")
}
{
"clientid" : 1234,
"timestamp": ISODate("2015-05-29T23:06:00.000Z"),
"metric": "op_counter",
"values": {
0: 0,
...
37: 50000,
...
59: 2000000
}
}
Performing Updates
Set the slot under “values” that corresponds to the time the op_counter was recorded (slot 37 in this example):
> db.metrics_by_minute.updateOne( {
"clientid" : 1234,
"timestamp": ISODate("2015-05-29T23:06:00.000Z"),
"metric": "op_counter"},
{ $set : { "values.37" : 50000 } })
Performing Updates By Incrementing Counters
Increment the counter in the slot under “values” that corresponds to the time the metric (here, “insert”) was recorded:
> db.metrics_by_minute.updateOne( {
"clientid" : 1234,
"timestamp": ISODate("2015-05-29T23:06:00.000Z"),
"metric": "insert"},
{ $inc : { "values.37" : 50000 } })
Metrics with 1 minute granularity for the past 24 hours (24 documents):
> db.metrics_by_minute.find( {
"clientid" : 1234,
"metric": "insert"})
.sort ({ "timestamp" : -1 })
.limit(24)
With one minute granularity, we can record a day’s worth of data and update it efficiently with the following structure
(values.<HOUR_IN_DAY>.<MINUTE_IN_HOUR>):
{
"clientid" : 1234,
"timestamp": ISODate("2015-05-29T00:00:00.000Z"),
"metric": "insert",
"values": {
"0": { 0: 123, 1: 345, ..., 59: 123},
...
"23": { 0: 123, 1: 345, ..., 59: 123}
}
}
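A minimal sketch, in Python, of how an application might derive the filter and update documents for this day-level structure. The helper name day_bucket_update is hypothetical, and the dictionaries are plain data of the shape a driver's update call expects; nothing here contacts a server.

```python
from datetime import datetime, timezone


def day_bucket_update(clientid, metric, when, amount):
    """Build (filter, update) for the one-document-per-day layout.

    Hypothetical helper: it only constructs dictionaries and does not
    contact a database.
    """
    # Truncate the event time to midnight to identify the day's document.
    day = when.replace(hour=0, minute=0, second=0, microsecond=0)
    query = {"clientid": clientid, "timestamp": day, "metric": metric}
    # Dot notation addresses values.<HOUR_IN_DAY>.<MINUTE_IN_HOUR>.
    field = f"values.{when.hour}.{when.minute}"
    update = {"$inc": {field: amount}}
    return query, update


event_time = datetime(2015, 5, 29, 23, 6, 37, tzinfo=timezone.utc)
query, update = day_bucket_update(1234, "insert", event_time, 50000)
print(query["timestamp"].isoformat())
print(update)
```

With the Python driver, the two values could then be passed to update_one on the metrics collection; the collection name is left to the application.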
Considerations
Class Exercise
Look through some charts in MongoDB’s Cloud Manager. How would you represent the schema for those charts,
considering:
• 1 minute granularity for 48 hours
• 5 minute granularity for 48 hours
• 1 hour granularity for 2 months
• 1 day granularity forever
• Expiring data
• Rolling up data
• Queries for charts
Learning Objectives
There are many tables for this example, with multiple queries required for every page load.
Potential tables
• article
• author
• comment
• tag
• link_article_tag
• link_article_article (related articles)
• etc.
1 http://nytimes.com
2 http://cnn.com
3 http://huffingtonpost.com
4 http://huffingtonpost.com
Building a CMS in MongoDB
{
"_id" : 334456,
"slug" : "/apple-reports-second-quarter-revenue",
"headline" : "Apple Reported Second Quarter Revenue Today",
"date" : ISODate("2015-03-24T22:35:21.908Z"),
"author" : {
"name" : "Bob Walker",
"title" : "Lead Business Editor"
},
"copy" : "Apple beat Wall St expectations by reporting ...",
"tags" : [
"AAPL", "Earnings", "Cupertino"
],
"comments" : [
{ "name" : "Frank", "comment" : "Great Story", "date" : ISODate(...) },
{ "name" : "Wendy", "comment" : "+1", "date" : ISODate(...) }
]
}
• Relational design will provide more efficient writes for some data.
• MongoDB design will provide efficient reads for common query patterns.
• A typical CMS may see 1000 reads (or more) for every article created (write).
Optimizations
• Optimizing comments
– What happens when an article has one million comments?
• Include more information associated with each tag
• Include stock price information with each article
• Fields specific to an article type
Changes:
• Include only the last N comments in the “main” document.
• Put all other comments into a separate collection
– One document per comment
Considerations:
• How many comments are shown on the first page of an article?
– This example assumes 10.
• What percentage of users click to read more comments?
{
"_id" : 334456,
"slug" : "/apple-reports-second-quarter-revenue",
"headline" : "Apple Reported Second Quarter Revenue Today",
...
"last_10_comments" : [
{ "name" : "Frank", "comment" : "Great Story", "date" : ISODate() },
{ "name" : "Wendy",
"comment" : "When can I buy an Apple Watch?",
"date" : ISODate() }
]
}
Optimizing Comments Option 1
Considerations:
• Adding a new comment requires writing to two collections
• If the second write fails, the article document and the comments collection are left inconsistent.
> db.blog.updateOne(
{ "_id" : 334456 },
{ $push: {
"last_10_comments": {
$each: [ {
"name" : "Frank",
"comment" : "Great Story",
"date" : ISODate()
} ],
$sort: { date: -1 },
$slice: 10 } } } )
> db.comments.insertOne( { "article_id" : 334456, "name" : "Frank",
"comment" : "Great Story", "date" : ISODate() })
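To see what the combination of $each, $sort and $slice does to the embedded array, here is a small in-memory sketch in Python. It only imitates the array behaviour (append, sort newest first, keep the first 10); the function name push_capped is hypothetical and no driver is involved.

```python
from datetime import datetime


def push_capped(comments, new_comment, limit=10):
    """Mimic $push with $each, $sort {date: -1} and $slice: limit in memory."""
    merged = comments + [new_comment]
    # $sort: { date: -1 } orders the array newest first.
    merged.sort(key=lambda c: c["date"], reverse=True)
    # A positive $slice keeps the first `limit` elements after sorting.
    return merged[:limit]


comments = [
    {"name": f"user{i}", "comment": "...", "date": datetime(2015, 3, 24, 12, i)}
    for i in range(10)
]
newest = {"name": "Frank", "comment": "Great Story", "date": datetime(2015, 3, 25)}
comments = push_capped(comments, newest)
print(len(comments), comments[0]["name"], comments[-1]["name"])
```

The oldest comment falls out of the embedded array, which is why the separate comments collection is needed to keep the full history.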
Changes:
• Use a separate collection for comments, one document per comment.
Considerations:
• Now every page load will require at least 2 queries
• But adding new comments is less expensive than for Option 1.
– And adding a new comment is an atomic operation
Changes:
• Make each tag a document with multiple fields.
{
"_id" : "/apple-reports-second-quarter-revenue",
...
"tags" : [
{ "type" : "ticker", "label" : "AAPL" },
{ "type" : "financials", "label" : "Earnings" },
{ "type" : "location", "label" : "Cupertino" }
]
}
Include More Information With Each Tag
Considerations:
• $elemMatch is now important for queries: it requires a single array element to satisfy all of the listed conditions
> db.article.find( {
"tags" : {
"$elemMatch" : {
"type" : "financials",
"label" : "Earnings"
}
}
} )
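The difference between $elemMatch and separate dotted-field conditions can be illustrated with a small in-memory sketch in Python. The two matcher functions are hypothetical stand-ins written for this example; they imitate only equality matching on an array of sub-documents.

```python
def matches_elem_match(doc, field, conditions):
    """True if a single element of doc[field] satisfies every condition."""
    return any(
        all(elem.get(k) == v for k, v in conditions.items())
        for elem in doc.get(field, [])
    )


def matches_dotted(doc, field, conditions):
    """True if each condition is met by some element, not necessarily the same one."""
    return all(
        any(elem.get(k) == v for elem in doc.get(field, []))
        for k, v in conditions.items()
    )


article = {
    "tags": [
        {"type": "financials", "label": "Revenue"},
        {"type": "ticker", "label": "Earnings"},
    ]
}
wanted = {"type": "financials", "label": "Earnings"}
print(matches_dotted(article, "tags", wanted))
print(matches_elem_match(article, "tags", wanted))
```

In this sample no single tag is both of type "financials" and labelled "Earnings", so only the dotted-style check accepts the article.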
Change:
• Fields specific to an article are added to the document.
{
"_id" : 334456,
...
"executive_profile" : {
"name" : "Tim Cook",
"age" : 54,
"hometown" : {
"city" : "Mobile",
"state" : "AL"
},
"photo_url" : "http://..."
}
}
Class Exercise 1
Design a CMS similar to the above example, but with the following additional requirements:
• Articles may be in one of three states: “draft”, “copy edit”, “final”
• History of articles as they move between states must be captured, as well as comments by the person moving
the article to a different state
• Within each state, every article must be versioned. If there is a problem, the editor can quickly revert to a
previous version.
Class Exercise 2
• Consult NYTimes, CNN, and Huffington Post for some ideas about other types of views we might want to support.
• How would we support these views?
• Would we require other document types?
Learning Objectives
Design Considerations
User Relationships
db.users.find()
{
"_id" : "bigbird",
"fullname" : "Big Bird",
"followers" : [ "oscar", "elmo"],
"following" : [ "elmo", "bert"],
...
}
User Relationships
• Embedding a “followers” array can break the application for users with very many followers: documents are limited to 16 MB.
– Different types of relationships may have different fields and requirements.
User Relationships
> db.followers.find()
{ "_id" : ObjectId(), "user" : "bigbird", "following" : "elmo" }
{ "_id" : ObjectId(), "user" : "bigbird", "following" : "bert" }
{ "_id" : ObjectId(), "user" : "oscar", "following" : "bigbird" }
{ "_id" : ObjectId(), "user" : "elmo", "following" : "bigbird" }
Improving User Relationships
> db.followers.find()
{
"_id" : ObjectId(),
"user" : "bigbird",
"following" : "elmo",
"group" : "work",
"follow_start_date" : ISODate("2015-05-19T06:01:17.171Z")
}
> db.users.find()
{
"_id" : "bigbird",
"fullname" : "Big Bird",
"followers" : 2,
"following" : 2,
...
}
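With relationships in their own collection and only counters on the user document, starting to follow someone touches three documents. The Python sketch below lists those writes as plain dictionaries; the helper name follow_operations and the shape of its return value are assumptions made for illustration.

```python
from datetime import datetime, timezone


def follow_operations(follower, followee, group=None, now=None):
    """Return the writes needed when `follower` starts following `followee`.

    Hypothetical helper: it returns plain dictionaries describing one insert
    into "followers" and two counter updates on "users".
    """
    now = now or datetime.now(timezone.utc)
    edge = {"user": follower, "following": followee, "follow_start_date": now}
    if group is not None:
        edge["group"] = group
    return {
        "insert_follower_edge": edge,
        "update_users": [
            # The follower is now following one more user.
            ({"_id": follower}, {"$inc": {"following": 1}}),
            # The followee has gained one follower.
            ({"_id": followee}, {"$inc": {"followers": 1}}),
        ],
    }


ops = follow_operations("bigbird", "elmo", group="work")
print(ops["insert_follower_edge"]["following"])
print(ops["update_users"])
```

These are separate writes, so unless the application wraps them in a transaction the counters can briefly disagree with the contents of the followers collection.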
User Relationships
Two options:
• Fanout on Read
• Fanout on Write
Fanout on Read
Fanout on Write
• Modify every user’s timeline when a new post or activity is created by a person they follow
• Extremely fast page loads
• Optimized for the case where there are far fewer posts than feed views
• Scales better for large systems than fanout on read
• Feed updates can be performed asynchronously
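The trade-off between the two options can be sketched with in-memory Python structures. Everything here (the followers map, the timelines dictionary, the function names) is a hypothetical stand-in for collections and queries, intended only to show where the work happens.

```python
from collections import defaultdict

# author -> users who follow that author
followers = {"cookie": ["bigbird", "elmo"], "elmo": ["bigbird"]}
posts_by_author = defaultdict(list)   # source of truth for posts
timelines = defaultdict(list)         # per-user feeds, used by fanout on write


def publish(author, text):
    """Store the post, then copy it into each follower's timeline (fanout on write)."""
    post = {"author": author, "text": text}
    posts_by_author[author].append(post)
    for user in followers.get(author, []):
        timelines[user].append(post)


def feed_on_write(user):
    """Reading is a single lookup because the work was done at write time."""
    return timelines[user]


def feed_on_read(user):
    """Fanout on read: gather posts from every followed author at read time."""
    following = [a for a, fans in followers.items() if user in fans]
    return [p for a in following for p in posts_by_author[a]]


publish("cookie", "Me want cookie")
publish("elmo", "Hello")
print(feed_on_write("bigbird"))
print(feed_on_read("bigbird"))
```

The loop inside publish is the cost of fanout on write: it runs once per follower for every new post, which is why it suits workloads with many more feed views than posts.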
Fanout on Write
• What happens when Cookie Monster creates a new post for his 1 million followers?
• What happens when posts are edited or updated?
Fanout on Write Considerations
Fanout on Write
Class Exercise
Learning Objectives
Shopping Cart Requirements
• Shopping cart size will stay relatively small (less than 100 items in most cases)
• Expire the shopping cart after 20 minutes of inactivity
• One simple document per cart (note: optimization for large carts below)
• Sharding to partition workloads during high traffic periods
• Dynamic schema for specific styles/values of an item in a cart (e.g. “Red Sweater”, “17 Inch MacBook Pro
20GB RAM”)
{
"_id": ObjectId("55932ef370c32e23e6552ced"),
"userid": 1234,
"last_activity": ISODate(...),
"status" : "active",
"items" : [
{
"itemid": 4567,
"title": "Milk",
"price": 5.00,
"quantity": 1,
"img_url": "milk.jpg"
},
{
"itemid": 8910,
"title": "Eggs",
"price": 3.00,
"quantity": 1,
"img_url": "eggs.jpg"
} ]
}
• Denormalize item information we need for displaying the cart: item name, image, price, etc.
• Denormalizing item information saves an additional query to the item collection
• Use the “last_activity” field for determining when to expire carts
• All operations to the “cart” document are atomic, e.g. adding/removing items, or changing the cart status to
“processing”
Add an Item to a User’s Cart
db.cart.updateOne({
"_id": ObjectId("55932ef370c32e23e6552ced")
}, {
$push : {
"items" : {
"itemid": 1357,
"title": "Bread",
"price": 2.00,
"quantity": 1,
"img_url": "bread.jpg"
}
},
$set : {
"last_activity" : ISODate()
}
}
)
// Change the quantity of an item that is already in the cart
db.cart.updateOne({
"_id": ObjectId("55932ef370c32e23e6552ced"),
"items.itemid" : 4567
}, {
$set : {
"items.$.quantity" : 5,
"last_activity" : ISODate()
}
})
// Remove an item from the cart
db.cart.updateOne({
"_id": ObjectId("55932ef370c32e23e6552ced")
}, {
$pull : {
"items" : { "itemid" : 4567 }
},
$set : {
"last_activity" : ISODate()
}
})
Tracking Inventory for an Item
{
"_id": 8910,
"img_url": "eggs.jpg",
"quantity" : 2000,
"quantity_in_carts" : 3
...
}
Increment “quantity_in_carts”
db.item.updateOne(
{ "_id": 8910 },
{ $inc : { "quantity_in_carts" : 1 } } )
Decrement “quantity_in_carts”
db.item.updateOne(
{ "_id": 8910 },
{ $inc : { "quantity_in_carts" : -1 } } )
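• Aggregation can be used to total the quantity of an item across all user carts
db.cart.aggregate( [
{ $match : { "items.itemid" : 8910 } },   // narrow to carts holding the item
{ $unwind : "$items" },
{ $match : { "items.itemid" : 8910 } },   // drop the other items in those carts
{ $group : {
"_id" : "$items.itemid",
"amount" : { "$sum" : "$items.quantity" }
} }
] )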
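For checking the result of such a pipeline against a handful of sample carts, the same computation can be written directly in Python. This is an in-memory sketch only; the function name quantity_in_carts and the sample carts are made up for the example.

```python
def quantity_in_carts(carts, itemid):
    """Sum the quantity of `itemid` across all carts, like $match/$unwind/$group."""
    total = 0
    for cart in carts:                      # each cart document
        for item in cart.get("items", []):  # $unwind: one row per array element
            if item["itemid"] == itemid:    # keep only the requested item
                total += item["quantity"]   # $group with $sum
    return total


sample_carts = [
    {"userid": 1234, "items": [{"itemid": 4567, "quantity": 1},
                               {"itemid": 8910, "quantity": 1}]},
    {"userid": 5678, "items": [{"itemid": 8910, "quantity": 2}]},
    {"userid": 9012, "items": []},
]
print(quantity_in_carts(sample_carts, 8910))
```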
Expiring the Shopping Cart
Three options:
• Use a background process to expire items in the cart collection and update the “quantity_in_carts” field.
• Create a TTL index on “last_activity” field in “cart” collection. Remove the “quantity_in_carts” field from the
item document and create a query for determining the number of items currently allocated to user carts
• Create a background process to change the “status” field of expired carts to “inactive”
• Efficiently store very large shopping carts (1000+ items per cart)
• Expire items individually
• The array used for the “items” field will lead to performance degradation as the array becomes very large
• Split cart into “cart” and “cart_item” collections
{
"_id": ObjectId("55932ef370c32e23e6552ced"),
"userid": 1234,
"last_activity": ISODate(...),
"status" : "active",
}
{
"_id" : ObjectId("55932f6670c32e23f119073c"),
"cartid" : ObjectId("55932ef370c32e23e6552ced"),
"itemid": 1357,
"title": "Bread",
"price": 2.00,
"quantity": 1,
"img_url": "bread.jpg",
"date_added" : ISODate(...)
}
Expire Items Individually
• Add a TTL index on the “date_added” field of the “cart_item” collection
• Expiration would occur after a certain amount of time from when the item was added to the cart, similar to a
ticketing site, or flash sale site
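A minimal sketch of creating such an index from Python. It assumes the object passed in is a pymongo Collection (create_index with the expireAfterSeconds option); the helper name, the 15 minute lifetime, and the RecordingCollection stand-in are illustrative assumptions so that the example runs without a database.

```python
def ensure_cart_item_ttl(cart_item_collection, lifetime_seconds):
    """Create a TTL index so cart items expire `lifetime_seconds` after being added.

    `cart_item_collection` is assumed to be a pymongo Collection (or any
    object with a compatible create_index method).
    """
    return cart_item_collection.create_index(
        "date_added", expireAfterSeconds=lifetime_seconds
    )


class RecordingCollection:
    """Stand-in used here so the sketch runs without a database."""

    def __init__(self):
        self.calls = []

    def create_index(self, keys, **options):
        self.calls.append((keys, options))
        return f"{keys}_1"


fake = RecordingCollection()
print(ensure_cart_item_ttl(fake, 15 * 60))  # hypothetical 15 minute lifetime
print(fake.calls)
```

Expired documents are removed by a background task that runs periodically, so an item can remain visible for a short time after its lifetime has passed.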
Class Exercise
Introduction
• In this group exercise, we’re going to take what we’ve learned about MongoDB and develop a basic but
reasonable data model for an e-commerce site.
• For users of RDBMSs, the most challenging part of the exercise will be figuring out how to construct a data
model when joins aren’t allowed.
• We’re going to model for several entities and features.
Product Catalog
• Products. Products vary quite a bit. In addition to the standard product attributes, we will allow for variations
of product type and custom attributes. E.g., users may search for blue jackets, 11-inch MacBooks, or size 12
shoes. The product catalog will contain millions of products.
• Product pricing. Current prices as well as price histories.
• Product categories. Every e-commerce site includes a category hierarchy. We need to allow for both that
hierarchy and the many-to-many relationship between products and categories.
• Product reviews. Every product has zero or more reviews and each review can receive votes and comments.
Product Metrics
• Product views and purchases. Keep track of the number of times each product is viewed and when each
product is purchased.
• Top 10 lists. Create queries for top 10 viewed products, top 10 purchased products.
• Graph historical trends. Create a query to graph how a product is viewed/purchased over the past
30 days with 1 hour granularity. This graph will appear on every product page, so the query must be very fast.
Deliverables
Break into groups and work together to create the following deliverables:
• Sample document and schema for each collection
• Queries the application will use
• Index definitions
Solution
All slides from now on should be shown only after a solution is found by the groups & presented.
2 Application Engineering
What is MongoMart
MongoMart is an on-line store for buying MongoDB merchandise. We’ll use this application to learn more about
interacting with MongoDB through the driver.
• View Items
• View Items by Category
• Text Search
• View Item Details
• Shopping Cart
View Items
• http://localhost:8080
• Pagination and page numbers
• Click on a category
• http://localhost:8080/?category=Apparel
• Pagination and page numbers
• “All” is listed as a category, to return to all items listing
Text Search
• http://localhost:8080/search?query=shirt
• Search for any word or phrase in item title, description or slogan
• Pagination
• http://localhost:8080/item?id=1
• Star rating based on reviews
• Add a review
• Related items
• Add item to cart
Shopping Cart
• http://localhost:8080/cart
• Adding an item multiple times increments quantity by 1
• Change quantity of any item
• Changing quantity to 0 removes item
Introduction
• In this lab, we’ll set up and optimize an application called MongoMart. MongoMart is an on-line store for
buying MongoDB merchandise.
• Import the “item” collection to a standalone MongoDB server (without replication) as noted in the README.md
file of the /data directory of MongoMart
• Become familiar with the structure of the Java application in /java/src/main/java/mongomart/
• Modify the MongoMart.java class to properly connect to your local database instance
Lab: Populate All Necessary Database Queries
• After running the MongoMart.java class, navigate to “localhost:8080” to view the application
• Initially, all data is static and the application does not query the database
• Modify the ItemDao.java and CartDao.java classes to ensure all information comes from the database (do not
modify the method return types or parameters)
• It is important to use replication for production MongoDB instances; however, Lab 1 advised us to use a
standalone server.
• Convert your local standalone mongod instance to a three node replica set named “shard1”
• Modify MongoMart’s MongoDB connection string to include at least two nodes from the replica set
• Modify your application’s write concern to MAJORITY for all writes to the “cart” collection; any writes to the
“item” collection should continue using the default write concern of W:1
• Currently, all reviews are stored in an “item” document, within a “reviews” array. This is problematic for the
cases when the number of reviews for a product becomes extremely large.
• Create a new collection called “review” and modify the “reviews” array within the “item” collection to only
contain the last 10 reviews.
• Modify the application to update the last 10 reviews for an item, the average number of stars (based on reviews)
for an item, and insert the review into the new “review” collection
• Pagination throughout MongoMart uses the inefficient skip() and limit() method
• Optimize MongoMart to use range based pagination
• You may modify method names and return values for this lab
2.3 Python Driver Labs (MongoMart)
Introduction
• Import the “item” collection to a standalone MongoDB server (without replication) as noted in the README.md
file of the /data directory of MongoMart
• Become familiar with the structure of the Python application in /
• Start the application by running “python mongomart.py”, stop it by using ctrl-c
• Modify the mongomart.py file to properly connect to your local database instance
• It is important to use replication for production MongoDB instances; however, Lab 1 advised us to use a
standalone server.
• Convert your local standalone mongod instance to a three node replica set named “rs0”
• Modify MongoMart’s MongoDB connection string to include at least two nodes from the replica set
• Modify your application’s write concern to MAJORITY for all writes to the database
• Currently, all reviews are stored in an “item” document, within a “reviews” array. This is problematic for the
cases when the number of reviews for a product becomes extremely large.
• Create a new collection called “review” and modify the “reviews” array within the “item” collection to only
contain the last 10 reviews (sorted by date).
• Modify the application to update the last 10 reviews for an item, the average number of stars (based on reviews)
for an item, and insert the review into the new “review” collection
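For the replica set and write concern steps earlier in this lab, one way to express both is through the connection string. The sketch below only assembles such a string; the helper name replica_set_uri and the member addresses are hypothetical, while replicaSet and w are standard connection string options.

```python
from urllib.parse import urlencode


def replica_set_uri(hosts, replica_set, database="", write_concern=None):
    """Build a mongodb:// URI listing several replica set members."""
    options = {"replicaSet": replica_set}
    if write_concern is not None:
        options["w"] = write_concern
    return f"mongodb://{','.join(hosts)}/{database}?{urlencode(options)}"


uri = replica_set_uri(
    ["localhost:27017", "localhost:27018"],  # hypothetical member addresses
    replica_set="rs0",
    write_concern="majority",
)
print(uri)
```

Listing more than one member lets the driver discover the set even if one of the listed nodes is down when the application starts.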
Lab: Use Range Based Pagination
• Pagination throughout MongoMart uses the inefficient skip() and limit() method
• Optimize MongoMart to use range based pagination
• You may modify method names and return values for this lab
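Range-based pagination filters on the last value seen on the previous page instead of counting past earlier results. The Python sketch below shows the idea under the assumption that items are paged in ascending order of a numeric "_id"; the function names are hypothetical and run_locally is only an in-memory stand-in for a find with sort and limit.

```python
def next_page_query(last_seen_id=None, page_size=5):
    """Return (filter, sort, limit) for the page following `last_seen_id`."""
    query = {} if last_seen_id is None else {"_id": {"$gt": last_seen_id}}
    return query, [("_id", 1)], page_size


def run_locally(items, query, sort, limit):
    """Tiny in-memory stand-in for find(query).sort(sort).limit(limit)."""
    lower = query.get("_id", {}).get("$gt")
    matched = [d for d in items if lower is None or d["_id"] > lower]
    matched.sort(key=lambda d: d[sort[0][0]])
    return matched[:limit]


items = [{"_id": i, "title": f"Item {i}"} for i in range(1, 13)]
page1 = run_locally(items, *next_page_query(page_size=5))
page2 = run_locally(items, *next_page_query(page1[-1]["_id"], page_size=5))
print([d["_id"] for d in page1], [d["_id"] for d in page2])
```

Because each page starts from an indexed range condition, the cost of fetching a page does not grow with the page number; the application has to carry the last seen "_id" between requests, which is why method signatures may need to change.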