MongoDB Schema Design
What is MongoDB?
1. Document Database
Not for .PDF & .DOC files
A document is essentially an associative array
Document = JSON object
Document = PHP Array
Document = Python Dict
Document = Ruby Hash
etc
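As a minimal illustration of the "Document = Python Dict" line, the sketch below writes a document as a plain dict and round-trips it through JSON. The field names and values are made up for the example and are not taken from a real collection.

```python
import json

# A document is a set of key/value pairs; values may be nested
# dicts and lists. Field names here are illustrative only.
document = {
    "title": "Schema design in MongoDB",
    "tags": ["MongoDB", "schema"],
    "author": {"username": "mattbates"},
}

# The same structure serializes to a JSON object and back unchanged.
as_json = json.dumps(document)
round_tripped = json.loads(as_json)
```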
1. Database Landscape
[Figure: Memcached, MongoDB, and RDBMS compared by depth of functionality]
[Figure: NoSQL landscape by category (Document Database, Column-Family Stores, Graph Databases). Products shown: Riak, MongoDB, Amazon SimpleDB, Neo4J, Cassandra, FlockDB, HBase, OrientDB, Memcache, CouchDB, Project Voldemort, Redis, Hypertable, BerkeleyDB]
1. Database Evolution
[Figure: database evolution timeline for 1990, 2000, and 2010. Rows: Operational Database and Datawarehousing. Labels shown: RDBMS, NoSQL (Key-Value/Wide-column, Document DB), OLAP/DW, Hadoop]
2. Open Source
MongoDB is an open source project
On GitHub
Licensed under the AGPL
Started & sponsored by MongoDB Inc (formerly known as 10gen)
Commercial licenses available
Contributions welcome
2. Global Community
7,000,000+ MongoDB Downloads
150,000+
35,000+
30,000+
20,000+
3. High Performance
Written in C++
Extensive use of memory-mapped files
3. Performance
Better Data Locality
In-Memory Caching
In-Place Updates
4. Scalability
Auto-Sharding
4. High Availability
5. Full Featured
Ad Hoc queries
Real time aggregation
Rich query capabilities
Strongly consistent
Geospatial features
Support for most programming languages
Flexible schema
mongodb.org/downloads
Running MongoDB
$ tar zxvf mongodb-osx-x86_64-2.6.0.tgz
$ cd mongodb-osx-x86_64-2.6.0/bin
$ mkdir -p /data/db
$ ./mongod
Mongo Shell
MacBook-Pro-:~ $ mongo
MongoDB shell version: 2.6.0
connecting to: test
> db.test.insert({text: 'Welcome to MongoDB'})
> db.test.find().pretty()
{
"_id" : ObjectId("51c34130fbd5d7261b4cdb55"),
"text" : "Welcome to MongoDB"
}
_id
_id is the primary key in MongoDB
Automatically indexed
Automatically created as an ObjectId if not provided
Any unique immutable value could be used
ObjectId
ObjectId is a special 12-byte value
Guaranteed to be unique across your cluster
ObjectId("50804d0bd94ccab2da652599")
|----ts-----||---mac---||-pid-||----inc-----|
Bytes: ts = 4, mac = 3, pid = 2, inc = 3
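The byte layout above can be checked by hand. The sketch below splits the example ObjectId into the four fields using only the standard library; the function name and the returned keys are hypothetical. It assumes the layout shown on this slide. Newer MongoDB releases replace the machine and process fields with a random value, so only the timestamp part should be relied on in general.

```python
from datetime import datetime, timezone

def split_object_id(hex_string):
    """Split a 24-character ObjectId hex string into the four fields
    of the layout shown above (ts, mac, pid, inc)."""
    raw = bytes.fromhex(hex_string)
    if len(raw) != 12:
        raise ValueError("an ObjectId is exactly 12 bytes")
    seconds = int.from_bytes(raw[0:4], "big")  # 4-byte timestamp
    return {
        "timestamp": seconds,
        "created": datetime.fromtimestamp(seconds, tz=timezone.utc),
        "machine": raw[4:7].hex(),                      # 3 bytes
        "pid": int.from_bytes(raw[7:9], "big"),         # 2 bytes
        "counter": int.from_bytes(raw[9:12], "big"),    # 3 bytes
    }

parts = split_object_id("50804d0bd94ccab2da652599")
```

Because the timestamp comes first, ObjectIds generated later sort after earlier ones to one-second resolution.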
Document Database
Terminology
RDBMS          MongoDB
Table, View    Collection
Row            Document
Index          Index
Join           Embedded Document
Foreign Key    Reference
Partition      Shard
User: Name, Email address
Article: Name, Slug, Publish date, Text
Comment: Comment, Date, Author
Tag: Name, URL
MongoDB ERD
User: Name, Email address
Article: Name, Slug, Publish date, Text, Author, Comment[], Tag[], Category[]
  Comment[]: Comment, Date, Author
  Tag[]: Value
  Category[]: Value
[Figure: Post, Author, and Comment; a second panel shows one Post and Author with five Comments]
MongoDB Drivers
Official Support for 12 languages
Community drivers for tons more
Drivers connect to mongo servers
Drivers translate BSON into native types
mongo shell is not a driver, but works like one in some ways
Installed using typical means (maven, npm, pecl, gem, pip)
{
  _id : ObjectId(..),
  title : 'Schema design in MongoDB',
  author : 'mattbates',
  date : ISODate(..),
  tags : ['MongoDB', 'schema'],
  section : 'schema',
  slug : 'schema-design-in-mongodb',
  comments : [ ObjectId(..), .. ]
}
{
  _id : ObjectId(..),
  article_id : 1,
  text : 'A great article, helped me understand schema design',
  date : ISODate(..),
  author : 'johnsmith'
}
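With references, reading an article together with its comments takes two lookups: one for the article and one for the comments it points to. The sketch below shows that flow with in-memory dicts standing in for the two collections; the ids, data, and function name are illustrative only.

```python
# In-memory stand-ins for the two collections (illustrative data only).
articles = {
    "a1": {"title": "Schema design in MongoDB", "comments": ["c1", "c2"]},
}
comments = {
    "c1": {"author": "johnsmith", "text": "A great article"},
    "c2": {"author": "mattbates", "text": "Thanks"},
}

def load_article_with_comments(article_id):
    """Fetch the article, then resolve its comment references."""
    article = dict(articles[article_id])  # first lookup (shallow copy)
    # Second lookup: replace each referenced id with the comment document.
    article["comments"] = [comments[cid] for cid in article["comments"]]
    return article

resolved = load_article_with_comments("a1")
```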
Cons
Comments array is unbounded; documents will grow in size (remember 16 MB document limit)
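To get a feel for the limit, the rough arithmetic below estimates how many comments of a typical size would fit in one document. It is only a sketch: the sample comment and its date are placeholders, and JSON length is used as a stand-in for the real BSON size, which will differ.

```python
import json

MAX_DOCUMENT_BYTES = 16 * 1024 * 1024  # 16 MB document limit

# Sample comment; the values are placeholders.
comment = {
    "text": "A great article, helped me understand schema design",
    "date": "2014-04-10T00:00:00Z",
    "author": "johnsmith",
}

# JSON length is only a rough proxy for the stored BSON size.
approx_comment_bytes = len(json.dumps(comment).encode("utf-8"))
approx_max_comments = MAX_DOCUMENT_BYTES // approx_comment_bytes
```

Even when the count looks large, every added comment makes the whole article document bigger to read and rewrite, which is the cost the slide warns about.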
{
  _id : ObjectId(..),
  title : 'Schema design in MongoDB',
  author : 'mattbates',
  date : ISODate(..),
  tags : ['MongoDB', 'schema'],
  comments : [
    {
      text : 'A great article, helped me understand schema design',
      date : ISODate(..),
      author : 'johnsmith'
    },
    ..
  ]
}
{
  comments_count : 45,
  comments_pages : 1,
  comments : [
    {
      text : 'A great article, helped me understand schema design',
      date : ISODate(..),
      author : 'johnsmith'
    },
    ..
  ]
}
comments_count: updated by an update operation as comments are added/removed
comments_pages: number of pages; a page is a bucket of 100 comments
{
  _id : ObjectId(..),
  article_id : ObjectId(..),
  page : 1,
  count : 42,
  comments : [
    {
      text : 'A great article, helped me understand schema design',
      date : ISODate(..),
      author : 'johnsmith'
    },
    ..
  ]
}
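The bucket bookkeeping can be sketched without a database. The example below assumes the article carries the comments_count and comments_pages counters and that each page holds 100 comments; the function name and the in-memory dicts standing in for the two collections are hypothetical.

```python
PAGE_SIZE = 100  # a page is a bucket of 100 comments

def add_comment(article, buckets, comment):
    """Append a comment to the article's latest bucket, creating it if needed.

    `article` carries comments_count and comments_pages; `buckets` maps
    (article_id, page) to a bucket document. Both are in-memory stand-ins.
    """
    page = article["comments_count"] // PAGE_SIZE + 1
    key = (article["_id"], page)
    bucket = buckets.setdefault(
        key,
        {"article_id": article["_id"], "page": page, "count": 0, "comments": []},
    )
    bucket["comments"].append(comment)
    bucket["count"] += 1
    article["comments_count"] += 1
    article["comments_pages"] = page
    return bucket

article = {"_id": "a1", "comments_count": 0, "comments_pages": 0}
buckets = {}
for n in range(250):
    add_comment(article, buckets, {"author": "johnsmith", "text": "comment %d" % n})
```

Each bucket stays bounded in size, so one page of comments can be fetched with a single document read.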
Modelling interactions
Interactions
Article views
Comments
(Social media sharing)
Requirements
Time series
Pre-aggregated in preparation for analytics
Modelling interactions
Document per article per day (bucketing)
Daily counter and hourly sub-documents
{
  _id : ObjectId(..),
  article_id : ObjectId(..),
  section : 'schema',
  date : ISODate(..),
  daily : { views : 45, comments : 150 },
  hours : {
    0 : { views : 10 },
    1 : { views : 2 },
    ..
    23 : { comments : 14, views : 10 }
  }
}
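One way to maintain such a document is a single $inc that bumps the daily counter and the matching hourly counter together. The sketch below only builds the filter and update dicts; the function name is hypothetical, and keying the document by article_id plus a midnight UTC date is an assumption about how the day is stored.

```python
from datetime import datetime, timezone

def build_interaction_update(article_id, kind, when):
    """Return (filter, update) for one interaction of `kind` at time `when`."""
    # One document per article per day: truncate the timestamp to midnight UTC.
    day = datetime(when.year, when.month, when.day, tzinfo=timezone.utc)
    query = {"article_id": article_id, "date": day}
    update = {
        "$inc": {
            "daily.%s" % kind: 1,                     # e.g. daily.views
            "hours.%d.%s" % (when.hour, kind): 1,     # e.g. hours.15.views
        }
    }
    return query, update

query, update = build_interaction_update(
    "a1", "views", datetime(2014, 4, 10, 15, 55, tzinfo=timezone.utc)
)
```

These two dicts could be passed to an upserting update, for example PyMongo's update_one(query, update, upsert=True), so that the first interaction of the day creates the document.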
[Figure: Client-side JSON (e.g. AngularJS) -> HTTP(S) REST -> Python web app -> PyMongo driver (BSON)]
Action  URI
GET     /articles
GET     /articles-by-tag/[tag]
GET     /articles/[article_id]
POST    /articles
GET     /articles/[article_id]/comments
POST    /articles/[article_id]/comments
POST    /users
GET     /users/[username]
PUT     /users/[username]
# push the comment to the latest bucket and $inc the count
# (find_and_modify is the PyMongo 2.x call; PyMongo 3+ offers
# find_one_and_update for the same purpose)
from bson.objectid import ObjectId

page = db['comments'].find_and_modify(
    {'article_id': ObjectId(article_id),
     'page': page_id},
    {'$inc': {'count': 1},
     '$push': {'comments': comment}},
    fields={'count': 1},
    upsert=True,
    new=True)
{
"articles": "[{\"title\": \"Schema design in MongoDB\", \"text\": \"Data in MongoDB
has a flexible schema..\", \"section\": \"schema\", \"author\": \"prasoonk\", \"date\":
{\"$date\": 1397145312505}, \"_id\": {\"$oid\": \"5346bef5f2610c064a36a793\"},
\"slug\": \"schema-design-in-mongodb\", \"tags\": [\"MongoDB\", \"schema\"]}]"}
Schema iteration
New feature in the backlog?
Documents have dynamic schema so we just iterate the
object schema.
>>> user = { 'username' : 'matt',
...          'first' : 'Matt',
...          'last' : 'Bates',
...          'preferences' : { 'opt_out' : True } }
>>> db.users.save(user)  # 'users' collection name assumed
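Because older documents will not have the new field, application code usually reads it with a default. A minimal sketch, assuming a preferences.opt_out field like the one above; the helper name and the older user record are made up for illustration.

```python
def is_opted_out(user):
    """Read the newer preferences.opt_out field, defaulting to False
    for documents saved before the field existed."""
    return user.get("preferences", {}).get("opt_out", False)

# An older document without the field, and a newer one with it.
old_user = {"username": "johnsmith", "first": "John", "last": "Smith"}
new_user = {"username": "matt", "first": "Matt", "last": "Bates",
            "preferences": {"opt_out": True}}
```

No migration is needed up front; old and new document shapes can live in the same collection while the code handles both.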
docs.mongodb.org
MongoDB Downloads: mongodb.com/download
education.mongodb.com
mongodb.com/events
White Papers: mongodb.com/white-papers
Case Studies: mongodb.com/customers
Presentations: mongodb.com/presentations
Documentation: docs.mongodb.org
Additional Info
Schema Design @
User: Name, Email address
Article: Name, Slug, Publish date, Text, Author, Comment[], Tag[], Category[]
  Comment[]: Comment, Date, Author
  Tag[]: Value
  Category[]: Value
Replication @
[Figure: Client Application and Driver connected to a replica set (one Primary, two Secondaries), with Read and Write paths labeled]
Indexing @
[Figure: index diagram with the values 16, 12, 18, 21]
Sharding @
www.etiennemansard.com
Questions?
#DDJIndia @prasoonk
Thank You
Prasoon Kumar
Consulting Engineer, MongoDB