MongoDB Session 4
Manual References:
db.places.insert({
   "_id": original_id,
   "name": "Broadway Center",
   "url": "bc.example.net"
})
db.people.insert({
   "name": "Erin",
   "places_id": original_id,
   "url": "bc.example.net/Erin"
})
Then, when a query returns the document from the people
collection, you can, if needed, make a second query for the
document referenced by the places_id field in the places
collection.
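A minimal sketch of resolving such a reference in the mongo shell (field names
taken from the example above; the second lookup is issued by the application
only when the related document is needed):
// Fetch the person first, then follow the manual reference by hand.
var person = db.people.findOne({ name: "Erin" });
// Second query: look up the referenced place by its _id value.
var place = db.places.findOne({ _id: person.places_id });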
Use:
For nearly every case where you want to store a relationship
between two documents, use manual references.
The references are simple to create and your application can
resolve references as needed.
The only limitation of manual linking is that these references do
not convey the database and collection names.
If you have documents in a single collection that relate to
documents in more than one collection, you may need to
consider using DBRefs.
DBRefs:
Background
DBRefs are a convention for representing a document, rather
than a specific reference type.
They include the name of the collection, and in some cases the
database name, in addition to the value from the _id field.
Format:
DBRefs have the following fields:
$ref
The $ref field holds the name of the collection where the referenced
document resides.
$id
The $id field contains the value of the _id field in the referenced
document.
$db
Optional.
Contains the name of the database where the referenced document
resides.
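For illustration, a document that stores a DBRef (the collection, database, and
field names here are hypothetical; note that the field order $ref, then $id,
then the optional $db is required):
{
   _id: ObjectId("5126bbf64aed4daf9e2ab771"),
   name: "Broadway Center",
   creator: {
      "$ref": "creators",
      "$id": ObjectId("5126bc054aed4daf9e2ab772"),
      "$db": "users"
   }
}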
Write Concern:
w: 1:
Requests acknowledgment that the write operation has
propagated to the standalone mongod or to the primary in a
replica set. w: 1 is the default write concern for MongoDB.
w: 0
Requests no acknowledgment of the write operation.
However, w: 0 may return information about socket
exceptions and networking errors to the application.
If you specify w: 0 but include j: true, the j: true prevails to
request acknowledgment from the standalone mongod or
the primary of a replica set.
w greater than 1 requires acknowledgment from the primary
and as many additional data-bearing secondaries as needed
to meet the specified write concern.
For example, consider a 3-member replica set with no
arbiters. Specifying w: 2 would require acknowledgment from
the primary and one of the secondaries. Specifying w: 3
would require acknowledgment from the primary and both
secondaries.
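A minimal sketch of specifying a numeric write concern on an insert (the
products collection and document are illustrative):
db.products.insert(
   { item: "envelopes", qty: 100 },
   { writeConcern: { w: 2 } }   // wait for the primary and one secondary
)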
Majority:
The majority (M) is calculated as the majority of all voting
members [1], but the write operation returns
acknowledgement after propagating to M-number of data-
bearing voting members (primary and secondaries with
members[n].votes greater than 0).
For example, consider a replica set with 3 voting members,
Primary-Secondary-Secondary (P-S-S).
For this replica set, M is two [1], and the write must propagate
to the primary and one secondary to acknowledge the write
concern to the client.
After the write operation returns with a w: "majority"
acknowledgment to the client, the client can read the result
of that write with a "majority" readConcern.
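A sketch of a majority write followed by a majority read (illustrative data; the
cursor.readConcern() shell helper is assumed to be available, MongoDB 3.2+):
db.products.insert(
   { item: "stamps", qty: 50 },
   { writeConcern: { w: "majority" } }
)
// Once acknowledged, the write is visible to a "majority" read.
db.products.find({ item: "stamps" }).readConcern("majority")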
j Option:
The j option requests acknowledgment from MongoDB that the
write operation has been written to the on-disk journal.
j:
If j: true, requests acknowledgment that the mongod instances,
as specified in the w: <value>, have written to the on-disk
journal.
j: true does not by itself guarantee that the write will not be
rolled back due to replica set primary failover.
Changed in version 3.2: With j: true, MongoDB returns only after
the requested number of members, including the primary, have
written to the journal.
Previously, the j: true write concern in a replica set only required
the primary to write to the journal, regardless of the w: <value>
write concern.
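A sketch of requesting journal acknowledgment together with a w value
(illustrative collection and document):
db.products.insert(
   { item: "labels", qty: 10 },
   { writeConcern: { w: 2, j: true } }   // acknowledging members must write to their journals
)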
Wtimeout:
This option specifies a time limit, in milliseconds, for the write
concern. wtimeout is only applicable for w values greater than 1.
With writeConcernMajorityJournalDefault set to false, MongoDB
does not wait for w: "majority" writes to be written to the on-disk
journal before acknowledging the writes.
As such, majority write operations could possibly roll back in
the event of a transient loss (e.g. crash and restart) of a
majority of nodes in a given replica set.
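A sketch combining a majority write concern with a wtimeout (values are
illustrative; if the write concern is not satisfied within 5 seconds the
operation returns a write concern error, but MongoDB does not undo the write):
db.products.insert(
   { item: "cards", qty: 25 },
   { writeConcern: { w: "majority", wtimeout: 5000 } }
)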
w: "majority":
j is unspecified:
Acknowledgment depends on the value of
writeConcernMajorityJournalDefault:
If true, acknowledgment requires writing operation to on-disk
journal (j: true).
writeConcernMajorityJournalDefault defaults to true
If false, acknowledgment requires writing operation in memory (j:
false).
j: true:
Acknowledgment requires writing operation to on-disk journal.
j: false:
Acknowledgment requires writing operation in memory.
w: <number>:
j is unspecified:
Acknowledgment requires writing operation in memory (j: false).
j: true:
Acknowledgment requires writing operation to on-disk journal.
j: false:
Acknowledgment requires writing operation in memory.
Aggregation:
Aggregation operations process data records and return
computed results.
Aggregation operations group values from multiple documents
together, and can perform a variety of operations on the
grouped data to return a single result.
MongoDB provides three ways to perform aggregation:
Aggregation pipeline
Map-reduce function
Single-purpose aggregation methods (a brief sketch of these
follows below)
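A brief sketch of the single-purpose aggregation methods (shell helpers, shown
against the orders collection used in the examples below):
db.orders.count({ status: "A" })   // number of documents matching a filter
db.orders.distinct("cust_id")      // unique values of a single field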
Aggregation pipeline:
The aggregation pipeline is a framework for data aggregation
modeled on the concept of data processing pipelines.
Documents enter a multi-stage pipeline that transforms the
documents into aggregated results.
For example:
db.orders.aggregate([
{ $match: { status: "A" } },
{ $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
])
First Stage:
The $match stage filters the documents by the status field and
passes to the next stage those documents that have status equal to
"A".
Second Stage:
The $group stage groups the documents by the cust_id field to
calculate the sum of the amount for each unique cust_id.
Pipeline:
The MongoDB aggregation pipeline consists of stages.
Each stage transforms the documents as they pass through the
pipeline.
Pipeline stages do not need to produce one output document
for every input document; e.g., some stages may generate new
documents or filter out documents.
Pipeline stages can appear multiple times in the pipeline, with
the exception of the $out, $merge, and $geoNear stages.
MongoDB provides the db.collection.aggregate() method in the
mongo shell and the aggregate command to run the
aggregation pipeline.
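The same pipeline can also be issued through the aggregate command directly;
a sketch (the empty cursor document is required when running the command form
on MongoDB 3.6 and later):
db.runCommand({
   aggregate: "orders",
   pipeline: [
      { $match: { status: "A" } },
      { $group: { _id: "$cust_id", total: { $sum: "$amount" } } }
   ],
   cursor: { }
})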
Early Filtering:
If your aggregation operation requires only a subset of the
data in a collection, use the $match, $limit, and $skip stages to
restrict the documents that enter at the beginning of the
pipeline.
When placed at the beginning of a pipeline, $match
operations use suitable indexes to scan only the matching
documents in a collection.
Placing a $match pipeline stage followed by a $sort stage at
the start of the pipeline is logically equivalent to a single query
with a sort and can use an index.
When possible, place $match operators at the beginning of
the pipeline.
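A minimal sketch of early filtering (field names follow the orders example
above; suitable indexes are assumed to exist):
db.orders.aggregate([
   { $match: { status: "A" } },   // can use an index on { status: 1 }
   { $sort:  { amount: -1 } },    // $match followed by $sort at the start behaves like an indexed query with a sort
   { $limit: 100 }                // only 100 documents continue down the pipeline
])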
Aggregation with the Zip Code Data Set:
Data Model:
Each document in the zipcodes collection has the
following:
{
"_id": "10280",
"city": "NEW YORK",
"state": "NY",
"pop": 5574,
"loc": [
-74.016323,
40.710537
]
}
The _id field holds the zip code as a string.
The city field holds the city name. A city can have more than
one zip code associated with it as different sections of the city
can each have a different zip code.
The state field holds the two letter state abbreviation.
The pop field holds the population.
The loc field holds the location as a longitude latitude pair.
Return States with Populations above 10 Million:
The following aggregation operation returns all states with a total
population of at least 10 million:
db.zipcodes.aggregate( [
{ $group: { _id: "$state", totalPop: { $sum: "$pop" } } },
{ $match: { totalPop: { $gte: 10*1000*1000 } } }
])
The $group stage groups the documents of the zipcodes
collection by the state field, calculates the totalPop field for
each state, and outputs a document for each unique state.
The new per-state documents have two fields: the _id field and
the totalPop field.
The _id field contains the value of the state; i.e. the group by
field. The totalPop field is a calculated field that contains the
total population of each state.
To calculate the value, $group uses the $sum operator to add
the population field (pop) for each state.
After the $group stage, the documents in the pipeline resemble
the following:
{
"_id" : "AK",
"totalPop" : 550043
}
The $match stage filters these grouped documents to output only
those documents whose totalPop value is greater than or equal to
10 million. The $match stage does not alter the matching
documents but outputs the matching documents unmodified.
The equivalent SQL for this aggregation operation is:
SELECT state, SUM(pop) AS totalPop
FROM zipcodes
GROUP BY state
HAVING totalPop >= (10*1000*1000)
Return Average City Population by State:
db.zipcodes.aggregate( [
   { $group: { _id: { state: "$state", city: "$city" }, pop: { $sum: "$pop" } } },
   { $group: { _id: "$_id.state", avgCityPop: { $avg: "$pop" } } }
] )
The first $group stage groups the documents by the combination
of city and state, uses the $sum expression to calculate the
population for each combination, and outputs a document for
each city and state combination. [1]
After this stage in the pipeline, the documents resemble the
following:
{
"_id" : {
"state" : "CO",
"city" : "EDGEWATER"
},
"pop" : 13154
}
A second $group stage groups the documents in the pipeline by
the _id.state field (i.e. the state field inside the _id document), uses
the $avg expression to calculate the average city population
(avgCityPop) for each state, and outputs a document for each
state.
The documents that result from this aggregation operation
resemble the following:
{
"_id" : "MN",
"avgCityPop" : 5335
}
Return Largest and Smallest Cities by State:
db.zipcodes.aggregate( [
   { $group:
      {
        _id: { state: "$state", city: "$city" },
        pop: { $sum: "$pop" }
      }
   },
   { $sort: { pop: 1 } },
   { $group:
      {
        _id : "$_id.state",
        biggestCity:  { $last: "$_id.city" },
        biggestPop:   { $last: "$pop" },
        smallestCity: { $first: "$_id.city" },
        smallestPop:  { $first: "$pop" }
      }
   },
   // the following $project is optional, and
   // modifies the output format.
   { $project:
      { _id: 0,
        state: "$_id",
        biggestCity:  { name: "$biggestCity",  pop: "$biggestPop" },
        smallestCity: { name: "$smallestCity", pop: "$smallestPop" }
      }
   }
] )
In this example, the aggregation pipeline consists of a $group
stage, a $sort stage, another $group stage, and a $project stage:
The first $group stage groups the documents by the combination
of the city and state, calculates the sum of the pop values for
each combination, and outputs a document for each city and
state combination.
At this stage in the pipeline, the documents resemble the following:
{
"_id" : {
"state" : "CO",
"city" : "EDGEWATER"
},
"pop" : 13154
}
The $sort stage orders the documents in the pipeline by the pop field
value, from smallest to largest.
The next $group stage groups the now-sorted documents by the _id.state
field (i.e. the state field inside the _id document) and outputs a
document for each state.
The stage also calculates the following four fields for each state. Using the
$last expression, the $group operator creates the biggestCity and
biggestPop fields that store the city with the largest population and that
population. Using the $first expression, the $group operator creates the
smallestCity and smallestPop fields that store the city with the smallest
population and that population.
{
"_id" : "WA",
"biggestCity" : "SEATTLE",
"biggestPop" : 520096,
"smallestCity" : "BENGE",
"smallestPop" : 2
}
The final $project stage renames the _id field to state and moves
the biggestCity, biggestPop, smallestCity, and smallestPop fields
into biggestCity and smallestCity embedded documents.
The output of this aggregation operation resembles the following:
{
   "state" : "RI",
   "biggestCity" : {
      "name" : "CRANSTON",
      "pop" : 176404
   },
   "smallestCity" : {
      "name" : "CLAYVILLE",
      "pop" : 45
   }
}
Aggregation with User Preference Data:
Data Model
Consider a hypothetical sports club with a database that
contains a users collection tracking each user's join date and
sport preferences, stored in documents that resemble
the following:
{
_id : "jane",
joined : ISODate("2011-03-02"),
likes : ["golf", "racquetball"]
}
{
_id : "joe",
joined : ISODate("2012-07-02"),
likes : ["tennis", "golf", "swimming"]
}
Normalize and Sort Documents:
The following operation returns user names in upper case and
in alphabetical order.
The aggregation includes user names for all documents in the
users collection.
You might do this to normalize user names for processing.
db.users.aggregate(
[
{ $project : { name:{$toUpper:"$_id"} , _id:0 } },
{ $sort : { name : 1 } }
]
)
All documents from the users collection pass through the
pipeline, which consists of the following operations:
The $project operator:
converts the value of the _id field to upper case with the
$toUpper operator and stores the result in a new field
named name.
suppresses the _id field. $project passes the _id field through
by default, unless it is explicitly suppressed.
The $sort operator orders the results by the name field.
The results of the aggregation would resemble the following:
{
   "name" : "JANE"
},
{
   "name" : "JILL"
},
{
   "name" : "JOE"
}
Return Usernames Ordered by Join Month:
The following aggregation operation returns user names sorted by
the month they joined. This kind of aggregation could help generate
membership renewal notices.
db.users.aggregate(
[
{ $project :
{
month_joined : { $month : "$joined" },
name : "$_id",
_id : 0
}
},
{ $sort : { month_joined : 1 } }
]
)
The pipeline passes all documents in the users collection through
the following operations:
The $project operator:
Creates two new fields: month_joined and name.
Suppresses the _id field from the results. The aggregate() method
includes the _id field by default, unless explicitly suppressed.
The $month operator converts the values of the joined field to
integer representations of the month.
Then the $project operator assigns those values to the
month_joined field.
The $sort operator sorts the results by the month_joined field.
The operation returns results that resemble the following:
{
"month_joined" : 1,
"name" : "ruth"
},
{
"month_joined" : 1,
"name" : "harold"
},
{
"month_joined" : 1,
"name" : "kate"
},
{
"month_joined" : 2,
"name" : "jill"
}
Return Total Number of Joins per Month:
The following operation shows how many people joined
each month of the year.
You might use this aggregated data for recruiting and
marketing strategies.
db.users.aggregate(
[
{ $project : { month_joined : { $month : "$joined" } } } ,
{ $group : { _id : {month_joined:"$month_joined"} ,
number : { $sum : 1 } } },
{ $sort : { "_id.month_joined" : 1 } }
]
)
The pipeline passes all documents in the users collection through
the following operations:
The $project operator creates a new field called
month_joined.
The $month operator converts the values of the joined field to
integer representations of the month. Then the $project
operator assigns the values to the month_joined field.
The $group operator collects all documents with a given
month_joined value and counts how many documents there
are for that value. Specifically, for each unique value, $group
creates a new “per-month” document with two fields:
_id, which contains a nested document with the
month_joined field and its value.
number, which is a generated field. The $sum operator
increments this field by 1 for every document containing the
given month_joined value.
The $sort operator sorts the documents created by $group
according to the contents of the month_joined field.
{
"_id" : {
"month_joined" : 1
},
"number" : 3
},
{
"_id" : {
"month_joined" : 2
},
"number" : 9
},
{
"_id" : {
"month_joined" : 3
},
"number" : 5
}
Return the Five Most Common “Likes”:
The following aggregation collects the five most “liked” activities
in the data set.
db.users.aggregate(
[
{ $unwind : "$likes" },
{ $group : { _id : "$likes" , number : { $sum : 1 } } },
{ $sort : { number : -1 } },
{ $limit : 5 }
]
)
The $unwind operator separates each value in the likes array,
and creates a new version of the source document for every
element in the array.
Example
Given the following document from the users collection:
{
_id : "jane",
joined : ISODate("2011-03-02"),
likes : ["golf", "racquetball"]
}
The $unwind operator would create the following documents:
{
   _id : "jane",
   joined : ISODate("2011-03-02"),
   likes : "golf"
}
{
   _id : "jane",
   joined : ISODate("2011-03-02"),
   likes : "racquetball"
}
The $group operator collects all documents with the same value
for the likes field and counts each grouping.
With this information, $group creates a new document with two
fields:
_id, which contains the likes value.
number, which is a generated field. The $sum operator
increments this field by 1 for every document containing the
given likes value.
The $sort operator sorts these documents by the number field in
reverse order.
The $limit operator only includes the first 5 result documents.
The aggregated result resembles the following:
{
   "_id" : "golf",
   "number" : 33
},
{
   "_id" : "racquetball",
   "number" : 31
},
{
   "_id" : "swimming",
   "number" : 24
},
{
   "_id" : "handball",
   "number" : 19
},
{
   "_id" : "tennis",
   "number" : 18
}
Map-Reduce:
Map-reduce is a data processing paradigm for condensing large
volumes of data into useful aggregated results. For map-reduce
operations, MongoDB provides the mapReduce database
command.
All map-reduce functions in MongoDB are JavaScript and run
within the mongod process.
Map-reduce operations take the documents of a single
collection as the input and can perform any arbitrary sorting
and limiting before beginning the map stage.
mapReduce can return the results of a map-reduce
operation as a document, or may write the results to
collections.
Map-Reduce JavaScript Functions:
In MongoDB, map-reduce operations use custom JavaScript
functions to map, or associate, values to a key.
If a key has multiple values mapped to it, the operation
reduces the values for the key to a single object.
Map-Reduce Results:
In MongoDB, the map-reduce operation can write results to
a collection or return the results inline.
If you write map-reduce output to a collection, you can
perform subsequent map-reduce operations on the same
input collection that replace, merge, or reduce new
results with previous results.
Map-Reduce Examples:
In the mongo shell, the db.collection.mapReduce() method
is a wrapper around the mapReduce command.
The following examples use the db.collection.mapReduce()
method:
Consider the following map-reduce operations on a collection
orders that contains documents of the following prototype:
{ _id: ObjectId("50a8240b927d5d8b5891743c"),
cust_id: "abc123",
ord_date: new Date("Oct 04, 2012"),
status: 'A',
price: 25,
items: [ { sku: "mmm", qty: 5, price: 2.5 },
{ sku: "nnn", qty: 5, price: 2.5 } ]
}
Return the Total Price Per Customer:
Perform the map-reduce operation on the orders collection to
group by the cust_id, and calculate the sum of the price for
each cust_id:
Define the map function to process each input document:
In the function, this refers to the document that the map-
reduce operation is processing.
The function maps the price to the cust_id for each document
and emits the cust_id and price pair.
var mapFunction1 = function() {
emit(this.cust_id, this.price);
};
Define the corresponding reduce function with two arguments
keyCustId and valuesPrices:
valuesPrices is an array whose elements are the price values
emitted by the map function and grouped by keyCustId.
The function reduces the valuesPrices array to the sum of its
elements.
var reduceFunction1 = function(keyCustId, valuesPrices) {
return Array.sum(valuesPrices);
};
Perform the map-reduce on all documents in the orders collection
using the mapFunction1 map function and the reduceFunction1
reduce function.
db.orders.mapReduce(
mapFunction1,
reduceFunction1,
{ out: "map_reduce_example" }
)
This operation outputs the results to a collection named
map_reduce_example.
If the map_reduce_example collection already exists, the
operation will replace the contents with the results of this map-
reduce operation.
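To inspect the output, the result collection can be queried like any other
collection; a sketch:
db.map_reduce_example.find().sort({ _id: 1 })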
Calculate Order and Total Quantity with Average Quantity Per
Item:
In this example, you will perform a map-reduce operation on
the orders collection for all documents that have an ord_date
value greater than 01/01/2012.
The operation groups by the items.sku field, and calculates the
number of orders and the total quantity ordered for each sku.
The operation concludes by calculating the average quantity
per order for each sku value:
Define the map function to process each input document:
In the function, this refers to the document that the map-
reduce operation is processing.
For each item, the function associates the sku with a new
object value that contains the count of 1 and the item qty for
the order and emits the sku and value pair.
var mapFunction2 = function() {
   for (var idx = 0; idx < this.items.length; idx++) {
      var key = this.items[idx].sku;
      var value = {
         count: 1,
         qty: this.items[idx].qty
      };
      emit(key, value);
   }
};
Define the corresponding reduce function with two arguments
keySKU and countObjVals:
countObjVals is an array whose elements are the objects
mapped to the grouped keySKU values passed by the map
function to the reduce function.
The function reduces the countObjVals array to a single object
reducedVal that contains the count and qty fields.
In reducedVal, the count field contains the sum of the count
fields from the individual array elements, and the qty field
contains the sum of the qty fields from the individual array
elements.
var reduceFunction2 = function(keySKU, countObjVals) {
   reducedVal = { count: 0, qty: 0 };
   for (var idx = 0; idx < countObjVals.length; idx++) {
      reducedVal.count += countObjVals[idx].count;
      reducedVal.qty   += countObjVals[idx].qty;
   }
   return reducedVal;
};
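The finalizeFunction2 referenced in the next step computes the average quantity
per order from the reduced totals; a definition consistent with the description
above (the avg calculation belongs in finalize rather than in reduce, because
reduce may run more than once for the same key):
var finalizeFunction2 = function(key, reducedVal) {
   reducedVal.avg = reducedVal.qty / reducedVal.count;
   return reducedVal;
};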
Perform the map-reduce operation on the orders collection
using the mapFunction2, reduceFunction2, and finalizeFunction2
functions.
db.orders.mapReduce( mapFunction2,
reduceFunction2,
{
out: { merge: "map_reduce_example" },
query: { ord_date:
{ $gt: new Date('01/01/2012') }
},
finalize: finalizeFunction2
}
)
This operation uses the query field to select only those documents
with ord_date greater than new Date('01/01/2012'). Then it outputs
the results to the map_reduce_example collection. If the
map_reduce_example collection already exists, the operation will
merge the existing contents with the results of this map-reduce
operation.
Summary:
Write Concern
Aggregation