MongoDB Chapter3
Donny Winston
Instructor
What is "projection"?
reducing data to fewer dimensions
<pymongo.cursor.Cursor at 0x10d6e69e8>
n_prizes = 0
for doc in docs:
    # count the number of prizes in each doc
    n_prizes += len(doc["prizes"])
print(n_prizes)
941
# using comprehension
sum([len(doc["prizes"]) for doc in docs])
941
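As a database-free illustration of projection, the sketch below applies the same idea to plain Python dictionaries. The `project` helper and the two sample documents are hypothetical stand-ins, not part of pymongo or the course data; the helper only mimics keeping the listed fields plus `_id`.

```python
# Hypothetical sample documents standing in for a laureates collection.
sample_docs = [
    {"_id": 1, "firstname": "A", "surname": "X", "prizes": [{"year": "1901"}]},
    {"_id": 2, "firstname": "B", "surname": "Y",
     "prizes": [{"year": "1903"}, {"year": "1911"}]},
]


def project(doc, fields):
    """Keep only the named top-level fields (and _id), like an inclusion projection."""
    keep = set(fields) | {"_id"}
    return {key: value for key, value in doc.items() if key in keep}


projected = [project(doc, ["prizes"]) for doc in sample_docs]

# Each projected document now carries fewer fields ("dimensions").
print(projected[0])

# Counting prizes works the same way on the projected documents.
n_prizes = sum(len(doc["prizes"]) for doc in projected)
print(n_prizes)
```

The generator expression passed to `sum` avoids building an intermediate list, which is a small stylistic alternative to the list comprehension used above.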
Sorting post-query with Python
docs = list(db.prizes.find({"category": "physics"}, ["year"]))
1967 physics
1967 medicine
1967 literature
1967 chemistry
1968 physics
1968 peace
1968 medicine
1968 literature
1968 chemistry
1969 physics
1969 peace
1969 medicine
1969 literature
1969 economics
1969 chemistry
{"year": 1, "category": 1}
{'category': 1, 'year': 1}
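The listing above is ordered by year ascending and, within each year, by category descending. One way to reproduce that ordering in Python after the query relies on `sorted` being stable: sort by the secondary key first, then by the primary key. The rows below are copied from the listing and shuffled; the variable names are illustrative only.

```python
from operator import itemgetter

# A few rows taken from the listing above, deliberately out of order.
docs = [
    {"year": "1968", "category": "chemistry"},
    {"year": "1967", "category": "medicine"},
    {"year": "1968", "category": "physics"},
    {"year": "1967", "category": "physics"},
    {"year": "1967", "category": "chemistry"},
]

# Python's sort is stable, so sort by the secondary key first...
by_category_desc = sorted(docs, key=itemgetter("category"), reverse=True)
# ...then by the primary key; ties keep their category-descending order.
by_year_then_category = sorted(by_category_desc, key=itemgetter("year"))

for doc in by_year_then_category:
    print("{year} {category}".format(**doc))
```

Sorting in two passes is needed here because the two keys run in opposite directions; a single tuple key would sort both ascending.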
What are indexes?
Large documents
Large collections
%%timeit
docs = list(db.prizes.find({"year": "1901"}))
524 µs ± 7.34 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
%%timeit
docs = list(db.prizes.find({}, sort=[("year", 1)]))
5.18 ms ± 54.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
db.prizes.create_index([("year", 1)])
'year_1'
%%timeit
# Previously: 5.18 ms ± 54.9 µs
docs = list(db.prizes.find({}, sort=[("year", 1)]))
# Before
645 µs ± 3.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# After
503 µs ± 4.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

# Before
673 µs ± 3.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# After
407 µs ± 5.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
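As a rough analogy for why lookups get faster once an index exists, the sketch below compares checking every document with a lookup in a prebuilt dictionary. This is not how the MongoDB server stores its indexes; `find_by_scan`, `build_index`, and the synthetic documents are hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Synthetic documents: one per (year, category) pair, for illustration only.
docs = [
    {"year": str(year), "category": category}
    for year in range(1901, 2018)
    for category in ("physics", "chemistry", "medicine")
]


def find_by_scan(docs, year):
    """Check every document, loosely like a collection scan."""
    return [doc for doc in docs if doc["year"] == year]


def build_index(docs, field):
    """Map each value of `field` to the documents that hold it."""
    index = defaultdict(list)
    for doc in docs:
        index[doc[field]].append(doc)
    return index


year_index = build_index(docs, "year")  # cost paid once, up front

# Both approaches return the same documents; the indexed lookup
# never touches documents from other years.
assert find_by_scan(docs, "1901") == year_index["1901"]
print(len(year_index["1901"]))
```

The trade-off mirrors the real one: building and maintaining the lookup structure costs time and memory, so it pays off mainly for queries that are run often.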
db.laureates.find(
    {"firstname": "Marie"}, {"bornCountry": 1, "_id": 0}).explain()
...
'winningPlan': {'stage': 'PROJECTION',
  'transformBy': {'bornCountry': 1, '_id': 0},
  'inputStage': {'stage': 'IXSCAN',
    'keyPattern': {'firstname': 1, 'bornCountry': 1},
    'indexName': 'firstname_1_bornCountry_1',
...
...
'winningPlan': {'stage': 'PROJECTION',
'transformBy': {'bornCountry': 1, '_id': 0},
'inputStage': {'stage': 'COLLSCAN',
...
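To compare two plans like the ones above without reading the nested output by eye, a small helper can walk the `inputStage` chain and list the stage names. This is a minimal sketch assuming the plan has the nested shape shown here; `plan_stages` is a hypothetical helper, not a pymongo function, and the dictionaries reproduce only the fields visible above, with the elided parts left out.

```python
def plan_stages(plan):
    """Return the stage names from the outermost stage down to the innermost."""
    stages = []
    while plan is not None:
        stages.append(plan["stage"])
        plan = plan.get("inputStage")
    return stages


# The two winning plans shown above, with the elided parts left out.
indexed_plan = {
    "stage": "PROJECTION",
    "transformBy": {"bornCountry": 1, "_id": 0},
    "inputStage": {
        "stage": "IXSCAN",
        "keyPattern": {"firstname": 1, "bornCountry": 1},
        "indexName": "firstname_1_bornCountry_1",
    },
}
unindexed_plan = {
    "stage": "PROJECTION",
    "transformBy": {"bornCountry": 1, "_id": 0},
    "inputStage": {"stage": "COLLSCAN"},
}

print(plan_stages(indexed_plan))
print(plan_stages(unindexed_plan))

# A quick check for whether a plan falls back to scanning the collection.
print("COLLSCAN" in plan_stages(unindexed_plan))
```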
Limiting our exploration
for doc in db.prizes.find({}, ["laureates.share"]):
    share_is_three = [laureate["share"] == "3"
                      for laureate in doc["laureates"]]
    assert all(share_is_three) or not any(share_is_three)

for doc in db.prizes.find({"laureates.share": "3"}):
    print("{year} {category}".format(**doc))
2017 chemistry
2017 medicine
2016 chemistry
2015 chemistry
2014 physics
2014 chemistry
2013 chemistry
...

for doc in db.prizes.find({"laureates.share": "3"}, limit=3):
    print("{year} {category}".format(**doc))
2017 chemistry
2017 medicine
2016 chemistry
2015 chemistry
2014 physics
2014 chemistry
2015 chemistry
2014 physics
2014 chemistry
docs = list(cursor1)
assert docs == list(cursor2) == list(cursor3)
for doc in docs:
    print("{year} {category}".format(**doc))
1954 medicine
1956 physics
1956 medicine
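Taking the first few results and then moving on to the next few can be imitated on an in-memory list with `itertools.islice`. This is only an analogy for limiting and skipping on the server side; `page` is a hypothetical helper, and the rows are copied from the `laureates.share` listing earlier in this section.

```python
from itertools import islice

# Rows copied from the unlimited listing earlier in this section.
rows = [
    "2017 chemistry", "2017 medicine", "2016 chemistry",
    "2015 chemistry", "2014 physics", "2014 chemistry",
    "2013 chemistry",
]


def page(results, skip=0, limit=3):
    """Drop the first `skip` results, then return at most `limit` of them."""
    return list(islice(results, skip, skip + limit))


first_page = page(rows, skip=0, limit=3)
second_page = page(rows, skip=3, limit=3)
third_page = page(rows, skip=6, limit=3)  # only one row is left

print(first_page)
print(second_page)
print(third_page)
```

Doing this in Python still requires fetching every row first, which is the cost that a server-side limit avoids.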