MongoDB Chapter3

The document discusses different ways to project or filter fields in MongoDB queries, including using projection as a dictionary or list to include or exclude fields, and how missing fields are handled. It also covers sorting query results in Python versus directly in MongoDB, and using indexes to improve query performance for queries with high specificity, large documents, or large collections.

Uploaded by

massyweb
Projection: Getting only what you need

INTRODUCTION TO MONGODB IN PYTHON

Donny Winston
Instructor
What is "projection"?
reducing data to fewer dimensions

asking certain data to "speak up"!



Projection in MongoDB
# include only prizes.affiliations
# exclude _id
docs = db.laureates.find(
    filter={},
    projection={"prizes.affiliations": 1,
                "_id": 0})
type(docs)

<pymongo.cursor.Cursor at 0x10d6e69e8>

Projection as a dictionary:
Include fields: "field_name" : 1
"_id" is included by default


Projection in MongoDB
# convert to list and slice
list(docs)[:3]

[{'prizes': [{'affiliations': [{'city': 'Munich',
                                'country': 'Germany',
                                'name': 'Munich University'}]}]},
 {'prizes': [{'affiliations': [{'city': 'Leiden',
                                'country': 'the Netherlands',
                                'name': 'Leiden University'}]}]},
 {'prizes': [{'affiliations': [{'city': 'Amsterdam',
                                'country': 'the Netherlands',
                                'name': 'Amsterdam University'}]}]}]


Missing fields
# use "gender":"org" to select organizations
# organizations have no bornCountry
docs = db.laureates.find(
    filter={"gender": "org"},
    projection=["bornCountry", "firstname"])
list(docs)

[{'_id': ObjectId('5bc56154f35b634065ba1dff'),
  'firstname': 'United Nations Peacekeeping Forces'},
 {'_id': ObjectId('5bc56154f35b634065ba1df3'),
  'firstname': 'Amnesty International'},
 ...
]

Projection as a list
list the fields to include
["field_name1", "field_name2"]
"_id" is included by default


Missing fields
- only projected fields that exist are returned

docs = db.laureates.find({}, ["favoriteIceCreamFlavor"])
list(docs)

[{'_id': ObjectId('5bc56154f35b634065ba1dff')},
 {'_id': ObjectId('5bc56154f35b634065ba1df3')},
 {'_id': ObjectId('5bc56154f35b634065ba1db1')},
 ...
]
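The list-projection behaviour shown on the "Missing fields" slides can be mimicked on plain dictionaries. The sketch below is only an in-memory stand-in: `project_fields` is a hypothetical helper, not part of pymongo, and it handles top-level field names only. It shows that "_id" is kept by default and that fields missing from a document are simply left out.

```python
def project_fields(doc, fields, include_id=True):
    """Hypothetical helper: keep only the listed top-level fields that exist."""
    wanted = list(fields)
    if include_id:
        wanted = ["_id"] + wanted
    return {name: doc[name] for name in wanted if name in doc}


# Sample document modelled on an organization; the _id value is made up.
org = {"_id": 1, "firstname": "Amnesty International", "gender": "org"}

print(project_fields(org, ["bornCountry", "firstname"]))
print(project_fields(org, ["favoriteIceCreamFlavor"]))
```

The first call returns "_id" and "firstname" only, because the sample document has no "bornCountry"; the second returns just "_id".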


Simple aggregation
docs = db.laureates.find({}, ["prizes"])

n_prizes = 0
for doc in docs:
    # count the number of prizes in each doc
    n_prizes += len(doc["prizes"])
print(n_prizes)

941

# using comprehension
# (re-run the query first: the loop above exhausted the cursor)
docs = db.laureates.find({}, ["prizes"])
sum([len(doc["prizes"]) for doc in docs])

941
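A pymongo cursor can be consumed only once, so counting a second time needs either a fresh query or a list of the results. The sketch below uses a plain Python iterator over made-up documents as a stand-in for a cursor (no database is involved) to show the effect.

```python
# Made-up documents standing in for db.laureates.find({}, ["prizes"]).
sample_docs = [
    {"_id": 1, "prizes": [{"year": "1903"}, {"year": "1911"}]},
    {"_id": 2, "prizes": [{"year": "1921"}]},
]

cursor_like = iter(sample_docs)  # single-pass, like a cursor

first_pass = sum(len(doc["prizes"]) for doc in cursor_like)
second_pass = sum(len(doc["prizes"]) for doc in cursor_like)  # already exhausted

print(first_pass)   # 3
print(second_pass)  # 0

# Materializing the results in a list allows repeated passes.
docs = list(sample_docs)
total_from_list = sum(len(doc["prizes"]) for doc in docs)
print(total_from_list)  # 3
```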


Let's project!
Sorting

Donny Winston
Instructor
Sorting post-query with Python
docs = list(db.prizes.find({"category": "physics"}, ["year"]))

print([doc["year"] for doc in docs][:5])

['2018', '2017', '2016', '2015', '2014']

from operator import itemgetter

docs = sorted(docs, key=itemgetter("year"))


print([doc["year"] for doc in docs][:5])

['1901', '1902', '1903', '1904', '1905']

docs = sorted(docs, key=itemgetter("year"), reverse=True)


print([doc["year"] for doc in docs][:5])

['2018', '2017', '2016', '2015', '2014']
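One caveat: the "year" values in these documents are strings, so `sorted` compares them character by character. That matches numeric order here only because every year has four digits. A small sketch with made-up values of different lengths shows the difference:

```python
# Made-up values of differing lengths, to expose lexicographic ordering.
years = ["2018", "1901", "987", "10000"]

as_strings = sorted(years)           # compares character by character
as_numbers = sorted(years, key=int)  # compares numeric values

print(as_strings)  # ['10000', '1901', '2018', '987']
print(as_numbers)  # ['987', '1901', '2018', '10000']
```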



Sorting in-query with MongoDB
cursor = db.prizes.find({"category": "physics"}, ["year"],
                        sort=[("year", 1)])
print([doc["year"] for doc in cursor][:5])

['1901', '1902', '1903', '1904', '1905']

cursor = db.prizes.find({"category": "physics"}, ["year"],
                        sort=[("year", -1)])
print([doc["year"] for doc in cursor][:5])

['2018', '2017', '2016', '2015', '2014']


Primary and secondary sorting
for doc in db.prizes.find(
        {"year": {"$gt": "1966", "$lt": "1970"}},
        ["category", "year"],
        sort=[("year", 1), ("category", -1)]):
    print("{year} {category}".format(**doc))

1967 physics
1967 medicine
1967 literature
1967 chemistry
1968 physics
1968 peace
1968 medicine
1968 literature
1968 chemistry
1969 physics
1969 peace
1969 medicine
1969 literature
1969 economics
1969 chemistry
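The same primary and secondary ordering can be reproduced after the query in Python. This is a sketch on made-up documents, not the course's method. It relies on Python's sort being stable: sort by the secondary key first, then by the primary key.

```python
from operator import itemgetter

# Made-up documents in arbitrary order.
docs = [
    {"year": "1968", "category": "peace"},
    {"year": "1967", "category": "chemistry"},
    {"year": "1968", "category": "physics"},
    {"year": "1967", "category": "physics"},
]

# Secondary key first (category, descending), then primary key (year, ascending).
docs.sort(key=itemgetter("category"), reverse=True)
docs.sort(key=itemgetter("year"))

for doc in docs:
    print("{year} {category}".format(**doc))
```

Within each year the categories come out in descending order, matching the `sort=[("year", 1), ("category", -1)]` query.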



Sorting with pymongo versus MongoDB shell
In MongoDB shell:

Example sort argument: {"year": 1, "category": -1}

JavaScript objects retain key order as entered

In Python (< 3.7):

{"year": 1, "category": 1}

{'category': 1, 'year': 1}

[("year", 1), ("category", 1)]

[('year', 1), ('category', 1)]
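Since Python 3.7, plain dictionaries are guaranteed to keep insertion order. A list of (field, direction) pairs still states the order explicitly on every Python version, and it is the form these slides pass to `sort`. A small sketch of the difference:

```python
import sys
from collections import OrderedDict

sort_spec = [("year", 1), ("category", -1)]

# A list of pairs keeps its order on every Python version.
assert [field for field, _ in sort_spec] == ["year", "category"]

# OrderedDict also keeps insertion order on every Python version.
ordered = OrderedDict(sort_spec)
assert list(ordered) == ["year", "category"]

# Plain dicts are only guaranteed to keep insertion order on Python 3.7+.
if sys.version_info >= (3, 7):
    assert list(dict(sort_spec)) == ["year", "category"]
```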



Let's get sorted!
What are indexes?

Donny Winston
Instructor

When to use indexes?
Queries with high specificity

Large documents

Large collections
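As a rough analogy only (MongoDB's indexes are not Python dictionaries), the sketch below contrasts scanning every document with looking a value up in a precomputed mapping from field value to documents. The data is made up.

```python
from collections import defaultdict

# Made-up documents: two categories for each year from 1901 to 1910.
docs = [{"year": str(y), "category": c}
        for y in range(1901, 1911)
        for c in ("physics", "chemistry")]

# Without an index: examine every document (a "collection scan").
scan_hits = [d for d in docs if d["year"] == "1905"]

# With an index-like structure: build once, then jump straight to the matches.
year_index = defaultdict(list)
for d in docs:
    year_index[d["year"]].append(d)
index_hits = year_index["1905"]

assert scan_hits == index_hits
print(len(index_hits))  # 2
```

The lookup pays off most when few documents match (high specificity) and when there are many documents to skip over (large collections).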



Gauging performance before indexing
Jupyter Notebook %%timeit magic (same as python -m timeit "[expression]" )

%%timeit
docs = list(db.prizes.find({"year": "1901"}))

524 µs ± 7.34 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
docs = list(db.prizes.find({}, sort=[("year", 1)]))

5.18 ms ± 54.9 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)
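Outside a notebook, the standard-library `timeit` module gives comparable measurements. The sketch below times a stand-in function so that it runs without a database. `run_query` is hypothetical; in practice it would wrap the `find` call being measured.

```python
import timeit


def run_query():
    # Stand-in for: list(db.prizes.find({"year": "1901"}))
    return [year for year in range(1901, 2019) if year == 1901]


# 7 runs of 1000 loops each, the same counts reported in the output above.
totals = timeit.repeat(run_query, repeat=7, number=1000)
per_loop_us = [t / 1000 * 1e6 for t in totals]
print("best of 7: {:.2f} microseconds per loop".format(min(per_loop_us)))
```

Absolute numbers depend on the machine, so no expected timing is shown.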



Adding a single-field index
index model: list of (field, direction) pairs.
directions: 1 (ascending) and -1 (descending)

db.prizes.create_index([("year", 1)])

'year_1'

%%timeit
# Previously: 524 µs ± 7.34 µs
docs = list(db.prizes.find({"year": "1901"}))

379 µs ± 1.62 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

%%timeit
# Previously: 5.18 ms ± 54.9 µs
docs = list(db.prizes.find({}, sort=[("year", 1)]))

4.28 ms ± 95.7 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)


Adding a compound (multiple-field) index
db.prizes.create_index([("category", 1), ("year", 1)])

index "covering" a query with projection

list(db.prizes.find({"category": "economics"},
                    {"year": 1, "_id": 0}))

# Before
645 µs ± 3.87 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# After
503 µs ± 4.37 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)

index "covering" a query with projection and sorting

db.prizes.find_one({"category": "economics"},
                   {"year": 1, "_id": 0},
                   sort=[("year", 1)])

# Before
673 µs ± 3.36 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
# After
407 µs ± 5.51 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)


Learn more: ask your collection and your queries
db.laureates.index_information() # always an index on "_id" field

{'_id_': {'v': 2, 'key': [('_id', 1)], 'ns': 'nobel.laureates'}}

db.laureates.find(
    {"firstname": "Marie"}, {"bornCountry": 1, "_id": 0}).explain()

...
'winningPlan': {'stage': 'PROJECTION',
  'transformBy': {'bornCountry': 1, '_id': 0},
  'inputStage': {'stage': 'COLLSCAN',
...

db.laureates.create_index([("firstname", 1), ("bornCountry", 1)])
db.laureates.find(
    {"firstname": "Marie"}, {"bornCountry": 1, "_id": 0}).explain()

...
'winningPlan': {'stage': 'PROJECTION',
  'transformBy': {'bornCountry': 1, '_id': 0},
  'inputStage': {'stage': 'IXSCAN',
   'keyPattern': {'firstname': 1, 'bornCountry': 1},
   'indexName': 'firstname_1_bornCountry_1',
...
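When comparing plans before and after adding an index, it can help to reduce the nested 'winningPlan' to a list of stage names. The sketch below works on abbreviated dictionaries modelled on the explain() output shown on the slide. `plan_stages` is a hypothetical helper, and the exact shape of real explain() output varies between server versions.

```python
def plan_stages(plan):
    """Hypothetical helper: list stage names from the outermost stage inward."""
    stages = []
    while plan is not None:
        stages.append(plan["stage"])
        plan = plan.get("inputStage")
    return stages


# Abbreviated plans modelled on the explain() output (not real server output).
before_index = {"stage": "PROJECTION",
                "inputStage": {"stage": "COLLSCAN"}}
after_index = {"stage": "PROJECTION",
               "inputStage": {"stage": "IXSCAN",
                              "indexName": "firstname_1_bornCountry_1"}}

print(plan_stages(before_index))  # ['PROJECTION', 'COLLSCAN']
print(plan_stages(after_index))   # ['PROJECTION', 'IXSCAN']
```

A COLLSCAN stage means every document was examined; an IXSCAN stage means an index was used.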


Let's practice!
Limits and Skips with Sorts, Oh My!

Donny Winston
Instructor
Limiting our exploration
for doc in db.prizes.find({}, ["laureates.share"]):
    share_is_three = [laureate["share"] == "3"
                      for laureate in doc["laureates"]]
    assert all(share_is_three) or not any(share_is_three)

for doc in db.prizes.find({"laureates.share": "3"}):
    print("{year} {category}".format(**doc))

2017 chemistry
2017 medicine
2016 chemistry
2015 chemistry
2014 physics
2014 chemistry
2013 chemistry
...

for doc in db.prizes.find({"laureates.share": "3"}, limit=3):
    print("{year} {category}".format(**doc))

2017 chemistry
2017 medicine
2016 chemistry


Skips and paging through results
for doc in db.prizes.find({"laureates.share": "3"}, limit=3):
    print("{year} {category}".format(**doc))

2017 chemistry
2017 medicine
2016 chemistry

for doc in db.prizes.find({"laureates.share": "3"}, skip=3, limit=3):
    print("{year} {category}".format(**doc))

2015 chemistry
2014 physics
2014 chemistry

for doc in db.prizes.find({"laureates.share": "3"}, skip=6, limit=3):
    print("{year} {category}".format(**doc))

2013 chemistry
2013 medicine
2013 economics
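The skip values 0, 3 and 6 follow a simple pattern: skip = (page number - 1) × page size. The sketch below wraps that arithmetic in a hypothetical helper, `page_args`, and checks it against a plain list holding the first six results from this slide.

```python
def page_args(page_number, page_size):
    """Hypothetical helper: skip/limit values for a 1-based page number."""
    return {"skip": (page_number - 1) * page_size, "limit": page_size}


# First six results from the slide, used here as a plain list.
results = ["2017 chemistry", "2017 medicine", "2016 chemistry",
           "2015 chemistry", "2014 physics", "2014 chemistry"]

args = page_args(2, 3)
page = results[args["skip"]:args["skip"] + args["limit"]]
print(args)  # {'skip': 3, 'limit': 3}
print(page)  # ['2015 chemistry', '2014 physics', '2014 chemistry']
```

Since `find` accepts `skip` and `limit` as keyword arguments, the returned dictionary could be unpacked into a call such as `db.prizes.find({"laureates.share": "3"}, **page_args(2, 3))`.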


Using cursor methods for {sort, skip, limit}
for doc in db.prizes.find({"laureates.share": "3"}).limit(3):
    print("{year} {category}".format(**doc))

2017 chemistry
2017 medicine
2016 chemistry

for doc in (db.prizes.find({"laureates.share": "3"}).skip(3).limit(3)):
    print("{year} {category}".format(**doc))

2015 chemistry
2014 physics
2014 chemistry

for doc in (db.prizes.find({"laureates.share": "3"})
            .sort([("year", 1)])
            .skip(3)
            .limit(3)):
    print("{year} {category}".format(**doc))

1954 medicine
1956 physics
1956 medicine


Simpler sorts of sort
cursor1 = (db.prizes.find({"laureates.share": "3"}).skip(3).limit(3)
           .sort([("year", 1)]))

cursor2 = (db.prizes.find({"laureates.share": "3"}).skip(3).limit(3)
           .sort("year", 1))

cursor3 = (db.prizes.find({"laureates.share": "3"}).skip(3).limit(3)
           .sort("year"))

docs = list(cursor1)
assert docs == list(cursor2) == list(cursor3)
for doc in docs:
    print("{year} {category}".format(**doc))

1954 medicine
1956 physics
1956 medicine

doc = db.prizes.find_one({"laureates.share": "3"},
                         skip=3, sort=[("year", 1)])
print("{year} {category}".format(**doc))
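The cursors on this slide chain `.sort` after `.skip` and `.limit`, yet the results are the same as when sorting first: the sort is applied before the skip, and the skip before the limit, whatever order the cursor methods are chained in. On a plain list of made-up documents the equivalent steps look like this (a sketch, not pymongo code):

```python
from operator import itemgetter

# Made-up documents; only the ordering logic matters here.
docs = [{"year": y}
        for y in ("2017", "1954", "2016", "1956", "1901", "1950", "1960")]

skip, limit = 3, 3
ordered = sorted(docs, key=itemgetter("year"))  # 1. sort
page = ordered[skip:skip + limit]               # 2. skip, then 3. limit

page_years = [d["year"] for d in page]
print(page_years)  # ['1956', '1960', '2016']
```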


Limit or Skip Practice? Exactly.