BDA Unit 3 Notes

MongoDB is a NoSQL, document-oriented database that uses BSON format for flexible data storage, enabling high performance and scalability. Its architecture includes components like the MongoDB server, databases, collections, and documents, allowing for schema-less design and horizontal scaling through sharding. Compared to traditional relational databases, MongoDB offers advantages in handling unstructured data, rapid development, and built-in high availability through replication.

What is MongoDB?

MongoDB is a NoSQL database that stores data in a flexible, JSON-like format called BSON (Binary
JSON). It is a document-oriented database that allows for high performance, scalability, and ease of
development. MongoDB is designed to handle large amounts of unstructured or semi-structured data,
making it suitable for applications that require fast reads and writes, flexible data models, and scalability
across distributed systems.

MongoDB Architecture

MongoDB has a client-server architecture that consists of the following components:

1. MongoDB Server:
o MongoDB Daemon (mongod): The main process that handles database operations,
including reads, writes, and replication. It stores and retrieves data from disk.
o Config Servers: Manage metadata in a sharded cluster and keep track of the distribution
of data across shards.
o Mongos (Routing Service): Acts as a router in a sharded cluster. It directs client requests
to the appropriate shard based on the request and the shard key.
2. Database: A MongoDB database is a container for collections, analogous to a database in a
relational system. Each database has its own set of collections.
3. Collection: A collection is a group of MongoDB documents (which are similar to rows in a
relational database). Collections are schema-less by default, meaning documents within a
collection can have different fields and data types.
4. Document: The basic unit of data in MongoDB. It is stored in BSON format, which extends
JSON with binary encoding for efficiency. Documents are analogous to records or rows in
relational databases but are more flexible due to their schema-less nature.
5. Indexes: MongoDB supports indexing to optimize query performance. It can create indexes on
fields to speed up query operations.
6. Sharding: In large-scale applications, MongoDB can distribute data across multiple servers
through a process called sharding. This enables horizontal scaling.
7. Replication: MongoDB supports replication for high availability. It allows for multiple copies of
data, ensuring that the data is still available if a primary server goes down. This is typically done
using a replica set, where one server is the primary and others are secondaries.
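
To make the routing role of mongos more concrete, here is a minimal sketch in plain JavaScript of hashed routing: a shard key value is hashed and mapped to one of N shards. It is a toy model under stated assumptions. The names fnv1a, shardFor, and routeInsert are hypothetical, and a real cluster maps ranges of shard key values (chunks) to shards using metadata held on the config servers rather than a simple modulo.

```javascript
// Toy model of hashed shard routing (NOT MongoDB's real algorithm).
// Assumption: a fixed number of shards and a modulo mapping; real clusters
// map ranges of (hashed) shard-key values to shards via config metadata.

// 32-bit FNV-1a string hash, used here only as a stand-in hash function.
function fnv1a(text) {
  let hash = 0x811c9dc5;
  for (let i = 0; i < text.length; i++) {
    hash ^= text.charCodeAt(i);
    hash = Math.imul(hash, 0x01000193);
  }
  return hash >>> 0; // force an unsigned 32-bit result
}

// Pick a shard index for a shard key value.
function shardFor(shardKeyValue, shardCount) {
  return fnv1a(String(shardKeyValue)) % shardCount;
}

// Hypothetical router: places each document on the shard chosen by its key.
function routeInsert(shards, doc, shardKeyField) {
  const index = shardFor(doc[shardKeyField], shards.length);
  shards[index].push(doc);
  return index;
}

const shards = [[], [], []];
for (const userId of ["u1", "u2", "u3", "u4", "u5", "u6"]) {
  routeInsert(shards, { userId, visits: 0 }, "userId");
}
console.log(shards.map((s) => s.length)); // number of documents per shard
```

The same key always lands on the same shard, which is what lets a router send a query that includes the shard key to a single shard instead of all of them.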

How MongoDB Differs from Traditional Relational Databases

Here’s a comparison between MongoDB (NoSQL) and traditional relational databases (SQL):

| Aspect | MongoDB | Traditional Relational Databases (e.g., MySQL, PostgreSQL) |
|---|---|---|
| Data Model | Document-based, schema-less (BSON format) | Table-based, fixed schema (rows and columns) |
| Schema | Flexible schema; fields can vary between documents in the same collection | Fixed schema; all rows in a table must follow the same structure |
| Query Language | MongoDB Query Language (MQL) | Structured Query Language (SQL) |
| Data Relationships | Embedded documents or references; joins only through the $lookup aggregation stage | Joins to combine data across tables |
| Scalability | Horizontal scaling (sharding) | Vertical scaling (scaling up by adding more powerful hardware) |
| Transactions | Supports multi-document transactions (since version 4.0) | ACID-compliant transactions (for individual rows and across tables) |
| Consistency Model | Strong consistency for reads from the primary (the default); eventual consistency when reading from secondaries | Strong consistency (ACID compliance) |
| Performance | High performance for large, unstructured data with read-heavy workloads | Can be slower for unstructured data and large-scale applications |
| Storage | Stores data in BSON (binary JSON) format | Data stored in tables with predefined column types |

Data Storage and Management Differences

1. Data Storage:
o MongoDB: Data is stored in collections as documents in BSON format, allowing flexible
schema design. This allows different documents within the same collection to have
varying structures (different fields or data types).
o Relational Databases: Data is stored in tables with rows and columns, following a strict
schema. Each row must follow the same column structure.
2. Data Relationships:
o MongoDB: Relationships are usually modeled by embedding documents or by storing
references to other documents rather than through joins (a join is available through the
$lookup aggregation stage). Embedding helps with performance, especially in distributed
environments.
o Relational Databases: Use joins to combine data from different tables based on
relationships, such as foreign keys. This can become performance-intensive for complex
queries.
3. Scaling:
o MongoDB: MongoDB is designed for horizontal scaling, meaning you can add more
servers to handle large datasets or heavy traffic loads. It supports sharding, which
distributes data across multiple servers to ensure scalability.
o Relational Databases: Traditionally scale vertically, meaning upgrading the existing
server (more CPU, RAM, storage). Horizontal scaling can be complex and is often not as
straightforward as with NoSQL databases like MongoDB.
4. Transactions:
o MongoDB: Operations on a single document have always been atomic. Multi-document
ACID transactions were added in version 4.0 for replica sets and extended to sharded
clusters in version 4.2. Transactions carry extra overhead, so data is still usually
modeled so that most operations touch a single document.
o Relational Databases: These databases are inherently designed for strong ACID
(Atomicity, Consistency, Isolation, Durability) compliance, ensuring that transactions are
reliably executed across tables and rows.
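
The embedding-versus-referencing choice described under Data Relationships can be illustrated without a database. The sketch below, in plain JavaScript, models the same book and author both ways. The variable names and the findBookWithAuthor helper are hypothetical, and the lookup step stands in for the second query (or $lookup stage) that a referenced design needs.

```javascript
// Embedded design: the author lives inside the book document.
const embeddedBook = {
  _id: 1,
  title: "Moby Dick",
  author: { name: "Herman Melville", born: 1819 },
};

// Referenced design: the book stores only the author's _id.
const authors = [{ _id: 10, name: "Herman Melville", born: 1819 }];
const books = [{ _id: 1, title: "Moby Dick", authorId: 10 }];

// Hypothetical helper: resolves the reference in application code.
function findBookWithAuthor(bookId) {
  const book = books.find((b) => b._id === bookId);
  if (!book) return null;
  const author = authors.find((a) => a._id === book.authorId) ?? null;
  return { ...book, author };
}

console.log(embeddedBook.author.name);          // one read returns everything
console.log(findBookWithAuthor(1).author.name); // two lookups are needed
```

Embedding favors reads that always need the related data together; referencing avoids duplicating the author in every book at the cost of an extra lookup.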

MongoDB Data Model

MongoDB is a document-oriented NoSQL database, and its data model is fundamentally different from
relational database models. Instead of organizing data into tables and rows, MongoDB uses collections of
documents, which are more flexible and scalable. Here's an overview of the core components of the
MongoDB data model:

Key Components:

1. Database:
o A database in MongoDB is a container for collections. Each MongoDB instance can
have multiple databases. Each database is independent and can store its own collections.
o A MongoDB database doesn’t require a fixed schema, so different databases can have
different structures. This gives flexibility and allows the data model to evolve over time.
2. Collection:
o A collection is a group of documents. Collections in MongoDB are similar to tables in
relational databases, but there are key differences.
o Collections do not require a predefined schema, meaning documents in a collection do
not need to have the same structure. This is unlike relational databases, where all rows in
a table must adhere to the same schema.
o Collections are more like directories of files, and you can add or remove documents
freely without worrying about a rigid structure.
3. Document:
o A document is the basic unit of storage in MongoDB. It is a set of key-value pairs and is
stored in BSON (Binary JSON) format, which is similar to JSON but with additional data
types and more efficient storage.
o Documents can contain nested structures like arrays or even other documents, which
gives MongoDB a lot of flexibility when storing complex or hierarchical data.
o Each document has an _id field that is unique within its collection. MongoDB generates
it automatically (as an ObjectId) unless a value is supplied.

Example of Data Model in MongoDB:

Let’s look at an example. Suppose we’re building an application for storing information about books in a
library.

Database: "library"

Collection: "books"

Each document inside the "books" collection could represent a single book, and could look like this:

javascript
// mongosh syntax: ObjectId() takes a 24-character hexadecimal string
{
  "_id": ObjectId("507f1f77bcf86cd799439011"),
  "title": "Moby Dick",
  "author": "Herman Melville",
  "published_year": 1851,
  "genres": ["Fiction", "Adventure"],
  "available_copies": 12,
  "reviews": [
    {
      "user": "John",
      "rating": 5,
      "comment": "Amazing story!"
    },
    {
      "user": "Jane",
      "rating": 4,
      "comment": "Great, but a bit long."
    }
  ]
}

In this document:

• title, author, published_year, genres, available_copies, and reviews are the keys (fields) of
the document, each paired with a value.
• reviews is an embedded array of objects that contain more detailed information about each
review.

This example illustrates the flexibility of MongoDB's data model: each book can have a different
structure, and different collections within the same database can have different data structures.
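
As a rough illustration of the point that documents in one collection may differ in structure, the following plain JavaScript sketch treats an array as a stand-in for the books collection. The second document and the fieldNames helper are made up for this example.

```javascript
// An in-memory stand-in for the "books" collection (not a real database).
const books = [
  {
    title: "Moby Dick",
    author: "Herman Melville",
    published_year: 1851,
    genres: ["Fiction", "Adventure"],
    reviews: [{ user: "John", rating: 5 }],
  },
  // Hypothetical second document with a different shape: no genres or
  // reviews, but an extra "isbn" field that the first document lacks.
  { title: "Untitled Draft", isbn: "000-0000000000" },
];

// Hypothetical helper: collect every top-level field name that appears.
function fieldNames(docs) {
  const names = new Set();
  for (const doc of docs) {
    for (const key of Object.keys(doc)) names.add(key);
  }
  return [...names].sort();
}

console.log(fieldNames(books));
// Fields missing from a document are simply absent, not stored as NULL.
console.log("reviews" in books[1]); // false
```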

How the Data Model Facilitates Efficient Storage and Retrieval

1. Schema Flexibility:
o Efficient Storage: MongoDB's schema flexibility allows for more efficient storage. Each
document can store only the data that is relevant to it. There is no need to fill in blank
columns or create complex tables for data that may not apply to all documents (as you
would in a relational database with null values or empty fields).
o Dynamic Changes: The schema can evolve over time. For example, you can add new
fields to documents without having to perform complex database migrations, which is
often needed in relational databases when the schema changes.
2. Embedded Data:
o MongoDB allows for embedding related data directly in a document, rather than creating
separate tables and establishing relationships through foreign keys (which would be
needed in relational databases). This reduces the need for joins, making queries faster.
o Example: In the book document above, the reviews are embedded as an array within the
document itself. In a relational database, you might have a separate reviews table with a
foreign key to the books table, requiring a join to retrieve the reviews for a specific book.
With MongoDB, this embedded structure makes retrieval faster and simpler because all
the data is in a single document.
3. No Joins (or Reduced Joins):
o In MongoDB, data is typically modeled to reduce or eliminate the need for joins. Since
joins in relational databases can be computationally expensive, avoiding them can
significantly improve performance.
o MongoDB allows you to denormalize data by embedding related data within a
document. For example, rather than creating a separate table for author details and then
performing a join when querying for a book, you can embed the author's information
directly in the book document.
o Less Complex Queries: Since MongoDB documents are self-contained, queries can be
simpler, requiring fewer joins or even none at all.
4. Indexing:
o MongoDB supports indexing on fields to improve query performance. You can index
fields in documents (including embedded fields) to ensure faster lookups and efficient
queries.
o Indexing in MongoDB works similarly to indexing in relational databases, but
MongoDB's flexibility allows for indexing on nested fields (such as elements in arrays or
embedded documents), which can further optimize complex queries.
5. Horizontal Scalability:
o MongoDB’s data model supports sharding, which is the process of distributing data
across multiple servers. This enables horizontal scaling, which means the database can
scale out across many machines to handle large datasets and heavy traffic loads.
o Sharding is particularly useful for applications that need to store vast amounts of data or
handle high throughput, and it’s easier to implement in MongoDB due to its schema-less
design and document-based data model.
6. Replication and High Availability:
o MongoDB supports replication through replica sets. A replica set is a group of
MongoDB servers that maintain the same data. Replica sets provide redundancy and high
availability. If the primary fails, one of the secondaries is automatically elected as the
new primary, so the data stays available with only a brief pause in writes during the
election.
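
The failover behaviour of a replica set can be pictured with a deliberately simplified model. The sketch below, in plain JavaScript, assumes each member records the position of the last operation it applied and promotes the healthy secondary that is most up to date. The field names and electPrimary are hypothetical; a real election also involves votes from a majority of members, member priorities, and election terms.

```javascript
// Simplified replica-set model (illustrative only, not MongoDB's protocol).
const members = [
  { host: "node-a", role: "primary", healthy: true, lastApplied: 105 },
  { host: "node-b", role: "secondary", healthy: true, lastApplied: 105 },
  { host: "node-c", role: "secondary", healthy: true, lastApplied: 98 },
];

// Hypothetical election: if the primary is unhealthy, promote the healthy
// secondary with the most recently applied operation.
function electPrimary(replicaSet) {
  const current = replicaSet.find((m) => m.role === "primary");
  if (current && current.healthy) return current;
  if (current) current.role = "secondary"; // step the failed primary down
  const candidates = replicaSet
    .filter((m) => m.healthy && m.role === "secondary")
    .sort((a, b) => b.lastApplied - a.lastApplied);
  if (candidates.length === 0) return null; // no member can take over
  candidates[0].role = "primary";
  return candidates[0];
}

members[0].healthy = false;              // simulate a primary failure
console.log(electPrimary(members).host); // node-b
```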

How MongoDB's Data Model Improves Storage and Retrieval Efficiency Compared to Relational
Databases

• No Schema Overhead: Relational databases require a fixed schema and predefined tables. A
change in structure (like adding a new column) may need a schema change and a data
migration. MongoDB lets the structure of documents change without a schema migration,
saving time and complexity.
• Reduced Complexity in Queries: In relational databases, operations on related data (like
users and orders) require joins, which can become performance bottlenecks with large
datasets. MongoDB reduces the need for joins by embedding related data inside documents,
which simplifies queries and can improve performance.
• Efficient Storage of Sparse Data: In a relational table, every row carries every column, so
attributes that do not apply are stored as empty or NULL values. A MongoDB document
simply omits fields that do not apply. This is not a guarantee of a smaller footprint:
denormalization (embedding data) can duplicate information that a normalized relational
schema stores once, and every document stores its own field names.
• Scalability: Relational databases traditionally scale vertically (adding resources to a single
server), while MongoDB scales horizontally by distributing data across multiple servers. The
flexible, self-contained structure of MongoDB documents makes this process easier to manage.

Reasons Why Developers and Organizations Prefer MongoDB:

1. Schema Flexibility:
o MongoDB is schema-less, meaning documents within a collection do not have to adhere
to the same structure. This is highly beneficial for modern applications where the data
model may evolve rapidly, and flexibility is key. In relational databases, adding new
columns or modifying the schema requires migrations, which can be complex and time-
consuming.
o For example, if you're working with data that changes frequently, MongoDB allows you
to easily add or remove fields from documents without impacting the existing data.
2. Horizontal Scalability:
o MongoDB was built with horizontal scaling in mind. As data grows, MongoDB can
distribute data across multiple servers through sharding, allowing it to handle large-
scale, high-traffic applications. This is easier to implement compared to relational
databases, which are traditionally designed for vertical scaling (adding resources to a
single server).
o Applications with large amounts of unstructured or semi-structured data (e.g., social
media platforms, e-commerce sites) benefit greatly from MongoDB’s ability to scale out
across multiple machines.
3. Performance:
o MongoDB can perform well for read-heavy or write-heavy workloads because it can index
fields, denormalize data, and embed related data inside documents. This reduces the
need for joins that can slow down queries in relational databases.
o With the in-memory cache of its default WiredTiger storage engine and efficient use of
indexes, MongoDB can serve large datasets quickly when the data model matches the
application's access patterns.
4. Developer Productivity and Speed:
o Developers find MongoDB more intuitive and quicker to work with, especially when
dealing with evolving and unpredictable data. MongoDB's flexible data model
(documents instead of rows) aligns well with how developers design applications. With
its JSON-like structure, MongoDB maps naturally to the data structures used in modern
application development (like JSON, JavaScript, or Python dictionaries).
o It’s easy to work with, especially for agile teams that need to deploy quickly and adapt to
frequent changes in the data model.
5. Built-in High Availability:
o MongoDB supports replica sets, which provide high availability by replicating data
across multiple servers. If one server fails, another can take over, ensuring continuous
availability of data. This feature is critical for modern applications that require minimal
downtime and must remain operational even in the event of server failure.

Two Advantages of Using MongoDB for Modern Data-Driven Projects:

1. Handling Unstructured or Semi-Structured Data:
o Modern data-driven projects often deal with unstructured or semi-structured data, such as
logs, social media feeds, sensor data, or content from various sources (e.g., images,
documents, and metadata). MongoDB’s document-based model is well-suited for this
kind of data. Documents can store any type of data, including nested arrays or objects,
which makes it easier to model complex data structures without predefined schemas.
o This flexibility allows businesses to work with data that doesn’t fit neatly into a relational
database table structure, such as data that evolves over time.
2. Rapid Development and Agile Prototyping:
o MongoDB enables rapid application development due to its flexible schema and easy-
to-use APIs. Developers can start with a basic application and iterate quickly without
worrying about creating complex database schemas upfront. This is particularly
beneficial in startups or businesses that need to prototype and experiment with new
features quickly.
o The ability to easily modify the structure of data as the application evolves means that
MongoDB is a good choice for projects where requirements are expected to change
frequently or where agility is a key factor in development.

MongoDB Querying and Indexing

MongoDB provides a powerful and flexible querying system that is different from the SQL-based
querying used in traditional relational databases. Here's a breakdown of how MongoDB handles querying
and indexing, and how it differs from SQL databases.

MongoDB Querying:

In MongoDB, querying is done using MongoDB Query Language (MQL), which allows developers to
query the documents in a collection in a flexible and efficient way. Queries are written as JSON-like
filter documents that mirror the structure of the stored documents, so they can reach directly into
nested fields and arrays.

Key Aspects of MongoDB Querying:

1. Basic Querying:
o MongoDB uses a key-value approach to query documents. For example, to find all
documents where the name field equals "John":

javascript
db.users.find({ name: "John" });

o MongoDB supports a wide range of query operators (like $gt, $lt, $in, $exists, etc.) for
more complex conditions. For instance:

javascript
db.orders.find({ totalAmount: { $gt: 100 } });

o The query syntax in MongoDB is intuitive, and developers can use it directly within their
application code (especially in JavaScript, which aligns well with MongoDB's document
structure).
2. Querying Nested and Complex Data:
o Since MongoDB stores data in BSON format (similar to JSON), it allows developers to
query nested fields, arrays, or embedded documents easily. For example, if a document
has an array of objects (like reviews), you can query specific elements in that array:

javascript
db.books.find({ "reviews.rating": { $gt: 4 } });

This retrieves all books in which at least one element of the reviews array has a rating greater than 4.

3. Aggregation Framework:
o MongoDB provides an Aggregation Framework for advanced data manipulation such as
filtering, grouping, sorting, and transforming data. A pipeline of stages plays a role
similar to the WHERE, GROUP BY, and JOIN clauses of SQL, and it operates directly on
nested documents and arrays.

javascript
db.sales.aggregate([
{ $match: { category: "electronics" } },
{ $group: { _id: "$store", totalSales: { $sum: "$amount" } } }
]);

o Pipelines run inside the database, so data can be transformed without first moving it to
the application, which makes MongoDB usable for many analytical workloads.
4. Text Search:
o MongoDB supports text indexes on string fields, enabling full-text search queries.
Developers can search fields that contain large bodies of text or search for keywords
across multiple documents. A $text query requires a text index on the collection.

javascript
db.posts.find({ $text: { $search: "database" } });
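
To show how a filter such as { "reviews.rating": { $gt: 4 } } is evaluated against a document, here is a small in-memory matcher in plain JavaScript. It is a sketch, not MongoDB's implementation: matches and valuesAt are hypothetical helpers, and only scalar equality plus the $gt, $lt, $in, and $exists operators are covered (matching a whole array or embedded document by equality is left out).

```javascript
// Minimal in-memory filter matcher (a sketch, not MongoDB's implementation).

// Collect the values found at a dotted path, fanning out across arrays the
// way "reviews.rating" reaches into every element of "reviews".
function valuesAt(doc, path) {
  let current = [doc];
  for (const key of path.split(".")) {
    current = current
      .flat() // step into array elements, one level
      .filter((v) => v !== null && typeof v === "object" && key in v)
      .map((v) => v[key]);
  }
  return current;
}

// Comparisons only succeed between values of the same JavaScript type.
const operators = {
  $gt: (value, arg) => typeof value === typeof arg && value > arg,
  $lt: (value, arg) => typeof value === typeof arg && value < arg,
  $in: (value, arg) => arg.includes(value),
};

// True when the document satisfies every condition in the filter.
function matches(doc, filter) {
  return Object.entries(filter).every(([path, condition]) => {
    const found = valuesAt(doc, path);
    const candidates = found.flat(); // an array field matches if any element does
    const isOperatorDoc =
      condition !== null && typeof condition === "object" && !Array.isArray(condition);
    if (!isOperatorDoc) return candidates.some((v) => v === condition);
    return Object.entries(condition).every(([op, arg]) => {
      if (op === "$exists") return (found.length > 0) === arg;
      if (!(op in operators)) throw new Error(`Unsupported operator: ${op}`);
      return candidates.some((v) => operators[op](v, arg));
    });
  });
}

const book = {
  title: "Moby Dick",
  genres: ["Fiction", "Adventure"],
  reviews: [{ user: "John", rating: 5 }, { user: "Jane", rating: 4 }],
};
console.log(matches(book, { "reviews.rating": { $gt: 4 } })); // true: one review has rating 5
console.log(matches(book, { "reviews.rating": { $gt: 5 } })); // false: no rating exceeds 5
```

The first query matches even though one review has a rating of exactly 4, because a condition on an array field is satisfied when any single element satisfies it.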

MongoDB Indexing:

MongoDB’s indexing mechanisms are designed to speed up data retrieval by allowing faster searches on
specific fields. Indexes in MongoDB are similar to those in SQL databases, but MongoDB provides
several additional indexing types that cater to the unique structure of documents.

Types of Indexes in MongoDB:

1. Single Field Index:


o The most basic index, where an index is created on a single field in a collection. This is
similar to indexing a column in a relational database.

javascript
db.users.createIndex({ name: 1 }); // Index on the 'name' field

2. Compound Index:
o A compound index is a single index built on multiple fields, which is useful for
optimizing queries that filter on more than one field.

javascript
db.users.createIndex({ name: 1, age: -1 }); // Compound index: 'name' ascending, 'age' descending

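o This index helps optimize queries that filter by name alone or by both name and age. It
generally does not help queries that filter only by age, because age is not a leading field
of the index.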
3. Multikey Index:
o MongoDB automatically creates a multikey index when indexing array fields. This
allows efficient querying on array elements. For example, if each document has an array
of tags:

javascript
db.products.createIndex({ tags: 1 }); // Multikey index on 'tags' array

4. Geospatial Index:
o MongoDB supports geospatial indexes for efficiently querying geographical data, such
as locations represented by coordinates (longitude, latitude).

javascript
db.places.createIndex({ location: "2dsphere" });

5. Text Index:
o As mentioned earlier, MongoDB also supports text indexes for performing full-text
searches on string fields, which is particularly useful for querying unstructured textual
data.
6. Hashed Index:
o Hashed indexes support hashed sharding. Documents are distributed across shards
according to the hash of a field's value, which spreads data more evenly than ranges of
a monotonically increasing key.

javascript
db.users.createIndex({ _id: "hashed" });
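
The idea behind a multikey index, one index entry per array element, can be sketched with a Map in plain JavaScript. This is a toy model: buildMultikeyIndex and the sample products are hypothetical, and MongoDB's real indexes are B-tree structures maintained by the storage engine.

```javascript
// Sample documents standing in for the "products" collection.
const products = [
  { _id: 1, name: "Laptop", tags: ["electronics", "computers"] },
  { _id: 2, name: "Phone", tags: ["electronics", "mobile"] },
  { _id: 3, name: "Desk", tags: ["furniture"] },
];

// Build a map from each array element to the _ids of documents containing it.
function buildMultikeyIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    const value = doc[field];
    const values = Array.isArray(value) ? value : [value];
    for (const v of new Set(values)) { // one entry per distinct element
      if (!index.has(v)) index.set(v, []);
      index.get(v).push(doc._id);
    }
  }
  return index;
}

const tagIndex = buildMultikeyIndex(products, "tags");
console.log(tagIndex.get("electronics")); // [ 1, 2 ]
```

A lookup by tag now touches only the entries for that tag instead of every document, which is the effect the tags index above is meant to achieve.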

How MongoDB’s Querying and Indexing Differs from SQL Databases:

1. Data Structure:
o MongoDB: Queries operate on flexible, schema-less documents (BSON) that can contain
nested structures, arrays, and dynamic fields. This allows developers to perform complex
queries on unstructured data directly within documents.
o SQL Databases: SQL queries are based on tables with fixed schemas (columns and
rows). Joins are often required for retrieving related data from multiple tables, which can
be more performance-intensive, especially for large datasets.
2. Querying:
o MongoDB: MongoDB allows querying with rich conditions (e.g., querying within arrays
or embedded documents) and supports complex aggregation pipelines that allow
powerful data transformations within the database.
o SQL Databases: SQL queries use SELECT statements over tables and typically rely on
joins to combine data from multiple tables. Aggregation is built in (GROUP BY and
aggregate functions), but querying nested or array-valued data usually means joining
additional tables.
3. No Joins:
o MongoDB: MongoDB minimizes the need for joins by embedding related data directly in
documents, which avoids the cost of a join at read time. When a join is needed, the
$lookup aggregation stage performs a left outer join against another collection, though
this is less common than in SQL.
o SQL Databases: Relational databases rely heavily on joins to combine related data from
multiple tables. While this is powerful for structured data, it can be inefficient for large
datasets and complex queries.
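
For readers coming from SQL, the aggregation stages used earlier map onto familiar operations: $match behaves like WHERE and $group like GROUP BY. The plain JavaScript sketch below reproduces the earlier sales pipeline on an in-memory array. The sample data and the matchStage and groupSum helpers are invented for illustration and do not reflect how the server executes a pipeline.

```javascript
// Sample documents standing in for the "sales" collection.
const sales = [
  { store: "A", category: "electronics", amount: 200 },
  { store: "A", category: "electronics", amount: 50 },
  { store: "B", category: "electronics", amount: 120 },
  { store: "B", category: "toys", amount: 30 },
];

// Equivalent of { $match: { category: "electronics" } } for flat equality.
function matchStage(docs, criteria) {
  return docs.filter((doc) =>
    Object.entries(criteria).every(([field, value]) => doc[field] === value)
  );
}

// Equivalent of { $group: { _id: "$store", totalSales: { $sum: "$amount" } } }.
function groupSum(docs, keyField, sumField, outputField) {
  const totals = new Map();
  for (const doc of docs) {
    totals.set(doc[keyField], (totals.get(doc[keyField]) ?? 0) + doc[sumField]);
  }
  return [...totals].map(([key, total]) => ({ _id: key, [outputField]: total }));
}

const result = groupSum(
  matchStage(sales, { category: "electronics" }),
  "store",
  "amount",
  "totalSales"
);
console.log(result);
// Roughly: SELECT store, SUM(amount) AS totalSales FROM sales
//          WHERE category = 'electronics' GROUP BY store;
```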

Benefits of MongoDB’s Indexing Mechanisms:

1. Improved Query Performance:


o MongoDB indexes help to speed up query execution by allowing the database to locate
documents faster based on indexed fields. Without indexing, MongoDB would need to
scan every document in a collection to find matching documents, which can be very slow
for large datasets.
o With proper indexing, MongoDB can quickly retrieve matching documents, making
searches and queries much more efficient.
2. Flexible Indexing Options:
o MongoDB supports a wide variety of index types, including compound indexes,
geospatial indexes, full-text search indexes, and more. This flexibility enables developers
to optimize queries for specific use cases (e.g., location-based queries, text search).
o The ability to create multikey indexes on array fields or geospatial indexes makes
MongoDB a great choice for applications dealing with complex data structures like
nested arrays or geographical data.
3. Scalability:
o Indexes also matter for horizontal scaling: a sharded collection needs an index that
supports its shard key. MongoDB uses hashed or range-based sharding on that key to
distribute data across shards, so queries that include the shard key can be routed to
the relevant shards even in large-scale distributed systems.
4. Reduced Query Time:
o By reducing the number of documents MongoDB needs to scan, indexes significantly
reduce query execution time. For instance, text indexes allow quick full-text searches on
large datasets, while compound indexes speed up queries that filter on multiple fields.
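
The difference between a collection scan and an indexed lookup can be made concrete by counting how many documents each approach examines. The following plain JavaScript sketch uses a Map as a stand-in for a single-field index. The function names and the generated data are assumptions for illustration; a Map is a hash structure rather than the B-tree MongoDB uses, and real query plans are better inspected with explain().

```javascript
// 10,000 synthetic user documents; exactly one has the name "user-9000".
const users = Array.from({ length: 10000 }, (_, i) => ({ _id: i, name: `user-${i}` }));

// Collection scan: every document is examined.
function findByScan(docs, name) {
  let examined = 0;
  const results = [];
  for (const doc of docs) {
    examined++;
    if (doc.name === name) results.push(doc);
  }
  return { results, examined };
}

// Stand-in for db.users.createIndex({ name: 1 }): value -> matching documents.
function buildIndex(docs, field) {
  const index = new Map();
  for (const doc of docs) {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(doc);
  }
  return index;
}

// Indexed lookup: only the matching documents are examined.
function findByIndex(index, name) {
  const results = index.get(name) ?? [];
  return { results, examined: results.length };
}

const nameIndex = buildIndex(users, "name");
console.log(findByScan(users, "user-9000").examined);      // 10000
console.log(findByIndex(nameIndex, "user-9000").examined); // 1
```

Both calls return the same single document; the index only changes how much work is done to find it, at the cost of building and maintaining the extra structure on every write.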
