Unit II NoSQL DB Management

The document provides an overview of NoSQL data management, covering various data models including aggregate, key-value, document, and graph databases. It explains the advantages of NoSQL over traditional relational databases, particularly in handling large volumes of data and real-time applications. Additionally, it discusses distribution models, materialized views, and the concept of schemaless databases, highlighting their significance in modern data management systems.

Uploaded by

daivshaladhepale
Copyright
© All Rights Reserved
UNIT II: NoSQL Data Management

• Introduction to NoSQL, aggregate data models, key-value and document data models,
relationships, graph databases, schemaless databases, materialized views,
distribution models, master-slave replication
Introduction to NoSQL
What is NoSQL?
• A NoSQL database is a non-relational data management system that does not
require a fixed schema. It avoids joins and is easy to scale. The major
purpose of using a NoSQL database is for distributed data stores with
humongous data storage needs. NoSQL is used for big data and real-time web
apps. For example, companies like Twitter, Facebook and Google collect
terabytes of user data every single day.
• NoSQL stands for "Not Only SQL" or "Not SQL." Though a better
term would be "NoREL", NoSQL caught on. Carlo Strozzi introduced the term
NoSQL in 1998.
• A traditional RDBMS uses SQL syntax to store and retrieve data for further
insights. In contrast, NoSQL encompasses a wide range of
database technologies that can store structured, semi-structured,
unstructured and polymorphic data.
Why NoSQL?
• The concept of NoSQL databases became popular with Internet giants like
Google, Facebook, Amazon, etc., who deal with huge volumes of data. System
response time becomes slow when you use an RDBMS for massive
volumes of data. To resolve this problem, we could "scale up" our systems by
upgrading our existing hardware, but this process is expensive. The alternative
is to distribute the database load across multiple hosts whenever the load
increases. This method is known as "scaling out."
Brief History of NoSQL Databases
• 1998: Carlo Strozzi uses the term NoSQL for his lightweight, open-source
relational database
• 2000- Development of the graph database Neo4j begins
• 2004- Development of Google BigTable begins
• 2005- CouchDB is launched
• 2007- The research paper on Amazon Dynamo is released
• 2008- Facebook open-sources the Cassandra project
• 2009- The term NoSQL was reintroduced
AGGREGATE DATA MODELS
• An aggregate is a collection of related objects that we treat as a unit: a
collection of data that we interact with as a whole. These units
of data, or aggregates, form the boundaries for ACID operations.
• Aggregate data models make it easier for a database to manage
data storage over a cluster, because each aggregate is a unit that can reside on
any of the machines. Whenever an aggregate is retrieved from the database, all of
its data comes along with it.

• Aggregate-oriented databases generally don't support ACID transactions that span
multiple aggregates; atomicity is typically guaranteed only within a single aggregate.
Aggregate data models can also make it easier to perform OLAP (Online Analytical
Processing) operations on the database.
Example of Aggregate Data Model:
Aggregation:
• Customer Aggregate: Includes the customer’s details and billing addresses.
• Order Aggregate: Contains details about the order, including the shipping address, order items,
and payments.
Denormalization:
• In the example, the BillingAddress appears multiple times (in the customer and payment). This
avoids having to look up the address in a separate place and means the address recorded with a
payment stays as it was even if the customer's address later changes.
• This is a trade-off in NoSQL. While it may involve some duplication, it reduces the need for
complex joins and improves performance.
No Need for IDs in Aggregates:
• Instead of using IDs to reference addresses and other data, the full address information is
included directly in each aggregate. This simplifies data retrieval, since an aggregate can be
read without further lookups.
Relationship Between Aggregates:
• The link between a customer and their orders is maintained through the CustomerID in the order
aggregate but is not part of the customer aggregate itself.
• Similarly, the ProductName is included in the order items for simplicity, but the actual product
[Figure: embed all the objects for the customer and the customer's orders, using the above data model]
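The customer and order aggregates described above can be sketched as plain Python dictionaries. This is only an illustration: the field names and values are assumptions chosen for the example, not taken from the original figure.

```python
# Hypothetical customer aggregate: customer details plus billing addresses.
customer = {
    "id": 1,
    "name": "Asha",
    "billingAddress": [{"city": "Pune"}],
}

# Hypothetical order aggregate: address data is copied in (denormalized)
# rather than referenced by ID; only the customer is linked by ID.
order = {
    "id": 99,
    "customerId": 1,
    "orderItems": [
        {"productId": 27, "productName": "Notebook", "price": 40},
        {"productId": 31, "productName": "Pen", "price": 10},
    ],
    "shippingAddress": {"city": "Pune"},
    "orderPayment": [
        {"txnId": "txn-001", "billingAddress": {"city": "Pune"}},
    ],
}


def orders_for_customer(customer, orders):
    """Follow the customerId link from order aggregates back to a customer."""
    return [o for o in orders if o["customerId"] == customer["id"]]


def order_total(order):
    """Everything needed to total an order lives inside the order aggregate."""
    return sum(item["price"] for item in order["orderItems"])
```

Because each order carries its own items, addresses, and payments, reading one order needs no join; the only cross-aggregate link is `customerId`.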
Key-value and document data models
• Key-value and document databases are strongly aggregate-oriented: we think of these
databases as primarily constructed through aggregates. Both of these types of databases
consist of lots of aggregates, with each aggregate having a key or ID that's used to get at
the data.
• The two models differ in that in a key-value database, the aggregate is opaque to the
database: just some big blob of mostly meaningless bits. In a document database, by
contrast, the database understands the structure of the aggregate and can query it.
• In practice, the line between key-value and document gets a bit blurry. People often
put an ID field in a document database to do a key-value style lookup. Databases
classified as key-value databases may allow you structures for data beyond just
an opaque aggregate.
• For example, Riak allows you to add metadata to aggregates for indexing and
interaggregate links.
• Redis allows you to break down the aggregate into lists or sets. You can support querying
by integrating search tools such as Solr. As an example, Riak includes a search facility that
uses Solr-like searching on any aggregates that are stored as JSON or XML structures.
• Data is stored in key/value pairs, and the store is designed to handle lots of
data and heavy load. Key-value stores keep data as a hash table
where each key is unique, and the value can be JSON, a BLOB (Binary Large
Object), a string, etc. For example, a key-value pair may contain a key like
"Website" associated with a value like "JavaTpoint".

• It is one of the most basic types of NoSQL database. This kind of database is
used for collections, dictionaries, associative arrays, etc. Key-value stores
help the developer to store schema-less data.
• They work well for shopping cart contents. Redis, Dynamo, and Riak are some
examples of key-value stores. Riak in particular is based on Amazon's
Dynamo paper.
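A minimal sketch of the key-value idea, assuming an in-memory store backed by a Python dictionary. The class name `KeyValueStore` and the key `cart:42` are hypothetical; real products such as Redis or Riak expose their own APIs.

```python
import json


class KeyValueStore:
    """Minimal in-memory key-value store; values are opaque to the store."""

    def __init__(self):
        self._data = {}  # hash table: unique key -> opaque value

    def put(self, key, value):
        self._data[key] = value  # overwrites any previous value for the key

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)  # deleting a missing key is a no-op


store = KeyValueStore()
store.put("Website", "JavaTpoint")

# The store never inspects the value, so any serialized blob works.
store.put("cart:42", json.dumps({"items": ["book", "pen"]}).encode("utf-8"))

# Only the application knows how to interpret the blob.
cart = json.loads(store.get("cart:42"))
```

The store can only look values up by key; any query on the contents of a cart has to be done by the application after decoding the value.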
Document data models
• A document data model differs from other data models in that it stores data in
JSON, BSON, or XML documents. In this data model, documents can be nested inside one
another, and individual elements can be indexed to run queries faster.
• Documents are often stored and retrieved in a form that is close to the data
objects used in applications, which means very little translation is required to use
the data in applications. JSON is a native format that is often used to store and query data too.
• In the document data model, each document consists of key-value pairs. Below is an
example:
{
"Name" : "Yashodhra",
"Address" : "Near Patel Nagar",
"Email" : "[email protected]",
"Contact" : "12345"
}
• Document-oriented:
• A document-oriented NoSQL database stores and retrieves data as a key-value
pair, but the value part is stored as a document. The document is stored in
JSON or XML format. The value is understood by the database and can be
queried.
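To illustrate what it means for the value to be understood and queried, here is a small sketch that filters documents by field. The `find` helper and the second document are assumptions made for the example; this is not the API of any particular document database.

```python
# A tiny "collection" of documents; the second document is hypothetical.
documents = [
    {"_id": 1, "Name": "Yashodhra", "Address": "Near Patel Nagar", "Contact": "12345"},
    {"_id": 2, "Name": "Asha", "Address": "MG Road", "Contact": "67890"},
]


def find(collection, **criteria):
    """Return the documents whose fields equal all of the given criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Query on a field inside the document, not just on its key.
matches = find(documents, Address="MG Road")
```

Unlike a key-value store, the lookup here depends on the structure of the value itself, which is what lets a document database index and query individual fields.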
Graph databases
• A graph database is a type of NoSQL database that is designed to handle data with
complex relationships and interconnections. In a graph database, data is stored as nodes
and edges, where nodes represent entities and edges represent the relationships between
those entities.
The components are described as follows:
Nodes: represent the objects or instances. They are equivalent to a row in a relational
database. A node acts as a vertex in the graph. Nodes are grouped by applying a label to
each member.
Relationships: the edges in the graph. They have a specific direction and type, and form
patterns in the data. They establish the relationships between nodes.
Properties: the information associated with the nodes.
• Graph-based:
• A graph database stores entities as well as the relations among those
entities. An entity is stored as a node, with relationships as edges. An
edge gives a relationship between nodes. Every node and edge has a
unique identifier.
• Compared to a relational database, where tables
are loosely connected, a graph database is
multi-relational in nature.
• Traversing relationships is fast because they are already
captured in the database, and there is no need to
calculate them.
• Graph databases are mostly used for social
networks, logistics, and spatial data. Neo4j, InfiniteGraph,
OrientDB, and FlockDB are some popular
graph databases.
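The node, relationship, and property concepts above can be sketched with ordinary Python structures. The node names and the `FRIEND_OF` relationship type are hypothetical, and the edge scan in `neighbors` is a simplification: a real graph database stores relationships so that they can be followed directly.

```python
from collections import deque

# Nodes carry a label and properties (all names here are made up).
nodes = {
    "alice": {"label": "Person", "name": "Alice"},
    "bob": {"label": "Person", "name": "Bob"},
    "carol": {"label": "Person", "name": "Carol"},
}

# Directed, typed relationships: (source, type, target).
edges = [
    ("alice", "FRIEND_OF", "bob"),
    ("bob", "FRIEND_OF", "carol"),
]


def neighbors(node_id, rel_type):
    """Follow outgoing relationships of one type from a node."""
    return [dst for src, rel, dst in edges if src == node_id and rel == rel_type]


def reachable(start, rel_type, max_hops):
    """Breadth-first traversal up to max_hops relationships away from start."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node_id, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for nxt in neighbors(node_id, rel_type):
            if nxt not in seen:
                seen.add(nxt)
                found.append(nxt)
                queue.append((nxt, hops + 1))
    return found


friends_of_friends = reachable("alice", "FRIEND_OF", 2)
```

A query such as "friends of friends" is a short traversal here, whereas in a relational schema it would typically require a self-join per hop.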
Schemaless databases
• Schemaless databases, also known as schema-free or schema-less databases,
are a type of database management system (DBMS) that allows for flexible and
dynamic data modeling without rigidly predefined schemas.
• Schemaless databases provide a more agile and adaptable approach for storing
and querying data.
• In schemaless databases, data is typically stored in a format that does not
require a predefined schema to be specified before storing the data.
• Each document or record within the database can have its own structure, and
the database system does not enforce a specific schema on the data.
• This means that different documents within the same collection or table can
have varying structures and fields.
How does a schemaless database work?
• In schemaless databases, information is stored in JSON-style documents,
which can have varying sets of fields with different data types for each field. So, a
collection could look like this:
{ "name": "Joe", "age": 30, "interests": "football" }
{ "name": "Kate", "age": 25 }
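As a rough sketch of how an application works with such a collection, the snippet below stores the two documents above without declaring any schema and then reads them back. The `insert` helper is an assumption for illustration, not a database API.

```python
collection = []  # no schema is declared before inserting


def insert(doc):
    """Accept a document with any set of fields and store a copy of it."""
    collection.append(dict(doc))


insert({"name": "Joe", "age": 30, "interests": "football"})
insert({"name": "Kate", "age": 25})  # no "interests" field, still accepted

# Readers must tolerate missing fields, since nothing enforces them.
interests = [doc.get("interests", "unknown") for doc in collection]

# The "schema" is implicit: the union of fields actually present.
fields_in_use = sorted({field for doc in collection for field in doc})
```

The flexibility comes at a cost: because the database does not enforce structure, the application code has to handle documents that lack a field.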
What are the benefits of using a schemaless database?
• Greater flexibility over data types: schemaless databases can store, retrieve,
and query any data type, which suits big data analytics and similar
operations.
• No pre-defined database schemas.
• No data truncation: a schemaless database makes almost no changes to
your data.
• Suitable for real-time analytics functions
• Enhanced scalability and flexibility
Materialized views
• Materialized views, known in some database systems as indexed views, are
database objects that store the results of a query in a precomputed and
persistent form. They are derived from one or more source tables or views
and are used to improve query performance by providing faster access to
frequently queried or complex data.
• In a traditional database system, queries often involve joining multiple tables
or performing complex calculations, which can be resource-intensive and
time-consuming.
• Materialized views address this issue by precomputing and storing the results
of such queries, allowing subsequent queries to retrieve the data directly
from the materialized view instead of re-executing the original query.
1. Data Storage: Materialized views store the actual result set of a query, typically as a
table-like structure in the database. The data in the materialized view is updated
periodically to reflect changes in the underlying source tables.
2. Query Performance: By storing precomputed results, materialized views eliminate
the need for executing complex queries repeatedly. This improves query performance
by reducing the processing and computation time required for data retrieval.
3. Data Aggregation and Joins: Materialized views are commonly used for aggregating
data or joining multiple tables to simplify and optimize complex queries. They can
store the aggregated or joined results, allowing for faster access to the desired data.
4. Maintenance and Refresh: Materialized views need to be maintained and
refreshed to reflect changes in the underlying data. Depending on the database
system, materialized views can be refreshed on a schedule or triggered by specific
events or updates to the source data.
5. Query Rewrite: Some database systems support automatic query rewrite, where
the optimizer recognizes queries that can be satisfied using a materialized view and
rewrites the query to use the materialized view instead. This further improves
performance by transparently utilizing the materialized view.
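The storage and refresh behaviour described in points 1 to 4 can be sketched with Python's built-in `sqlite3` module. SQLite has no native materialized views, so this example emulates one with an ordinary table that is rebuilt on demand; the table names `orders` and `order_totals` and the sample rows are assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (customer_id INTEGER, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [(1, 10.0), (1, 15.0), (2, 7.5)],
)


def refresh_order_totals(conn):
    """Recompute the stored result set from the source table (full refresh)."""
    conn.execute("DROP TABLE IF EXISTS order_totals")
    conn.execute(
        "CREATE TABLE order_totals AS "
        "SELECT customer_id, SUM(amount) AS total "
        "FROM orders GROUP BY customer_id"
    )


def read_totals(conn):
    """Read the precomputed rows instead of re-running the aggregation."""
    return conn.execute(
        "SELECT customer_id, total FROM order_totals ORDER BY customer_id"
    ).fetchall()


refresh_order_totals(conn)
before = read_totals(conn)

conn.execute("INSERT INTO orders VALUES (2, 2.5)")
stale = read_totals(conn)  # unchanged until the next refresh

refresh_order_totals(conn)
after = read_totals(conn)
```

The `stale` read shows the trade-off: queries against the stored result are cheap, but they only reflect the source data as of the last refresh.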
Distribution models
• Distribution models in database systems refer to strategies for distributing data across
multiple nodes or servers in a distributed computing environment.
• These models determine how data is partitioned and replicated to ensure availability, fault
tolerance, and efficient query processing.
• Here are some commonly used distribution models:
1. Horizontal partitioning (sharding)
2. Replication, for example master-slave replication
The choice of a distribution model depends on factors such as the nature of the data, access
patterns, scalability requirements, fault tolerance goals, and performance considerations. It's
crucial to analyze the characteristics of the application and workload to determine the most
suitable distribution model for a given scenario.
Systems such as MongoDB and Apache Hadoop provide mechanisms to implement these
distribution models.
Sharding
• Sharding involves dividing a large database into smaller, more manageable parts
called shards or partitions.
• Each shard contains a subset of the data and is stored on a separate node or server in
the distributed system.
• The sharding process typically involves selecting a shard key or partitioning key,
which determines how data is distributed across shards.
• The goal of sharding is to evenly distribute data to avoid bottlenecks and enable
horizontal scalability.
• Sharding is commonly used in NoSQL databases to handle large-scale datasets and
achieve better performance and scalability.
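One common way to map a shard key to a shard is hash-based partitioning, sketched below. The shard count, the key format `user:<id>`, and the helper names are assumptions for illustration; real databases may instead use range-based or consistent-hashing schemes.

```python
import hashlib

NUM_SHARDS = 4  # hypothetical cluster size


def shard_for(shard_key, num_shards=NUM_SHARDS):
    """Map a shard key to a shard index using a stable hash."""
    digest = hashlib.sha256(str(shard_key).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


# Each dict stands in for a separate node holding a subset of the data.
shards = [dict() for _ in range(NUM_SHARDS)]


def put(key, value):
    shards[shard_for(key)][key] = value


def get(key):
    # The same hash routes reads to the shard that received the write.
    return shards[shard_for(key)].get(key)


for user_id in range(1000):
    put(f"user:{user_id}", {"id": user_id})
```

`hashlib` is used rather than the built-in `hash()` because Python randomizes string hashes per process, which would route the same key to different shards across restarts. Note that with simple modulo hashing, changing the number of shards moves most keys to a different shard.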
Replication
• Replication involves maintaining multiple copies (replicas) of data across different
nodes in the distributed database cluster.
• Each replica is an exact copy of the data stored on a separate server.
• Replication enhances data availability, fault tolerance, and read performance by
allowing data to be served from multiple replicas.
• Different replication models include master-slave replication and multi-master
replication.
• In master-slave replication, one node (master) accepts write operations and
asynchronously propagates changes to one or more replica nodes (slaves). Read queries
can be distributed among replicas, reducing the load on the master.
• In multi-master replication, multiple nodes can accept write operations, and changes
are replicated to other nodes. This approach allows for better write scaling and high
availability.
Master-slave replication
Master-slave replication is a method of data replication in distributed database systems where
one node, called the master or primary node, serves as the authoritative source for data, and
one or more nodes, known as slave or secondary nodes, replicate and maintain copies of the
master's data.

In a master-slave replication setup, the master node handles write operations (inserts, updates,
deletes) and propagates those changes to the slave nodes. The slave nodes synchronize with
the master to receive and apply the changes, ensuring that they have an up-to-date copy of the
data. Slave nodes are typically read-only, meaning they do not accept write operations directly.
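The write path described above can be sketched as a small in-memory simulation. The `Master` and `Replica` classes are hypothetical and replication is triggered by an explicit call, which stands in for the asynchronous propagation a real system would perform over the network; only inserts and updates are modeled.

```python
class Replica:
    """Read-only copy of the master's data."""

    def __init__(self):
        self._data = {}

    def apply(self, key, value):
        # Called only by the master's replication step, never by clients.
        self._data[key] = value

    def read(self, key):
        return self._data.get(key)


class Master:
    """Accepts all writes and queues them for the replicas."""

    def __init__(self, replicas):
        self._data = {}
        self._replicas = replicas
        self._pending = []  # changes not yet shipped to the replicas

    def write(self, key, value):
        self._data[key] = value
        self._pending.append((key, value))

    def read(self, key):
        return self._data.get(key)

    def replicate(self):
        """Ship queued changes to every replica, in write order."""
        for key, value in self._pending:
            for replica in self._replicas:
                replica.apply(key, value)
        self._pending.clear()


replicas = [Replica(), Replica()]
master = Master(replicas)

master.write("user:1", "Joe")
lagging_read = replicas[0].read("user:1")  # replication has not run yet

master.replicate()
fresh_read = replicas[1].read("user:1")
```

The `lagging_read` value illustrates replication lag: until changes are propagated, a read served by a replica can return older data than the master holds.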
Important Questions
• What is NoSQL, and how does it differ from traditional relational databases?
• Why are NoSQL databases becoming increasingly popular in modern
applications?
• What are the key characteristics of NoSQL databases?
• Explain the concept of aggregate data models in NoSQL.
• Provide examples of aggregate data models in NoSQL.
• Describe the key-value data model and its structure.
• Explain the document data model and its structure.
• What is a graph database, and how does it differ from other NoSQL
databases?
• Describe the basic components of a graph database, including nodes, edges, and
properties.
• Explain the concept of schemaless databases.
• What are the benefits of using schemaless databases?
• What are the challenges of working with schemaless databases?
• What are materialized views in NoSQL databases?
• How do materialized views improve query performance?
• When should you consider using materialized views?
• Why are distribution models important in NoSQL databases?
• Explain different distribution models used in NoSQL databases, such as sharding and
replication.
• What are the factors to consider when choosing a distribution model?
• What is master-slave replication, and how does it work?
• When is master-slave replication suitable, and when might other replication
• Query: Retrieve all documents from a MongoDB collection where the
"status" field is "active".
• Query: Create a collection in MongoDB that aggregates orders by
customer ID.
• Insert a new document into a MongoDB collection representing a blog
post with title, content, and tags.
• Create a materialized view that finds the greater salary.
