
NoSQL Database (21CS745)

Module -1 : Introduction to NoSQL

Dr. Rama Satish K V


Associate Professor
Department of AI & ML
RNSIT, Bengaluru
Syllabus 2
Why NoSQL 3

• NoSQL databases have become increasingly popular due to their ability to
handle large volumes of unstructured or semi-structured data more efficiently
than traditional relational databases.
• 1. Scalability:
• 2. Flexibility: Schema-less or flexible schemas
• 3. Performance:
• Optimized for large datasets:
• Distributed data storage:

• 4. Cost-effectiveness:
• Horizontal scaling
• Cloud-based deployment:

• 5. Real-time applications:
• Big data analytics:
The Value of Relational Databases 4

• Getting at Persistent Data


• Concurrency
• Integration
• Standard Model
Impedance Mismatch 5

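• Impedance mismatch refers to the challenges that arise when integrating data
between systems that use different data models or paradigms. In the database
context, it usually means the gap between relational tables and rows and the
richer in-memory structures (objects, nested records, lists) that applications use.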
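A minimal Python sketch of the mismatch, assuming a hypothetical order with line items: in memory the order is one nested structure, while a relational store needs it split into flat rows for two tables. The class names, fields, and table layout are illustrative assumptions, not taken from the slides.

```python
from dataclasses import dataclass, field


@dataclass
class LineItem:
    product_id: int
    price: float


@dataclass
class Order:
    order_id: int
    customer_id: int
    items: list[LineItem] = field(default_factory=list)


def to_rows(order: Order) -> tuple[tuple, list[tuple]]:
    """Flatten one in-memory order into rows for two hypothetical tables."""
    order_row = (order.order_id, order.customer_id)
    # Each nested item becomes its own row, tied back by the order ID.
    item_rows = [(order.order_id, i.product_id, i.price) for i in order.items]
    return order_row, item_rows


order = Order(99, 1, [LineItem(27, 32.45), LineItem(31, 10.00)])
order_row, item_rows = to_rows(order)
print(order_row)
print(item_rows)
```

The translation code in `to_rows` is the kind of work that object-relational mapping layers exist to hide.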
Application and Integration Databases 6

•Historical Role of Relational Databases:


•Served as central integration points for multiple applications.
•Provided a common data repository.
•Challenges of Integration Databases:
•Complexity due to accommodating diverse application needs.
•Difficulty in ensuring data integrity across multiple applications.
•Rise of Application Databases:
•Databases dedicated to a single application.
•Greater control and flexibility.
•Integration Technologies:
•Web services and other integration mechanisms.
•Decoupling of applications from shared data.
•Data Model Flexibility:
•Choice of relational or non-relational databases based on application requirements.
•Future Trends:
•Hybrid approaches combining relational and non-relational databases.
•Continued evolution of integration technologies.
•Emphasis on data governance and security.
Attack of the Clusters 7

• The dot-com bubble burst in the early 2000s. However, large web properties continued
to grow significantly in scale. This growth led to increased demand for computing
resources.
• Scaling Challenges with Relational Databases:
• Vertical scaling: Limited by the cost and physical constraints of larger machines.
• Horizontal scaling: Relational databases were not designed for clusters, leading to technical
and licensing challenges.
• Sharding: While a solution, it introduces complexities and limitations.
• The Rise of NoSQL:
• Google's BigTable: Introduced a new approach to distributed data storage.
• Amazon's Dynamo: Further contributed to the development of NoSQL databases.
• Designed for clusters: NoSQL databases were explicitly designed to operate in a clustered
environment.
• Implications:
• The threat from clusters posed a serious challenge to the dominance of relational databases.
• NoSQL databases emerged as viable alternatives for organizations dealing with large-scale data
and distributed systems.
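The sharding idea mentioned above can be sketched in a few lines of Python. This is a simplified illustration, assuming a fixed list of hypothetical node names and hash-based routing; real systems also handle rebalancing and replication.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members


def shard_for(key: str, nodes: list[str] = NODES) -> str:
    """Pick the node responsible for a key using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(nodes)
    return nodes[index]


for customer_key in ["customer:1", "customer:2", "customer:3"]:
    print(customer_key, "->", shard_for(customer_key))
```

When the application does this routing itself, any query that spans several keys may have to contact several nodes, which is one source of the complexity noted above.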
The Emergence of NoSQL (1) 8

The term "NoSQL" gained its current meaning in 2009, when Johan Oskarsson used it as the name of a meetup on open-source, distributed, non-relational databases.
Definition: While there's no universally accepted definition, NoSQL databases are generally
characterized by their non-relational nature, open-source origins, and suitability for clustered
environments.
Characteristics:
• Non-relational: Do not use SQL as their primary query language.
• Open-source: Most NoSQL databases are open-source projects.
• Cluster-oriented: Designed to operate in distributed environments.
• Schema-less: Allow flexible data structures without predefined schemas.
Benefits:
• Scalability: Can handle large volumes of data efficiently.
• Flexibility: Accommodate diverse data structures and evolving requirements.
• Performance: Optimized for high-performance operations.
The Emergence of NoSQL (2) 9

• Polyglot Persistence: The rise of NoSQL has led to a shift towards polyglot persistence, where
organizations use a mix of data storage technologies to meet their specific needs.
• Reasons to Consider NoSQL:
• Handling large-scale data and performance demands.
• Improving application development productivity through a more convenient data interaction
style.

Next Class 10

• Module -1 : Chapter 2
Understanding Data Models (1) 13

• 1. Data Model vs. Storage Model


• Data Model
• Defines how we perceive and interact with data.
• Example: Entity-Relationship Diagrams (ERDs).

• Storage Model
• Details how data is stored and managed internally.
• Focuses on performance optimization.
Understanding Data Models (2) 14

• 2. Relational Data Model
  • Concept:
    • Data organized into tables (relations).
    • Tables consist of rows (tuples) and columns (attributes).
  • Visualization:
    • Entity-Relationship Diagrams (ERDs).
• 3. NoSQL Data Models
  • Key-Value Stores
    • Data as key-value pairs.
  • Document Stores
    • Data in documents (e.g., JSON).
  • Column-Family Stores
    • Data in columns, flexible schema.
  • Graph Databases
    • Data as nodes and edges for complex relationships.
Understanding Data Models (3) 15

• 4. Aggregate Orientation
• Definition:
• Data organized into aggregates or collections of related data.
• Each NoSQL model represents different ways to structure and access
aggregates.

• Visual Aids
• Diagram of Relational Model (Tables and Relationships)
• Icons or simple diagrams for Key-Value, Document, Column-Family, and
Graph Databases
Aggregates - Relational Model vs. Aggregate Orientation 16

• Relational Model
  • Structure:
    • Data as simple tuples (rows).
    • No nesting or lists within tuples.
  • Key Concept:
    • Operations are on tuples; simplicity in data handling.
• Aggregate Orientation
  • Concept:
    • Data organized into complex aggregates.
    • Supports nested records and lists.
  • Benefits:
    • Facilitates atomic operations, consistency, replication, and sharding.
    • Simplifies data manipulation for applications.
  • NoSQL Types:
    • Key-Value: Key-value pairs.
    • Document: JSON or similar documents.
    • Column-Family: Flexible column-based structure.
Example of Relations and Aggregates(1) 17
Example of Relations and Aggregates(2) 18
Example of Relations and Aggregates(3) 19
Example of Relations and Aggregates(4) 20

// in customers
{
  "id": 1,
  "name": "Martin",
  "billingAddress": [{"city": "Chicago"}]
}
// in orders
{
  "id": 99,
  "customerId": 1,
  "orderItems": [
    {
      "productId": 27,
      "price": 32.45,
      "productName": "NoSQL Distilled"
    }
  ],
  "shippingAddress": [{"city": "Chicago"}],
  "orderPayment": [
    {
      "ccinfo": "1000-1000-1000-1000",
      "txnId": "abelif879rft",
      "billingAddress": {"city": "Chicago"}
    }
  ]
}
Alternate model 21

{
  "customer": {
    "id": 1,
    "name": "Martin",
    "billingAddress": [{"city": "Chicago"}],
    "orders": [
      {
        "id": 99,
        "customerId": 1,
        "orderItems": [
          {
            "productId": 27,
            "price": 32.45,
            "productName": "NoSQL Distilled"
          }
        ],
        "shippingAddress": [{"city": "Chicago"}],
        "orderPayment": [
          {
            "ccinfo": "1000-1000-1000-1000",
            "txnId": "abelif879rft",
            "billingAddress": {"city": "Chicago"}
          }
        ]
      }
    ]
  } // close customer
} // close main
Consequences of Aggregate Orientation 22

1. Understanding Aggregates
• Definition: An aggregate is a collection of related data that is treated as a single
unit, such as an order with its items, payment, and shipping information.
• Role in Application: The concept of aggregates is critical because it aligns with
how applications interact with data, focusing on cohesive units of behavior
rather than isolated data points.
2. Aggregate-Ignorance in Relational and Graph Databases
• Relational Databases: These systems have no concept of an aggregate in their
data model. They represent connections through foreign key relationships and
cannot tell which related rows the application treats as a unit, so they cannot
use aggregate boundaries to decide how to store or distribute data.
• Graph Databases: Like relational databases, graph databases are
aggregate-ignorant. This is not necessarily a drawback: without a fixed aggregate
structure, the same data can be viewed in different ways, which suits workloads
with no single dominant access pattern.
Consequences of Aggregate Orientation 23

3. Advantages of Aggregate Orientation


• Clarity in Data Use: By defining aggregates, developers can clearly articulate
how data should be manipulated and stored, improving application design.
• Optimization for Clusters: Knowing how data aggregates will be used allows
databases to store related data on the same nodes, reducing query complexity
and enhancing performance in distributed systems.
4. Challenges of Aggregate Structures
• Complex Queries: While aggregates are beneficial for specific operations (like
order processing), they complicate scenarios such as sales analysis, where data
from multiple aggregates is required.
• Data Boundary Issues: Defining aggregate boundaries can be difficult,
particularly when the same data serves different purposes in various contexts.
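The "complex queries" challenge can be illustrated with a short Python sketch. It assumes orders stored as separate aggregates shaped like the order documents shown earlier; the second order and the function name are hypothetical. Answering a per-product sales question means visiting every order aggregate, even though each order is convenient to read on its own.

```python
from collections import defaultdict

# Two order aggregates; only the fields needed for the analysis are shown.
orders = [
    {"id": 99, "customerId": 1,
     "orderItems": [{"productId": 27, "price": 32.45}]},
    {"id": 100, "customerId": 2,
     "orderItems": [{"productId": 27, "price": 32.45},
                    {"productId": 31, "price": 10.00}]},
]


def revenue_by_product(all_orders):
    """Total revenue per product; must scan every order aggregate."""
    totals = defaultdict(float)
    for order in all_orders:
        for item in order["orderItems"]:
            totals[item["productId"]] += item["price"]
    return dict(totals)


print(revenue_by_product(orders))
```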
Consequences of Aggregate Orientation 24

5. Transaction Models
• ACID Transactions: Traditional relational databases support ACID transactions,
allowing for multi-table manipulations that maintain atomicity and consistency
across diverse data.
• NoSQL Considerations: Aggregate-oriented databases often support atomic
transactions within single aggregates but do not provide built-in mechanisms for
multi-aggregate transactions. This necessitates application-level management for
atomicity when required.
6. Best Practices
• Define Clear Aggregates: When designing data models, aim to create aggregates that represent
logical units of work for the application.
• Be Aware of Query Needs: Consider how data will be accessed and manipulated, balancing the
benefits of aggregate-oriented design with the need for flexibility in querying.
• Transaction Management: When working with NoSQL databases, carefully assess your
application’s transactional needs and implement mechanisms to handle cross-aggregate
transactions if necessary.
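One possible shape for application-level handling of a cross-aggregate update is a compensating write. The sketch below is only an illustration under stated assumptions: the in-memory `store`, the `put` helper with its simulated failure flag, and the key names are all hypothetical, and a real system would also have to cope with the compensating write itself failing.

```python
# Toy key-value store: each key holds one aggregate.
store = {"product:27": {"stock": 5}}


def put(key, value, fail=False):
    """Single-aggregate write; 'fail' simulates a write error."""
    if fail:
        raise IOError(f"write to {key} failed")
    store[key] = value


def place_order(order_id, product_id, qty, fail_order_write=False):
    """Update two aggregates; undo the first write if the second fails."""
    product_key = f"product:{product_id}"
    original = dict(store[product_key])
    put(product_key, {"stock": original["stock"] - qty})
    try:
        put(f"order:{order_id}", {"productId": product_id, "qty": qty},
            fail=fail_order_write)
    except IOError:
        put(product_key, original)  # compensating write restores the stock
        return False
    return True


print(place_order(99, 27, 2))
print(place_order(100, 27, 1, fail_order_write=True))
print(store)
```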
Key-Value and Document Data Models 25

• Key-Value Data Model
  • Definition:
    • Stores data as a collection of key-value pairs.
  • Characteristics:
    • Simple and fast retrieval.
    • Highly scalable and performant for specific queries.
  • Use Cases:
    • Caching, session storage, user preferences.
• Document Data Model
  • Definition:
    • Stores data as documents, typically in JSON or BSON format.
  • Characteristics:
    • Supports nested structures and varied data types.
    • More flexible schema, allowing for dynamic data representation.
  • Use Cases:
    • Content management systems, e-commerce catalogs, social media applications.
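The difference between the two models can be sketched in plain Python. This is not any particular product's API: `kv_store`, `doc_store`, and the helper functions are hypothetical stand-ins, and the second customer is made-up sample data.

```python
import json

# Key-value style: the store sees only an opaque value per key.
kv_store = {"customer:1": json.dumps({"name": "Martin", "city": "Chicago"})}

# Document style: the store can see the fields inside each document.
doc_store = [
    {"id": 1, "name": "Martin", "city": "Chicago"},
    {"id": 2, "name": "Pramod", "city": "Boston"},
]


def kv_get(key):
    """Only whole-value lookup by key is possible."""
    return json.loads(kv_store[key])


def doc_find(field, value):
    """Query on any field because the document structure is visible."""
    return [doc for doc in doc_store if doc.get(field) == value]


print(kv_get("customer:1"))
print(doc_find("city", "Chicago"))
```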
Column-Family Stores 26

• Definition: A type of NoSQL database that stores data in columns rather than rows,
grouping related columns into families.
• Key Features
  • Schema Flexibility:
    • Supports varied column structures; each row can have a different set of columns.
  • Efficient Read/Write:
    • Optimized for reading and writing large volumes of data, particularly for analytical queries.
  • Data Model:
    • Organized into Column Families containing rows and columns.
    • Columns can be added dynamically without altering the existing schema.
• Use Cases
  • Big Data Applications:
    • Ideal for data warehousing and analytics.
  • Real-time Analytics:
    • Suitable for applications needing fast access to large datasets (e.g., IoT data, recommendation engines).
• Popular Examples
  • Apache Cassandra
  • HBase
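As a rough mental model, a column-family table can be pictured as a nested map from row key to column family to column. The Python sketch below uses plain dictionaries to show that shape; the row keys, family names, and values are illustrative assumptions and do not reflect the storage format of Cassandra or HBase.

```python
# row key -> column family -> column name -> value
table = {
    "customer:1": {
        "profile": {"name": "Martin", "city": "Chicago"},
        "orders": {"order:99": "NoSQL Distilled"},
    },
    "customer:2": {
        "profile": {"name": "Pramod"},  # rows need not share the same columns
        "orders": {},
    },
}


def read_family(row_key, family):
    """Fetch one column family of a row without touching the others."""
    return table[row_key][family]


# Adding a new column to one row requires no schema change.
table["customer:2"]["profile"]["email"] = "pramod@example.com"

print(read_family("customer:1", "profile"))
print(sorted(read_family("customer:2", "profile")))
```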
Summarizing Aggregate-Oriented Databases 28

• At this point, we’ve covered enough material to give you a reasonable overview of the
three different styles of aggregate-oriented data models and how they differ. What they
all share is the notion of an aggregate indexed by a key that you can use for lookup. This
aggregate is central to running on a cluster, as the database will ensure that all the data
for an aggregate is stored together on one node. The aggregate also acts as the atomic
unit for updates, providing a useful, if limited, amount of transactional control.
• Within that notion of aggregate, we have some differences. The key-value data model
treats the aggregate as an opaque whole, which means you can only do key lookup for
the whole aggregate—you cannot run a query nor retrieve a part of the aggregate.
• The document model makes the aggregate transparent to the database allowing you to
do queries and partial retrievals. However, since the document has no schema, the
database cannot act much on the structure of the document to optimize the storage and
retrieval of parts of the aggregate.
• Column-family models divide the aggregate into column families, allowing the database
to treat them as units of data within the row aggregate. This imposes some structure on
the aggregate but allows the database to take advantage of that structure to improve its
accessibility.

Next Class 29

• Module -1 : Chapter 3
More Details on Data Models 32

• So far we’ve covered the key feature in most NoSQL databases: their use of
aggregates and how aggregate-oriented databases model aggregates in different
ways.
• While aggregates are a central part of the NoSQL story, there is more to the data
modeling side than that.
• Aggregates are useful in that they put together data that is commonly accessed
together. But there are still lots of cases where data that’s related is accessed
differently.
Relationships (1) 33

• Consider the relationship between a customer and all of his orders. Some applications
will want to access the order history whenever they access the customer; this fits in well
with combining the customer with his order history into a single aggregate.
• Other applications, however, want to process orders individually and thus model orders
as independent aggregates.
• In this case, you’ll want separate order and customer aggregates but with some kind of
relationship between them so that any work on an order can look up customer data.
• The simplest way to provide such a link is to embed the ID of the customer within the
order’s aggregate data.
• That way, if you need data from the customer record, you read the order, ferret out the
customer ID, and make another call to the database to read the customer data.
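The two-step lookup described above can be sketched as follows, assuming two hypothetical in-memory collections keyed by ID in place of real database calls.

```python
customers = {1: {"id": 1, "name": "Martin"}}
orders = {99: {"id": 99, "customerId": 1, "total": 32.45}}


def customer_for_order(order_id):
    """Follow the customer ID embedded in an order to the customer record."""
    order = orders[order_id]            # first read: the order aggregate
    customer_id = order["customerId"]   # the embedded link
    return customers[customer_id]       # second read: the customer aggregate


print(customer_for_order(99)["name"])
```

Here the application, not the database, knows that `customerId` refers to a customer.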
Relationships (2) 34

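• Embedding an ID works, but the database remains unaware of the relationship.
Because it is sometimes useful for the database to know about these links, many
databases, even key-value stores, provide ways to make these relationships
visible to the database.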
• Document stores make the content of the aggregate available to the database to
form indexes and queries.
• Riak, a key-value store, allows you to put link information in metadata,
supporting partial retrieval and link-walking capability.
• An important aspect of relationships between aggregates is how they handle
updates.
• Aggregate-oriented databases treat the aggregate as the unit of data retrieval.
Consequently, atomicity is only supported within the contents of a single
aggregate. If you update multiple aggregates at once, you have to handle a
failure partway through yourself.
• Relational databases help you with this by allowing you to modify multiple
records in a single transaction, providing ACID guarantees while altering many
rows.
Relationships (3) 35

• All of this means that aggregate-oriented databases become more awkward as
you need to operate across multiple aggregates.
• This may imply that if you have data based on lots of relationships, you should
prefer a relational database over a NoSQL store. While that’s true for aggregate-
oriented databases, it’s worth remembering that relational databases aren’t all
that stellar with complex relationships either.
• While you can express queries involving joins in SQL, things quickly get very
hairy—both with SQL writing and with the resulting performance—as the
number of joins mounts up.
Graph Databases (1) 36

• Most NoSQL databases were inspired by the need to
run on clusters, which led to aggregate-oriented data
models of large records with simple connections.
• Graph databases are motivated by a different
frustration with relational databases and thus have
an opposite model—small records with complex
interconnections.
• A graph database is a database that stores data in a
graph structure, using nodes, edges, and properties
to represent relationships between data entities.
Graph databases are different from relational
databases, which store data in tables, and are often
better suited to modeling real-world scenarios.
Graph Databases (2) 37

• Graph databases are designed for
capturing large amounts of complex
relationship data, such as social
networks or product preferences.
• The basic data model consists of nodes
connected by edges.
• Different graph databases offer various
mechanisms for storing data in nodes
and edges.
• Query operations in graph databases are
optimized for navigating relationships,
unlike relational databases.
• Graph databases focus on relationships
and are typically single-server systems
with ACID transactions.
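Navigating relationships is the core operation in this model. The Python sketch below stores a small graph as adjacency lists and walks it breadth-first; the names and the `within_hops` helper are hypothetical, and real graph databases expose their own query languages for this kind of traversal.

```python
from collections import deque

# Directed "friend" edges stored as adjacency lists (sample data).
friends = {
    "Anna": ["Barbara", "Carol"],
    "Barbara": ["Dawn"],
    "Carol": ["Dawn", "Elizabeth"],
    "Dawn": [],
    "Elizabeth": [],
}


def within_hops(graph, start, max_hops):
    """Return nodes reachable from start in at most max_hops edges (BFS)."""
    seen = {start}
    queue = deque([(start, 0)])
    reached = []
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbor in graph[node]:
            if neighbor not in seen:
                seen.add(neighbor)
                reached.append(neighbor)
                queue.append((neighbor, depth + 1))
    return reached


print(within_hops(friends, "Anna", 1))
print(within_hops(friends, "Anna", 2))
```

The equivalent question in a relational schema would need one self-join per hop.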
Schema-less Database 38

• A schema-less database does not require data to conform to a predefined
schema, so you do not need to fix the structure of your data before adding it to
the database. This is a common characteristic of NoSQL databases.
• This gives a flexible data model that supports dynamic data structures.
• Traditional Databases:
• Fixed schema
• Relational model
• Data integrity and constraints

• Schema-less Databases:
• No fixed schema
• Various data models (key-value, document, column-family)
• Adaptable to changes
Schema-less Database (2) 39

Advantages of Schema-less Databases


• Flexibility: Easily accommodates changes in data structure.
• Scalability: Handles large volumes of data efficiently.
• Rapid Development: Faster prototyping and iteration.
• Why do relational databases have a fixed schema?
• Relational databases have a fixed schema for several key reasons:
1. Data Integrity
• A fixed schema enforces data types and constraints, ensuring that the data
adheres to defined rules (e.g., primary keys, foreign keys).
• This helps maintain consistency and accuracy across the database.
2. Normalization
• Relational databases use normalization techniques to reduce data redundancy and
dependency.
• A fixed schema helps organize data into related tables, allowing for efficient storage
and retrieval.
Schema-less Database (3) 40

3. Structured Query Language (SQL)


• SQL relies on a well-defined schema to perform operations like SELECT,
INSERT, UPDATE, and DELETE.
• The structure allows the database engine to optimize query execution and
access paths.
4. Predictable Relationships
• A fixed schema clearly defines relationships between tables, making it easier
to manage joins and complex queries.
• This predictability is crucial for applications requiring strict relational data
modeling.
5. Data Management
• A consistent schema simplifies data management tasks such as backups,
migrations, and maintenance.
• It provides a clear structure for developers and administrators to understand
the database design.
Accessing Schemaless data 41

// Generic access: field names are discovered at run time, not declared up front.
foreach (Record r in records)
{
    foreach (Field f in r.fields)
    {
        print(f.name, f.value)
    }
}
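The same generic access pattern can be written as runnable Python, assuming records are plain dictionaries whose fields may differ from one record to the next. The sample records are made up for illustration.

```python
records = [
    {"name": "Martin", "city": "Chicago"},
    {"name": "Pramod", "likes": ["NoSQL"]},  # different fields, same collection
]

# Generic access: iterate over whatever fields each record happens to have.
for record in records:
    for field_name, value in record.items():
        print(field_name, value)

# Code that expects a specific field still carries an implicit schema:
# it must decide what to do when that field is absent.
cities = [record.get("city", "unknown") for record in records]
print(cities)
```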
Schema-less Database (4) 42

• Essentially, a schemaless database shifts the schema into the application code
that accesses it. This becomes problematic if multiple applications, developed by
different people, access the same database.
• These problems can be reduced with a couple of approaches. One is to
encapsulate all database interaction within a single application and integrate it
with other applications using web services.
• This fits in well with many people’s current preference for using web services for
integration. Another approach is to clearly delineate different areas of an
aggregate for access by different applications.
• These could be different sections in a document database or different column
families in a column-family database.
Schema-less Database (5) 43

• Schemalessness does have a big impact on changes of a database’s structure over
time, particularly for more uniform data.
• Although it’s not practiced as widely as it ought to be, changing a relational
database’s schema can be done in a controlled way.
• Similarly, you have to exercise control when changing how you store data in a
schemaless database so that you can easily access both old and new data.
• Furthermore, the flexibility that schemalessness gives you only applies within an
aggregate—if you need to change your aggregate boundaries, the migration is
every bit as complex as it is in the relational case.
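One way to keep both old and new data readable is to have the reading code accept either shape. The sketch below assumes a hypothetical change in which a single "name" field was later split into "firstName" and "lastName"; the field names and sample documents are illustrative only.

```python
def read_customer_name(doc):
    """Return a full name from either the old or the new document shape."""
    if "name" in doc:
        # Old shape: a single "name" field.
        return doc["name"]
    # New shape (assumed): separate first and last name fields.
    return f'{doc["firstName"]} {doc["lastName"]}'


old_doc = {"id": 1, "name": "Ann Lee"}
new_doc = {"id": 2, "firstName": "Bob", "lastName": "Ray"}

print(read_customer_name(old_doc))
print(read_customer_name(new_doc))
```

Old documents can then be rewritten gradually, or left in place as long as the reader keeps handling both shapes.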
Materialized views (1) 44

• Materialized views are a powerful concept in database management, especially
when working with aggregate-oriented data models. They help bridge the gap
between the need for efficient, quick data retrieval and the challenges posed by
the inherent structure of aggregates.
Aggregate Orientation vs. Query Flexibility:
• Aggregate-oriented models group related data, which simplifies certain
operations but can complicate others, like querying for specific metrics (e.g.,
product sales over time).
• Relational databases, lacking this strict structure, allow more flexibility for
different types of queries.
Views vs. Materialized Views:
• Views are virtual tables derived from base tables, computed on-the-fly when
accessed. They help encapsulate data but can be resource-intensive to compute.
• Materialized Views are precomputed and stored on disk, allowing for faster
access at the cost of potential data staleness. They are particularly useful for read-
heavy operations.
Materialized views (3) 46

Implementation Strategies:
• Eager Updates: The materialized view is updated in real time with base data changes. This is
optimal when reads significantly outnumber writes, ensuring freshness.
• Batch Updates: Materialized views are updated at intervals, reducing the overhead on each write
operation. This is suitable when some staleness is acceptable.
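The two update strategies can be contrasted with a small Python sketch. It assumes an in-memory list of orders as the base data and two dictionaries as per-product sales views; all names are hypothetical, and a real database would manage this internally or through a scheduled job.

```python
orders = []        # base data: order records
eager_sales = {}   # view updated on every write
batch_sales = {}   # view rebuilt only when refresh_batch() runs


def add_order(product_id, price):
    """Write an order and eagerly update the per-product sales view."""
    orders.append({"productId": product_id, "price": price})
    eager_sales[product_id] = eager_sales.get(product_id, 0.0) + price


def refresh_batch():
    """Recompute the batch view from scratch, as a periodic job might."""
    batch_sales.clear()
    for order in orders:
        pid = order["productId"]
        batch_sales[pid] = batch_sales.get(pid, 0.0) + order["price"]


add_order(27, 30.0)
refresh_batch()
add_order(27, 20.0)  # the batch view is stale until the next refresh

print(eager_sales)
print(batch_sales)
```

The eager view costs extra work on every write, while the batch view keeps writes cheap and accepts staleness between refreshes.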
Creation and Management:
• Materialized views can be created directly in the database, where the database engine handles the
computation and storage. This often includes configuration parameters to control when and how
updates occur.
• They can also be generated externally by extracting data, computing the view, and writing it back
to the database.
Usage within Aggregates:
• Materialized views can be integrated within the same aggregate, like including a summary within
an order document, which enhances query efficiency.
Column Family Databases:
• In column-family databases, a materialized view can be kept in a separate column family of the
same row, so the view can be updated in the same atomic operation as the base data, keeping the
two consistent while improving read performance.

• DEMO of NOSQL

Next Class 48

• Module -1 : Chapter 3
