NoSQL M1
• 4. Cost-effectiveness:
• Horizontal scaling
• Cloud-based deployment:
• 5. Real-time applications:
• Big data analytics:
The Value of Relational Databases 4
• The dot-com bubble burst in the early 2000s. However, large web properties continued
to grow significantly in scale. This growth led to increased demand for computing
resources.
• Scaling Challenges with Relational Databases:
• Vertical scaling: Limited by the cost and physical constraints of larger machines.
• Horizontal scaling: Relational databases were not designed for clusters, leading to technical
and licensing challenges.
• Sharding: While a solution, it introduces complexities and limitations.
• The Rise of NoSQL:
• Google's BigTable: Introduced a new approach to distributed data storage.
• Amazon's Dynamo: Further contributed to the development of NoSQL databases.
• Designed for clusters: NoSQL databases were explicitly designed to operate in a clustered
environment.
• Implications:
• The shift to clusters posed a serious challenge to the dominance of relational databases, which were not designed to run on them.
• NoSQL databases emerged as viable alternatives for organizations dealing with large-scale data
and distributed systems.
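The sharding point above can be sketched as follows. This is a minimal illustration, not how any particular database routes data: the shard names and the helper functions `shard_for` and `shards_for_query` are hypothetical, and a stable hash is assumed as the routing rule.

```python
import hashlib

# Hypothetical shard names; a real deployment would list actual database servers.
SHARDS = ["shard-0", "shard-1", "shard-2"]


def shard_for(key: str) -> str:
    """Map a key to a shard with a stable hash (built-in hash() is salted per process)."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return SHARDS[int.from_bytes(digest[:8], "big") % len(SHARDS)]


def shards_for_query(keys):
    """Return the distinct shards that a query touching several keys must contact."""
    return sorted({shard_for(k) for k in keys})


if __name__ == "__main__":
    print(shard_for("customer:1"))
    print(shards_for_query(["customer:1", "customer:2", "customer:3"]))
```

A single-key lookup goes to one shard, but a query spanning several keys may have to visit several shards, and the application has to track which shard holds what. That routing burden is one of the complexities sharding introduces.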
The Emergence of NoSQL (1) 8
The term "NoSQL" gained prominence in 2009, when Johan Oskarsson used it as the name of a meetup on open-source, distributed, non-relational databases.
Definition: While there's no universally accepted definition, NoSQL databases are generally
characterized by their non-relational nature, open-source origins, and suitability for clustered
environments.
Characteristics:
• Non-relational: Do not use the relational model and generally do not use SQL as their primary query language.
• Open-source: Most NoSQL databases are open-source projects.
• Cluster-oriented: Designed to operate in distributed environments.
• Schema-less: Allow flexible data structures without predefined schemas.
Benefits:
• Scalability: Can handle large volumes of data efficiently.
• Flexibility: Accommodate diverse data structures and evolving requirements.
• Performance: Optimized for high-performance operations.
The Emergence of NoSQL (2) 9
• Polyglot Persistence: The rise of NoSQL has led to a shift towards polyglot persistence, where
organizations use a mix of data storage technologies to meet their specific needs.
• Reasons to Consider NoSQL:
• Handling large-scale data and performance demands.
• Improving application development productivity through a more convenient data interaction
style.
Next Class 10
• Module 1: Chapter 2
NoSQL Database (21CS745)
• Storage Model
• Details how data is stored and managed internally.
• Focuses on performance optimization.
Understanding Data Models (2) 14
• 4. Aggregate Orientation
• Definition:
• Data organized into aggregates or collections of related data.
• Each NoSQL model represents different ways to structure and access
aggregates.
• Visual Aids
• Diagram of Relational Model (Tables and Relationships)
• Icons or simple diagrams for Key-Value, Document, Column-Family, and
Graph Databases
Aggregates - Relational Model vs. Aggregate Orientation 16
// in customers
{
  "id": 1,
  "name": "Martin",
  "billingAddress": [{"city": "Chicago"}]
}
// in orders
{
  "id": 99,
  "customerId": 1,
  "orderItems": [
    {
      "productId": 27,
      "price": 32.45,
      "productName": "NoSQL Distilled"
    }
  ],
  "shippingAddress": [{"city": "Chicago"}],
  "orderPayment": [
    {
      "ccinfo": "1000-1000-1000-1000",
      "txnId": "abelif879rft",
      "billingAddress": {"city": "Chicago"}
    }
  ]
}
Alternate model 21
1. Understanding Aggregates
• Definition: An aggregate is a collection of related data that is treated as a single
unit, such as an order with its items, payment, and shipping information.
• Role in Application: The concept of aggregates is critical because it aligns with
how applications interact with data, focusing on cohesive units of behavior
rather than isolated data points.
2. Aggregate-Ignorance in Relational and Graph Databases
• Relational Databases: These systems do not inherently recognize aggregates,
instead relying on foreign key relationships to represent data connections. This
limits their ability to optimize data storage and retrieval based on usage patterns.
• Graph Databases: Like relational databases, graph databases are aggregate-ignorant.
They have no notion of an aggregate boundary, so the database cannot use one to
decide which data to store together on a node.
Consequences of Aggregate Orientation 23
5. Transaction Models
• ACID Transactions: Traditional relational databases support ACID transactions,
allowing for multi-table manipulations that maintain atomicity and consistency
across diverse data.
• NoSQL Considerations: Aggregate-oriented databases often support atomic
transactions within single aggregates but do not provide built-in mechanisms for
multi-aggregate transactions. This necessitates application-level management for
atomicity when required.
6. Best Practices
• Define Clear Aggregates: When designing data models, aim to create aggregates that represent
logical units of work for the application.
• Be Aware of Query Needs: Consider how data will be accessed and manipulated, balancing the
benefits of aggregate-oriented design with the need for flexibility in querying.
• Transaction Management: When working with NoSQL databases, carefully assess your
application’s transactional needs and implement mechanisms to handle cross-aggregate
transactions if necessary.
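The transaction points above can be sketched with a toy in-memory store. Everything here is an assumption made for illustration: `AggregateStore`, `transfer_stock`, and the `stock` field are hypothetical names, and a compensating write is only one of several ways an application might handle a change that spans two aggregates.

```python
import copy
import threading


class AggregateStore:
    """Toy in-memory store: each key holds one aggregate, replaced atomically."""

    def __init__(self):
        self._data = {}
        self._lock = threading.Lock()

    def get(self, key):
        with self._lock:
            return copy.deepcopy(self._data[key])

    def put(self, key, aggregate):
        # The whole aggregate is swapped in one step, so readers never see
        # a half-written aggregate.
        with self._lock:
            self._data[key] = copy.deepcopy(aggregate)


def transfer_stock(store, src_key, dst_key, qty):
    """Move qty between two aggregates, undoing the first write if the second step fails."""
    src = store.get(src_key)
    if src["stock"] < qty:
        raise ValueError("insufficient stock")
    original_src = copy.deepcopy(src)
    src["stock"] -= qty
    store.put(src_key, src)              # first aggregate updated atomically
    try:
        dst = store.get(dst_key)         # raises KeyError if the target is missing
        dst["stock"] += qty
        store.put(dst_key, dst)          # second aggregate updated atomically
    except Exception:
        store.put(src_key, original_src)  # compensating write restores the first aggregate
        raise
```

Each `put` is atomic for its own aggregate, but the pair of writes is not. Between the two writes another client can observe the intermediate state, so the compensating write restores atomicity after a failure without providing isolation.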
Key-Value and Document Data Models 25
• At this point, we’ve covered enough material to give you a reasonable overview of the
three different styles of aggregate-oriented data models and how they differ. What they
all share is the notion of an aggregate indexed by a key that you can use for lookup. This
aggregate is central to running on a cluster, as the database will ensure that all the data
for an aggregate is stored together on one node. The aggregate also acts as the atomic
unit for updates, providing a useful, if limited, amount of transactional control.
• Within that notion of aggregate, we have some differences. The key-value data model
treats the aggregate as an opaque whole, which means you can only do key lookup for
the whole aggregate—you cannot run a query nor retrieve a part of the aggregate.
• The document model makes the aggregate transparent to the database, allowing you to
do queries and partial retrievals. However, since the document has no schema, the
database cannot act much on the structure of the document to optimize the storage and
retrieval of parts of the aggregate.
• Column-family models divide the aggregate into column families, allowing the database
to treat them as units of data within the row aggregate. This imposes some structure on
the aggregate but allows the database to take advantage of that structure to improve its
accessibility.
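The differences between the three models can be sketched with plain Python dictionaries standing in for the databases. This is an illustration under assumptions, not the API of any real product: the store variables, key formats such as `order:99`, the column-family names `header` and `items`, and the helper functions are all hypothetical.

```python
import json

order = {
    "id": 99,
    "customerId": 1,
    "orderItems": [{"productId": 27, "price": 32.45, "productName": "NoSQL Distilled"}],
    "shippingAddress": [{"city": "Chicago"}],
}

# Key-value: the store holds an opaque blob; the only operation is lookup by key.
kv_store = {"order:99": json.dumps(order).encode("utf-8")}
blob = kv_store["order:99"]
whole_order = json.loads(blob)  # the application, not the store, interprets the bytes


# Document: the store can see inside the aggregate, so it can query on fields
# and return only part of a document.
doc_store = {99: order}


def find_orders_by_customer(customer_id):
    return [doc for doc in doc_store.values() if doc["customerId"] == customer_id]


def order_item_names(order_id):
    return [item["productName"] for item in doc_store[order_id]["orderItems"]]


# Column-family: a row key maps to named column families, and each family
# can be read as a unit without touching the others.
cf_store = {
    "order:99": {
        "header": {"customerId": 1, "shippingCity": "Chicago"},
        "items": {"27": {"price": 32.45, "productName": "NoSQL Distilled"}},
    }
}


def read_family(row_key, family):
    return cf_store[row_key][family]
```

In all three cases the aggregate is still found by its key. What changes is how much of its internal structure the database can see and act on.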
Next Class 29
• Module 1: Chapter 3
NoSQL Database (21CS745)
• So far we’ve covered the key feature in most NoSQL databases: their use of
aggregates and how aggregate-oriented databases model aggregates in different
ways.
• While aggregates are a central part of the NoSQL story, there is more to the data
modeling side than that.
• Aggregates are useful in that they put together data that is commonly accessed
together. But there are still lots of cases where data that’s related is accessed
differently.
Relationships (1) 33
• Consider the relationship between a customer and all of his orders. Some applications
will want to access the order history whenever they access the customer; this fits in well
with combining the customer with his order history into a single aggregate.
• Other applications, however, want to process orders individually and thus model orders
as independent aggregates.
• In this case, you’ll want separate order and customer aggregates but with some kind of
relationship between them so that any work on an order can look up customer data.
• The simplest way to provide such a link is to embed the ID of the customer within the
order’s aggregate data.
• That way, if you need data from the customer record, you read the order, ferret out the
customer ID, and make another call to the database to read the customer data.
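The ID-based link described above can be sketched with two dictionaries standing in for separate customer and order aggregates. The store variables and the helper `customer_for_order` are hypothetical names; the sample data follows the customer and order example used earlier in this module.

```python
# Separate aggregates, each looked up by its own key.
customers = {
    1: {"id": 1, "name": "Martin", "billingAddress": [{"city": "Chicago"}]},
}
orders = {
    99: {
        "id": 99,
        "customerId": 1,  # the link: the customer's ID embedded in the order aggregate
        "orderItems": [{"productId": 27, "price": 32.45, "productName": "NoSQL Distilled"}],
    },
}


def customer_for_order(order_id):
    """Follow the embedded ID: read the order, then make a second read for the customer."""
    order = orders[order_id]                # first call to the database
    return customers[order["customerId"]]   # second call to the database


if __name__ == "__main__":
    print(customer_for_order(99)["name"])
```

The database in this sketch knows nothing about the relationship. The application has to know that `customerId` refers to a customer and has to issue the second read itself.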
Relationships (2) 34
• Schema-less Databases:
• No fixed schema
• Various data models (key-value, document, column-family)
• Adaptable to changes
Schema-less Database (2) 39
• Essentially, a schemaless database shifts the schema into the application code
that accesses it. This becomes problematic if multiple applications, developed by
different people, access the same database.
• These problems can be reduced with a couple of approaches. One is to
encapsulate all database interaction within a single application and integrate it
with other applications using web services.
• This fits in well with many people’s current preference for using web services for
integration. Another approach is to clearly delineate different areas of an
aggregate for access by different applications.
• These could be different sections in a document database or different column
families in a column-family database.
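The idea that the schema moves into application code can be sketched as follows. The function `read_order_total` and the optional `quantity` field are assumptions made for illustration; the point is only that the checks a schema would enforce now live in the code that reads the data.

```python
def read_order_total(doc):
    """Compute an order total while enforcing the implicit schema this application assumes."""
    items = doc.get("orderItems")
    if not isinstance(items, list):
        raise ValueError("order document has no 'orderItems' list")
    total = 0.0
    for item in items:
        if "price" not in item:
            raise ValueError("order item is missing 'price'")
        # 'quantity' is a hypothetical optional field; older documents may lack it.
        total += item["price"] * item.get("quantity", 1)
    return total


if __name__ == "__main__":
    print(read_order_total({"orderItems": [{"price": 2.0, "quantity": 3}, {"price": 1.5}]}))
```

If several applications read the same documents, each needs checks like these and all of them must agree on what the fields mean. Encapsulating access behind one application keeps that knowledge in a single place.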
Schema-less Database (4) 43
Implementation Strategies:
• Eager Updates: The materialized view is updated at the same time as the base data changes. This
suits workloads where reads of the view significantly outnumber writes and the view must stay fresh.
• Batch Updates: Materialized views are updated at intervals, reducing the overhead on each write
operation. This is suitable when some staleness is acceptable.
Creation and Management:
• Materialized views can be created directly in the database, where the database engine handles the
computation and storage. This often includes configuration parameters to control when and how
updates occur.
• They can also be generated externally by extracting data, computing the view, and writing it back
to the database.
Usage within Aggregates:
• Materialized views can be integrated within the same aggregate, like including a summary within
an order document, which enhances query efficiency.
Column Family Databases:
• In column-family databases, keeping a materialized view in a separate column family of the same
row lets the view be updated in the same atomic operation as the base data, which keeps it
consistent while improving read performance.
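The eager and batch strategies above can be sketched with an in-memory store. The class `OrderStore`, the function `sales_by_product`, and the choice of "total sales per product" as the view are assumptions for illustration, not the mechanism of any specific database.

```python
from collections import defaultdict


def sales_by_product(orders):
    """Compute the view from scratch: total price of items sold per product ID."""
    view = defaultdict(float)
    for order in orders:
        for item in order["orderItems"]:
            view[item["productId"]] += item["price"]
    return view


class OrderStore:
    """Holds base data (orders) plus a materialized view of sales per product."""

    def __init__(self, eager):
        self.eager = eager
        self.orders = []
        self.view = defaultdict(float)

    def add_order(self, order):
        self.orders.append(order)
        if self.eager:
            # Eager: every write also updates the view, so reads are always fresh.
            for item in order["orderItems"]:
                self.view[item["productId"]] += item["price"]

    def refresh_view(self):
        # Batch: recompute the view at intervals; it is stale between refreshes.
        self.view = sales_by_product(self.orders)
```

With `eager=True` each write pays the cost of updating the view. With `eager=False` writes are cheaper, and the view only reflects orders added before the last call to `refresh_view`.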
• Demo of NoSQL
Next Class 48
• Module -1 : Chapter 3