NoSQL Databases
Department of Computer Science Engineering
School of Engineering
Module 1 : NoSQL Database
Architectures
• Transactions: Concurrency and Integration, ACID, NoSQL
emergence and its main features, BASE for reliable database
transactions, Achieving horizontal scalability with database
sharding, Brewer's CAP theorem. Main data models of NoSQL:
Document Data Model, Key-Value Data Model, Columnar Data
Model, Graph Data Model.
[6Hrs.] [Knowledge]
CSE2024 NoSQL Databases 2
Session 02
Topics
• Introduction to NoSQL
• Concurrency and Integration
• ACID
• NoSQL emergence and its main features
Learning Outcomes (LOs)
LO1: Understand the fundamental concepts of NoSQL.
LO2: Describe the concepts of ACID properties.
LO3: Explain concurrency and its role in database transactions.
LO4: Describe the emergence of NoSQL and identify its primary features.
Introduction
What is NoSQL?
A NoSQL database is a data store that does not require a table-based
relational model. It can hold structured, semi-structured, and unstructured data.
Types of NoSQL Databases:
• Key-Value Storage
• Document Storage
• Column Storage
• Graph Storage
Key Value Pair Based
• Data is stored as key/value pairs. The model is designed to handle large
volumes of data and heavy load.
• Key-value databases store data as a hash table in which each key is
unique and the value can be JSON, a BLOB (Binary Large Object), a
string, etc.
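The key/value model described above can be sketched with a Python dictionary, which is itself a hash table. This is an illustrative in-memory sketch only, not the API of any real product; the class name `KeyValueStore` and its `put`/`get`/`delete` methods are assumptions made for the example.

```python
import json

class KeyValueStore:
    """Hypothetical in-memory key-value store backed by a hash table (dict)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Keys are unique: writing an existing key overwrites its value.
        self._data[key] = value

    def get(self, key, default=None):
        # The store treats the value as opaque and returns it exactly as stored.
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)

store = KeyValueStore()
store.put("user:1", json.dumps({"name": "Asha", "city": "Pune"}))  # JSON value
store.put("avatar:1", b"\x89PNG...")                                # BLOB value
store.put("user:1:theme", "dark")                                   # string value
```

Note that the only way to reach a value is through its key; the store cannot search inside the values.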
Column-based
• Column-oriented databases work on columns and are based on Google's
BigTable paper. Every column is treated separately, and the values of a
single column are stored contiguously.
• Column-based NoSQL databases are widely used to manage data
warehouses, business intelligence, and library card catalogues.
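The idea that the values of one column are stored contiguously can be illustrated by contrasting two in-memory layouts. This is a simplified sketch with made-up sample data (`rows`, `columns`, and the field names are assumptions); real column stores add compression, column families, and on-disk formats that are not shown here.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "north", "sales": 120},
    {"id": 2, "region": "south", "sales": 80},
    {"id": 3, "region": "north", "sales": 200},
]

# Column-oriented layout: the values of each column sit next to each other.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregate over one column reads only that column's values.
total_sales = sum(columns["sales"])
```

This is why column stores suit analytical workloads such as data warehousing: a query that aggregates one attribute does not have to read the other attributes of every record.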
Document-Oriented:
• A document-oriented NoSQL database stores and retrieves data as key-value
pairs, but the value part is stored as a document, typically in JSON or XML
format. The database understands the structure of the value, so its
fields can be queried.
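The difference from a plain key-value store is that the database can look inside the value. A minimal sketch, assuming a hypothetical in-memory `DocumentStore` class with `insert` and `find` methods (these names are illustrative and do not belong to any real product):

```python
class DocumentStore:
    """Hypothetical in-memory document store: key -> JSON-like document."""

    def __init__(self):
        self._docs = {}

    def insert(self, doc_id, document):
        self._docs[doc_id] = document

    def find(self, **criteria):
        # Because the store understands the document structure,
        # it can match on fields inside the value.
        return [
            doc for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]

db = DocumentStore()
db.insert("o1", {"customer": "Asha", "status": "shipped", "items": ["pen", "ink"]})
db.insert("o2", {"customer": "Ravi", "status": "pending"})
db.insert("o3", {"customer": "Asha", "status": "pending", "gift": True})

pending = db.find(status="pending")
```

The three documents do not share the same set of fields, yet all of them can be stored and queried together.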
Graph Based
• A graph database stores entities as well as the relationships among those
entities. Each entity is stored as a node and each relationship as an edge
connecting two nodes. Every node and edge has a unique identifier.
• Graph databases are mostly used for social networks, logistics, and spatial data.
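The node-and-edge model can be sketched as follows. This is an illustrative in-memory structure only; the `Graph` class, its method names, and the `FOLLOWS`/`BLOCKS` labels are assumptions made for the example.

```python
from collections import defaultdict

class Graph:
    """Hypothetical in-memory graph: nodes and edges carry unique ids."""

    def __init__(self):
        self.nodes = {}                     # node_id -> properties
        self.edges = {}                     # edge_id -> (source, label, target)
        self._outgoing = defaultdict(list)  # node_id -> [edge_id, ...]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, edge_id, source, label, target):
        self.edges[edge_id] = (source, label, target)
        self._outgoing[source].append(edge_id)

    def neighbours(self, node_id, label):
        # Follow outgoing edges that carry the given relationship label.
        return [
            target
            for (_, edge_label, target) in (self.edges[e] for e in self._outgoing[node_id])
            if edge_label == label
        ]

g = Graph()
g.add_node("n1", name="Asha")
g.add_node("n2", name="Ravi")
g.add_node("n3", name="Meera")
g.add_edge("e1", "n1", "FOLLOWS", "n2")
g.add_edge("e2", "n1", "FOLLOWS", "n3")
g.add_edge("e3", "n2", "BLOCKS", "n3")
```

A social-network question such as "whom does Asha follow?" becomes a direct edge traversal, `g.neighbours("n1", "FOLLOWS")`, rather than a join.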
Concurrency and Integration
• Concurrency control in a database management system is the procedure for
managing simultaneous operations so that they do not conflict with each other.
It ensures that concurrent database transactions produce correct results
without violating the data integrity of the database.
• Database integration is the process of aggregating information from
multiple sources (such as social media, IoT sensor data, data
warehouses, and customer transactions) and sharing a current, clean
version of it across an organization.
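One common concurrency-control technique is locking. The sketch below uses Python threads to stand in for concurrent transactions; the `Account` class is hypothetical, and a real DBMS may use other mechanisms (for example timestamp ordering or multi-version concurrency control) that are not shown.

```python
import threading

class Account:
    """Hypothetical shared record updated by several concurrent 'transactions'."""

    def __init__(self, balance):
        self.balance = balance
        self._lock = threading.Lock()

    def deposit(self, amount):
        # The lock serialises the read-modify-write so that no update is lost.
        with self._lock:
            current = self.balance
            self.balance = current + amount

account = Account(0)
threads = [
    threading.Thread(target=lambda: [account.deposit(1) for _ in range(1000)])
    for _ in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

Without the lock, two threads could both read the same `current` value and one deposit would overwrite the other (a lost update); with it, all 8 × 1000 deposits are reflected in the final balance.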
ACID
• The ACID properties maintain the integrity of the database
during transaction processing.
• ACID stands for Atomicity, Consistency, Isolation, and
Durability.
A- Atomicity
• A transaction is a single, indivisible unit of work: either all of its
operations take effect or none of them do.
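Atomicity can be demonstrated with Python's built-in `sqlite3` module, where using the connection as a context manager commits on success and rolls back if an exception escapes. The `accounts` table, the sample balances, and the `transfer` helper are assumptions made for this sketch.

```python
import sqlite3

def transfer(conn, src, dst, amount):
    """Move `amount` between accounts; both updates apply or neither does."""
    with conn:  # commits on success, rolls back if an exception escapes
        conn.execute(
            "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
        )
        (balance,) = conn.execute(
            "SELECT balance FROM accounts WHERE name = ?", (src,)
        ).fetchone()
        if balance < 0:
            # Aborting here also undoes the debit made above.
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
        )

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE accounts (name TEXT PRIMARY KEY, balance INTEGER)")
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()
```

Calling `transfer(conn, "alice", "bob", 500)` raises `ValueError` and leaves both balances unchanged, because the debit that was already executed is rolled back.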
C- Consistency
• The database must be in a consistent state before and after the transaction.
• Consistency means that the integrity constraints on the data are always
preserved: a transaction moves the database from one valid state to another.
I- Isolation
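• Multiple transactions execute independently, without interfering with one another.
• Isolation means separation: a transaction does not see the intermediate
results of other transactions that are running at the same time.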
D- Durability
• The changes made by a successful (committed) transaction persist even if
a system failure occurs.
• Durability ensures that committed data is permanent.
• Together, the ACID properties play a vital role in maintaining the
consistency and integrity of data in the database.
Why NoSQL? And History
• The concept of NoSQL databases became popular with Internet giants like
Google, Facebook, and Amazon, which deal with huge volumes of data.
Response times become slow when an RDBMS is used for massive
volumes of data.
• To resolve this problem, we could “scale up” our systems by upgrading our
existing hardware. This process is expensive.
• The alternative is to distribute the database load across multiple hosts
whenever the load increases. This method is known as “scaling out.”
• NoSQL databases are non-relational and were designed with web applications
in mind, so they generally scale out better than relational databases.
Why NoSQL? And History
• 1998- Carlo Strozzi uses the term NoSQL for his lightweight, open-source
relational database
• 2000- Work begins on the graph database Neo4j
• 2004- Google begins developing BigTable
• 2005- CouchDB is launched
• 2007- The research paper on Amazon Dynamo is released
• 2008- Facebook open-sources the Cassandra project
• 2009- The term NoSQL is reintroduced
Features of NoSQL
Non-relational
• NoSQL databases do not follow the relational model
• Do not provide tables with flat, fixed-column records
• Work with self-contained aggregates
• Do not require object-relational mapping or data normalization
• Typically omit complex features such as query languages, query planners,
referential integrity, joins, and ACID
Features of NoSQL
Schema-free
• NoSQL databases are either schema-free or have relaxed schemas
• Do not require the schema of the data to be defined in advance
• Allow heterogeneous data structures within the same domain
Features of NoSQL
Simple API
• Offer easy-to-use interfaces for storing and querying data
• APIs allow low-level data manipulation and selection methods
• Text-based protocols are common, mostly HTTP REST with JSON
• Mostly have no standards-based NoSQL query language
• Web-enabled databases run as internet-facing services
Features of NoSQL
Distributed
• Multiple NoSQL databases can be run in a distributed fashion
• Offer auto-scaling and fail-over capabilities
• ACID guarantees are often relaxed in exchange for scalability and throughput
• Shared-nothing architecture, which requires less coordination between nodes
and allows higher distribution
Question Bank
1. Define NoSQL database and list the different types of NoSQL database. (LO1)
2. Explain the key-value pair database. (LO1)
3. Explain column-oriented databases. (LO1)
4. Explain the document-oriented database. (LO1)
5. Explain the graph-based database. (LO1)
6. Explain concurrency and integration with an example. (LO3)
7. Explain the ACID properties with an example. (LO2)
8. Describe the history of NoSQL. (LO4)
9. List the features of NoSQL. (LO4)
BASE - Basically Available, Soft state, Eventual consistency
Definition: An alternative to the ACID data processing model.
• The ACID model prioritizes a consistent system.
• The BASE model prioritizes high availability.
• Basically available means the database remains available most of the time,
favouring availability in the CAP trade-off
• Soft state means the state of the system may change over time, even
without new input
• Eventual consistency means that, once updates stop arriving, the system
becomes consistent over time.
Examples:
• Shopping cart applications on a website
• Monitoring network and IT infrastructure security
• Managing and reusing document content
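Soft state and eventual consistency can be illustrated with a toy replicated store. This is a deliberately simplified sketch: the `Replica` and `EventuallyConsistentStore` classes are hypothetical, and the explicit `propagate()` call stands in for the background synchronisation that a real system would run on its own.

```python
class Replica:
    """Hypothetical replica holding its own copy of the data."""

    def __init__(self):
        self.data = {}

class EventuallyConsistentStore:
    def __init__(self, replica_count=3):
        self.replicas = [Replica() for _ in range(replica_count)]
        self._pending = []  # updates not yet applied to every replica

    def write(self, key, value):
        # Acknowledge after updating one replica; the others catch up later.
        self.replicas[0].data[key] = value
        self._pending.append((key, value))

    def read(self, key, replica_index):
        # May return a stale value (or None) until propagation has run.
        return self.replicas[replica_index].data.get(key)

    def propagate(self):
        # Synchronisation step: replica state changes without new client input.
        for key, value in self._pending:
            for replica in self.replicas[1:]:
                replica.data[key] = value
        self._pending.clear()

store = EventuallyConsistentStore()
store.write("cart:42", ["book"])
stale = store.read("cart:42", replica_index=2)  # read before propagation
store.propagate()
fresh = store.read("cart:42", replica_index=2)  # read after propagation
```

The write is accepted immediately (basically available), a read from another replica can briefly miss it, and after propagation every replica returns the same value (eventual consistency).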
ACID Vs BASE
• ACID-compliant databases are a better fit for those who require
consistency, predictability, and reliability.
• The BASE model is a better fit for those who need easier scaling and more
flexibility.
• Databases: Just as SQL databases are almost uniformly ACID compliant, NoSQL
databases tend to conform to BASE principles. MongoDB, Cassandra, and Redis are
among the most popular NoSQL solutions, together with Amazon DynamoDB and
Couchbase.
• Eventual consistency arises when copies of data are kept on multiple
machines for high availability and scalability: the copies may differ
briefly after an update but converge over time.
CAP – Consistency , Availability, Partition Tolerance
• CAP Theorem: The CAP theorem is also called Brewer's theorem. It
states that it is impossible for a distributed data store to offer more than
two of the following three guarantees at the same time:
1. Consistency
2. Availability
3. Partition Tolerance
Consistency :
• The data should remain consistent even after the execution of an operation. This
means that once data is written, any future read request should return that data.
• For example, after the order status is updated, all clients should see the
same data.
CAP – Consistency , Availability, Partition Tolerance
• Availability: The database should always be available and responsive. It
should not have any downtime.
• Partition Tolerance: Partition tolerance means that the system should
continue to function even if communication among the servers is
unreliable.
• For example, the servers may be split into multiple groups that cannot
communicate with each other. Even if part of the database is unreachable,
the remaining parts continue to operate.
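The trade-off becomes concrete when a partition actually occurs: a system must then either refuse requests (keeping consistency) or accept them on whichever nodes are reachable (keeping availability). The two-node `Cluster` below is a hypothetical toy model, and the `"CP"`/`"AP"` mode flag is an assumption made for the sketch.

```python
class Node:
    def __init__(self):
        self.value = None

class Cluster:
    """Hypothetical two-node cluster that behaves as CP or AP under partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.nodes = [Node(), Node()]
        self.partitioned = False  # True when the nodes cannot reach each other

    def write(self, node_index, value):
        if self.partitioned:
            if self.mode == "CP":
                # Preserve consistency: refuse the write (sacrifice availability).
                raise RuntimeError("unavailable during partition")
            # Preserve availability: accept locally (the replicas now diverge).
            self.nodes[node_index].value = value
            return
        for node in self.nodes:
            node.value = value

    def read(self, node_index):
        return self.nodes[node_index].value

cp = Cluster("CP")
ap = Cluster("AP")
```

With `partitioned = True`, a write to `cp` raises an error while both nodes keep agreeing on the old value; the same write to `ap` succeeds, but the two nodes then return different values until the partition heals.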
Database Sharding
• Sharding is a method for distributing a single dataset across multiple
databases, which can then be stored on multiple machines.
• Sharding is a form of scaling known as horizontal scaling or scale-out.
• Horizontal scaling allows near-limitless scalability for handling big data and
intense workloads.
• Vertical scaling, by contrast, refers to increasing the power of a single machine
or server through a more powerful CPU, more RAM, or more storage capacity.
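One simple way to decide which shard holds a record is to hash its key. The sketch below assumes a hypothetical `ShardedStore` class in which each dictionary stands in for a database on a separate machine; hash-modulo routing is only one possible strategy.

```python
import hashlib

class ShardedStore:
    """Hypothetical hash-sharded store; each dict stands in for one machine."""

    def __init__(self, shard_count):
        self.shards = [{} for _ in range(shard_count)]

    def _shard_for(self, key):
        # A stable hash (not Python's per-process hash()) keeps routing repeatable.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key, value):
        self.shards[self._shard_for(key)][key] = value

    def get(self, key):
        return self.shards[self._shard_for(key)].get(key)

sharded = ShardedStore(shard_count=4)
for i in range(100):
    sharded.put(f"user:{i}", {"id": i})
```

Each key lives on exactly one shard, so adding shards adds capacity. A drawback of plain modulo routing is that changing the shard count remaps most keys, which is why range-based partitioning or consistent hashing is often preferred in practice.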
Replication
• Replication keeps copies of the same data on multiple machines.
• If your data workload is primarily read-focused, replication increases availability
and read performance while avoiding some of the complexity of database sharding.
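Why replication helps read-heavy workloads can be sketched with a primary that accepts writes and replicas that serve reads. The `ReplicatedStore` class is hypothetical, and copying each write to every replica synchronously is a simplifying assumption; real systems frequently replicate asynchronously.

```python
import itertools

class ReplicatedStore:
    """Hypothetical primary with read replicas; every node holds the full dataset."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._next_replica = itertools.cycle(range(replica_count))

    def write(self, key, value):
        # All writes go to the primary and are copied to every replica.
        self.primary[key] = value
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Reads are spread across the replicas in round-robin order.
        return self.replicas[next(self._next_replica)].get(key)

replicated = ReplicatedStore(replica_count=3)
replicated.write("page:home", "<h1>Welcome</h1>")
```

Unlike sharding, every node stores all of the data, so read capacity grows with the number of replicas but write capacity and storage capacity do not.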
Advantages of Sharding
• Sharding allows you to scale your database to handle increased load to a nearly
unlimited degree by providing increased read/write throughput, storage capacity,
and high availability.
• Increased read/write throughput: by distributing the dataset across multiple
shards, both read and write capacity are increased.
• Increased storage capacity: by increasing the number of shards, you also
increase total storage capacity, allowing near-infinite scalability.
• High availability: shards provide high availability in two ways. First, where each
shard is itself replicated (for example as a replica set), every piece of data has
more than one copy. Second, even if an entire shard becomes unavailable, the
database as a whole remains partially functional, because the rest of the data
is still available on the other shards.
Disadvantages of Sharding
• Sharding does come with several drawbacks, namely overhead in query result
compilation, complexity of administration, and increased infrastructure
costs.
• Query overhead: each sharded database must have a separate machine or service
that understands how to route a query to the appropriate shard. This
introduces additional latency in every operation.
• Complexity of administration: with a single unsharded database, only the
database server itself requires upkeep and maintenance. A sharded database adds
a router and multiple shard nodes to maintain, and where replication is being
used, any data updates must also be mirrored across each replicated node.
• Increased infrastructure costs: sharding by its nature requires additional
machines and compute power compared with a single database server.
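The overhead of compiling query results can be seen in a query that cannot be answered by one shard alone. In the sketch below, the `top_n_across_shards` function and the sample `shards` data are assumptions made for illustration; the pattern shown is commonly called scatter-gather.

```python
import heapq

def top_n_across_shards(shards, n):
    """Scatter the query to each shard, then gather and merge partial results."""
    partial_results = [
        sorted(shard.values(), reverse=True)[:n]  # each shard answers locally
        for shard in shards
    ]
    # The router must merge the partial results: extra work that a single
    # unsharded database would not need.
    return heapq.nlargest(n, (v for part in partial_results for v in part))

shards = [
    {"a": 10, "b": 75},
    {"c": 42, "d": 99},
    {"e": 5},
]
top_three = top_n_across_shards(shards, 3)
```

Every shard has to be contacted and the router has to combine the answers, so the latency of such a query is bounded by the slowest shard plus the merge step.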
Advantages of NoSQL
• Can be used as a Primary or Analytic Data Source
• Big Data Capability
• No Single Point of Failure
• Easy Replication
• No Need for Separate Caching Layer
• Provides fast performance and horizontal scalability
• Can handle structured, semi-structured, and unstructured data equally
well
• Fits naturally with object-oriented programming, which makes it easy to use and flexible
• Does not need a dedicated high-performance server
Advantages of NoSQL
• Supports key developer languages and platforms
• Simpler to implement than an RDBMS
• Can serve as the primary data source for online applications
• Handles big data, managing data velocity, variety, volume, and
complexity
• Excels at distributed database and multi-data centre operations
• Eliminates the need for a specific caching layer to store data
• Offers a flexible schema design that can easily be altered without downtime
or service disruption
Disadvantages of NoSQL
• No standardization rules
• Limited query capabilities
• RDBMS databases and tools are comparatively more mature
• Often lacks traditional database capabilities, such as consistency when
multiple transactions are performed simultaneously
• As the volume of data increases, maintaining unique keys becomes
difficult
• Doesn’t work as well with relational data
• The learning curve is steep for new developers
• Many options are open source, which has limited their popularity with enterprises
Summary of NoSQL
• NoSQL is a non-relational database management system that does not require
a fixed schema, avoids joins, and is easy to scale
• The concept of NoSQL databases became popular with Internet giants like
Google, Facebook, and Amazon, which deal with huge volumes of data
• In 1998, Carlo Strozzi used the term NoSQL for his lightweight,
open-source relational database
• NoSQL databases do not follow the relational model; they are either
schema-free or have relaxed schemas
• The four types of NoSQL database are 1) key-value pair based, 2) column-oriented,
3) graph based, and 4) document-oriented
Summary of NoSQL
• NoSQL can handle structured, semi-structured, and unstructured data
equally well
• The CAP theorem concerns three guarantees: Consistency, Availability, and
Partition Tolerance
• BASE stands for Basically Available, Soft state, Eventual consistency
• Eventual consistency arises when copies of data are kept on multiple
machines for high availability and scalability; the copies converge over time
• NoSQL offers limited query capabilities