NoSQL Databases

The document provides a comprehensive overview of NoSQL databases, highlighting their characteristics, benefits, and types compared to traditional relational databases (RDBMS). It explains the limitations of RDBMS, the scalability and flexibility of NoSQL, and introduces key concepts such as CAP theorem and ACID vs BASE properties. Additionally, it categorizes NoSQL databases into key-value, document, graph, and wide-column databases, each with distinct features and use cases.

Uploaded by amanjot26kaur
Copyright: © All Rights Reserved
NoSQL Databases

Created @November 7, 2023 8:21 AM

Field (attributes): Individual piece of data


Record (rows): Fields that are grouped together for a specific purpose
Primary Key: A field, or group of fields, that uniquely identifies an individual record

Benefits of RDBMS
1. The RDBMS data model is simple: data is organized into tables of rows and columns.

2. It is easy to use because SQL can express complex queries.

3. Relational databases are accurate and consistent (data integrity) because of their pre-defined
structure, strict schema and minimal data duplication.

4. Normalization is a method used in RDBMS that breaks information down into smaller, related
tables to reduce redundancy, which also reduces storage size.

5. Multiple users can retrieve information at the same time, even while data is being
updated.

6. It provides data security via access authentication.

Features of RDBMS
1. Requires data to be stored in tables (records and attributes)

2. Uses a primary key to uniquely identify each record, and a foreign key to connect
two or more tables via a shared column

3. Implements indexing for quicker data retrieval

4. Supports views: virtual tables defined by a query, which can expose only selected columns so
that sensitive data stays hidden from the users who query the view

5. Supports access by multiple users at the same time
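
The features listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the customers and orders tables, their columns and the sample rows are assumptions made up for the example, not part of the original notes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite only enforces foreign keys when this pragma is switched on.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical schema: two tables, a primary key on each,
# a foreign key linking them, an index and a view.
conn.executescript("""
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,
    name  TEXT NOT NULL,
    email TEXT NOT NULL
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),
    total       REAL NOT NULL
);
CREATE INDEX idx_orders_customer ON orders(customer_id);
CREATE VIEW customer_names AS SELECT id, name FROM customers;
""")

conn.execute("INSERT INTO customers VALUES (1, 'Asha', 'asha@example.com')")
conn.execute("INSERT INTO orders VALUES (10, 1, 25.0)")

# The shared column (customer_id) connects the two tables in a join.
rows = conn.execute(
    "SELECT c.name, o.total FROM orders o "
    "JOIN customers c ON c.id = o.customer_id"
).fetchall()

# An order that points at a non-existent customer is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (11, 99, 5.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True

# The view exposes id and name but not the email column.
cursor = conn.execute("SELECT * FROM customer_names")
view_columns = [col[0] for col in cursor.description]
```

Here the view plays the role of the virtual table from point 4: a user who can only query customer_names never sees the email column.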

Limitations of RDBMS

1. Institutions had to spend a lot of time, money and resources maintaining and upgrading the
database as data volumes grew rapidly.

2. Physical memory requirements increase as the data grows.

3. Relational databases are typically scaled vertically (by upgrading a single server), so they
become difficult to handle as the quantity of data grows.

4. Data is stored in a pre-defined tabular format, and the strict schema leaves little flexibility.

5. Querying a relational database can become slow as the data grows, particularly when queries
join multiple tables.

Introduction to NoSQL Databases


NoSQL (Not Only SQL) databases are a family of databases that vary widely in format and
technology but share two traits: they are non-relational and do not require a pre-defined schema.
Here are some characteristics of NoSQL databases:

1. They can scale vertically as well as horizontally.

2. They store data as key-value pairs, documents, graphs or columns.

3. They offer specialized indexing and querying capabilities such as geospatial search, along with
robust data replication and modern HTTP APIs.

Benefits of NoSQL
1. NoSQL databases focus on enhancing operational speed (quick lookup) and flexibility (no
pre-defined schema) in storing big data, while cutting costs significantly compared to
RDBMS databases.

2. They don’t restrict the types of data being stored: they can store structured, semi-
structured and unstructured data, and allow new features (key-value pairs) to be introduced
as needs change without any database downtime.

3. They are well suited to agile application development, which requires fast implementation.

4. Relational databases mainly support vertical scaling because they are built on the traditional
client-server architecture, whereas NoSQL databases can be divided into smaller chunks
across multiple servers (sharding), which is very complicated to implement in relational
databases. Replacing and upgrading database server machines to accommodate more
throughput also results in downtime.

5. Many NoSQL databases are built on a masterless, peer-to-peer architecture, in which
data is sharded and replicated across multiple nodes in a cluster, and aggregate queries
such as sum(), count() and avg() are distributed across the nodes. This makes scaling easy:
adding a new server to the cluster takes only a few commands. This scalability improves
performance and allows continuous availability and high read/write speeds.

6. Relational databases typically rely on a centralized deployment that is location dependent,
especially for write operations. NoSQL databases, by contrast, can distribute data on a global
scale across multiple data centers and/or the cloud for CRUD operations, maintaining
continuous availability through sharding and data replication.
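
The idea of sharding from points 4 and 5 can be sketched as follows. This is a minimal in-memory illustration, not how any particular NoSQL product works: the names shard_for and ShardedStore are hypothetical, and each shard is a plain dictionary standing in for a separate server.

```python
import hashlib


def shard_for(key: str, num_shards: int) -> int:
    """Map a key to a shard number using a stable hash of the key."""
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % num_shards


class ShardedStore:
    """Toy store that spreads keys over several shards (dicts)."""

    def __init__(self, num_shards: int):
        self.shards = [dict() for _ in range(num_shards)]

    def put(self, key, value):
        self.shards[shard_for(key, len(self.shards))][key] = value

    def get(self, key):
        return self.shards[shard_for(key, len(self.shards))].get(key)

    def count(self) -> int:
        # Distributed aggregate: every shard counts its own records,
        # and the partial results are combined.
        return sum(len(shard) for shard in self.shards)


store = ShardedStore(num_shards=3)
for i in range(100):
    store.put(f"user:{i}", {"id": i})
total = store.count()
```

With plain modulo placement, changing the number of shards moves most keys to a different shard, which is why real systems usually rely on schemes such as consistent hashing instead.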

Scaling
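There are three types of scaling:
Vertical Scaling (adding resources such as CPU or memory to a single server)
Horizontal Scaling (adding more servers and spreading the data and load across them)
Automatic Scaling (adding or removing resources automatically as the load changes)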

Clusters: Clusters are groups of servers that are managed together and participate in workload
management. A cluster can contain nodes or individual application servers.
Node: A node is typically a physical computer system with a distinct host IP address that is
running one or more application servers.

Replication: Replication is storing multiple copies of a dataset on multiple nodes. Hence, if one
node goes down, another node still has a copy of the data for easy and fast access. This
helps a NoSQL database keep running with little or no downtime. When one considers the cost
of downtime, this is a big deal.
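
Replication as defined above can be sketched like this. The Node and ReplicatedStore classes are hypothetical names for the example, and a real database would also have to handle failed writes and nodes that rejoin the cluster.

```python
class Node:
    """One server in the cluster, holding its own copy of the data."""

    def __init__(self, name):
        self.name = name
        self.data = {}
        self.up = True


class ReplicatedStore:
    """Writes go to every live node; reads use the first live node."""

    def __init__(self, nodes):
        self.nodes = nodes

    def put(self, key, value):
        for node in self.nodes:
            if node.up:
                node.data[key] = value

    def get(self, key):
        for node in self.nodes:
            if node.up and key in node.data:
                return node.data[key]
        raise KeyError(key)


nodes = [Node("n1"), Node("n2"), Node("n3")]
store = ReplicatedStore(nodes)
store.put("greeting", "hello")

nodes[0].up = False              # simulate one node going down
value = store.get("greeting")    # still served by another replica
```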

Cloud Architecture: Cloud refers to servers that are accessed over the Internet, so that
organizations can avoid managing physical servers or running software applications on their
own machines.

Document stores for storing de-normalized intuitive information, and


Graph databases for associative data sets.

Features of Non-relational Databases

CAP (Brewer’s) Theorem
A distributed system can deliver only two of the three desired characteristics: consistency,
availability and partition tolerance. Let’s take a detailed look at the three distributed system
characteristics:

Consistency: Consistency means that all clients see the same data at the same time, no
matter which node they connect to. For this to happen, whenever data is written to one node,
it must be instantly forwarded or replicated to all the other nodes in the system before the
write is deemed ‘successful.’

Availability: Availability means that any client making a read/write request gets a response,
even if one or more nodes are down. Another way to state this—all working nodes in the
distributed system return a valid response (success/failure) for any request, without
exception.

Partition Tolerance: A partition is a communications break within a distributed system: a
lost or temporarily delayed connection between two nodes. Partition tolerance means that
the cluster must continue to work despite any number of communication breakdowns
between nodes in the system.

CAP Theorem and NoSQL Database Types


NoSQL databases are ideal for distributed network applications. Unlike their vertically scalable
SQL (relational) counterparts, NoSQL databases are horizontally scalable and distributed by
design—they can rapidly scale across a growing network consisting of multiple interconnected
nodes.

NoSQL databases are classified based on the two CAP characteristics they support:

CP Databases: A CP database delivers consistency and partition tolerance at the expense of
availability. When a partition occurs between any two nodes, the system has to shut down
the non-consistent node (i.e., make it unavailable) until the partition is resolved.

AP Databases: An AP database delivers availability and partition tolerance at the expense of
consistency. When a partition occurs, all nodes remain available, but those at the wrong end
of a partition might return an older version of the data than others. (When the partition is
resolved, AP databases typically resync the nodes to repair all inconsistencies in the
system.)

CA Databases: A CA database delivers consistency and availability across all nodes. It can’t
do this if there is a partition between any two nodes in the system, however, and therefore
can’t deliver fault tolerance.

NOTE: In a distributed system, partitions can’t be avoided. So, while we can discuss a CA
distributed database in theory, for all practical purposes a CA distributed database can’t exist.
This doesn’t mean you can’t have a CA database for your distributed application if you need one.
Many relational databases, such as PostgreSQL, deliver consistency and availability and can be
deployed to multiple nodes using replication.
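
The difference between CP and AP behaviour during a partition can be sketched with a toy two-node cluster. Everything here is an assumption made for illustration: the TwoNodeCluster class, the Unavailable exception and the simplification that a CP system refuses every request while partitioned (the notes above describe shutting down only the non-consistent node).

```python
class Unavailable(Exception):
    """Raised when the cluster refuses a request to stay consistent."""


class TwoNodeCluster:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.partitioned = False
        self.replicas = [{}, {}]   # one dict per node

    def write(self, node, key, value):
        if not self.partitioned:
            for replica in self.replicas:   # normal case: replicate everywhere
                replica[key] = value
        elif self.mode == "CP":
            raise Unavailable("cannot replicate during a partition")
        else:
            self.replicas[node][key] = value   # AP: accept the write locally

    def read(self, node, key):
        if self.partitioned and self.mode == "CP":
            raise Unavailable("node is offline until the partition heals")
        return self.replicas[node].get(key)


ap = TwoNodeCluster("AP")
ap.write(0, "x", 1)
ap.partitioned = True
ap.write(0, "x", 2)
fresh = ap.read(0, "x")   # node 0 has the new value
stale = ap.read(1, "x")   # node 1 still answers, with the old value

cp = TwoNodeCluster("CP")
cp.write(0, "x", 1)
cp.partitioned = True
try:
    cp.write(0, "x", 2)
    cp_refused = False
except Unavailable:
    cp_refused = True
```

The AP cluster keeps answering but the two nodes disagree until they are resynced, while the CP cluster gives up availability to avoid that disagreement.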

Types of NoSQL Databases


Key-value Databases

This is the most basic type of NoSQL database, where information is stored
in two parts: a key and a value. The key is then used for fast retrieval of
unstructured data from the database. Everything is stored as a unique key
and a value that is either the data or a location for the data, so reads
and writes by key are typically fast. However, this simplicity restricts
the use cases it suits: more complex data requirements, such as querying
by the contents of a value, are generally not supported. Examples: Redis,
DynamoDB, Oracle NoSQL, etc.
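
A key-value store can be sketched as a thin wrapper around a dictionary. The KeyValueStore class and the session key below are hypothetical; the point is only that the store looks data up by key and treats the value as an opaque blob.

```python
class KeyValueStore:
    """Minimal in-memory key-value store: put, get and delete by key."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        self._data.pop(key, None)


sessions = KeyValueStore()
# The value is stored as raw bytes; the store never looks inside it.
sessions.put("session:9f2c", b'{"user_id": 7, "cart": ["book"]}')

blob = sessions.get("session:9f2c")
missing = sessions.get("session:unknown")
```

Because the store never inspects the value, a question such as "which sessions belong to user 7" would require reading every entry, which is the limitation described above.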

Document Databases

Document databases store data as documents, typically in JSON or a
JSON-like format. The values within a document can be anything from
strings to nested objects. Document databases offer a variety of advantages,
including an intuitive data model that is fast and easy to work with, a
flexible schema that allows for the data model to evolve as application
needs change, and the ability to horizontally scale out.
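
The flexible schema of a document database can be sketched with JSON-compatible dictionaries. The Collection class and its insert and find methods are hypothetical names for this example and do not mirror the API of any specific product.

```python
import json


class Collection:
    """Toy document collection holding JSON-compatible documents."""

    def __init__(self):
        self._docs = []

    def insert(self, doc):
        # Round-trip through JSON to store a copy and to reject
        # values that cannot be represented as JSON.
        self._docs.append(json.loads(json.dumps(doc)))

    def find(self, **criteria):
        # Return documents whose top-level fields equal all criteria.
        return [
            doc for doc in self._docs
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


users = Collection()
users.insert({"name": "Asha", "city": "Pune"})
# A later document can carry extra, nested fields without a schema change.
users.insert({
    "name": "Ravi",
    "city": "Pune",
    "address": {"zip": "411001"},
    "tags": ["admin"],
})

in_pune = [doc["name"] for doc in users.find(city="Pune")]
admins = [doc["name"] for doc in users.find(tags=["admin"])]
```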

Graph Databases

Graph databases store data in elements called nodes, and the edges between
nodes hold attributes that describe the relationship. Because relationships
are defined in the edges, searches that follow these relationships are
naturally fast. Plus, they are flexible because new nodes and edges can be
added easily. They also don’t have to have a defined schema like a
traditional relational database. However, they are not very good for
queries that span the whole database, where relationships are less well
defined or not defined at all. They also have no single query language
shared by all products, which means moving between different graph
databases comes with a learning requirement.
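
Nodes, edges with attributes and a relationship search can be sketched as below. The Graph class, the "type" edge attribute and the FOLLOWS and BLOCKS relations are assumptions for the example; real graph databases use their own storage formats and query languages.

```python
from collections import defaultdict, deque


class Graph:
    """Toy property graph: nodes hold data, edges hold relationship attributes."""

    def __init__(self):
        self.nodes = {}                  # node id -> properties
        self.edges = defaultdict(list)   # node id -> [(neighbour id, attributes)]

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, src, dst, **attributes):
        self.edges[src].append((dst, attributes))

    def reachable(self, start, relation, max_hops):
        """Breadth-first search following only edges of the given type."""
        seen = {start}
        found = []
        frontier = deque([(start, 0)])
        while frontier:
            node, hops = frontier.popleft()
            if hops == max_hops:
                continue
            for neighbour, attributes in self.edges[node]:
                if attributes.get("type") == relation and neighbour not in seen:
                    seen.add(neighbour)
                    found.append(neighbour)
                    frontier.append((neighbour, hops + 1))
        return found


g = Graph()
for person in ("alice", "bob", "carol", "dave"):
    g.add_node(person, kind="person")
g.add_edge("alice", "bob", type="FOLLOWS", since=2021)
g.add_edge("bob", "carol", type="FOLLOWS", since=2022)
g.add_edge("alice", "dave", type="BLOCKS")

within_one_hop = g.reachable("alice", "FOLLOWS", max_hops=1)
within_two_hops = g.reachable("alice", "FOLLOWS", max_hops=2)
```

The search only touches the edges leaving the nodes it visits, which is why relationship queries stay fast even when the graph as a whole is large.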

Wide-column Databases (Subset of Key-value databases)

Wide-column databases, similar to relational databases, store data in
tables, columns, and rows. However, the names and formatting of the
columns don’t have to match in each row. The columns can even be stored
across multiple servers. They are considered two-dimensional key-value
stores because each value is referenced by both a row key and a column
key. Like key-value databases, wide-column databases are flexible and
offer fast lookups by key. This flexibility makes them good at handling
“big data” and unstructured data. However, compared to relational
databases, wide-column databases are much slower when handling
transactions: columns that group similar attributes are stored in
separate files, rather than whole rows being stored together, which means
a transaction has to be carried out across multiple files.
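
The "two-dimensional key-value" idea can be sketched with a nested dictionary, where a row key and a column name together locate a value. The table contents and the helper names get_cell and column_values are hypothetical, and the sketch ignores how real systems lay columns out on disk.

```python
from collections import defaultdict

# row key -> {column name -> value}; rows need not share the same columns
table = defaultdict(dict)

table["user:1"]["name"] = "Asha"
table["user:1"]["email"] = "asha@example.com"
table["user:2"]["name"] = "Ravi"
table["user:2"]["last_login"] = "2023-11-07"   # column that user:1 lacks


def get_cell(row_key, column):
    """Look a value up by (row key, column name)."""
    return table.get(row_key, {}).get(column)


def column_values(column):
    """Collect one column across all rows that define it."""
    return {
        row_key: columns[column]
        for row_key, columns in table.items()
        if column in columns
    }


email = get_cell("user:1", "email")
no_email = get_cell("user:2", "email")
names = column_values("name")
logins = column_values("last_login")
```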

ACID Properties of an RDBMS Database


ACID properties ensure that a set of database operations (grouped together in a transaction) leave
the database in a valid state even in the event of unexpected errors.

Atomicity: Atomicity guarantees that all of the commands that make up a transaction are
treated as a single unit and either succeed or fail together. If any part of the transaction
fails, the whole transaction is rolled back; otherwise it completes successfully.

Consistency: Consistency guarantees that changes made within a transaction are consistent
with database constraints. This includes all rules, constraints, and triggers. If the data would
get into an illegal state, the whole transaction fails. For this reason, ACID-compliant
databases are used in financial institutions, data warehouses, etc.

Isolation: Isolation ensures that all transactions run in an isolated environment. That enables
running transactions concurrently because transactions don’t interfere with each other.

Durability: Durability guarantees that once the transaction completes and changes are
written to the database, they are persisted. This ensures that data within the system will
persist even in the case of system failures like crashes or power outages.
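
Atomicity and consistency can be sketched with Python's built-in sqlite3 module. The accounts table, the CHECK constraint and the transfer function are assumptions made for the example; the transfer deliberately credits before it debits, so that a failed debit has something to roll back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"   # consistency rule
)
conn.executemany(
    "INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)]
)
conn.commit()


def transfer(conn, src, dst, amount):
    """Move money in one transaction; return False if it was rolled back."""
    try:
        with conn:   # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, dst),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, src),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


failed = transfer(conn, "alice", "bob", 500)   # would make alice negative
after_failure = balances(conn)

succeeded = transfer(conn, "alice", "bob", 30)
after_success = balances(conn)
```

In the failed transfer the credit to bob has already been applied when the debit violates the constraint, and the rollback undoes it, so both balances are unchanged.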

BASE Properties
The BASE acronym stands for:

Basically Available: Rather than enforcing immediate consistency, BASE-modelled NoSQL
databases ensure availability of data by spreading and replicating it across the nodes of
the database cluster.

Soft state: Due to the lack of immediate consistency, data values may change over time. In
the BASE model, data stores don’t have to be write-consistent, nor do different replicas have
to be mutually consistent all the time.

Eventually consistent: The fact that the BASE model does not enforce immediate
consistency does not mean that it never achieves it. However, until it does, data reads might
be inconsistent.
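
Eventual consistency can be sketched with two replicas that are synchronized in the background. The Replica class, the explicit version numbers and the sync function are assumptions for the example; it uses a simple "highest version wins" rule, which is only one of several ways real systems reconcile replicas.

```python
class Replica:
    """One copy of the data; each key maps to (version, value)."""

    def __init__(self):
        self.data = {}

    def write(self, key, value, version):
        current = self.data.get(key)
        # Keep the entry with the highest version number.
        if current is None or version > current[0]:
            self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return None if entry is None else entry[1]


def sync(replicas):
    """Background repair: copy every entry to every replica."""
    for source in replicas:
        for key, (version, value) in list(source.data.items()):
            for target in replicas:
                target.write(key, value, version)


a, b = Replica(), Replica()
a.write("cart", ["book"], version=1)
sync([a, b])

a.write("cart", ["book", "pen"], version=2)   # only replica a has this yet
read_before_sync = b.read("cart")             # stale read from replica b

sync([a, b])
read_after_sync = b.read("cart")              # replicas have converged
```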

NOTE: Many institutions now adopt BASE-compliant databases, trading immediate consistency
(ACID compliance) for availability.

Difference between SQL and NoSQL


[Comparison table not recoverable; surviving cell text: Semi-structured, Unstructured Data]
