NoSQL Databases
Benefits of RDBMS
1. The relational data model is simple
3. Relational databases are accurate and consistent (data integrity) because of their pre-defined
structure, strict schema and minimal data duplication
4. Normalization is a method used in RDBMS that splits information into smaller, related
tables to remove redundancy, which reduces storage size.
5. Multiple users can access it to retrieve information at the same time, even while data is
being updated.
Features of RDBMS
1. Requires data to be stored in tables (records and attributes)
2. Uses a primary key to uniquely identify each record, and foreign keys to connect
two or more tables via a shared column
4. Supports views (virtual tables), through which sensitive data can be selectively exposed and
queried
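The features above can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: the customers and orders tables, their columns, and the order_summary view are made-up names, and SQLite stands in for any RDBMS.

```python
import sqlite3

# In-memory database; table, column, and view names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,          -- primary key: identifies each record
    name  TEXT NOT NULL,
    email TEXT NOT NULL                 -- treated as sensitive in this sketch
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),  -- foreign key
    total       REAL NOT NULL
);
-- View: a virtual table that exposes names and totals but hides emails.
CREATE VIEW order_summary AS
    SELECT o.id AS order_id, c.name AS customer, o.total
    FROM orders AS o JOIN customers AS c ON c.id = o.customer_id;
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada', 'ada@example.com')")
conn.execute("INSERT INTO orders VALUES (10, 1, 99.5)")

rows = conn.execute("SELECT * FROM order_summary").fetchall()

# An order that points at a non-existent customer violates the foreign key.
try:
    conn.execute("INSERT INTO orders VALUES (11, 42, 5.0)")
    fk_rejected = False
except sqlite3.IntegrityError:
    fk_rejected = True
```

The view returns the joined row without the email column, and the second insert is rejected because customer 42 does not exist.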
Limitations of RDBMS
1. Institutions have to spend a lot of time, money and resources maintaining and upgrading the
database as data grows exponentially.
3. Relational databases scale vertically (by upgrading a single server), so they become difficult
to manage as the quantity of data grows.
4. It stores data in a pre-defined tabular format, and the strict schema leaves little flexibility.
5. Querying a relational database becomes slow as the data grows, because queries often have to
join multiple tables.
3. It includes specific indexing and querying capabilities such as geospatial search, robust data
replication and modern HTTP APIs.
Benefits of NoSQL
1. NoSQL databases focus on operational speed (quick lookups) and flexibility (no
pre-defined schema) when storing big data, while cutting costs significantly compared to
RDBMS databases.
2. It doesn’t put a restriction on the types of data being stored as it can store structured, semi-
structured and unstructured data, and allows introduction of new features (key-value pairs)
as needs change without any database downtime.
3. It is well suited to agile application development, which requires fast implementation.
5. Many NoSQL databases are built on a masterless, peer-to-peer architecture, wherein
data is sharded and replicated across multiple nodes in a cluster, and aggregate queries
such as sum(), count() and avg() are distributed by default. This makes scaling easy: adding
a new server to the cluster takes only a few commands. This scalability improves
performance and allows continuous availability and high read/write speeds.
6. Relational databases use a centralized application that is location dependent, especially for
write operations. A NoSQL database, by contrast, distributes data on a global scale across
multiple data centers and/or the cloud for CRUD operations, maintaining continuous availability
through sharding and data replication.
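The sharding and replication described in the list above can be sketched in a few lines. This is a simplified illustration, not how any particular database does it: the node names, the replication factor, and the owners, put and get helpers are all assumptions. It uses plain modulo hashing, which would move most keys when a node is added; consistent hashing is a common way to limit that movement.

```python
import hashlib

NODES = ["node-a", "node-b", "node-c", "node-d"]  # hypothetical cluster members
REPLICATION_FACTOR = 2                            # copies kept of every key

def owners(key, nodes=NODES, replicas=REPLICATION_FACTOR):
    """Return the nodes holding `key`: one chosen by hash, plus its successors."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]

def put(cluster, key, value):
    """Write the value to every replica that owns the key."""
    for node in owners(key):
        cluster[node][key] = value

def get(cluster, key, down=()):
    """Read from the first owning node that is not marked as down."""
    for node in owners(key):
        if node not in down:
            return cluster[node].get(key)
    raise RuntimeError("all replicas for this key are unavailable")

cluster = {node: {} for node in NODES}
put(cluster, "user:42", {"name": "Ada"})

# Simulate losing the first owner: the second replica still answers.
primary = owners("user:42")[0]
value = get(cluster, "user:42", down={primary})
```

Because each key lives on two different nodes, the read still succeeds when one of them is unavailable.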
Scaling
There are three types of scaling:
Vertical Scaling
Horizontal Scaling
Automatic Scaling
Clusters: Clusters are groups of servers that are managed together and participate in workload
management. A cluster can contain nodes or individual application servers.
Node: A node is typically a physical computer system with a distinct host IP address that is
running one or more application servers.
Replication: Replication is storing multiple copies of a dataset on multiple nodes. Hence, if one
node goes down, another node still has a copy of the data for easy and fast access. This
lets a NoSQL database keep downtime close to zero. When one considers the cost of downtime, this is
a big deal.
Cloud Architecture: Cloud refers to servers that are accessed over the Internet, so that users avoid
managing physical servers or running software applications on their own machines.
CAP (Brewer’s) Theorem
A distributed system can deliver only two of the three desired characteristics: consistency,
availability and partition tolerance. Let’s take a detailed look at the three distributed system
characteristics:
Consistency: Consistency means that all clients see the same data at the same time, no
matter which node they connect to. For this to happen, whenever data is written to one node,
it must be instantly forwarded or replicated to all the other nodes in the system before the
write is deemed ‘successful.’
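Availability: Availability means that any client making a read/write request gets a response,
even if one or more nodes are down. Put another way, every working node in the
distributed system returns a valid response (success or failure) for any request, without
exception.
Partition tolerance: A partition is a communications break within a distributed system, such as
a lost or temporarily delayed connection between two nodes. Partition tolerance means that the
cluster continues to work despite any number of communication breakdowns between nodes.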
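NoSQL databases are classified based on the two CAP characteristics they support:
CP database: A CP database delivers consistency and partition tolerance at the expense of
availability. When a partition occurs between any two nodes, the system has to make the
non-consistent node unavailable until the partition is resolved.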
AP database: An AP database delivers availability and partition tolerance at the expense of
consistency. When a partition occurs, all nodes remain available but those at the wrong end
of a partition might return an older version of data than others. (When the partition is
resolved, the AP databases typically resync the nodes to repair all inconsistencies in the
system.)
CA database: A CA database delivers consistency and availability across all nodes. It can’t
do this if there is a partition between any two nodes in the system, however, and therefore
can’t deliver fault tolerance.
NOTE: In a distributed system, partitions can’t be avoided. So, while we can discuss a CA
distributed database in theory, for all practical purposes a CA distributed database can’t exist.
This doesn’t mean you can’t have a CA database for your distributed application if you need one.
Many relational databases, such as PostgreSQL, deliver consistency and availability and can be
deployed to multiple nodes using replication.
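The trade-off during a partition can be illustrated with a toy two-node simulation. It is a sketch under assumptions: the Node class, the cut_off flag, and the "AP"/"CP" policy strings are hypothetical. An availability-first (AP) node keeps answering with possibly stale data, while a consistency-first (CP) node refuses to answer until it has been resynced.

```python
class Unavailable(Exception):
    """Raised when a node refuses to answer rather than risk stale data."""

class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.cut_off = False  # True while a partition isolates this node

def replicate_write(source, target, key, value):
    """Apply a write on `source` and copy it to `target` if it is reachable."""
    source.data[key] = value
    if not target.cut_off:
        target.data[key] = value

def read(node, key, policy):
    """AP: always answer. CP: refuse while the node may be out of date."""
    if node.cut_off and policy == "CP":
        raise Unavailable(f"{node.name} cannot confirm it has the latest value")
    return node.data.get(key)

def resync(source, target):
    """Repair the isolated node once the partition is resolved."""
    target.data.update(source.data)
    target.cut_off = False

n1, n2 = Node("n1"), Node("n2")
replicate_write(n1, n2, "stock", 10)   # both nodes see 10
n2.cut_off = True                      # partition begins
replicate_write(n1, n2, "stock", 7)    # only n1 sees 7

stale = read(n2, "stock", policy="AP")  # answers with the old value
try:
    read(n2, "stock", policy="CP")
    refused = False
except Unavailable:
    refused = True

resync(n1, n2)                          # partition resolved
fresh = read(n2, "stock", policy="CP")
```

During the partition the AP read returns the old value and the CP read is refused; after the resync both policies return the current value.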
Key-Value Databases
This is the most basic type of database, where information is stored in two
parts: key and value. The key is then used for fast retrieval of unstructured
data from the database. Everything is stored as a unique key and a value
that is either the data or a location for the data, so reads and writes
by key are fast. However, this simplicity restricts the use cases it
suits: more complex data requirements can’t be supported.
Example: Redis, DynamoDB, Oracle NoSQL, etc.
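A minimal in-memory sketch of the key-value model follows. The KeyValueStore class and its methods are hypothetical and do not mirror the API of Redis or DynamoDB. It shows why lookups by key are cheap and why anything that depends on the value's contents forces a scan of every entry.

```python
import json

class KeyValueStore:
    """Toy key-value store: values are opaque blobs addressed only by key."""

    def __init__(self):
        self._items = {}

    def put(self, key, value):
        # The store does not interpret the value; it just keeps a serialized blob.
        self._items[key] = json.dumps(value)

    def get(self, key, default=None):
        raw = self._items.get(key)          # direct lookup by key
        return default if raw is None else json.loads(raw)

    def delete(self, key):
        self._items.pop(key, None)

    def scan(self, predicate):
        # Filtering on the value means decoding and checking every entry.
        return [k for k, raw in self._items.items() if predicate(json.loads(raw))]

store = KeyValueStore()
store.put("session:1", {"user": "ada", "cart": ["book"]})
store.put("session:2", {"user": "lin", "cart": []})

session = store.get("session:1")
non_empty = store.scan(lambda value: value["cart"])
```

The get call touches a single entry, whereas scan has to visit all of them, which is the kind of query this model handles poorly.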
Document Databases
including an intuitive data model that is fast and easy to work with, a
flexible schema that allows for the data model to evolve as application
needs change, and the ability to horizontally scale out.
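The flexible schema mentioned above can be pictured with plain Python dictionaries standing in for documents. The products collection, its field names, and the find helper are assumptions made for illustration, not the API of any document database.

```python
# Each document carries its own fields; no table definition is shared.
products = [
    {"_id": 1, "name": "Laptop", "price": 1200, "specs": {"ram_gb": 16}},
    {"_id": 2, "name": "T-shirt", "price": 15, "sizes": ["S", "M", "L"]},
]

# The data model evolves by simply storing a document with a new field.
products.append(
    {"_id": 3, "name": "E-book", "price": 9, "download_url": "https://example.com/book"}
)

def find(collection, **criteria):
    """Return documents whose fields equal the given values; missing fields never match."""
    return [
        doc for doc in collection
        if all(field in doc and doc[field] == value for field, value in criteria.items())
    ]

with_sizes = [doc["name"] for doc in products if "sizes" in doc]
cheap = find(products, price=15)
```

Adding the download_url field required no migration of the existing documents, and queries simply skip documents that lack a field.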
Graph Databases
Graph databases use a structure of elements called nodes that store data,
and edges between them that hold attributes about the relationship.
Relationships are defined in the edges, which makes searches related to
these relationships naturally fast. Plus, they are flexible because new nodes
and edges can be added easily. They also don’t need a defined
schema like a traditional relational database. However, they are not very
good for querying the whole database, where relationships are poorly defined
or not defined at all. They also lack a single, universally adopted query language,
which means moving between different graph database products comes with a
learning requirement.
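The node-and-edge structure can be sketched with an adjacency list whose edges carry attributes. The sample people, the "rel" and "since" attributes, and the reachable helper are hypothetical. The point is that a relationship query only follows edges outward from a starting node instead of scanning every record.

```python
from collections import deque

# Nodes store data; edges store the relationship and its attributes.
nodes = {
    "alice": {"type": "person"},
    "bob": {"type": "person"},
    "carol": {"type": "person"},
    "acme": {"type": "company"},
}
edges = {
    "alice": [("bob", {"rel": "FRIEND", "since": 2019}), ("acme", {"rel": "WORKS_AT"})],
    "bob": [("carol", {"rel": "FRIEND", "since": 2021})],
    "carol": [],
    "acme": [],
}

def reachable(start, rel, max_hops):
    """Breadth-first walk that follows only edges of type `rel`, up to `max_hops`."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        current, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for target, attrs in edges.get(current, []):
            if attrs["rel"] == rel and target not in seen:
                seen.add(target)
                found.append(target)
                queue.append((target, hops + 1))
    return found

friends_of_friends = reachable("alice", "FRIEND", max_hops=2)
direct_friends = reachable("alice", "FRIEND", max_hops=1)
```

Adding a new node or edge is just another dictionary entry, which mirrors the flexibility described above.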
together similar attributes rather than using rows and store these in
separate files, which means transactions have to be carried out across
multiple files.
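The difference between row and column layouts can be sketched as follows. The sample records and the to_columns and insert helpers are assumptions, and each Python list stands in for one of the separate per-column files. The sketch assumes every record has the same fields.

```python
# Row-oriented layout: one record per entry.
rows = [
    {"id": 1, "city": "Pune", "amount": 120.0},
    {"id": 2, "city": "Oslo", "amount": 80.0},
    {"id": 3, "city": "Pune", "amount": 50.0},
]

def to_columns(records):
    """Group values by attribute: one list per column instead of one dict per row."""
    columns = {}
    for record in records:
        for name, value in record.items():
            columns.setdefault(name, []).append(value)
    return columns

def insert(columns, record):
    """Writing one record has to touch every column's storage."""
    for name, value in record.items():
        columns[name].append(value)

columns = to_columns(rows)

# An aggregate over one attribute reads a single column and ignores the rest.
total = sum(columns["amount"])

insert(columns, {"id": 4, "city": "Lima", "amount": 30.0})
lengths = {name: len(values) for name, values in columns.items()}
```

The aggregate reads only the amount column, while the insert appends to all three, which is the cost the text above refers to.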
Atomicity: Atomicity guarantees that all of the commands that make up a transaction are
treated as a single unit and either succeed or fail together. The transaction either
completes successfully or is rolled back if any part of it fails.
Consistency: Consistency guarantees that changes made within a transaction are consistent
with database constraints. This includes all rules, constraints, and triggers. If the data gets
into an illegal state, the whole transaction fails. This is why ACID-compliant database
models are used in financial institutions, data warehouses, etc.
Isolation: Isolation ensures that all transactions run in an isolated environment. That enables
running transactions concurrently because transactions don’t interfere with each other.
Durability: Durability guarantees that once the transaction completes and changes are
written to the database, they are persisted. This ensures that data within the system will
persist even in the case of system failures like crashes or power outages.
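Atomicity and consistency can be seen in a small transfer example using Python's built-in sqlite3 module. The accounts table, the CHECK constraint, and the transfer helper are assumptions for illustration. The transfer credits first and debits second, so a failed debit shows the earlier credit being rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    " name TEXT PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"  # consistency rule
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("alice", 100), ("bob", 50)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst as one unit; return False if it was rolled back."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?", (amount, dst)
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False  # the CHECK constraint rejected an overdraft

ok = transfer(conn, "alice", "bob", 30)       # succeeds
failed = transfer(conn, "alice", "bob", 500)  # debit would go negative

balances = dict(conn.execute("SELECT name, balance FROM accounts"))
```

After the failed transfer, bob's balance does not include the 500 credit, because both updates were undone together.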
BASE Properties
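The BASE acronym stands for:
Basically available: Rather than enforcing immediate consistency, BASE databases keep data
available by spreading and replicating it across the nodes of the cluster.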
Soft state: Due to the lack of immediate consistency, data values may change over time. In
the BASE model, data stores don’t have to be write-consistent, nor do different replicas have
to be mutually consistent all the time.
Eventually consistent: The fact that the BASE model does not enforce immediate
consistency does not mean that it never achieves it. However, until it does, data reads might
be inconsistent.
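Eventual consistency can be sketched with asynchronous replication: a write is acknowledged by one replica right away and reaches the others later. The EventuallyConsistentStore class, its pending queue, and the settle method are hypothetical names, and real systems deliver updates over a network rather than from an in-process queue.

```python
from collections import deque

class EventuallyConsistentStore:
    """Toy store: replica 0 accepts writes, the others catch up later."""

    def __init__(self, replica_count):
        self.replicas = [{} for _ in range(replica_count)]
        self.pending = deque()  # replication messages not yet delivered

    def write(self, key, value):
        self.replicas[0][key] = value  # acknowledged immediately
        for index in range(1, len(self.replicas)):
            self.pending.append((index, key, value))

    def read(self, key, replica):
        return self.replicas[replica].get(key)  # may be stale

    def deliver_one(self):
        """Apply the oldest outstanding replication message, if any."""
        if self.pending:
            index, key, value = self.pending.popleft()
            self.replicas[index][key] = value

    def settle(self):
        """Deliver everything outstanding, as happens once traffic quiets down."""
        while self.pending:
            self.deliver_one()

store = EventuallyConsistentStore(replica_count=3)
store.write("likes", 1)

before = [store.read("likes", r) for r in range(3)]  # replicas disagree
store.settle()
after = [store.read("likes", r) for r in range(3)]   # replicas converge
```

Reads taken before the queue drains can differ between replicas; once it drains, every replica returns the same value.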
NOTE: Many institutions now adopt BASE-compliant databases, trading immediate consistency
(ACID compliance) for availability.