Introduction to NoSQL
NoSQL (Not Only SQL) refers to non-relational databases used to manage unstructured data. These
are distributed database systems designed to work in virtual environments, providing mechanisms
for data storage and retrieval with a focus on scalability, high performance, availability, and
agility.
It was developed in response to the need to store a large volume of user-related data. NoSQL
databases are designed to scale easily and to handle products and objects that need to be
frequently accessed, updated, and changed, keeping up with the needs of the modern
industry.
Features of NoSQL databases:
1. Not Only SQL – SQL and other query languages can be used.
2. Non-relational and schema-free – No fixed structure required.
3. No JOINs – Avoids complex join operations.
4. Distributed architecture – Runs on multiple processors/nodes.
5. Horizontally scalable – Add more machines instead of upgrading one.
6. Open-source options – Many available for free.
7. Easy data replication – For better performance and backup.
8. Simple API usage – Easy to implement.
9. Handles huge volumes of data – Efficient at big data processing.
10. Can be run on commodity hardware – Follows shared nothing concept.
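As a rough illustration of points 4, 5, and 10 in the list above, the sketch below spreads keys across independent nodes by hashing each key. The node names and the node_for helper are hypothetical and are not taken from any particular NoSQL product.

```python
import hashlib


def node_for(key: str, nodes: list[str]) -> str:
    """Pick the node responsible for a key by hashing the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


# Hypothetical commodity machines; each stores only its own share of the data.
nodes = ["node-a", "node-b", "node-c"]
keys = ["user:1", "user:2", "user:3", "user:4"]
placement = {key: node_for(key, nodes) for key in keys}

for key, node in placement.items():
    print(key, "->", node)
```

With plain modulo hashing, adding a node changes the placement of most keys. Real systems often use consistent hashing to limit that movement, which this sketch does not attempt.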
Why NoSQL? :
A traditional database model is not suitable for all types of applications, especially those that require:
High performance
Flexible structure
Scalability
Capability to handle dynamic data
Although NoSQL databases may not provide full ACID (Atomicity, Consistency, Isolation, Durability)
properties, they typically follow the BASE model:
Basically Available
Soft State
Eventually Consistent
The CAP theorem states that a distributed system cannot guarantee all three of the following at the
same time:
Consistency – All nodes show the same data at the same time.
Availability – Every request gets a response (success or failure).
Partition Tolerance – System continues working despite network failure.
Basically Available – System responds to every request, even if the data is not consistent.
Soft State – System state can change over time even without input (due to eventual
consistency).
Eventually Consistent – All changes will eventually reflect across all nodes, but not
immediately.
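The three BASE terms can be seen in a toy model with one primary copy and one replica. This is only a sketch: the pending queue and the put and sync functions are made-up names, and real systems replicate over a network rather than through an in-memory queue.

```python
from collections import deque

primary = {}
replica = {}
pending = deque()  # updates accepted by the primary but not yet applied to the replica


def put(key, value):
    """Accept the write immediately (basically available)."""
    primary[key] = value
    pending.append((key, value))


def sync():
    """Apply queued updates; the replica changes without any new client input (soft state)."""
    while pending:
        key, value = pending.popleft()
        replica[key] = value


put("cart", ["book"])
stale_read = replica.get("cart")  # the replica has not caught up yet
sync()
fresh_read = replica.get("cart")  # after syncing, both copies agree (eventually consistent)
```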
CAP trade-offs, illustrated with two servers (A and B):
If data is consistent and available with no partition, then data is replicated and available in
both servers (A and B).
If data is available and partitioned, then it's not consistent. Example: Server A has new
data, B has old.
If data is consistent and partitioned, then it may not be available (B is waiting for update
from A).
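The three cases above can be simulated with two in-memory replicas. This is a simplified sketch: the Replica class, the write function, and the "AP"/"CP" mode flag are illustrative assumptions, and the unavailable case is modelled here as a rejected write rather than a server waiting for an update.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


def write(key, value, primary, secondary, partitioned, mode):
    """Write to the primary and replicate unless the network link is down.

    mode="AP": accept the write even if it cannot be replicated.
    mode="CP": refuse the write when replication is impossible.
    """
    if partitioned and mode == "CP":
        return False  # stay consistent, give up availability
    primary.data[key] = value
    if not partitioned:
        secondary.data[key] = value
    return True


a, b = Replica("A"), Replica("B")

# No partition: the data is replicated and available on both servers.
write("x", 1, a, b, partitioned=False, mode="AP")
no_partition = (a.data["x"], b.data["x"])

# Partition, availability preferred: A has new data, B has old data.
write("x", 2, a, b, partitioned=True, mode="AP")
ap_result = (a.data["x"], b.data["x"])

# Partition, consistency preferred: the write is not accepted.
cp_accepted = write("x", 3, a, b, partitioned=True, mode="CP")
```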
There are around 150 NoSQL databases in the market. Some popular NoSQL databases and related big-data technologies include:
Google BigTable
Apache Hadoop
MapReduce
SimpleDB
MemcacheDB
1. Volume :
Organizations now generate huge volumes of data. An RDBMS often struggles at this scale because
it is limited by the performance of a single CPU. When dealing with large datasets, distributed
processing using clusters of commodity (low-cost) machines becomes necessary.
Apache Hadoop
HDFS
MapR
HBase
These systems break large data into smaller chunks and process them in parallel.
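The idea of splitting data into chunks and processing them in parallel can be sketched on a single machine. In this example, worker threads stand in for cluster nodes, and the count_words function and sample lines are made up for illustration. It is not how Hadoop or HBase is actually programmed.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def count_words(chunk):
    """Count words in one chunk of lines (the per-chunk work)."""
    return Counter(word for line in chunk for word in line.split())


lines = ["big data big", "data cluster", "big cluster node", "node"]

# Break the data into smaller chunks of two lines each.
chunk_size = 2
chunks = [lines[i:i + chunk_size] for i in range(0, len(lines), chunk_size)]

# Process the chunks in parallel; each worker sees only its own chunk.
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_counts = list(pool.map(count_words, chunks))

# Combine the partial results into one total.
total = sum(partial_counts, Counter())
```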
2. Velocity :
For example:
NoSQL systems handle these high-speed real-time operations efficiently and ensure low
response time, even during heavy traffic.
3. Variability :
Data often comes in different formats and structures. In RDBMS, changing the schema (table
design) for new data fields is difficult and can affect the entire system.
Example: If you want to store a special field for a few customers, you need to change the
entire table schema. This creates a sparse matrix (empty fields for others) and affects
performance.
NoSQL systems offer schema-less models, allowing storage of different kinds of data without
any rigid structure.
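The special-field example above can be made concrete with a small sketch. The customer records and the loyalty_tier field are invented for illustration. The first part stores records as schema-less documents, and the second part shows the sparse table that results when the same data is forced into fixed columns.

```python
# Schema-less storage: each record carries only the fields it needs.
customers = [
    {"id": 1, "name": "Asha"},
    {"id": 2, "name": "Ravi", "loyalty_tier": "gold"},  # extra field for this customer only
]

# Fixed-schema view: every row must have every column, so missing
# values become empty cells (None), which is the sparse matrix problem.
columns = sorted({field for doc in customers for field in doc})
rows = [[doc.get(col) for col in columns] for doc in customers]

print(columns)
for row in rows:
    print(row)
```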
4. Agility :
Handling complex queries in RDBMS requires multiple nested queries and object-relational
mapping (ORM) layers, using frameworks such as Hibernate in Java. This slows down development
and updates.
1. 24x7 Availability
2. Location Transparency
Read/write data from any location without knowing the physical location of the node
Data is synchronized across regions
Ensures fast local access and global availability
Scalability
Data distribution
Continuous availability
Support for multi-data centers
Key-Value Store
Column Store
Document Store
Graph Store
1. Key-Value Store
A key-value store stores data as a pair of key and value, just like a dictionary.
How it works:
Operation                   Description
Get(key)                    Retrieves value using the key
Put(key, value)             Stores or updates value with the key
Multi-Get(key1, key2...)    Retrieves multiple values
Delete(key)                 Deletes the value for the key
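The four operations in the table can be sketched with an in-memory dictionary. The KeyValueStore class and its method names are illustrative only and do not correspond to the API of any specific product.

```python
class KeyValueStore:
    """A minimal in-memory key-value store."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        """Store or update the value for the key."""
        self._data[key] = value

    def get(self, key):
        """Retrieve the value for the key, or None if the key is absent."""
        return self._data.get(key)

    def multi_get(self, *keys):
        """Retrieve the values for several keys, in the order given."""
        return [self._data.get(key) for key in keys]

    def delete(self, key):
        """Delete the value for the key; a missing key is ignored."""
        self._data.pop(key, None)


store = KeyValueStore()
store.put("session:42", {"user": "asha"})
store.put("session:43", {"user": "ravi"})
both = store.multi_get("session:42", "session:43")
store.delete("session:42")
```

The store treats each value as an opaque object: it can look a value up by key but cannot search inside it.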
Rules:
Weaknesses:
Use Cases:
Caching
Session storage
Image stores
Dictionaries (word-definition pairs)
2. Column Store
A column store stores data in columns instead of rows. It is well suited to large and sparse datasets.
Key Concepts:
Structure Format:
<Row Key, Column Family, Column Name, Timestamp> : Value
Use Cases:
Analytics
Time-series data
IoT (Internet of Things) systems
Social media posts
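The <Row Key, Column Family, Column Name, Timestamp> : Value format can be modelled with nested dictionaries. This is a sketch under stated assumptions: the put and get_latest helpers and the sensor data are hypothetical, and real column stores keep this structure on disk rather than in Python dictionaries.

```python
from collections import defaultdict

# table[row_key][column_family][column_name] -> list of (timestamp, value)
table = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))


def put(row_key, family, column, timestamp, value):
    """Add a new timestamped version of a cell."""
    table[row_key][family][column].append((timestamp, value))


def get_latest(row_key, family, column):
    """Return the value with the highest timestamp, or None if the cell is empty."""
    versions = table.get(row_key, {}).get(family, {}).get(column, [])
    if not versions:
        return None
    return max(versions, key=lambda version: version[0])[1]


put("sensor-1", "readings", "temp", 100, 21.5)
put("sensor-1", "readings", "temp", 200, 22.0)
put("sensor-1", "meta", "location", 100, "lab")
put("sensor-2", "readings", "temp", 150, 19.0)  # sensor-2 has no "meta" columns at all

latest_temp = get_latest("sensor-1", "readings", "temp")
missing = get_latest("sensor-2", "meta", "location")
```

Cells that were never written take no space, which is why this layout suits sparse data.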
3. Document Store
A document store is like a smart key-value store, where the value is a document (usually in
JSON or XML format).
Features:
Documents are semi-structured and self-describing.
Each document has a unique key (ID).
Properties inside the document can be indexed for fast search.
Can store nested data (tree structure) directly.
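These features can be illustrated with a toy document store that indexes every property, including nested ones, by its path. The insert, find, and flatten helpers are hypothetical names, and the sketch assumes that leaf values are simple hashable values such as strings or numbers. Actual products differ in which fields they index by default.

```python
documents = {}  # document ID -> document
index = {}      # (field path, value) -> set of document IDs


def flatten(doc, prefix=""):
    """Yield (path, value) pairs, following nested dictionaries."""
    for field, value in doc.items():
        path = f"{prefix}{field}"
        if isinstance(value, dict):
            yield from flatten(value, path + ".")
        else:
            yield path, value


def insert(doc_id, doc):
    """Store the document under its unique ID and index each property."""
    documents[doc_id] = doc
    for path, value in flatten(doc):
        index.setdefault((path, value), set()).add(doc_id)


def find(path, value):
    """Look up documents through the index instead of scanning them all."""
    return [documents[doc_id] for doc_id in sorted(index.get((path, value), set()))]


insert("u1", {"name": "Asha", "address": {"city": "Pune", "zip": "411001"}})
insert("u2", {"name": "Ravi", "address": {"city": "Delhi"}})

in_pune = find("address.city", "Pune")
```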
How it works:
Use Cases:
4. Graph Store
A graph store uses nodes and relationships to represent and store data.
It is based on graph theory.
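A small sketch of the node-and-relationship model follows. The people, the relationship names, and the neighbors and reachable helpers are invented for illustration. Graph databases provide their own query languages for this kind of traversal.

```python
from collections import deque

# Nodes with properties, and relationships stored as (source, relation, target).
nodes = {
    "alice": {"type": "person"},
    "bob": {"type": "person"},
    "carol": {"type": "person"},
    "acme": {"type": "company"},
}
edges = [
    ("alice", "FRIEND_OF", "bob"),
    ("bob", "FRIEND_OF", "carol"),
    ("carol", "WORKS_AT", "acme"),
]


def neighbors(node, relation):
    """Nodes directly connected to `node` by the given relationship."""
    return [dst for src, rel, dst in edges if src == node and rel == relation]


def reachable(start, relation):
    """Breadth-first traversal following one relationship type."""
    seen, queue = [], deque([start])
    while queue:
        current = queue.popleft()
        for nxt in neighbors(current, relation):
            if nxt != start and nxt not in seen:
                seen.append(nxt)
                queue.append(nxt)
    return seen


friends_of_friends = reachable("alice", "FRIEND_OF")
```

Answering "who is connected to whom" is a traversal over relationships, not a series of table joins.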
Structure:
Key Benefits:
Use Cases:
NoSQL databases are schema-less, distributed, and horizontally scalable. But based on system
needs, the architectural design can vary. Let’s explore those variations:
1. Key-Value Store
2. Document Store
Can be used in IoT systems, where sensors push data into JSON-like documents.
4. Graph Store
1. Distributed Architecture
2. Federated Architecture
Healthcare systems
Integrate streams
Example:
Introduction to MongoDB:
Datatypes in MongoDB: