Introduction to NoSQL

NoSQL is a non-relational database designed for managing unstructured data, emphasizing scalability, performance, and agility. It addresses the limitations of traditional relational databases by supporting flexible data structures and real-time processing, making it suitable for modern applications. Key features include distributed architecture, schema-less models, and the ability to handle large volumes of data efficiently.

Uploaded by

sakinabohra0909

Introduction to NoSQL :

NoSQL (Not Only SQL) refers to a class of non-relational databases used to manage unstructured
and semi-structured data. NoSQL systems are typically distributed databases designed to work in
virtual environments, providing mechanisms for data storage and retrieval with a focus on
scalability, high performance, availability, and agility.

It was developed in response to the need to store a large volume of user-related data. NoSQL
databases are designed to scale easily and to handle products and objects that need to be
frequently accessed, updated, and changed, keeping up with the needs of the modern
industry.

Limitations of Traditional Relational Databases:

Relational databases:

 Are not designed to handle frequent changes or unstructured data.


 Do not take advantage of cheap storage and processing power from commodity hardware.
 Are less agile in handling big data and dynamic applications.

Key Features of NoSQL:

1. Not Only SQL – SQL and other query languages can be used.
2. Non-relational and schema-free – No fixed structure required.
3. No JOINs – Avoids complex join operations.
4. Distributed architecture – Runs on multiple processors/nodes.
5. Horizontally scalable – Add more machines instead of upgrading one.
6. Open-source options – Many available for free.
7. Easy data replication – For better performance and backup.
8. Simple API usage – Easy to implement.
9. Handles huge volumes of data – Efficient at big data processing.
10. Can be run on commodity hardware – Follows shared nothing concept.
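
Features 4, 5, and 10 above (distributed, horizontally scalable, shared nothing) can be illustrated with a minimal sketch of hash-based data placement. The node names and the `node_for` and `place` helpers are assumptions made for this example and do not come from any particular NoSQL product.

```python
import hashlib


def node_for(key: str, nodes: list[str]) -> str:
    """Pick the node that owns a key by hashing the key (simple modulo placement)."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def place(keys: list[str], nodes: list[str]) -> dict[str, list[str]]:
    """Group keys by the node that would store them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


keys = [f"user:{i}" for i in range(1000)]
three_nodes = place(keys, ["node-a", "node-b", "node-c"])
# Scaling out means adding a machine, not upgrading an existing one.
four_nodes = place(keys, ["node-a", "node-b", "node-c", "node-d"])

for name, held in four_nodes.items():
    print(name, len(held))
```

Each node holds only its own share of the keys, which is the shared-nothing idea. Plain modulo placement moves many keys when a node is added, so production systems commonly use consistent hashing instead.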

Why NoSQL? :

A traditional database model is not suitable for all types of applications, especially those with:

 Unstructured or unpredictable data


 Need for easy scalability
 Real-time processing

NoSQL fits this need perfectly because of its:

 High performance
 Flexible structure
 Scalability
 Capability to handle dynamic data

Although many NoSQL systems do not provide full ACID (Atomicity, Consistency, Isolation, Durability)
properties, they typically offer BASE properties instead:

 Basically Available
 Soft State
 Eventually Consistent

This is achieved through its distributed and fault-tolerant architecture.

CAP Theorem (Brewer’s Theorem) :

The CAP theorem states that a distributed system cannot guarantee all three of the following at the
same time:

 Consistency – All nodes show the same data at the same time.
 Availability – Every request gets a response (success or failure).
 Partition Tolerance – The system continues working even when network failures cut off communication between nodes.

Because network partitions cannot be ruled out in a distributed system, many NoSQL systems relax
consistency in favor of availability and partition tolerance.

BASE Transactions (Opposite of ACID) :

BASE stands for:

 Basically Available – System responds to every request, even if the data is not consistent.
 Soft State – System state can change over time even without input (due to eventual
consistency).
 Eventually Consistent – All changes will eventually reflect across all nodes, but not
immediately.

Characteristics of BASE:

 Weak consistency (stale data is okay)


 Focus on availability
 Best effort system
 Approximate answers are acceptable
 Optimistic in design
 Simpler and faster than ACID systems

BASE Case Scenarios:

 If data is consistent and available with no partition, then data is replicated and available in
both servers (A and B).
 If data is available and partitioned, then it's not consistent. Example: Server A has new
data, B has old.
 If data is consistent and partitioned, then it may not be available (B is waiting for update
from A).
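
The three scenarios above can be traced with a small simulation. This is a sketch under stated assumptions: the `Replica` and `Cluster` classes, the `mode` flag, and the rule that all writes arrive at server A are hypothetical simplifications, not the behavior of a specific database.

```python
class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Two replicas, A and B. mode is 'AP' (stay available) or 'CP' (stay consistent)."""

    def __init__(self, mode):
        self.a, self.b = Replica("A"), Replica("B")
        self.mode = mode
        self.partitioned = False
        self.pending = []  # writes that B has not received yet

    def write(self, key, value):
        # Assumption for this sketch: every write arrives at A first.
        self.a.data[key] = value
        if self.partitioned:
            self.pending.append((key, value))
        else:
            self.b.data[key] = value  # no partition: replicate immediately

    def read_from_b(self, key):
        if self.partitioned and self.mode == "CP" and self.pending:
            raise RuntimeError("B unavailable: waiting for update from A")
        return self.b.data.get(key)  # in AP mode this may be stale

    def heal(self):
        """The partition ends: B catches up, so the replicas become consistent again."""
        for key, value in self.pending:
            self.b.data[key] = value
        self.pending.clear()
        self.partitioned = False


# Available and partitioned: B answers, but with old data.
ap = Cluster("AP")
ap.write("cart", ["book"])
ap.partitioned = True
ap.write("cart", ["book", "pen"])
stale = ap.read_from_b("cart")
ap.heal()
fresh = ap.read_from_b("cart")

# Consistent and partitioned: B refuses to answer until it is updated.
cp = Cluster("CP")
cp.write("cart", ["book"])
cp.partitioned = True
cp.write("cart", ["book", "pen"])
try:
    cp.read_from_b("cart")
    cp_available = True
except RuntimeError:
    cp_available = False
```

In the AP run, `stale` holds the old cart and `fresh` holds the new one after healing, which is eventual consistency. In the CP run, the read from B fails instead of returning old data.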

Examples of NoSQL Implementations :

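There are around 150 NoSQL databases on the market. Popular NoSQL systems, together with related big-data frameworks such as Hadoop and MapReduce that are often used alongside them, include: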

 Google BigTable
 Apache Hadoop
 MapReduce
 SimpleDB
 MemcacheDB

NoSQL Business drivers :


Today’s businesses need fast, scalable, and always-available data storage systems. Traditional
relational database systems (RDBMS), which were typically designed to run on a single server,
often fail to keep up with the increasing demands of data processing, speed, and variety of data.
This is where NoSQL databases come in.

Businesses today need to:

 Handle large and variable amounts of data


 Make quick decisions based on real-time data
 Be flexible with changing data types and needs

NoSQL addresses these needs through four major business drivers:

1. Volume :

Organizations now generate huge volumes of data. RDBMS systems often struggle because they are
limited by the performance of a single machine. When dealing with large datasets, distributed
processing using clusters of commodity (low-cost) machines becomes necessary.

This has led to the development of distributed systems like:

 Apache Hadoop
 HDFS
 MapR
 HBase

These systems break large data into smaller chunks and process them in parallel.
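
The idea of breaking data into chunks and processing them in parallel can be sketched in a few lines. This is only an illustration in the style of MapReduce, not Hadoop itself: worker threads stand in for cluster machines, and the sample log lines and function names are assumptions.

```python
from collections import Counter
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(records, chunk_size):
    """Break the full dataset into smaller chunks."""
    return [records[i:i + chunk_size] for i in range(0, len(records), chunk_size)]


def count_words(chunk):
    """'Map' step: count the words inside one chunk."""
    counts = Counter()
    for line in chunk:
        counts.update(line.lower().split())
    return counts


def parallel_word_count(records, chunk_size=2, workers=4):
    chunks = split_into_chunks(records, chunk_size)
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, chunks))  # chunks handled in parallel
    total = Counter()
    for partial in partials:  # 'Reduce' step: merge the partial results
        total.update(partial)
    return total


logs = [
    "error disk full",
    "info job started",
    "error network down",
    "info job finished",
    "error disk full",
]
totals = parallel_word_count(logs)
print(totals["error"], totals["disk"])
```

Because each chunk is counted independently, the same merge step works whether the chunks are processed by threads on one machine or by separate machines in a cluster.
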

2. Velocity :

Velocity refers to the speed at which data is generated and processed.

For example:

 E-commerce websites handle thousands of reads and writes per second.


 During sales or discounts, traffic spikes slow down RDBMS systems because every write must
also update multiple indexes.

NoSQL systems handle these high-speed real-time operations efficiently and ensure low
response time, even during heavy traffic.

3. Variability :

Data often comes in different formats and structures. In RDBMS, changing the schema (table
design) for new data fields is difficult and can affect the entire system.

Example: If you want to store a special field for a few customers, you need to change the
entire table schema. This creates a sparse matrix (empty fields for others) and affects
performance.

NoSQL systems offer schema-less models, allowing storage of different kinds of data without
any rigid structure.
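
The special-field example can be made concrete with a short sketch. The customer records and the `as_fixed_table` helper are hypothetical. Plain Python dictionaries stand in for schema-less documents, and the helper emulates what a fixed table schema would force on the same data.

```python
# Schema-less records: only the customer who needs the extra field carries it.
customers = [
    {"id": 1, "name": "Asha", "email": "asha@example.com"},
    {"id": 2, "name": "Ben", "email": "ben@example.com"},
    {"id": 3, "name": "Chen", "email": "chen@example.com", "loyalty_tier": "gold"},
]


def as_fixed_table(documents):
    """Emulate a fixed schema: every row has every column, None where data is missing."""
    columns = sorted({field for doc in documents for field in doc})
    rows = [[doc.get(col) for col in columns] for doc in documents]
    return columns, rows


columns, rows = as_fixed_table(customers)
# Empty cells are the sparse-matrix cost described above.
empty_cells = sum(cell is None for row in rows for cell in row)
print(columns, empty_cells)
```

With three customers the fixed layout already carries two empty cells for `loyalty_tier`. The schema-less records carry none, and no existing record had to change when the field was introduced.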

4. Agility :

Handling complex queries in RDBMS requires multiple nested queries and an object-relational
mapping (ORM) layer, using frameworks such as Hibernate for Java. This slows down development
and updates.

NoSQL simplifies this by:

 Supporting easy data retrieval


 Reducing the need for complex SQL queries
 Adapting quickly to changes in business requirements

Key Business Features of NoSQL :

1. 24x7 Availability

 No single point of failure


 Data and functions are replicated across multiple nodes
 Even if a node fails, others continue operations without data loss
 Dynamic updates can be made without downtime

2. Location Transparency

 Read/write data from any location without knowing the physical location of the node
 Data is synchronized across regions
 Ensures fast local access and global availability

3. Schema-less Data Model

 Accepts structured, semi-structured, and unstructured data


 Handles large volumes of data efficiently
 Suitable for flexible and unpredictable data patterns
 Delivers fast performance for both read and write operations

4. Modern Transaction Analysis

 NoSQL does not require strict ACID transactions


 Consistency is chosen with the CAP theorem in mind: data can be immediately or eventually
consistent across nodes
 Suitable for customer reviews, branding, strategy planning, etc., where JOINs and foreign
keys are unnecessary

5. Architecture for Big Data

NoSQL databases support modern architectures by offering:

 Scalability
 Data distribution
 Continuous availability
 Support for multi-data centers

Big data architecture includes:

 Huge data source handling (terabytes to petabytes)


 Real-time data streaming instead of batch processing
 Storage using Hadoop, MongoDB, Cassandra, Neo4j, etc.
 Support for various compute methods (MapReduce, streaming, batch)

6. Analytics and Business Intelligence

 NoSQL enables real-time data mining and analytics


 Helps in quick decision-making
 Extracts valuable insights from high-volume, complex datasets
 Provides integrated analytics that traditional RDBMS struggle to offer

NoSQL Data architectural patterns :


NoSQL databases are designed for flexibility, scalability, and high performance. Based on the
data structure they use, there are four main types of NoSQL data stores:

Types of NoSQL Data Stores:

 Key-Value Store
 Column Store
 Document Store
 Graph Store

1. Key-Value Store

A key-value store stores data as a pair of key and value, just like a dictionary.

 The key is unique and is used to find the value.


 The value can be in formats like String, JSON, or Binary (BLOB).
 It is schema-less, meaning no fixed structure is required.

How it works:

 Internally uses a hash table to store data.


 Keys can be system-generated or custom.
 Buckets group keys logically (not physically), so same key names can exist in different
buckets.
 The real key is a combination of bucket + key.

Basic Operations (APIs):

Operation                   Description
Get(key)                    Retrieves the value using the key
Put(key, value)             Stores or updates the value for the key
Multi-Get(key1, key2...)    Retrieves multiple values
Delete(key)                 Deletes the value for the key

Rules:

1. Distinct Keys: All keys must be unique.


2. No Queries on Values: You cannot search within values
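
The operations and rules above can be sketched as a tiny in-memory store. This is a minimal illustration, assuming a Python dictionary as the hash table. The `KeyValueStore` class and its method names are hypothetical and mirror the operation table rather than any real product's API.

```python
class KeyValueStore:
    def __init__(self):
        # Hash table keyed by (bucket, key): the real key is bucket + key.
        self._table = {}

    def put(self, bucket, key, value):
        """Store a value, or overwrite the existing value for this key."""
        self._table[(bucket, key)] = value

    def get(self, bucket, key):
        """Return the value for the key, or None if it is absent."""
        return self._table.get((bucket, key))

    def multi_get(self, bucket, *keys):
        """Return the values for several keys, in the order requested."""
        return [self.get(bucket, key) for key in keys]

    def delete(self, bucket, key):
        """Remove the key and its value if present."""
        self._table.pop((bucket, key), None)


store = KeyValueStore()
store.put("sessions", "u42", b'{"cart": 3}')
store.put("images", "u42", b"\x89PNG")        # same key name, different bucket
store.put("sessions", "u42", b'{"cart": 4}')  # replaces the whole value
store.put("sessions", "u43", b'{"cart": 0}')

print(store.get("sessions", "u42"))
print(store.multi_get("sessions", "u42", "u43"))
```

The store never looks inside a value. Changing the cart count meant writing a complete new value, and there is no way to ask for all sessions with a given cart size.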

Weaknesses:

 No partial updates: The value is opaque to the store, so part of a value cannot be updated on its own.


 No querying: Cannot search based on value.
 As data grows, performance can become difficult to manage.

Use Cases:

 Caching
 Session storage
 Image stores
 Dictionaries (word-definition pairs)

2. Column Store / Wide Column Store

A column store organizes data by columns and column families rather than by fixed rows. It is well suited to large, sparse datasets.

Key Concepts:

 A row key and column name together identify the cell.


 Data is grouped in Column Families, which are like categories of related columns
 Each cell stores data with a timestamp for versioning.
 Very fast for reading data from specific columns.

Structure Format:
<Row Key, Column Family, Column Name, Timestamp> : Value

How it differs from Key-Value:

 Supports grouping of columns.


 Allows fast reading of selected columns.
 Used in analytical systems (OLAP).
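
The cell addressing shown under Structure Format, and the fast reads of selected columns, can be sketched with nested dictionaries. The `WideColumnTable` class, its method names, and the sample rows are assumptions made for illustration. Real column stores lay the data out on disk very differently.

```python
from collections import defaultdict


class WideColumnTable:
    def __init__(self):
        # (row key, column family, column name) -> {timestamp: value}
        self._cells = defaultdict(dict)

    def put(self, row_key, family, column, value, timestamp):
        """Write one version of a cell."""
        self._cells[(row_key, family, column)][timestamp] = value

    def get(self, row_key, family, column, timestamp=None):
        """Read a cell: the latest version by default, or a specific timestamp."""
        versions = self._cells.get((row_key, family, column), {})
        if not versions:
            return None
        if timestamp is None:
            timestamp = max(versions)
        return versions.get(timestamp)

    def read_column(self, family, column):
        """Latest value of one column for every row that has it."""
        return {
            row: versions[max(versions)]
            for (row, fam, col), versions in self._cells.items()
            if fam == family and col == column
        }


table = WideColumnTable()
table.put("user1", "profile", "city", "Pune", timestamp=1)
table.put("user1", "profile", "city", "Mumbai", timestamp=2)  # newer version
table.put("user2", "profile", "city", "Delhi", timestamp=1)
table.put("user2", "activity", "last_login", "2024-01-01", timestamp=1)

print(table.get("user1", "profile", "city"))
print(table.read_column("profile", "city"))
```

Rows do not need the same columns: only `user2` has a `last_login` cell, and `user1` keeps both versions of `city` under different timestamps.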

Cassandra Data Model Highlights:

 Keyspace: Like a database for one application.


 Column Family: Stores data related to a specific topic.
 Row Key: Unique identifier for each row.
 Columns can be added dynamically.

Use Cases:

 Analytics
 Time-series data
 IoT (Internet of Things) systems
 Social media posts

3. Document Store

A document store is like a smart key-value store, where the value is a document (usually in
JSON or XML format).

Features:
 Documents are semi-structured and self-describing.
 Each document has a unique key (ID).
 Properties inside the document can be indexed for fast search.
 Can store nested data (tree structure) directly.

How it works:

1. You can search by any field inside the document.


2. Uses Document Path to access specific nested values.

Example Path: Employee[id='2300']/Address/street/BuildingName
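
Searching by a field and following a document path can be sketched with nested Python dictionaries standing in for documents. The employee records, the `/`-separated path syntax, and the `resolve_path` and `find` helpers are assumptions made for this example. Real document stores have their own query languages and use indexes rather than a scan.

```python
employees = [
    {
        "id": "2300",
        "name": "R. Mehta",
        "Address": {
            "city": "Pune",
            "street": {"name": "MG Road", "BuildingName": "Orchid Towers"},
        },
    },
    {
        "id": "2301",
        "name": "S. Khan",
        "Address": {
            "city": "Delhi",
            "street": {"name": "Ring Road", "BuildingName": "Lotus House"},
        },
    },
]


def resolve_path(document, path):
    """Follow a '/'-separated path into nested dictionaries; None if a step is missing."""
    current = document
    for step in path.split("/"):
        if not isinstance(current, dict) or step not in current:
            return None
        current = current[step]
    return current


def find(documents, field_path, value):
    """Return documents whose value at field_path equals value (linear scan, no index)."""
    return [doc for doc in documents if resolve_path(doc, field_path) == value]


employee = find(employees, "id", "2300")[0]
building = resolve_path(employee, "Address/street/BuildingName")
print(building)
```

The same `find` helper works on a nested field, for example `find(employees, "Address/city", "Delhi")`, which a plain key-value store cannot do because it never looks inside the value.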

Advantages Over Key-Value Store:

 Allows searching inside documents.


 Supports complex data and hierarchies.
 Supports queries on values.

Use Cases:

 Content management systems


 User profiles
 Ad services (for example, MongoDB-backed systems serving real-time ads to millions of users)
 Real-time analytics

4. Graph Store

A graph store uses nodes and relationships to represent and store data.
It is based on graph theory.

Structure:

 Nodes: Entities (e.g., person, product)


 Relationships: Connections between nodes (e.g., follows, friend)
 Properties: Data stored inside nodes or relationships (key-value pairs)

Key Benefits:

 Great for storing and exploring complex relationships.


 No need for complex joins like in RDBMS.
 Fast traversal between connected nodes.
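
Nodes, relationships, properties, and traversal can be sketched with a small adjacency-list structure. The `Graph` class, its method names, and the sample people are hypothetical and written only for this illustration. Graph databases expose their own query languages rather than this API.

```python
from collections import deque


class Graph:
    def __init__(self):
        self.nodes = {}  # node id -> properties
        self.edges = {}  # node id -> list of (relationship type, target id, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.edges.setdefault(node_id, [])

    def relate(self, source, rel_type, target, **properties):
        """Add a directed relationship with optional key-value properties."""
        self.edges[source].append((rel_type, target, properties))

    def reachable(self, start, rel_type, max_hops):
        """Breadth-first traversal along one relationship type, up to max_hops away."""
        seen, queue, found = {start}, deque([(start, 0)]), []
        while queue:
            node, hops = queue.popleft()
            if hops == max_hops:
                continue
            for kind, target, _ in self.edges[node]:
                if kind == rel_type and target not in seen:
                    seen.add(target)
                    found.append(target)
                    queue.append((target, hops + 1))
        return found


g = Graph()
for person in ("alice", "bob", "carol", "dave"):
    g.add_node(person, kind="person")
g.relate("alice", "follows", "bob", since=2021)
g.relate("bob", "follows", "carol", since=2022)
g.relate("carol", "follows", "dave", since=2023)

suggestions = g.reachable("alice", "follows", max_hops=2)
print(suggestions)
```

The traversal walks directly from node to node through stored relationships. In a relational design, the same two-hop question would need the follows table joined to itself.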

Use Cases:

 Social networks (Facebook, LinkedIn)


 Recommendation systems
 Fraud detection
 Media platforms (YouTube, Flickr)

Variations of NoSQL architectural patterns :


A NoSQL architectural pattern refers to how a NoSQL database system is structured or
designed to store, manage, and retrieve data efficiently — especially for big data, distributed
systems, and real-time applications.

NoSQL databases are schema-less, distributed, and horizontally scalable. But based on system
needs, the architectural design can vary. Let’s explore those variations:

Major NoSQL Data Models (Core Patterns):

1. Key-Value Store

 Stores data as a pair: Key → Value


 Example: Redis, Riak
 Variation:

Can be distributed across multiple servers for scalability.

Federated architecture allows multiple independent key-value databases to work together.

2. Document Store

 Stores semi-structured data like JSON, XML.


 Example: MongoDB, CouchDB
 Variation:

Can be used in IoT systems, where sensors push data into JSON-like documents.

Data can be temporarily stored or permanently archived.


3. Column Family Store

 Stores data in columns instead of rows.


 Example: Apache Cassandra, HBase
 Variation:

Hash table + content-addressable network to improve distribution and data lookup.

Scalable distributed architecture using shared-nothing design and load balancers.

4. Graph Store

 Stores entities as nodes and relationships as edges.


 Example: Neo4j, Amazon Neptune
 Variation:

Often used in social networks or enterprise collaboration platforms.

Architectural Variations Based on Implementation Style:

1. Distributed Architecture

 Data is split and stored on multiple servers at different locations.


 Benefits: High availability, fault tolerance, scalability.
 Used in:

Global-scale apps (Netflix, Facebook)

Content delivery platforms

2. Federated Architecture

 Manages independent and heterogeneous databases across various sites.


 Each database is autonomous but can work together as one logical system.
 Used in:

Healthcare systems

Academic research platforms
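
The federated idea of autonomous databases acting as one logical system can be sketched as a thin lookup layer over independent stores. The `FederatedStore` class, the hospital names, and the patient records are hypothetical. Plain dictionaries stand in for the member databases, which in practice would be heterogeneous systems with their own interfaces.

```python
class FederatedStore:
    """Presents several autonomous stores as one logical, read-only view."""

    def __init__(self, members):
        self.members = members  # member name -> independent dict-like store

    def get(self, key):
        """Return (member name, value) from the first member that holds the key."""
        for name, store in self.members.items():
            if key in store:
                return name, store[key]
        return None

    def find_all(self, key):
        """Collect the key's value from every member that holds it."""
        return {name: store[key] for name, store in self.members.items() if key in store}


# Each member is managed on its own; the federation only reads from them.
hospital_a = {"patient:17": {"blood_type": "O+"}}
hospital_b = {
    "patient:17": {"allergies": ["penicillin"]},
    "patient:52": {"blood_type": "A-"},
}
federation = FederatedStore({"hospital_a": hospital_a, "hospital_b": hospital_b})

print(federation.get("patient:52"))
print(federation.find_all("patient:17"))
```

The member stores stay separate and keep their own data. Only the lookup layer knows that more than one of them exists.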


IoT-Centric NoSQL Architecture

With the rise of Internet of Things (IoT):

 Data from multiple sensors needs to be processed as a single stream.


 Middleware (software between database and app) helps:

Integrate streams

Temporarily store or archive data

Enable real-time querying

 Example:

Using a document store to store sensor readings as JSON

Using Pub/Sub model (EventJava) for live updates
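
The flow described above, with sensor readings stored as JSON documents and pushed to subscribers for live updates, can be sketched with a small in-process broker. This sketch does not use EventJava. The `Broker` class, the topic name, the temperature threshold, and the sample readings are all assumptions, and a Python list stands in for the document store.

```python
import json
from collections import defaultdict


class Broker:
    """Minimal in-process publish/subscribe hub (middleware stand-in)."""

    def __init__(self):
        self._subscribers = defaultdict(list)

    def subscribe(self, topic, handler):
        self._subscribers[topic].append(handler)

    def publish(self, topic, message):
        for handler in self._subscribers[topic]:
            handler(message)


archive = []      # stands in for a document store collection
live_alerts = []  # stands in for a real-time consumer


def store_reading(raw):
    """Archive each reading as a parsed JSON document."""
    archive.append(json.loads(raw))


def alert_if_hot(raw):
    """Live update: flag sensors above a hypothetical 40 degree threshold."""
    reading = json.loads(raw)
    if reading["temperature_c"] > 40:
        live_alerts.append(reading["sensor_id"])


broker = Broker()
broker.subscribe("sensors", store_reading)
broker.subscribe("sensors", alert_if_hot)

# Readings from several sensors arrive as one stream of JSON messages.
for sensor_id, temp in [("s1", 22.5), ("s2", 41.0), ("s1", 23.0)]:
    broker.publish("sensors", json.dumps({"sensor_id": sensor_id, "temperature_c": temp}))

print(len(archive), live_alerts)
```

Publishers and subscribers never reference each other directly, so new consumers, such as a dashboard, can be added without changing the sensors.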

Scalable and Flexible NoSQL Patterns:

System Requirement-Based Variations:


Using NoSQL to manage Big data:

Introduction to MongoDB:

Datatypes in MongoDB:

MongoDB Query language:
