
Unit 1

Uploaded by

senaaus000
Copyright
© © All Rights Reserved

Ramaiah Institute of Technology

(Autonomous Institute, Affiliated to VTU)


Department of AIML

Course Name : NoSQL Data Bases


Course Code : AIE734
Credits : 3:0:0
UNIT 1
Introduction to NoSQL concepts

What is NoSQL?
 The term NoSQL originally referred to “non-SQL” or “non-relational” databases, but the term has since evolved
to mean “not only SQL,” as NoSQL databases have expanded to include a wide range of different database
architectures and data models.
 NoSQL is a type of database management system (DBMS) that is designed to handle and store large volumes of
unstructured and semi-structured data.
 Unlike traditional relational databases that use tables with pre-defined schemas to store data, NoSQL
databases use flexible data models that can adapt to changes in data structures and are capable of scaling
horizontally to handle growing amounts of data.
 A database is one of the most important components of any technology stack or application.
 Data is traditionally stored in a specific, predefined structure.
 However, data does not always arrive in a structured format.
 NoSQL databases are popular because they handle such data while offering high scalability and performance.
Types of Databases
• Centralized database
• Cloud database
• Commercial database
• Distributed database
• End-user database
• Object-oriented database
• NoSQL database
• Open-source database
• Operational database
• Personal database
• Relational database
These are a few of the main types of databases.
The one we have used most is the relational database, which is queried with SQL; MySQL is a well-known example.

Why do we call it a relational database?

Because the data is stored in one or more related tables, and we fetch it by connecting (joining) those tables.
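
To make the idea of connecting tables concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table names (students, marks) and the sample rows are hypothetical and were invented only for this illustration.

```python
import sqlite3

# Hypothetical tables and sample rows, invented for illustration only.
conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE marks (student_id INTEGER, subject TEXT, score INTEGER);
    INSERT INTO students VALUES (1, 'Asha');
    INSERT INTO students VALUES (2, 'Ravi');
    INSERT INTO marks VALUES (1, 'NoSQL', 88);
    INSERT INTO marks VALUES (2, 'NoSQL', 75);
    """
)

# The JOIN connects the two tables through the shared student id.
joined_rows = conn.execute(
    """
    SELECT s.name, m.subject, m.score
    FROM students AS s
    JOIN marks AS m ON m.student_id = s.id
    ORDER BY s.id
    """
).fetchall()

for name, subject, score in joined_rows:
    print(name, subject, score)
```

The relation between the two tables lives in the shared id column, which is what the JOIN follows.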
Centralized database

• There are multiple systems (clients) but only one server.
• All the clients connect to that single server.
• If the database has a problem or error, every client connected to it is affected as well.

Distributed database


• There are multiple systems (clients) and multiple databases.
• Each database is connected on a one-to-one basis.
• If any one database goes down, we can still connect to the main database, so the application
stays up and the server does not go down.
Scale Up

• Scale up means there is a single node, and all upgrades are made to that one node.
• For example: adding more CPU or RAM.
• It is also called vertical scaling.

Scale out
• It is the opposite of scale up.
• Instead of upgrading one node, multiple nodes (servers) are added to share the load.
• It is also called horizontal scaling.

Fig: Scale up and Scale out


NoSQL databases are generally classified into four main categories
• Document databases
• Key-value stores
• Column-family stores
• Graph databases

Document databases: These databases store data as semi-structured documents, such as JSON or XML,
and can be queried using document-oriented query languages.

Key-value stores: These databases store data as key-value pairs, and are optimized for simple and fast
read/write operations.

Column-family stores: These databases store data as column families, which are sets of columns that
are treated as a single entity. They are optimized for fast and efficient querying of large amounts of data.

Graph databases: These databases store data as nodes and edges, and are designed to handle complex
relationships between data.
Advantages of NoSQL

• High scalability
• Flexibility
• High availability
• Performance
• Cost-effectiveness
• Agility

Disadvantages of NoSQL

• Lack of standardization
• Lack of ACID compliance
• Open-source
• Lack of support for complex queries
• Lack of maturity
• Management challenges
• GUI tools are not widely available
• Backup is a weak point in some systems
• Large document size
Data Base Revolutions
1. First Generation
2. Second Generation
3. Third Generation

First Generation
 The first generation of the database revolution is the relational database.
 Data is organized in rows and columns and stored in tables.
 It is based on the mathematical concept of set theory and uses Structured Query Language (SQL).
 The relational model uses a collection of tables to represent both data and the relationships among those data.
 Each table has a unique name and multiple columns.
 Tables are also known as relations.

Advantages:
• Data is easy to organize.
• Querying is straightforward.
• It can be used to enforce data integrity.
Disadvantages:
• The relational model does not scale well to very large databases.
• It can become difficult to trace the relationships between tables.

Second Generation
 The second generation of the database revolution is the object-oriented database.
 Data is organized into objects with attributes and methods.
 This is based on the concepts of object-oriented programming.
 Object Query Language (OQL) is used for accessing and manipulating data.
 An object-oriented database is a database that stores data as objects.
 Objects are similar to files in a file system, where each object contains a collection of information.

Advantages:
• More flexible than relational databases.
• It can represent more complicated relationships between data.
• It can be easier to work with object-oriented programming languages.
Third Generation

 The third generation of the database revolution is the NoSQL database.
 A NoSQL database organizes data as key-value pairs, documents, columns, or graphs.
 Data in NoSQL is not restricted to tables or objects; it can be stored in a variety of formats.
 A NoSQL database is a type of database that does not use the traditional table structure.

Advantages
• It can be more scalable than relational databases.
• It can be more suitable for working with large amounts of data.
• It can be more flexible in terms of schema.

Key-Value Pair

 The key-value database is the simplest type of NoSQL database and is often used for storing simple data such as
configuration settings.
 It stores data as key-value pairs, each consisting of a key and a value.
 The key uniquely identifies the value stored in the database and is used to look it up.
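
The key-value idea can be sketched with a plain Python dictionary. The class name KeyValueStore and the sample settings are hypothetical; a real key-value database adds persistence, networking and replication on top of this basic interface.

```python
class KeyValueStore:
    """A toy in-memory key-value store backed by a dict (illustration only)."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Storing under an existing key overwrites the old value.
        self._data[key] = value

    def get(self, key, default=None):
        # Look the value up by its key; return the default if it is missing.
        return self._data.get(key, default)

    def delete(self, key):
        # Deleting a missing key is silently ignored.
        self._data.pop(key, None)


settings = KeyValueStore()
settings.put("theme", "dark")
settings.put("max_connections", 50)
print(settings.get("theme"))
settings.delete("theme")
print(settings.get("theme", "light"))
```

Every operation goes through the key, which is why this model is fast for simple reads and writes but offers no way to query by value.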
Document Data Base

 The document database is more complex than a key-value store and is used for storing semi-structured or unstructured data.
 It stores data in documents.
 Documents are similar to files in a file system: each document contains a collection of information.
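
As a rough illustration, documents can be pictured as JSON-like records that do not all need the same fields. The students collection and the find helper below are hypothetical and do not represent the API of any particular document database.

```python
import json

# Two documents in one collection; the second has an extra "email" field.
students = [
    {"_id": 1, "name": "Asha", "courses": ["NoSQL", "ML"]},
    {"_id": 2, "name": "Ravi", "courses": ["NoSQL"], "email": "ravi@example.com"},
]


def find(collection, **criteria):
    """Return the documents whose fields equal all the given criteria."""
    return [
        doc
        for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


matches = find(students, name="Ravi")
print(json.dumps(matches, indent=2))
```

Unlike a key-value store, the query here looks inside the stored value, which is what document-oriented query languages provide.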

Column Data Base


 The column database stores data organized into columns rather than rows.
 Columns are similar to fields in a database table: each column holds the values of one field for many records.
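
The difference between row and column layouts can be sketched in plain Python. The sample records are hypothetical, and the helper assumes every record has the same fields; real column stores add compression, column families and disk layout on top of this idea.

```python
# Row-oriented layout: one dict per record.
rows = [
    {"id": 1, "city": "Bengaluru", "sales": 120},
    {"id": 2, "city": "Mysuru", "sales": 80},
    {"id": 3, "city": "Bengaluru", "sales": 200},
]


def to_columns(records):
    """Regroup records so that each field's values are stored together."""
    columns = {}
    for record in records:
        for field, value in record.items():
            columns.setdefault(field, []).append(value)
    return columns


columns = to_columns(rows)

# An aggregate over one field only needs to touch that one column.
total_sales = sum(columns["sales"])
print(columns["sales"], total_sales)
```

Reading a single column without loading the other fields is the reason this layout suits queries over large amounts of data.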

Graph Data Base


 The graph database is the most complex type of NoSQL database and is used for storing data organized into relationships.
 A graph database is a database that stores data in a graph.
 A graph is a collection of nodes and edges, where each node represents an entity, and each edge represents
a relationship between two entities.
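
Nodes and edges can be sketched with an adjacency list and a breadth-first traversal. The follows graph and the function reachable_within are hypothetical names used only for this illustration.

```python
from collections import deque

# Each key is a node; each list holds the nodes its outgoing edges point to.
follows = {
    "asha": ["ravi", "meena"],
    "ravi": ["meena"],
    "meena": ["kiran"],
    "kiran": [],
}


def reachable_within(graph, start, max_hops):
    """Return the nodes reachable from start in at most max_hops edges."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue  # do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                found.append(neighbour)
                queue.append((neighbour, hops + 1))
    return found


print(reachable_within(follows, "asha", 1))
print(reachable_within(follows, "asha", 2))
```

Following edges hop by hop is the kind of relationship query that graph databases are designed to make cheap.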
Managing Transactions and Data Integrity

 Managing transactions and ensuring data integrity in a NoSQL database can be quite different
from doing so in a traditional relational database.
 Examples of NoSQL databases include MongoDB, Cassandra and Couchbase.
 For example: a banking system, where transactions and integrity are critical.

Key concepts in NoSQL transaction and data integrity management

1. Eventual consistency
2. ACID Transactions in NoSQL
3. Single Document Transactions
4. Data Integrity

Eventual consistency

• Eventual consistency means that when you change data, the change might not show up on every replica right away.
• We need to understand the limitations of eventual consistency and design our application around them.
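
The behaviour described above can be simulated in a few lines. This is a toy model under stated assumptions: the class EventuallyConsistentStore and its sync method are hypothetical, and replication is triggered by hand instead of happening in the background over a network.

```python
class EventuallyConsistentStore:
    """Toy model: writes go to a primary and reach replicas only on sync()."""

    def __init__(self, replica_count=2):
        self.primary = {}
        self.replicas = [{} for _ in range(replica_count)]
        self._pending = []  # writes not yet copied to the replicas

    def write(self, key, value):
        self.primary[key] = value
        self._pending.append((key, value))

    def read_from_replica(self, index, key):
        return self.replicas[index].get(key)

    def sync(self):
        # Apply the pending writes to every replica, in write order.
        for key, value in self._pending:
            for replica in self.replicas:
                replica[key] = value
        self._pending.clear()


store = EventuallyConsistentStore()
store.write("balance", 400)
stale_read = store.read_from_replica(0, "balance")  # replica has not caught up
store.sync()
fresh_read = store.read_from_replica(0, "balance")  # replica now agrees
print(stale_read, fresh_read)
```

An application built on such a store has to tolerate the stale read, for example by reading from the primary when it needs the latest value.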
Single Document Transactions

• Many NoSQL databases provide strong consistency and atomicity at the document or row level.
• For example, MongoDB treats each document as an atomic unit.
• This approach works well for use cases where a single document encapsulates all the data that needs to be
manipulated in a transaction.
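
One way to picture document-level atomicity is an update that is applied to a private copy and swapped in only if it completes. This is a sketch of the idea, not how MongoDB or any other product implements it; the DocumentStore class and the sample order are hypothetical.

```python
import copy
import threading


class DocumentStore:
    """Toy store in which each document update is all-or-nothing."""

    def __init__(self):
        self._docs = {}
        self._lock = threading.Lock()

    def insert(self, doc_id, doc):
        with self._lock:
            self._docs[doc_id] = copy.deepcopy(doc)

    def get(self, doc_id):
        with self._lock:
            return copy.deepcopy(self._docs[doc_id])

    def update(self, doc_id, mutate):
        with self._lock:
            draft = copy.deepcopy(self._docs[doc_id])
            mutate(draft)  # if this raises, the stored document is untouched
            self._docs[doc_id] = draft


def add_book(doc):
    doc["items"].append("book")
    doc["total"] += 250


def broken_update(doc):
    doc["items"].append("laptop")
    raise ValueError("price lookup failed")


orders = DocumentStore()
orders.insert("order-1", {"items": ["pen"], "total": 10})
orders.update("order-1", add_book)
try:
    orders.update("order-1", broken_update)
except ValueError:
    pass  # the half-finished change was discarded
print(orders.get("order-1"))
```

Because the items and the total live in one document, both change together or not at all, with no multi-document transaction needed.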

Data Integrity

• Data modelling: In a NoSQL database, data integrity often comes from designing your data models carefully
to avoid inconsistency.
• This may involve denormalizing data (storing related data together in a single document) to avoid having to
update multiple documents in response to a change.
• Schema flexibility: NoSQL databases are typically schema-less or have flexible schema designs, which can
lead to inconsistencies if not managed well.
• Validation and constraints are defined at the application level or through database mechanisms.
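
Application-level validation can be as simple as checking each document before it is written. The field names and rules below are assumptions made up for a banking-style example; they are not taken from any particular database.

```python
# Hypothetical rules for an "account" document: field name -> allowed type(s).
REQUIRED_FIELDS = {"account_id": str, "owner": str, "balance": (int, float)}


def validate_account(doc):
    """Return a list of problems; an empty list means the document is valid."""
    errors = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in doc:
            errors.append(f"missing field: {field}")
        elif not isinstance(doc[field], expected_type):
            errors.append(f"wrong type for field: {field}")
    balance = doc.get("balance")
    if isinstance(balance, (int, float)) and balance < 0:
        errors.append("balance must not be negative")
    return errors


print(validate_account({"account_id": "A1", "owner": "Asha", "balance": 500}))
print(validate_account({"account_id": "A2", "balance": -5}))
```

Running such a check before every write gives a schema-less store some of the guarantees that a relational schema would enforce automatically.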
Best Practices for managing transactions and data integrity

1. Design with eventual consistency in mind.
2. Use ACID where possible.
3. Data partitioning and sharding.
4. Replication.

ACID for reliable database transactions


Transaction

 A transaction is an action or series of actions performed by a single user or application program that
reads or updates the contents of the database.
 For example, with X = 500 and Y = 300:
T1              T2
Read(X)         Read(Y)
X = X - 100     Y = Y + 100
Write(X)        Write(Y)
Atomicity(A)
The entire transaction takes place at once or doesn’t happen at all.
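
Atomicity can be demonstrated with the X = 500, Y = 300 example using Python's built-in sqlite3 module. This is a minimal sketch: the accounts table, the CHECK rule against negative balances and the transfer function are assumptions added for the illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE accounts ("
    "name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO accounts VALUES (?, ?)", [("X", 500), ("Y", 300)])
conn.commit()


def transfer(conn, source, target, amount):
    """Move amount from source to target; both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception escapes
            conn.execute(
                "UPDATE accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            conn.execute(
                "UPDATE accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False


def balances(conn):
    return dict(conn.execute("SELECT name, balance FROM accounts"))


first_ok = transfer(conn, "X", "Y", 100)    # succeeds
second_ok = transfer(conn, "X", "Y", 1000)  # would make X negative
final_balances = balances(conn)
print(first_ok, second_ok, final_balances)
```

In the failed transfer the credit to Y is executed first and then undone by the rollback, so no money appears from nowhere.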

Consistency(C)
• Correctness: the database must be in a valid state before and after the transaction.
• Integrity constraints must be maintained.

Isolation(I)
• Multiple transactions can occur concurrently without leading to an inconsistent database
state.
• Transactions occur independently.
• Changes made in one transaction are not visible to other transactions until they are committed.
• This is the responsibility of the concurrency control subsystem.

For example: if person A is performing a transaction, it should not affect any other person's account.
T1              T2
Read(X)
X = X * 100
Write(X)        Read(X)
Read(Y)         Read(Y)
Y = Y - 50      Z = X + Y
Write(Y)        Write(Z)

Durability(D)

• Once the transaction is committed, the updates and modifications to the database are
written to disk, and they persist even if a system failure occurs.
• For example: a banking system failure.
• The effects of the transaction are never lost.
• The changes are permanent.
BASE for reliable database transactions
1. Basically Available(BA)
2. Soft State(S)
3. Eventual Consistency(E)

Basically Available(BA)

A distributed system should respond to any incoming request with some acknowledgment, even if it is a
failure message.

Soft State(S)
The system keeps changing state as and when it receives new information.

Eventual Consistency(E)

 The components in the system may not reflect the same value/state of a record at a given point in time.
 They will settle on the same value eventually, with time.
For example: an e-commerce site.

• Here we have two different modules:
1. Order service module
2. Payment service module
• Each of these two modules has its own database.
• When the customer places an order, the order service receives it and creates an entry in its own
database, where it stores all order records along with the payment status.
• The order details are then sent to the payment service so that the customer can make the payment.
• Suppose the payment fails for some reason (an expired card, a server being down, etc.). The payment service
records in its database that the payment failed and sends the same information back to the order service.
The order service then updates the order as payment failed and offers the user an alternative way to pay.
Fig: E-Commerce site (Place order -> Order Service -> Payment Service; the order service and the payment service each have their own database)
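
The order and payment flow can be sketched with two small classes, each holding its own dictionary as its database. The class and method names are hypothetical, and the message passing between the services is reduced to direct function calls.

```python
class PaymentService:
    def __init__(self):
        self.db = {}  # payment service database

    def charge(self, order_id, card_valid):
        status = "PAID" if card_valid else "FAILED"
        self.db[order_id] = status
        return status


class OrderService:
    def __init__(self):
        self.db = {}  # order service database

    def place_order(self, order_id, item):
        self.db[order_id] = {"item": item, "payment_status": "PENDING"}

    def apply_payment_result(self, order_id, status):
        # Called when the payment service reports back.
        self.db[order_id]["payment_status"] = status


order_service = OrderService()
payment_service = PaymentService()

order_service.place_order("o-1", "book")
result = payment_service.charge("o-1", card_valid=False)

# Until the result is delivered, the two databases disagree.
status_before_sync = order_service.db["o-1"]["payment_status"]
order_service.apply_payment_result("o-1", result)
status_after_sync = order_service.db["o-1"]["payment_status"]
print(status_before_sync, status_after_sync)
```

Between the charge and the callback the order still shows as pending while the payment service already shows a failure; the two settle on the same state once the message arrives.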


Speeding Performance by strategic use of RAM, SSD and Disk

1. Benefits of RAM in performance.


• Speed
• Multitasking efficiency
• Reducing Bottleneck

Speed
 RAM is much faster than permanent storage devices.
 Having sufficient RAM ensures that frequently used data is readily available, reducing the need to access
slower storage and thereby speeding up overall system performance.

Multitasking efficiency
 More RAM allows more applications to run simultaneously without slowing down the system.
 This multitasking efficiency is important for professionals running resource-intensive applications, such as
video editing software or virtual machines.

Reducing Bottleneck
 RAM minimizes the number of times the CPU has to fetch data from slower storage (SSD or HDD),
preventing bottlenecks in processing and increasing the efficiency of the system.
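
Keeping frequently used data in RAM can be sketched with an in-memory cache placed in front of a slow read. Here functools.lru_cache from the standard library plays the role of RAM, while read_from_disk is a hypothetical stand-in for slower storage that only counts how often it is called.

```python
from functools import lru_cache

disk_reads = 0  # how many times the slow storage was actually accessed


def read_from_disk(key):
    """Stand-in for a slow SSD/HDD read (hypothetical; it only counts calls)."""
    global disk_reads
    disk_reads += 1
    return f"value-for-{key}"


@lru_cache(maxsize=128)
def cached_read(key):
    # The first call per key goes to disk; repeats are served from memory.
    return read_from_disk(key)


for _ in range(3):
    value = cached_read("user:42")

cache_stats = cached_read.cache_info()
print(value, disk_reads, cache_stats.hits, cache_stats.misses)
```

Three reads cost only one trip to slow storage; the maxsize limit reflects that RAM is finite, so the least recently used entries are evicted first.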
2. SSD(Solid State Drive)
An SSD is a type of storage device that uses flash memory to store data. It serves the same purpose as a
traditional hard disk drive (HDD), but it is faster and more efficient.

Uses
 The primary use is faster data storage and access.
 SSDs are ideal for hosting applications that require fast read/write access to large amounts of data.
 SSDs can be used to cache data that is frequently accessed by the CPU or by other programs.

3. Disk
 In a NoSQL database, disks (HDDs or SSDs) are crucial for storing large amounts of data persistently.
 NoSQL databases are designed for high-performance, distributed and scalable environments.
 The use of disks in a NoSQL database affects how data is stored, retrieved and optimized for performance.
Achieving horizontal scalability with data base sharding
Sharding
Definition : Sharding is the method of splitting a single logical dataset and storing it across multiple
servers or databases.

Fig: User -> Internet -> Web server -> Database server shard1 and Database server shard2
We have few data sharding techniques.
1. Horizontal sharding
2. Hash Pattern
3. Vertical sharding

Horizontal sharding
 Most of the time, sharding refers to horizontal sharding.
 It is a data architecture pattern related to horizontal partitioning, which is a method of splitting one big
table into several smaller tables.
 This is also called partitioning.
 Each partition has the same columns and schema, but holds a different set of rows.
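
Horizontal sharding can be sketched by routing each row to a shard based on a hash of its key. The shard names, the user_id key and both helper functions are assumptions made for this illustration; hashlib is used because Python's built-in hash() of strings changes between runs.

```python
import hashlib

SHARDS = ["shard1", "shard2"]  # hypothetical shard names


def shard_for(key, shards=SHARDS):
    """Pick a shard deterministically from a hash of the key."""
    digest = hashlib.sha256(str(key).encode("utf-8")).digest()
    return shards[int.from_bytes(digest[:8], "big") % len(shards)]


def partition_rows(rows, key_field, shards=SHARDS):
    """Split rows across shards; every shard keeps the same columns."""
    partitions = {name: [] for name in shards}
    for row in rows:
        partitions[shard_for(row[key_field], shards)].append(row)
    return partitions


users = [{"user_id": i, "name": f"user{i}"} for i in range(1, 9)]
partitions = partition_rows(users, "user_id")
for shard_name, shard_rows in partitions.items():
    print(shard_name, [row["user_id"] for row in shard_rows])
```

A lookup for one user only needs shard_for(user_id) to know which server to ask. Note that with this simple modulo scheme, changing the number of shards moves most keys to a different shard.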
Brewer's CAP theorem
The CAP theorem applies mainly to distributed systems. It states that a distributed system can guarantee at
most two of the following three properties at the same time.
C: Consistency
A: Availability
P: Partition Tolerance

Consistency(C)
All replicas of a record must possess the same value at every point in time.

Availability(A)
Every active node must be able to respond to operations at any moment.

Partition Tolerance(P)
 The system must be able to tolerate network partitions among its participant nodes.
 In other words, partitioning should not affect the retrieval of records.
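
The trade-off between consistency and availability during a partition can be pictured with a toy two-node cluster. This is a simplified model under stated assumptions: the class TwoNodeCluster, its "CP" and "AP" modes and the Unavailable exception are hypothetical and do not correspond to any real database's behaviour.

```python
class Unavailable(Exception):
    """Raised when a consistent write cannot be completed."""


class TwoNodeCluster:
    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.values = {"node1": None, "node2": None}
        self.partitioned = False  # True when the nodes cannot reach each other

    def write(self, node, value):
        if not self.partitioned:
            # Healthy network: both replicas receive the write.
            for name in self.values:
                self.values[name] = value
            return
        if self.mode == "CP":
            # Prefer consistency: refuse the write instead of diverging.
            raise Unavailable("cannot reach the other replica")
        # Prefer availability: accept the write locally; replicas now differ.
        self.values[node] = value

    def read(self, node):
        return self.values[node]


cp_cluster = TwoNodeCluster("CP")
ap_cluster = TwoNodeCluster("AP")
for cluster in (cp_cluster, ap_cluster):
    cluster.write("node1", 1)
    cluster.partitioned = True

try:
    cp_cluster.write("node1", 2)
    cp_write_accepted = True
except Unavailable:
    cp_write_accepted = False

ap_cluster.write("node1", 2)
print(cp_write_accepted, cp_cluster.values, ap_cluster.values)
```

During the partition the CP cluster keeps its replicas identical by rejecting the write, while the AP cluster keeps answering but lets the replicas disagree until they can be reconciled.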
