
Chapter 3 NoSQL Database (1)


Introduction To NoSQL Database

AIDS – B.E – BDA


Dr. Pooja K Revankar
Assistant Professor,
Dept. of Computer Science and Engg.,
SIES Graduate School of Technology

Dr. Pooja K R
Agenda
• Introduction to NoSQL

• Limitations of Relational Database

• What is NoSQL

• Business Drivers of NoSQL

• NoSQL Data Architecture Patterns

• NoSQL solution for big data

• Choosing distribution models

Introduction to NoSQL Databases

• A database management system (DBMS) provides the mechanism to store and retrieve data.

• There are different kinds of database management systems:

1. RDBMS (Relational Database Management Systems)

2. OLAP (Online Analytical Processing)

3. NoSQL (Not only SQL)

Different SQL Databases

What is NoSQL?

NoSQL is a set of concepts that allows the rapid and efficient processing of data sets with a focus on performance, reliability, and agility.

Limitations of Relational databases
• The structure and schema of the data must be defined before the data can be processed.

• Consistency and integrity of data are provided by enforcing ACID properties.

• Many applications store their data in JSON format.

• An RDBMS does not offer a convenient way to perform operations such as create, insert, update, and delete on this data.

Advantages of NoSQL

•High scalability

•High Availability

RDBMS Vs NoSQL
• RDBMS: Stores structured data; provides more functionality but lower performance.

• NoSQL: Stores structured or semi-structured data; provides less functionality but higher performance.

NOSQL DATABASES

What is NoSQL?
• More than rows in tables

• Free of joins

• Schema-free

• Works on many processors

• Uses shared-nothing commodity computers

• Supports linear scalability

• Innovative

NoSQL Database Categories

•Document Database

•Key value stores

•Graph store

•Wide column stores

NoSQL Data Architecture Patterns

NOSQL BUSINESS DRIVERS

• VOLUME

• VELOCITY

• VARIABILITY

• AGILITY

What is the CAP Theorem?

The CAP theorem, also called Brewer's theorem, states that it is impossible for a distributed data store to offer more than two of the following three guarantees:

1. Consistency
2. Availability
3. Partition Tolerance
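
The trade-off can be illustrated with a toy sketch. It assumes two hypothetical in-memory replicas and a made-up `write` function with a `mode` flag; real systems are far more involved, and none of these names come from an actual database. During a network partition the system either refuses the write (keeping consistency) or accepts it on the reachable replica only (keeping availability).

```python
class Replica:
    """Toy replica holding a single value; purely illustrative."""

    def __init__(self):
        self.value = None


def write(replicas, value, partitioned, mode):
    """Apply a write to a two-replica system.

    mode "CP": refuse the write while the replicas cannot talk to each other.
    mode "AP": accept the write on the reachable replica only.
    """
    if not partitioned:
        for replica in replicas:
            replica.value = value
        return True
    if mode == "CP":
        return False              # consistent, but not available
    replicas[0].value = value     # available, but replicas now disagree
    return True


cp = [Replica(), Replica()]
ap = [Replica(), Replica()]
cp_ok = write(cp, "v1", partitioned=True, mode="CP")
ap_ok = write(ap, "v1", partitioned=True, mode="AP")
```

Under the partition, `cp_ok` is `False` while both CP replicas still agree; `ap_ok` is `True` but the two AP replicas hold different values.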

BASE Properties

NoSQL relies on a softer model known as the BASE model (instead of ACID properties).

• Basically Available: Guarantees the availability of the data. There will be a response to any request, although the response may be a failure.

• Soft state: The state of the system could change over time.

• Eventual consistency: The system will eventually become consistent once it stops receiving input.
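
The three BASE properties can be sketched as follows. This is a minimal illustration that assumes a hypothetical in-memory `Replica` class and a simple "highest version wins" merge; it is not the replication protocol of any particular NoSQL product.

```python
class Replica:
    """Hypothetical in-memory replica; not the API of a real database."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def write(self, key, value, version):
        # Accept the write locally without waiting for other replicas,
        # which keeps the system basically available.
        self.data[key] = (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def sync_from(self, other):
        # Background synchronisation: keep whichever copy has the higher version.
        for key, (version, value) in other.data.items():
            if key not in self.data or self.data[key][0] < version:
                self.data[key] = (version, value)


a, b = Replica("a"), Replica("b")
a.write("cart", ["book"], version=1)

stale = b.read("cart")   # b has not seen the write yet (soft state)

b.sync_from(a)           # once writes stop, the replicas converge
fresh = b.read("cart")
```

Before synchronisation, `stale` is `None`; afterwards both replicas return the same value for `"cart"`.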

Data Models
NoSQL databases are classified into four major data models: key-value, document, column-oriented, and graph.

Key-value
• Simplest NoSQL databases

• The main idea is the use of a hash table

• Data (values) is accessed by strings called keys

• Data has no required format

• Data model: (key, value) pairs

• A key maps to a BLOB (Binary Large Object)

• Examples of key-value store databases: Redis, DynamoDB, Riak, Memcached, etc.
Operations using KEY VALUE STORE

• Get(key)

• Put (key, value)

• Multi-get(key1, key2, …, keyN)

• Delete(key)
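
The four operations above can be sketched with a Python dictionary, which is itself a hash table. The class name `KeyValueStore` and its method names are illustrative assumptions, not the interface of Redis, DynamoDB, or any other product.

```python
class KeyValueStore:
    """Illustrative in-memory key-value store backed by a dict (a hash table)."""

    def __init__(self):
        self._table = {}

    def put(self, key, value):
        # The value is opaque to the store: any object or blob is accepted.
        self._table[key] = value

    def get(self, key):
        # Returns None when the key is absent.
        return self._table.get(key)

    def multi_get(self, *keys):
        # One result per requested key, in the order requested.
        return [self._table.get(key) for key in keys]

    def delete(self, key):
        # Returns True if the key existed and was removed.
        if key in self._table:
            del self._table[key]
            return True
        return False


store = KeyValueStore()
store.put("user:1", b"\x00\x01binary-blob")
store.put("user:2", {"name": "Asha"})

blob = store.get("user:1")
both = store.multi_get("user:1", "user:2", "user:3")
removed = store.delete("user:1")
```

All access goes through the key; the store never inspects the value, which is why values cannot be queried or indexed.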

KEY VALUE STORE PROS

• Any data type in the value field

• Consistent

• Returned values on queries can be converted into lists, data frames, etc.

• Scalable

• Reliable

• Keys can be synthetic or auto-generated


KEY VALUE STORE CONS

• No indexes are built on values.

• Traditional DBMS capabilities, such as ACID properties when multiple transactions execute simultaneously, are not provided.

• No queries on values.

• Maintaining unique keys is difficult when the data volume is large.

Key Value Stores

Document-Based Store NoSQL

• In this type of database, a record and its associated data are stored in a single document.

• This model is therefore not completely unstructured; the data is semi-structured.

• The difference between a document and a key-value pair is that a document store applies some kind of encoding when storing data in documents.

• The encoding can be XML or JSON.

• The same document can be stored in a document database under different encodings.
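
As a small illustration of one record under two encodings, the sketch below uses Python's standard `json` and `xml.etree.ElementTree` modules. The record contents and the `student` root element name are made up for the example.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record; the field names and values are illustrative only.
record = {"id": "101", "name": "Asha", "city": "Mumbai"}

# JSON encoding of the record.
json_doc = json.dumps(record)

# XML encoding of the same record: one child element per field.
root = ET.Element("student")
for field, value in record.items():
    ET.SubElement(root, field).text = value
xml_doc = ET.tostring(root, encoding="unicode")

# Decoding either document recovers the same fields.
from_json = json.loads(json_doc)
from_xml = {child.tag: child.text for child in ET.fromstring(xml_doc)}
```

Both `from_json` and `from_xml` equal the original `record`, showing that the encoding is a storage choice rather than a change in content.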

DOCUMENT STORES
• The central concept of a document-oriented database is the notion of a document.

• Documents in a document store are roughly equivalent to the programming concept of an object.

• They are not required to adhere to a standard schema, nor will they have all the same sections, slots, parts or keys.

• Generally, programs using objects have many different types of objects, and those objects often have many optional fields.

• Every object, even those of the same class, can look very different.

• Document stores are similar in that they allow different types of documents in a single store, allow the fields within them to be optional, and often allow them to be encoded using different encoding systems.
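
A rough sketch of these ideas, assuming a collection modelled as a plain Python list of dictionaries and a hypothetical `find` helper (real document databases provide their own query languages):

```python
# Documents of different types, with optional fields, in one collection.
collection = [
    {"_id": 1, "type": "blog_post", "title": "Intro to NoSQL", "tags": ["nosql", "bda"]},
    {"_id": 2, "type": "blog_post", "title": "CAP theorem"},  # no "tags" field
    {"_id": 3, "type": "product", "name": "Laptop", "price": 55000},
]


def find(docs, **criteria):
    """Return documents whose fields match every criterion.

    A document that lacks a requested field simply does not match.
    """
    return [
        doc for doc in docs
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


posts = find(collection, type="blog_post")

# Optional fields are handled with a default instead of a schema change.
tagged_ids = [doc["_id"] for doc in collection if "nosql" in doc.get("tags", [])]
```

Unlike a key-value store, the store can look inside each document, so queries on field values are possible even though no schema is declared.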

DOCUMENT STORES

[Figure panels: JSON document; XML document]
Dr. Pooja K R
Document-Based Store NoSQL
• Document stores are mostly used for content management systems (CMS), blogging platforms, real-time analytics, and e-commerce applications. They should not be used for complex transactions that require multiple operations or queries against varying aggregate structures.

• Amazon SimpleDB, CouchDB, MongoDB, Riak, and Lotus Notes are popular document-oriented DBMSs.

Example:

• The difference between conventional databases and document-based databases is that data is stored in documents rather than in tables.

• Examples of databases that use this data model are MongoDB and Couchbase.

• These types of databases are used extensively, especially in big data analysis.
COLUMN ORIENTED DATABASES
• Column-oriented databases primarily work on columns, and every column is treated individually.

• Values of a single column are stored contiguously.

• A column store keeps each column's data in a column-specific file.

• In column stores, query processors work on columns too.

• All data within a column data file has the same type, which makes it ideal for compression.

• Column stores can improve query performance because a query can access only the specific columns it needs.

• High performance on aggregation queries (e.g. COUNT, SUM, AVG, MIN, MAX).

• Well suited to data warehouses and business intelligence, customer relationship management (CRM), library card catalogs, etc.

• Examples of column-oriented databases: BigTable, Cassandra, SimpleDB, etc.
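
The column layout can be sketched in a few lines. The table contents below are invented, and the run-length encoder is only one simple example of why same-typed, contiguous column values compress well; it is not the compression scheme of any specific database.

```python
from itertools import groupby

# Row-oriented view: each record keeps all of its fields together.
rows = [
    {"id": 1, "region": "west", "sales": 120},
    {"id": 2, "region": "east", "sales": 80},
    {"id": 3, "region": "west", "sales": 100},
]

# Column-oriented view: the values of each column are stored contiguously.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregation touches only the one column it needs.
total_sales = sum(columns["sales"])
average_sales = total_sales / len(columns["sales"])


def run_length_encode(values):
    """Collapse runs of equal values into (value, count) pairs."""
    return [(value, len(list(group))) for value, group in groupby(values)]


encoded_region = run_length_encode(sorted(columns["region"]))
```

Here `total_sales` is computed without reading the `id` or `region` values at all, which is the source of the aggregation speed-up described above.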
COLUMN-ORIENTED DATABASE

GRAPH DATABASES
• A graph database stores data in a graph.

• It is capable of elegantly representing any kind of data in a highly accessible way.

• A graph database is a collection of nodes and edges.

• Each node represents an entity (such as a student or business), and each edge represents a connection or relationship between two nodes.

• Every node and edge is defined by a unique identifier.

• Each node knows its adjacent nodes.

• As the number of nodes increases, the cost of a local step (or hop) remains the same.

• Indexes are used for lookups.

• Examples of graph databases: OrientDB, Neo4j, Titan, etc.
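
A minimal adjacency-based sketch of these properties follows. The `Graph` class, its method names, and the sample data are assumptions made for illustration; they are not the API of Neo4j or any other graph database.

```python
from collections import defaultdict


class Graph:
    """Illustrative in-memory graph with identified nodes and edges."""

    def __init__(self):
        self.nodes = {}                    # node id -> properties
        self.edges = {}                    # edge id -> (source, label, target)
        self.adjacent = defaultdict(set)   # node id -> ids of adjacent nodes

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, edge_id, source, label, target):
        self.edges[edge_id] = (source, label, target)
        # Each node records its neighbours directly.
        self.adjacent[source].add(target)
        self.adjacent[target].add(source)

    def neighbors(self, node_id):
        # One dictionary lookup, however many nodes the graph holds.
        return self.adjacent[node_id]


g = Graph()
g.add_node("s1", kind="student", name="Asha")
g.add_node("s2", kind="student", name="Ravi")
g.add_node("c1", kind="college", name="SIES GST")
g.add_edge("e1", "s1", "STUDIES_AT", "c1")
g.add_edge("e2", "s2", "STUDIES_AT", "c1")

classmates_of_s1 = {
    n for college in g.neighbors("s1") for n in g.neighbors(college)
} - {"s1"}
```

Because every node stores its own neighbour set, following a relationship is a local step whose cost does not depend on the total size of the graph.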
GRAPH STORES

Analyzing big data with a shared-nothing architecture

• A shared-nothing (SN) architecture is a distributed computing architecture in which each node is independent and self-sufficient, and there is no single point of contention across the system.

• More specifically, none of the nodes share memory or disk storage.

• People typically contrast SN with systems that keep a large amount of centrally stored state information, whether in a database, an application server, or any other similar single point of contention.

• The advantages of an SN architecture over a central entity that controls the network (a controller-based architecture) include eliminating any single point of failure, allowing self-healing capabilities, and offering non-disruptive upgrades.

• Shared nothing is popular for web development because of its scalability.

• An SN system can scale almost indefinitely simply by adding nodes in the form of inexpensive computers, since there is no single bottleneck to slow the system down.

• An SN system typically partitions its data among many nodes on different databases (assigning different computers to deal with different users or queries), which is often referred to as database sharding.

• Alternatively, it may require every node to maintain its own copy of the application's data, using some kind of coordination protocol.
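
One common way to partition data across shared-nothing nodes is to hash each key to a node. The sketch below assumes three hypothetical node names and uses an MD5 digest purely as a stable hash; it illustrates the idea and is not the placement scheme of any specific system.

```python
import hashlib

NODES = ["node-0", "node-1", "node-2"]  # hypothetical node names


def shard_for(key, nodes=NODES):
    """Map a key to exactly one node using a stable hash of the key."""
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


# Each node owns its own private storage; nothing is shared between nodes.
storage = {node: {} for node in NODES}


def put(key, value):
    storage[shard_for(key)][key] = value


def get(key):
    return storage[shard_for(key)].get(key)


for user_id in range(10):
    put(f"user:{user_id}", {"id": user_id})

record = get("user:7")
copies = sum(1 for node in NODES if "user:7" in storage[node])
```

Each key lives on exactly one node, so reads and writes for different keys can proceed on different machines independently. Note that plain modulo placement moves most keys when the number of nodes changes; schemes such as consistent hashing are typically used to limit that movement.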
Choosing distribution models: master-slave versus peer-to-peer

Master-slave versus peer-to-peer

• In a master-slave configuration, all incoming database requests (reads or writes) are sent to a single master node and redistributed from there.

• The master node is called the NameNode in Hadoop.

• This node keeps a database of all the other nodes in the cluster and the rules for distributing requests to each node.

• The peer-to-peer model stores all the information about the cluster on each node in the cluster.

• If any node crashes, the other nodes can take over and processing can continue.
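
The contrast between the two models can be sketched as follows. The `Master` and `Peer` classes, the round-robin rule, and the CRC-based node choice are all simplifying assumptions for illustration; they do not describe how Hadoop or any peer-to-peer database is actually implemented.

```python
import zlib


class Master:
    """Hypothetical master: the only node that knows the workers and the routing rule."""

    def __init__(self, workers):
        self.workers = list(workers)
        self._next = 0

    def route(self, request):
        # Round-robin rule for redistributing incoming requests.
        worker = self.workers[self._next % len(self.workers)]
        self._next += 1
        return worker


class Peer:
    """Hypothetical peer: every node stores the full cluster membership."""

    def __init__(self, name, members):
        self.name = name
        self.members = set(members)

    def mark_crashed(self, node):
        self.members.discard(node)

    def route(self, request):
        # Any peer can pick a live node, using only its local membership copy.
        live = sorted(self.members)
        return live[zlib.crc32(request.encode("utf-8")) % len(live)]


master = Master(["w1", "w2"])
first, second, third = (master.route(r) for r in ("r1", "r2", "r3"))

names = ["p1", "p2", "p3"]
peers = {name: Peer(name, names) for name in names}
for survivor in ("p1", "p2"):
    peers[survivor].mark_crashed("p3")   # p3 crashes; the others carry on
target = peers["p1"].route("read:user:42")
```

In the master-slave case every request passes through `master`; in the peer-to-peer case any surviving peer can route the request, and peers with the same membership view choose the same node.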

• Peer-to-peer systems distribute the responsibility of the master to
each node in the cluster.
• In this situation, testing is much easier since you can remove any
node in the cluster and the other nodes will continue to function.
• The disadvantage of peer-to-peer networks is the increased complexity and communication overhead needed to keep all nodes up to date with the cluster status.

Master Slave Distribution Model
• With a master-slave distribution model, the cluster is managed by a single master node.
•This node can run on specialized hardware such as RAID drives to
lower the probability that it crashes.
•The cluster can also be configured with a standby master that’s
continually updated from the master node.
•The challenge with this option is that it’s difficult to test the standby
master without jeopardizing the health of the cluster.
•Failure of the standby master to take over from the master node is a
real concern for high-availability operations.
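
The standby arrangement can be sketched as below, assuming hypothetical `MasterNode` and `Cluster` classes in which every update to the master is mirrored to the standby. Real failover involves failure detection and coordination that this sketch leaves out.

```python
class MasterNode:
    """Hypothetical node that holds the cluster-management state."""

    def __init__(self, name):
        self.name = name
        self.state = {}
        self.alive = True


class Cluster:
    def __init__(self):
        self.master = MasterNode("master")
        self.standby = MasterNode("standby")

    def update(self, key, value):
        self.master.state[key] = value
        # The standby is continually updated from the master.
        if self.standby is not None:
            self.standby.state[key] = value

    def fail_over(self):
        # Promote the standby; until a new standby is added there is no backup.
        if self.standby is None:
            raise RuntimeError("no standby master available")
        self.master.alive = False
        self.master, self.standby = self.standby, None
        return self.master


cluster = Cluster()
cluster.update("node-7", "healthy")
new_master = cluster.fail_over()
```

The only way to know that `fail_over` works is to trigger it, which is exactly why testing the standby on a live cluster is risky.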

NoSQL systems to handle big data problems

Case Study:

• Google Maps stores GIS data in Bigtable

• Storing analytical information in Bigtable

References:
• https://dzone.com/articles/what-nosql

Thank You!