
On Introduction To NoSQL

This document provides an introduction to NoSQL databases. It discusses the CAP theorem, which states that a distributed data store can only provide two of three guarantees: consistency, availability, and partition tolerance. It also covers eventual consistency and the BASE properties that many NoSQL systems follow. Finally, it outlines some business drivers for adopting NoSQL, including large data volumes, high velocities of data ingestion, variability in data structures, and the need for agility.

Uploaded by

Atharv Patil
Copyright
© All Rights Reserved

Advanced Database Management System

Week 7: Module 4: NoSQL

Faculty Name :
Mrs. Aditi Chhabria, Mrs. Rajashree Shedge, Mr. Tushar Ghorpade
Index - Module 4: NoSQL

Lecture 17: Introduction to NoSQL, NoSQL Business Drivers

Lecture 18: CAP Theorem, BASE Properties, NoSQL Business Drivers

Lecture 19: NoSQL Data Architecture Patterns: Key-Value Stores, Graph Stores, Column Family (Bigtable) Stores

Lecture 20: Document Stores, Variations of NoSQL Architectural Patterns

2 Module 3: NoSQL
Lecture 17

Introduction to NoSQL
History of Databases

Flat File System → Problem: No standard definition → Solution: Relational Database

Relational Databases → Problem: Could not handle big data → Solution: NoSQL Databases

Definition

• NoSQL stands for “Not Only SQL” or “Not SQL”.

• A traditional RDBMS uses SQL syntax and queries to retrieve and analyze data for further insights.

• A NoSQL database is a database management system that provides a mechanism for storing and retrieving massive amounts of unstructured data in a distributed environment.

Database Management Systems: RDBMS (Relational), OLAP, NoSQL

Why NoSQL?

• NoSQL databases became popular with Internet giants such as Google, Facebook, and Amazon, which deal with huge volumes of data. System response time becomes slow when an RDBMS is used for massive volumes of data.

• To resolve this problem, we could "scale up" our systems by upgrading our existing hardware. This process is expensive.

• The alternative is to distribute the database load across multiple hosts whenever the load increases. This method is known as "scaling out."
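
As a rough illustration of scaling out, the sketch below spreads records across several hosts by hashing each key. The host names and the `shard_for` and `distribute` helpers are hypothetical, and production systems typically use more elaborate schemes than a plain modulo.

```python
import hashlib

# Hypothetical host names; scaling out means adding more entries to this list.
HOSTS = ["db-host-0", "db-host-1", "db-host-2"]

def shard_for(key: str, hosts: list[str]) -> str:
    """Pick the host responsible for a key using a stable hash."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return hosts[int(digest, 16) % len(hosts)]

def distribute(keys: list[str], hosts: list[str]) -> dict[str, list[str]]:
    """Group keys by the host that would store them."""
    placement = {host: [] for host in hosts}
    for key in keys:
        placement[shard_for(key, hosts)].append(key)
    return placement

if __name__ == "__main__":
    keys = [f"user:{i}" for i in range(12)]
    for host, stored in distribute(keys, HOSTS).items():
        print(host, stored)
```

Because the hash is stable, the same key always maps to the same host, so reads can be routed without a central lookup table.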

Further Challenges with Traditional RDBMS

• Not optimized for horizontal scaling
  - Data sizes have grown tremendously, into the petabyte range.

• Schema-less data
  - The majority of data arrives in a semi-structured or unstructured format.

• Cost
  - High licensing costs for data analysis.

• High velocity of data ingestion
  - RDBMSs struggle with high-velocity data because they are designed for steady data retention rather than rapid growth.

Performance

RDBMS: more functionality, less performance
NoSQL: less functionality, more performance

Database Management Systems: RDBMS (Relational), OLAP, NoSQL

Performance

Database Management Systems

• RDBMS (Relational): structured data, stored in tables
• OLAP: structured data, stored in cubes
• NoSQL: structured or unstructured data, stored in collections

Brief History of NoSQL

• 1998: Carlo Strozzi used the term NoSQL for his lightweight, open-source relational database

• 2000: Graph database Neo4j is launched

• 2004: Google Bigtable is launched

• 2005: CouchDB is launched

• 2007: The research paper on Amazon Dynamo is released

• 2008: Facebook open-sources the Cassandra project

• 2009: The term NoSQL is reintroduced

Features of NoSQL

1. Non-relational

• NoSQL databases do not follow the relational model

• They do not provide tables with flat, fixed-column records

• They work with self-contained aggregates or BLOBs

• They do not require object-relational mapping or data normalization

• They omit complex features such as query languages, query planners, referential integrity, joins, and ACID

Features of NoSQL

2. Schema-free

• NoSQL databases are either schema-free or have relaxed schemas

• They do not require any upfront definition of the schema of the data

• They allow heterogeneous data structures in the same domain
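
To make the schema-free idea concrete, here is a minimal sketch in which records of different shapes live in the same collection. The `customers` data and the `find` and `field_names` helpers are invented for illustration and do not correspond to any particular NoSQL product.

```python
# A hypothetical "customers" collection: each record is a plain dict,
# and records in the same collection need not share the same fields.
customers = [
    {"id": 1, "name": "Asha", "email": "asha@example.com"},
    {"id": 2, "name": "Ravi", "phones": ["555-0100", "555-0101"]},
    {"id": 3, "name": "Meera", "address": {"city": "Mumbai", "pin": "400001"}},
]

def find(collection, **criteria):
    """Return records that contain every given field with the given value."""
    return [
        record for record in collection
        if all(field in record and record[field] == value
               for field, value in criteria.items())
    ]

def field_names(collection):
    """Collect every field name that appears anywhere in the collection."""
    names = set()
    for record in collection:
        names.update(record)
    return sorted(names)
```

No schema was declared up front: a new field such as `phones` simply appears on the records that need it, and queries skip records that lack the field.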

Features of NoSQL

3. Simple API

• Offers easy-to-use interfaces for storing and querying data

• APIs allow low-level data manipulation and selection methods

• Mostly uses text-based protocols, typically HTTP REST with JSON

• Mostly does not use a standards-based query language

• Web-enabled databases run as internet-facing services

Features of NoSQL

4. Distributed

• Multiple NoSQL databases can be run in a distributed fashion

• Offers auto-scaling and fail-over capabilities

• ACID guarantees are often sacrificed for scalability and throughput

• Shared-nothing architecture, which enables less coordination and higher distribution

Lecture 18

CAP Theorem, BASE Properties, NoSQL Business Drivers

What is CAP Theorem?

• The CAP theorem is also called Brewer's theorem. It states that it is impossible for a distributed data store to offer more than two of the following three guarantees:

1. Consistency

2. Availability

3. Partition Tolerance

Consistency: The data should remain consistent even after an operation executes. This means that once data is written, any future read request should return that data. For example, after an order status is updated, all clients should see the same data.

What is CAP Theorem?

Availability:

The database should always be available and responsive. It should not have any downtime.

Partition Tolerance:

Partition tolerance means that the system should continue to function even if communication among the servers is unreliable. For example, the servers may be partitioned into multiple groups that cannot communicate with each other. If part of the database is unavailable, the other parts remain unaffected.
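
The trade-off can be pictured with a toy model of two replicas. This is only a sketch under simplifying assumptions: the `TwoNodeStore` class, the "CP" and "AP" mode labels, and the `partitioned` flag are hypothetical, and reads are always served locally.

```python
class Unavailable(Exception):
    """Raised when a node refuses a request to protect consistency."""

class Node:
    def __init__(self, name):
        self.name = name
        self.value = None

class TwoNodeStore:
    """Two replicas of one value; `mode` is "CP" or "AP" (hypothetical labels)."""

    def __init__(self, mode):
        self.mode = mode
        self.nodes = [Node("A"), Node("B")]
        self.partitioned = False  # True means A and B cannot talk to each other

    def write(self, node_index, value):
        if not self.partitioned:
            # Healthy network: update both replicas, so they stay consistent.
            for node in self.nodes:
                node.value = value
        elif self.mode == "CP":
            # Consistency first: reject the write rather than let replicas diverge.
            raise Unavailable("cannot reach the other replica")
        else:
            # Availability first: accept the write locally; replicas now differ.
            self.nodes[node_index].value = value

    def read(self, node_index):
        return self.nodes[node_index].value
```

During a partition, the "CP" store keeps both replicas identical by refusing writes, while the "AP" store keeps answering but lets the two replicas return different values.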

CAP Theorem

NoSQL databases are meant for distributed storage.

Duplicate copies of the same data are maintained on multiple machines. This increases availability but decreases consistency.

If duplicate copies of the same data are not maintained, consistency is better but availability decreases.

If data on one machine changes, the update must propagate to the other machines. Until it does, the system is inconsistent, but it will eventually become consistent.

Eventual Consistency

• Eventual consistency arises when copies of data are kept on multiple machines to achieve high availability and scalability. Changes made to any data item on one machine then have to be propagated to the other replicas.

• Data replication may not be instantaneous: some copies are updated immediately, while others are updated in due course.

• These copies may be mutually inconsistent for a while, but in due course they become consistent. Hence the name eventual consistency.
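
The propagation described above can be sketched as a small simulation. The `Cluster` and `Replica` classes and the explicit `pending` queue are assumptions made for illustration; real systems propagate updates over a network and must also resolve conflicting writes.

```python
from collections import deque

class Replica:
    def __init__(self, name):
        self.name = name
        self.data = {}

class Cluster:
    """Writes land on one replica and reach the others through a queue."""

    def __init__(self, names):
        self.replicas = {name: Replica(name) for name in names}
        self.pending = deque()  # updates not yet applied: (target, key, value)

    def write(self, replica_name, key, value):
        # Apply locally right away, then schedule propagation to the others.
        self.replicas[replica_name].data[key] = value
        for name in self.replicas:
            if name != replica_name:
                self.pending.append((name, key, value))

    def propagate_one(self):
        """Deliver a single pending update; returns False when none remain."""
        if not self.pending:
            return False
        target, key, value = self.pending.popleft()
        self.replicas[target].data[key] = value
        return True

    def is_consistent(self):
        views = [replica.data for replica in self.replicas.values()]
        return all(view == views[0] for view in views)
```

Right after a write, `is_consistent()` is false; once every pending update has been delivered, all replicas hold the same data.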

CAP Theorem

Pick 2:

• Consistency: All clients always have the same view of the data.
• Availability: Each client can always read and write.
• Partition Tolerance: The system works well despite physical network partitions.

BASE – in NoSQL Systems

BASE: Basically Available, Soft state, Eventual consistency

• Basically available means the database is available all the time, in the sense of availability in the CAP theorem.
• Soft state means the system state may change even without input.
• Eventual consistency means that the system will become consistent over time.

NoSQL business drivers

• Volume
• Velocity
• Variability
• Agility

NoSQL business drivers

Volume:
There are two ways to approach data processing to improve performance:
• If the key factor is only speed, a faster processor could be used.
• If the processing involves complex computations, a GPU could be used along with the CPU.
• However, the volume of data that can be handled this way is limited by on-board GPU memory.

NoSQL business drivers

Volume:
• The main reason organizations look for an alternative to their current RDBMSs is the need to query big data.
• The need for horizontal scaling made organizations move from serial to distributed parallel processing, where big data is fragmented and processed using clusters of commodity machines.
• This is made possible by technologies such as Apache Hadoop, MapR, and HBase.
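
The fragment-and-process idea can be sketched on a single machine, with threads standing in for cluster nodes. The `fragment`, `count_words`, `merge`, and `parallel_word_count` names are hypothetical, and this is not how Hadoop itself is programmed; it only mirrors the split, process, and combine steps.

```python
from concurrent.futures import ThreadPoolExecutor

def fragment(records, parts):
    """Split records into `parts` roughly equal fragments."""
    return [records[i::parts] for i in range(parts)]

def count_words(lines):
    """The per-fragment work: count word occurrences in one fragment."""
    counts = {}
    for line in lines:
        for word in line.split():
            counts[word] = counts.get(word, 0) + 1
    return counts

def merge(partials):
    """Combine the partial counts produced by each worker."""
    total = {}
    for partial in partials:
        for word, n in partial.items():
            total[word] = total.get(word, 0) + n
    return total

def parallel_word_count(lines, workers=3):
    # Threads stand in for the machines of a cluster in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partials = list(pool.map(count_words, fragment(lines, workers)))
    return merge(partials)
```

Each worker sees only its own fragment, so adding workers (or machines) lets the same job cover more data without any single worker holding all of it.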

NoSQL business drivers

Velocity
• Many single-processor RDBMSs are unable to keep up with the demands of real-time inserts and online queries made by public-facing websites.

• RDBMSs frequently index many columns of every new row, a process that decreases system performance.

• When single-processor RDBMSs are used as the back end to a web storefront, random bursts in web traffic slow down responses for everyone, and tuning these systems can be costly when both high read and high write throughput are desired.

NoSQL business drivers

Variability
• Companies that want to capture and report on exception data struggle when attempting to use the rigid database schema structures imposed by RDBMSs.

• For example, if a business unit wants to capture a few custom fields for a particular customer, all customer rows within the database need to store this information even though it doesn’t apply.

• Adding new columns to an RDBMS can require the system to be shut down while ALTER TABLE commands are run. When a database is large, this process can impact system availability, costing time and money.

NoSQL business drivers

Agility
• The most complex part of building applications using RDBMSs is the process of putting data into and getting data out of the database.
• If your data has nested and repeated subgroups of data structures, you need to include an object-relational mapping layer.
• The responsibility of this layer is to generate the correct combination of INSERT, UPDATE, DELETE, and SELECT SQL statements to move object data to and from the RDBMS persistence layer.

NoSQL business drivers

Agility
• This process isn’t simple and is the largest barrier to rapid change when developing new applications or modifying existing ones.
• Generally, object-relational mapping requires experienced software developers who are familiar with object-relational frameworks such as Java Hibernate (or NHibernate for .NET systems).
• Even with experienced staff, small change requests can cause slowdowns in development and testing schedules.
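
To illustrate the mapping work described above, the sketch below stores one nested order twice: once through hand-written SQL statements (using Python's built-in `sqlite3` module) and once as a single JSON document. The table layout, the `order` data, and the `save_relational` and `save_document` helpers are assumptions made for this example; a real object-relational framework would generate the SQL automatically.

```python
import json
import sqlite3

# A hypothetical order object with a nested, repeated subgroup (its items).
order = {
    "order_number": "1234",
    "customer": "nraboy",
    "items": [
        {"game": "Pokemon Red", "quantity": 1},
        {"game": "Super Smash Bros.", "quantity": 1},
    ],
}

def save_relational(conn, order):
    """Hand-written mapping: one INSERT for the order, one per nested item."""
    conn.execute(
        "INSERT INTO orders (order_number, customer) VALUES (?, ?)",
        (order["order_number"], order["customer"]),
    )
    conn.executemany(
        "INSERT INTO order_items (order_number, game, quantity) VALUES (?, ?, ?)",
        [(order["order_number"], i["game"], i["quantity"]) for i in order["items"]],
    )

def save_document(store, order):
    """Document style: the whole aggregate is stored as one JSON value."""
    store[f"order::{order['order_number']}"] = json.dumps(order)

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_number TEXT PRIMARY KEY, customer TEXT)")
conn.execute("CREATE TABLE order_items (order_number TEXT, game TEXT, quantity INTEGER)")
save_relational(conn, order)

documents = {}
save_document(documents, order)
```

Adding a new nested field to the order changes only the data in the document version, whereas the relational version needs a schema change plus updated INSERT and SELECT statements.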

Lecture 19

NoSQL Data Architecture Patterns: Key-Value Stores, Column Family Stores, Document Stores

Types of NoSQL Databases

Types of NoSQL databases

• Relational databases generally strive toward normalization: making sure every piece of data is stored only once.

Types of NoSQL databases : Column-Oriented Database

Traditional relational databases are row-oriented, with each row having a row-id
and each field within the row stored together in a table.

Types of NoSQL databases : Column-Oriented Database

• Every time you look something up in a row-oriented database, every row is scanned, regardless of which columns you require. Let’s say you only want a list of birthdays in September. The database will scan the table from top to bottom and left to right.

Types of NoSQL databases : Column-Oriented Database

• Column databases store each column separately, allowing for quicker scans when only a small number of columns are involved.
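
The difference between the two layouts can be sketched with plain Python structures. The sample people, dates, and function names below are invented for illustration, and in-memory dicts and lists only approximate how a real engine lays data out on disk.

```python
# Row-oriented layout: each record keeps all of its fields together.
rows = [
    {"row_id": 1, "name": "Jos", "birthday": "1980-09-11", "hobby": "archery"},
    {"row_id": 2, "name": "Fritz", "birthday": "1975-03-02", "hobby": "chess"},
    {"row_id": 3, "name": "Freddy", "birthday": "1992-09-30", "hobby": "surfing"},
]

# Column-oriented layout: each column is stored separately, aligned by position.
columns = {
    "row_id": [1, 2, 3],
    "name": ["Jos", "Fritz", "Freddy"],
    "birthday": ["1980-09-11", "1975-03-02", "1992-09-30"],
    "hobby": ["archery", "chess", "surfing"],
}

def september_birthdays_rows(rows):
    """Visits every full record even though only one field is needed."""
    return [r["birthday"] for r in rows if r["birthday"][5:7] == "09"]

def september_birthdays_columns(columns):
    """Reads only the birthday column; the other columns are never touched."""
    return [b for b in columns["birthday"] if b[5:7] == "09"]
```

Both functions return the same birthdays, but the column version touches a single list, which is why column stores do well on queries over a few columns.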

Types of NoSQL databases : Column-Oriented Database

When should you use a row-oriented database and when should you use a column-oriented database?

In a column-oriented database it’s easy to add another column because none of the existing columns are affected by it. But adding an entire record requires adapting all tables, since each column is stored separately. This makes the row-oriented database preferable to the column-oriented database for online transaction processing (OLTP), which involves adding or changing records constantly.

Types of NoSQL databases : Column-Oriented Database

Column Family Stores:

• Apache HBase
• Facebook’s Cassandra
• Hypertable
• Google Bigtable

Types of NoSQL databases : Key-Value Stores

• Key-value stores are the least complex of the NoSQL databases. They are, as the name suggests, a collection of key-value pairs.
• This simplicity makes them the most scalable of the NoSQL database types, capable of storing huge amounts of data.

Types of NoSQL databases : Key-Value Stores

• The value in a key-value store can be anything: a string, a number, but also an entire new set of key-value pairs encapsulated in an object. The accompanying figure shows a slightly more complex nested key-value structure.

Examples:
• Redis
• Voldemort
• Riak
• Amazon’s Dynamo
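
A key-value store can be sketched in a few lines of Python. The `KeyValueStore` class and the sample keys and values are hypothetical; the products listed above add persistence, replication, and networking on top of this basic put/get idea.

```python
class KeyValueStore:
    """A minimal in-memory key-value store with put/get/delete."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        self._data[key] = value

    def get(self, key, default=None):
        return self._data.get(key, default)

    def delete(self, key):
        # Removing a missing key is treated as a no-op in this sketch.
        self._data.pop(key, None)

store = KeyValueStore()
store.put("page_views", 158)                      # a number
store.put("greeting", "hello")                    # a string
store.put("user:42", {                            # nested key-value pairs
    "name": "Arun",
    "address": {"city": "Mumbai", "country": "India"},
})
```

The store never inspects the value, so a nested object is stored and retrieved as a whole; for example, `store.get("user:42")["address"]["city"]` reads a field from the nested structure after retrieval.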

Lecture 20

Document Stores, Graph Stores

Types of NoSQL databases : Document Stores

• Document stores are one step up in complexity from key-value stores.

• Document stores appear the most natural among the NoSQL database types because they’re designed to store everyday documents as is, and they allow for complex querying and calculations on this often already aggregated form of data.

• The way things are stored in a relational database makes sense from a normalization point of view: everything should be stored only once and connected via foreign keys. Document stores care little about normalization as long as the data is in a structure that makes sense.

Types of NoSQL databases : Document Stores

• Newspapers or magazines, for example, contain articles. To store these in a relational database, you need to chop them up first: the article text goes in one table, the author and all the information about the author in another, and comments on the article when published on a website go in yet another.
• Examples of document stores are MongoDB and CouchDB.

Types of NoSQL databases : Document Stores

game::1
{
  "name": "Pokemon Red",
  "price": "29.99"
}
game::2
{
  "name": "Super Smash Bros.",
  "price": "49.99"
}

Types of NoSQL databases : Document Stores

person::agupta
{
  "first_name": "Arun",
  "last_name": "Gupta",
  "email": "[email protected]"
}

Types of NoSQL databases : Document Stores

transaction::1
{
  "order_number": "1234",
  "date": "07/08/2016",
  "person_id": "person::nraboy",
  "game_id": "game::1",
  "quantity": "1"
}
transaction::2
{
  "order_number": "1234",
  "date": "07/08/2016",
  "person_id": "person::nraboy",
  "game_id": "game::2",
  "quantity": "1"
}
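
The transaction documents above point to games by key. As a rough sketch of the alternative, embedding, the code below follows those references and builds a single order document that contains the game data directly. The `embed_order` helper and the shape of its result are assumptions made for illustration, and the `person_id` field is left out because the slides do not show a `person::nraboy` document.

```python
# Documents keyed the same way as on the slides; values kept as strings.
documents = {
    "game::1": {"name": "Pokemon Red", "price": "29.99"},
    "game::2": {"name": "Super Smash Bros.", "price": "49.99"},
    "transaction::1": {"order_number": "1234", "date": "07/08/2016",
                       "game_id": "game::1", "quantity": "1"},
    "transaction::2": {"order_number": "1234", "date": "07/08/2016",
                       "game_id": "game::2", "quantity": "1"},
}

def embed_order(documents, order_number):
    """Build one order document with its games embedded instead of referenced."""
    items = []
    for key in sorted(documents):
        doc = documents[key]
        if key.startswith("transaction::") and doc["order_number"] == order_number:
            game = documents[doc["game_id"]]  # follow the reference by key
            items.append({"game": dict(game), "quantity": doc["quantity"]})
    return {"order_number": order_number, "items": items}
```

With references, reading an order takes several lookups but each game is stored once; with embedding, one lookup returns everything, at the cost of duplicating game data across orders.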


Types of NoSQL databases : Document Stores

[Figure: embedded documents]

Types of NoSQL databases : Graph Databases

• The last big NoSQL database type is the most complex one, geared toward storing relations between entities in an efficient manner.

• When the data is highly interconnected, such as for social networks, scientific paper citations, or capital asset clusters, graph databases are the answer.

• Graph or network data has two main components:

Node: The entities themselves. In a social network this could be people.

Edge: The relationship between two entities. This relationship is represented by a line and has its own properties. An edge can have a direction, for example when an arrow indicates who is whose boss.

Types of NoSQL databases : Graph Databases

• Graphs can become incredibly complex given enough relation and entity types. The accompanying figure already shows that complexity with only a limited number of entities. Graph databases like Neo4j also claim to uphold ACID, whereas document stores and key-value stores adhere to BASE.
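
The node and edge model can be sketched as a tiny in-memory property graph. The `Graph` class, the people, and the relationship names are hypothetical, and this is not Neo4j's API; it only shows nodes, directed edges, and properties on both.

```python
class Graph:
    """A tiny property graph: nodes and directed edges both carry properties."""

    def __init__(self):
        self.nodes = {}   # node id -> properties
        self.edges = []   # (source id, relationship, target id, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties

    def add_edge(self, source, relationship, target, **properties):
        self.edges.append((source, relationship, target, properties))

    def neighbours(self, source, relationship):
        """Follow outgoing edges of one relationship type from a node."""
        return [t for s, rel, t, _ in self.edges
                if s == source and rel == relationship]

g = Graph()
g.add_node("alice", kind="person")
g.add_node("bob", kind="person")
g.add_node("carol", kind="person")
g.add_edge("alice", "BOSS_OF", "bob", since=2015)   # direction: alice -> bob
g.add_edge("alice", "BOSS_OF", "carol", since=2018)
g.add_edge("bob", "FRIEND_OF", "carol")
```

Because edges are directed, `g.neighbours("alice", "BOSS_OF")` finds Alice's reports, while the same query starting from `"bob"` finds nobody.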

Thank You
