Lecture NoSqlIntro

This document provides an introduction and overview of NoSQL databases. It discusses the limitations of relational databases in scaling to meet the needs of web-based applications. NoSQL databases provide an alternative by relaxing ACID properties to allow for horizontal scaling. The document covers the history and origins of NoSQL databases from papers by Google, Amazon, and the CAP theorem. It describes common NoSQL database categories like key-value, document, columnar, and graph databases and provides examples. Characteristics of NoSQL databases include being non-relational, schema-less, replicated across nodes for fault tolerance, and horizontally scalable. Relational features are limited and consistency guarantees are relaxed compared to relational databases.

Uploaded by

Kriti Gautam

Introduction to NoSQL

Background
• Relational databases → the mainstream of business
• Web-based applications caused spikes:
  • explosion of social media sites (Facebook, Twitter) with large data needs
  • rise of cloud-based solutions such as Amazon S3 (Simple Storage Service)
• Hooking an RDBMS up to a web-based application becomes troublesome

Problem for Relational Database to Scale
• Relational databases are built on the ACID principles (Atomicity, Consistency, Isolation, Durability)
• This implies that a truly distributed relational database should offer availability, consistency, and partition tolerance all at once
• Unfortunately, that is impossible (see the CAP theorem)
Scalability is the key for processing huge data
Scaling Up
• The best way to provide ACID guarantees and a rich query model is to keep the dataset on a single machine
• There are limits to scaling up (vertical scaling: making a “single” machine more powerful) → the dataset is just too big!

Scaling Out
• Scaling out (horizontal scaling: adding more, smaller and cheaper servers) is a better choice
• Different approaches to horizontal scaling (multi-node database):
  • Master/Slave
  • Sharding (partitioning)
Scaling out: Sharding
• Sharding (Partitioning)
• Scales well for both reads and writes
• Not transparent, application needs to be partition-aware
• Can no longer have relationships/joins across partitions
• Loss of referential integrity across shards
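
The "partition-aware application" point can be illustrated with a minimal sketch of hash-based sharding. This is only an illustration under stated assumptions: each shard is an in-memory dict standing in for a database node, and the names `ShardedStore`, `shard_for`, and the shard count are hypothetical, not taken from any particular product.

```python
import hashlib


class ShardedStore:
    """Toy sharded store: each shard is a plain dict standing in for a node."""

    def __init__(self, num_shards: int) -> None:
        self.shards = [dict() for _ in range(num_shards)]

    def shard_for(self, key: str) -> int:
        # Use a stable hash (not Python's randomized hash()) so routing is repeatable.
        digest = hashlib.md5(key.encode("utf-8")).hexdigest()
        return int(digest, 16) % len(self.shards)

    def put(self, key: str, value: object) -> None:
        # The application (this class) decides which partition receives the write.
        self.shards[self.shard_for(key)][key] = value

    def get(self, key: str) -> object:
        # Reads must be routed the same way as writes.
        return self.shards[self.shard_for(key)].get(key)


store = ShardedStore(num_shards=4)
store.put("user:1", {"name": "Ada"})
store.put("user:2", {"name": "Linus"})
print(store.get("user:1"))
```

Because related records may land on different shards, anything resembling a join has to be done by the application, fetching from each shard separately.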

NoSQL – the history
• The name:
  • Stands for Not Only SQL
  • The term NoSQL was introduced by Carlo Strozzi in 1998 to name his file-based database
  • It was picked up again as a Twitter hashtag in 2009 for a NoSQL meetup in San Francisco organized by Johan Oskarsson
  • Eric Evans, a Rackspace employee, popularized it by describing the NoSQL movement: “the whole point of seeking alternatives is that you need to solve a problem that relational databases are a bad fit for …”

3 major papers for NoSQL
• Three major papers were the “seeds” of the NoSQL movement:
• BigTable (Google)
• Dynamo (Amazon)
  • Ring partition and replication
  • Gossip protocol (discovery and error detection)
  • Distributed key-value data stores
  • Eventual consistency
• CAP Theorem
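
The "ring partition and replication" idea listed above can be sketched with consistent hashing. This is a simplified illustration, not Amazon's implementation: the node names, the `HashRing` class, and the replica count are assumptions, and virtual nodes, gossip, and failure handling are left out.

```python
import bisect
import hashlib


def ring_hash(value: str) -> int:
    """Map a string to a stable position on the ring."""
    return int(hashlib.md5(value.encode("utf-8")).hexdigest(), 16)


class HashRing:
    def __init__(self, nodes: list[str], replicas: int = 2) -> None:
        # Sort nodes by ring position so lookups can use binary search.
        self.ring = sorted((ring_hash(n), n) for n in nodes)
        self.positions = [pos for pos, _ in self.ring]
        self.replicas = min(replicas, len(nodes))

    def nodes_for(self, key: str) -> list[str]:
        """Return the first `replicas` nodes clockwise from the key's position."""
        start = bisect.bisect_right(self.positions, ring_hash(key)) % len(self.ring)
        return [self.ring[(start + i) % len(self.ring)][1] for i in range(self.replicas)]


ring = HashRing(["node-a", "node-b", "node-c"], replicas=2)
owners = ring.nodes_for("cart:42")
print(owners)  # two distinct nodes: the primary and its clockwise successor
```

Each key is stored on the node that follows it on the ring plus the next successor, so losing one node does not lose the data.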

NoSQL Characteristics
• Non-relational
• Schema-less
• Data are replicated to multiple nodes (so, identical and fault-tolerant) and can be partitioned:
  • down nodes are easily replaced
  • no single point of failure
• Horizontally scalable
• Cheap, easy to implement (open-source)
• Massive write performance
• Fast key-value access
NoSQL Characteristics
• Don’t fully support relational features:
  • no join, group by, order by operations (except within partitions)
  • no referential integrity constraints across partitions
• No declarative query language (e.g., SQL) → more programming
• Relaxed ACID (see CAP theorem) → fewer guarantees
• No easy integration with other applications that support SQL

Who is using them?

NoSQL Categories
• Key-value
o Example: Dynamo, Voldemort, Scalaris
• Document-based
o Example: MongoDB, CouchDB
• Column-based
o Example: BigTable, Cassandra, HBase
• Graph-based
o Example: Neo4j, InfoGrid

Key-value
• Focus on scaling to huge amounts of data
• Designed to handle massive load
• Key-value stores have a simple data model in common: a map/dictionary, allowing clients to put and request values per key
• Based on Amazon’s Dynamo paper
• Data model: (global) collection of key-value pairs
• Modern key-value stores favor high scalability over consistency
• The length of keys to be stored is limited to a certain number of bytes, while there is less limitation on values
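
The map/dictionary data model can be sketched in a few lines. This is a toy, single-process illustration: the class name `KeyValueStore` and the 64-byte key limit are assumptions chosen for the example, since real stores pick their own limits.

```python
from typing import Optional


class KeyValueStore:
    """Toy key-value store: a dictionary with a cap on key size."""

    MAX_KEY_BYTES = 64  # hypothetical limit; real stores choose their own

    def __init__(self) -> None:
        self._data: dict[bytes, bytes] = {}

    def put(self, key: bytes, value: bytes) -> None:
        # Keys are size-limited; values are opaque blobs with no such check here.
        if len(key) > self.MAX_KEY_BYTES:
            raise ValueError("key exceeds MAX_KEY_BYTES")
        self._data[key] = value

    def get(self, key: bytes) -> Optional[bytes]:
        # Lookup is by exact key only; there is no query language.
        return self._data.get(key)


kv = KeyValueStore()
kv.put(b"session:abc", b'{"user": 7}')
print(kv.get(b"session:abc"))
```

The store never inspects the value, which is what keeps the model simple and easy to distribute.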

Document-based
• Can model more complex objects
• Inspired by Lotus Notes
• Data model: collection of documents
• JSON (JavaScript Object Notation) is a data model of key-value pairs that supports objects, records, structs, lists, arrays, maps, dates, Booleans, etc.
• MongoDB data type: BSON (Binary JSON)
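
A collection of documents can be sketched with plain Python dicts and the standard `json` module. This is an illustration only: the `users` collection and the `find` helper are hypothetical and are not the MongoDB API.

```python
import json

# Two documents in one collection; they need not share the same fields.
users = [
    {"_id": 1, "name": "Ada", "languages": ["Python", "C"], "address": {"city": "London"}},
    {"_id": 2, "name": "Linus", "languages": ["C"]},  # no address: no fixed schema
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal the given criteria."""
    return [doc for doc in collection
            if all(doc.get(k) == v for k, v in criteria.items())]


serialized = json.dumps(users[0])   # document -> JSON text
restored = json.loads(serialized)   # JSON text -> Python dict
print(restored["address"]["city"])
print([doc["_id"] for doc in find(users, name="Linus")])
```

Nested objects and arrays live inside a single document, so a more complex object can be read or written in one operation.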

Column-based
• Based on Google’s BigTable paper
• Like column-oriented relational databases (store data in column order) but with a twist
• Tables are similar to an RDBMS, but handle semi-structured data
• Data model:
• Collection of Column Families
• Column family = (key, value) where value = set of related columns (standard, super)
• indexed by the triple (row key, column key and timestamp)
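
The (row key, column key, timestamp) indexing can be sketched as nested maps. This is a toy in-memory model under assumptions: the `ColumnFamily` class and the column names are hypothetical, and nothing here reflects how BigTable stores data on disk.

```python
import time
from collections import defaultdict


class ColumnFamily:
    """Toy column family: row key -> column key -> list of (timestamp, value)."""

    def __init__(self) -> None:
        self.rows = defaultdict(lambda: defaultdict(list))

    def put(self, row_key, column_key, value, timestamp=None):
        ts = time.time() if timestamp is None else timestamp
        versions = self.rows[row_key][column_key]
        versions.append((ts, value))
        versions.sort(key=lambda tv: tv[0])  # keep versions ordered by timestamp

    def get(self, row_key, column_key, timestamp=None):
        """Return the newest value at or before `timestamp` (latest if omitted)."""
        versions = self.rows.get(row_key, {}).get(column_key, [])
        candidates = [v for ts, v in versions if timestamp is None or ts <= timestamp]
        return candidates[-1] if candidates else None


cf = ColumnFamily()
cf.put("row1", "contact:email", "a@old.example", timestamp=1)
cf.put("row1", "contact:email", "a@new.example", timestamp=2)
cf.put("row2", "contact:phone", "555-0100", timestamp=1)  # rows may differ in columns
print(cf.get("row1", "contact:email"))
print(cf.get("row1", "contact:email", timestamp=1))
```

Rows in the same family can carry different columns, which is how the model handles semi-structured data.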

• Allow key-value pairs to be stored (and retrieved by key) in a massively parallel system
• Storing principle: big hashed distributed tables
• Properties: partitioning (horizontally and/or vertically), high availability, etc., completely transparent to the application

* Better: extendible records

Graph Database
• A graph has nodes and edges/relationships (directed or undirected)
• A graph database stores data in a graph (nodes and relationships)
• Both nodes and relationships can have properties; this is sometimes referred to as the “Property Graph Model”
Building Blocks of Graph Database
• Nodes
• Relationships
• Attributes
• Labels
Nodes
• Nodes are often used to represent entities, but depending on the
domain relationships may be used for that purpose as well.
• The following are some example nodes:
Relationships
• Relationships organise the nodes by connecting them.
• A relationship connects two nodes – a start node and an end node.
Relationships…

• Relationships are always directed, and can be viewed as outgoing or incoming relative to a node.
• A node can have relationships to itself.
Relationships
• Relationships are equally well traversed in either direction.
• For better graph traversal, all relationships have a type.
Properties
• Both nodes and relationships can have properties.
• Properties are key-value pairs where the key is a string, and values can be either a primitive or an array of one primitive type.
• NULL is not a valid property value; a null is usually modeled by the absence of a key.
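
The building blocks so far (nodes, typed and directed relationships, properties) can be sketched with plain data classes. This is an illustrative model only: the class names, the `set_property` helper, and the sample data are assumptions, not the API of any graph database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    properties: dict = field(default_factory=dict)


@dataclass
class Relationship:
    start: Node        # relationships are directed: start node -> end node
    end: Node
    rel_type: str      # every relationship has a type
    properties: dict = field(default_factory=dict)


def set_property(target, key: str, value) -> None:
    """Store a property; a None value removes the key instead of storing a null."""
    if value is None:
        target.properties.pop(key, None)
    else:
        target.properties[key] = value


alice = Node(1, {"name": "Alice"})
bob = Node(2, {"name": "Bob"})
knows = Relationship(alice, bob, "KNOWS", {"since": 2015})
set_property(bob, "nickname", "Bobby")
set_property(bob, "nickname", None)  # absence of the key models "no value"
print(knows.rel_type, sorted(bob.properties))
```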
Labels
• Labels assign roles or types to nodes.
• A label is a named construct that is used to group nodes into sets
• Many database queries can work with these sets instead of the whole
graph, making queries easier to write and more efficient to execute.
• A node may be assigned to any number of labels (including none).
• Labels are used when defining constraints and adding indexes for
properties.
• Labels can be added and removed at runtime, and can be used to mark temporary states for your nodes, e.g., an Offline label for phones that are offline, or a Happy label for happy pets.
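
How labels group nodes into sets can be sketched with a small index. This is a hypothetical structure for illustration (the `LabelIndex` class and the integer node ids are assumptions); it shows why a query over one label can skip the rest of the graph.

```python
from collections import defaultdict


class LabelIndex:
    """Maps each label to the set of node ids that currently carry it."""

    def __init__(self) -> None:
        self._by_label = defaultdict(set)

    def add(self, node_id: int, label: str) -> None:
        self._by_label[label].add(node_id)

    def remove(self, node_id: int, label: str) -> None:
        # Labels can be dropped at runtime, e.g. when a temporary state ends.
        self._by_label[label].discard(node_id)

    def nodes_with(self, label: str) -> set:
        # Return a copy so callers cannot mutate the index by accident.
        return set(self._by_label.get(label, set()))


index = LabelIndex()
index.add(1, "Phone")
index.add(2, "Phone")
index.add(2, "Offline")      # phone 2 goes offline
print(index.nodes_with("Offline"))
index.remove(2, "Offline")   # phone 2 comes back online
print(index.nodes_with("Offline"))
```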
Paths
• A path is one or more nodes with connecting relationships, typically
retrieved as a query or traversal result.
• A query in our example database can be returned as a path.
• The shortest possible path with length zero is a single node.
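
The notion of a path, including the zero-length case, can be sketched with a breadth-first search over directed relationships. The edge list and function names below are made up for illustration; path length is counted as the number of relationships traversed.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Breadth-first search over directed (start, end) pairs; returns a node list or None."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, []).append(b)
    queue = deque([[start]])
    visited = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path
        for nxt in adjacency.get(path[-1], []):
            if nxt not in visited:
                visited.add(nxt)
                queue.append(path + [nxt])
    return None


def path_length(path):
    """Length of a path is its number of relationships, i.e. nodes minus one."""
    return len(path) - 1


edges = [("Alice", "Bob"), ("Bob", "Carol"), ("Alice", "Dave")]
print(shortest_path(edges, "Alice", "Carol"))               # ['Alice', 'Bob', 'Carol']
print(path_length(shortest_path(edges, "Alice", "Alice")))  # 0: a single node
```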
Sample Graph Database Model
Cypher Clauses
Resources
http://nosql-database.org/
