05 NoSQL

NoSQL is a movement towards non-relational databases designed for handling Big Data, emphasizing scalability and flexibility in data storage and retrieval. It includes various types such as key-value stores, document databases, column-family stores, and graph databases, with MongoDB being a prominent example of a document database. Key features of NoSQL databases include support for semi-structured data, sharding, replication, and the use of JSON or BSON for data representation.
NoSQL

1
Big Data (some old numbers)
• Facebook:
  - 130TB/day: user logs
  - 200-400TB/day: 83 million pictures
• Google: > 25 PB/day processed data
• Gene sequencing: 100M kilobases per day per machine
  - Sequence 1 human cell costs Illumina $1k
  - Sequence 1 cell for every infant
  - 10 trillion cells / human body
• Total data created in 2010: 1 ZettaByte (1,000,000 PB)/year
  - ~60% increase every year
2
Big data is not only databases
• Big data is more about data analytics and
on-line querying

Many components:
• Storage systems
• Database systems
• Data mining and statistical algorithms
• Visualization

3
What is NoSQL?
• An emerging “movement” around non-relational software for Big Data
• Roots are in the Google and Amazon homegrown software stacks
• Wikipedia: “A NoSQL database provides a mechanism for storage and retrieval of data that use looser consistency models than traditional relational databases in order to achieve horizontal scaling and higher availability. Some authors refer to them as "Not only SQL" to emphasize that some NoSQL systems do allow SQL-like query language to be used.”
Some NoSQL Components

(layered stack, top to bottom)
• Analytics Interface (Pig, Hive, …) | Imperative Lang (RoR, Java, Scala, …)
• Data Parallel Processing (MapReduce/Hadoop)
• Distributed Key/Value or Column Store (Cassandra, HBase, Redis, …)
• Scalable File System (GFS, HDFS, …)

5
NoSQL features
• Scalability is crucial!
  - load increases rapidly for many applications
• Large servers are expensive
• Solution: use clusters of small commodity machines
  - need to partition the data (sharding) and use replication
  - cheap (usually open source!)
  - cloud-based storage
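To make partitioning and replication concrete, here is a minimal sketch in plain JavaScript. It assumes a made-up cluster of four nodes and a toy string hash; the node names, `hashKey`, and `placement` are illustrative and do not come from any particular NoSQL system. Real systems typically use more robust schemes such as consistent hashing or range-based sharding.

```javascript
// Hypothetical cluster of four commodity nodes (names are made up).
const nodes = ["node-a", "node-b", "node-c", "node-d"];

// Toy deterministic string hash (illustrative only, not production quality).
function hashKey(key) {
  let h = 0;
  for (const ch of String(key)) {
    h = (h * 31 + ch.codePointAt(0)) >>> 0; // keep an unsigned 32-bit value
  }
  return h;
}

// Sharding: the hash of the key picks the primary node.
// Replication: the next (replicas - 1) nodes in the list hold copies.
function placement(key, replicas = 2) {
  const primary = hashKey(key) % nodes.length;
  const owners = [];
  for (let i = 0; i < replicas; i++) {
    owners.push(nodes[(primary + i) % nodes.length]);
  }
  return owners;
}

console.log(placement("user:42"));    // two nodes: primary plus one replica
console.log(placement("user:43", 3)); // three nodes
```

Because the placement depends only on the key, any client can compute where a record lives without consulting a central directory. The price of this simple modulo scheme is that adding a node changes the placement of most keys.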

6
NoSQL features
• Sometimes not a well defined schema
• Allow for semi-structured data
  - still need to provide ways to query efficiently (use of index methods)
  - need to express specific types of queries easily

7
Flavors of NoSQL

Four main types:


• key-value stores
• document databases
• column-family (big-table) stores
• graph databases

=> Here we will talk more about “Document” databases (MongoDB)

10
Key-Value Stores

There are many such systems: Redis, MemcacheDB, Amazon's DynamoDB, Voldemort

• Simple data model: key/value pairs
  - the DBMS does not attempt to interpret the value
• Queries are limited to query by key
  - get/put/update/delete a key/value pair
  - iterate over key/value pairs
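The operations listed above can be sketched with a small in-memory store. This is only an illustration built on JavaScript's `Map`; the class `KeyValueStore` and the sample keys are hypothetical and do not reflect the API of Redis, DynamoDB, or any other product.

```javascript
// Minimal in-memory key/value store. The value is an opaque blob:
// the store never looks inside it.
class KeyValueStore {
  constructor() {
    this.data = new Map();
  }
  put(key, value) {        // insert or update
    this.data.set(key, value);
  }
  get(key) {               // returns undefined if the key is missing
    return this.data.get(key);
  }
  delete(key) {            // returns true if the key existed
    return this.data.delete(key);
  }
  *entries() {             // iterate over all key/value pairs
    yield* this.data.entries();
  }
}

const store = new KeyValueStore();
store.put("session:1", '{"user":"alice"}');
store.put("session:2", '{"user":"bob"}');
store.put("session:1", '{"user":"carol"}'); // update by key
store.delete("session:2");

for (const [key, value] of store.entries()) {
  console.log(key, value);
}
```

Note that there is no way to ask for "all sessions whose user is carol" without scanning every value and parsing it in the application. That limitation is what document databases address.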

11
Document Databases
Examples include: MongoDB, CouchDB, Terrastore

• Also store key/value pairs
  - However, the value is a document.
  - expressed using some sort of semi-structured data model
    - XML
    - more often: JSON or BSON (JSON's binary counterpart)
  - the value can be examined and used by the DBMS (unlike in key/value stores)
• Queries can be based on the key (as in key/value stores), but more often they are based on the contents of the document.
• Here again, there is support for sharding and replication.
  - the sharding can be based on values within the document
12
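As a rough illustration of querying by document contents rather than by key, the sketch below filters an in-memory array of documents by field equality. The helper `findByFields` and the sample data are assumptions made for illustration; this is a simplified stand-in, not how a document database evaluates queries internally.

```javascript
// A tiny "collection": documents need not share the same fields.
const peopleDocs = [
  { _id: 1, "Last Name": "Cousteau", "First Name": "Jacques-Yves" },
  { _id: 2, "Last Name": "PELLERIN", "First Name": "Franck", City: "VERSAILLES" },
];

// Return every document whose fields equal all the values in `query`.
function findByFields(collection, query) {
  return collection.filter((doc) =>
    Object.entries(query).every(([field, value]) => doc[field] === value)
  );
}

console.log(findByFields(peopleDocs, { City: "VERSAILLES" })); // content-based query
console.log(findByFields(peopleDocs, { _id: 1 }));             // key-based query
```

Documents that lack the queried field simply do not match, which is how a content-based query copes with a collection that has no fixed schema.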
The Structure Spectrum

• Structured (schema-first): Relational Database, Formatted Messages
• Semi-Structured (schema-later): Documents, XML, Tagged Text/Media
• Unstructured (schema-never): Plain Text, Media
MongoDB (An example of a Document Database)
- Data are organized in collections. A collection stores a set of documents.
- A collection is like a table and a document is like a record
- but: each document can have a different set of attributes, even in the same collection
- Semi-structured schema!
- Only requirement: every document should have an “_id” field

14
Example MongoDB

{ "_id": ObjectId("4efa8d2b7d284dad101e4bc9"),
"Last Name": "Cousteau",
"First Name": "Jacques-Yves",
"Date of Birth": "06-1-1910" },

{ "_id": ObjectId("4efa8d2b7d284dad101e4bc7"),
"Last Name": "PELLERIN",
"First Name": "Franck",
"Date of Birth": "09-19-1983",
"Address": "1 chemin des Loges",
"City": "VERSAILLES" }

15
Example Document Database: MongoDB
Key features include:
• JSON-style documents
  - actually uses BSON (JSON's binary format)
• replication for high availability
• auto-sharding for scalability
• document-based queries
• can create an index on any attribute
  - for faster reads
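Why an index on an attribute speeds up reads can be sketched as follows. The function `buildIndex` and the sample documents are hypothetical; the sketch uses a plain `Map` from field value to matching documents, whereas a real database would keep a persistent, ordered structure and update it on every write.

```javascript
const cityDocs = [
  { _id: 1, name: "A", city: "Chicago" },
  { _id: 2, name: "B", city: "Los Angeles" },
  { _id: 3, name: "C", city: "Chicago" },
  { _id: 4, name: "D" }, // no city field: simply not indexed
];

// Build a lookup table from each value of `field` to the documents holding it.
function buildIndex(collection, field) {
  const index = new Map();
  for (const doc of collection) {
    if (!(field in doc)) continue;
    const value = doc[field];
    if (!index.has(value)) index.set(value, []);
    index.get(value).push(doc);
  }
  return index;
}

const cityIndex = buildIndex(cityDocs, "city");

// One map lookup instead of scanning every document in the collection.
console.log(cityIndex.get("Chicago"));
```

The trade-off is the usual one: the index costs extra memory and must be maintained on every insert or update, in exchange for avoiding a full scan on reads.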

16
MongoDB Terminology
relational term   <==>  MongoDB equivalent
----------------------------------------------------------
database          <==>  database
table             <==>  collection
row               <==>  document
attributes        <==>  fields (field-name:value pairs)
primary key       <==>  the _id field, which is the key associated with the document

17
JSON
• JSON (JavaScript Object Notation) is an alternative data model for semi-structured data.
• Built on two key structures:
  - an object, which is a collection of name/value pairs
    { "_id": "1000",
    "name": "Sanders Theatre",
    "capacity": 1000 }
  - an array of values [ "123", "222", "333" ]
• A value can be:
  - an atomic value: string, number, true, false, null
  - an object
  - an array
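The structures above map directly onto JavaScript values, which a short sketch with the built-in `JSON.parse` and `JSON.stringify` can show. The fields `tags`, `open`, and `owner` are added here purely for illustration and are not part of the slide's example.

```javascript
// A JSON text containing an object, an array, and several atomic values.
const jsonText = `{
  "_id": "1000",
  "name": "Sanders Theatre",
  "capacity": 1000,
  "tags": ["123", "222", "333"],
  "open": true,
  "owner": null
}`;

const venue = JSON.parse(jsonText);

console.log(typeof venue.name);         // "string"
console.log(typeof venue.capacity);     // "number"
console.log(Array.isArray(venue.tags)); // true
console.log(venue.owner === null);      // true

// Values nest: an array may hold objects, and an object may hold arrays.
const nested = { venues: [venue], count: 1 };
console.log(JSON.stringify(nested).length > 0); // true
```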
18
The _id Field
Every MongoDB document must have an _id field.
• its value must be unique within the collection
• acts as the primary key of the collection
• it is the key in the key/value pair
• If you create a document without an _id field:
  - MongoDB adds the field for you
  - assigns it a unique BSON ObjectId
  - example from the MongoDB shell:
> db.test.save({ rating: "PG-13" })
> db.test.find()
{ "_id" : ObjectId("528bf38ce6d3df97b49a0569"), "rating" : "PG-13" }
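The two rules above (uniqueness, and automatic assignment when the field is missing) can be mimicked in a few lines of plain JavaScript. This is a sketch under stated assumptions: `insertDocument` is a hypothetical helper, and a simple counter stands in for ObjectId generation, which in MongoDB produces a 12-byte value rather than a string like the one used here.

```javascript
let nextGeneratedId = 1; // hypothetical counter standing in for ObjectId generation

// Insert a document into an in-memory collection, enforcing the _id rules.
function insertDocument(collection, doc) {
  const stored = "_id" in doc
    ? { ...doc }
    : { _id: `generated-${nextGeneratedId++}`, ...doc };
  if (collection.some((existing) => existing._id === stored._id)) {
    throw new Error(`duplicate _id: ${stored._id}`);
  }
  collection.push(stored);
  return stored._id;
}

const ratings = [];
insertDocument(ratings, { rating: "PG-13" });         // _id is added automatically
insertDocument(ratings, { _id: "mine", rating: "R" }); // caller-supplied _id is kept
console.log(ratings);
```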
19
Data Modeling in MongoDB
Need to determine how to map entities and relationships => collections of documents
• Could in theory give each type of entity:
  - its own (flexibly formatted) type of document
  - documents of the same type would be stored in the same collection
• However, it can make sense to group different types of entities together.
  - create an aggregate containing data that tends to be accessed together

20
Capturing Relationships in MongoDB
• Two options:
  1. store references to other documents using their _id values
  2. embed documents within other documents

21
Example relationships
Consider the following example documents:
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"name": "Tom Hanks",
"contact": "987654321",
"dob": "01-01-1991"
}

{
"_id":ObjectId("52ffc4a5d85242602e000000"),
"building": "22 A, Indiana Apt",
"pincode": 123456,
"city": "Los Angeles",
"state": "California"
}

Here is an example of an embedded relationship:
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"contact": "987654321",
"dob": "01-01-1991",
"name": "Tom Benzamin",
"address": [
{
"building": "22 A, Indiana Apt",
"pincode": 123456,
"city": "Los Angeles",
"state": "California"
},
{
"building": "170 A, Acropolis Apt",
"pincode": 456789,
"city": "Chicago",
"state": "Illinois"
}
]
}

And here is an example of a reference-based relationship:
{
"_id":ObjectId("52ffc33cd85242f436000001"),
"contact": "987654321",
"dob": "01-01-1991",
"name": "Tom Benzamin",
"address_ids": [
ObjectId("52ffc4a5d85242602e000000"),
ObjectId("52ffc4a5d85242602e000001")
]
}
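The practical difference between the two options shows up at read time, as the following plain JavaScript sketch suggests. Plain strings stand in for ObjectId values, a `Map` stands in for the address collection, and the helper names are hypothetical; none of this is MongoDB's own API.

```javascript
// Stand-in for a separate "addresses" collection, keyed by _id.
const addressCollection = new Map([
  ["52ffc4a5d85242602e000000", { building: "22 A, Indiana Apt", city: "Los Angeles" }],
  ["52ffc4a5d85242602e000001", { building: "170 A, Acropolis Apt", city: "Chicago" }],
]);

// Option 1: the user document stores references (the _id values).
const userWithReferences = {
  name: "Tom Benzamin",
  address_ids: ["52ffc4a5d85242602e000000", "52ffc4a5d85242602e000001"],
};

// Option 2: the user document embeds the address documents.
const userWithEmbedding = {
  name: "Tom Benzamin",
  address: [
    { building: "22 A, Indiana Apt", city: "Los Angeles" },
    { building: "170 A, Acropolis Apt", city: "Chicago" },
  ],
};

// References need one extra lookup per address.
function citiesViaReferences(user, addresses) {
  return user.address_ids.map((id) => addresses.get(id).city);
}

// Embedded data is already in the document that was read.
function citiesViaEmbedding(user) {
  return user.address.map((addr) => addr.city);
}

console.log(citiesViaReferences(userWithReferences, addressCollection));
console.log(citiesViaEmbedding(userWithEmbedding));
```

Embedding favors data that is read together with its parent, while references avoid duplicating an address shared by several users at the cost of the extra lookups.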
22
Other Structure Issues
• NoSQL: a) Tables are unnatural, b) “joins” are evil, c) need to be able to “grep” my data
• DB: a) Tables are a natural/neutral structure, b) data independence lets you precompute joins under the covers, c) this is a price of all the DBMS goodness you get

This is an old debate: object-oriented databases, XML DBs, hierarchical, …

23
