
NoSQL DB

The document provides information about NoSQL databases and their internals. It introduces NoSQL databases as alternatives to traditional relational databases for handling large, unstructured data. It describes different NoSQL database models including key-value stores, document stores, and column-oriented stores. It also discusses the internals of some popular NoSQL databases like MongoDB, HBase, and Memcached, covering how they store and organize data.

Uploaded by

AKSHAY Kumar

NoSQL Database Model

Objectives

At the end of this session we will be acquainted with the following topics:

-Introduction to NoSQL Databases

-Understanding Internals of different NoSQL databases


Introduction
Data Facts

Amount of data in circulation over the internet by the year 2020

- Google: 25 petabytes (PB), i.e. 26,214,400 GB

- Facebook: 60 million photos (1.5 PB)

The movie Avatar took 1 PB of storage space to render its 3D effects using CGI.

Data volumes of a few TB came to be known as Big Data.
Bigdata : Main Problem Areas

- Efficiently storing and accessing large amounts of data is difficult. We need backups too!

- Manipulating large data sets involves running massively parallel processes.

- Managing semi-structured and unstructured data, generated by diverse sources, adds to the problem.
BigData : Hardware Challenge

Storage
- A 1 TB hard disk spinning at 7,200 RPM reads data at roughly 300 MB/s.

- At this pace it takes at least 55 minutes to 1 hour to read the full 1 TB.
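The read-time estimate above can be checked with a quick calculation. The sketch below assumes a sustained sequential read at the rate of 300 MB per second quoted above, and evaluates it under both the decimal convention (1 TB = 10^12 bytes) and the binary one (1 TB = 2^40 bytes). The function name is illustrative.

```python
def read_time_minutes(capacity_bytes: int, rate_bytes_per_s: int) -> float:
    """Minutes needed to read capacity_bytes sequentially at a fixed rate."""
    return capacity_bytes / rate_bytes_per_s / 60


RATE_DECIMAL = 300 * 10**6   # 300 MB/s in decimal megabytes (assumed sustained rate)
RATE_BINARY = 300 * 2**20    # 300 MB/s in binary megabytes

decimal_minutes = read_time_minutes(10**12, RATE_DECIMAL)
binary_minutes = read_time_minutes(2**40, RATE_BINARY)

print(f"decimal units: {decimal_minutes:.1f} min")
print(f"binary units:  {binary_minutes:.1f} min")
```

Both conventions land between 55 minutes and 1 hour, so a full scan of a single disk is slow no matter how fast the CPU is. Spreading the data over many disks that are read in parallel is the usual way around this.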

Data Processing Units / Servers

- Either use mainframe servers to process and store the data,

- or use clusters or grids of machines that can scale horizontally.
Challenges for Bigdata on RDBMS

- An RDBMS assumes a well-defined structure in the data:

- Data is distributed among multiple tables.

- Tables must be indexed to optimize operations.

- It assumes that the data is dense and largely uniform:
- Properties of the data can be defined up front, and its interrelationships are well established and systematically referenced.
NoSQL Database for BigData

- Umbrella term for all databases that:

- Don't follow RDBMS principles

- Deal with large data sets accessed and manipulated on a Web scale

- NoSQL is not a single product or even a single technology
History / Advent of NoSQL DB

- Google has a set of massively scalable applications and infrastructure that operate on large amounts of data:

- Google Maps
- Google Apps
- Google Mail
- Google Earth

- For them Google built:

- A distributed file system
- A distributed coordination system
- A MapReduce-based parallel execution algorithm
- A column-family-oriented data store
History / Advent of NoSQL DB

- Using the same approach, the first search engine system to reach the market was Lucene (a search engine framework).

- Later its developers joined Yahoo! and worked to replicate Google's model in a new open-source framework, Hadoop (Apache Hadoop).

- Later, in 2007, another web giant, Amazon, revealed the design behind its own NoSQL data store, known as Dynamo.
Example of NoSQL databases
NoSQL Database Models

SORTED ORDERED COLUMN-ORIENTED STORES

HBase

History — Donated to the Apache foundation.

Technologies and Language — Implemented in Java.

Access Methods — A JRuby shell allows command-line access to the store. Thrift and Avro clients exist.

Query Language — No native query language. Hive provides a SQL-like interface for HBase.

Who Uses It — Facebook, Yahoo!

NoSQL Database Models

SORTED ORDERED COLUMN-ORIENTED STORES

Hypertable

History — Created at Zvents in 2007. Now an independent open-source project.

Technologies and Language — Implemented in C++.

Access Methods — A command-line shell is available. Thrift interface.

Query Language — HQL (Hypertable Query Language).

Who Uses It — Baidu (China's biggest search engine), Rediff (India's biggest portal).
NoSQL Database Models

KEY/VALUE STORES

Cassandra

History — Developed at Facebook and open sourced in 2008. Cassandra was then donated to the Apache foundation.

Technologies and Language — Implemented in Java.

Access Methods — A command-line interface to the store, and a Thrift interface.

Query Language — A query language specification is in the making.

Who Uses It — Facebook, Digg, Reddit, Twitter, and others.


NoSQL Database Models

KEY/VALUE STORES

Voldemort

History — Created by the data and analytics team at LinkedIn in 2008.

Technologies and Language — Implemented in Java. Provides for pluggable storage using either Berkeley DB or MySQL.

Access Methods — Integrates with Thrift, Avro, and protobuf.

Who Uses It — LinkedIn.


NoSQL Database Models

Document Based

MongoDB

History — Created at 10gen.

Technologies and Language — Implemented in C++.

Access Methods — A JavaScript command-line interface. Drivers exist for a number of languages including C, C#, C++, Erlang, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, and Scala.

Query Language — SQL-like query language.

Who Uses It — FourSquare, Shutterfly, Intuit, GitHub, and more.


NoSQL Database Models

Document Based

CouchDB

History — Work started in 2005, and it entered the Apache Incubator in 2008.

Technologies and Language — Implemented in Erlang with some C and a JavaScript execution environment.

Access Methods — Upholds REST above every other mechanism. Use standard web tools and clients to access the database, the same way as you access web resources.

Who Uses It — Apple, BBC, Canonical, CERN, and more at http://wiki.apache
NoSQL Database Models

Graph Database

FlockDB

History — Created at Twitter and open sourced in 2010. Designed to store the adjacency lists for followers on Twitter.

Technologies and Language — Implemented in Scala.

Access Methods — A Thrift and Ruby client.

Open-Source License — Apache License version 2.

Who Uses It — Twitter.


Internals of Different NoSQL Database Models
NoSQL Database Models

SORTED ORDERED COLUMN-ORIENTED STORES


Relational Database Table Design

Adding new attributes introduces NULL values into the existing rows.

Each version of a value may need to be maintained when a field is updated multiple times.
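To make the first point concrete, here is a small sketch in plain Python of what happens when a new attribute is added to a fixed-schema table, compared with a sparse per-record layout. The field names and sample values are made up for illustration, and this is not how any particular database stores its rows.

```python
# Fixed schema: every row carries every column, so a new column has to be
# back-filled with NULL (None) in all existing rows.
columns = ["id", "name", "zip"]
rows = [
    [1, "John Doe", 10001],
    [2, "Lee Chang", 94129],
]


def add_column(columns, rows, new_column):
    """Append a column and pad every existing row with None (NULL)."""
    columns.append(new_column)
    for row in rows:
        row.append(None)


add_column(columns, rows, "phone")  # "phone" is a hypothetical attribute

# Sparse layout: each record stores only the attributes it actually has.
sparse_records = [
    {"id": 1, "name": "John Doe", "zip": 10001},
    {"id": 2, "name": "Lee Chang", "zip": 94129},
    {"id": 3, "name": "Jenny Gonzalez", "zip": 33101, "phone": "555-0100"},
]

null_count = sum(value is None for row in rows for value in row)
print("NULLs in fixed-schema rows:", null_count)
print("records with a phone:", sum("phone" in r for r in sparse_records))
```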


Record Oriented Stores (RDBMS)

001:10,Smith,Joe,40000;
002:12,Jones,Mary,50000;
003:11,Johnson,Cathy,44000;
004:22,Jones,Bob,55000;

COLUMN-ORIENTED STORES

10:001,12:002,11:003,22:004;
Smith:001,Jones:002,Johnson:003,Jones:004;
Joe:001,Mary:002,Cathy:003,Bob:004;
40000:001,50000:002,44000:003,55000:004;
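The two layouts above can be reproduced with a short sketch. It takes the four records from the slide and serializes them first record by record, then column by column, with each value tagged by its row ID. The function names are illustrative, and real column stores use far more compact encodings than these strings.

```python
# The four records from the slide: row ID -> (number, last name, first name, salary).
# The meaning of the first field is not stated on the slide.
records = {
    "001": (10, "Smith", "Joe", 40000),
    "002": (12, "Jones", "Mary", 50000),
    "003": (11, "Johnson", "Cathy", 44000),
    "004": (22, "Jones", "Bob", 55000),
}


def row_oriented(records):
    """Serialize one record after another: 'id:v1,v2,...;'."""
    return [
        f"{row_id}:" + ",".join(str(v) for v in values) + ";"
        for row_id, values in records.items()
    ]


def column_oriented(records):
    """Serialize one column after another: 'value:id,value:id,...;'."""
    row_ids = list(records)
    columns = zip(*records.values())  # transpose rows into columns
    return [
        ",".join(f"{v}:{rid}" for v, rid in zip(column, row_ids)) + ";"
        for column in columns
    ]


for line in row_oriented(records):
    print(line)
for line in column_oriented(records):
    print(line)
```

Reading every salary touches only the last line of the column layout, whereas the record layout requires scanning all four records.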
COLUMN-ORIENTED STORES

Column family:
-A set of columns grouped together into a bundle

-Column-family members are physically stored together

Column databases also store multiple versions of a value.
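A rough sketch of how a column family with versioned cells might be modeled in memory: each row key maps to column families, each family to columns, and each column to a list of timestamped values. The class and method names are hypothetical, and the explicit integer timestamps stand in for the clock a real store would use.

```python
from collections import defaultdict


class VersionedTable:
    """Toy model: row key -> column family -> column -> [(timestamp, value)]."""

    def __init__(self):
        self._rows = defaultdict(lambda: defaultdict(lambda: defaultdict(list)))

    def put(self, row_key, family, column, value, timestamp):
        """Add a new version instead of overwriting the old value."""
        versions = self._rows[row_key][family][column]
        versions.append((timestamp, value))
        versions.sort(key=lambda tv: tv[0], reverse=True)  # newest first

    def get(self, row_key, family, column, max_versions=1):
        """Return up to max_versions values, newest first."""
        versions = self._rows[row_key][family][column]
        return [value for _, value in versions[:max_versions]]


table = VersionedTable()
table.put("001", "name", "last", "Smith", timestamp=1)
table.put("001", "pay", "salary", 40000, timestamp=1)
table.put("001", "pay", "salary", 42000, timestamp=2)  # an update adds a version

print(table.get("001", "pay", "salary"))                  # latest value only
print(table.get("001", "pay", "salary", max_versions=2))  # latest two versions
```

Grouping columns under a family name ("name", "pay") mirrors the idea that members of a family are kept together, so a read of one family does not need to touch the others.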


Basic Architecture
NoSQL Database Models

Document Store Internals


Document Based Model (MongoDB)

Start the MongoDB server

C:\applications\mongodb-win32-x86_64-1.8.1> .\bin\mongod.exe

Connect to the MongoDB server

C:\applications\mongodb-win32-x86_64-1.8.1> bin/mongo
MongoDB shell version: 1.8.1
connecting to: test
>
Document Based Model (MongoDB)

1. Switch to the prefs database.

2. Define the data sets that need to be stored.

3. Save the defined data sets in a collection, named location.

use prefs

w = {name: "John Doe", zip: 10001};
x = {name: "Lee Chang", zip: 94129};
y = {name: "Jenny Gonzalez", zip: 33101};
z = {name: "Srinivas Shastri", zip: 02101};

db.location.save(w);
db.location.save(x);
db.location.save(y);
db.location.save(z);
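One caveat about the last document: its ZIP code is written as 02101, with a leading zero. Legacy (non-strict) JavaScript reads such an integer literal as octal, so the value stored is 1089 rather than 2101 or the five-character code. The Python sketch below only reproduces the arithmetic. Storing ZIP codes as strings, as in its second half, is a common workaround and not something the slides prescribe.

```python
# A leading-zero integer literal such as 02101 is treated as base 8
# by legacy (non-strict) JavaScript.
as_octal = int("02101", 8)  # 2*512 + 1*64 + 0*8 + 1
print(as_octal)

# Keeping the ZIP code as text preserves the leading zero.
zip_as_text = "02101"
print(zip_as_text, len(zip_as_text))
```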
Document Based Model (MongoDB)

Get all records stored in the collection named location:

> db.location.find()
{ "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
{ "_id" : ObjectId("4c970541be67000000003858"), "name" : "Lee Chang", "zip" : 94129 }
{ "_id" : ObjectId("4c970548be67000000003859"), "name" : "Jenny Gonzalez", "zip" : 33101 }
{ "_id" : ObjectId("4c970555be6700000000385a"), "name" : "Srinivas Shastri", "zip" : 1089 }

Get only the records whose zip is 10001:

> db.location.find({zip: 10001});
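The query document {zip: 10001} acts as a pattern: a stored document matches when every field in the pattern has an equal value. The sketch below imitates that equality matching over plain Python dictionaries. It is a simplified model that ignores operators, indexes, and nested fields, and the `find` function here is a stand-in, not a MongoDB driver call.

```python
# Documents from the slide, without the server-generated _id field.
location = [
    {"name": "John Doe", "zip": 10001},
    {"name": "Lee Chang", "zip": 94129},
    {"name": "Jenny Gonzalez", "zip": 33101},
    {"name": "Srinivas Shastri", "zip": 1089},
]


def find(collection, query=None):
    """Return the documents whose fields equal every key/value pair in query."""
    query = query or {}
    return [
        doc
        for doc in collection
        if all(key in doc and doc[key] == value for key, value in query.items())
    ]


print(find(location))                  # empty query: every document
print(find(location, {"zip": 10001}))  # only the documents with zip 10001
```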


Document Based Model (MongoDB)

MongoDB manages data as follows:

-Data files are mapped as segments into virtual memory, because accessing and manipulating memory is much faster than making system calls.

-There is no separation between the operating system cache and the database cache.

-MongoDB can expand its database cache to use all available memory without any additional configuration.

-Hence MongoDB performance can be improved by adding more RAM and allocating more virtual memory.

-In more recent versions, MongoDB supports auto-sharding for scaling horizontally with ease.
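The first point relies on the operating system's memory-mapped files. As a rough illustration (not MongoDB's code), the sketch below maps a small temporary file into memory with Python's standard `mmap` module, so that reads and writes become ordinary slice operations and the OS page cache decides when pages reach the disk. The file size and contents are arbitrary.

```python
import mmap
import os
import tempfile

SIZE = 4096  # arbitrary size for the demo file

# Create a zero-filled file to act as the "data file".
fd, path = tempfile.mkstemp()
try:
    os.write(fd, b"\x00" * SIZE)

    # Map the whole file into this process's virtual memory.
    with mmap.mmap(fd, SIZE) as mapped:
        mapped[0:5] = b"hello"           # write through memory, no write() call
        first_bytes = bytes(mapped[0:5])
        mapped.flush()                   # ask the OS to persist dirty pages

    # The change is visible through normal file I/O as well.
    os.lseek(fd, 0, os.SEEK_SET)
    on_disk = os.read(fd, 5)
finally:
    os.close(fd)
    os.remove(path)

print(first_bytes, on_disk)
```

Because the mapped pages live in the OS page cache, the database and the operating system share one cache, which is the "no separation" point in the list above.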
Document Based Model (MongoDB)
NoSQL Database Models

Key/Value Store Data Model


Key/Value Model

- Memcached is a key/value store used by Facebook, Twitter, and Wikipedia.

- It is extremely simple, with no add-on features such as:

- Failover
- Backup
- Recovery

- Memcached stores its values in slabs:

- A slab is made of pages
- Pages are made of chunks (or buckets)

- Memcached can store data values up to a maximum of 1 MB in size.

- Values are stored and referenced by a key, which can be up to 250 bytes in size.
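The size limits and the slab/page/chunk hierarchy can be sketched in a few lines. The toy store below rejects keys over 250 bytes and values over 1 MB, and picks a chunk size for each value from a series of slab classes that grow by a fixed factor. The page size (1 MB), smallest chunk size (96 bytes), and growth factor (1.25) are assumptions chosen for illustration, so check your own Memcached build for its actual settings. None of the names come from Memcached's code.

```python
MAX_KEY_BYTES = 250             # key limit from the slide
MAX_VALUE_BYTES = 1024 * 1024   # 1 MB value limit from the slide
PAGE_BYTES = 1024 * 1024        # assumed page size
MIN_CHUNK_BYTES = 96            # assumed smallest chunk size
GROWTH_FACTOR = 1.25            # assumed growth factor between slab classes


def slab_classes():
    """Chunk sizes for each slab class, from the smallest up to one page."""
    sizes, size = [], MIN_CHUNK_BYTES
    while size < PAGE_BYTES:
        sizes.append(size)
        size = int(size * GROWTH_FACTOR)
    sizes.append(PAGE_BYTES)
    return sizes


def pick_chunk_size(value_bytes, classes):
    """Smallest chunk size that can hold the value (key and headers ignored)."""
    return next(size for size in classes if size >= value_bytes)


class ToyCache:
    def __init__(self):
        self._data = {}
        self._classes = slab_classes()

    def set(self, key: str, value: bytes) -> int:
        """Store the value and return the chunk size it would occupy."""
        if len(key.encode()) > MAX_KEY_BYTES:
            raise ValueError("key longer than 250 bytes")
        if len(value) > MAX_VALUE_BYTES:
            raise ValueError("value larger than 1 MB")
        self._data[key] = value
        return pick_chunk_size(len(value), self._classes)

    def get(self, key: str):
        return self._data.get(key)


cache = ToyCache()
print(cache.set("user:1", b"x" * 100))    # chunk size chosen for a 100-byte value
print(cache.get("user:1") == b"x" * 100)
```

Fixed chunk sizes waste some space inside each chunk, but they let the cache reuse freed chunks without fragmenting memory. That trade-off suits a store with no persistence, failover, or recovery to worry about.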
