
Nosql Module 1

NoSQL databases are non-relational data management systems designed to handle large volumes of unstructured and semi-structured data, allowing for flexible data models and horizontal scalability. They are particularly suited for big data and real-time applications, with various types including document stores, key-value stores, column-family stores, and graph databases. While NoSQL offers advantages such as high scalability and flexibility, it also faces challenges like lack of standardization and ACID compliance.

Uploaded by

Ananya Hegde
Copyright: © All Rights Reserved

NOSQL

22MCA335
Module 1

Introduction to NoSQL

 NoSQL Database is a non-relational Data Management System, that does not require a fixed
schema.

 It avoids joins and is easy to scale.

 The major purpose of using a NoSQL database is for distributed data stores with huge data
storage needs.

 NoSQL is used for Big data and real-time web apps.

 For example, companies like Twitter, Facebook and Google collect terabytes of user data
every single day.

 NoSQL database stands for “Not Only SQL” or “Not SQL.”

 Carlo Strozzi introduced the NoSQL concept in 1998.

 Traditional RDBMS uses SQL syntax to store and retrieve data for further insights.

 Instead, a NoSQL database system encompasses a wide range of database technologies that
can store structured, semi-structured, unstructured and polymorphic data.

Definition of NoSQL

 NoSQL is a type of database management system (DBMS) that is designed to handle and
store large volumes of structured, semi-structured, unstructured, or polymorphic data.

 NoSQL databases (aka "not only SQL") are non-tabular databases and store data differently
than relational tables.

 Unlike traditional relational databases that use tables with pre-defined schemas to store
data, NoSQL databases use flexible data models that can adapt to changes in data structures
and are capable of scaling horizontally to handle growing amounts of data.

NOSQL History

 The acronym NoSQL was first used in 1998 by Carlo Strozzi while naming his lightweight,
open-source “relational” database that did not use SQL.

 The concept was later adopted and popularized by the GAFAM companies (Google, Apple,
Facebook, Amazon, and Microsoft), in particular Google, Facebook, and Amazon, which faced
huge volumes of data for which relational databases had become too slow.

 Instead of upgrading their hardware to increase the performance of their RDBMS (Relational
Database Management System), the tech giants chose to distribute the load over multiple
host servers. This is known as the "scaling out" method. NoSQL databases are well suited to
scaling out because they are non-relational.

 Development of the graph database Neo4j began in 2000. Google's Bigtable followed in 2004
and CouchDB in 2005. The history of NoSQL databases was also marked by Amazon Dynamo in
2007.

 Then, in 2008, Facebook open-sourced the non-relational database it used
internally: Cassandra. This tool became a reference for NoSQL databases and put the
term NoSQL back in the spotlight, giving it its current meaning and popularity.

Why NoSQL?

 The concept of NoSQL databases became popular with Internet giants like Google,
Facebook, Amazon, etc. who deal with huge volumes of data. The system response time
becomes slow when you use RDBMS for massive volumes of data.

 To resolve this problem, we could “scale up” our systems by upgrading our existing
hardware. This process is expensive.

 The alternative for this issue is to distribute database load on multiple hosts whenever the
load increases. This method is known as “scaling out.”

 A NoSQL database is non-relational, so it generally scales out more easily than a relational database.

Reasons to choose a NoSQL database

 Handling large volumes of data at scale seamlessly. NoSQL databases can handle large
amounts of data by spreading it across multiple servers.

 Working easily with unstructured or semi-structured data.

 Enabling rapid development.

 High read/write speed.

 Managing complex relationships.

Types of NoSQL databases

Document databases: A document-oriented NoSQL database stores and retrieves data as key-value
pairs, but the value part is stored as a document.

➢ The document is stored in JSON or XML formats.

➢ The value is understood by the DB and can be queried.

➢ Example Databases: MongoDB, CouchDB.

Key-value stores: Data is stored in key/value pairs. These stores are designed to handle large
amounts of data and heavy load.

➢ Key-value pair storage databases store data as a hash table where each key is unique, and
the value can be a JSON, BLOB(Binary Large Objects), string, etc.

➢ Example NoSQL Databases: Redis, Amazon DynamoDB

Column-family stores/Wide-column stores: Column-oriented databases work on columns
and are based on Google's Bigtable paper. Every column is treated separately, and the values
of a single column are stored contiguously.

➢ They deliver high performance on aggregation queries like SUM, COUNT, AVG, MIN etc. as
the data is readily available in a column.

➢ Column-based NoSQL databases are widely used to manage data warehouses, business
intelligence, CRM systems, and library card catalogs.

➢ Example NoSQL Databases: Apache Cassandra, HBase.

Graph databases: A graph database stores entities as well as the relations among those
entities.

➢ The entity is stored as a node with the relationship as edges. An edge gives a relationship
between nodes.

➢ Every node and edge has a unique identifier.

➢ A graph database is multi-relational in nature. Traversing relationships is fast because they
are already captured in the database, so there is no need to calculate them.

➢ Graph databases are mostly used for social networks, logistics, and spatial data.

➢ Example NoSQL Databases: Neo4j, Amazon Neptune.

Other Types

5. Object-Oriented Database: Objects, classes, and methods are used to organize and store
data. This model is particularly suitable for applications with complex data structures.

Example NoSQL Databases: db4o, ObjectDB.

6. Time-Series Database: Optimized for handling time-series data, such as measurements or
events over time. This model is efficient for queries related to temporal patterns.

Example NoSQL Databases: InfluxDB, OpenTSDB.

 The data models used are:

1. Document Store. Example NoSQL databases: MongoDB, CouchDB.

2. Key-Value Store. Example NoSQL databases: Redis, Amazon DynamoDB.

3. Wide-Column Store. Example NoSQL databases: Apache Cassandra, HBase.

4. Graph Database. Example NoSQL databases: Neo4j, Amazon Neptune.

5. Object-Oriented Database. Example NoSQL databases: db4o, ObjectDB.

6. Time-Series Database. Example NoSQL databases: InfluxDB, OpenTSDB.

Features of NoSQL

➢ Dynamic schema: NoSQL databases do not have a fixed schema and can accommodate
changing data structures without the need for migrations or schema alterations.

➢ Horizontal scalability: NoSQL databases are designed to scale out by adding more nodes to a
database cluster, making them well-suited for handling large amounts of data and high
levels of traffic.

➢ Document-based: Some NoSQL databases, such as MongoDB, use a document-based data
model, where data is stored in a semi-structured format, such as JSON or BSON.

➢ Key-value-based: Other NoSQL databases, such as Redis, use a key-value data model, where
data is stored as a collection of key-value pairs.

➢ Column-based: Some NoSQL databases, such as Cassandra, use a column-based data model,
where data is organized into columns instead of rows.

➢ Distributed and high availability: NoSQL databases are often designed to be highly available
and to automatically handle node failures and data replication across multiple nodes in a
database cluster.

➢ Flexibility: NoSQL databases allow developers to store and retrieve data in a flexible and
dynamic manner, with support for multiple data types and changing data structures.

➢ Performance: NoSQL databases are optimized for high performance and can handle a high
volume of reads and writes, making them suitable for big data and real-time applications.

Advantages of NoSQL

➢ High scalability: NoSQL databases use sharding for horizontal scaling. Sharding means
partitioning the data and placing the partitions on multiple machines in such a way that the
order of the data is preserved.

➢ Vertical scaling means adding more resources to an existing machine, whereas
horizontal scaling means adding more machines to handle the data. Vertical scaling
is harder to sustain as data grows, whereas horizontal scaling is comparatively easy
to implement with NoSQL databases.

➢ Examples of horizontally scalable databases are MongoDB and Cassandra. NoSQL can
handle a huge amount of data because of this scalability: as the data grows, the
database scales itself out to handle that data efficiently.

➢ Flexibility: NoSQL databases are designed to handle unstructured or semi-structured data,
which means that they can accommodate dynamic changes to the data model. This makes
NoSQL databases a good fit for applications that need to handle changing data
requirements.

➢ Cost-effectiveness: NoSQL databases are often more cost-effective than traditional
relational databases, as they are typically less complex and do not require expensive
hardware or software.

➢ Agility: Ideal for agile development.

➢ High availability: The automatic replication feature in NoSQL databases makes them highly
available, because in case of a failure the data can be restored from a replica in its last
consistent state.

➢ Scalability: NoSQL databases are highly scalable, which means that they can handle large
amounts of data and traffic with ease. This makes them a good fit for applications that need
to handle large amounts of data or traffic

➢ Performance: NoSQL databases are designed to handle large amounts of data and traffic,
which means that they can offer improved performance compared to traditional relational
databases.
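
The sharding idea described under "High scalability" can be sketched in a few lines of Python. This is only an illustration of range-based partitioning, which keeps keys in sorted order across shards; the split points and the three-shard layout are assumptions made for this sketch, not the scheme used by any particular database.

```python
from bisect import bisect_right

# Hypothetical split points: keys below "h" go to shard 0,
# keys from "h" up to (but not including) "q" go to shard 1,
# and everything else goes to shard 2.
SPLIT_POINTS = ["h", "q"]


def shard_for(key):
    """Return the index of the shard responsible for the given key."""
    return bisect_right(SPLIT_POINTS, key.lower())


def distribute(keys):
    """Group keys by shard, keeping the keys sorted within each shard."""
    shards = {index: [] for index in range(len(SPLIT_POINTS) + 1)}
    for key in sorted(keys, key=str.lower):
        shards[shard_for(key)].append(key)
    return shards


if __name__ == "__main__":
    for shard, members in distribute(["Tom", "Jane", "Arun", "Chen"]).items():
        print(shard, members)
```

Adding a machine then amounts to adding a split point and moving only the keys in the affected range, which is one reason range-based sharding fits horizontal scaling.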

Disadvantages of NoSQL

➢ Lack of standardization: There are many different types of NoSQL databases, each with its
own unique strengths and weaknesses. This lack of standardization can make it difficult to
choose the right database for a specific application

➢ Lack of ACID compliance: Many NoSQL databases are not fully ACID-compliant, which means
that they do not always guarantee the consistency, integrity, and durability of data. This can
be a drawback for applications that require strong data consistency guarantees.

➢ Open-source: Most NoSQL databases are open-source, and there is no reliable standard for
NoSQL yet. In other words, two database systems are unlikely to be equivalent.

➢ Lack of support for complex queries: NoSQL databases are not designed to handle complex
queries, which means that they are not a good fit for applications that require complex data
analysis or reporting.

➢ Lack of maturity: NoSQL databases are relatively new and lack the maturity of traditional
relational databases. This can make them less reliable and less secure than traditional
databases.

Different NoSQL Products:

MongoDB:

➢ Type: Document store

➢ Features: JSON-like documents, dynamic schemas, high performance, scalability.

➢ Use Cases: Content Management, Mobile Apps, Real-time Analytics.

Cassandra:

 Type: Wide-column store

 Features: High scalability, fault-tolerant, decentralized.

 Use Cases: Time-series data, sensor data, recommendation engines.

Couchbase:

1. Type: Document store

2. Features: Distributed architecture, memory-first architecture, JSON support.

3. Use Cases: Caching, session storage, real-time big data analytics.

Neo4j:

1. Type: Graph database

2. Features: Optimized for managing relationships, high-performance graph queries.

3. Use Cases: Social networks, fraud detection, network and IT operations.

CHALLENGES OF RDBMS

 RDBMS assumes a well-defined structure in data.

 It assumes that the data is dense and is largely uniform.

 RDBMS builds on a prerequisite that the properties of the data can be defined up front and
that its interrelationships are well established and systematically referenced.

 It also assumes that indexes can be consistently defined on data sets and that such indexes
can be uniformly leveraged for faster querying.

 Unfortunately, RDBMS starts to show signs of giving way as soon as these assumptions don’t
hold true.

 RDBMS can certainly deal with some irregularities and lack of structure but in the context of
massive sparse data sets with loosely defined structures, RDBMS appears a forced fit.

 With massive data sets the typical storage mechanisms and access methods also get
stretched.

 Denormalizing tables, dropping constraints, and relaxing transactional guarantees can help an
RDBMS scale, but after these modifications an RDBMS starts resembling a NoSQL product.

 NoSQL alleviates the problems that RDBMS imposes and makes it easy to work with large
sparse data, but in turn takes away the power of transactional integrity and flexible indexing
and querying.

DATA SIZE MATH

A byte is a unit of digital information that consists of 8 bits.

Kilobyte (kB) = 10^3 bytes

Megabyte (MB) = 10^6 bytes

Gigabyte (GB) = 10^9 bytes

Terabyte (TB) = 10^12 bytes

Petabyte (PB) = 10^15 bytes

Exabyte (EB) = 10^18 bytes

Zettabyte (ZB) = 10^21 bytes

Yottabyte (YB) = 10^24 bytes
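
As a worked example of this arithmetic, the following Python sketch converts a raw byte count into the decimal units listed above. The function names and the sample workload (500 million records of 2 kB each per day) are made up for illustration.

```python
UNITS = ["B", "kB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]


def human_readable(num_bytes):
    """Express a byte count in the largest decimal unit that keeps the value at 1 or more."""
    value = float(num_bytes)
    unit = 0
    while value >= 1000 and unit < len(UNITS) - 1:
        value /= 1000
        unit += 1
    return f"{value:.2f} {UNITS[unit]}"


def bytes_per_day(records_per_day, bytes_per_record):
    """Total number of bytes collected in one day."""
    return records_per_day * bytes_per_record


if __name__ == "__main__":
    # Hypothetical workload: 500 million records of 2 kB each per day.
    daily = bytes_per_day(500_000_000, 2_000)
    print(daily, "bytes per day is", human_readable(daily))
```

With these assumed numbers the daily volume is 10^12 bytes, that is, one terabyte per day.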

Scalability

 Scalability is the ability of a system to increase throughput with the addition of resources to
address load increases.

 Scalability can be achieved either by provisioning a large and powerful resource to meet the
additional demands or it can be achieved by relying on a cluster of ordinary machines to
work as a unit.

 The involvement of large, powerful machines is typically classified as vertical scalability.

 Provisioning supercomputers with many CPU cores and large amounts of directly attached
storage is a typical vertical scaling solution.

 Such vertical scaling options are typically expensive and proprietary. The alternative to
vertical scalability is horizontal scalability.

 Horizontal scalability involves a cluster of commodity systems where the cluster scales as
load increases.

 Horizontal scalability typically involves adding additional nodes to serve additional load.

 Processing data spread across a cluster of horizontally scaled machines is complex.

 The MapReduce model provides one of the best possible methods to process large-scale
data on a horizontal cluster of machines.

Definition and Introduction of MapReduce

 MapReduce is a parallel programming model that allows distributed processing of large data
sets on a cluster of computers.

 MapReduce derives its ideas and inspiration from concepts in the world of functional
programming.

 In functional programming, a map function applies an operation or a function to each
element in a list.

 For example, a multiply-by-two function applied to the list [1, 2, 3, 4] would generate
another list, [2, 4, 6, 8].

 Like the map function, functional programming has a concept of a reduce function,
commonly known as a fold function.

 A reduce or fold function applies a function to all elements of a data structure, such as a list,
and produces a single result or output.

 So applying a reduce function such as summation to the list generated by the map function,
that is, [2, 4, 6, 8], would generate an output equal to 20.

 This same simple idea of map and reduce has been extended to work on large data sets.

 The idea is slightly modified to work on collections of tuples or key/value pairs.

 The map function applies a function on every key/value pair in the collection and generates
a new collection.

 Then the reduce function works on the new generated collection and applies an aggregate
function to compute a final output.

 Say you have a collection of key/value pairs as follows:

[{"94303": "Tom"}, {"94303": "Jane"}, {"94301": "Arun"}, {"94302": "Chen"}]

A simple map function on this collection could gather the names of all those who reside in a
particular zip code:

[{"94303": ["Tom", "Jane"]}, {"94301": ["Arun"]}, {"94302": ["Chen"]}]

 Now a reduce function could work on this output to simply count the number of people who
belong to a particular zip code. The final output would then be as follows:

[{"94303": 2}, {"94301": 1}, {"94302": 1}]

SORTED ORDERED COLUMN-ORIENTED STORES

 Google's Bigtable is a model where data is stored in a column-oriented way.

 This contrasts with the row-oriented format in RDBMS.

 The column-oriented storage allows data to be stored effectively.

 It avoids consuming space when storing nulls by simply not storing a column when a value
doesn't exist for that column.

 Each unit of data can be thought of as a set of key/value pairs, where the unit itself is
identified with the help of a primary identifier, often referred to as the primary key.

 Bigtable and its clones tend to call this primary key the row-key.

 Also, units are stored in an ordered-sorted manner.

 The units of data are sorted and ordered on the basis of the row-key.

 Consider a simple table of values that keeps information about a set of people.

 Such a table could have columns like first_name, last_name, occupation, zip_code, and
gender. A person’s information in this table could be as follows:

first_name: John

last_name: Doe

zip_code: 10001

gender: male

Another set of data in the same table could be as follows:

first_name: Jane

zip_code: 94303

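 Suppose the columns are grouped into three column-families: name (first_name, last_name),
location (zip_code), and profile (gender). The name column-family bucket then stores the
following values: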

For row-key: 1 first_name: John last_name: Doe

For row-key: 2 first_name: Jane

The location column-family stores the following:

For row-key: 1 zip_code: 10001

For row-key: 2 zip_code: 94303

The profile column-family has values only for the data point with row-key 1 so it stores only the
following:

For row-key: 1 gender: male

 All data pertaining to a row-key is stored together.

 The column-family acts as a key for the columns it contains and the row-key acts as the key
for the whole data set.

 Examples: HBase, Hypertable
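
The layout described above can be mimicked with nested dictionaries in Python. This is a conceptual sketch only, with no claim about how HBase or Hypertable store data on disk. The helper names get_column and scan are invented for the illustration, and the values are taken from the John/Jane example.

```python
# row-key -> column-family -> column -> value.
# A column with no value is simply absent, so no space is spent on nulls.
table = {
    "1": {
        "name": {"first_name": "John", "last_name": "Doe"},
        "location": {"zip_code": "10001"},
        "profile": {"gender": "male"},
    },
    "2": {
        "name": {"first_name": "Jane"},
        "location": {"zip_code": "94303"},
    },
}


def get_column(row_key, family, column):
    """Look up one value; returns None when the column was never stored."""
    return table.get(row_key, {}).get(family, {}).get(column)


def scan(start_key, end_key):
    """Yield rows in row-key order, start inclusive and end exclusive."""
    for row_key in sorted(table):
        if start_key <= row_key < end_key:
            yield row_key, table[row_key]


if __name__ == "__main__":
    print(get_column("1", "profile", "gender"))
    print(get_column("2", "profile", "gender"))
    print([row_key for row_key, _ in scan("1", "3")])
```

Because rows are kept sorted by row-key, a range scan only has to walk a contiguous run of keys.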

KEY/VALUE STORES

 The key of a key/value pair is a unique value in the set and can be easily looked up to access
the data.

 The data is fetched by a unique key or a number of unique keys to retrieve the associated
value with each key.

 The values can be simple data types like strings and numbers or complex objects

Features of a key-value database

 A key-value database is defined by the fact that it allows programs or users of programs to
retrieve data by keys, which are essentially names, or identifiers, that point to some stored
value.

Common features

 Retrieving a value (if there is one) stored and associated with a given key

 Deleting the value (if there is one) stored and associated with a given key

 Setting, updating, and replacing the value (if there is one) associated with a given key

 Examples: Membase, Berkeley DB
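
The three common operations listed above map directly onto a hash table. The following Python sketch wraps a dictionary to show the idea. The class name KeyValueStore and its method names are hypothetical and do not correspond to the API of Membase or Berkeley DB.

```python
class KeyValueStore:
    """A toy in-memory key/value store backed by a dictionary."""

    def __init__(self):
        self._data = {}

    def set(self, key, value):
        """Set, update, or replace the value associated with a key."""
        self._data[key] = value

    def get(self, key, default=None):
        """Retrieve the value for a key, or the default if there is none."""
        return self._data.get(key, default)

    def delete(self, key):
        """Delete the value for a key; report whether anything was removed."""
        if key in self._data:
            del self._data[key]
            return True
        return False


if __name__ == "__main__":
    store = KeyValueStore()
    store.set("user:1", {"name": "John Doe", "zip": "10001"})
    print(store.get("user:1"))
    print(store.delete("user:1"), store.get("user:1"))
```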

DOCUMENT DATABASES

 Document databases are not document management systems.

 The word document in document databases connotes loosely structured sets of key/value
pairs in documents, typically JSON (JavaScript Object Notation), and not documents or
spreadsheets (though these could be stored too).

 Document databases treat a document as a whole and avoid splitting a document into its
constituent name/value pairs.

 At a collection level, this allows for putting together a diverse set of documents into a single
collection.

 Document databases allow indexing of documents on the basis of not only their primary
identifier but also their properties.

 Examples: MongoDB and CouchDB
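
To make the point about indexing on properties concrete, here is a small Python sketch of a collection that keeps whole documents and can build an index on any field. It is a teaching model under simplifying assumptions (integer identifiers, hashable field values), not the mechanism used by MongoDB or CouchDB. The class and method names are invented for the illustration.

```python
from collections import defaultdict


class Collection:
    """A toy document collection with optional per-field indexes."""

    def __init__(self):
        self._docs = {}      # _id -> document
        self._indexes = {}   # field -> {value -> set of _ids}
        self._next_id = 1

    def insert(self, doc):
        """Store the document as a whole and update any existing indexes."""
        doc_id = self._next_id
        self._next_id += 1
        stored = dict(doc, _id=doc_id)
        self._docs[doc_id] = stored
        for field, index in self._indexes.items():
            if field in stored:
                index[stored[field]].add(doc_id)
        return doc_id

    def create_index(self, field):
        """Index all documents that have the given field."""
        index = defaultdict(set)
        for doc_id, doc in self._docs.items():
            if field in doc:
                index[doc[field]].add(doc_id)
        self._indexes[field] = index

    def find_by(self, field, value):
        """Use the index when one exists; otherwise scan every document."""
        if field in self._indexes:
            ids = sorted(self._indexes[field].get(value, ()))
            return [self._docs[doc_id] for doc_id in ids]
        return [doc for doc in self._docs.values() if doc.get(field) == value]


if __name__ == "__main__":
    people = Collection()
    people.insert({"name": "John Doe", "zip": "10001"})
    people.insert({"name": "Jane", "zip": "94303", "occupation": "engineer"})
    people.create_index("zip")
    print(people.find_by("zip", "10001"))
```

Note that the two documents have different sets of fields, which is the "diverse set of documents in a single collection" mentioned above.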

GRAPH DATABASES

 A graph database is a database designed to treat the relationships between data as equally
important to the data itself.

 It is intended to hold data without constricting it to a pre-defined model.

 Instead, the data is stored like we first draw it out - showing how each individual entity
connects with or is related to others.

 Examples: Neo4j and FlockDB
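
A graph of entities and relationships can be sketched in Python as an adjacency list with labeled edges. The node names, the FOLLOWS relationship, and the function reachable are all hypothetical. The point is only that stored relationships can be followed directly, hop by hop, without computing joins.

```python
from collections import deque

# node -> list of (relationship label, target node)
edges = {
    "alice": [("FOLLOWS", "bob"), ("FOLLOWS", "carol")],
    "bob": [("FOLLOWS", "carol")],
    "carol": [("FOLLOWS", "dave")],
    "dave": [],
}


def reachable(start, relation, max_hops):
    """Breadth-first traversal over edges of one type, up to max_hops away."""
    seen = {start}
    queue = deque([(start, 0)])
    found = []
    while queue:
        node, hops = queue.popleft()
        if hops == max_hops:
            continue
        for label, target in edges.get(node, []):
            if label == relation and target not in seen:
                seen.add(target)
                found.append(target)
                queue.append((target, hops + 1))
    return found


if __name__ == "__main__":
    print(reachable("alice", "FOLLOWS", 1))
    print(reachable("alice", "FOLLOWS", 2))
```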

Exploring NoSQL

A Simple Set of Persistent Preferences Data

 Location-based services are gaining prominence as local businesses are trying to connect
with users who are in the neighborhood and large companies are trying to customize their
online experience and offerings based on where people are stationed.

 A few common occurrences of location-based preferences are visible in popular applications
like Google Maps, which allows local search, and online retailers like Walmart.com, which
provide product availability and promotion information based on your closest Walmart store
location.

 The location preferences are maintained by storing a user identifier and a zip code.

 Data points like "John Doe, 10001," "Lee Chang, 94129," "Jenny Gonzalez, 33101," and
"Srinivas Shastri, 02101" will need to be maintained.

 To store such data in a flexible and extensible way, this example uses a non-relational
database product named MongoDB.

Starting MongoDB and Storing Data

 Assuming you have installed MongoDB successfully, start the server and connect to it.

 You can start a MongoDB server by running the mongod program within the bin folder of the
distribution. Distributions vary according to the underlying environment, which can be
Windows, Mac OS X, or a Linux variant, but in each case the server program has the same
name and resides in a folder named bin in the distribution.

 The simplest way to connect to the MongoDB server is to use the JavaScript shell available
with the distribution. Simply run mongo from your command-line interface. The mongo
JavaScript shell command is also found in the bin folder.

 Now that the database server is up and running, use the mongo JavaScript shell to connect
to it. The initial output of the shell should be as follows:

PS C:\applications\mongodb-win32-x86_64-1.8.1> bin/mongo
MongoDB shell version: 1.8.1
connecting to: test
>

 By default, the mongo shell connects to the "test" database available on localhost.

 Typing help at the shell prompt lists the available commands:

> help
db.help()            help on db methods
db.mycoll.help()     help on collection methods
rs.help()            help on replica set methods
help connect         connecting to a db help
help admin           administrative help
help misc            misc things to know
help mr              mapreduce help
show dbs             show database names
show collections     show collections in current database
show users           show users in current database
exit                 quit the mongo shell

Creating the preferences Database

To start out, create a preferences database called prefs. After you create it, store tuples (or
pairs) of usernames and zip codes in a collection, named location, within this database. Then
store the available data sets in this defined structure.

In MongoDB terms, this translates to carrying out the following steps:

 1. Switch to the prefs database.

 2. Define the data sets that need to be stored.

 3. Save the defined data sets in a collection, named location.

The commands to carry out these steps are as follows:

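use prefs
w = {name: "John Doe", zip: 10001};
x = {name: "Lee Chang", zip: 94129};
y = {name: "Jenny Gonzalez", zip: 33101};
// The leading zero makes 02101 an octal literal in the shell, so it is stored as 1089.
// Store zip codes as strings (zip: "02101") if leading zeros must be preserved.
z = {name: "Srinivas Shastri", zip: 02101};
db.location.save(w);
db.location.save(x);
db.location.save(y);
db.location.save(z);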

 Running db.location.find() reveals the following output:

> db.location.find()
{ "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
{ "_id" : ObjectId("4c970541be67000000003858"), "name" : "Lee Chang", "zip" : 94129 }
{ "_id" : ObjectId("4c970548be67000000003859"), "name" : "Jenny Gonzalez", "zip" : 33101 }
{ "_id" : ObjectId("4c970555be6700000000385a"), "name" : "Srinivas Shastri", "zip" : 1089 }

Now add the following additional records to the location collection:

Don Joe, 10001

John Doe, 94129

You can accomplish this, via the mongo shell, as follows:

> a = {name: "Don Joe", zip: 10001};
{ "name" : "Don Joe", "zip" : 10001 }
> b = {name: "John Doe", zip: 94129};
{ "name" : "John Doe", "zip" : 94129 }
> db.location.save(a);
> db.location.save(b);

 To get a list of only those people who are in the 10001 zip code, you could query as follows:

> db.location.find({zip: 10001});
{ "_id" : ObjectId("4c97053abe67000000003857"), "name" : "John Doe", "zip" : 10001 }
{ "_id" : ObjectId("4c97a6555c760000000054d8"), "name" : "Don Joe", "zip" : 10001 }
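
What an equality query such as {zip: 10001} does can be imitated in plain Python without a database server. The sketch below is not how MongoDB evaluates queries. The functions matches and find are written for this illustration, and zip codes are stored as strings (an assumption of the sketch) so that a value like "02101" keeps its leading zero.

```python
# The six records stored so far, with zip codes kept as strings.
location = [
    {"name": "John Doe", "zip": "10001"},
    {"name": "Lee Chang", "zip": "94129"},
    {"name": "Jenny Gonzalez", "zip": "33101"},
    {"name": "Srinivas Shastri", "zip": "02101"},
    {"name": "Don Joe", "zip": "10001"},
    {"name": "John Doe", "zip": "94129"},
]


def matches(document, query):
    """True if every field in the query equals the same field in the document."""
    return all(document.get(field) == value for field, value in query.items())


def find(collection, query=None):
    """Return the documents that satisfy the query; an empty query matches all."""
    return [doc for doc in collection if matches(doc, query or {})]


if __name__ == "__main__":
    for doc in find(location, {"zip": "10001"}):
        print(doc)
```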

Storing and Accessing Data with Cassandra

 For starters, list the existing keyspaces in your Cassandra server. Go to the cassandra-cli,
type the show keyspaces command, and press Enter.

 ColumnFamily: IndexInfo “indexes that have been completed”

 ColumnFamily: LocationInfo “persistent metadata for the local node”

 ColumnFamily: Migrations “individual schema mutations”

 ColumnFamily: Schema “current state of the schema”

Working with Language Bindings

 To include NoSQL solutions in the application stack, it is extremely important that robust
and flexible language bindings allow access to and manipulation of these stores from some of
the most popular languages.

 MongoDB's drivers: In this section, MongoDB drivers for four different languages, Java, PHP,
Ruby, and Python, are introduced in the order in which they are listed.

 MongoDB Java Driver

 MongoDB PHP Driver

 MongoDB Ruby Driver

 MongoDB Python Driver

MongoDB Java Driver

 First, download the latest distribution of the MongoDB Java driver from the MongoDB
GitHub code repository at http://github.com/mongodb. All officially supported drivers are
hosted in this code repository. The version of the driver used here is 2.5.2, so the downloaded
jar file is named mongo-2.5.2.jar.

 Once again, start the local MongoDB server by running bin/mongod from within the
MongoDB distribution. Now use a Java program to connect to this server. The following
sample Java program (Listing 2-2) connects to MongoDB, lists all the collections in the prefs
database, and then lists all the documents within the location collection.

import java.net.UnknownHostException;
import java.util.Set;

import com.mongodb.DB;
import com.mongodb.DBCollection;
import com.mongodb.DBCursor;
import com.mongodb.Mongo;
import com.mongodb.MongoException;

public class ConnectToMongoDB {
    Mongo m = null;
    DB db;

    // Open a connection to the MongoDB server on localhost, default port 27017.
    public void connect() {
        try {
            m = new Mongo("localhost", 27017);
        } catch (UnknownHostException e) {
            e.printStackTrace();
        } catch (MongoException e) {
            e.printStackTrace();
        }
    }

    // Print the names of all collections in the given database.
    public void listAllCollections(String dbName) {
        if (m != null) {
            db = m.getDB(dbName);
            Set<String> collections = db.getCollectionNames();
            for (String s : collections) {
                System.out.println(s);
            }
        }
    }

    // Print every document in the location collection of the prefs database.
    public void listLocationCollectionDocuments() {
        if (m != null) {
            db = m.getDB("prefs");
            DBCollection collection = db.getCollection("location");
            DBCursor cur = collection.find();
            while (cur.hasNext()) {
                System.out.println(cur.next());
            }
        } else {
            System.out.println("Please connect to MongoDB and then fetch the collection");
        }
    }

    public static void main(String[] args) {
        ConnectToMongoDB connectToMongoDB = new ConnectToMongoDB();
        connectToMongoDB.connect();
        connectToMongoDB.listAllCollections("prefs");
        connectToMongoDB.listLocationCollectionDocuments();
    }
}

MongoDB PHP Driver

First, download the PHP driver from the MongoDB GitHub code repository and configure the driver
to work with your local PHP environment.

A sample PHP program that connects to a local MongoDB server and lists the documents in the
location collection in the prefs database is as follows:

<?php
// Connect to the local server using the Mongo class of the original PHP driver.
$connection = new Mongo("localhost:27017");
$collection = $connection->prefs->location;
$cursor = $collection->find();
// Print every document together with its identifier.
foreach ($cursor as $id => $value) {
    echo "$id: ";
    var_dump($value);
}

MongoDB Ruby Driver

 To get ready to connect to MongoDB from Ruby, get at least the mongo and bson gems. You
can install the mongo gem as follows:

gem install mongo

➢ Get all documents in a MongoDB collection using Ruby:

require 'mongo'
# Connect to the prefs database on the local server.
db = Mongo::Connection.new("localhost", 27017).db("prefs")
locationCollection = db.collection("location")
locationCollection.find().each { |row| puts row.inspect }

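MongoDB Python Driver

 The easiest way to install the Python driver is to run pip install pymongo (older setups used
easy_install pymongo).

from pymongo import MongoClient  # replaces the Connection class of older PyMongo releases

connection = MongoClient('localhost', 27017)
db = connection.prefs
collection = db.location
# Print every document in the location collection.
for doc in collection.find():
    print(doc)
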
Thrift

 Thrift is a framework for cross-language services development. It consists of a software
stack and a code-generation engine to connect smoothly between multiple languages.

 Thrift is an interface definition language and binary communication protocol used for
defining and creating services for numerous programming languages. It was developed at
Facebook for "scalable cross-language services development".

 Apache Cassandra uses the Thrift interface to provide a layer of abstraction to interact with
the column data store.

 The simplest command to generate all Thrift interfaces is:

thrift --gen interface/cassandra.thrift

To create only the Java Thrift interface, run:

thrift --gen java interface/cassandra.thrift

 from thrift import Thrift:

Import the Thrift module from the thrift library.

Thrift is a software framework used for scalable cross-language services development.

 from thrift.transport import TTransport:

Import the TTransport module from the thrift.transport package.

TTransport provides the base class for Thrift transport implementations.

 from thrift.transport import TSocket:

Import the TSocket module from the thrift.transport package.

TSocket is a specific transport implementation that uses sockets for communication.

 from cassandra import Cassandra:

Import the Cassandra class from the cassandra library. This likely refers to a Python client for Apache
Cassandra, a NoSQL database.

 from cassandra.ttypes import *:

Import all types (*) from the cassandra.ttypes module.

This includes various Thrift-generated types used in communication with Cassandra, such as
ConsistencyLevel and ColumnParent.

 import time:

Import the standard Python time module, which provides various time-related functions.

Querying CarDataStore keyspace using the Thrift interface

Interfacing and interacting with NoSQL

Storing and Accessing Data

To explain the different ways of data storage and access in NoSQL, I first classify them into the
following types:

➢ Document store: MongoDB and CouchDB

➢ Key/value store (in-memory, persistent and even ordered): Redis and BerkeleyDB

➢ Column-family-based store: HBase and Hypertable

➢ Eventually consistent key/value store: Apache Cassandra and Voldemort

Storing Data In and Accessing Data from MongoDB

 The Apache web server Combined Log Format captures the following request and response
attributes for a web server:

➢ IP address of the client — This value could be the IP address of the proxy if the client
requests the resource via a proxy.

➢ Identity of the client — Usually this is not a reliable piece of information and often is not
recorded.

➢ User name as identified during authentication — This value is blank if no authentication is
required to access the web resource.

➢ Status code — The HTTP status code.

➢ Size of the object returned — Size in bytes.

➢ Referrer — Typically, the URI or the URL that links to a web page or resource.

➢ User-agent — The client application, usually the program or device that accesses a web page
or resource.

➢ Time when the request was received — Includes date and time, along with timezone.

➢ The request itself — This can be further broken down into four different pieces: method
used, resource, request parameters, and protocol.
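
The attributes listed above can be pulled out of a raw log line before it is stored as a document. The following is a minimal Python sketch, assuming one log entry per line in the standard Combined Log Format. The field names (ip, ident, http_user and so on), the function name parse_log_line and the sample line are illustrative choices, not something the format prescribes.

```python
import re

# Pattern for one Combined Log Format line. The group names are
# illustrative labels for the attributes listed above.
COMBINED_LOG_PATTERN = re.compile(
    r'(?P<ip>\S+) '
    r'(?P<ident>\S+) '
    r'(?P<http_user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request_line>[^"]*)" '
    r'(?P<http_response_code>\d{3}) '
    r'(?P<http_response_size>\S+) '
    r'"(?P<referrer>[^"]*)" '
    r'"(?P<user_agent>[^"]*)"'
)


def parse_log_line(line):
    """Return a dict of log attributes, or None if the line does not match."""
    match = COMBINED_LOG_PATTERN.match(line)
    if match is None:
        return None
    record = match.groupdict()
    # Split the request into method, resource, and protocol when it has
    # exactly three parts; otherwise only request_line is kept.
    parts = record["request_line"].split()
    if len(parts) == 3:
        record["http_method"], record["url"], record["http_vers"] = parts
    return record


# Hypothetical sample line used only for illustration.
sample = (
    '123.125.66.32 - - [09/Oct/2010:07:30:01 -0600] '
    '"GET /hi/tag/2009/ HTTP/1.1" 200 13308 "-" '
    '"Baiduspider+(+http://www.baidu.com/search/spider.htm)"'
)
record = parse_log_line(sample)
print(record["ip"], record["http_method"], record["url"])
```

Each resulting dictionary has the shape of a document and could be inserted into a collection as-is. All values stay as strings here; converting the status code or size to integers is a separate design choice.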

Querying MongoDB

 To list all the records in the logdata collection, run:

var cursor = db.logdata.find()

while (cursor.hasNext()) printjson(cursor.next());

This prints the data set in a nice presentable format like this:

{
"_id" : ObjectId("4cb164b75a91870732000000"),
"http_vers" : "HTTP/1.1",
"ident" : "-",
"http_response_code" : "200",
"referrer" : "-",
"url" : "/hi/tag/2009/",
"ip" : "123.125.66.32",
"time" : "09/Oct/2010:07:30:01 -0600",
"http_response_size" : "13308",
"http_method" : "GET",
"user_agent" : "Baiduspider+(+http://www.baidu.com/search/spider.htm)",
"http_user" : "-",
"request_line" : "GET /hi/tag/2009/ HTTP/1.1"
}

Look at Figure to see how cursors work.

 The method db.logdata.find() returns all the records in the logdata collection so you have
the entire set to iterate over using the cursor.
 The previous code sample simply iterates through the elements of the cursor and prints
them out.
 The printjson function prints out the elements in a nice JSON-style formatting for easy
readability.
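
The iteration pattern described above (ask whether another element exists, fetch it, print it) can be sketched in Python. This is a toy model only: ListCursor is a hypothetical in-memory stand-in, not MongoDB's cursor, and print_all is an assumed helper name.

```python
import json


class ListCursor:
    """Toy cursor over an in-memory list; a stand-in, not MongoDB's cursor."""

    def __init__(self, documents):
        self._documents = list(documents)
        self._position = 0

    def has_next(self):
        # True while there are documents that have not been returned yet.
        return self._position < len(self._documents)

    def next(self):
        # Return the current document and advance the position by one.
        document = self._documents[self._position]
        self._position += 1
        return document


def print_all(cursor):
    """Drain the cursor, printing each document as indented JSON."""
    count = 0
    while cursor.has_next():
        print(json.dumps(cursor.next(), indent=2))
        count += 1
    return count


cursor = ListCursor([{"ip": "123.125.66.32"}, {"ip": "10.0.0.1"}])
printed = print_all(cursor)
```

Once the loop finishes, the cursor is exhausted: has_next() returns False, and the query has to be issued again to iterate a second time.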

Storing Data In and Accessing Data from Redis

 Redis is a persistent key/value store.


 For efficiency, it holds the database in memory and writes to disk in an asynchronous
thread.
 The values it holds can be strings, lists, hashes, sets, and sorted sets.
 It provides a rich set of commands to manipulate its collections and insert and fetch data.

 To begin, start redis-server and make sure it's working.


 Now run redis-cli to connect to this server.
 By default, the Redis server listens for connections on port 6379.
 To save a key/value pair — { akey: "avalue" } — simply type the following, from within the
Redis distribution folder:
 ./redis-cli set akey "avalue"
 If you see OK on the console in response to the command you just typed, then things look
good.

 Redis supports a few different data structures, namely:
 Lists, or more specifically, linked lists — Collections that maintain an indexed list of elements
in a particular order. With linked lists, access to either of the end points is fast irrespective of
the number of elements in the list.
 Sets — Collections that store unique elements and are unordered.
 Sorted sets — Collections of unique elements kept in order by a score associated with each element.
 Hashes — Collections that store key/value pairs.
 Strings — Collections of characters.
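
As a rough analogy only, each of these structures has a familiar counterpart in plain Python. The sketch below makes no Redis calls. The key names and values are hypothetical, and the mapping to Python types is an assumption made for illustration.

```python
from collections import deque

# Plain-Python analogies for the Redis value types; these are not Redis calls.

# String: a sequence of characters.
book_title = "The Omnivore's Dilemma"

# List: ordered, with cheap insertion at either end.
recent_books = deque()
recent_books.appendleft("book:1")   # comparable to the Redis LPUSH command
recent_books.append("book:2")       # comparable to the Redis RPUSH command

# Set: unordered and unique; adding a duplicate changes nothing.
tags = {"food", "nonfiction"}
tags.add("food")

# Hash: field/value pairs stored under one key.
book = {"title": book_title, "author": "Michael Pollan"}

# Sorted set: unique members, each with a score that defines the order.
ratings = {"book:1": 4.5, "book:2": 3.0}
by_rating = sorted(ratings, key=ratings.get)   # members in ascending score order

print(list(recent_books), sorted(tags), by_rating)
```

The analogy breaks down in one important respect: Redis keeps these structures on the server and manipulates them through commands, so several clients can share and update the same list or set.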

Querying Redis

 Continuing with the redis-cli session, you can list the title and author of the book
identified by the id 1 as follows:
$ ./redis-cli smembers books:1:title
1. "The Omnivore\xe2\x80\x99s Dilemma"
$ ./redis-cli smembers books:1:author
1. "Michael Pollan"

Storing Data In and Accessing Data from HBase

 HBase is an open-source implementation of Google Bigtable.


 While key/value stores and non-relational alternatives like object databases have existed for
a while, HBase and its associated Hadoop tools were the first software to put large-scale,
Google-style NoSQL technology in the hands of the masses.
 HBase is not the only Google Bigtable clone.
 Hypertable is another one.
 HBase is also not the ideal tabular data store for all situations.

To store this data, I intend to create a collection named blogposts and save pieces of information
into two categories, post and multimedia. So, a possible entry, in JSON-like format, could be as
follows:
{
  "post" : {
    "title": "an interesting blog post",
    "author": "a blogger",
    "body": "interesting content",
  },
  "multimedia": {
    "header": header.png,
    "body": body.mpeg,
  },
}
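
HBase addresses each stored value by a column name of the form family:qualifier. Assuming the two categories above, post and multimedia, act as column families, the entry can be flattened into such cells as in this Python sketch. The function name to_hbase_cells is hypothetical, and the file names stand in for the binary content of the files.

```python
def to_hbase_cells(entry):
    """Flatten {family: {qualifier: value}} into {"family:qualifier": value}."""
    cells = {}
    for family, columns in entry.items():
        for qualifier, value in columns.items():
            cells[f"{family}:{qualifier}"] = value
    return cells


blog_entry = {
    "post": {
        "title": "an interesting blog post",
        "author": "a blogger",
        "body": "interesting content",
    },
    "multimedia": {
        # File names stand in for the binary content of each file.
        "header": "header.png",
        "body": "body.mpeg",
    },
}

cells = to_hbase_cells(blog_entry)
for column, value in cells.items():
    print(column, "=>", value)
```

Note that post:body and multimedia:body stay distinct because the family name is part of the column name.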

Querying HBase

 The simplest way to query an HBase store is via its shell.
