NoSQL DATABASES
ASSIGNMENT REPORT
SUBMITTED BY :
SUBMITTED TO :
GAURAV ARORA
2CML1
101410018
CONTENTS
Introduction
Methodology
NoSQL Classification
Applications
References
INTRODUCTION
A NoSQL (originally referring to "non SQL" or "non relational") database provides a
mechanism for storage and retrieval of data that is modeled in means other than the tabular
relations used in relational databases. Such databases have existed since the late 1960s, but
did not obtain the "NoSQL" moniker until a surge of popularity in the early twenty-first
century, triggered by the needs of Web 2.0 companies such as Facebook, Google and
Amazon.com.
Over the last few years we have seen the rise of a new type of database, known as the NoSQL
database, that is challenging the dominance of relational databases. Relational databases
have dominated the software industry for a long time, providing persistent data storage,
concurrency control, transactions, mostly standard interfaces, and mechanisms to
integrate application data and reporting. The dominance of relational databases, however, is
cracking.
Motivations for this approach include simplicity of design, simpler "horizontal" scaling to
clusters of machines (which is a problem for relational databases) [2], and finer control over
availability. The data structures used by NoSQL databases (e.g. key-value, graph, or
document) differ from those used by default in relational databases, making some
operations faster in NoSQL and others faster in relational databases. The particular suitability
of a given NoSQL database depends on the problem it must solve. Sometimes the data
structures used by NoSQL databases are also viewed as "more flexible" than relational
database tables.
NoSQL databases are increasingly used in big data and real-time web applications. NoSQL
systems are also sometimes called "Not only SQL" to emphasize that they may support SQL-like query languages.
METHODOLOGY
Application developers have been frustrated with the impedance mismatch between the
relational data structures and the in-memory data structures of the application. Using NoSQL
databases allows developers to develop without having to convert in-memory structures to
relational structures.
There is also a movement away from using databases as integration points, in favour of
encapsulating databases within applications and integrating through services.
The rise of the web as a platform also changed data storage in a vital way: systems now need
to support large volumes of data by running on clusters.
Relational databases were not designed to run efficiently on clusters.
The data storage needs of an ERP application are very different from the data storage
needs of a Facebook or an Etsy, for example.
NoSQL Classification
NoSQL databases can be broken into four categories:
Key Value Stores
Big Table
Document Databases
Graph Databases
Each category makes a different trade-off between data size and data complexity.
Key Value Stores
In the key-value data model, each value corresponds to a key. Although the structure is
simple, queries are faster than in a relational database, and the model supports mass storage
and high concurrency. Data is queried and modified through the primary key [3]. Each key
identifies a bucket of data. For example, in the shopping cart case shown in Figure 3, each
shopping cart is held in its own bucket and identified by a key, which could be the user id.
The values can be serialized using either Java serialization or XML. Storing data this way is
very fast because the store simply writes the serialized bytes to disk. Some of the key-value
stores available are Berkeley DB, Tokyo Tyrant, Voldemort and Cassandra.
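The shopping cart example can be sketched in a few lines of Python. This is only an illustrative in-memory model, not the API of any of the products named above: the class name KeyValueStore, the key format "user:42" and the use of JSON in place of Java serialization or XML are all assumptions made for the sketch.

```python
import json


class KeyValueStore:
    """Toy in-memory key-value store: opaque values, looked up only by key."""

    def __init__(self):
        self._buckets = {}  # key -> serialized bytes

    def put(self, key, value):
        # Serialize the whole value; the store never inspects its contents.
        self._buckets[key] = json.dumps(value).encode("utf-8")

    def get(self, key, default=None):
        raw = self._buckets.get(key)
        return default if raw is None else json.loads(raw.decode("utf-8"))

    def delete(self, key):
        self._buckets.pop(key, None)


store = KeyValueStore()
store.put("user:42", {"items": [{"sku": "book-1", "qty": 2}]})

# Modifying a cart means: read by key, change in memory, write back under the same key.
cart = store.get("user:42")
cart["items"].append({"sku": "pen-7", "qty": 1})
store.put("user:42", cart)
```

Because the store sees only bytes, it cannot answer a question such as "which carts contain pen-7" without reading every bucket, which is why key-value stores are a poor fit for queries on the data itself.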
Big Table
The search engine company Zvents developed Hypertable, an open-source distributed data
storage system modeled on BigTable. A BigTable is a sparse, distributed, persistent
multidimensional sorted map. The map is indexed by a row key, a column key and a timestamp,
and each value is an uninterpreted array of bytes. BigTable stores structured data, and
applications can store any type of data, from text to serialized objects. It does not impose
any size constraint on each value, and a table may have an unlimited number of columns.
Row and column names can be arbitrary strings.
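To make the "sorted map indexed by row key, column key and timestamp" idea concrete, here is a minimal single-process sketch. It is not how BigTable or Hypertable is implemented; the class SortedCellMap, its method names and the sample row and column names are hypothetical, and the sketch ignores distribution and persistence entirely.

```python
import bisect


class SortedCellMap:
    """Toy sparse map: (row key, column key, timestamp) -> uninterpreted bytes."""

    def __init__(self):
        self._keys = []   # sorted list of (row, column, -timestamp)
        self._cells = {}  # same key -> bytes

    def put(self, row, column, timestamp, value: bytes):
        key = (row, column, -timestamp)  # negated so newer versions sort first
        if key not in self._cells:
            bisect.insort(self._keys, key)
        self._cells[key] = value

    def get(self, row, column):
        """Return the most recent value of a cell, or None if the cell is absent."""
        i = bisect.bisect_left(self._keys, (row, column, float("-inf")))
        if i < len(self._keys) and self._keys[i][:2] == (row, column):
            return self._cells[self._keys[i]]
        return None

    def scan_row(self, row):
        """Yield (column, timestamp, value) for one row, in sorted key order."""
        i = bisect.bisect_left(self._keys, (row,))
        while i < len(self._keys) and self._keys[i][0] == row:
            _, column, neg_ts = self._keys[i]
            yield column, -neg_ts, self._cells[self._keys[i]]
            i += 1


table = SortedCellMap()
table.put("com.example/index", "contents:html", 1, b"<html>v1</html>")
table.put("com.example/index", "contents:html", 2, b"<html>v2</html>")
table.put("com.example/index", "anchor:home", 1, b"Home")

latest = table.get("com.example/index", "contents:html")
row_cells = list(table.scan_row("com.example/index"))
```

Only cells that were actually written occupy space, which is what "sparse" means here: a row can carry any number of columns without a fixed schema.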
Graph Databases
Graph databases allow you to store entities and relationships between these entities.
Entities are also known as nodes, which have properties. Think of a node as an instance of
an object in the application. Relations are known as edges that can have properties. Edges
have directional significance; nodes are organized by relationships which allow you to find
interesting patterns between the nodes. The organization of the graph lets the data be
stored once and then interpreted in different ways based on relationships.
Usually, when we store a graph-like structure in RDBMS, it's for a single type of
relationship ("who is my manager" is a common example). Adding another relationship to
the mix usually means a lot of schema changes and data movement, which is not the case
when we are using graph databases. Similarly, in relational databases we model the graph
beforehand based on the traversal we want; if the traversal changes, the data will have
to change.
In graph databases, traversing the joins or relationships is very fast. The relationship
between nodes is not calculated at query time but is actually persisted as a relationship.
Traversing persisted relationships is faster than calculating them for every query.
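The node, edge and traversal ideas above can be sketched as a small in-memory structure. This is an illustration under assumptions, not the storage model of any particular graph database: the class Graph, the relationship names REPORTS_TO and FRIEND_OF, and the sample people are all made up for the example.

```python
from collections import deque


class Graph:
    """Toy property graph: nodes and directed edges both carry properties."""

    def __init__(self):
        self.nodes = {}      # node id -> properties
        self.out_edges = {}  # node id -> list of (relationship type, target id, properties)

    def add_node(self, node_id, **properties):
        self.nodes[node_id] = properties
        self.out_edges.setdefault(node_id, [])

    def add_edge(self, source, rel_type, target, **properties):
        # The relationship is persisted once, at write time, as a directed edge.
        self.out_edges[source].append((rel_type, target, properties))

    def neighbours(self, node_id, rel_type):
        return [t for r, t, _ in self.out_edges[node_id] if r == rel_type]

    def reachable(self, start, rel_type):
        """Breadth-first traversal that follows one relationship type."""
        seen, queue, order = {start}, deque([start]), []
        while queue:
            current = queue.popleft()
            for target in self.neighbours(current, rel_type):
                if target not in seen:
                    seen.add(target)
                    order.append(target)
                    queue.append(target)
        return order


g = Graph()
for name in ("anna", "bob", "carol"):
    g.add_node(name, kind="employee")
g.add_edge("anna", "REPORTS_TO", "bob", since=2019)
g.add_edge("bob", "REPORTS_TO", "carol", since=2015)
g.add_edge("anna", "FRIEND_OF", "carol")  # a second relationship type, no schema change

management_chain = g.reachable("anna", "REPORTS_TO")
friends = g.neighbours("anna", "FRIEND_OF")
```

The traversal only follows edges that are already stored next to each node, so nothing comparable to a join has to be computed at query time.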
Document Databases
Documents are the main concept in document databases. The database stores and
retrieves documents, which can be XML, JSON, BSON, and so on. These documents are
self-describing, hierarchical tree data structures which can consist of maps, collections,
and scalar values. The documents stored are similar to each other but do not have to be
exactly the same. Document databases store documents in the value part of the key-value
store; think about document databases as key-value stores where the value is
examinable. Document databases such as MongoDB provide a rich query language and
constructs such as databases and indexes, allowing for an easier transition from relational
databases.
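The phrase "key-value stores where the value is examinable" can be illustrated with a short sketch. It is not MongoDB's query language or API; the class DocumentCollection, the dotted-path syntax and the sample blog posts are assumptions chosen only to show a query that looks inside the stored value.

```python
_MISSING = object()  # sentinel for "this document has no such field"


class DocumentCollection:
    """Toy document store: values are nested dicts that queries may inspect."""

    def __init__(self):
        self._docs = {}  # document id -> document (maps, lists, scalars)

    def insert(self, doc_id, document):
        self._docs[doc_id] = document

    def find(self, path, expected):
        """Return ids of documents whose value at a dotted path equals `expected`."""
        matches = []
        for doc_id, doc in self._docs.items():
            value = doc
            for part in path.split("."):
                if isinstance(value, dict) and part in value:
                    value = value[part]
                else:
                    value = _MISSING
                    break
            if value is not _MISSING and value == expected:
                matches.append(doc_id)
        return matches


posts = DocumentCollection()
# The two documents are similar but not identical: only the first has a city.
posts.insert("p1", {"title": "Intro", "author": {"name": "Asha", "city": "Pune"}, "tags": ["nosql"]})
posts.insert("p2", {"title": "Graphs", "author": {"name": "Ravi"}})

from_pune = posts.find("author.city", "Pune")
by_ravi = posts.find("author.name", "Ravi")
```

A real document database would answer such queries from indexes rather than by scanning every document as this sketch does.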
Some of the popular document databases we have seen are MongoDB, CouchDB, Terrastore,
OrientDB, RavenDB, and of course the well-known and often reviled Lotus Notes, which uses
document storage.
Given so much choice, how do we choose a NoSQL database? As described, much depends on
the system requirements, but here are some general guidelines:
Key-value databases are generally useful for storing session information, user profiles, preferences
and shopping cart data. We would avoid using key-value databases when we need to query by data,
have relationships between the data being stored, or need to operate on multiple keys at the same time.
Document databases are generally useful for content management systems, blogging platforms, web
analytics, real-time analytics and e-commerce applications. We would avoid using document databases
for systems that need complex transactions spanning multiple operations, or queries against varying
aggregate structures.
Column family databases are generally useful for content management systems, blogging platforms,
maintaining counters, expiring usage, and heavy write volume such as log aggregation. We would
avoid using column family databases for systems that are in early development or have changing query
patterns.
Graph databases are very well suited to problem spaces where we have connected data, such as
social networks, spatial data, routing information for goods and money, and recommendation engines.
thus falling into the AP category. The case of PNUTS, the NoSQL system from Yahoo, seems not to fit
into this definition. PNUTS relaxes consistency by guaranteeing only "timeline consistency", where
replicas may not be consistent with each other but updates are guaranteed to be applied in the
same order at all replicas. It also gives up availability: if the master replica for a particular data item
is unreachable, that item becomes unavailable for updates.
CA systems: systems that are not tolerant to network partitions. Traditional RDBMSs fall into this
category. But what if a partition happens? They lose availability, thus falling into the same group as
CP systems.
Tuneable consistency: the user decides the level of consistency required. The consistency
level is a setting that clients must specify on every operation (insert, update, read); it determines
how many replicas in the cluster must acknowledge a write operation or respond
to a read operation for the operation to be considered successful. Because many NoSQL systems can
tune the consistency level in case of partitions, some of them fall into both the AP and CP
categories. The table gives a high-level summary of the features these systems provide.
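The per-operation consistency level described above can be sketched as follows. This is a deliberately simplified single-process model, not the protocol of any real system: the class ReplicatedRegister, its parameters required_acks and required_responses, and the way a partition is simulated with a reachable flag are all assumptions made for illustration.

```python
class ReplicatedRegister:
    """Toy value replicated on N nodes, with a consistency level per operation."""

    def __init__(self, replica_count):
        self.replicas = [{"version": 0, "value": None} for _ in range(replica_count)]
        self.reachable = [True] * replica_count  # flip to False to simulate a partition

    def write(self, value, required_acks):
        # Simplification: the coordinator knows the highest version in the cluster.
        version = max(r["version"] for r in self.replicas) + 1
        acks = 0
        for replica, up in zip(self.replicas, self.reachable):
            if up:
                replica.update(version=version, value=value)
                acks += 1
        # Reachable replicas keep the write even when too few of them acknowledged it.
        return acks >= required_acks

    def read(self, required_responses):
        responses = [r for r, up in zip(self.replicas, self.reachable) if up]
        if len(responses) < required_responses:
            return False, None  # not enough replicas answered
        # Consult only as many replicas as the level demands; the newest version wins.
        newest = max(responses[:required_responses], key=lambda r: r["version"])
        return True, newest["value"]


reg = ReplicatedRegister(3)
all_ok = reg.write("v1", required_acks=3)      # every replica is reachable

reg.reachable[2] = False                       # a partition cuts off one replica
strict_ok = reg.write("v2", required_acks=3)   # demands all three acknowledgements
relaxed_ok = reg.write("v3", required_acks=2)  # a majority is enough

relaxed_read = reg.read(required_responses=1)
strict_read = reg.read(required_responses=3)
```

Under the same partition, the strict operations fail (the system behaves like CP) while the relaxed ones succeed (it behaves like AP), which is why one system can appear in both categories.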
APPLICATIONS
Social Gaming
Ad Targeting
Session Store
Mobile Applications
App developers' ability to update and enhance mobile apps quickly and without service
disruption is critical to user adoption and loyalty. Because NoSQL databases can store
user information and application content in a schema-less format, developers can quickly
modify apps without major database infrastructure changes. That means users experience
no interruption to application uptime. Some popular companies that take advantage of
NoSQL for their mobile apps are Kobo and Playtika, both of which serve millions of users
across the globe.
Globally Distributed Data Repository
Organizations are generating enormous volumes of data spread across different systems.
Using NoSQL as a data repository allows users to not only bring this information together
but to better understand and use the information. With their real-time access, scalability
and flexible data model that accommodates a wide variety of data types, NoSQL document
databases can be a great fit to build such platforms.
E-Commerce
E-commerce companies live and die by seasonal swings. Come Christmastime, users are
scrambling to purchase last-minute gifts online or through mobile purchasing apps, creating
a massive spike in usage. The ability to handle these spikes without overinvesting in
infrastructure is critical to ensuring a pleasing shopper experience and minimizing
abandoned purchase transactions (and lost revenue). NoSQL is a good fit for this use
pattern because of its dynamic scalability (the ability to scale up to accommodate increased
user activity and to scale down as user activity subsides). Companies such as The Hut
Group depend on NoSQL to get them through the holiday rush.
REFERENCES