This document introduces NoSQL databases by discussing what NoSQL is, the categories of NoSQL databases, the business drivers for using NoSQL, and the pros and cons of relational databases compared with NoSQL databases. It also covers data sharding and the CAP theorem, which relates high availability, consistency, and partition tolerance in distributed systems.
Overview What is NoSQL? Categories of NoSQL. Business drivers. RDBMS vs. NoSQL.
IBAD - 2014 - MSS 2
What is NoSQL? What do you think it is?
NoSQL is a set of concepts that allows the rapid
and efficient processing of data sets with a focus on performance, reliability, and agility.
It is not the opposite of the relational world!
OK, so what is NoSQL actually? It's more than rows. It's free of joins. It's schema-free. It's distributed. It's not about the language, SQL. It's not always about the cloud.
Not only SQL
Categories of NoSQL
Business drivers
Volume, velocity, agility, variability.
RDBMS vs. NoSQL: pros & cons
RDBMS, pros: ACID transactions (atomic, consistent, isolated, durable). Security on columns and rows using views. Most SQL code is portable (standardized). Typed columns and constraints validate data before it's added to the database, which increases data quality. Entity-relationship design and SQL are popular.
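To make the point about constraints concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table `person`, its columns, and the helper `try_insert` are hypothetical names chosen for illustration; they are not from the slides. SQLite enforces column types only loosely, so the sketch relies on NOT NULL and CHECK constraints rather than on strict typing.

```python
import sqlite3


def try_insert(conn, name, age):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        # "with conn" commits on success and rolls back on error.
        with conn:
            conn.execute(
                "INSERT INTO person (name, age) VALUES (?, ?)", (name, age)
            )
        return True
    except sqlite3.IntegrityError:
        return False


# Hypothetical schema: the database itself rejects bad rows.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE person (
        id   INTEGER PRIMARY KEY,
        name TEXT    NOT NULL,
        age  INTEGER NOT NULL CHECK (age >= 0)
    )
    """
)

accepted = try_insert(conn, "Ada", 36)       # valid row
rejected_null = try_insert(conn, None, 20)   # violates NOT NULL
rejected_check = try_insert(conn, "Bob", -5) # violates CHECK (age >= 0)
row_count = conn.execute("SELECT COUNT(*) FROM person").fetchone()[0]
```

Only the valid row ends up in the table; the invalid ones never reach storage, which is the data-quality benefit the slide refers to.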
RDBMS, cons: Object-relational mapping can be complex. Entity-relationship modeling must be completed before testing begins. RDBMSs don't scale out when joins are required. Full-text search requires third-party tools. It can be difficult to store high-variability data.
NoSQL, pros: No ER modeling is required, so development is faster. Linear scaling takes place as new processing nodes are added to the cluster. There's no need for an object-relational mapping layer. It's easy to store high-variability data. Designed for performance through data distribution.
NoSQL, cons: ACID transactions can be done only within a document at the database level; other transactions must be done at the application level. Document stores don't provide fine-grained security at the element level. NoSQL is relatively less popular. Each document store has its own proprietary, nonstandard query language, which makes it less portable.
Data sharding: splitting the data into chunks, then distributing those chunks to neighboring nodes in a distributed setting.
This action is taken when a node is close to exceeding its maximum capacity.
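A minimal in-memory sketch of the idea, assuming range-based sharding: each shard is a plain dict standing in for a node, and a shard is split in two once it grows past a capacity limit. The class name `ShardedStore`, the `capacity` parameter, and the choice of the median key as split point are all assumptions for illustration, not something the slides prescribe.

```python
import bisect


class ShardedStore:
    """Toy range-sharded key-value store; each dict stands in for a node."""

    def __init__(self, capacity):
        self.capacity = capacity  # max keys per shard before it is split
        self.bounds = []          # sorted split keys between adjacent shards
        self.shards = [{}]        # always len(bounds) + 1 shards

    def _index(self, key):
        # Shard i holds keys k with bounds[i-1] <= k < bounds[i].
        return bisect.bisect_right(self.bounds, key)

    def put(self, key, value):
        i = self._index(key)
        self.shards[i][key] = value
        if len(self.shards[i]) > self.capacity:
            self._split(i)

    def get(self, key):
        return self.shards[self._index(key)].get(key)

    def _split(self, i):
        # Split the overfull shard at its median key and hand the upper
        # half to a new neighboring shard.
        keys = sorted(self.shards[i])
        mid = keys[len(keys) // 2]
        left = {k: v for k, v in self.shards[i].items() if k < mid}
        right = {k: v for k, v in self.shards[i].items() if k >= mid}
        self.shards[i:i + 1] = [left, right]
        self.bounds.insert(i, mid)


store = ShardedStore(capacity=4)
for k in range(20):
    store.put(k, str(k))
```

A real system would also move the new chunk to a different machine and update routing metadata; the sketch only shows the split-and-route logic.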
The CAP theorem [1]
The CAP theorem concerns three properties of a system working in a distributed setting:
Consistency: all clients see the same result for the same query.
High availability: the system is guaranteed to respond to any query.
Partition tolerance: the system keeps serving queries when part of it is disconnected.
The CAP theorem [2]
The theorem, introduced by Brewer in 2000, states that at most two of the three properties can be preserved at the same time in a distributed setting.
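The trade-off can be illustrated with a toy model of two replicas. This is a sketch under simplifying assumptions, not a real replication protocol: the class `TwoNodeStore`, the `mode` values "CP" and "AP", and the `partitioned` flag are hypothetical names invented here. While the replicas are partitioned, a CP store refuses writes (it gives up availability), and an AP store accepts them locally (it gives up consistency).

```python
class Replica:
    """One node holding a copy of the data."""

    def __init__(self):
        self.data = {}


class TwoNodeStore:
    """Toy two-replica store that picks CP or AP behavior under partition."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.a = Replica()
        self.b = Replica()
        self.partitioned = False  # True when a and b cannot communicate

    def write(self, replica, key, value):
        if not self.partitioned:
            # Healthy network: replicate to both nodes.
            self.a.data[key] = value
            self.b.data[key] = value
            return True
        if self.mode == "CP":
            # Stay consistent by refusing the write (not available).
            return False
        # AP: stay available by accepting the write on one side only.
        replica.data[key] = value
        return True

    def read(self, replica, key):
        return replica.data.get(key)


cp = TwoNodeStore("CP")
cp.write(cp.a, "x", 1)
cp.partitioned = True
cp_write_ok = cp.write(cp.a, "x", 2)

ap = TwoNodeStore("AP")
ap.write(ap.a, "x", 1)
ap.partitioned = True
ap_write_ok = ap.write(ap.a, "x", 2)
```

After the partition, both CP replicas still agree on the old value, while the AP replicas return different values for the same key until they are reconciled.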