3.1 Introduction to NoSQL
NoSQL is a type of database management system (DBMS) that is designed
to handle and store large volumes of unstructured and semi-structured
data. Unlike traditional relational databases that use tables with pre-
defined schemas to store data, NoSQL databases use flexible data models
that can adapt to changes in data structures and are capable of scaling
horizontally to handle growing amounts of data.
The term NoSQL originally referred to “non-SQL” or “non-relational”
databases, but the term has since evolved to mean “not only SQL,” as
NoSQL databases have expanded to include a wide range of different
database architectures and data models.
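To make the contrast concrete, here is a minimal sketch (in Python, with a hypothetical collection and illustrative field names) of the kind of schema flexibility a document-oriented NoSQL model allows: records in the same collection can carry different, nested fields without any up-front schema change.

# A minimal sketch of schema flexibility in a document-style NoSQL model.
# The "customers" collection is hypothetical; field names are illustrative.

customers = [
    {"_id": 1, "name": "Ada", "email": "ada@example.com"},
    # A later record adds fields that were never declared up front --
    # no ALTER TABLE, no migration.
    {"_id": 2, "name": "Bo", "phones": ["+1-555-0100", "+1-555-0101"],
     "address": {"city": "Oslo", "country": "NO"}},
]

# Queries simply tolerate missing fields instead of failing on a fixed schema.
with_email = [c for c in customers if "email" in c]
print(with_email)  # -> [{'_id': 1, 'name': 'Ada', 'email': 'ada@example.com'}]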
1.2.1. Volume
Without a doubt, the key factor pushing organizations to look at
alternatives to their current RDBMSs is a need to query big data
using clusters of commodity processors. Until around 2005,
performance concerns were resolved by purchasing faster
processors. Eventually, increasing clock speeds was no longer an
option: as chip density increased, heat could no longer be
dissipated fast enough to keep chips from overheating. This
phenomenon, known as the power wall, forced systems
designers to shift their focus from increasing speed on a single
chip to using more processors working together. The need to
scale out (also known as horizontal scaling), rather than scale up
(faster processors), moved organizations from serial to parallel
processing where data problems are split into separate paths
and sent to separate processors to divide and conquer the work.
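The divide-and-conquer idea can be sketched in a few lines of Python; the multiprocessing pool below stands in for a cluster of commodity machines, and the token-counting job is purely illustrative.

# A toy divide-and-conquer sketch: split the data, fan it out to worker
# processes, then combine the partial results (the same idea a NoSQL
# cluster applies across commodity machines rather than local cores).
from multiprocessing import Pool

def count_tokens(chunk):
    # Each worker handles its own slice of the data independently.
    return sum(len(line.split()) for line in chunk)

if __name__ == "__main__":
    lines = ["the quick brown fox"] * 1_000_000          # stand-in dataset
    n = 4                                                 # number of workers
    chunks = [lines[i::n] for i in range(n)]              # split the problem
    with Pool(n) as pool:
        partials = pool.map(count_tokens, chunks)         # scatter
    print(sum(partials))                                  # gather / combine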
1.2.2. Velocity
Though big data problems are a consideration for many
organizations moving away from RDBMSs, the ability of a single-
processor system to read and write data rapidly is also key.
Many single-processor RDBMSs are unable to keep up with the
demands of real-time inserts and online queries to the database
made by public-facing websites. RDBMSs frequently index many
columns of every new row, a process which decreases system
performance. When a single-processor RDBMS is used as the
back end of a web storefront, random bursts in web traffic
slow down responses for everyone, and tuning these systems can
be costly when both high read and write throughput are required.
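To make the indexing overhead concrete, the following sketch uses Python's standard sqlite3 module as a stand-in for a single-processor RDBMS and times the same bulk insert with and without secondary indexes; the absolute numbers are not meaningful, but the indexed table is consistently slower to write.

# Rough illustration of write overhead from indexing many columns.
# sqlite3 stands in for a single-processor RDBMS; timings are indicative only.
import sqlite3, time

def time_inserts(create_indexes):
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE orders (id INTEGER, customer TEXT, sku TEXT, qty INTEGER, ts TEXT)")
    if create_indexes:
        for col in ("customer", "sku", "qty", "ts"):
            db.execute(f"CREATE INDEX idx_{col} ON orders({col})")
    rows = [(i, f"cust{i % 100}", f"sku{i % 500}", i % 10, "2024-01-01")
            for i in range(200_000)]
    start = time.perf_counter()
    db.executemany("INSERT INTO orders VALUES (?, ?, ?, ?, ?)", rows)
    db.commit()
    return time.perf_counter() - start

print("no indexes: %.2fs" % time_inserts(False))
print("4 indexes:  %.2fs" % time_inserts(True))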
1.2.3. Variability
Companies that want to capture and report on exception data
struggle when attempting to force it into the rigid database
schemas that RDBMSs require.
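As a rough illustration (field names are hypothetical), exception data fits naturally into a flexible record: the rare attribute appears only on the records that need it, and reporting on the exceptions is a filter rather than a schema change.

# Exception data: a few records carry attributes most records never use.
# In a rigid schema these become mostly-NULL columns or a separate
# attribute/value side table; in a flexible model the rare field simply
# appears on the records that need it. Field names are illustrative.

products = [
    {"sku": "A1", "price": 9.99},
    {"sku": "B2", "price": 4.50},
    # The exception: one product subject to a recall notice.
    {"sku": "C3", "price": 12.00, "recall_notice": "2024-03-18", "hazard": "choking"},
]

# Reporting on the exceptions is a simple filter, not a schema change.
recalled = [p for p in products if "recall_notice" in p]
print(recalled)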
1.2.4. Agility
The most complex part of building applications using RDBMSs is
the process of putting data into and getting data out of the
database. If your data has nested and repeated subgroups of
data structures, you need to include an object-relational mapping
layer. The responsibility of this layer is to generate the correct
combination of INSERT, UPDATE, DELETE, and SELECT SQL
statements to move object data to and from the RDBMS
persistence layer. This process isn't simple, and it is often the
largest barrier to rapid change when developing new applications
or modifying existing ones.
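The sketch below hand-rolls that mapping layer for a hypothetical Order object with nested line items, using Python and the standard sqlite3 module; it is not any particular ORM, only an illustration of how one object graph fans out into coordinated INSERT and SELECT statements across two tables.

# A hand-rolled sketch of the object-relational mapping chore: a nested
# object graph must be flattened into coordinated SQL against several tables.
# Table and class names are hypothetical; sqlite3 stands in for the RDBMS.
import sqlite3
from dataclasses import dataclass, field

@dataclass
class LineItem:
    sku: str
    qty: int

@dataclass
class Order:
    order_id: int
    customer: str
    items: list = field(default_factory=list)   # nested, repeated subgroup

db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer TEXT)")
db.execute("CREATE TABLE line_items (order_id INTEGER, sku TEXT, qty INTEGER)")

def save(order):
    # One object becomes one INSERT per table, kept consistent by hand.
    db.execute("INSERT INTO orders VALUES (?, ?)", (order.order_id, order.customer))
    db.executemany("INSERT INTO line_items VALUES (?, ?, ?)",
                   [(order.order_id, i.sku, i.qty) for i in order.items])

def load(order_id):
    # And reassembled with a SELECT per table (or a join) on the way out.
    oid, customer = db.execute(
        "SELECT order_id, customer FROM orders WHERE order_id = ?", (order_id,)).fetchone()
    items = [LineItem(sku, qty) for sku, qty in db.execute(
        "SELECT sku, qty FROM line_items WHERE order_id = ?", (order_id,))]
    return Order(oid, customer, items)

save(Order(1, "Ada", [LineItem("A1", 2), LineItem("B2", 1)]))
print(load(1))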
1.3.3. Case study: Google’s Bigtable—a table with a billion rows and a
million columns
Background: Google’s Bigtable was designed to handle massive datasets
from web crawlers, which were too large for traditional relational
databases.
Solution: Bigtable is a distributed storage system that scales easily with
data growth without requiring costly hardware. It provides a single, large
table for storing structured data and operates across multiple data
centers globally.
Impact: Bigtable successfully managed Google’s extensive data needs
and influenced the development of similar technologies, such as Apache
HBase and Apache Cassandra.
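The data model behind Bigtable can be sketched conceptually (this is not Google's actual API) as a sparse, sorted map from a row key, a column-family:qualifier pair, and a timestamp to a value; the Python below illustrates the idea with a crawl-style row key.

# A conceptual sketch of the Bigtable data model (not Google's actual API):
# a sparse, sorted map of (row key, column family:qualifier, timestamp) -> value.
# Row keys and column names are illustrative.
from collections import defaultdict

table = defaultdict(dict)   # row key -> {(column, timestamp): value}

def put(row_key, column, value, timestamp):
    table[row_key][(column, timestamp)] = value

def get_latest(row_key, column):
    versions = [(ts, v) for (col, ts), v in table[row_key].items() if col == column]
    return max(versions)[1] if versions else None

# Web-crawl style usage: reversed-domain row keys keep related pages adjacent
# when rows are stored in sorted order, so scans over one site stay local.
put("com.example/index.html", "contents:html", "<html>v1</html>", 1)
put("com.example/index.html", "contents:html", "<html>v2</html>", 2)
put("com.example/index.html", "anchor:com.other", "Example", 2)

print(get_latest("com.example/index.html", "contents:html"))   # -> <html>v2</html>
print(sorted(table))                                            # rows in key order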