NoSQL

Uploaded by

bscs23091
Copyright
© © All Rights Reserved
NoSQL

Towards the end of RDBMS?


What is RDBMS
Issues with RDBMS-Scalability
§ An RDBMS scales vertically, by adding resources to a single server
§ Expensive
§ Time-consuming: configuring multiple pieces of hardware
Vertical Scaling
§ Vertical scaling is the process of improving the
processing power of the server, by increasing its
hardware capacity (e.g. CPU, RAM).
§ There is a threshold to the hardware
improvements — it’s capped at what’s currently
available.
§ In addition, after a certain point, vertical scaling
becomes too expensive to be a viable option.
Horizontal Scaling
§ Horizontal scaling achieves scale by increasing the
number of servers.
§ Theoretically, you could scale to have as many servers
in parallel as you wish, which is why horizontal scaling
is the preferred option when databases have to be
scaled.
§ As servers are distributed, we gain the benefits of
being able to store more data, but we also inherit the
problems of a distributed system.
Scaling RDBMS: Master Slave

• All writes go to the master.
• All reads are performed against
the replicated slave databases.
• Critical reads may return stale data, as
recent writes may not yet have been
propagated to the slaves.
• Large data sets can pose problems,
as the master needs to duplicate its data
to every slave.
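The stale-read problem described above can be sketched in a few lines. This is only an illustrative model, not how any particular database implements replication: the `Master` and `Slave` classes, the in-memory write log, and the explicit `sync()` call are all assumptions made for the example.

```python
class Master:
    """Hypothetical master: accepts all writes and keeps an ordered write log."""

    def __init__(self):
        self.data = {}
        self.log = []  # ordered (key, value) writes, shipped to slaves later

    def write(self, key, value):
        self.data[key] = value
        self.log.append((key, value))


class Slave:
    """Hypothetical slave: serves reads, applies the master's log when synced."""

    def __init__(self, master):
        self.master = master
        self.data = {}
        self.applied = 0  # number of log entries already applied

    def read(self, key):
        return self.data.get(key)

    def sync(self):
        # Apply every write that has not been replicated yet.
        for key, value in self.master.log[self.applied:]:
            self.data[key] = value
        self.applied = len(self.master.log)


master = Master()
slave = Slave(master)

master.write("balance", 100)
stale = slave.read("balance")  # the write has not been propagated yet
slave.sync()
fresh = slave.read("balance")  # the slave has caught up
```

Between the write and the `sync()` call the slave answers with old data, which is exactly the window in which a critical read can be incorrect.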
Sharding
§ The concept of database sharding is key to scaling,
and it applies to both SQL and NoSQL databases.
§ We’re slicing up the database into multiple pieces
(shards).
§ Each record is assigned to a shard by its shard key,
so every shard holds a distinct subset of the data.
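A minimal sketch of shard routing, assuming hash-based sharding over a fixed number of shards. The names `NUM_SHARDS`, `shard_for`, `put`, and `get` are hypothetical, and each shard is modelled as a plain dictionary rather than a real database server.

```python
import hashlib

NUM_SHARDS = 4  # assumed fixed shard count for this sketch
shards = [dict() for _ in range(NUM_SHARDS)]  # each dict stands in for one server


def shard_for(shard_key: str) -> int:
    """Map a shard key to a shard index with a stable hash."""
    digest = hashlib.sha256(shard_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_SHARDS


def put(shard_key: str, row: dict) -> None:
    shards[shard_for(shard_key)][shard_key] = row


def get(shard_key: str):
    # Only the shard that owns the key is consulted.
    return shards[shard_for(shard_key)].get(shard_key)


put("user:1", {"name": "Ada"})
put("user:2", {"name": "Linus"})
```

A stable hash is used instead of Python's built-in `hash()` because the latter is randomized per process for strings, so the same key could be routed to different shards after a restart.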
Sharding
§ Database sharding comes at a cost, especially for SQL
databases.
§ Relational systems such as Oracle, MySQL, and PostgreSQL have
traditionally not sharded automatically out of the box, so engineers
often had to write the sharding logic by hand.
§ Because that logic is costly to maintain,
changing schemas (e.g. how the databases are
sharded) becomes challenging.
Consistency
§ Once you explore distributed systems with SQL databases
and ensure the availability of data, you’re bound to run
into the issues of consistency which are inevitable in
distributed systems.
§ In the master-slave architecture, it will take some time for
the data to be propagated from the master to its slaves.
§ Therefore, there is a window of time during which the master
and its slaves can have different states. In scaling a SQL
database, we trade strong consistency for eventual
consistency.
What is NoSQL
CAP Theorem
• A distributed SQL database cannot guarantee consistency and high availability
together while the network is partitioned
• Consistency: Every read receives the most recent write or an error
• Consistency in CAP is different from consistency in ACID.
• Consistency in CAP means having the most up-to-date
information.
• Availability: Every request receives a (non-error) response,
without the guarantee that it contains the most recent write
• Partition tolerance: The system continues to operate despite an
arbitrary number of messages being dropped (or delayed) by the
network between nodes. Partition tolerance is a must.
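The definitions above can be illustrated with a toy replica that behaves differently during a partition depending on which guarantee it keeps. This is a sketch under simplifying assumptions: the `Replica` class, its `mode` flag ("CP" or "AP"), and `UnavailableError` are hypothetical names, and a partition is simulated with a boolean.

```python
class UnavailableError(Exception):
    """Raised by a consistency-first replica that cannot prove its data is current."""


class Replica:
    def __init__(self, mode: str):
        self.mode = mode          # "CP" keeps consistency, "AP" keeps availability
        self.value = None
        self.partitioned = False  # True while cut off from the primary

    def replicate(self, value) -> bool:
        # Updates from the primary only arrive while the network is healthy.
        if self.partitioned:
            return False
        self.value = value
        return True

    def read(self):
        if self.partitioned and self.mode == "CP":
            # Consistency: the most recent write or an error.
            raise UnavailableError("cannot confirm the latest write")
        # Availability: always answer, possibly with stale data.
        return self.value


cp, ap = Replica("CP"), Replica("AP")
for replica in (cp, ap):
    replica.replicate("v1")
    replica.partitioned = True   # the network splits
    replica.replicate("v2")      # this update is dropped

ap_result = ap.read()            # stale but non-error response
try:
    cp_result = cp.read()
except UnavailableError:
    cp_result = "error"
```

Both replicas missed the write of `"v2"`. The AP replica still answers with `"v1"`, while the CP replica refuses to answer rather than return outdated data.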


Need of NoSQL
• Easy scalability
• Schemaless design
• Structure doesn’t need to be defined in advance
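A small sketch of what a schemaless design means in practice, using a plain Python list as a stand-in for a document collection. The `collection`, `insert`, and `find` names are hypothetical and do not correspond to any specific database API.

```python
collection = []  # stand-in for a document collection; no schema is declared


def insert(doc: dict) -> None:
    collection.append(dict(doc))


def find(field: str, value):
    # Documents that lack the field are simply skipped.
    return [doc for doc in collection if doc.get(field) == value]


# Two documents in the same collection with different structures.
insert({"name": "Ada", "email": "ada@example.com"})
insert({"name": "Linus", "languages": ["C"], "address": {"city": "Portland"}})

# The set of fields is discovered from the data, not defined up front.
fields = sorted({key for doc in collection for key in doc})
```

Adding a new field later requires no migration: new documents carry it, and older documents are left as they are.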
NoSQL Types
Recap: Data Model: Implementation Models
• Relational
• NoSQL
  • Key/Value
  • Graph
  • Document
  • Column-family
• Array / Matrix
• Hierarchical
• Network
NoSQL
• Other than the tabular relations
• Big data and real-time web apps
• Examples: MongoDB, HBase
Key Value Pair
• A key-value database is a type of
nonrelational database that uses a simple
key-value method to store data.
• A key-value database stores data as a
collection of key-value pairs in which a key
serves as a unique identifier.
• These databases contain a simple string
(the key) that is always unique and an
arbitrarily large data field (the value).
• Key-value databases are highly partitionable
and allow horizontal scaling at scales that
other types of databases cannot achieve.
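The key-value model described above can be sketched as a thin wrapper around a dictionary. `KeyValueStore` and its methods are hypothetical names for illustration. The point is that the store treats the value as an opaque blob and supports lookup by key only.

```python
class KeyValueStore:
    """Toy key-value store: unique string keys, opaque byte values."""

    def __init__(self):
        self._items = {}

    def put(self, key: str, value: bytes) -> None:
        # Writing to an existing key overwrites it, so keys stay unique.
        self._items[key] = value

    def get(self, key: str, default=None):
        return self._items.get(key, default)

    def delete(self, key: str) -> bool:
        if key in self._items:
            del self._items[key]
            return True
        return False


store = KeyValueStore()
store.put("session:42", b'{"user": "ada", "cart": [1, 2, 3]}')
store.put("session:42", b'{"user": "ada", "cart": []}')  # same key, new value
current = store.get("session:42")
```

Because every operation names exactly one key, the keys can be spread across many servers without any operation needing to touch more than one of them, which is what makes this model easy to partition.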
Column Based
• Column-oriented databases store data by column rather than by row.
• Values within a column tend to be similar, so they compress well and more data
fits in less memory. Retrieval is also done column by column, so only the columns
a query needs are read.
• Scales efficiently and handles large amounts of data.
• They are built for speed: because data is stored by column, you can
skip non-relevant data and immediately read what you are looking for.
• This makes aggregation queries especially fast.
• However, columnar storage is not ideal when you need to view multiple fields
from each row.
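The trade-off between the two layouts can be shown with the same small data set stored both ways. This is an in-memory sketch with made-up sample data, not a model of any particular storage engine.

```python
# Row layout: each record keeps all of its fields together.
rows = [
    {"id": 1, "city": "Lahore", "amount": 120},
    {"id": 2, "city": "Karachi", "amount": 80},
    {"id": 3, "city": "Lahore", "amount": 50},
]

# Column layout: one list per field, aligned by position.
columns = {name: [row[name] for row in rows] for name in rows[0]}

# An aggregation reads a single column and never touches "id" or "city".
total = sum(columns["amount"])


def row_at(index: int) -> dict:
    """Rebuild one full record; this needs a lookup in every column."""
    return {name: values[index] for name, values in columns.items()}


second = row_at(1)
```

Summing `amount` touches one list in the column layout, whereas rebuilding a whole record touches every list. That is why column stores favor aggregation over row-by-row access.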
Column Based
Document Based
Graph Based
CAP Theorem
Advantages of NoSQL
Downsides of NoSQL?
• Eventually consistent
• Since data is partitioned, a trade-off has to be made
between availability and consistency
• Many NoSQL systems offer limited or no support for
  • Joins
  • Groups
  • ACID transactions
• Query capabilities are limited, although there are
workarounds
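One common workaround is to perform the join and the grouping in application code. The sketch below assumes two hypothetical collections, `users` and `orders`, already fetched as lists of dictionaries. The field names are made up for the example.

```python
from collections import defaultdict

# Hypothetical documents fetched from two separate collections.
users = [{"_id": 1, "name": "Ada"}, {"_id": 2, "name": "Linus"}]
orders = [
    {"user_id": 1, "total": 30},
    {"user_id": 2, "total": 15},
    {"user_id": 1, "total": 20},
]

# Application-side join: index one collection by key, then look up per order.
users_by_id = {user["_id"]: user for user in users}
joined = [
    {"name": users_by_id[order["user_id"]]["name"], "total": order["total"]}
    for order in orders
]

# Application-side grouping: accumulate totals per user name.
totals = defaultdict(int)
for row in joined:
    totals[row["name"]] += row["total"]
```

The cost is that both collections travel over the network to the application, so this approach suits small lookups better than large analytical joins.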
Where to use NoSQL

• NoSQL databases don’t require any predefined schema, allowing you to work
more freely with “unstructured data” (such as texts, social media posts, photos,
videos, and email).
• Relational databases scale vertically, which is usually more expensive, whereas
the horizontal scaling of NoSQL databases is more cost-efficient.
Is NoSQL better than SQL?
• NoSQL tends to be a better option for modern applications that have more
complex, constantly changing data sets and require a flexible data model that
doesn’t need to be defined up front.

• Most developers and organizations that prefer NoSQL databases are attracted to the
agility that lets them go to market faster and make updates faster. Compared with
traditional SQL-based relational databases, NoSQL databases are often better suited
to storing and processing high-volume data in real time.

• While SQL databases still have their specific use cases, NoSQL databases offer
capabilities that SQL databases can match only at considerable cost or with
significant sacrifices in speed and agility.
Assignment
1. Compare the role of NoSQL databases in analytics versus traditional SQL
databases.
2. Explain how NoSQL databases handle schema evolution in analytics projects.
3. How does a NoSQL database handle large-scale distributed data for warehousing?
4. Compare and contrast document-oriented NoSQL databases (e.g., MongoDB) with
columnar databases (e.g., Cassandra) for warehousing purposes.
5. Explain how NoSQL databases support horizontal scaling in data warehousing.
6. Explain how a NoSQL system like MongoDB can handle multidimensional data for
analytical queries.
7. How do column-family NoSQL databases like Cassandra handle aggregate
analytical functions?
8. How does NoSQL handle the storage and analysis of unstructured or
semi-structured data for data mining?
9. Describe how graph databases (e.g., Neo4j) are used in data mining tasks like social
network analysis. What are the advantages of using a NoSQL database for real-time
data mining?
10. How can NoSQL databases integrate with machine learning frameworks for
advanced data mining?
