
THE ANJUMAN-I-ISLAM'S
M. H. SABOO SIDDIK COLLEGE OF ENGINEERING
Department of Computer Science & Engineering (AI & ML)
CSC702
BIG DATA ANALYTICS

Subject I/c: Prof. Arshi Khan
Module 3: NoSQL
⚫ Introduction to NoSQL
⚫ NoSQL Business Drivers
⚫ NoSQL Data Architecture Patterns: Key-value stores, Graph stores, Column family (Bigtable) stores, Document stores, Variations of NoSQL architectural patterns, NoSQL Case Study
⚫ NoSQL solution for big data
⚫ Understanding the types of big data problems; Analyzing big data with a shared-nothing architecture; Choosing distribution models: master-slave versus peer-to-peer
⚫ Four ways that NoSQL systems handle big data problems
Introduction to NoSQL
⚫ NoSQL databases (aka "not only SQL") are non-tabular databases that store data differently from relational tables. NoSQL databases come in a variety of types based on their data model. The main types are document, key-value, wide-column, and graph. They provide flexible schemas and scale easily with large amounts of data and high user loads.
⚫ We are familiar with the concept of relational databases that store data in rows, form relationships between the tables, and query the data using SQL. However, a new type of database, NoSQL, started to rise in popularity in the early 21st century.

⚫ NoSQL is short for "not-only SQL", but is also commonly called "non-relational" or "non-SQL". Any database technology that stores data differently from relational databases can be categorized as a NoSQL database.
⚫ It wasn't until around the late 1960s that the first implementation of a computerized database came into existence.
⚫ Relational databases gained popularity in the 1970s and have remained a staple in the database world ever since.
⚫ However, as datasets became exponentially larger and more complex, developers began to seek a flexible and more scalable database solution. This is where NoSQL came in.
Advantages of NoSQL databases
⚫ Scalability: NoSQL can be an excellent choice for massive datasets that need to be distributed across multiple servers and locations.
⚫ Flexibility: Unlike a relational database, NoSQL databases don't require a schema. This means that NoSQL can handle unstructured or semi-structured data in different formats.
⚫ Developer Experience: NoSQL requires less organization and thus lets developers focus more on using the data than on figuring out how to store it.
Drawbacks
⚫ Data Integrity: Relational databases are typically ACID compliant, ensuring high data integrity. NoSQL databases follow BASE principles (basically available, soft state, eventual consistency) and can often sacrifice integrity for increased data distribution and availability. However, some NoSQL databases do offer ACID compliance.
⚫ Language Standardization: While some NoSQL databases do use the Structured Query Language (SQL), typically, each database uses its own unique language to set up, manage, and query data.
Types of NoSQL Databases

Key-Value
A key-value database consists of individual records organized via key-value pairs. In this model, keys and values can be any type of data, ranging from numbers to complex objects. However, keys must be unique. This means this type of database is best when data is attributed to a unique key, like an ID number. Ideally, the data is also simple, and we are looking to prioritize fast queries over fancy features.
⚫ For example, let’s say we wanted to store
shopping cart information for customers who
shop in an e-commerce store. Our key-value
database might look like this:
⚫ Amazon DynamoDB and Redis are popular
options for developers looking to work with
key-value databases.
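As a minimal illustration of the key-value pattern above, the sketch below uses a plain in-memory Python class standing in for a store such as Redis or DynamoDB. The key names (e.g. "cart:1001") and cart contents are invented for the example.

```python
# A toy key-value store: records are looked up only by their unique key.
class KeyValueStore:
    def __init__(self):
        self._data = {}

    def put(self, key, value):
        # Keys must be unique; a put overwrites any existing value.
        self._data[key] = value

    def get(self, key):
        # Fast lookup by key; no querying on the value's contents.
        return self._data.get(key)

store = KeyValueStore()
store.put("cart:1001", {"items": ["hiking boots", "water bottle"], "total": 89.98})
store.put("cart:1002", {"items": ["tent"], "total": 129.00})

print(store.get("cart:1001")["total"])  # → 89.98
```

Note that all access goes through the key: there is no way to ask "which carts contain a tent?" without scanning, which is exactly the trade-off the text describes.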
Document

A document-based (also called document-oriented) database consists of data stored in hierarchical structures. Some supported document formats include JSON, BSON, XML, and YAML. The document-based model is considered an extension of the key-value database and provides querying capabilities not solely based on unique keys. Documents are considered very flexible and can evolve to fit an application's needs. They can even model relationships!
⚫ For example, let’s say we wanted to store
product information for customers who shop in
our e-commerce store. A products document
might look like this:
⚫ MongoDB is a popular option for developers
looking to work with a document database.
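A hypothetical "products" collection can be sketched with nested Python dicts standing in for JSON documents (the product names and fields are invented). The point is what document stores add over key-value stores: querying on fields other than the key.

```python
# Each document is a hierarchical (nested) structure, as in MongoDB-style JSON.
products = [
    {"_id": 1, "name": "Hiking Backpack", "price": 79.99,
     "specs": {"volume_l": 40, "color": "green"},
     "tags": ["hiking", "outdoor"]},
    {"_id": 2, "name": "Trail Camera", "price": 149.99,
     "specs": {"megapixels": 20},      # documents need not share one schema
     "tags": ["cameras", "outdoor"]},
]

# Query by a non-key field -- something a pure key-value store cannot do:
outdoor = [p["name"] for p in products if "outdoor" in p["tags"]]
print(outdoor)  # → ['Hiking Backpack', 'Trail Camera']
```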
Graph

A graph database stores data using a graph structure. In a graph structure, data is stored in individual nodes (also called vertices) and establishes relationships via edges (also called links or lines). The advantage of the relationships built using a graph database, as opposed to a relational database, is that they are much simpler to set up, manage, and query.
⚫ For example, let’s say we wanted to build a recommendation
engine for our e-commerce store. We could establish
relationships between similar items our customers searched
for to create recommendations.
⚫ In the graph above, we can see that there are
four nodes: “Neo”, “Hiking”, “Cameras”, and
“Hiking Camera Backpack”. Because the user,
“Neo”, searched for “Hiking” and “Cameras”,
there are edges connecting all 3 nodes. More
edges are created after the search, linking a
new node, “Hiking Camera Backpack”.

⚫ Neo4j is a popular option for developers looking to work with a graph database.
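The recommendation example above can be sketched with a plain adjacency structure (real graph databases like Neo4j add a query language and indexing on top; this toy version only shows the node/edge idea).

```python
from collections import defaultdict

# Undirected graph: each node maps to the set of nodes it shares an edge with.
edges = defaultdict(set)

def add_edge(a, b):
    edges[a].add(b)
    edges[b].add(a)

# "Neo" searched for "Hiking" and "Cameras" ...
add_edge("Neo", "Hiking")
add_edge("Neo", "Cameras")
# ... and those interests link to a related product node.
add_edge("Hiking", "Hiking Camera Backpack")
add_edge("Cameras", "Hiking Camera Backpack")

# Recommend: neighbours-of-neighbours Neo isn't already connected to.
recs = set()
for interest in edges["Neo"]:
    recs |= edges[interest] - edges["Neo"] - {"Neo"}
print(recs)  # → {'Hiking Camera Backpack'}
```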
Column Oriented

A column-oriented NoSQL database stores data similar to a relational database. However, instead of storing data as rows, it is stored as columns. Column-oriented databases aim to provide faster read speeds by being able to quickly aggregate data for a specific column.
⚫ For example, take a look at the following e-commerce database of products:
⚫ If we wanted to analyze the total sales for all the products, all we would need to do is aggregate data from the sales column.
⚫ This is in contrast to a relational model that would have to pull data from each row. We would also be pulling adjacent data (like size information in the above example) that isn't relevant to our query.
⚫ Amazon's Redshift is a popular option for developers looking to work with a column-oriented database.
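The row-versus-column contrast can be sketched directly (the products and sales figures are invented for illustration):

```python
# Row layout: each record stored together; aggregating sales touches
# every field of every row, including irrelevant ones like size.
rows = [
    {"product": "T-shirt", "size": "M",   "sales": 120},
    {"product": "Hoodie",  "size": "L",   "sales": 80},
    {"product": "Cap",     "size": "One", "sales": 45},
]

# Column layout: each attribute stored contiguously; aggregating one
# column reads only that column's values.
columns = {
    "product": ["T-shirt", "Hoodie", "Cap"],
    "size":    ["M", "L", "One"],
    "sales":   [120, 80, 45],
}

total_sales = sum(columns["sales"])  # touches only the sales column
print(total_sales)  # → 245
```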
NoSQL Business Drivers
⚫ Businesses have found value in rapidly
capturing and analyzing large amounts of
variable data, and making immediate
changes in their businesses based on the
information they receive.

⚫ The figure shows how the demands of volume, velocity, variability, and agility play a key role in the emergence of NoSQL solutions. As each of these drivers applies pressure to the single-processor relational model, its foundation becomes less stable and in time no longer meets the organization's needs.
VOLUME
⚫ Without a doubt, the key factor pushing organizations to look at alternatives to their current RDBMSs is a need to query Big Data using clusters of commodity processors.
⚫ Until around 2005, performance concerns were resolved by purchasing faster processors. In time, however, the ability to increase processing speed was no longer an option: as chip density increased, heat could no longer dissipate fast enough to prevent chips from overheating.
⚫ This phenomenon, known as the Power Wall, forced systems designers to shift their focus from increasing speed on a single chip to using more processors working together.
⚫ The need to scale out (also known as horizontal scaling), rather than scale up (faster processors), moved organizations from serial to parallel processing, where data problems are split into separate paths and sent to separate processors to divide and conquer the work.
Velocity
⚫ While Big Data problems are a consideration for many organizations moving away from RDBMS systems, the ability of a single-processor system to rapidly read and write data is also key.
⚫ Many single-processor RDBMS systems are unable to keep up with the demands of real-time inserts and online queries to the database made by public-facing websites.
⚫ RDBMS systems frequently index many columns of every new row, a process that decreases system performance.
⚫ When single-processor RDBMSs are used as a back end to a web storefront, random bursts in web traffic slow down response for everyone, and tuning these systems can be costly when both high read and write throughput is desired.
Variability
⚫ Companies that want to capture and report on exception data struggle when attempting to use the rigid database schema structures imposed by RDBMS systems.
⚫ For example, if a business unit wants to capture a few custom fields for a particular customer, all customer rows within the database need to store this information even though it doesn't apply to them.
⚫ Adding new columns to an RDBMS requires the system to be shut down and ALTER TABLE commands to be run. When a database is large, this process can impact system availability, losing time and money in the process.
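By contrast, a schemaless document store lets one record gain a custom field without touching the others. A minimal Python sketch (customer records and field names invented):

```python
# Document-style records: no fixed schema shared by all "rows".
customers = {
    "c1": {"name": "Asha", "city": "Mumbai"},
    "c2": {"name": "Ravi", "city": "Pune"},
}

# Only this customer gains the custom field -- no ALTER TABLE, no downtime,
# and no wasted storage on records the field doesn't apply to.
customers["c1"]["loyalty_tier"] = "gold"

print("loyalty_tier" in customers["c2"])  # → False
```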
Agility
⚫ The most complex part of building applications using RDBMSs is the process of putting data into and getting data out of the database.
⚫ If your data has nested and repeated subgroups of data structures, you need to include an object-relational mapping layer. The responsibility of this layer is to generate the correct combination of INSERT, UPDATE, DELETE, and SELECT SQL statements to move object data to and from the RDBMS persistence layer.
⚫ This process is not simple and is associated with the largest barrier to rapid change when developing new or modifying existing applications.
⚫ Generally, object-relational mapping requires experienced software developers who are familiar with object-relational frameworks such as Java Hibernate (or NHibernate for .NET systems).
⚫ Even with experienced staff, small change requests can cause slowdowns in development and testing schedules.
Case Study: How a bank turned challenges into opportunities to serve its customers using NoSQL Database
⚫ Financial services industries are at a crossroads and are experiencing massive changes in response to shifting customer demands. With the increasing adoption of cloud technologies, digital-only enterprises are offering innovative solutions at the lowest cost.
⚫ Customer experience is a strategic
imperative for most organizations
today, but delivering an engaging
experience across the growing
number of digital customer
touchpoints can be challenging,
especially if they have an aging
technology stack.
⚫ Additionally, organizations have to
navigate these transformational changes
while managing vast volumes of digital
transactions, a variety of data, and velocity
without straining their business systems,
experiencing data loss, breaches, and/or
downtime.

⚫ The graphic below shows the IT priorities of financial services institutions; it is no surprise that 25% of them want to modernize their systems.
Some of the bank's challenges:
⚫ Exceeding customer expectations: India has more than 50% of its population below age 25 and more than 65% below age 35. Bank customers are increasingly comparing banking experiences to other areas of their digital lives. These digital natives aren't just looking to check their balances and deposit checks; they are looking for more meaningful online experiences.
⚫ The bank was looking for a system that can provide an engaging and personalized digital customer experience in real time.
⚫ Ability to provide comprehensive services: Provide 'always-on' digital services and delight customers by assisting them through chatbot interactions. Additionally, they want to experiment and deliver new services, such as enhanced payment and blockchain technologies, valued by their customers.
⚫ Provide a customer 360 experience: Customers want a consistent experience, regardless of the business division they are interacting with or the device they use in the process. Delivering an engaging and personalized customer experience with a single customer view and a unified view of all interactions encompassing each touchpoint with the bank is challenging.
⚫ Managing change without disruption: The bank needed agility to launch new services and make their development staff more productive. They want to minimize outages with high availability built into the system.
⚫ Choosing the right data management strategy
A comprehensive data management strategy sets the stage for establishing a deeper understanding of customer experience.
It can offer a single view by collecting all the customer's structured and unstructured data from across the organization and other relevant external sources into one place.
A NoSQL database is an ideal choice. It can store personal and demographic information and customer interactions with the company, including calls, chats, emails, texts, social media responses, product/service activity history, and past and present purchases.
McKinsey's study suggests that data-driven companies tend to be 19X more profitable when they use data as a differentiator, as they tend to acquire 23X more customers and retain 6X more customers.
Why Oracle NoSQL Database
⚫ Support for a flexible data model:
⚫ The bank can localize all data for a given entity – such as a financial asset class or user class – into a single document, rather than spreading it across multiple relational tables.
⚫ Customers can access entire documents in a single database operation, rather than joining separate tables spread across the database.
⚫ As a result of this data localization, application performance is often much higher when using Oracle NoSQL Database, which can be the decisive factor.
⚫ Predictable scalability with always-on availability
⚫ An Oracle NoSQL cluster can be expanded horizontally online without incurring any application downtime, remaining one hundred percent transparent to the application. Oracle NoSQL Database maintains multiple copies of data for high availability purposes.
⚫ Scale-out architecture for business
continuity
⚫ Oracle NoSQL Database supports
active-active architecture with
multi-region tables. A multi-region
architecture is two or more independent,
geographically distributed Oracle NoSQL
Database clusters bridged by
bi-directional replication, ensuring the
customers always have fast access to
services and the latest data.
⚫ Simplify application development with
rich query and APIs
⚫ Oracle NoSQL provides a rich query
language and extensive secondary indexes
giving users fast and flexible access to data
with any query pattern. This can range
from simple key-value lookups to complex
search, traversals, and aggregations across
rich data structures, including embedded
sub-documents and arrays.
High-level architecture of the proposed solution
⚫ Critical components in the architecture include:
Applications Layer:
This layer manages all user input applications, e.g., loan or credit card applications. The applications are based on forms technology, allowing the developers to create adaptive and responsive documents to capture information. The forms have a notion of fragments that allows for pulling out standard segments such as personal details like name and address, family details, income details, etc. The application layer is responsible for doing all the "application plumbing": interacting with the database, enforcing validation at event points, etc. It interacts with the bank's backend system through the API gateway and doesn't store any personal or sensitive information.
Database Layer:
A CRM system is used primarily for lead generation to target customers. Also available in this layer is the ELK stack (Elasticsearch, Logstash, Kibana), which is primarily used to audit the data stored in the NoSQL Database. Oracle NoSQL Database has an out-of-box integration with Elasticsearch. Oracle NoSQL Database also feeds the user drop-off log (incomplete form activity) data to the orchestration framework, primarily used for retargeting the users.
Marketing Layer:
This layer hosts various servers that drive the business decision process. It comprises servers and tools used for customer segmentation (identifying groups of individuals who are similar in attitudes, demographic profile, etc.) and customer journey analysis (the sum of all customer experiences with the bank). Additionally, it handles personalization (showing the product or service a customer would be interested in buying) and retargeting (persuading potential customers to reconsider the bank's products and services after they left or dropped off from the app), based on the drop-off campaign data coming out of the Oracle NoSQL Database.
Banking experience re-imagined
⚫ A typical user's journey, e.g., loan processing, starts with a user interacting with the bank's loan processing applications via the web, a mobile device, email, or even a branch. The application is served off the forms in the application layer. At this stage, the user fills in details and submits the scanned supporting documents.
⚫ These scanned forms are classified, information is extracted, and the data is sent to the NoSQL Database store. The data is sent to the processing system that triggers the underwriting process.
⚫ Depending on the underwriting process results, an application will be approved, denied, or sent back to the user for additional information. If the application is approved, the loan amount is deposited into the user's account.
⚫ Suppose the user drops off at any point while filling in the form. In that case, this drop-off information is stored in the NoSQL Database and feeds into the orchestration system to kick-start the retargeting campaign that allows the bank to re-engage the user.
⚫ The process is repeated with specific ads, emails, or WhatsApp messages retargeting the customers. In the event the customer returns, they can start the journey where they left off.

⚫ In conclusion, one of India's leading private banks modernized and expedited its digital presence and provided an enhanced experience for its customers using Oracle NoSQL Database.
NoSQL solution for big data
1. Queries should be moved to the data rather than moving data to queries:
⚫ When a general query must be run against all nodes holding information, it is more efficient to send the query to every node than to move a huge set of data to a central processor.
⚫ This basic rule helps explain why NoSQL databases show dramatic performance benefits over systems that were not designed to distribute queries to nodes.
⚫ The entire record is kept inside a node in document form, which means just the query and its result need to move over the network, keeping big data queries fast.
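The rule above can be sketched as scatter-gather: the small query predicate travels to each node's data, runs locally, and only matching results cross the network (the node contents and fields here are invented):

```python
# Three "nodes", each holding its own shard of documents locally.
nodes = [
    [{"id": 1, "country": "IN"}, {"id": 2, "country": "US"}],
    [{"id": 3, "country": "IN"}],
    [{"id": 4, "country": "UK"}, {"id": 5, "country": "IN"}],
]

def run_on_node(docs, predicate):
    # Executed on the node that owns the data: only matching
    # results (not the raw shard) are sent back over the network.
    return [d for d in docs if predicate(d)]

query = lambda d: d["country"] == "IN"          # the query moves to the data
results = [d for node in nodes for d in run_on_node(node, query)]
print([d["id"] for d in results])  # → [1, 3, 5]
```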
2. Hash rings should be used for even distribution of data:
⚫ Figuring out a reliable approach to assigning a record to a processing node is perhaps the most difficult issue with distributed databases.
⚫ With the help of a randomly generated 40-character key, the hash ring method evenly distributes a large amount of data across numerous servers, which is a good way to distribute network load uniformly.
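A common way to implement a hash ring is consistent hashing; the sketch below is an assumption about that technique (node names and keys are invented). Each node sits on the ring at the hash of its name, and a key belongs to the first node clockwise from the key's own hash.

```python
import hashlib
from bisect import bisect_right

def ring_hash(s):
    # SHA-1 yields the 40-hex-character key the text mentions.
    return int(hashlib.sha1(s.encode()).hexdigest(), 16)

nodes = ["node-a", "node-b", "node-c"]
ring = sorted((ring_hash(n), n) for n in nodes)   # nodes placed on the ring
positions = [pos for pos, _ in ring]

def owner(key):
    # First node at or after the key's position, wrapping around the ring.
    idx = bisect_right(positions, ring_hash(key)) % len(ring)
    return ring[idx][1]

for k in ("user:1", "user:2", "user:3"):
    print(k, "->", owner(k))
```

A useful property of this scheme is that adding or removing one node only moves the keys adjacent to it on the ring, rather than reshuffling everything.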
3. For scaling read requests, replication should be used:
⚫ Databases use replication to make backup copies of data in real time. Read requests can be scaled horizontally with the help of replication, and this strategy works admirably in most cases.
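A toy sketch of read scaling via replicas follows. The synchronous copying and round-robin read routing are simplifications for illustration; real systems typically replicate asynchronously, which is where eventual consistency comes from.

```python
from itertools import cycle

# One primary plus three replicas; reads round-robin across replicas
# so read throughput scales with the number of copies.
primary = {"balance:42": 100}
replicas = [dict(primary) for _ in range(3)]
reader = cycle(range(len(replicas)))

def write(key, value):
    primary[key] = value
    for r in replicas:          # replication (synchronous here for simplicity)
        r[key] = value

def read(key):
    # Each read is served by the next replica in turn.
    return replicas[next(reader)].get(key)

write("balance:42", 150)
print(read("balance:42"))  # → 150, served by a replica
```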
4. Distribution of queries to nodes should be done by the database:
⚫ Separating the evaluation of a query from its execution is important for getting higher performance from queries traversing numerous nodes. The NoSQL database moves the query to the data instead of moving the data to the query.
Understanding the types of big
data problems
⚫ Storage
⚫ With vast amounts of data generated daily, the
greatest challenge is storage (especially when the
data is in different formats) within legacy
systems. Unstructured data cannot be stored in
traditional databases.
⚫ Processing
⚫ Processing big data refers to the reading,
transforming, extraction, and formatting of useful
information from raw information. The input and
output of information in unified formats continue
to present difficulties.
⚫ Security
⚫ Security is a big concern for organizations.
Non-encrypted information is at risk of theft or
damage by cyber-criminals. Therefore, data security
professionals must balance access to data against
maintaining strict security protocols.

⚫ Finding and Fixing Data Quality Issues
⚫ When dealing with data, the utmost importance is its accuracy. After all, every insight you glean from data will depend on the data itself. It all begins during the data collection phase. At this time, you want to be sure that you're collecting data from the right sources at the right time if you're going to apply the data for outputs.
⚫ Long Response Times from the System
⚫ Clean and accurate data is just as important as data being accessible when you need it. If you're using a data tool that's slow, then by the time your data is available for use, it could be considered outdated and old.
⚫ Confusion with Big Data Tool Selection
⚫ To overcome this challenge, it's best to take time performing research and not jump too quickly into a specific tool.
⚫ Real-Time Big Data Problems
⚫ Data is constantly changing and evolving, which impacts the insights you glean from it.
⚫ Technically, this requires a tool that can provide up-to-date filtering and remove redundant or irrelevant data from the picture when you're applying it.
⚫ Lack of Understanding
⚫ Companies can leverage data to boost performance in many areas.
⚫ Some of the best use cases for data are to decrease expenses, create innovation, launch new products, grow the bottom line, and increase efficiency, to name a few.
⚫ Despite the benefits, companies have been slow to adopt data technology or put a plan in place for how to create a data-centric culture.
⚫ High Cost of Data Solutions
⚫ After understanding how your business will benefit most from implementing data solutions, you're likely to find that buying and maintaining the necessary components can be expensive.
⚫ Along with hardware like servers and storage, and software, there also comes the cost of human resources and time.
⚫ Complex Systems for Managing Data
⚫ Moving from a legacy data management system and integrating a new solution is a challenge in itself.
⚫ Furthermore, with data coming from multiple sources, and IT teams creating their own data while managing data, systems can become complex quickly.
⚫ Sharing and Accessing Data:
⚫ Perhaps the most frequent challenge in big data efforts is the inaccessibility of data sets from external sources.
⚫ Sharing data can cause substantial challenges.
⚫ These include the need for inter- and intra-institutional legal documents.
⚫ Accessing data from public repositories leads to multiple difficulties.
⚫ It is necessary for the data to be available in an accurate, complete, and timely manner.
⚫ Big Data Skills
⚫ Running Big Data tools requires expertise that is possessed by data scientists, data engineers, and data analysts.
⚫ They have the skills to handle Big Data challenges and come up with valuable insights for the company they work in. The problem is not the demand but the lack of such skills, which, in turn, becomes a challenge.
Analyzing big data with a shared-nothing architecture
⚫ Parallel database systems have great advantages for online transaction processing and decision support applications. Parallel processing divides a large task into multiple tasks, and each task is performed concurrently on several nodes. This allows a large task to complete more quickly.
⚫ Architectural Models
⚫ There are several architectural models for parallel machines, which are given below:
Shared memory
Shared disk
Shared nothing
Hierarchical
⚫ Shared nothing architecture − In this architecture, each node has its own mass storage as well as main memory. The processor at one node may communicate with a processor at another node via a high-speed interconnection network. Each node functions as the server for the data on the disk or disks it owns, as each processor has its own copy of the OS, DBMS, and data.
⚫ Examples − Teradata, Gamma, Bubba.
⚫ It requires careful partitioning of the data across multiple disk nodes. Furthermore, the addition of new nodes to the system presumably requires reorganizing and repartitioning the database to deal with load balancing issues.
⚫ Finally, fault tolerance is more difficult than with shared-disk, seeing as a failed node will make its data on disk unavailable, thus requiring data replication. It is due to its scalability advantage that shared-nothing was first adopted for OLAP workloads, in particular data warehousing, as it is easier to parallelize read-only queries.
Advantages
⚫ These architectures are more scalable and easily support a large number of processors.
⚫ It overcomes the disadvantage of requiring all I/O to go through a single intercommunication network.
⚫ It provides linear speed-up and linear scale-up; that is, the time taken for operations decreases in proportion to the increase in the number of CPUs and disks.
Disadvantages
⚫ CPU-to-CPU communication is very slow.
⚫ The costs of communication and non-local disk access are higher than in shared-memory or shared-disk architectures because sending data involves software interaction on both sides.
⚫ Shared nothing architecture is difficult to load balance.
Choosing distribution models: master-slave versus peer-to-peer
⚫ What Is a Distributed System?
⚫ A distributed system consists of multiple components, possibly across geographical boundaries, that communicate and coordinate their actions through message passing.
⚫ To an actor outside this system, it appears as if it is a single coherent system.
⚫ Decentralized systems are distributed systems where no specific component owns the decision making.
⚫ While every component owns its part of the decision, none of them has complete information. Hence, the outcome of any decision depends upon some sort of consensus between all components.
⚫ In parallel computing, we use multiple processors on a single machine to perform multiple tasks simultaneously, possibly with shared memory. However, in distributed computing, we use multiple autonomous machines with no shared memory, communicating by message passing.
Distributed System Architecture
Master-slave:
⚫ In this model, one node of the distributed system plays the role of master. Here, the master node has complete information about the system and controls the decision making. The rest of the nodes act as slaves and perform tasks assigned to them by the master. Further, for fault tolerance, the master node can have redundant standbys.
Peer-to-peer:
⚫ There is no single master designated amongst the nodes in a distributed system in this model. All the nodes equally share the responsibility of the master.
⚫ Hence, we also know this as the multi-master or master-less model. At the cost of increased complexity and communication overhead, this model provides better system resiliency.
⚫ While both these architectures have their
own pros and cons, it’s unnecessary to
choose only one. Many of the distributed
systems actually create an architecture
that combines elements of both models.
⚫ A peer-to-peer model can provide data
distribution, while a master-slave model
can provide data replication in the same
architecture.