1 Big Data Analytics (18CS72)

Module-3
NoSQL
3.1 Introduction
Big Data uses distributed systems. A distributed system consists of multiple data nodes, organized into clusters of machines, together with distributed software components. Tasks execute in parallel on the data held at the nodes in the clusters. The computing nodes communicate with the applications over a network.

The features of a distributed-computing architecture (Chapter …) are as follows:

1. Increased reliability and fault tolerance: The most important advantage of a distributed computing system is reliability. If a segment of machines in a cluster fails, the rest of the machines continue working. When the datasets are replicated across a number of data nodes, fault tolerance increases further: the replicas in the remaining segments continue the same computations that were being done on the failed segment's machines.

2. Flexibility: It is easy to install, implement and debug new services in a distributed environment.

3. Sharding: Sharding means storing different parts of the data on different sets of data nodes, clusters or servers. For example, a huge university student database can be divided by sharding into smaller databases called shards. Each shard may correspond to the database for an individual course and year, and each shard is stored on a different node or server.

4. Speed: Computing power increases in a distributed computing system because shards run in parallel and independently on individual data nodes in clusters (no data sharing between shards).

5. Scalability: Consider sharding a large database into a number of shards distributed for computing on different systems. When the database expands further, adding more machines and increasing the number of shards provides horizontal scalability. Increasing the computing power of the same machines, and running more algorithms on them, provides vertical scalability.

Resource sharing: Sharing memory, machines and the network architecture reduces cost.

Open system: An open system makes the service accessible to all nodes.

6. Performance: The collection of processors in the system provides higher performance than a centralized computer, owing to the lower cost of communication among machines (cost here means the time taken up in communication).

SUNIL G L, A.P, DEPT. OF CSE, SVIT , BENGALURU 1
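The sharding, replication and fault-tolerance ideas in items 1, 3 and 5 can be illustrated with a small in-memory Python sketch. It is not how any particular system works: the node names, the `Cluster` class, the `place` function and the modulo hash placement are hypothetical choices made for illustration. Each (course, year) shard is stored on two nodes, so reads of that shard survive the failure of one node.

```python
import hashlib
from collections import defaultdict


def shard_key(course: str, year: int) -> str:
    # One shard per (course, year), following the student-database example.
    return f"{course}-{year}"


def place(shard: str, nodes: list[str], replicas: int = 2) -> list[str]:
    """Choose `replicas` distinct nodes for a shard by hashing its key
    (simple modulo placement, chosen only for illustration)."""
    start = int(hashlib.sha256(shard.encode()).hexdigest(), 16) % len(nodes)
    return [nodes[(start + i) % len(nodes)] for i in range(replicas)]


class Cluster:
    def __init__(self, nodes: list[str]) -> None:
        self.nodes = list(nodes)
        self.storage = defaultdict(dict)  # node -> {shard: [records]}
        self.failed: set[str] = set()

    def insert(self, record: dict) -> None:
        shard = shard_key(record["course"], record["year"])
        # Write the record to every replica of its shard.
        for node in place(shard, self.nodes):
            self.storage[node].setdefault(shard, []).append(record)

    def read_shard(self, course: str, year: int) -> list[dict]:
        shard = shard_key(course, year)
        # Try replicas in order and skip any node that has failed.
        for node in place(shard, self.nodes):
            if node not in self.failed:
                return self.storage[node].get(shard, [])
        raise RuntimeError(f"all replicas of shard {shard} are down")


cluster = Cluster(["node1", "node2", "node3"])
for rec in [{"id": "S001", "course": "CSE", "year": 3},
            {"id": "S002", "course": "ISE", "year": 3},
            {"id": "S003", "course": "CSE", "year": 3}]:
    cluster.insert(rec)

cluster.failed.add(place("CSE-3", cluster.nodes)[0])  # first replica goes down
print([r["id"] for r in cluster.read_shard("CSE", 3)])  # ['S001', 'S003']
```

Modulo placement remaps most shards when a node is added; production systems typically use schemes such as consistent hashing to limit that data movement.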

3.2 NOSQL DATA STORE

SQL is a query language based on relational algebra. It is a declarative language, and it defines the data schema. SQL is used to create and manipulate databases in an RDBMS. An RDBMS uses a tabular data store with relational algebra: precisely defined operators with relations as the operands. A relation is a set of tuples, and each tuple consists of named attributes. A tuple is uniquely identified by keys called candidate keys.

ACID Properties in SQL Transactions

Atomicity of a transaction means that all operations in the transaction must complete; if the transaction is interrupted, the completed operations must be undone (rolled back). For example, when a customer withdraws an amount, the bank first records the withdrawn amount in a table and then updates the balance to the new available amount. Atomicity means that both operations complete, or both are undone if the transaction is interrupted in between.

Consistency in transactions means that a transaction must maintain the integrity constraints, taking the database from one consistent state to another. For example, the sum of deposited amounts minus the sum of withdrawn amounts in a bank account must equal the last balance; all three values must remain consistent with each other.

Isolation means that concurrent transactions on the database must be isolated from each other, so that each executes as if it were running alone.

Durability means that the effects of a transaction must persist once it has completed (been committed), even if the system fails afterwards.
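The atomicity and consistency examples above can be sketched with Python's built-in `sqlite3` module. The `account` and `withdrawal` tables, the `withdraw` function and its `fail_midway` flag are hypothetical, and the opening balance stands in for the sum of deposits. Using the connection as a context manager commits the transaction if the block succeeds and rolls it back if an exception escapes, so an interrupted withdrawal leaves neither of its two operations behind.

```python
import sqlite3


def setup() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE account (id INTEGER PRIMARY KEY, balance INTEGER NOT NULL)")
    conn.execute("CREATE TABLE withdrawal (account_id INTEGER NOT NULL, amount INTEGER NOT NULL)")
    conn.execute("INSERT INTO account (id, balance) VALUES (1, 1000)")  # opening balance
    conn.commit()
    return conn


def withdraw(conn: sqlite3.Connection, account_id: int, amount: int,
             fail_midway: bool = False) -> bool:
    try:
        with conn:  # commit if the block succeeds, roll back if it raises
            # Operation 1: record the withdrawn amount.
            conn.execute("INSERT INTO withdrawal (account_id, amount) VALUES (?, ?)",
                         (account_id, amount))
            if fail_midway:
                raise RuntimeError("interrupted between the two operations")
            # Operation 2: update the balance.
            conn.execute("UPDATE account SET balance = balance - ? WHERE id = ?",
                         (amount, account_id))
        return True
    except RuntimeError:
        return False


def is_consistent(conn: sqlite3.Connection, account_id: int, opening_balance: int) -> bool:
    # Integrity constraint: balance == opening balance - sum of withdrawals.
    (balance,) = conn.execute("SELECT balance FROM account WHERE id = ?",
                              (account_id,)).fetchone()
    (withdrawn,) = conn.execute(
        "SELECT COALESCE(SUM(amount), 0) FROM withdrawal WHERE account_id = ?",
        (account_id,)).fetchone()
    return balance == opening_balance - withdrawn


conn = setup()
print(withdraw(conn, 1, 200))                    # True: both operations committed
print(withdraw(conn, 1, 300, fail_midway=True))  # False: the INSERT was rolled back
print(is_consistent(conn, 1, 1000))              # True
```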

NOSQL

NoSQL (Not Only SQL) data stores are a newer category of data stores. NoSQL is an altogether new way of thinking about databases, emphasizing schema flexibility, simple relationships, dynamic schemas, auto-sharding, replication, integrated caching, horizontal scalability of shards, distributable tuples, semi-structured data and flexibility in approach.

Issues with NoSQL data stores include a lack of standardization in approaches, difficulty in processing complex queries, and dependence on eventually consistent results in place of consistency in all states.


Big Data NoSQL

NoSQL records are stored in non-relational data store systems that use flexible data models; the records may use multiple schemas. NoSQL data stores are considered semi-structured data stores. Big Data stores use NoSQL.

NoSQL data store characteristics are as follows:

1. NoSQL is a class of non-relational data storage systems with flexible data models. Examples of NoSQL data-architecture patterns of datasets are key-value pairs, name/value pairs, column family, Big-data store, tabular data store, Cassandra (used in Facebook/Apache), HBase, hash table [Dynamo (Amazon S3)], unordered keys using JSON (CouchDB), JSON (PNUTS), JSON (MongoDB), graph store, object store, ordered keys and semi-structured data storage systems.

2. NoSQL data stores do not necessarily have a fixed schema, such as a table, and do not use the concept of joins (in distributed data storage systems). Data written at one node can be replicated to multiple nodes, so the data store is fault-tolerant. The store can be partitioned into unshared shards.

Features in NoSQL Transactions

NoSQL transactions have the following features:

1. They relax one or more of the ACID properties.

2. They are characterized by two of the three properties of the CAP theorem (consistency, availability and partition tolerance); at least two are present for the application/service/process.

3. They can be characterized by BASE properties.

Big Data NoSQL Solutions

NoSQL DBs are needed for Big Data solutions; they play an important role in handling Big Data challenges. Table 3.1 gives examples of widely used NoSQL data stores.

Table 3.1 NoSQL data stores and their characteristic features

NoSQL data store | Characteristic features
Apache's HBase | HDFS compatible, open-source and non-relational data store written in Java; a column-family based NoSQL data store providing BigTable-like capabilities (Sections 2.6 and 3.3.3.2); scalability, strong consistency, versioning, configuring and maintaining data store characteristics
MongoDB | HDFS compatible; master-slave distribution model (Section 3.5.1.3); document-oriented data store with JSON-like documents and dynamic schemas; open-source, NoSQL, scalable and non-relational database; used at the backend by websites such as Craigslist, eBay and Foursquare
Apache's Cassandra | HDFS compatible DBs; decentralized peer-to-peer distribution model (Section 3.5.1.4); open source; NoSQL; scalable, non-relational, column-family based, fault-tolerant, with tuneable consistency (Section 3.7); used by Facebook and Instagram
Apache's CouchDB | An Apache project and also a widely used database for the web; consists of a document store; uses the JSON data-exchange format to store its documents, JavaScript for indexing, combining and transforming documents, and HTTP APIs
Oracle NoSQL | A step towards NoSQL data stores; distributed key-value data store; provides transactional semantics for data manipulation, horizontal scalability, simple administration and monitoring
Riak | An open-source key-value store; high availability (using the replication concept), fault tolerance, operational simplicity and scalability; written in Erlang

CAP Theorem

Among C, A and P, at least two are present for the application/service/process. Consistency means all copies have the same value, as in traditional DBs. Availability means at least one copy is available in case a partition becomes inactive or fails; for example, in web applications, the other copy in the other partition remains available. Partition means parts that are active but may not cooperate (share data), as in distributed DBs.

1. Consistency in distributed databases means that all nodes observe the same data at the same time. Therefore, operations in one partition of a distributed database should be reflected in the other related partitions. For example, operations that change the sales data of a specific showroom in one table should also be reflected in the related tables that use that sales data.
2. Availability means that during transactions, the field values must be available in other partitions of the database so that each request receives a response, on success as well as on failure. (On failure, the response to the request comes from a replica of the data.) Distributed databases require transparency between one another. A network failure may make data unavailable in a certain partition if there is no replication; replication ensures availability.

3. Partition means dividing a large database into different databases, by adopting specified procedures, without affecting the operations on them.
4. Partition tolerance refers to the continuation of operations as a whole even in case of message loss, node failure or an unreachable node.

Brewer's CAP (Consistency, Availability and Partition Tolerance) theorem demonstrates that no distributed system can guarantee C, A and P together.

1. Consistency: All nodes observe the same data at the same time.

2. Availability: Each request receives a response on success/failure.

3. Partition Tolerance: The system continues to operate as a whole even in case of message loss, node failure or an unreachable node.

Partition tolerance cannot be overlooked for achieving reliability in a distributed database system. Thus, in case of a network failure, the choice can be:

• The database must answer, even though the answer may be old or wrong data (AP).

• The database should not answer unless it has the latest copy of the data (CP).

The CAP theorem implies that, for a system experiencing a network partition, consistency and availability are mutually exclusive choices. CA means consistency and availability, AP means availability and partition tolerance, and CP means consistency and partition tolerance. Figure 3.1 shows the CAP theorem usage in Big Data Solutions.
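The AP and CP choices can be made concrete with a toy model of one data item replicated on two nodes. This is only a sketch under simplifying assumptions: the `TwoReplicaStore` class, its `mode` switch and the single-writer setup are hypothetical, and real systems make this choice through their replication protocols and consistency settings rather than a single flag.

```python
class TwoReplicaStore:
    """Toy model: one key-value store replicated on two nodes, A and B."""

    def __init__(self, mode: str) -> None:
        if mode not in ("AP", "CP"):
            raise ValueError("mode must be 'AP' or 'CP'")
        self.mode = mode
        self.replica_a: dict[str, int] = {}
        self.replica_b: dict[str, int] = {}
        self.partitioned = False  # True while the A-B network link is cut

    def write(self, key: str, value: int) -> None:
        # Clients write through replica A; the update reaches B only if the link is up.
        self.replica_a[key] = value
        if not self.partitioned:
            self.replica_b[key] = value

    def read_from_b(self, key: str):
        # Models a client that can reach only replica B.
        if self.partitioned and self.mode == "CP":
            # CP: refuse to answer rather than risk returning stale data.
            raise RuntimeError("unavailable during partition")
        # AP (or no partition): always answer, possibly with stale data.
        return self.replica_b.get(key)


def demo(mode: str):
    store = TwoReplicaStore(mode)
    store.write("stock", 10)
    store.partitioned = True   # network partition between A and B
    store.write("stock", 7)    # this update reaches A only
    try:
        return store.read_from_b("stock")
    except RuntimeError as err:
        return f"error: {err}"


print(demo("AP"))  # 10  (answers with old data)
print(demo("CP"))  # error: unavailable during partition
```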


Schema Less Database

The schema of a database system refers to the design of the structure of datasets and data structures stored in the database. NoSQL data do not necessarily have a fixed table schema, and these systems do not use the concept of joins (between distributed datasets). A cluster of highly distributed nodes manages a single large data store with a NoSQL DB. Data written at one node is replicated to multiple nodes; the replicas are therefore identical and fault-tolerant, and the store is partitioned into shards. Distributed databases can store and process a set of information on more than one computing node.


Increasing Flexibility for Data Manipulation

NoSQL data stores offer increasing flexibility for data manipulation: new attributes can be added to the database incrementally, and late binding of them is also permitted.
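A minimal Python sketch of this flexibility, assuming each record is a plain dict in a hypothetical in-memory collection: a new attribute can be added to one record without changing the others, and readers resolve attributes only when they look them up (one possible reading of the "late binding" mentioned above).

```python
# Hypothetical in-memory "collection": each record is a dict with its own set of attributes.
students = [
    {"id": "S001", "name": "Asha", "course": "CSE"},
    {"id": "S002", "name": "Ravi", "course": "ISE", "hostel": "Block B"},
]

# Add a new attribute to one record only: no ALTER TABLE, no change to other records.
students[0]["cgpa"] = 8.7


def get_attr(record: dict, attr: str, default=None):
    # The attribute is resolved only when it is read; records lacking it yield a default.
    return record.get(attr, default)


print([get_attr(s, "cgpa") for s in students])  # [8.7, None]
print(sorted(students[1]))                       # ['course', 'hostel', 'id', 'name']
```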

BASE Properties

BA stands for basic availability, S for soft state and E for eventual consistency.

1. Basic availability is ensured by distributing shards (many partitions of a huge data store) across many data nodes with a high degree of replication. A segment failure then does not necessarily make the complete data store unavailable.

2. Soft state ensures that processing continues even in the presence of inconsistencies, with consistency achieved eventually. A program suitably takes into account any inconsistency found during processing. NoSQL database design does not require consistency throughout the processing time.

3. Eventual consistency means that the consistency requirement in NoSQL databases is met at some point of time in the future. Data converges eventually to a consistent state, with no time frame specified for achieving it. ACID rules require consistency throughout processing, on completion of each transaction; BASE does not have that requirement and so has more flexibility.
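Soft state and eventual consistency can be illustrated with replicas that accept writes independently and later exchange their data. The sketch assumes a last-write-wins rule with caller-supplied logical timestamps and a single anti-entropy (gossip) round; the `Node` class and `anti_entropy` function are hypothetical, and ties between equal timestamps are not resolved.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    # key -> (logical timestamp, value)
    store: dict = field(default_factory=dict)

    def put(self, key: str, value, ts: int) -> None:
        # Last write wins: keep the entry with the larger timestamp.
        current = self.store.get(key)
        if current is None or ts > current[0]:
            self.store[key] = (ts, value)

    def get(self, key: str):
        entry = self.store.get(key)
        return None if entry is None else entry[1]


def anti_entropy(nodes: list) -> None:
    """One gossip round: every node merges a snapshot of every node's entries."""
    snapshots = [dict(n.store) for n in nodes]
    for node in nodes:
        for snap in snapshots:
            for key, (ts, value) in snap.items():
                node.put(key, value, ts)


nodes = [Node(), Node(), Node()]
nodes[0].put("seats", 40, ts=1)   # write accepted by node 0
nodes[2].put("seats", 38, ts=2)   # later write accepted by node 2
print([n.get("seats") for n in nodes])  # [40, None, 38]  (replicas disagree)
anti_entropy(nodes)
print([n.get("seats") for n in nodes])  # [38, 38, 38]  (converged)
```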

3.3 NOSQL DATA ARCHITECTURE PATTERNS

3.3.1 Key-Value Store

The simplest way to implement a schema-less data store is to use key-value pairs. The characteristics of such a data store are high performance, scalability and flexibility, and data retrieval is fast. A simple string, called the key, maps to a large data string or BLOB (Binary Large Object). Key-value store accesses use a primary key to access the values; therefore, the store can easily be scaled up for very large data. The concept is similar to a hash table, where a unique key points to a particular item (or items) of data. Figure 3.4 shows the key-value pairs architectural pattern and an example of a students' database as key-value pairs.
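A minimal in-memory sketch of the key-value pattern in Python, using a dict as the underlying hash table. The `KeyValueStore` class and the `student:S001` / `photo:S001` key naming are hypothetical; the point is that values are opaque bytes (BLOBs) that the store never inspects, and access is only by key.

```python
import json
from typing import Optional


class KeyValueStore:
    """Minimal in-memory key-value store. Values are opaque bytes (BLOBs):
    the store never looks inside them, it only maps key -> value."""

    def __init__(self) -> None:
        self._data: dict[str, bytes] = {}  # Python's dict is a hash table

    def put(self, key: str, value: bytes) -> None:
        self._data[key] = value

    def get(self, key: str) -> Optional[bytes]:
        return self._data.get(key)

    def delete(self, key: str) -> None:
        self._data.pop(key, None)


store = KeyValueStore()
store.put("student:S001", json.dumps({"name": "Asha", "course": "CSE", "year": 3}).encode())
store.put("photo:S001", b"\x89PNG\r\n\x1a\n...")  # any binary data works as a value
print(json.loads(store.get("student:S001"))["name"])  # Asha
```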

Advantages of a key-value store are as follows:

1. The data store can store any data type in a value field. The key-value system stores the information as a BLOB of data (such as text, hypertext, images, video and audio) and returns the same BLOB when the data is retrieved. Storage is like an English dictionary: a query for a word retrieves its meanings, usages and different forms as a single item. Similarly, querying for a key retrieves its values.

2. A query just requests the values and returns them as a single item. Values can be of any data type.

3. A key-value store is eventually consistent.

4. A key-value data store may be hierarchical or may be an ordered key-value store.

5. Values returned by queries can be converted into lists, table columns, data-frame fields and columns.

6. They have (i) scalability, (ii) reliability, (iii) portability and (iv) low operational cost.

7. The key can be synthetic or auto-generated. The key is flexible and can be represented in many formats: (i) artificially generated strings created from a hash of a value, (ii) logical path names to images or files, (iii) REST web-service calls (request-response cycles), and (iv) SQL queries.
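Two of the key formats listed in item 7 can be sketched in Python: a synthetic key generated from a hash of the value (SHA-256 via `hashlib` is an assumed choice; any stable hash would do) and a logical path name. The helper names `content_key` and `path_key` are hypothetical.

```python
import hashlib


def content_key(value: bytes) -> str:
    # (i) Artificially generated key: the SHA-256 hex digest of the value itself,
    # so identical values always map to the same key.
    return hashlib.sha256(value).hexdigest()


def path_key(*parts: str) -> str:
    # (ii) Logical path name used as a key.
    return "/".join(parts)


store: dict[str, bytes] = {}
photo = b"<image bytes>"
store[content_key(photo)] = photo
store[path_key("images", "students", "S001.jpg")] = photo

print(content_key(b"abc")[:16])  # ba7816bf8f01cfea
```

Hash-derived keys give deduplication for free (the same value always gets the same key), while path keys stay readable to humans; the trade-off is that a hash key cannot be guessed without the value.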

Limitations of key-value store architectural pattern are:

1. No indexes are maintained on values; thus, a subset of values is not searchable.
2. A key-value store does not provide traditional database capabilities, such as atomicity of transactions or consistency when multiple transactions execute simultaneously; the application needs to implement such capabilities.
3. Maintaining unique values as keys may become more difficult as the volume of data increases. A single result cannot be retrieved when a key-value pair is not uniquely identified.
4. Queries cannot be performed on individual values; there is no clause like 'where' in a relational database that filters a result set.
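Limitations 1 and 4 mean that a 'where'-style filter has to be written in the application. A minimal sketch, assuming JSON-encoded values and hypothetical `student:<id>` keys: every value must be fetched and decoded, because the store keeps no index on the contents of values.

```python
import json

store = {
    "student:S001": json.dumps({"name": "Asha", "course": "CSE", "year": 3}).encode(),
    "student:S002": json.dumps({"name": "Ravi", "course": "ISE", "year": 3}).encode(),
    "student:S003": json.dumps({"name": "Meena", "course": "CSE", "year": 2}).encode(),
}


def find_by_course(kv: dict, course: str) -> list[str]:
    """Emulates `WHERE course = ?` in application code: a full scan over all keys,
    decoding every value, since no index on value contents exists."""
    matches = []
    for key, blob in kv.items():  # O(number of keys)
        record = json.loads(blob)
        if record.get("course") == course:
            matches.append(key)
    return sorted(matches)


print(find_by_course(store, "CSE"))  # ['student:S001', 'student:S003']
```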

Table 3.2 Traditional relational data model vs. the key-value store model

Traditional relational model | Key-value store model
Result set based on row values | Queries return a single item
Values of rows for large datasets are indexed | No indexes on values
Same data type values in columns | Any data type values


Typical uses of key-value store are:

(i) Image store,

(ii) Document or file store,

(iii) Lookup table, and

(iv) Query-cache.

Riak is an open-source key-value data store written in Erlang. Data is automatically distributed and replicated in Riak; it is thus fault-tolerant and reliable. Other widely used key-value NoSQL DBs are Amazon's DynamoDB, Redis (often referred to as a data structure server), Memcached and its flavours, Berkeley DB, upscaledb (used for embedded databases), Project Voldemort and Couchbase.

Document Store
The characteristics of a document data store are high performance and flexibility. Scalability varies, depending on the stored contents. Complexity is low compared to tabular, object and graph data stores.

The features of a document store are as follows:

1. A document store stores unstructured data.

2. Storage is similar to an object store.

3. Data is stored in nested hierarchies, for example, in the JSON format data model [Example 3.3(ii)], the XML document object model (DOM), or as machine-readable data in one BLOB. Hierarchical information is stored in a single unit called a document tree, so logically related data is stored together in one unit.

4. Querying is easy; for example, section numbers, sub-section numbers, figure captions and table headings can be used to retrieve document partitions.

5. No object-relational mapping is needed, which enables easy search by following paths from the root of the document tree.

6. Transactions on the document store exhibit ACID properties.
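The nested-hierarchy and path-following features (items 3 to 5) can be sketched with a Python dict standing in for a JSON document. The student document and the `get_path` helper with its dot-separated path syntax are hypothetical, chosen only to show retrieval by following a path from the root of the document tree.

```python
from typing import Any

# Hypothetical student document; nested dicts and lists form the document tree.
doc = {
    "id": "S001",
    "name": {"first": "Asha", "last": "Rao"},
    "courses": [
        {"code": "18CS72", "title": "Big Data Analytics", "marks": {"internal": 38}},
        {"code": "ELEC01", "title": "Elective", "marks": {"internal": 35}},
    ],
}


def get_path(node: Any, path: str) -> Any:
    """Follow a dot-separated path from the root of the document tree.
    Numeric segments index into lists; other segments are dict keys."""
    for part in path.split("."):
        node = node[int(part)] if isinstance(node, list) else node[part]
    return node


print(get_path(doc, "name.first"))                # Asha
print(get_path(doc, "courses.0.marks.internal"))  # 38
```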


Typical uses of a document store are: (i) office documents, (ii) inventory store, (iii) forms data, (iv) document exchange and (v) document search.

Examples of Document Data Stores are CouchDB and MongoDB.

CSV and JSON File Formats

CSV is a data store format for flat records; CSV does not represent object-oriented databases or hierarchical data records. JSON and XML represent semi-structured data, object-oriented records and hierarchical data records. JSON (JavaScript Object Notation) refers to a language format for semi-structured data. JSON represents object-oriented and hierarchical data records, objects, and resource arrays in JavaScript.

JSON Files
• Semi-structured data
• Object-oriented records and hierarchical data records
• JSON refers to a language format for semi-structured data. JSON represents object-oriented and hierarchical data records, objects, and resource arrays in JavaScript.
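The difference between CSV and JSON for hierarchical records can be seen with Python's standard `json` and `csv` modules. The student record and the course code `ELEC01` are hypothetical. JSON round-trips the nested list unchanged, whereas CSV can hold only flat rows, so the nested courses must be flattened into one row per course with the parent fields repeated.

```python
import csv
import io
import json

# Hypothetical hierarchical record: a student with a nested list of courses.
record = {
    "id": "S001",
    "name": "Asha",
    "courses": [{"code": "18CS72", "marks": 38}, {"code": "ELEC01", "marks": 41}],
}


def to_json(rec: dict) -> str:
    # JSON keeps the hierarchy as it is.
    return json.dumps(rec)


def to_csv(rec: dict) -> str:
    # CSV holds flat rows only: one row per course, parent fields repeated.
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow(["id", "name", "course_code", "marks"])
    for course in rec["courses"]:
        writer.writerow([rec["id"], rec["name"], course["code"], course["marks"]])
    return buf.getvalue()


assert json.loads(to_json(record)) == record  # lossless round trip
print(to_csv(record))
```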

Document JSON Format: CouchDB Database

Apache CouchDB is an open-source database. Its features are:
• CouchDB provides mapping functions during querying, combining and filtering of information.
• CouchDB deploys a JSON data store model for documents. Each document maintains separate data and metadata (schema).
• CouchDB is a multi-master application. Writes do not require field locking to control concurrency in the multi-master setting.
• CouchDB's querying language is JavaScript. JavaScript is a language which …

XML

• XML is an extensible, simple and scalable language. Its self-describing format describes structure and contents in an easy-to-understand way.
• XML is widely used. The document model consists of a root element and its sub-elements.
• The XML document model has a hierarchical structure and has features of object-oriented records. The XML format finds wide use in data stores.
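The root-element and sub-element structure of the XML document model can be explored with Python's standard `xml.etree.ElementTree` module. The `<student>` document below is a hypothetical example, and the course code `ELEC01` is invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: <student> is the root element; the rest are nested sub-elements.
xml_text = """
<student id="S001">
  <name>Asha</name>
  <courses>
    <course code="18CS72"><marks>38</marks></course>
    <course code="ELEC01"><marks>41</marks></course>
  </courses>
</student>
"""

root = ET.fromstring(xml_text.strip())
print(root.tag, root.get("id"))     # student S001
print(root.findtext("name"))        # Asha
for course in root.iter("course"):  # walk the hierarchy below the root
    print(course.get("code"), course.findtext("marks"))
```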


Tabular data stores use rows and columns. The row-head field may be used as a key that accesses and retrieves multiple values from the successive columns in that row. OLTP is fast on in-memory row-format data.

Columnar Data Store

One way to implement a schema is to divide it into columns. In the storage of each column, successive values are stored at successive memory addresses. In-memory analytics processing (AP) uses columnar storage in memory. A pair of row-head and column-head is a key pair; the pair accesses a field in the table.
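The contrast between row-format and columnar storage can be sketched in Python. The sales table and the `field` helper are hypothetical; numeric columns use `array.array`, which keeps successive values at successive memory addresses, whereas a real columnar engine would add compression and vectorized execution on top of this idea.

```python
from array import array

# Hypothetical sales table: row-heads S1..S3, column-heads item, qty, price.
row_store = {
    "S1": {"item": "pen", "qty": 10, "price": 5},
    "S2": {"item": "book", "qty": 2, "price": 120},
    "S3": {"item": "pen", "qty": 4, "price": 5},
}

# Row format: a whole row is fetched by its row-head in one lookup (suits OLTP).
print(row_store["S2"])

# Column format: each column's values are kept together; numeric columns use
# array.array, which stores successive values at successive memory addresses.
row_heads = list(row_store)
column_store = {
    "item": [row_store[r]["item"] for r in row_heads],
    "qty": array("i", (row_store[r]["qty"] for r in row_heads)),
    "price": array("i", (row_store[r]["price"] for r in row_heads)),
}

# An analytic query reads only the column it needs.
total_qty = sum(column_store["qty"])


def field(row_head: str, column_head: str):
    # The (row-head, column-head) pair is the key to one field.
    return column_store[column_head][row_heads.index(row_head)]


print(total_qty, field("S3", "qty"))  # 16 4
```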

Column-Family Data Store

A column-family data store has a group of columns as a column family. A combination of row-head, column-family head and table-column head can also be a key to access a field in a column of the table during querying. A combination of row head, column-families head, column-family head and column head for values in column fields can also be a key to access the fields of a column. A column-family head is also called a super-column head.
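A column-family layout can be sketched as nested Python dicts keyed by row-head, then column-family head, then column head. This loosely mirrors the model of column-family stores such as HBase and Cassandra in Table 3.1, but the `table` contents and the helper names `get_field` and `read_family` are hypothetical.

```python
# Hypothetical table: row-head -> column-family head -> column head -> value.
table = {
    "S001": {
        "personal": {"name": "Asha", "city": "Bengaluru"},
        "marks": {"18CS72": 38, "ELEC01": 41},
    },
    "S002": {
        "personal": {"name": "Ravi"},  # rows need not have the same columns
        "marks": {"18CS72": 35},
    },
}


def get_field(row_head: str, family_head: str, column_head: str):
    # The (row-head, column-family head, column head) triple is the key to one field.
    return table.get(row_head, {}).get(family_head, {}).get(column_head)


def read_family(row_head: str, family_head: str) -> dict:
    # Reading one column family of a row returns all of its columns together.
    return dict(table.get(row_head, {}).get(family_head, {}))


print(get_field("S001", "marks", "18CS72"))  # 38
print(read_family("S002", "personal"))       # {'name': 'Ravi'}
```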
