Chapter 5c

BIG DATA ANALYSIS POWER POINT SLIDE CHAPTER 5 c

Uploaded by

Shams AlHadi
Copyright
© All Rights Reserved

Big Data Storage Concepts
Chapter 5
Part c

Contents

• BASE
• NoSQL

BASE
• BASE is a database design principle based on the CAP theorem and
leveraged by database systems that use distributed technology.

• BASE stands for:

• basically available

• soft state

• eventual consistency

• When a database supports BASE, it favors availability over
consistency.

• In other words, the database is A + P from a CAP perspective.

BASE (continued)
• Basically available means that the database will always
acknowledge a client’s request, either in the form of the
requested data or a success/failure notification.

• The database remains basically available even when it has been
partitioned as a result of a network failure.

Soft State

• Soft state means that a database may be in an inconsistent state
when data is read; thus, the results may change if the same data
is requested again.

• This property is closely related to eventual consistency.

1. User A updates a record on Peer A.

2. Before the other peers are updated, User B requests the same
record from Peer C.

3. The database is now in a soft state, and stale data is returned to
User B.

[Figure: Soft State]

Eventual Consistency
• Eventual consistency is the state in which reads by
different clients, immediately following a write to the
database, may not return consistent results.
• The database only attains consistency once the changes
have been propagated to all nodes.
1. User A updates a record.
2. The record is updated only at Peer A; before the other peers
can be updated, User B requests the same record.
3. The database is now in a soft state, and stale data is
returned to User B from Peer C.
4. However, consistency is eventually attained, and User C gets
the correct value.
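
The steps above can be sketched as a small simulation. This is only an illustrative toy, not how any particular database replicates data: the `Peer` and `Cluster` classes, the `pending` queue, and the record values are all assumptions made for the example.

```python
class Peer:
    """One replica holding its own copy of the data (hypothetical)."""
    def __init__(self, name):
        self.name = name
        self.data = {}


class Cluster:
    """Toy cluster: a write lands on one peer and is queued for the others."""
    def __init__(self, names):
        self.peers = {n: Peer(n) for n in names}
        self.pending = []  # (target peer, key, value) updates not yet applied

    def write(self, peer_name, key, value):
        # The write is applied locally at once and queued for every other peer.
        self.peers[peer_name].data[key] = value
        for name in self.peers:
            if name != peer_name:
                self.pending.append((name, key, value))

    def read(self, peer_name, key):
        return self.peers[peer_name].data.get(key)

    def propagate(self):
        # Apply all queued updates; afterwards every peer agrees.
        for name, key, value in self.pending:
            self.peers[name].data[key] = value
        self.pending.clear()


cluster = Cluster(["A", "B", "C"])
cluster.write("A", "record", "v1")
cluster.propagate()                     # all peers start out consistent

cluster.write("A", "record", "v2")      # User A updates the record on Peer A
stale = cluster.read("C", "record")     # User B reads from Peer C: soft state
cluster.propagate()                     # changes reach all nodes
fresh = cluster.read("C", "record")     # User C now gets the correct value
print(stale, fresh)                     # v1 v2
```

Between the write and the call to `propagate()`, the cluster is in a soft state: the answer depends on which peer is asked.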

NoSQL?

• NoSQL stands for:
• Non-relational
• No RDBMS
• Not Only SQL
• NoSQL is an umbrella term for all databases and data stores
that don’t follow the RDBMS principles
• A class of products
• A collection of several (related) concepts about data
storage and manipulation
• Often related to large data sets

NoSQL Definition

From www.nosql-database.org:

“Next Generation Databases mostly addressing some of the points:
being non-relational, distributed, open-source and horizontally
scalable. The original intention has been modern web-scale
databases. The movement began early 2009 and is growing rapidly.
Often more characteristics apply such as: schema-free, easy
replication support, simple API, eventually consistent / BASE
(not ACID), a huge amount of data and more.”

NoSQL and Big Data

• NoSQL comes from the Internet, so it is often related to
the “big data” concept
How big is “big data”?
More than a few terabytes, enough to start spanning multiple
storage units
Challenges:
• Efficiently storing and accessing large amounts of data is
difficult, even more so when fault tolerance and backups are
considered
• Manipulating large data sets involves running massively
parallel processes
• Managing continuously evolving schemas and metadata for
semi-structured and unstructured data is difficult

How did we get here?

• Explosion of social media sites (Facebook, Twitter) with
large data needs
• Rise of cloud-based solutions such as Amazon S3
(Simple Storage Service)
• Just as programming moved to dynamically typed languages
(Python, Ruby, Groovy), data moved to dynamically typed
structures with frequent schema changes

Why are RDBMSs not suitable for Big Data?
• The context is the Internet
• RDBMSs assume that data are dense and largely uniform
(structured data)
• Data coming from the Internet are massive and sparse, and
semi-structured or unstructured
• With massive sparse data sets, the typical storage
mechanisms and access methods get stretched
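
A rough way to see the problem: the sketch below forces a few sparse records into one dense, fixed-schema table and counts the empty cells. The records and attribute names are invented for illustration, and the dense layout is a simplification of how a relational table would hold them.

```python
# Hypothetical semi-structured records: each one uses only a few attributes.
records = [
    {"id": 1, "name": "Ann", "twitter": "@ann"},
    {"id": 2, "name": "Bob", "phone": "555-0100"},
    {"id": 3, "name": "Cy", "blog": "cy.example.org", "city": "Oslo"},
]

# Dense, fixed-schema layout: every row gets a cell for every column.
columns = sorted({key for rec in records for key in rec})
dense_rows = [[rec.get(col) for col in columns] for rec in records]

total_cells = len(columns) * len(records)
empty_cells = sum(cell is None for row in dense_rows for cell in row)

# Sparse layout: store only the attributes that are actually present.
stored_cells = sum(len(rec) for rec in records)

print(columns)
print(total_cells, empty_cells, stored_cells)  # 18 8 10
```

With only three records, almost half of the dense table is already empty; the share of empty cells grows as more records bring their own attributes.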

NoSQL Database Types
Discussing NoSQL databases is complicated because there are a
variety of types:

• Sorted ordered column stores: optimized for queries over large
datasets; they store columns of data together, instead of rows.
• Document databases: pair each key with a complex data structure
known as a document.
• Key-value stores: the simplest NoSQL databases. Every single
item in the database is stored as an attribute name (or 'key'),
together with its value.
• Graph databases: store data as nodes, edges, and attributes.

Document Databases (Document Store)

Documents
• Loosely structured sets of key/value pairs in documents,
e.g., XML, JSON, BSON
• Encapsulate and encode data in some standard formats or
encodings
• Are addressed in the database via a unique key
• Are treated as a whole, avoiding splitting a document into its
constituent name/value pairs
• Can be retrieved by key or by content

Notable examples:
• MongoDB (used by Foursquare, GitHub, and more)
• CouchDB (used by Apple, BBC, Canonical, CERN, and more)
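
To make "retrieved by key or by content" concrete, here is a minimal in-memory sketch. It is not the MongoDB or CouchDB API: the `DocumentStore` class, its method names, and the keys `"a1"` and `"b1"` are assumptions for illustration only.

```python
import copy


class DocumentStore:
    """Toy in-memory document store (hypothetical, not a real product's API)."""
    def __init__(self):
        self._docs = {}

    def insert(self, key, document):
        # The document is stored whole under its unique key.
        self._docs[key] = copy.deepcopy(document)

    def get(self, key):
        # Retrieval by key.
        return self._docs.get(key)

    def find(self, **criteria):
        # Retrieval by content: every given top-level field must match.
        return [
            doc for doc in self._docs.values()
            if all(doc.get(field) == value for field, value in criteria.items())
        ]


store = DocumentStore()
store.insert("a1", {"type": "Article", "author": "Derick Rethans",
                    "title": "Introduction to Document Databases with MongoDB"})
store.insert("b1", {"type": "Book", "author": "Derick Rethans",
                    "isbn": "978-0-9738621-5-7"})

by_key = store.get("b1")
books = store.find(type="Book")
by_author = store.find(author="Derick Rethans")
print(by_key["isbn"], len(books), len(by_author))  # 978-0-9738621-5-7 1 2
```

The two documents do not share the same fields, which is what "loosely structured" means in practice: the store imposes no common schema.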

Document Databases, JSON

{
_id: ObjectId("51156a1e056d6f966f268f81"),
type: "Article",
author: "Derick Rethans",
title: "Introduction to Document Databases with MongoDB",
date: ISODate("2013-04-24T16:26:31.911Z"),
body: "This arti…"
},
{
_id: ObjectId("51156a1e056d6f966f268f82"),
type: "Book",
author: "Derick Rethans",
title: "php|architect's Guide to Date and Time Programming with PHP",
isbn: "978-0-9738621-5-7"
}

Key/Value Stores

• Store data in a schema-less way
• Store data as maps
• HashMaps or associative arrays
• Provide very efficient average-case running time for
accessing data
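
As a minimal sketch of the idea, a Python `dict` can stand in for a key/value store, since it is itself a hash map. The key naming scheme (`user:42:name` and so on) is only an assumed convention for this example.

```python
# A Python dict is a hash map: keys are hashed, so put/get/delete take
# constant time on average, regardless of how many items are stored.
store = {}

# put: values are opaque to the store and need not share a schema
store["user:42:name"] = "Ann"
store["user:42:visits"] = 17
store["session:9f3"] = {"user": 42, "cart": ["book", "pen"]}

# get by key
name = store["user:42:name"]
missing = store.get("user:99:name")   # None: no such key

# delete by key
del store["session:9f3"]

print(name, missing, sorted(store))
# Ann None ['user:42:name', 'user:42:visits']
```

Everything goes through the key: there is no way to ask for "all users with more than 10 visits" without scanning, which is the trade-off for the fast lookups.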

Sorted Ordered Column-Oriented Stores

• Data are stored in a column-oriented way
• Data are stored efficiently
• Columns are grouped in column families
• Data isn’t stored as a single table but is stored by column
families
• Notable examples:
• Google's Bigtable (used in many of Google's services)
• HBase (used by Facebook, StumbleUpon, Hulu, Yahoo!, ...)
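
The layout can be sketched with nested dictionaries. This is a simplified model, not the Bigtable or HBase storage format: the column families `profile` and `stats`, the row keys, and the sample values are all hypothetical.

```python
from collections import defaultdict

# Row-oriented view of the same data, for comparison.
rows = [
    {"row_key": "u1", "name": "Ann", "city": "Oslo", "visits": 17},
    {"row_key": "u2", "name": "Bob", "city": "Rome", "visits": 3},
]

# Hypothetical grouping of columns into column families.
families = {"profile": ["name", "city"], "stats": ["visits"]}

# Column-family layout: family -> column -> {row_key: value}.
store = defaultdict(lambda: defaultdict(dict))
for row in rows:
    for family, cols in families.items():
        for col in cols:
            if col in row:
                store[family][col][row["row_key"]] = row[col]

# A query over one column touches only that column's values.
total_visits = sum(store["stats"]["visits"].values())
cities = store["profile"]["city"]
print(total_visits, cities)  # 20 {'u1': 'Oslo', 'u2': 'Rome'}
```

Summing `visits` never reads the `profile` family at all, which is why this layout suits queries that scan a few columns over many rows.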

Graph Databases

• Everything is stored as an edge, a node, or an attribute.
• Each node and edge can have any number of attributes.
• Both the nodes and edges can be labelled.
• Labels can be used to narrow searches.
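
A minimal sketch of this model follows. The `Graph` class, its method names, and the sample nodes and labels are assumptions for illustration, not the API of any graph database.

```python
class Graph:
    """Toy property graph: labelled nodes and edges, each with attributes."""
    def __init__(self):
        self.nodes = {}   # node id -> {"label": str, "attrs": dict}
        self.edges = []   # dicts with "src", "dst", "label", "attrs"

    def add_node(self, node_id, label, **attrs):
        self.nodes[node_id] = {"label": label, "attrs": attrs}

    def add_edge(self, src, dst, label, **attrs):
        self.edges.append({"src": src, "dst": dst, "label": label, "attrs": attrs})

    def nodes_with_label(self, label):
        # The label narrows the search to one kind of node.
        return sorted(n for n, data in self.nodes.items() if data["label"] == label)

    def neighbours(self, src, edge_label):
        # The edge label narrows the traversal to one kind of relationship.
        return sorted(e["dst"] for e in self.edges
                      if e["src"] == src and e["label"] == edge_label)


g = Graph()
g.add_node("ann", "Person", age=31)
g.add_node("bob", "Person")                 # nodes may have no attributes
g.add_node("acme", "Company", city="Oslo")
g.add_edge("ann", "bob", "KNOWS", since=2010)
g.add_edge("ann", "acme", "WORKS_AT")

people = g.nodes_with_label("Person")
friends = g.neighbours("ann", "KNOWS")
print(people, friends)  # ['ann', 'bob'] ['bob']
```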
