Unit 5 Lecture 1

Uploaded by

Mansi Varshney

Subject Name: Cloud Computing
Subject Code: KCS 713
Unit No.: 5
Lecture No.: 1
Topic Name: Cloud Computing: Managing Data, Parallel Database Architecture
Introduction
• Relational database
• Default data storage and retrieval mechanism since the 1980s
• Efficient in: transaction processing
• Example: System R, Ingres, etc.
• Replaced hierarchical and network databases
• Data Storage Techniques
• Row Oriented Database
• Column Storage Database
• Parallel Database Architecture
• Important Questions
• References
Relational Databases
• Users/application programs interact with an RDBMS through SQL
• RDBMS parser (operates on relations, i.e. sets of tuples (rows) that share the same schema):
– Transforms queries into memory and disk-level operations
– Optimizes execution time
• Disk-space management layer:
– Stores data records on pages of contiguous memory blocks
– Pages are fetched from disk into memory as requested using pre-fetching and
page replacement policies (Query Processing and storage processing)
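
The page-fetching behaviour described above can be illustrated with a minimal sketch. It assumes a least-recently-used (LRU) replacement policy, which is only one possible policy, and it leaves out pre-fetching. The `BufferPool` class and the dictionary standing in for the disk are hypothetical names used for illustration only.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer pool: keeps a bounded number of pages in memory (LRU eviction)."""

    def __init__(self, disk, capacity):
        self.disk = disk            # hypothetical "disk": dict of page_id -> bytes
        self.capacity = capacity    # maximum number of pages held in memory
        self.pages = OrderedDict()  # page_id -> contents, least recently used first
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.pages:
            # Hit: mark the page as most recently used.
            self.pages.move_to_end(page_id)
            return self.pages[page_id]
        if len(self.pages) >= self.capacity:
            # Miss with a full pool: evict the least recently used page.
            self.pages.popitem(last=False)
        self.disk_reads += 1
        self.pages[page_id] = self.disk[page_id]
        return self.pages[page_id]


disk = {i: f"page-{i}".encode() for i in range(4)}
pool = BufferPool(disk, capacity=2)
for pid in [0, 1, 0, 2, 1]:
    pool.get_page(pid)
print(pool.disk_reads, list(pool.pages))
```

Because the database manages this pool itself, it decides which pages stay in memory rather than leaving that choice to the operating system.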

SQL (Structured Query Language) is the primary interface used to communicate with relational databases. SQL became a standard
of the American National Standards Institute (ANSI) in 1986. Standard ANSI SQL is supported by all popular relational database
engines, and some of these engines also provide extensions to ANSI SQL for functionality specific to that engine. SQL is
used to add, update, or delete rows of data, to retrieve subsets of data for transaction processing and analytics applications, and to
manage all aspects of the database.
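
As a small illustration of these operations, the sketch below uses Python's built-in `sqlite3` module with an in-memory database. The `employee` table and its rows are made up for the example, and SQLite is used only because it needs no server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # throwaway in-memory database
cur = conn.cursor()

# Define a table (hypothetical schema for illustration).
cur.execute(
    "CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)"
)

# Add rows.
cur.executemany(
    "INSERT INTO employee (name, dept, salary) VALUES (?, ?, ?)",
    [("Asha", "Sales", 50000.0), ("Ravi", "Sales", 60000.0), ("Meena", "HR", 55000.0)],
)

# Update and delete rows.
cur.execute("UPDATE employee SET salary = salary + 5000 WHERE name = ?", ("Asha",))
cur.execute("DELETE FROM employee WHERE name = ?", ("Meena",))
conn.commit()

# Retrieve an aggregated subset of the data (analytics-style query).
rows = cur.execute(
    "SELECT dept, COUNT(*), AVG(salary) FROM employee GROUP BY dept"
).fetchall()
print(rows)
```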
Relational Databases

• Database file system layer:


– Independent of OS file system
– Reason:
• To have full control on retaining or releasing a page in memory
• Files used by the DB may span multiple disks to handle large storage
– Uses parallel I/O systems, viz. RAID disk arrays or multiprocessor clusters
Data Storage Techniques

• There are two types of data storage: row-oriented databases and column-oriented databases.
• Row-oriented databases are the traditional databases such as Oracle and MySQL. They store a
table row by row; the common method of storing a table is to serialize each row of
data. Row-based systems are designed to efficiently return data for an entire row, or
record.
• Column-based databases, on the other hand, include "NoSQL" databases such as HBase
and Cassandra. Column-oriented databases do not support "traditional" transactional
secondary indices; it is the responsibility of the user to maintain an "inverted index".
Sr. No. | Key | Row Oriented Database | Column Oriented Database
1 | Basic | Stores table data row by row. | Stores table data column by column.
2 | Data Accessing | Data is accessed row by row. | Data is accessed column by column.
3 | Storage | Storage size optimization is limited because row-based systems have a reduced ability to compress data. | Column-based systems provide better storage size optimization capabilities.
4 | Performance | Typically slower for queries that need only a few columns, because entire rows must be read, which requires more disk reads. | Typically faster than a row-oriented database for such queries.
5 | Use Case | Best suited for OLTP. | Best suited for OLAP.
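
The difference between the two layouts can be sketched in a few lines. This is a simplified in-memory illustration, not how any particular engine stores data; the sample records, the column names, and the run-length encoding helper are assumptions made for the example.

```python
# Sample table (hypothetical data): id, name, dept, salary
records = [
    (1, "Asha", "Sales", 50000),
    (2, "Ravi", "Sales", 60000),
    (3, "Meena", "HR", 55000),
]
columns = ("id", "name", "dept", "salary")

# Row-oriented layout: the fields of each record are serialized together.
row_store = [field for record in records for field in record]

# Column-oriented layout: the values of each column are stored together.
column_store = {
    name: [record[i] for record in records] for i, name in enumerate(columns)
}


def total_salary_row(store, width=4, salary_pos=3):
    # Must step over every record to pick out one field from each.
    return sum(store[i + salary_pos] for i in range(0, len(store), width))


def total_salary_col(store):
    # Touches only the one column the query needs.
    return sum(store["salary"])


def run_length_encode(values):
    # Adjacent equal values in a column compress well.
    runs = []
    for value in values:
        if runs and runs[-1][0] == value:
            runs[-1][1] += 1
        else:
            runs.append([value, 1])
    return [tuple(run) for run in runs]


print(total_salary_row(row_store), total_salary_col(column_store))
print(run_length_encode(column_store["dept"]))
```

Returning a whole record is a single contiguous slice of `row_store`, which is why the row layout suits OLTP, while aggregates over one column and column-wise compression favour the column layout for OLAP.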
Data Storage Techniques
Types of data stores
Key / value stores (opaque)

• Keys are mapped to values


• Values are treated as BLOBs (opaque data); no type information is stored
• Values can be heterogeneous

[Figure: each key maps to one value]

Example values:
{ name: "foo", age: 25, city: "bar" } => JSON, but store will not care about it
\xde\xad\xb0\x0b => binary, but store will not care about it
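
A minimal sketch of an opaque key/value store follows, using the two example values above. The `OpaqueKVStore` class is hypothetical; the point it illustrates is that the store only ever sees bytes, so interpreting a value (for example as JSON) is left entirely to the client.

```python
import json


class OpaqueKVStore:
    """Toy key/value store that treats every value as an uninterpreted byte string."""

    def __init__(self):
        self._data = {}

    def put(self, key, value):
        if not isinstance(value, (bytes, bytearray)):
            raise TypeError("values must be bytes; the store keeps no type information")
        self._data[key] = bytes(value)

    def get(self, key):
        return self._data[key]


store = OpaqueKVStore()

# A JSON document: serialized by the client, stored as plain bytes.
doc = {"name": "foo", "age": 25, "city": "bar"}
store.put("user:1", json.dumps(doc).encode("utf-8"))

# Arbitrary binary data: stored in exactly the same way.
store.put("blob:1", b"\xde\xad\xb0\x0b")

# Only the client knows that "user:1" holds JSON and decodes it.
user = json.loads(store.get("user:1"))
print(user["age"], store.get("blob:1"))
```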
Key / value stores (typed)
Document stores (non-shaped)
Document stores (shaped)
Parallel Database Architectures
• Shared memory
– Suitable for servers with multiple CPUs
– Memory address space is shared and managed by a symmetric multi-processing (SMP)
operating system
– SMP:
• Schedules processes in parallel exploiting all the processors
• Shared nothing
– Cluster of independent servers each with its own disk space
– Connected by a network
• Shared disk
– Hybrid architecture
– Independent server clusters share storage through high-speed network storage viz. NAS
(network attached storage) or SAN (storage area network)
– Clusters are connected to storage via: standard Ethernet, or faster Fiber Channel or Infiniband
connections
Parallel Database Architectures
Advantages of Parallel DB over Relational DB
• Efficient execution of SQL queries by exploiting multiple processors
• For shared nothing architecture:
– Tables are partitioned and distributed across multiple processing nodes
– SQL optimizer handles distributed joins
• Distributed locking with two-phase commit for transaction isolation and atomic commits across processors
• Fault tolerant
– System failures handled by transferring control to “stand-by” system [for transaction
processing]
– Restoring computations [for data warehousing applications]
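
The shared-nothing points above (partitioned tables, work spread across nodes) can be sketched as follows. This is a single-process simulation under stated assumptions: three hypothetical nodes, hash partitioning on the customer column using `zlib.crc32`, and a made-up `orders` table. A real parallel database would run the per-node step on separate servers.

```python
import zlib
from collections import defaultdict

NUM_NODES = 3  # assumed cluster size


def node_for(key):
    # Deterministic hash partitioning: the same key always lands on the same node.
    return zlib.crc32(str(key).encode("utf-8")) % NUM_NODES


# Hypothetical table: (order_id, customer, amount)
orders = [
    (1, "alice", 100),
    (2, "bob", 200),
    (3, "alice", 250),
    (4, "carol", 75),
]

# Partition the table across nodes by customer.
nodes = defaultdict(list)
for row in orders:
    nodes[node_for(row[1])].append(row)


def local_totals(partition):
    # Each node aggregates only the rows it stores locally.
    totals = defaultdict(int)
    for _order_id, customer, amount in partition:
        totals[customer] += amount
    return dict(totals)


# Equivalent of: SELECT customer, SUM(amount) FROM orders GROUP BY customer
partials = [local_totals(partition) for partition in nodes.values()]

# A coordinator merges the per-node partial results.
merged = {}
for partial in partials:
    for customer, total in partial.items():
        merged[customer] = merged.get(customer, 0) + total
print(merged)
```

Partitioning on the grouping column means each customer's rows live on one node, so the merge step only has to combine disjoint partial results.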

• A transaction is said to follow the Two-Phase Locking (2PL) protocol if all of its locking and unlocking
is done in two phases:
1. Growing Phase: New locks on data items may be acquired but none can be released.
2. Shrinking Phase: Existing locks may be released but no new locks can be acquired.
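
The two phases can be made concrete with a small sketch that enforces the rule for a single transaction. The `Transaction` class and `TwoPhaseLockingError` are hypothetical names; the sketch checks only the growing/shrinking rule and does not model a lock manager or conflicts between transactions.

```python
class TwoPhaseLockingError(Exception):
    """Raised when a transaction tries to acquire a lock after releasing one."""


class Transaction:
    def __init__(self, name):
        self.name = name
        self.held = set()       # data items currently locked
        self.shrinking = False  # becomes True at the first unlock

    def lock(self, item):
        if self.shrinking:
            raise TwoPhaseLockingError(
                f"{self.name}: cannot lock {item!r} in the shrinking phase"
            )
        self.held.add(item)     # growing phase: acquire only

    def unlock(self, item):
        self.shrinking = True   # the first release ends the growing phase
        self.held.discard(item)


t = Transaction("T1")
t.lock("A")
t.lock("B")      # growing phase
t.unlock("A")    # shrinking phase begins

try:
    t.lock("C")  # not allowed under 2PL
    violated = False
except TwoPhaseLockingError:
    violated = True
print(violated, t.held)
```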
Advantages of Parallel DB over Relational DB

• Examples of databases capable of handling parallel processing:


– Traditional transaction processing databases: Oracle, DB2, SQL Server
– Data warehousing databases: Netezza, Vertica, Teradata
Important Questions

1. What is NoSQL?

2. Explain the difference between NoSQL and relational databases.

3. What is row oriented data storage?

4. What is column oriented data storage?

5. Explain the difference between row and column oriented data storage.
References
• Dan C. Marinescu: "Cloud Computing: Theory and Practice", Elsevier (MK), 2013.
• Rajkumar Buyya, James Broberg, Andrzej Goscinski: "Cloud Computing: Principles
and Paradigms", Wiley, 2014.
• https://www.ques10.com/p/13989/explain-architecture-of-google-file-system-1/
• https://www.sciencedirect.com/topics/computer-science/google-file-system
• https://www.researchgate.net/publication/220910111_The_Google_File_System
