CS3492-DBMS unit-5

DBMS - CS3492 -UNIT 5

Uploaded by

anithaselvi92

UNIT -5

Distributed Databases: Architecture, Data Storage, Transaction Processing, Query processing and
optimization – NOSQL Databases: Introduction – CAP Theorem – Document Based systems – Key value
Stores – Column Based Systems – Graph Databases. Database Security: Security issues – Access control
based on privileges – Role Based access control – SQL Injection – Statistical Database security – Flow
control – Encryption and Public Key infrastructures – Challenges

Distributed Databases: Architecture, Data Storage, Transaction Processing

(pg no:825-830)

Distributed databases are collections of data distributed across multiple locations, connected
via a network. They enable data to be stored, managed, and accessed efficiently across different
geographical sites, improving performance, scalability, and reliability.

Architecture

1. Homogeneous Architecture: All sites use the same database management system
(DBMS) and data models, simplifying integration and communication.
2. Heterogeneous Architecture: Different sites may use different DBMSs and data
models, requiring middleware for translation and integration.
3. Client-Server Architecture: Clients request data and services from centralized servers.
This architecture is simple but can have scalability issues.
4. Peer-to-Peer Architecture: Each site (or peer) functions both as a client and a server,
sharing equal responsibilities in data processing and management, improving fault
tolerance and load balancing.
5. Middleware Systems: Middleware layers facilitate communication and data exchange
between different sites in heterogeneous architectures, ensuring data consistency and
query optimization.

Data Storage

1. Fragmentation: Data is divided into fragments to improve access speed and efficiency.
o Horizontal Fragmentation: Divides a table by rows, distributing different
subsets of rows to different sites.
For example, consider an EMPLOYEE table (T):

Eno   Ename   Design   Salary   Dep
101   A       abc      3000     1
102   B       abc      4000     1
103   C       abc      5500     2
104   D       abc      5000     2
105   E       abc      2000     2

This EMPLOYEE table can be divided into fragments such as:

EMP1 = σ Dep = 1 (EMPLOYEE)
EMP2 = σ Dep = 2 (EMPLOYEE)

o Vertical Fragmentation: Divides a table by columns, distributing different
subsets of columns to different sites. A special attribute, Tuple_id, is added to
each fragment so that the original table can be rebuilt by joining the fragments.

Eno   Ename   Design   Tuple_id
101   A       abc      1
102   B       abc      2
103   C       abc      3
104   D       abc      4
105   E       abc      5

o Hybrid Fragmentation: Combines both horizontal and vertical fragmentation.

2. Replication: Copies of data fragments are stored at multiple sites to increase data
availability and fault tolerance.
o Full Replication: All sites store a complete copy of the database.
o Partial Replication: Only certain data fragments are replicated at some sites.
3. Data Allocation: Determines where data fragments are stored.
o Centralized Allocation: All data is stored at a single central site.
o Partitioned Allocation: Different fragments are stored at different sites based on
access patterns.
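The fragmentation example above can be sketched in a few lines of Python. This is only an illustration that reuses the EMPLOYEE rows from the table; the helper names (horizontal_fragment, vertical_fragment, rejoin) are made up for this sketch and do not belong to any DBMS.

```python
# EMPLOYEE rows copied from the example table above.
EMPLOYEE = [
    {"Eno": 101, "Ename": "A", "Design": "abc", "Salary": 3000, "Dep": 1},
    {"Eno": 102, "Ename": "B", "Design": "abc", "Salary": 4000, "Dep": 1},
    {"Eno": 103, "Ename": "C", "Design": "abc", "Salary": 5500, "Dep": 2},
    {"Eno": 104, "Ename": "D", "Design": "abc", "Salary": 5000, "Dep": 2},
    {"Eno": 105, "Ename": "E", "Design": "abc", "Salary": 2000, "Dep": 2},
]


def horizontal_fragment(rows, predicate):
    """Selection: keep whole rows that satisfy the predicate."""
    return [row for row in rows if predicate(row)]


def vertical_fragment(rows, columns):
    """Projection: keep the chosen columns plus a Tuple_id for rejoining."""
    return [
        {"Tuple_id": i, **{c: row[c] for c in columns}}
        for i, row in enumerate(rows, start=1)
    ]


def rejoin(left, right):
    """Join two vertical fragments on Tuple_id to rebuild the rows."""
    right_by_id = {r["Tuple_id"]: r for r in right}
    return [{**l, **right_by_id[l["Tuple_id"]]} for l in left]


# Horizontal fragments: EMP1 = rows with Dep = 1, EMP2 = rows with Dep = 2.
emp1 = horizontal_fragment(EMPLOYEE, lambda r: r["Dep"] == 1)
emp2 = horizontal_fragment(EMPLOYEE, lambda r: r["Dep"] == 2)

# Vertical fragments: each carries Tuple_id so the table can be rebuilt.
v1 = vertical_fragment(EMPLOYEE, ["Eno", "Ename", "Design"])
v2 = vertical_fragment(EMPLOYEE, ["Salary", "Dep"])
```

The union of emp1 and emp2 gives back the original table, and rejoin(v1, v2) does the same for the vertical fragments, which is why Tuple_id has to be stored in every vertical fragment.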

Transaction Processing

1. ACID Properties: Ensure that transactions are processed reliably.
o Atomicity: Ensures that either all operations of a transaction are completed or,
if any of them fails, the transaction is aborted and none of its changes remain.
o Consistency: Ensures transactions transition the database from one consistent
state to another.
o Isolation: Ensures that transactions are executed independently without
interference.
o Durability: Ensures that once a transaction is committed, it remains so, even in
the event of a failure.
2. Concurrency Control: Manages simultaneous transactions to prevent conflicts and
ensure data integrity.
o Two-Phase Locking (2PL): Ensures that once a transaction releases a lock, it
cannot obtain any new locks.
o Timestamp Ordering: Orders transactions based on timestamps to prevent
conflicts.
3. Commit Protocols:
o Two-Phase Commit (2PC): Ensures all sites in a distributed transaction agree to
commit or abort the transaction.
▪ Phase 1: The coordinator sends a prepare request to all sites.
▪ Phase 2: Based on the responses, the coordinator sends a commit or abort
request.
o Three-Phase Commit (3PC): Adds a pre-commit phase to 2PC, reducing the
likelihood of blocking in the event of a failure.
4. Recovery Mechanisms: Ensure data integrity and consistency in case of system failures.
o Checkpointing: Periodically saves the state of the database to facilitate recovery.
o Logging: Records changes made by transactions to facilitate rollback or redo
operations.
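The two phases of 2PC described above can be sketched as follows. This is a simplified in-memory model, not a real protocol implementation: there are no network messages, timeouts, or logs, and the Participant class and its can_commit flag are assumptions made for illustration.

```python
class Participant:
    """One site taking part in a distributed transaction (hypothetical model)."""

    def __init__(self, name, can_commit=True):
        self.name = name
        self.can_commit = can_commit  # assumed outcome of the site's local work
        self.state = "INIT"

    def prepare(self):
        # Phase 1: vote yes only if the local part of the transaction can commit.
        self.state = "READY" if self.can_commit else "ABORTED"
        return self.can_commit

    def commit(self):
        self.state = "COMMITTED"

    def abort(self):
        self.state = "ABORTED"


def two_phase_commit(participants):
    """Coordinator logic: commit only if every site votes yes."""
    # Phase 1: send prepare to all sites and collect every vote.
    votes = [p.prepare() for p in participants]
    decision = "COMMIT" if all(votes) else "ABORT"

    # Phase 2: send the same decision to all sites.
    for p in participants:
        if decision == "COMMIT":
            p.commit()
        else:
            p.abort()
    return decision
```

A single "no" vote in phase 1 is enough to abort the transaction at every site, which is how 2PC keeps a global transaction atomic.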

Example Scenario:

Imagine a multinational corporation with offices in New York, London, and Tokyo, using a
distributed database.

• Architecture: A heterogeneous architecture with middleware to manage different
DBMSs at each office.
• Data Storage:
o Fragmentation: Customer data is horizontally fragmented by region, with New
York handling North America, London handling Europe, and Tokyo handling
Asia.
o Replication: Product catalog data is fully replicated at all sites to ensure quick
access and high availability.
• Transaction Processing:
o ACID Compliance: Atomicity of global transactions that involve multiple sites
is ensured through the two-phase commit protocol.
o Concurrency Control: Managed via timestamp ordering, which resolves conflicting
operations by transaction timestamp under high transaction volumes.
o Recovery: Implemented through checkpointing and logging to quickly recover
from failures.
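The scenario relies on timestamp ordering for concurrency control. The basic rule can be sketched as below, assuming each data item remembers the largest timestamps that have read and written it. The Item class, the Rollback exception, and the integer timestamps are illustrative names, not part of any specific DBMS.

```python
class Rollback(Exception):
    """Raised when an operation arrives too late and its transaction must restart."""


class Item:
    """A data item with the read and write timestamps used by basic timestamp ordering."""

    def __init__(self, value=None):
        self.value = value
        self.read_ts = 0   # largest timestamp of any transaction that read the item
        self.write_ts = 0  # largest timestamp of any transaction that wrote the item


def read(item, ts):
    # A transaction may not read a value written by a younger transaction.
    if ts < item.write_ts:
        raise Rollback(f"read by T{ts} rejected")
    item.read_ts = max(item.read_ts, ts)
    return item.value


def write(item, ts, value):
    # A transaction may not overwrite data already read or written by a younger one.
    if ts < item.read_ts or ts < item.write_ts:
        raise Rollback(f"write by T{ts} rejected")
    item.value = value
    item.write_ts = ts
```

A rejected transaction is rolled back and restarted with a new timestamp, so conflicting operations always take effect in timestamp order.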

NoSQL databases are a category of database management systems that diverge from traditional
relational database systems by not relying on fixed schemas, allowing for more flexibility and
scalability. They are designed to handle large volumes of data, high velocity of data generation,
and varying data structures. NoSQL databases are particularly suited for big data and real-time
web applications.

NOSQL Databases

Introduction
• NoSQL stands for "Not Only SQL".
• It is a non-tabular database system that stores data differently from relational tables.
There are various types of NoSQL databases, such as document, key-value, wide-
column, and graph.
• NoSQL databases support flexible schemas and scale easily to large amounts of
data.

Need
NoSQL database technology is usually adopted for the following reasons:
1) NoSQL databases are often used to handle big data as part of the
fundamental architecture.
2) NoSQL databases are used for storing and modelling structured, semi-
structured and unstructured data.
3) NoSQL is used where the database must run efficiently with high availability.
4) NoSQL databases are non-relational, so they scale out better than relational
databases and are well suited to web applications.
5) NoSQL is used where easy scalability is required.

Features
1) NoSQL does not follow the relational model.
2) It is either schema-free or has a relaxed schema. That means it does not require
a specific schema definition.
3) A NoSQL database can run in a distributed fashion across multiple nodes.
4) It can process both unstructured and semi-structured data.
5) NoSQL offers higher scalability.
6) It is cost effective.
7) It supports data in the form of key-value pairs, wide columns and graphs.

Comparison between RDBMS and NoSQL

CAP Theorem (852)


CAP Theorem, proposed by Eric Brewer, states that in a distributed data store, it is impossible
to simultaneously achieve all three of the following properties:

1. Consistency: Every read receives the most recent write or an error.
2. Availability: Every request receives a (non-error) response, without guarantee that it
contains the most recent write.
3. Partition Tolerance: The system continues to operate despite an arbitrary number of
messages being dropped or delayed by the network.

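According to the CAP theorem, a distributed system can guarantee at most two of these three
properties at the same time. Because network partitions cannot be ruled out in practice, the
real choice arises during a partition: a CP system gives up availability to stay consistent,
while an AP system stays available but may return stale data. Most NoSQL databases are designed
to be either CP (Consistent and Partition-tolerant) or AP (Available and Partition-tolerant).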
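The CP versus AP trade-off can be made concrete with a toy model of two replicas. This sketch is a deliberate simplification and every name in it (TwoReplicaStore, replica_a, replica_b, partitioned) is hypothetical; real systems use quorums and many more replicas.

```python
class TwoReplicaStore:
    """Toy store with two replicas, A and B, that may be cut off from each other."""

    def __init__(self, mode):
        if mode not in ("CP", "AP"):
            raise ValueError("mode must be 'CP' or 'AP'")
        self.mode = mode
        self.replica_a = {}
        self.replica_b = {}
        self.partitioned = False  # True while A and B cannot exchange messages

    def write(self, key, value):
        """A client write that arrives at replica A."""
        if not self.partitioned:
            # Normal operation: the write reaches both replicas.
            self.replica_a[key] = value
            self.replica_b[key] = value
        elif self.mode == "AP":
            # AP: stay available, accept the write locally; B becomes stale.
            self.replica_a[key] = value
        else:
            # CP: refuse the write rather than let the replicas diverge.
            raise ConnectionError("CP store refuses writes during a partition")

    def read_from_b(self, key):
        """A client read that arrives at replica B."""
        if self.partitioned and self.mode == "CP":
            # CP: B cannot confirm it holds the latest write, so it returns an error.
            raise ConnectionError("CP store refuses reads during a partition")
        return self.replica_b.get(key)
```

During a partition the AP store answers every request but B may return an old value, while the CP store returns errors instead of risking an inconsistent answer.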

Document-Based Systems

Document-based systems store data in documents, typically using formats such as JSON,
BSON, or XML. Each document contains semi-structured data, and different documents can
have different structures.

• Examples: MongoDB, CouchDB.
• Use Cases: Content management systems, blogging platforms, e-commerce applications.
• Advantages: Flexibility in data models, easy to scale horizontally, good for storing
complex nested data structures.
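A minimal sketch of the document model using plain Python dictionaries and JSON. The collection, field names, and the find helper are invented for illustration and are not the API of MongoDB or CouchDB.

```python
import json

# Two documents in the same collection with different structures.
posts = [
    {"_id": 1, "title": "Intro to NoSQL", "tags": ["db", "nosql"]},
    {
        "_id": 2,
        "title": "CAP notes",
        "author": {"name": "Asha", "city": "Chennai"},         # nested document
        "comments": [{"user": "ravi", "text": "Helpful"}],      # nested array
    },
]


def find(collection, **criteria):
    """Return documents whose top-level fields equal all the given criteria."""
    return [
        doc for doc in collection
        if all(doc.get(field) == value for field, value in criteria.items())
    ]


# Documents are typically stored and exchanged as JSON text.
serialized = json.dumps(posts[1])
```

The second document carries an author and comments that the first one lacks, and no schema change is needed to store both.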

Key-Value Stores

Key-value stores are the simplest form of NoSQL databases. They store data as a collection of
key-value pairs, where the key is a unique identifier, and the value can be a string, JSON
document, or any other type of data.

• Examples: Redis, DynamoDB, Riak.
• Use Cases: Caching, session management, real-time analytics.
• Advantages: High performance, simplicity, fast lookups.
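The key-value model can be sketched as a small in-memory store with optional expiry, which is the pattern behind caching and session management. The KeyValueStore class and its ttl and now parameters are assumptions for this sketch and do not mirror the Redis or DynamoDB APIs.

```python
import time


class KeyValueStore:
    """In-memory key-value store with optional time-to-live per key."""

    def __init__(self):
        self._data = {}  # key -> (value, expiry time or None)

    def put(self, key, value, ttl=None, now=None):
        """Store value under key; ttl is a lifetime in seconds (None = no expiry)."""
        now = time.time() if now is None else now
        expires_at = None if ttl is None else now + ttl
        self._data[key] = (value, expires_at)

    def get(self, key, default=None, now=None):
        """Look up a key; expired or missing keys return the default."""
        now = time.time() if now is None else now
        entry = self._data.get(key)
        if entry is None:
            return default
        value, expires_at = entry
        if expires_at is not None and now >= expires_at:
            del self._data[key]  # drop the expired entry
            return default
        return value

    def delete(self, key):
        self._data.pop(key, None)
```

Every operation is a single lookup by key, which is why key-value stores are fast but cannot answer queries on the contents of the value.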

Column-Based Systems

Column-based systems (also called wide-column stores) organize data by columns and column
families rather than storing each row as a fixed set of fields. This optimizes read and write
operations for large datasets and makes retrieval of large volumes of data efficient.

• Examples: Apache Cassandra, HBase.
• Use Cases: Data warehousing, real-time analytics, logging.
• Advantages: High write and read throughput, scalability, suitable for time-series data.
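One way to picture a wide-column table is as a nested map: row key, then column family, then column name, then value. The sketch below assumes that layout; the WideColumnTable class and the sensor data are hypothetical, and real systems such as Cassandra or HBase add partitioning, ordering, and on-disk storage.

```python
from collections import defaultdict


class WideColumnTable:
    """Toy wide-column table: row key -> column family -> column -> value."""

    def __init__(self, column_families):
        self.column_families = set(column_families)  # fixed when the table is created
        self.rows = defaultdict(lambda: defaultdict(dict))

    def put(self, row_key, family, column, value):
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family}")
        # Columns inside a family are created on demand and can differ per row.
        self.rows[row_key][family][column] = value

    def get_family(self, row_key, family):
        """Return all columns of one family for one row."""
        return dict(self.rows.get(row_key, {}).get(family, {}))


table = WideColumnTable(["profile", "readings"])
table.put("sensor-1", "profile", "location", "lab")
table.put("sensor-1", "readings", "2024-01-01T00:00", 21.5)
table.put("sensor-1", "readings", "2024-01-01T00:05", 21.7)
table.put("sensor-2", "readings", "2024-01-01T00:00", 19.0)
```

Using timestamps as column names, as in the readings family here, is the usual reason this model suits time-series data.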

Graph Databases
Graph databases are designed to store and navigate relationships between entities. Data is
stored in nodes (entities) and edges (relationships), with properties associated with both.

• Examples: Neo4j, Amazon Neptune.
• Use Cases: Social networks, recommendation engines, fraud detection.
• Advantages: Efficiently handle complex relationships, easy to model and query
interconnected data.
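A small property-graph sketch: nodes and edges both carry properties, and a recommendation-style query walks two hops. The people, the FOLLOWS relationship, and the helper functions are invented for this example; a graph database would express the same traversal in its own query language.

```python
# Nodes with properties.
nodes = {
    "asha": {"city": "Chennai"},
    "ravi": {"city": "Madurai"},
    "meena": {"city": "Chennai"},
    "kiran": {"city": "Salem"},
}

# Edges with a type and a property ("since").
edges = [
    {"src": "asha", "type": "FOLLOWS", "dst": "ravi", "since": 2021},
    {"src": "ravi", "type": "FOLLOWS", "dst": "meena", "since": 2022},
    {"src": "asha", "type": "FOLLOWS", "dst": "kiran", "since": 2023},
    {"src": "kiran", "type": "FOLLOWS", "dst": "ravi", "since": 2023},
]


def neighbours(node, rel_type):
    """Nodes reachable from `node` over one outgoing edge of the given type."""
    return {e["dst"] for e in edges if e["src"] == node and e["type"] == rel_type}


def suggestions(node):
    """People followed by the people `node` follows, excluding existing follows."""
    direct = neighbours(node, "FOLLOWS")
    second_hop = set()
    for friend in direct:
        second_hop |= neighbours(friend, "FOLLOWS")
    return second_hop - direct - {node}
```

Here suggestions("asha") walks asha -> ravi -> meena and returns only people asha does not already follow.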

Database Security

Database security encompasses measures to protect database management systems from various
threats, ensuring data confidentiality, integrity, and availability. This involves addressing
security issues such as unauthorized access, data breaches, and other vulnerabilities.

Security Issues

1. Unauthorized Access: Unauthorized users gaining access to sensitive data.
2. Data Breaches: Exposure of sensitive information due to malicious attacks.
3. Insider Threats: Employees or insiders misusing their access privileges.
4. SQL Injection: Attackers inserting malicious SQL queries into input fields to manipulate
the database.

Access Control Based on Privileges

Access control mechanisms restrict access to data based on user privileges.

• User Privileges: Define specific permissions for individual users.
• Permissions: Include operations like SELECT, INSERT, UPDATE, DELETE, and more.
• Grant and Revoke: SQL commands used to assign or remove privileges from users.
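The effect of GRANT and REVOKE can be modelled as a table of (user, privilege, table) entries that is consulted before each operation. The sketch below is only a model of that idea in Python; PrivilegeTable and the user and table names are hypothetical, and a real DBMS keeps this information in its system catalog.

```python
class PrivilegeTable:
    """Toy model of privilege-based access control."""

    def __init__(self):
        self._grants = set()  # entries of the form (user, privilege, table)

    def grant(self, user, privilege, table):
        # Models a statement such as: GRANT SELECT ON employee TO asha;
        self._grants.add((user, privilege, table))

    def revoke(self, user, privilege, table):
        # Models a statement such as: REVOKE SELECT ON employee FROM asha;
        self._grants.discard((user, privilege, table))

    def check(self, user, privilege, table):
        """Raise PermissionError unless the user holds the privilege on the table."""
        if (user, privilege, table) not in self._grants:
            raise PermissionError(f"{user} may not {privilege} on {table}")
        return True


acl = PrivilegeTable()
acl.grant("asha", "SELECT", "employee")
acl.grant("asha", "UPDATE", "employee")
acl.revoke("asha", "UPDATE", "employee")
```

After these calls asha can still read the employee table, but an UPDATE is rejected because that privilege was revoked.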

Role-Based Access Control (RBAC)

RBAC assigns permissions to roles rather than individuals, simplifying management.

• Roles: Groups of privileges that can be assigned to users.
• Role Hierarchies: Roles can be arranged in a hierarchy, with higher roles inheriting
permissions from lower roles.
• Example: An "admin" role might include all privileges, while a "user" role has limited
permissions.
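A role hierarchy like the one described above can be sketched with two small maps: one from roles to their own permissions and one from each role to the lower roles it inherits from. The role names, the "editor" role, and the user assignments are assumptions made for this illustration.

```python
# Permissions attached directly to each role.
ROLE_PERMISSIONS = {
    "user": {"SELECT"},
    "editor": {"INSERT", "UPDATE"},
    "admin": {"DELETE"},
}

# Role hierarchy: each role inherits from the lower roles listed here.
ROLE_INHERITS = {
    "editor": ["user"],
    "admin": ["editor"],
}

# Users are assigned roles, never individual permissions.
USER_ROLES = {
    "asha": ["admin"],
    "ravi": ["user"],
}


def effective_permissions(role):
    """Own permissions of a role plus everything inherited from lower roles."""
    permissions = set(ROLE_PERMISSIONS.get(role, ()))
    for lower in ROLE_INHERITS.get(role, ()):
        permissions |= effective_permissions(lower)
    return permissions


def is_allowed(user, permission):
    """True if any of the user's roles grants the permission."""
    return any(
        permission in effective_permissions(role)
        for role in USER_ROLES.get(user, ())
    )
```

Changing what a role may do updates every user holding that role at once, which is the management benefit RBAC offers over per-user privileges.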

SQL Injection

SQL injection is a common attack method where attackers exploit vulnerabilities in input fields
to execute malicious SQL code.

• Techniques: Include manipulating SQL queries to extract, modify, or delete data.
• Prevention: Use parameterized queries, prepared statements, and input validation to
mitigate risks.
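The difference between a vulnerable query and a parameterized one can be shown with Python's built-in sqlite3 module. The users table, the stored credentials, and the two login functions are made up for this sketch (a real system would also store password hashes, not plain text).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (name TEXT, password TEXT)")
conn.execute("INSERT INTO users VALUES ('alice', 's3cret')")


def login_vulnerable(name, password):
    """UNSAFE: user input is pasted straight into the SQL text."""
    query = (
        "SELECT COUNT(*) FROM users "
        f"WHERE name = '{name}' AND password = '{password}'"
    )
    return conn.execute(query).fetchone()[0] > 0


def login_safe(name, password):
    """Parameterized query: input is passed as data, never parsed as SQL."""
    query = "SELECT COUNT(*) FROM users WHERE name = ? AND password = ?"
    return conn.execute(query, (name, password)).fetchone()[0] > 0


# Classic injection payload: closes the string and adds a condition that is always true.
payload = "' OR '1'='1"
```

With the payload as the password, login_vulnerable("alice", payload) succeeds because the WHERE clause becomes always true, while login_safe("alice", payload) fails because the payload is compared as an ordinary string.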

Statistical Database Security

Statistical databases provide aggregate information without revealing individual data records.

• Risk: Sensitive information can be inferred from statistical queries.
• Techniques: Include data perturbation, noise addition, and query restriction to protect
individual data.
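Two of the techniques above, query restriction and noise addition, can be sketched together. The salaries reuse the EMPLOYEE example from earlier in these notes; the threshold MIN_QUERY_SET, the noise_scale parameter, and the function name are assumptions chosen for illustration.

```python
import random

# Individual salaries (confidential), keyed by employee number.
SALARIES = {101: 3000, 102: 4000, 103: 5500, 104: 5000, 105: 2000}

# Query restriction: refuse aggregates over fewer than this many records.
MIN_QUERY_SET = 3


def average_salary(employee_ids, noise_scale=0.0, rng=random):
    """Average salary over a set of employees, with optional random noise."""
    ids = [e for e in employee_ids if e in SALARIES]
    if len(ids) < MIN_QUERY_SET:
        # A query over one or two people would reveal individual salaries.
        raise PermissionError("query set too small")
    true_average = sum(SALARIES[e] for e in ids) / len(ids)
    # Noise addition: perturb the answer so exact values cannot be derived.
    return true_average + rng.uniform(-noise_scale, noise_scale)
```

A size threshold alone is not a complete defence, since an attacker can sometimes combine several allowed queries to isolate one person, which is why it is usually paired with noise or perturbation.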

Flow Control

Flow control prevents unauthorized data flow within the database system.

• Goal: Ensure that information only flows in allowed paths.
• Techniques: Include labeling data with security levels and enforcing policies to prevent
illegal information flow.
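A minimal sketch of label-based flow control, assuming three ordered security levels and the common policy that information may move only to an equal or higher level. The level names and the helper functions are illustrative.

```python
# Security levels, ordered from least to most sensitive (assumed labels).
LEVELS = {"public": 0, "confidential": 1, "secret": 2}


def can_flow(source_level, target_level):
    """Information may flow only to an object of equal or higher level."""
    return LEVELS[source_level] <= LEVELS[target_level]


def copy_value(value, source_level, target_level):
    """Copy a value between objects, enforcing the flow policy."""
    if not can_flow(source_level, target_level):
        raise PermissionError(
            f"illegal flow from {source_level} to {target_level}"
        )
    return value
```

Copying a secret value into a public object is rejected, while copying public data into a secret object is allowed.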

Encryption and Public Key Infrastructures (PKI)

Encryption protects data by converting it into an unreadable format, only accessible by
authorized users with the decryption key.

• Symmetric Encryption: Uses a single key for both encryption and decryption (e.g.,
AES).
• Asymmetric Encryption: Uses a pair of keys (public and private) for encryption and
decryption (e.g., RSA).
• Public Key Infrastructure (PKI): Manages digital certificates and public-private key
pairs to facilitate secure communication and data exchange.
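The public and private key idea behind RSA can be illustrated with a textbook-sized example. The primes below are far too small to be secure and there is no padding, so this is a teaching sketch only; the chosen numbers and function names are assumptions, and real systems should use a vetted cryptography library. It needs Python 3.8 or later for the modular inverse.

```python
# Toy RSA with tiny primes (illustration only, NOT secure).
p, q = 61, 53
n = p * q                  # modulus, part of both keys
phi = (p - 1) * (q - 1)    # Euler's totient of n
e = 17                     # public exponent, chosen coprime with phi
d = pow(e, -1, phi)        # private exponent: inverse of e modulo phi

public_key = (e, n)        # shared with everyone
private_key = (d, n)       # kept secret by the owner


def encrypt(message):
    """Encrypt an integer message (0 <= message < n) with the public key."""
    if not 0 <= message < n:
        raise ValueError("message out of range")
    return pow(message, e, n)


def decrypt(ciphertext):
    """Decrypt with the private key."""
    return pow(ciphertext, d, n)
```

Anyone can call encrypt with the public key, but only the holder of d can reverse it; a PKI certificate is what ties such a public key to its owner's identity.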

Challenges

The following are challenges faced by database security systems:
(1) Data Quality
• The database community needs techniques to assess the quality of data. Quality
can be indicated by a simple mechanism such as quality stamps that
are posted on web sites.
• The database community may also need more effective techniques of integrity
semantics verification for assessing the quality of data.
• Application-level recovery techniques are also used to repair incorrect data.
(2) Intellectual Property Rights
• With the increasing use of the internet and intranets, the chances of
unauthorized duplication and distribution of content are growing.
Digital watermarking techniques are therefore used to protect content from
unauthorized duplication and to establish ownership.
• However, research is needed to develop techniques for preventing intellectual
property rights violations.
(3) Database Survivability
• Database systems should continue to work even after
information warfare attacks.
• The goal of an information warfare attacker is to damage the organization's
operation.

The following corrective actions handle this situation:

• Confinement: Take immediate action to eliminate the attacker's access to the system and
isolate the affected components to avoid further spread.
• Damage Assessment: Determine the extent of the problem.
• Reconfiguration: Reconfiguration allows the system to stay in operation in a
degraded mode while recovery is going on.
• Repair: Recover the corrupted or lost data by repairing or reinstalling the
system.
• Fault Treatment: Identify the weakness exploited in the attack and take steps
to prevent a recurrence.

1. Performance Impact: Security measures can affect database performance.
2. Complexity: Implementing comprehensive security policies can be complex.
3. Insider Threats: Difficult to detect and prevent malicious activities by authorized users.
4. Evolving Threats: Continuous adaptation is required to address new and sophisticated
attack methods.
5. Compliance: Ensuring compliance with various regulations (e.g., GDPR, HIPAA) can be
challenging.
