Advanced Database - Allchapters
Evaluation Schemes
Brain Storming
1. Differentiate the terms:
• Database,
• Database System.
Definition of Terms
Brain Storming
1. How has the database technology evolved?
Types of DBMS Models
• In a hierarchical database model, the data is organized in a tree-like structure.
• Data is stored hierarchically, in a top-down or bottom-up format.
• Data is represented using parent-child relationships that are one-to-one or one-to-many.
• In a hierarchical DBMS a parent may have many children, but each child has only one parent.
• The network database model allows each child to have multiple parents.
• It supports more complex relationships, such as many-to-many relationships (e.g., orders/parts).
• The entities are organized in a graph, which can be accessed through several paths.
• The relational model is the easiest and most widely used DBMS model.
• It is based on normalizing data into the rows and columns of tables.
• Data is stored in fixed structures and manipulated using SQL.
Characteristics of Relational model (70’s) ….
Clean and simple.
What is missing??
Handling of complex objects
– Could not store complex data types
such as images or sound.
Handling of complex data types
– RDBMSs have provided limited data types.
Code is not coupled with data
– SQL is declarative but programming
languages are procedural.
• In the object-oriented model, data is stored in the form of objects.
• The structures that hold the data are called classes.
• It defines a database as a collection of objects that store both data member values and operations.
• Properties
• Name
• Height
• Weight……..
• Behaviors
• Eat
• Pray
• Walk …..
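The properties and behaviors listed above can be sketched as a small Python class. This is only an illustration: the class name Person, the units, and the sample values are assumptions, not part of the slides.

```python
class Person:
    """Hypothetical object type: state (properties) plus behaviour (methods)."""

    def __init__(self, name, height_cm, weight_kg):
        # Properties: the data member values stored with the object.
        self.name = name
        self.height_cm = height_cm
        self.weight_kg = weight_kg

    def eat(self, kg):
        # Behaviour that changes the object's own state.
        self.weight_kg += kg
        return self.weight_kg

    def walk(self):
        # Behaviour that only reads the object's state.
        return f"{self.name} is walking"


p = Person("Abebe", 175, 70.0)
```

The point of the sketch is that data and operations travel together in one object, which is what the relational model on the earlier slides does not offer.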
Object-Oriented models (80’s):
▪ Complicated, but some influential ideas from Object Oriented
programming
Distribution
Design transactions
Versions
Discussion …
1. What are the main features of OOP?
Object Data Management Group (ODMG)
❑ ODMG was formed to define standards for OODBMSs.
❑ Its most widely used standard is ODMG 3.0.
– It provides a standard where previously there was none.
Example:
▪ Ethiopian calendar
▪ European calendar
▪ Arabic calendar
▪ Indian calendar
ODMG Literals
▪ ODMG literals are special values used in Object Database Management Systems (ODBMS) that follow the ODMG (Object Data Management Group) standard.
Object types
• An object type is a blueprint or template that defines the structure
and behavior of its objects.
• The object type specifies what attributes (data) objects of that type
will have and what methods (actions) they can perform.
• For example, if you have an object type named “Car”, all Car objects
would share certain attributes like model, color, and number of
doors.
• All Car objects also share methods like accelerate(), brake(), and
turn().
▪ An object is made of two things
▪ State
▪ Behaviour
State
– is defined by the values an object carries for a set of properties, each of which may be either an attribute of the object or a relationship between the object and one or more other objects.
– Example: attributes (name, address, birthDate of a person)
Relationships
Relationships are defined between types.
Behaviour
▪ Behaviour – is defined by a set of operations that can be performed
on or by the object.
▪ Types of operations:
▪ Query: accesses the state of an object but does not alter its state
▪ Atom,
▪ Collection.
Atomic constructors
struct (or tuple) constructor
Creates standard structured types, such as the tuples (record types) in the basic relational model.
Example
Collection (or multivalued)
Collection is used to create complex nested type structures in the
object model.
Collection type constructors include:
1. Set(T) - unordered collections that do not allow duplicates.
2. Bag(T) - unordered collections that allow duplicate elements; Bag also inherits the Collection interface.
3. List(T) - ordered collections, used where the order of the elements is important.
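As a rough analogy only, the three constructors above behave like the following Python containers. The mapping (set, Counter, list) is an assumption made for illustration; it is not ODMG syntax.

```python
from collections import Counter

values = ["red", "blue", "red", "green"]

as_set = set(values)      # Set(T): unordered, duplicates collapse
as_bag = Counter(values)  # Bag(T): unordered, duplicates are counted
as_list = list(values)    # List(T): order preserved, duplicates kept
```

The same four input values give three different collections, which is why the choice of constructor is part of the type definition.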
An interface is noninstantiable.
Classes
Class defines both the abstract state and behavior of an object type.
• Classes are blueprints or templates for creating objects.
• A class defines the common properties and behaviors that
objects of the same type possess.
• It specifies the attributes and methods.
• A class has:
– A name
– A set of attributes
– A set of methods
– A set of constraints
Extents and keys
Extents and keys are specified during class definition:
An extent is the set of all instances of a given type within a particular ODMS.
Deleting an object removes it from the extent of the type.
Abstraction
▪ Abstraction focuses on hiding unnecessary details and exposing only
relevant information.
[Class diagram residue: operations age(): Integer and changeAddress(newAdd: string)]
It enables the reuse of attributes and methods from a …
This leads to the creation of a type lattice rather than a type hierarchy.
▪ Overloading –
▪ Overriding –
Versioning
An object version represents an identifiable state of an object.
Overview of ODL & OQL
The Object Definition Language (ODL) is a language for defining
the specifications of object types for ODMG-compliant systems.
Object Query Language (OQL) provides declarative access to
the object database using an SQL-like syntax.
It does not provide explicit update operators, but leaves this to the
operations defined on object types.
Querying object-relational database
Most relational operators work on the object-relational tables
Brain Storming
1. Can you provide an overview of what SQL is and the basic
commands for querying and manipulating data?
❖ Aims
[Figure: steps in processing a query. Recoverable labels: Scanner, Parser, Convert to Intermediate Form (RA/RC), Relational Algebra Form, Machine code, Runtime DB Processor.]
❖ Task: Find an efficient physical query plan (aka execution plan) for an
SQL query
Goal: Minimize the evaluation time for the query, i.e., compute
query result as fast as possible
Cost Factors: Disk accesses, read/write operations, [I/O, page
transfer] (CPU time is typically ignored)
Optimization: find the most efficient evaluation plan for a query because
there can be more than one way.
Examples:
❖ Find all Managers who work at a London branch.
Example - 2
SELECT * FROM Staff s, Branch b WHERE s.branchNo = b.branchNo
AND (s.position = 'Manager' AND b.city = 'London');
The equivalent relational algebra queries corresponding to
this SQL statement are:
The Different Strategies
Cost Comparison
❖ Cost (in disk accesses) are:
❖ Cartesian product and join operations are much more expensive than
selection.
We will see shortly that one of the fundamental strategies in query
processing is to perform the unary operations, Selection and Projection,
as early as possible, thereby reducing the operands of any subsequent
binary operations.
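The effect of performing Selection early can be sketched with a toy in-memory version of the Staff/Branch query. All sizes and values below are hypothetical (they are not the figures from the slides), and the "cost" is simply the number of tuple pairs compared, not disk accesses.

```python
# Hypothetical data: 1000 staff, 50 branches (illustration only).
staff = [{"staffNo": i,
          "position": "Manager" if i % 20 == 0 else "Assistant",
          "branchNo": i % 50} for i in range(1000)]
branch = [{"branchNo": b,
           "city": "London" if b % 10 == 0 else "Glasgow"} for b in range(50)]


def product_then_select():
    """Cartesian product first, then apply every condition."""
    pairs, result = 0, []
    for s in staff:
        for b in branch:
            pairs += 1
            if (s["branchNo"] == b["branchNo"]
                    and s["position"] == "Manager" and b["city"] == "London"):
                result.append((s["staffNo"], b["branchNo"]))
    return result, pairs


def select_then_join():
    """Apply the unary Selections first, then join the reduced operands."""
    managers = [s for s in staff if s["position"] == "Manager"]
    london = [b for b in branch if b["city"] == "London"]
    pairs, result = 0, []
    for s in managers:
        for b in london:
            pairs += 1
            if s["branchNo"] == b["branchNo"]:
                result.append((s["staffNo"], b["branchNo"]))
    return result, pairs
```

Both functions return the same rows, but the second compares far fewer pairs because the operands of the binary operation were reduced first.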
Phases of query processing
▪ Query Decomposition
Transform high-level query into RA query.
▪ Analysis,
▪ Normalization,
▪ Semantic analysis,
▪ Simplification,
▪ Query restructuring.
▪ Analysis
▪ Analyze query lexically and syntactically using compiler techniques.
▪ Verify relations and attributes exist.
▪ Verify operations are appropriate for object type.
Analysis
▪ Finally, query transformed into a query tree constructed as follows:
equivalent to
SELECT TITLE FROM Emp E WHERE ENAME = 'J.Doe';
Restructuring - Transforming the query into a formal relational
algebra expression suitable for optimization.
Convert SQL to relational algebra
Make use of query trees
Example: SELECT Ename FROM Emp, Works, Project
WHERE Emp.Eno = Works.Eno AND Works.Pno = Project.Pno
AND Ename <> 'J. Doe' AND Pname = 'CAD/CAM'
AND (Dur = 12 OR Dur = 24)
Query tree
Query tree is a data structure that corresponds to a relational algebra
expression
2. Commutativity of Selection
3. In a sequence of Projection operations, only the last in the sequence is
required.
❖ Techniques:
Heuristic rules
▪ Query tree (relational algebra) optimization
This is typically done by estimating the cost of each possible execution plan
based on factors such as the number of rows to be processed, the complexity
of the query, and the available resources.
Cost can be CPU time, I/O time, communication time, main memory
usage, or a combination.
The candidate query tree with the least total cost is selected for execution.
Measures of Query Cost
▪ There are many possible ways to estimate cost, e.g., based on
– bFactor(R) – the blocking factor of R (that is, the number of tuples of R that fit
into one block).
– We use ⌈x⌉ to indicate that the result of the calculation is rounded to the smallest integer that is greater than or equal to x.
For each attribute A of base relation R
nDistinctA(R) – the number of distinct values that appear for
attribute A in relation R.
▪ Cost depends on
▪ Types of query conditions
• NS = 10000/50 = 200
• = log2(500) + 200/20 - 1 = 18
Cost of Join
Additional notation:
Ignoring timing
NJ = min( (nR × nS) / dist(R.A), (nR × nS) / dist(S.B) )
Estimate Size of Join Result (cont.)
How wide is a tuple in join result?
Natural join: W = W(R) + W(S) – W(S ∩ R), where W(S ∩ R) is the width of the attributes common to S and R
Theta join: W = W(R) + W(S)
What is blocking factor of join result?
bfJoin = block size / W
How many blocks does join result have?
bJoin = NJ / bfJoin
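The join-size formulas above can be combined into one small calculation. This is a sketch under stated assumptions: the function name, the parameter names, and the sample numbers are hypothetical, the blocking factor is rounded down to whole tuples per block, and the block count is rounded up.

```python
import math


def estimate_join(n_r, n_s, dist_ra, dist_sb, w_r, w_s, block_size, w_common=0):
    """Estimate cardinality, tuple width, blocking factor and block count of a join."""
    # NJ = min(nR*nS/dist(R.A), nR*nS/dist(S.B))
    nj = min(n_r * n_s / dist_ra, n_r * n_s / dist_sb)
    # Natural join drops one copy of the common attributes; theta join keeps both.
    w = w_r + w_s - w_common
    bf_join = block_size // w          # whole tuples that fit in one block
    b_join = math.ceil(nj / bf_join)   # blocks needed for the result
    return nj, w, bf_join, b_join


# Hypothetical statistics, for illustration only.
nj, w, bf_join, b_join = estimate_join(
    n_r=10000, n_s=500, dist_ra=500, dist_sb=500,
    w_r=100, w_s=60, block_size=4096, w_common=10)
```

With w_common=0 the same function gives the theta-join width W(R) + W(S).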
Query Execution Plans
A query execution plan is a blueprint that outlines the specific
steps the DBMS will take to retrieve data for a given SQL query.
The goal is to retrieve the desired data as quickly and with as few
resources as possible.
Tasks include:
Proper Table Indexing:
Denormalization
— Optimization
— Evaluation
TRANSACTION PROCESSING
Chapter Outline
01: Introduction to Transaction Processing
Parallel processing:
– Processes are concurrently
executed in multiple CPUs.
Transaction boundaries:
▪ Begin and End transaction.
▪ Suppose a bank employee transfers $500 from A's account to B's account.
▪ This very simple and small transaction involves several low-level tasks.
Simple Model of a Database
▪ A database - collection of named data items
Those are;
Atomicity
Consistency preservation
Isolation
Durability (permanency)
Atomic transactions
• Atomicity: A transaction is an atomic unit of processing; it is either
performed in its entirety or not performed at all.
• Example: John wants to move $200 from his savings account to his
checking account.
Example:
Durability
• Once a transaction changes the database and the changes are committed,
these changes must never be lost because of subsequent failure.
Concurrency Control
Isolation (+ Consistency) => Concurrency Control
T1               T2               State of X
read_item(X);                     20
X:= X+10;        read_item(X);    20
                 X:= X+20;
                 write_item(X);   40
                 commit;
write_item(X);                    30   (lost update)
commit;
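The lost update schedule above can be replayed step by step. The sketch below assumes a single shared item X starting at 20 and a tiny hypothetical replay function; it is not a real transaction manager.

```python
def run(schedule, x0=20):
    """Replay (transaction, operation, argument) steps against one shared item X."""
    db = {"X": x0}
    local = {}  # each transaction's private copy of X
    for txn, op, arg in schedule:
        if op == "read":
            local[txn] = db["X"]
        elif op == "add":
            local[txn] += arg
        elif op == "write":
            db["X"] = local[txn]
    return db["X"]


# Interleaving from the slide: T2 writes 40, then T1 overwrites it with 30.
interleaved = [("T1", "read", None), ("T1", "add", 10),
               ("T2", "read", None), ("T2", "add", 20),
               ("T2", "write", None), ("T1", "write", None)]

# Serial execution: T1 completely before T2.
serial = [("T1", "read", None), ("T1", "add", 10), ("T1", "write", None),
          ("T2", "read", None), ("T2", "add", 20), ("T2", "write", None)]
```

The interleaved run ends with X = 30, while the serial run ends with X = 50, so the update made by T2 is lost.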
T1 T2 State of X sum
read_item(X); 20 0
X:= X+10;
Dirty update write_item(X);
read_item(X); 30
sum:= sum+X;
write_item(sum); 30
X:=X+10; commit;
write_item(X); 40
Rollback
commit;
read_item(A); 0
sum:= sum+A;
write_item(A);
commit; 100
read_item(X); 30
X:= X-10;
write_item(X);
commit; read_item(X); 20
sum:= sum+X;
read_item(Y); 10
sum:= sum+Y;
read_item(Y); 10
Y:= Y+10;
write_item(Y); 20
commit;
Incorrect summary
Discuss what problem is found in the schedule and what will be the correct
value of Accounts A, B & C averages?
➢ WHY RECOVERY IS NEEDED: (WHAT CAUSES A
TRANSACTION TO FAIL?)
➢ A computer failure (system crash)
➢ Disk failure
✓ Redoing transactions:
▪ Serializable schedule
• A schedule of n transactions is serializable if it is equivalent to some serial schedule of the same n transactions.
▪ Result equivalent
• Two schedules are result equivalent if they produce the same final state of the database.
▪ Conflict equivalent
• Two schedules are conflict equivalent if the order of any two conflicting operations is the same in both schedules.
Figure 3.2 Examples of serial and nonserial schedules involving transactions
T1 and T2. (a) Serial schedule A: T1 followed by T2. (b) Serial schedule B: T2
followed by T1. (c) Two nonserial schedules C and D with interleaving of
operations.
Schedule Notation
• A more compact notation for schedules:
T3: b3, r3(Y), w3(Y), e3, c3
b3 = begin
r3(Y) = read(Y): r is the operation, 3 is the transaction, Y is the data item
w3(Y) = write(Y)
e3 = end
c3 = commit
note: we ignore the computations on the local copies of the data (such as Y = Y+1) when considering schedules (they're not interesting)
Examples
A serial schedule is one in which the transactions do not overlap (in
time).
– Conflict Serializability
– View Serializability:
▪ Conflict serializable:
• Two schedules are conflict equivalent if the order of any two conflicting
operations is the same in both schedules.
• Two operations conflict if:
– they belong to different transactions
– they access the same data item (read or write)
– at least one of them is a write
T1: b1,r1(X),w1(X),r1(Y),w1(Y),e1,c1
T2: b2,r2(X),w2(X),e2,c2
conflicting operations:
r1(X), w2(X)
w1(X), r2(X)
w1(X), w2(X)
– Find the conflicting operation?
Two operations are conflicting, if changing their order can result in a
different outcome
Example: Conflict Equivalence
schedule 1:
b1,r1(X),w1(X),r1(Y),w1(Y),e1,c1, b2,r2(X),w2(X),e2,c2
r1(X) < w2(X), w1(X) < r2(X), w1(X) < w2(X)
schedule 2:
b2,r2(X),w2(X), b1,r1(X),w1(X),r1(Y),w1(Y),e1,c1, e2,c2
w2(X) < r1(X), r2(X) < w1(X), w2(X) < w1(X)
schedule 3:
b1,r1(X),w1(X), b2,r2(X),w2(X),e2,c2, r1(Y),w1(Y),e1,c1,
r1(X) < w2(X), w1(X) < r2(X), w1(X) < w2(X)
Schedule 1 and schedule 3 are conflict equivalent; schedule 2 is not conflict equivalent to either schedule 1 or schedule 3.
Testing for Conflict Serializability
• Precedence graphs are a more efficient test
– graph indicates a partial order on the transactions required
by the order of the conflicting operations.
– the partial order must hold in any conflict equivalent serial
schedule
– if there is a loop in the graph, the partial order is not
possible in any serial schedule
– if the graph has no loops, the schedule is conflict serializable
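The precedence graph test above can be sketched in a few lines. The helper names (parse, precedence_edges, has_cycle, conflict_serializable) are hypothetical, and the parser assumes the compact r1(X) / w1(X) notation used in these slides; begin, end, and commit markers are ignored.

```python
import re
from collections import defaultdict


def parse(schedule):
    """Extract (op, txn, item) triples such as ('r', 1, 'X')."""
    return [(op, int(t), item)
            for op, t, item in re.findall(r"([rw])(\d+)\((\w+)\)", schedule)]


def precedence_edges(ops):
    """Add an edge Ti -> Tj for every conflicting pair where Ti's operation comes first."""
    edges = set()
    for i, (op1, t1, x1) in enumerate(ops):
        for op2, t2, x2 in ops[i + 1:]:
            if t1 != t2 and x1 == x2 and "w" in (op1, op2):
                edges.add((t1, t2))
    return edges


def has_cycle(edges):
    """Depth-first search for a loop in the directed graph."""
    graph = defaultdict(set)
    for a, b in edges:
        graph[a].add(b)
    state = {}  # node -> "visiting" or "done"

    def visit(n):
        if state.get(n) == "visiting":
            return True
        if state.get(n) == "done":
            return False
        state[n] = "visiting"
        if any(visit(m) for m in graph[n]):
            return True
        state[n] = "done"
        return False

    return any(visit(n) for n in list(graph))


def conflict_serializable(schedule):
    return not has_cycle(precedence_edges(parse(schedule)))
```

For example, calling conflict_serializable on a schedule string returns True when the graph has no loop; the pairwise comparison is quadratic in the number of operations, which is acceptable for classroom-sized schedules.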
Precedence graph examples: find the conflicting operations between the transactions and draw the graph.
schedule 3:
b1,r1(X),w1(X),
b2,r2(X),w2(X),e2,c2, r1(Y),w1(Y),e1,c1,
Find the conflict operations ?
r1(X) < w2(X), w1(X) < r2(X), w1(X) < w2(X)
T1 → T2
r2(X) < w1(X)
T1 T2
r2(X) < w1(X)
S: r1(x) r2(z) r3(x) r1(z) r2(y) r3(y) w1(x) w2(z) w3(y) w2(y)
e1,c1,e2,c2,e3,c3
Chapter -4
Concurrency Control
What is Concurrency Control?
▪ Why?
Lost Updates
Temporary update (dirty read)
Non-Repeatable Read
Incorrect Summary issue
Purpose of Concurrency Control
- To enforce isolation (through mutual exclusion) among conflicting transactions
• Timestamp-Based Protocols
• Validation-Based Protocols
– The 3 activities taking place in the two phase update algorithm are:
(i). Lock Acquisition
Lock-compatibility matrix
Example of a transaction performing locking:
T1                          T2
LOCKX(A)                    LOCKX(SUM)
READ(A)      A=1000         SUM:=0
A:=A-200     A=800          LOCKS(A)
WRITE(A)     A=800          READ(A)        A=800
UNLOCK(A)                   SUM:=SUM+A     SUM=800
LOCKX(B)                    UNLOCK(A)
READ(B)      B=900          LOCKS(B)
B:=B+200     B=1100         READ(B)        B=1100
WRITE(B)     B=1100         SUM:=SUM+B     SUM=1900
UNLOCK(B)                   WRITE(SUM)     SUM=1900
                            UNLOCK(B)
                            UNLOCK(SUM)
IF EXECUTED SERIALLY THE OUTPUT WILL BE 1900
Consider another example
T1:                  T2:
▪ LOCKX(B)           ▪ LOCKS(A)
▪ READ(B)            ▪ READ(A)   A=150
▪ B:=B-50            ▪ UNLOCK(A)
▪ WRITE(B)           ▪ LOCKS(B)
▪ UNLOCK(B)          ▪ READ(B)   B=150
▪ LOCKX(A)           ▪ UNLOCK(B)
▪ READ(A)            ▪ DISPLAY(A+B)
▪ A:=A+50
▪ WRITE(A)
▪ UNLOCK(A)
▪ IT IS CLEAR THAT IF THEY RUN SEQUENTIALLY THE OUTPUT WILL BE 300
▪ Phase 1: Growing Phase
Transaction may obtain locks
Transaction may not release locks
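Whether a transaction respects the growing-phase rule can be checked mechanically: once it has released any lock (entering what is usually called the shrinking phase), it must not acquire another. The sketch below assumes operations written as strings in the LOCKX / LOCKS / UNLOCK style used on the earlier slides; the function name is hypothetical.

```python
def is_two_phase(ops):
    """Return True if no lock is acquired after the first unlock."""
    released = False
    for op in ops:
        name = op.split("(")[0]
        if name == "UNLOCK":
            released = True
        elif name in ("LOCKX", "LOCKS") and released:
            return False  # lock requested after a release: not two-phase
    return True


# T1 from the earlier locking example: unlocks A before locking B.
t1 = ["LOCKX(A)", "READ(A)", "WRITE(A)", "UNLOCK(A)",
      "LOCKX(B)", "READ(B)", "WRITE(B)", "UNLOCK(B)"]

# Same work, but every lock is acquired before any lock is released.
t1_two_phase = ["LOCKX(A)", "READ(A)", "WRITE(A)", "LOCKX(B)",
                "UNLOCK(A)", "READ(B)", "WRITE(B)", "UNLOCK(B)"]
```

The first sequence fails the check, which is why the earlier interleaving with T2 could produce a result different from any serial execution.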
The phantom problem can occur when a new record that is being inserted by some transaction T satisfies a condition that a set of records accessed by another transaction T' must satisfy.
DATABASE RECOVERY
TECHNIQUES
approach.
Commit point:
This is the point at which transaction is finished, and all of the database
The updates recorded in the log must contain both the old values and the new values.
Advantages
•No-redo/no-undo
Disadvantages
•Creating shadow directory may take a long time.
•Updated database pages change locations.
•Garbage collection is needed
“ARIES” Recovery algorithm.
Recovery algorithms are techniques to ensure database consistency, transaction atomicity, and durability despite failures.
Undo - Scan the log backward and undo the actions of the
active transactions in the reverse order.
Recovery from disk crashes.
Recovery from disk crashes is much more difficult than recovery from transaction failures or machine crashes.
Loss from such crashes is much less common today than it was previously, because of the wide use of redundancy in secondary storage (RAID, Redundant Array of Independent Disks, technology).
(RAID is a method of combining several hard disk drives into one logical unit.)
Typical methods are;
The log for the database system is usually written on a separate
physical disk from the database.
or,
Periodically, the database is also backed up to tape or other
archival storage.
Conclusion.
✓ Types of failures.
✓ Steal/no steal, Force/no force approaches.
✓ Deferred and immediate update strategies.
✓ Shadow paging technique.
✓ ARIES recovery algorithm.
✓ Recovery from disk crashes.
DATABASE SECURITY AND
AUTHORIZATION
• Loss of integrity: (users should not be able to modify things they are not supposed to.)
to control login
Controlling the access to a statistical database - used to provide
…
If any tampering with the database is suspected, a database audit is performed. This consists of reviewing the log to examine all accesses and operations applied to the database during a certain time period.
• …
A database log that is used mainly for security purposes is sometimes called an
audit trail.
Types of database security mechanisms:
The relation (or table level): At this level, the DBA can control the privilege
to access each individual relation or view in the database.
The granting and revoking of privileges generally follow an authorization model for discretionary privileges known as the access matrix model, where the rows of a matrix M represent subjects (users, accounts, programs) and the columns represent objects (relations, records, columns, views, operations).
Each position M(i, j) in the matrix represents the types of privileges (read, write, update) that subject i holds on object j.
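A minimal sketch of the access matrix, assuming a Python dictionary keyed by (subject, object). The account and relation names, and the helper functions holds, grant, and revoke, are illustrative only and do not correspond to any DBMS API.

```python
# M[(subject, object)] -> set of privileges held (illustrative entries).
M = {
    ("A1", "EMPLOYEE"): {"read", "write", "update"},
    ("A2", "EMPLOYEE"): {"read"},
}


def holds(subject, obj, privilege):
    """True if position M(subject, obj) contains the privilege."""
    return privilege in M.get((subject, obj), set())


def grant(subject, obj, privilege):
    """Add a privilege to position M(subject, obj)."""
    M.setdefault((subject, obj), set()).add(privilege)


def revoke(subject, obj, privilege):
    """Remove a privilege from position M(subject, obj), if present."""
    M.get((subject, obj), set()).discard(privilege)
```

A missing (subject, object) position is treated as "no privileges", which matches the default-deny reading of the matrix.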
To control the granting and revoking of relation privileges, each relation R in a database is assigned an owner account (the account that first created it).
The owner of a relation is given all privileges on that relation.
The owner account holder can pass privileges on any of the owned relations to other users by granting privileges to their accounts.
In SQL the following types of privileges can be granted on each individual
relation R:
SELECT (retrieval or read) privilege on R: Gives the account retrieval privilege.
In SQL this gives the account the privilege to use the SELECT statement to
retrieve tuples from R.
MODIFY privileges on R: Gives the account the capability to modify tuples of R.
▪ In SQL this privilege is further divided into UPDATE, DELETE, and INSERT
privileges to apply the corresponding SQL command to R.
▪ In addition, both the INSERT and UPDATE privileges can specify that only
certain attributes can be updated by the account.
REFERENCES privilege on R: This gives the account the
capability to reference relation R when specifying integrity
constraints.
The privilege can also be restricted to specific attributes of R.
Notice that to create a view, the account must have SELECT
privilege on all relations involved in the view definition.
Specifying Privileges Using Views
The mechanism of views is an important discretionary
…
authorization mechanism in its own right.
…Example:
• …
Suppose that the DBA creates four accounts A1, A2, A3, and A4 and wants only A1 to be able to create base relations; then the DBA must issue the following GRANT command in SQL:
GRANT CREATE TABLE TO A1;
In SQL the same effect can be accomplished by having the DBA issue a CREATE SCHEMA command as follows:
CREATE SCHEMA EXAMPLE AUTHORIZATION A1;
User account A1 can create tables under the schema called EXAMPLE.
• Suppose that A1 creates the two base relations EMPLOYEE and DEPARTMENT; A1 is then the owner of these two relations and hence holds all the relation privileges on each of them.
• Suppose that A1 wants to grant A2 the privilege to insert and delete tuples in both of these relations, but A1 does not want A2 to be able to propagate these privileges to additional accounts:
After the view is created, A1 can grant SELECT on the view A3EMPLOYEE
to A3 as follows:
GRANT SELECT ON A3EMPLOYEE TO A3 WITH GRANT OPTION;
Example(5)
Finally, suppose that A1 wants to allow A4 to update
only the SALARY attribute of EMPLOYEE;
A1 can issue:
• Subject S can read object O only if class(S) >= class(O) (Simple Security
Property)
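The Simple Security Property above reduces to one comparison of classification levels. The sketch assumes the commonly used ordering Top Secret > Secret > Confidential > Unclassified; the level names and their numeric encoding are assumptions, not taken from the slides.

```python
# Assumed classification ordering: TS > S > C > U.
LEVELS = {"U": 0, "C": 1, "S": 2, "TS": 3}


def can_read(subject_class, object_class):
    """Simple Security Property: read allowed only if class(S) >= class(O)."""
    return LEVELS[subject_class] >= LEVELS[object_class]
```

For instance, can_read("S", "C") is allowed while can_read("C", "S") is not, so a subject never reads data classified above its own clearance.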
– DDBMS –
▪ Distribution transparency
– This means that the physical placement of data (files, relations, etc.) is not known to the user.
▪ Network transparency
– Users do not have to worry about operational details of the network.
▪ Location transparency
– refers to freedom of issuing command from any location without
affecting its work.
Advantages DDS…
▪ Naming transparency
– Allows access to any named object (files, relations, etc.) from any
location.
▪ Replication transparency
− Allows copies of data to be stored at multiple sites.
▪ Fragmentation transparency
− Allows a relation to be segmented horizontally (a subset of the tuples of the relation) or vertically (a subset of the columns of the relation).
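Horizontal and vertical fragmentation can be sketched on a small in-memory relation. The employee rows, the column names, and the helper functions are hypothetical; the vertical fragments carry the key so that the relation can be rebuilt.

```python
employee = [
    {"eno": 1, "name": "Sara", "city": "Addis Ababa", "salary": 900},
    {"eno": 2, "name": "John", "city": "London", "salary": 1200},
    {"eno": 3, "name": "Mona", "city": "London", "salary": 1100},
]


def horizontal(relation, predicate):
    """Horizontal fragment: a subset of the tuples."""
    return [row for row in relation if predicate(row)]


def vertical(relation, columns, key="eno"):
    """Vertical fragment: a subset of the columns, always carrying the key."""
    return [{c: row[c] for c in (key, *columns)} for row in relation]


london = horizontal(employee, lambda r: r["city"] == "London")
others = horizontal(employee, lambda r: r["city"] != "London")
public = vertical(employee, ["name", "city"])
payroll = vertical(employee, ["salary"])

# Reconstruction: union of the horizontal fragments, key join of the vertical ones.
rebuilt_h = sorted(london + others, key=lambda r: r["eno"])
rebuilt_v = [{**p, **s} for p in public for s in payroll if p["eno"] == s["eno"]]
```

Fragmentation transparency means the user queries employee and never sees london, others, public, or payroll directly.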
Advantages of DDS
2. Increase reliability and availability:
− Reliability refers to system live time, that is, system is running efficiently most
of the time.
− A distributed database system has multiple nodes (computers) and if one fails
then others are available to do the job.
3. Improved performance:
− DDBMS fragments the database to keep data closer to where it is needed most.
− This reduces data management (access and modification) time significantly.
4. Scalability - Easier expansion
− Allows new nodes (computers) to be added anytime without changing the entire configuration.
Disadvantages of DDS
– Complexity
– Cost
– Security
– Integrity control more difficult
– Lack of standards
– Lack of experience
✓ Different data centers may run different DBMS products, with possibly different underlying data models.
✓ Translations required to allow for:
▪ Different hardware.
▪ Change of codes and word lengths.
▪ Different DBMS products.
▪ Mapping of data structures in one data model to the equivalent data structures in another data model
▪ Translate the query language used (for example, relational model SQL SELECT statements are mapped to the network FIND and GET statements)
▪ Different hardware and different DBMS products.
▪ If both the hardware and software are different, then both these types of translation are required. This makes the processing extremely complex.
[Figure: heterogeneous distributed database. Sites 1 to 5 are connected by a communications network and run Object Oriented, Relational, Hierarchical, and Network DBMSs on Unix, Windows, and Linux.]
Heterogeneous
⚫ Advantages
✓ Large volumes of data from different data centers can be stored in one global center.
✓ Remote access is done using the global schema.
✓ Different DBMSs may be used at each node.
⚫ Disadvantages
✓ Difficult to manage.
✓ Difficult to design.
Multidatabase system (MDBS)
• MDBS allows users to access and share data without requiring full database
schema integration.
❑ Differences in constraints
– Synchronize all data received from DPs (TP side) and route
retrieved data to the appropriate TPs (DP side).
▪ Vertical Fragmentation
▪ Mixed Fragmentation
Horizontal Fragmentation
• Transaction transparency
• Failure transparency
• Performance transparency
Distribution Transparency
• Distribution transparency allows the user to perceive the database as a
single, logical entity.
– Location transparency
– Remote Requests
– Remote Transactions
– Distributed Transactions
– Distributed Requests
A Remote Request
▪ Allows us to access data to be processed by a single remote database
processor.
A Remote Transaction
▪ Composed of several requests, may access data at only a single
site.
A Distributed Transaction
▪ Allows a transaction to reference several (local or remote) DP sites.
A Distributed Request
▪ Reference data from several remote DP sites.
▪ Allows a single request to reference a physically partitioned table.
Example2:
Distributed Request
Distributed Transactions and 2 Phase Commit
▪ Transaction transparency in a DDBMS environment ensures that all distributed
transactions maintain the distributed database’s integrity and consistency.
UNDO reverses an operation, using the log entries written by the DO portion
of the sequence.
REDO redoes an operation, using the log entries written by DO portion of the
sequence.
• Coordinator and
• Phase 1: Preparation
• The coordinator sends a PREPARE TO COMMIT message
to all subordinates.
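The preparation phase can be sketched as follows. The slide text shown here covers only Phase 1, so the second phase in the sketch (broadcasting the decision) follows the usual textbook description of two-phase commit and is an assumption, as are the function name and the representation of each subordinate as a callable that returns its vote.

```python
def two_phase_commit(subordinates):
    """subordinates maps a site name to a callable returning True (ready) or False (abort)."""
    log = []
    # Phase 1: the coordinator sends PREPARE TO COMMIT and collects the votes.
    votes = {}
    for site, prepare in subordinates.items():
        log.append(("PREPARE TO COMMIT", site))
        votes[site] = prepare()
    # The transaction commits only if every subordinate voted ready.
    decision = "COMMIT" if all(votes.values()) else "ABORT"
    # Phase 2 (assumed): the coordinator broadcasts the decision to every site.
    for site in subordinates:
        log.append((decision, site))
    return decision, log
```

A single negative vote is enough to abort at every site, which is how the protocol keeps the distributed database consistent.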
• In this final section, we list Date’s twelve rules (or objectives) for
DDBMSs (Date, 1987b).
• Fundamental principle
3) Continuous operation
4) Location independence
Date’s Twelve Rules for a DDBMS
5) Fragmentation independence
6) Replication independence
9) Hardware independence