Database Systems
SYSTEM STRUCTURE
Physical Database
MAIN COMPONENTS OF DBMS
• Database manager
• Query processor
• DML precompiler
• DDL compiler
• File manager
• Transaction manager
DATABASE USERS
• Application programmers
• Casual users
• Naive users
• Database administrator
DATABASE ADMINISTRATOR
Declarative Languages
DATA ABSTRACTION
Conceptual Schema
Internal Schema
DATA INDEPENDENCE
• Data structures
• Operators
• Integrity constraints
DATA MODELS
• Relational
• Hierarchical
• Network
• Object-oriented
• Object-relational
RELATIONAL DATABASES
RELATIONS
• SCH = {A1, A2, …, An} – set of attributes
• DOM(A1) – domain of A1
• Relation R(A1, A2, …, An) – subset of DOM(A1) × DOM(A2) × … × DOM(An)
• SCH = {A1, A2, …, An} – schema of R(A1, A2, …, An)
RELATIONS
C# Cname Ctown
C1 Evans Glasgow
C2 Hayes Leeds
C3 Jackson Brussels
C4 Mills Rome
C5 Robson London
C6 Smith Paris
CONTRACTS
P# E# C# Salary
P1 E1 C1 1 000
P1 E1 C2 4 000
P1 E2 C3 3 000
P1 E3 C2 8 000
P2 E1 C1 2 000
P2 E2 C1 7 000
P2 E3 C1 2 000
P2 E4 C1 5 000
P2 E5 C3 5 000
P2 E6 C3 3 000
P3 E1 C4 9 000
P3 E2 C4 6 000
P3 E3 C4 5 000
P3 E4 C4 4 000
P4 E1 C1 4 000
P4 E3 C2 3 000
P4 E4 C4 7 000
P4 E5 C2 6 000
RELATIONAL ALGEBRA
• Union R ∪ S = {t: t ∈ R ∨ t ∈ S}
• Intersection R ∩ S = {t: t ∈ R ∧ t ∈ S}
• Difference R − S = {t: t ∈ R ∧ t ∉ S}
• Product
R(X) × S(Y) =
{t: ∃r ∈ R ∃s ∈ S t(X) = r(X) ∧ t(Y) = s(Y)}
• Rename R(X) = ρ_{R(X)}(S(Y))
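The set operations and the product can be tried out directly on small relations. The following is a minimal sketch, assuming relations are modelled as Python sets of tuples with attributes identified by position; the sample relations R and S are made up for illustration.

```python
from itertools import product

# Relations as sets of tuples; the data is hypothetical.
R = {("a", 1), ("b", 2)}
S = {("b", 2), ("c", 3)}

union = R | S            # tuples in R or in S
intersection = R & S     # tuples in both R and S
difference = R - S       # tuples in R but not in S

# Cartesian product: every tuple of R concatenated with every tuple of S.
cartesian = {r + s for r, s in product(R, S)}

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
print(len(cartesian))
```

Union, intersection and difference only make sense here because R and S have the same schema; the product has no such requirement.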
RELATIONAL ALGEBRA
A datalog rule:
R(t) :- R1(t1), R2(t2), …, Rn(tn)
where R(t) – head of the rule
R1(t1), R2(t2), …, Rn(tn) – predicates (body of the rule)
UNION S ∪ T
R(X, Y) :- S(X, Y)
R(X, Y) :- T(X, Y)
INTERSECTION S ∩ T
R(X, Y) :- S(X, Y), T(X, Y)
DIFFERENCE S − T
R(X, Y) :- S(X, Y), ¬T(X, Y)
ALGEBRAIC OPERATIONS IN
DATALOG
PROJECTION π_X(S)
R(X) :- S(X, _)
PRODUCT S × T
R(X, Y) :- S(X), T(Y)
JOIN S * T
R(X, Y, Z) :- S(X, Y), T(Y, Z)
EXAMPLES
PAR(C, P)
C – child, P – parent
GRANDPARENT:
GRPAR(X, Y) :- PAR(X, Z), PAR(Z, Y)
In algebra
GRPAR(X, Y) = π_{X,Y}(ρ_{R(X, Z)}(PAR) * ρ_{S(Z, Y)}(PAR))
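The grandparent rule is a self-join of PAR on the shared variable Z, followed by a projection onto X and Y. A minimal sketch, assuming PAR is a set of (child, parent) pairs; the names in the sample data are invented.

```python
# PAR(C, P): C is a child of P. Hypothetical sample data.
PAR = {("ann", "bob"), ("bob", "carl"), ("bob", "dora"), ("eve", "ann")}

def grandparents(par):
    """GRPAR(X, Y) :- PAR(X, Z), PAR(Z, Y): join on Z, keep only (X, Y)."""
    return {(x, y) for (x, z1) in par for (z2, y) in par if z1 == z2}

GRPAR = grandparents(PAR)
print(sorted(GRPAR))
```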
RECURSIVE RELATIONSHIPS
FOREFATHER
FOREF(X, Y) :- PAR(X, Y)
FOREF(X, Y) :- PAR(X, Z), FOREF(Z, Y)
FIXPOINT OPERATOR
FP(FOREF = ρ_{R(X, Y)}(π_{C,Y}(PAR *_{P=X} FOREF)) ∪ ρ_{PAR(X, Y)}(PAR))
EXAMPLE
[Figure: directed graph on the nodes a, b, c, d, e]
Edges of the graph as a relation:
X Y
a b
a c
b c
b d
c d
c e
d e
II step
New connections: abc, abd, acd, ace, bcd, bce, bde, cde
New tuples: <a, d>, <a, e>, <b, e>
Relation after the II step:
X Y Ways
a b
a c ac, abc
b c
b d bd, bcd
c d
c e ce, cde
d e
a d abd, acd
a e ace
b e bde, bce
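The steps of this example can be reproduced by iterating the recursive rule until nothing new is derived. The sketch below is one way to do this (naive evaluation), assuming the seven edges listed in the example; the function names are illustrative only.

```python
# The seven edges of the example graph.
EDGES = {("a", "b"), ("a", "c"), ("b", "c"), ("b", "d"),
         ("c", "d"), ("c", "e"), ("d", "e")}

def step(base, current):
    """Apply both rules once: base tuples, plus base joined with current."""
    joined = {(x, y) for (x, z1) in base for (z2, y) in current if z1 == z2}
    return base | joined

def fixpoint(base):
    """Repeat `step` until no new tuple appears; also record what each step added."""
    current, rounds = set(), []
    while True:
        nxt = step(base, current)
        if nxt == current:
            return current, rounds
        rounds.append(nxt - current)
        current = nxt

closure, rounds = fixpoint(EDGES)
print(sorted(rounds[1]))   # tuples that first appear in the second step
print(len(closure))        # size of the final relation
```

The first step only copies the edges; the second step adds the tuples reachable by a path of length two, and a third step finds nothing new, so the iteration stops.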
SQL
Integrity Conditions:
Primary Key (list of attributes)
Check(Predicate)
CREATE TABLE
SELECT Ename
FROM EMP
WHERE E# IN
(SELECT E#
FROM CON
WHERE P# = 'P1');
SUBQUERIES
Names of employees working at P1
(another solution)
SELECT Ename
FROM EMP
WHERE 'P1' IN
(SELECT P#
FROM CON
WHERE E# = EMP.E#);
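Both formulations can be run side by side to confirm that they return the same names. This is a minimal sketch using Python's built-in sqlite3 module; the table contents are invented, and the column names E# and P# are written in double quotes because SQLite needs the `#` character quoted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript('''
    CREATE TABLE EMP ("E#" TEXT PRIMARY KEY, Ename TEXT);
    CREATE TABLE CON ("P#" TEXT, "E#" TEXT);
    -- Hypothetical sample rows.
    INSERT INTO EMP VALUES ('E1', 'Adams'), ('E2', 'Brown'), ('E3', 'Clark');
    INSERT INTO CON VALUES ('P1', 'E1'), ('P1', 'E2'), ('P2', 'E3');
''')

# First solution: the subquery is evaluated once, independently of EMP.
uncorrelated = '''
    SELECT Ename FROM EMP
    WHERE "E#" IN (SELECT "E#" FROM CON WHERE "P#" = 'P1')
    ORDER BY Ename
'''
# Second solution: the subquery refers to the current EMP row (correlated).
correlated = '''
    SELECT Ename FROM EMP
    WHERE 'P1' IN (SELECT "P#" FROM CON WHERE CON."E#" = EMP."E#")
    ORDER BY Ename
'''
names_in = [row[0] for row in conn.execute(uncorrelated)]
names_corr = [row[0] for row in conn.execute(correlated)]
print(names_in, names_corr)
```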
ANY and ALL
Numbers of projects that have greater budgets than
at least one project realized in London
SELECT P#
FROM PROJ
WHERE BUDGET > ANY
(SELECT BUDGET
FROM PROJ
WHERE Ptown = 'London');
ANY and ALL
Numbers of projects that have greater budgets than
all projects realized in London
SELECT P#
FROM PROJ
WHERE BUDGET > ALL
(SELECT BUDGET
FROM PROJ
WHERE Ptown = 'London');
EXISTS
Numbers of employees who do not work at any
project
SELECT E#
FROM EMP
WHERE NOT EXISTS
(SELECT *
FROM CON
WHERE E# = EMP.E#);
SYNONYMS
Names of projects that have budgets greater than that of project P1
SELECT Y.Pname
FROM PROJ X, PROJ Y
WHERE X.P# = 'P1' AND X.BUDGET < Y.BUDGET;
AGGREGATE FUNCTIONS
• COUNT
• SUM
• AVG
• MAX
• MIN
AGGREGATE FUNCTIONS
Number of projects
SELECT COUNT(*)
FROM PROJ;
SELECT E#
FROM CON
GROUP BY E#
HAVING COUNT (*) > 1;
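The grouping query can be checked against the CONTRACTS data given earlier. The sketch below assumes that CON in the queries is the CONTRACTS relation, loads only its P# and E# columns into an in-memory SQLite table, and quotes the column names because of the `#` character.

```python
import sqlite3

# (P#, E#) pairs taken from the CONTRACTS table.
ROWS = [
    ("P1", "E1"), ("P1", "E1"), ("P1", "E2"), ("P1", "E3"),
    ("P2", "E1"), ("P2", "E2"), ("P2", "E3"), ("P2", "E4"),
    ("P2", "E5"), ("P2", "E6"), ("P3", "E1"), ("P3", "E2"),
    ("P3", "E3"), ("P3", "E4"), ("P4", "E1"), ("P4", "E3"),
    ("P4", "E4"), ("P4", "E5"),
]

conn = sqlite3.connect(":memory:")
conn.execute('CREATE TABLE CON ("P#" TEXT, "E#" TEXT)')
conn.executemany("INSERT INTO CON VALUES (?, ?)", ROWS)

# Number of contracts.
total = conn.execute("SELECT COUNT(*) FROM CON").fetchone()[0]

# Employees that appear in more than one contract.
busy = [row[0] for row in conn.execute(
    'SELECT "E#" FROM CON GROUP BY "E#" HAVING COUNT(*) > 1 ORDER BY "E#"'
)]
print(total, busy)
```

E6 is the only employee with a single contract, so it is the only one filtered out by the HAVING clause.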
VIEWS
CREATE VIEW PROJ_LON AS
SELECT *
FROM PROJ
WHERE Ptown = 'London';
SELECT P#
FROM PROJ_LON
WHERE Budget > 50000;
INSERT, UPDATE, DELETE
INSERT INTO EMP VALUES ('E7', 'Howard',
'Liege', 'Lawyer');
DELETE
FROM PROJ
WHERE Ptown = 'London';
UPDATE PROJ
SET Budget = Budget * 1.1;
GRANT and REVOKE
GRANT <privilege list> ON <database element>
TO <user list>
S=0;
While NOT END_OF_CURSOR
{EXEC SQL FETCH FROM C INTO :PR_BUD;
S = S + PR_BUD;
}
ENTITY-RELATIONSHIP
MODEL
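ENTITY SET
[Figure: entity set PROJECTS]
[Figure: relationship R between entity sets A and B, with cardinality constraints (a,b) and (c,d)]
MANY TO MANY
[Figure: relationship R between A (attributes A1 … Am) and B (attributes B1 … Bn), cardinalities (*,*) and (*,*); the diagram also carries the label X]
[Figure: relationship R between A (A1 … Am) and B (B1 … Bn), cardinalities (*,*) and (1,1)]
[Figure: relationship R involving A (A1 … Am) and C (C1 … Cp), with three (0,*) cardinality labels]
[Figure: relationship R on entity set E (E1 … En), roles RA (*,*) and RB (*,*); the diagram also carries the label X]
[Figure: relationship R on entity set E (E1 … En), roles RA (0,1) and RB (*,*)]
ISA
[Figure: ISA hierarchy with entity sets B (B1 … Bn) and C (C1 … Cp)]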
ARMSTRONG RULES
• Y ⊆ X ⇒ X → Y (reflexivity)
• X → Y ∧ Z ⊆ W ⇒ XW → YZ (augmentation)
• X → Y ∧ Y → Z ⇒ X → Z (transitivity)
• X → Y ∧ X → Z ⇒ X → YZ (union)
• X → Y ∧ WY → Z ⇒ XW → Z (pseudotransitivity)
• X → Y ∧ Z ⊆ Y ⇒ X → Z (decomposition)
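A common way to put these rules to work is to compute the closure of an attribute set, that is, all attributes functionally determined by it. The closure algorithm is not part of the slides; the sketch below is a standard textbook procedure, and the dependencies in FDS are invented for illustration.

```python
def closure(attrs, fds):
    """Return all attributes determined by `attrs` under the dependencies `fds`.

    `fds` is a list of (lhs, rhs) pairs; each side is a string of
    single-letter attribute names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the left side is already covered, the right side follows.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# Hypothetical dependencies: A -> B, B -> C, CD -> E.
FDS = [("A", "B"), ("B", "C"), ("CD", "E")]
print(sorted(closure("A", FDS)))
print(sorted(closure("AD", FDS)))
```

A → C follows by transitivity, and AD → E by pseudotransitivity, which is why E shows up only in the second closure.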
MULTI-VALUED DEPENDENCY
Given a relation R(X, Y, Z), the multivalued
dependency X →→ Y holds in R if and only if the set
of Y-values depends only on the X-value and is
independent of the Z-value.
For all pairs of tuples t1 and t2 such that t1 (X) = t2 (X)
there exist tuples t3 and t4 such that:
t1 (X) = t2 (X) = t3 (X) = t4 (X),
t3 (Y) = t1 (Y), t3 (Z) = t2 (Z),
t4 (Y) = t2 (Y), t4 (Z) = t1 (Z)
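The tuple-based definition translates directly into a check over all pairs of tuples. A minimal sketch, assuming a relation is a list of dicts; the course/teacher/book data is hypothetical.

```python
def mvd_holds(rows, x, y, z):
    """Check the multivalued dependency X ->> Y; x, y, z are tuples of attribute names."""
    def proj(row, attrs):
        return tuple(row[a] for a in attrs)

    present = {(proj(r, x), proj(r, y), proj(r, z)) for r in rows}
    for t1 in rows:
        for t2 in rows:
            if proj(t1, x) == proj(t2, x):
                # t3 must carry t1's Y-value and t2's Z-value.
                # t4 is covered when the loop visits the pair (t2, t1).
                if (proj(t1, x), proj(t1, y), proj(t2, z)) not in present:
                    return False
    return True

full = [
    {"course": "db", "teacher": "smith", "book": "b1"},
    {"course": "db", "teacher": "smith", "book": "b2"},
    {"course": "db", "teacher": "jones", "book": "b1"},
    {"course": "db", "teacher": "jones", "book": "b2"},
]
holds_full = mvd_holds(full, ("course",), ("teacher",), ("book",))
holds_partial = mvd_holds(full[:3], ("course",), ("teacher",), ("book",))
print(holds_full, holds_partial)
```

Dropping one of the four combinations breaks the dependency, because the set of teachers would then depend on the book.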
MULTI-VALUED DEPENDENCY
A  B  | C  D  E
a1 b1 | c1 d2 e3
      | c1 d2 e4
a2 b2 | c1 d2 e4
      | c1 d2 e5
      | c2 d3 e5
a1 b3 | c4 d4 e3
      | c4 d4 e4
FIRST NORMAL FORM
A B C D E
a1 b1 c1 d2 e3
a1 b1 c1 d2 e4
a2 b2 c1 d2 e4
a2 b2 c1 d2 e5
a2 b2 c2 d3 e5
a1 b3 c4 d4 e3
a1 b3 c4 d4 e4
FIRST NORMAL FORM
• Fixed-length records
• Variable-length records
VARIABLE-LENGTH RECORDS
Reserved-space method
VARIABLE-LENGTH RECORDS
Constant area
Pointer method
ORGANIZATION OF RECORDS
INTO BLOCKS
block 1
block 2
Unspanned records
ORGANIZATION OF RECORDS
INTO BLOCKS
block 1
block 2
Spanned records
FILES
• Unordered files
• Hash files
• Ordered files
HASH FILES
0
1
Hash table
INDICES
• Primary Index
• Clustering Index
• Secondary Index
INDEX-SEQUENTIAL FILES
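[Figure: index-sequential file. The index file holds the keys A, H, P, each pointing to one block of the ordered data file; the blocks hold the records with keys (A, D, F), (H, K, M) and (P, S, W).]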
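A lookup in such a file first searches the small index and then reads a single data block. The sketch below assumes the layout read off the figure (index keys A, H, P, each the first key of a block of three records); the function name and the block contents as Python lists are illustrative only.

```python
from bisect import bisect_right

# Assumed layout: one index entry per block, holding the block's first key.
INDEX_KEYS = ["A", "H", "P"]
BLOCKS = [["A", "D", "F"], ["H", "K", "M"], ["P", "S", "W"]]

def lookup(key):
    """Return the number of the block that holds `key`, or None if it is absent."""
    i = bisect_right(INDEX_KEYS, key) - 1   # last index entry <= key
    if i < 0:
        return None                          # key is smaller than every index entry
    return i if key in BLOCKS[i] else None

print(lookup("K"), lookup("W"), lookup("B"))
```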
CLUSTERING INDEX
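[Figure: clustering index. The index file holds one entry for each distinct value of the clustering field (A, F, K, P, W), each pointing into the ordered data file, where records with the same value are stored together.]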
SECONDARY INDEX
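[Figure: secondary index. The index file holds the keys A, D, F, H, K, M, P, S, W in sorted order, each pointing to its record in a data file that is not ordered on this field.]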
B-tree
[Figure: B-tree index file]
Root B1: 7, 29, 73
Index blocks: B2: 7, 11; B3: 29, 48, 63; B4: 73, 82, 90
Leaf pages: P1: 7, 9; P2: 11, 15, 20; P3: 29, 32; P4: 48, 52, 58; P5: 63, 65; P6: 73, 75; P7: 82, 84; P8: 90, 93
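Insertion 34, 22, 6, 43
[Figure: the B-tree after the insertions]
Root B1: 6, 48
Index blocks: B7: 6, 29; B6: 48, 73; B2: 6, 11, 20; B3: 29, 34; B5: 48, 63; B4: 73, 82, 90
Leaf pages: P1: 6, 7, 9; P2: 11, 15; P9: 20, 22; P3: 29, 32; P10: 34, 43; P4: 48, 52, 58; P5: 63, 65; P6: 73, 75; P7: 82, 84; P8: 90, 93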
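Deletion 52, 73, 65
[Figure: the B-tree after the deletions]
Root B1: 6, 29, 48
Index blocks: B2: 6, 11, 20; B3: 29, 34; B5: 48, 75, 90
Leaf pages: P1: 6, 7, 9; P2: 11, 15; P9: 20, 22; P3: 29, 32; P10: 34, 43; P4: 48, 58, 63; P6: 75, 82, 84; P8: 90, 93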
B+-tree
[Figure: B+-tree index file]
Root B1: 29, 73
Index blocks: B2: 11; B3: 48, 63; B4: 82, 90
Leaf pages: P1: 7, 9; P2: 11, 15, 20; P3: 29, 32; P4: 48, 52, 58; P5: 63, 65; P6: 73, 75; P7: 82, 84; P8: 90, 93
QUERY PROCESSING
BASIC STEPS
EQUIVALENCE RULES
• R ∪ S = S ∪ R, (R ∪ S) ∪ T = R ∪ (S ∪ T)
• R ∩ S = S ∩ R, (R ∩ S) ∩ T = R ∩ (S ∩ T)
• R * S = S * R, (R * S) * T = R * (S * T)
• R * (S ∪ T) = (R * S) ∪ (R * T)
• σ_F(R * S) = σ_F(R) * σ_F(S)
• σ_F(R − S) = σ_F(R) − σ_F(S)
• σ_F(R ∪ S) = σ_F(R) ∪ σ_F(S)
• σ_F(R ∩ S) = σ_F(R) ∩ σ_F(S)
• σ_{F∧G}(R) = σ_F(σ_G(R)) = σ_G(σ_F(R)) = σ_F(R) ∩ σ_G(R)
• σ_{F∨G}(R) = σ_F(R) ∪ σ_G(R)
• σ_F(R × S) = R *_F S
• R *_F S = S *_F R
• σ_F(R *_G S) = R *_{F∧G} S
• (R *_F S) *_{G∧H} T = R *_{F∧G} (S *_H T)
• σ_{F∧G}(R *_H S) = σ_F(R) *_H σ_G(S)
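The rules for selections with a conjunction or a disjunction of conditions can be checked on a small relation. A minimal sketch, assuming a relation is a set of tuples and a condition is a Python predicate; the relation R and the conditions F and G are made up.

```python
R = {(1, "x"), (2, "y"), (3, "x"), (4, "z")}

def select(pred, rel):
    """Selection: keep the tuples of `rel` that satisfy `pred`."""
    return {t for t in rel if pred(t)}

def F(t):
    return t[0] > 1

def G(t):
    return t[1] == "x"

conj = select(lambda t: F(t) and G(t), R)   # selection on F AND G
disj = select(lambda t: F(t) or G(t), R)    # selection on F OR G

# Conjunction: cascade of selections in either order, or an intersection.
print(conj == select(F, select(G, R)) == select(G, select(F, R)))
print(conj == select(F, R) & select(G, R))
# Disjunction: union of the two selections.
print(disj == select(F, R) | select(G, R))
```

For an optimizer the cascade form matters most: the more selective condition can be applied first, so the second one scans fewer tuples.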
TRANSFORMATION OF RELATIONAL EXPRESSIONS
π_{E#}(σ_{Ptown = 'London'}(PROJ) * (EMP * CON))
[Figure: query trees over PROJ, EMP and CON with the projection on E# at the root; the selection node is labelled Ename = 'London' in the diagrams]
• Linear search
C = b(R)
with equality condition: Caverage = b(R)/2, Cmax = b(R)
• Binary search
C = log2 (b(R)) + card(R)*sel(R[A])/f(R) - 1
with equality condition on a key attribute
C = log2 (b(R))
COSTS OF SEARCHING
• Clustering index
C = L(I) + card(R)*sel(R[A])/f(R)
• Secondary index
equality C = L(I) + card(R)*sel(R[A])
EXAMPLE
σ_{Ptown = 'London'}(PROJ)
card(PROJ) = 1000, f(PROJ) = 5, val(PROJ[Ptown]) = 20
b(PROJ) = 1000/5 = 200
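The cost formulas can be evaluated for this example. The sketch below rests on three assumptions that the slides do not state: selectivity is taken as 1/val (values spread uniformly), logarithms and block counts are rounded up, and the number of index levels L(I) is set to a made-up value of 2.

```python
import math

card, f, val = 1000, 5, 20     # values from the example
L_I = 2                        # hypothetical number of index levels

b = math.ceil(card / f)        # blocks in PROJ: 200
matching = card // val         # card * sel with sel = 1/val: 50 tuples

linear = b                                                       # scan all blocks
binary = math.ceil(math.log2(b)) + math.ceil(matching / f) - 1   # ordered file
clustering = L_I + math.ceil(matching / f)                       # matches stored together
secondary = L_I + matching                                       # one block access per match

print(linear, binary, clustering, secondary)
```

Under these assumptions the clustering index is the cheapest access path, while the secondary index still beats a linear scan because only 50 of the 1000 tuples match.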
• Atomicity
• Consistency
• Isolation
• Durability
TRANSACTION STATE
• Active
• Partially committed
• Failed
• Aborted
• Committed
SERIALIZABILITY
Lock compatibility (S – shared, X – exclusive):
     S    X
S    Yes  No
X    No   No
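The compatibility matrix can be read as a lookup table used when a transaction requests a lock. A minimal sketch; the function name can_grant and the representation of held locks as a list of modes are illustrative assumptions.

```python
# (held mode, requested mode) -> compatible?  S = shared, X = exclusive.
COMPATIBLE = {
    ("S", "S"): True,
    ("S", "X"): False,
    ("X", "S"): False,
    ("X", "X"): False,
}

def can_grant(requested, held_by_others):
    """A lock is granted only if it is compatible with every lock held by other transactions."""
    return all(COMPATIBLE[(held, requested)] for held in held_by_others)

print(can_grant("S", ["S", "S"]))   # several readers may share an item
print(can_grant("X", ["S"]))        # a writer must wait for readers
print(can_grant("X", []))           # no other locks: always granted
```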
TWO-PHASE LOCKING
PROTOCOL – 2PL