DBMS - Unit 2
Relational Model
• Main idea:
Table: relation
Column header: attribute
Row: tuple
• Relational schema: name(attributes)
Example: employee(ssno,name,salary)
• Attributes:
Each attribute has a domain – domain constraint
Each attribute is atomic: we cannot refer to or
directly see a subpart of the value.
Fall 2005
Components of Relational Model
• There are three components:
1. A set of domains and a set of relations
2. Integrity rules
3. Operations on relations
• Characteristics of Relations
1. Ordering of tuples in a relation
2. Ordering of values within a tuple
3. Values in tuples: atomic and NULL values
4. Interpretation of a Relation
Relation Example
Account                              Customer
AccountId  CustomerId  Balance       Id  Name  Addr
150        20          11,000        20  Tom   Irvine
160        23          2,300         23  Jane  LA
180        23          32,000        32  Jack  Riverside
• Database schema consists of
– a set of relation schemas
    Account(AccountId, CustomerId, Balance)
    Customer(Id, Name, Addr)
– a set of constraints over the relation schemas
    AccountId and CustomerId must be integers
    Name and Addr must be strings of characters
    CustomerId in Account must be one of the Ids in Customer
NULL value
• Key:
Minimal superkey (no proper subset is a superkey)
If more than one key: choose one as a primary key
• Example:
Key 1: LogID (primary key)
Key 2: AccountId, Xact#
Superkeys: all supersets of the keys
Log(LogId, AccountId, Xact#, Time, Amount)
LogId  AccountId  Xact#  Time      Amount
1001   111        4      1/12/02   $100
1002   122        4      12/28/01  $20
1003   333        6      9/1/00    $60
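As a rough illustration of "minimal superkey", the sketch below checks uniqueness of an attribute set against the three sample Log rows. The function names are hypothetical, and Amount is stored as a plain integer for simplicity. Note the limitation: a key is a constraint on the schema, so a sample instance can only refute a proposed key, never prove one.

```python
from itertools import combinations

# The three sample rows of Log from the table above.
LOG = [
    {"LogId": 1001, "AccountId": 111, "Xact#": 4, "Time": "1/12/02", "Amount": 100},
    {"LogId": 1002, "AccountId": 122, "Xact#": 4, "Time": "12/28/01", "Amount": 20},
    {"LogId": 1003, "AccountId": 333, "Xact#": 6, "Time": "9/1/00", "Amount": 60},
]

def is_superkey_in_instance(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True

def is_minimal_in_instance(rows, attrs):
    """True if attrs is unique in rows and no proper subset of attrs is."""
    if not is_superkey_in_instance(rows, attrs):
        return False
    return not any(
        is_superkey_in_instance(rows, subset)
        for size in range(1, len(attrs))
        for subset in combinations(attrs, size)
    )
```

On these three rows AccountId happens to be unique by itself, so the instance check reports (AccountId, Xact#) as non-minimal. That does not contradict the slide: the designer declares Key 2 because other legal instances may repeat an AccountId.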
Integrity Rules
There are two Integrity Rules that every relation should
follow :
1. Entity Integrity (Rule 1)
2. Referential Integrity (Rule 2)
Examples of Referential Integrity
Account                              Customer
AccountId  CustomerId  Balance       Id  Name  Addr
150        20          11,000        20  Tom   Irvine
160        23          2,300         23  Jane  LA
180        23          32,000        32  Jack  Riverside
Foreign key: Account.CustomerId references Customer.Id
Student                 Dept
Id    Name   Dept       Name  Chair
1111  Mike   ICS        ICS   Tom
2222  Harry  CE         CE    Jane
3333  Ford   ICS        MATH  Jack
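A minimal sketch of a referential integrity check over the two examples above, assuming each relation is held as a list of dictionaries. The helper name dangling_references is hypothetical; a NULL (None) foreign key value is treated as allowed.

```python
def dangling_references(referencing, fk, referenced, key):
    """Rows of `referencing` whose non-NULL fk value has no match in `referenced`."""
    valid = {row[key] for row in referenced}
    return [
        row for row in referencing
        if row[fk] is not None and row[fk] not in valid
    ]

account = [
    {"AccountId": 150, "CustomerId": 20, "Balance": 11000},
    {"AccountId": 160, "CustomerId": 23, "Balance": 2300},
    {"AccountId": 180, "CustomerId": 23, "Balance": 32000},
]
customer = [
    {"Id": 20, "Name": "Tom", "Addr": "Irvine"},
    {"Id": 23, "Name": "Jane", "Addr": "LA"},
    {"Id": 32, "Name": "Jack", "Addr": "Riverside"},
]
student = [
    {"Id": 1111, "Name": "Mike", "Dept": "ICS"},
    {"Id": 2222, "Name": "Harry", "Dept": "CE"},
    {"Id": 3333, "Name": "Ford", "Dept": "ICS"},
]
dept = [
    {"Name": "ICS", "Chair": "Tom"},
    {"Name": "CE", "Chair": "Jane"},
    {"Name": "MATH", "Chair": "Jack"},
]

# Both sample databases satisfy the rule, so both lists are empty.
account_violations = dangling_references(account, "CustomerId", customer, "Id")
student_violations = dangling_references(student, "Dept", dept, "Name")
```

A referenced row with no referencing rows (customer 32, department MATH) is fine; only the referencing side is constrained.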
• INTERSECTION of R and S
the intersection of R and S is a relation that includes all tuples
that are both in R and S.
• DIFFERENCE of R and S
the difference of R and S is the relation that contains all the
tuples that are in R but that are not in S.
Union, Intersection, Difference
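Because a relation is a set of tuples, the three set operators map directly onto Python's built-in set type. The sketch below uses two small hypothetical relations R and S with the same schema (union compatibility is assumed, not checked).

```python
# Two union-compatible relations, each a set of (id, name) tuples.
# The data is made up for illustration.
R = {(1, "Ann"), (2, "Bob"), (3, "Cy")}
S = {(2, "Bob"), (3, "Cy"), (4, "Dee")}

union = R | S          # tuples in R, in S, or in both
intersection = R & S   # tuples in both R and S
difference = R - S     # tuples in R but not in S
```

Unlike union and intersection, difference is not commutative: R - S and S - R are generally different relations.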
For example, find all employees born after 1st Jan 1950:
SELECT dob > ’01/JAN/1950’ (employee)
or
σ dob > ’01/JAN/1950’ (employee)
OR
π ename, salary (employee)
Projection
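Selection (σ) keeps whole rows that satisfy a predicate; projection (π) keeps chosen columns and removes duplicate tuples. A small sketch follows, assuming a hypothetical employee relation stored as a list of dictionaries; the rows and the helper names select and project are made up for illustration.

```python
from datetime import date

employee = [
    {"ename": "Ann", "dob": date(1948, 5, 2), "salary": 5000},
    {"ename": "Bob", "dob": date(1962, 11, 30), "salary": 4200},
    {"ename": "Cy", "dob": date(1971, 1, 15), "salary": 4200},
]

def select(rows, predicate):
    """σ: keep the rows for which predicate(row) is true."""
    return [row for row in rows if predicate(row)]

def project(rows, attrs):
    """π: keep only attrs and drop duplicate tuples (set semantics)."""
    seen, out = set(), []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            out.append(dict(zip(attrs, values)))
    return out

born_after_1950 = select(employee, lambda r: r["dob"] > date(1950, 1, 1))
names_and_salaries = project(employee, ["ename", "salary"])
distinct_salaries = project(employee, ["salary"])
```

Projecting on salary alone returns two tuples rather than three, which is the duplicate elimination that distinguishes relational projection from SQL's plain SELECT.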
Emp × Contact
E.Name  Dept     C.Name  Addr
Jack    Physics  Jack    Irvine
Jack    Physics  Tom     LA
Jack    Physics  Mary    Riverside
Tom     ICS      Jack    Irvine
Tom     ICS      Tom     LA
Tom     ICS      Mary    Riverside
JOIN Example
JOIN Operator
R ⋈_C S, where C is the join condition
Join
R ⋈_C S = σ_C (R × S)
• Join condition C is of the form:
<cond_1> AND <cond_2> AND … AND <cond_k>
Each cond_i is of the form A op B, where:
– A is an attribute of R, B is an attribute of S
– op is a comparison operator: =, <, >, ≤, ≥, or ≠.
• Different types:
– Theta-join
– Equi-join
– Natural join
Theta-Join
R ⋈_{R.A > S.C} S
R(A,B)      S(C,D)
A  B        C  D
3  4        2  7
5  7        6  8
R × S                     Result
R.A  R.B  S.C  S.D        R.A  R.B  S.C  S.D
3    4    2    7          3    4    2    7
3    4    6    8          5    7    2    7
5    7    2    7
5    7    6    8
Theta-Join
R ⋈_{R.A > S.C AND R.B ≠ S.D} S
R(A,B)      S(C,D)
A  B        C  D
3  4        2  7
5  7        6  8
R × S                     Result
R.A  R.B  S.C  S.D        R.A  R.B  S.C  S.D
3    4    2    7          3    4    2    7
3    4    6    8
5    7    2    7
5    7    6    8
Equi-Join
R ⋈_{R.B = S.D} S
R(A,B)      S(C,D)
A  B        C  D
3  4        2  7
5  7        6  8
R × S                     Result
R.A  R.B  S.C  S.D        R.A  R.B  S.C  S.D
3    4    2    7          5    7    2    7
3    4    6    8
5    7    2    7
5    7    6    8
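The theta-join and equi-join examples can be reproduced with a direct reading of the definition: build the cross product, then keep the pairs that satisfy the condition. This is only a sketch; the function name theta_join is hypothetical, and merging the two rows with {**x, **y} assumes R and S have no attribute names in common.

```python
from itertools import product

R = [{"A": 3, "B": 4}, {"A": 5, "B": 7}]
S = [{"C": 2, "D": 7}, {"C": 6, "D": 8}]

def theta_join(r, s, condition):
    """Selection over the cross product: keep pairs where condition holds."""
    return [{**x, **y} for x, y in product(r, s) if condition(x, y)]

# Theta-join on R.A > S.C
greater = theta_join(R, S, lambda x, y: x["A"] > y["C"])

# Equi-join on R.B = S.D (a theta-join whose only operator is =)
equal = theta_join(R, S, lambda x, y: x["B"] == y["D"])
```

A real DBMS avoids materializing the full cross product (for example with hash or sort-merge joins); the nested enumeration here is chosen only because it mirrors the definition.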
Natural-Join
R ⋈ S
R                  S
Name  Dept         Name  Addr
Jack  Physics      Jack  Irvine
Tom   ICS          Mike  LA
                   Mary  Riverside
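A natural join is an equi-join on all attributes the two relations share, with each shared attribute kept once. The sketch below applies that reading to R and S above; the function name natural_join is hypothetical, and the shared attributes are taken from the first row of each relation.

```python
def natural_join(r, s):
    """Equi-join on all common attribute names, keeping each name once."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    out = []
    for x in r:
        for y in s:
            if all(x[a] == y[a] for a in common):
                out.append({**x, **y})  # shared attributes have equal values
    return out

R = [{"Name": "Jack", "Dept": "Physics"}, {"Name": "Tom", "Dept": "ICS"}]
S = [
    {"Name": "Jack", "Addr": "Irvine"},
    {"Name": "Mike", "Addr": "LA"},
    {"Name": "Mary", "Addr": "Riverside"},
]

joined = natural_join(R, S)
```

Only Jack appears in both relations, so Tom, Mike, and Mary are dangling tuples and drop out of the result.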
• Outer join: natural join, but use NULL values to fill in dangling tuples.
• Three types: “left”, “right”, or “full”
Left Outer Join
R                  S
Name  Dept         Name  Addr
Jack  Physics      Jack  Irvine
Tom   ICS          Mike  LA
                   Mary  Riverside
Pad null values for both left and right dangling tuples.
OUTER JOIN Example 1
R LEFT OUTER JOIN_{R.ColA = S.SColA} S
R              Result (R columns, then S columns; - is NULL)
ColA  ColB
A     1        A  1  A  1
B     2        D  3  D  3
D     3        E  5  E  4
F     4        B  2  -  -
E     5        F  4  -  -
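The left outer join in this example can be sketched as follows. The S relation is not listed on the slide, so the three S rows below are inferred from the matched result rows, and the column name SColB is an assumption; None stands for NULL.

```python
def left_outer_join(r, s, condition, s_attrs):
    """Keep every row of r; pad with None when no row of s matches."""
    out = []
    for x in r:
        matches = [y for y in s if condition(x, y)]
        if matches:
            out.extend({**x, **y} for y in matches)
        else:
            out.append({**x, **{a: None for a in s_attrs}})
    return out

R = [
    {"ColA": "A", "ColB": 1},
    {"ColA": "B", "ColB": 2},
    {"ColA": "D", "ColB": 3},
    {"ColA": "F", "ColB": 4},
    {"ColA": "E", "ColB": 5},
]
# Inferred from the result table; SColB is a hypothetical column name.
S = [
    {"SColA": "A", "SColB": 1},
    {"SColA": "D", "SColB": 3},
    {"SColA": "E", "SColB": 4},
]

result = left_outer_join(
    R, S, lambda x, y: x["ColA"] == y["SColA"], ["SColA", "SColB"]
)
```

The rows come out in R's order rather than with the padded rows last; a relation is unordered, so both listings denote the same result.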
π balance (σ custssn=ssn (account × (σ name=tom (customer))))
Tree representation:
π balance
  σ custssn=ssn
    ×
      account
      σ name=tom
        customer
Example 1(cont)
π balance
  ⋈ ssn=custssn
    account
    σ name=tom
      customer
Division Operator
Not supported as a primitive operator, but useful for expressing
queries like:
“Find sailors who have reserved all boats.”
Let A have 2 fields, x and y; B have only field y:
A/B contains all x tuples (sailors) such that for every y tuple
(boat) in B, there is an xy tuple in A.
Or: if the set of y values (boats) associated with an x value
(sailor) in A contains all y values in B, the x value is in A/B.
• In general, x and y can be any lists of fields; y is the list of
fields in B, and xy is the list of fields of A.
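The second phrasing of division translates almost word for word into code: collect the y values seen with each x, then keep the x values whose collection covers B. A minimal sketch, with a hypothetical function name and made-up sailor and boat identifiers:

```python
def divide(a, b):
    """A/B for A given as (x, y) pairs and B given as a collection of y values."""
    b = set(b)
    ys_by_x = {}
    for x, y in a:
        ys_by_x.setdefault(x, set()).add(y)
    # Keep x when every y in B appears with it in A.
    return {x for x, ys in ys_by_x.items() if b <= ys}

# Hypothetical data: (sailor, boat) reservations and the set of all boats.
reserves = [
    ("s1", "b1"), ("s1", "b2"),
    ("s2", "b1"),
    ("s3", "b1"), ("s3", "b2"), ("s3", "b3"),
]
boats = {"b1", "b2"}

sailors_with_all_boats = divide(reserves, boats)
```

Sailor s3 qualifies even though s3 also reserved b3, which is not in B: division asks for coverage of B, not equality with it.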
Division Example
Example 1
STUDENT(Sno, Name, Major, Ddate)
COURSE(Cno, cname, Dept)
ENROLL(Sno, Cno, quarter, BISBN)
TEXT(BISBN, title, Publisher, Author)
1. List the names of courses taken by at least one
student with quarter =‘w99’
2. List any department that has all its books
published by ‘BPB’
Example 1 cont…
SQL Solution:
SELECT dept FROM Course
WHERE cno IN (
  (SELECT E.cno FROM Enroll E, Text T
   WHERE E.BISBN = T.BISBN
   AND T.Publisher = 'BPB')
  MINUS
  (SELECT E.cno FROM Enroll E, Text T
   WHERE E.BISBN = T.BISBN
   AND T.Publisher <> 'BPB')
)
Example 2
EMPLOYEE(eno, name, age, dno, salary)
WORK_ON(pno, eno)
PROJECT(pno, pname, location)
1. Display the names of projects at “Delhi”
2. Find the project name of employee whose salary
is greater than 10000.
3. Retrieve the name and eno of employee working
on pno =100.
Example 3
SUPPLIER(sid, sname, saddr)
PARTS(pid, pname, color)
CATALOG(sid, pid, cost)
1. Find the name of all suppliers who supply yellow parts.
2. Find the name of suppliers who supply both blue and black
parts.
3. Find supplier ids who supply all parts.
4. Find supplier ids who do not supply red parts.
Example 3 cont..
SQL Solution (2):
SELECT sname FROM Supplier S, Catalog C
WHERE S.sid = C.sid AND C.sid IN
  (SELECT C2.sid FROM Catalog C2, Parts P WHERE
   C2.pid = P.pid AND P.color = 'Blue')
INTERSECT
SELECT sname FROM Supplier S, Catalog C
WHERE S.sid = C.sid AND C.sid IN
  (SELECT C2.sid FROM Catalog C2, Parts P WHERE
   C2.pid = P.pid AND P.color = 'Black')
Example 3 cont..
SQL Solution (3):
SELECT sid FROM Catalog GROUP BY sid
HAVING count(*) = (SELECT count(pid) FROM Parts)
Functional Dependencies
• Motivation: avoid redundancy in database design.
Relation R(A1,...,An,B1,...,Bm,C1,...,Cl)
Definition: A1,...,An functionally determine B1,...,Bm, i.e.,
(A1,...,An → B1,...,Bm)
iff for any two tuples r1 and r2 in R,
r1(A1,...,An) = r2(A1,...,An)
implies r1(B1,...,Bm) = r2(B1,...,Bm)
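The definition can be tested against a given relation instance: group the tuples by their left-hand-side values and check that each group has a single right-hand-side value. The sketch below does that; the function name and the sample rows are hypothetical. As with keys, an instance can show that an FD is violated but cannot prove that it holds for the schema in general.

```python
def fd_holds(rows, lhs, rhs):
    """True if any two rows that agree on lhs also agree on rhs."""
    seen = {}
    for row in rows:
        left = tuple(row[a] for a in lhs)
        right = tuple(row[a] for a in rhs)
        # setdefault stores the first rhs value seen for this lhs value.
        if seen.setdefault(left, right) != right:
            return False
    return True

# Made-up rows over employee(ssno, name, salary).
rows = [
    {"ssno": 1, "name": "Tom", "salary": 100},
    {"ssno": 2, "name": "Tom", "salary": 200},
    {"ssno": 1, "name": "Tom", "salary": 100},
]

ssno_determines_rest = fd_holds(rows, ["ssno"], ["name", "salary"])
name_determines_salary = fd_holds(rows, ["name"], ["salary"])
```

Here ssno → name, salary holds on the sample, while name → salary fails because two rows share the name Tom but differ in salary.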
ICS184 53
Example
Trivial Dependencies
• Reflexivity:
– If Y is a subset of X, then X → Y.
– Example: AB → A, ABC → AB, etc.
• Augmentation:
– If X → Y, then XZ → YZ.
– Example: If A → B, then AC → BC.
• Transitivity:
– If X → Y, and Y → Z, then X → Z.
– Example: If AB → C, and C → D, then AB → D.
More Rules Derived from AAs
• Projectivity
If X → YZ, then X → Y and X → Z
• Pseudo-Transitivity Rule:
If X → Y, WY → Z, then WX → Z
Algo to find closure
To find the closure X+ of X under FDs in F
X+ = X (initialize X+ with X)
Change = true
While Change do
Begin
  Change = false
  For each FD W → Z in F do
  Begin
    If W ⊆ X+ and Z ⊈ X+ then
      X+ = X+ ∪ Z
      Change = true
    End if
  End
End
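The closure procedure above can be written compactly in Python. This is a sketch under two assumptions that are not in the slides: attributes are single letters, so a string such as "ACD" stands for the set {A, C, D}, and each FD is a (lhs, rhs) pair of such strings. The loop only sets the change flag when new attributes are actually added, which is what guarantees termination.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # Fire the FD only if it applies and adds something new.
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

# FD set used for illustration: A → B, ACD → E, EF → GH
F = [("A", "B"), ("ACD", "E"), ("EF", "GH")]

closure_of_a = closure("A", F)
closure_of_acdf = closure("ACDF", F)
```

Each pass over F either adds at least one attribute or stops the loop, so the number of passes is bounded by the number of attributes.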
“Superkey”
• Given a set F of FDs for a relation, how to find the candidate keys?
• One naïve approach: consider each subset X of the relation's attributes, and
compute X+ to see if it includes every attribute.
• Tricks:
If an attribute A does not appear in the RHS of any FD, A must be in every
candidate key.
As a consequence, if A must be in every candidate key, and A → B is true, then
B should not be in any candidate key.
• Example:
R(A,B,C,D,E,F,G,H)
{A → B, ACD → E, EF → GH}
Candidate key: {ACDF}
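The naïve search combined with the first trick can be sketched as below. Attributes are assumed to be single letters and FDs are (lhs, rhs) string pairs; the function names are hypothetical. Attributes that never appear on a right-hand side are forced into every candidate, and the remaining attributes are added in increasing subset size so that any superkey containing an already found key is skipped as non-minimal.

```python
from itertools import combinations

def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def candidate_keys(attributes, fds):
    attributes = set(attributes)
    rhs_attrs = set().union(*(set(rhs) for _, rhs in fds))
    must_have = attributes - rhs_attrs        # never derivable, so in every key
    optional = sorted(attributes - must_have)
    keys = []
    for size in range(len(optional) + 1):     # smallest candidates first
        for extra in combinations(optional, size):
            candidate = must_have | set(extra)
            if any(key <= candidate for key in keys):
                continue                      # contains a smaller key
            if closure(candidate, fds) == attributes:
                keys.append(candidate)
    return keys

F = [("A", "B"), ("ACD", "E"), ("EF", "GH")]
keys = candidate_keys("ABCDEFGH", F)
```

The search is still exponential in the number of optional attributes in the worst case; the trick only shrinks the space that has to be enumerated.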
Equivalent FD Sets
• Two sets of FDs F and G are equivalent if F+ = G+. That is:
Each FD in F can be implied by G; and
Each FD in G can be implied by F
• Example:
F = {A → B, B → C, AB → C}
G = {A → B, B → C} F and G are equivalent.
• F is minimal if performing any one of the following operations
yields an FD set that is not equivalent to F:
Any FD is eliminated from F; or
Any attribute is eliminated from the left side of an FD in F; or
Any attribute is eliminated from the right side of an FD in F.
E.g.: G (above) is a minimal set of FDs of F.
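Equivalence of two FD sets can be checked without computing F+ and G+ in full: an FD X → Y is implied by a set exactly when Y is contained in the closure of X under that set. The sketch below assumes single-letter attributes and (lhs, rhs) string pairs, with hypothetical function names. The third dependency of F is read here as AB → C; reading it as A → BC leads to the same verdict.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs → rhs follows from fds."""
    return set(rhs) <= closure(lhs, fds)

def equivalent(f, g):
    """True if every FD of f follows from g and every FD of g follows from f."""
    return (all(implies(g, lhs, rhs) for lhs, rhs in f)
            and all(implies(f, lhs, rhs) for lhs, rhs in g))

F = [("A", "B"), ("B", "C"), ("AB", "C")]
G = [("A", "B"), ("B", "C")]

f_equals_g = equivalent(F, G)
```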
Examples : Minimizing FDs
• Example 1:
F = {A → B, B → C, A → C}
Minimal: F’ = {A → B, B → C}
Remove redundant FD
• Example 2:
F = {A → B, B → C, AC → D}
Minimal: F’ = {A → B, B → C, A → D}
Remove attributes from LHS
• Example 3:
F = {A → B, B → C, A → CD}
Minimal: F’ = {A → B, B → C, A → D}
Remove attributes from RHS
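The three kinds of simplification in these examples can be combined into one routine. The sketch below is one common ordering (split right-hand sides, shrink left-hand sides, drop redundant FDs), assuming single-letter attributes and (lhs, rhs) string pairs; the function names are hypothetical. A minimal set is generally not unique, so a different processing order may return a different but equivalent answer.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def minimal_cover(fds):
    # Step 1: split every right-hand side into single attributes.
    work = [(set(lhs), attr) for lhs, rhs in fds for attr in rhs]
    # Step 2: drop left-hand-side attributes that are not needed.
    for i, (lhs, attr) in enumerate(work):
        for b in sorted(lhs):
            smaller = lhs - {b}
            if smaller and attr in closure(smaller, work):
                lhs = smaller
                work[i] = (lhs, attr)
    # Step 3: drop FDs that are implied by the remaining ones.
    result = list(work)
    for fd in list(result):
        rest = [g for g in result if g is not fd]
        if fd[1] in closure(fd[0], rest):
            result = rest
    return [("".join(sorted(lhs)), attr) for lhs, attr in result]

example1 = minimal_cover([("A", "B"), ("B", "C"), ("A", "C")])
example2 = minimal_cover([("A", "B"), ("B", "C"), ("AC", "D")])
example3 = minimal_cover([("A", "B"), ("B", "C"), ("A", "CD")])
```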
The Normalization Process
• In relational databases the term normalization refers to a reversible
step-by-step process in which a given set of relations is
decomposed into a set of smaller relations that have a
progressively simpler and more regular structure.
[Figure: nested normal forms, from the outermost (least restrictive) inward]
1NF: functional dependency of nonkey attributes on the primary key; atomic values only
2NF: full functional dependency of nonkey attributes on the primary key
3NF: no transitive dependency between nonkey attributes
Boyce-Codd and higher: all determinants are candidate keys; single multivalued dependency
Relationship of Normal Forms
Normal Forms
1NF, 2NF, 3NF, BCNF, 4NF, 5NF
Unnormalized Relations
(one patient per row; repeating groups within a cell are separated by semicolons)
1111 | 145; 311 | Jan 1, 1995; June 12, 1995 | John White | 15 New St., New York, NY | Beth Little; Michael Diamond | Gallstones removal; Kidney stones removal | Penicillin; none | rash; none
1234 | 243; 467 | Apr 5, 1994; May 10, 1995 | Mary Jones | 10 Main St., Rye, NY | Charles Field; Patricia Gold | Eye cataract removal; Thrombosis removal | Tetracycline; none | Fever; none
2345 | 189 | Jan 8, 1996 | Charles Brown | Dogwood Lane, Harrison, NY | David Rosen | Open heart surgery | Cephalosporin | none
4876 | 145 | Nov 5, 1995 | Hal Kane | 55 Boston Post Road, Chester, CN | Beth Little | Cholecystectomy | Demicillin | none
5123 | 145 | May 10, 1995 | Paul Kosher | Blind Brook, Mamaroneck, NY | Beth Little | Gallstones removal | none | none
6845 | 243 | Apr 5, 1994; Dec 15, 1984 | Ann Hood | Hilton Road, Larchmont, NY | Charles Field | Eye cornea replacement; Eye cataract removal | Tetracycline | Fever
First Normal Form
• To be in First Normal Form, a relation must
contain only atomic values at each row and column.
No repeating groups.
A relation in 1NF contains only atomic
values.
First Normal Form
• Three Formal definitions of First Normal Form
Emp-ID Emp-Hrly-Rate
C
Third Normal Form
• A relation is in 3NF iff it is in 2NF and every nonkey attribute is
non-transitively dependent on the primary key.
• A relation r(R) is in Third Normal Form (3NF) if and only if the following
conditions are satisfied simultaneously:
r(R) is already in 2NF.
No nonprime attribute is transitively dependent on the key.
(A*, B, C) with A → B and B → C
Convert to
(A*, B) and (B*, C)
* indicates the key or the determinant of the relation.
Third Normal Form
• The Third Normal Form helped us to get rid of the data anomalies
caused either by
transitive dependencies on the PK or
by dependencies of a nonprime attribute on another nonprime attribute.
• 3NF
There is always a dependency-preserving, lossless-join
decomposition into a collection of 3NF relation schemas.
Properties of a good Decomposition
R = { A, B, C, D }
F = { A → B, C → D }.
Key is {AC}.
Decomposition: { (A, B), (C, D), (A, C) }
Consider it a two step decomposition:
1. Decompose R into R1 = (A, B), R2 = (A, C, D)
2. Decompose R2 into R3 = (C, D), R4 = (A, C)
This is a lossless join decomposition.
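Each step of the two-step decomposition can be checked with the standard test for splitting a relation into two: the split is lossless if the attributes common to both parts functionally determine all of one part. The sketch below applies that test; single-letter attributes, (lhs, rhs) string pairs, and the function names are assumptions made for illustration.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, given as (lhs, rhs) pairs."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result

def lossless_binary(r1, r2, fds):
    """True if (r1 ∩ r2) determines all of r1 or all of r2."""
    common = set(r1) & set(r2)
    reach = closure(common, fds)
    return set(r1) <= reach or set(r2) <= reach

F = [("A", "B"), ("C", "D")]

# Step 1: R(A,B,C,D) into R1(A,B) and R2(A,C,D); common attribute A, A → B.
step1 = lossless_binary("AB", "ACD", F)
# Step 2: R2(A,C,D) into R3(C,D) and R4(A,C); common attribute C, C → D.
step2 = lossless_binary("CD", "AC", F)
# Splitting directly into (A,B) and (C,D) shares no attribute at all.
direct = lossless_binary("AB", "CD", F)
```

The last call shows why the (A, C) relation is needed: without it the two parts have nothing in common and their join degenerates to a cross product.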
A B C D E A B C D E
R1 α α α R1 α α α α α
R2 α α R2 α α α α
R3 α α α R3 α α α
Now update the table while placing α and check whether any row contains α in all the columns. If so,
then the decomposition is lossless.
In this example the second table contains all α’s in the first row. So the decomposition is lossless.
Fourth Normal Form
A relation R is in 4NF if and only if it satisfies the following
conditions:
• R is already in 3NF or BCNF.
• R contains no multivalued dependencies.
The redundancy has been eliminated, but the information about which companies
make which products and which of these products they supply to which agents has
been lost. The natural join of these two projections will result in some spurious tuples
(additional tuples which were not present in the original relation).
Fifth Normal Form
This table can be decomposed into its three projections
without loss of information as demonstrated below .