Lecture 12
Outline
Relational decomposition
Normal forms
Begin relational algebra
SSN → Name, City (but not SSN → PhoneNumber)
Anomalies:
Redundancy = repeated data
Update anomalies = Fred moves to "Bellevue"
Deletion anomalies = Fred drops all phone numbers: what is his city?
Relation Decomposition
Break the relation into two:
Name   SSN           City
Fred   123-45-6789   Seattle
Joe    987-65-4321   Westfield
Conceptual Model:
[E/R diagram: entities Product and Person, relationship buys, attributes name, price, ssn]
Decompositions in General
R(A1, ..., An)
Create two relations R1(B1, ..., Bm) and R2(C1, ..., Cp) such that:
  {B1, ..., Bm} ∪ {C1, ..., Cp} = {A1, ..., An}
and:
  R1 = projection of R on B1, ..., Bm
  R2 = projection of R on C1, ..., Cp
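The definition above can be tried out on a small instance. The following is a minimal sketch, assuming relations are stored as lists of Python dicts; the helper name `project` is made up for illustration, and the data is the Person instance from this lecture.

```python
def project(rows, attrs):
    """Project each row onto attrs and drop duplicate tuples (set semantics)."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result

# Person(Name, SSN, PhoneNumber, City)
person = [
    {"Name": "Fred", "SSN": "123-45-6789", "PhoneNumber": "206-555-1234", "City": "Seattle"},
    {"Name": "Fred", "SSN": "123-45-6789", "PhoneNumber": "206-555-6543", "City": "Seattle"},
    {"Name": "Joe", "SSN": "987-65-4321", "PhoneNumber": "908-555-2121", "City": "Westfield"},
    {"Name": "Joe", "SSN": "987-65-4321", "PhoneNumber": "908-555-1234", "City": "Westfield"},
]

# R1 and R2 together cover every attribute of Person.
r1 = project(person, ["Name", "SSN", "City"])
r2 = project(person, ["SSN", "PhoneNumber"])
print(len(r1), len(r2))  # 2 4
```

R1 stores each person's city once, which is what removes the redundancy.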
Incorrect Decomposition
Sometimes it is incorrect:
Name          Price   Category
Gizmo         19.99   Gadget
OneClick      24.99   Camera
DoubleClick   29.99   Camera
Incorrect Decomposition
Name Gizmo
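A small experiment shows what "incorrect" means here. This is a sketch under an assumption: the split into (Name, Category) and (Price, Category) is chosen for illustration and is not spelled out in the surviving slide text. The helpers `project` and `natural_join` are hypothetical names.

```python
def project(rows, attrs):
    """Set-semantics projection; each tuple is a frozenset of (attribute, value) pairs."""
    return {frozenset((a, dict(r)[a]) for a in attrs) for r in rows}

def natural_join(r1, r2):
    """Join tuples that agree on all attributes they share."""
    out = set()
    for t1 in r1:
        for t2 in r2:
            d1, d2 = dict(t1), dict(t2)
            if all(d1[a] == d2[a] for a in d1.keys() & d2.keys()):
                out.add(frozenset({**d1, **d2}.items()))
    return out

product = {frozenset(d.items()) for d in [
    {"Name": "Gizmo", "Price": 19.99, "Category": "Gadget"},
    {"Name": "OneClick", "Price": 24.99, "Category": "Camera"},
    {"Name": "DoubleClick", "Price": 29.99, "Category": "Camera"},
]}

# Assumed split: Category is the only shared attribute.
r1 = project(product, ["Name", "Category"])
r2 = project(product, ["Price", "Category"])

rejoined = natural_join(r1, r2)
spurious = rejoined - product
print(len(product), len(rejoined), len(spurious))  # 3 5 2
```

The two extra tuples pair each camera with the other camera's price, so the original relation cannot be recovered from the pieces.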
Normal Forms
First Normal Form = all attributes are atomic
Second Normal Form (2NF) = old and obsolete
Third Normal Form (3NF) = this lecture
Boyce Codd Normal Form (BCNF) = this lecture
Others...
A relation R is in BCNF if: whenever there is a nontrivial dependency A1, ..., An → B in R, {A1, ..., An} is a key (more precisely, a superkey) for R
In English (though a bit vague): whenever a set of attributes of R determines another attribute, it should determine all the attributes of R.
Example
Name   SSN           PhoneNumber    City
Fred   123-45-6789   206-555-1234   Seattle
Fred   123-45-6789   206-555-6543   Seattle
Joe    987-65-4321   908-555-2121   Westfield
Joe    987-65-4321   908-555-1234   Westfield
What are the dependencies? SSN → Name, City
What are the keys? {SSN, PhoneNumber}
Is it in BCNF?
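The BCNF question can be checked mechanically with attribute closures. The sketch below assumes functional dependencies are given as (left side, right side) pairs of Python sets; `closure` and `bcnf_violations` are illustrative names. It only tests the dependencies that are listed, not every dependency they imply, which is enough for this example.

```python
def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def bcnf_violations(schema, fds):
    """Listed nontrivial dependencies whose left side does not determine all of schema."""
    bad = []
    for lhs, rhs in fds:
        nontrivial = not rhs <= lhs
        if nontrivial and closure(lhs, fds) != schema:
            bad.append((lhs, rhs))
    return bad

schema = {"Name", "SSN", "PhoneNumber", "City"}
fds = [({"SSN"}, {"Name", "City"})]

print(closure({"SSN"}, fds) == schema)                  # False: SSN alone is not a key
print(closure({"SSN", "PhoneNumber"}, fds) == schema)   # True
violations = bcnf_violations(schema, fds)
print(len(violations))                                  # 1
```

SSN determines Name and City but not PhoneNumber, so the dependency SSN → Name, City violates the BCNF condition.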
SSN → Name, City
[Diagram: the relation is decomposed into R1 and R2]
Example Decomposition
Person(name, SSN, age, hairColor, phoneNumber)
SSN → name, age
age → hairColor
Decompose in BCNF (in class):
Step 1: find all keys
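Step 1 can be done by brute force for a schema this small. The following sketch enumerates attribute subsets from smallest to largest and keeps those whose closure is the whole schema; `closure` and `candidate_keys` are hypothetical helper names, and the exponential search is only meant for classroom-sized examples.

```python
from itertools import combinations

def closure(attrs, fds):
    """All attributes determined by attrs under the given dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def candidate_keys(schema, fds):
    """Minimal attribute sets whose closure covers the whole schema."""
    keys = []
    attrs = sorted(schema)
    for size in range(1, len(attrs) + 1):
        for combo in combinations(attrs, size):
            subset = set(combo)
            if any(k <= subset for k in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(subset, fds) == schema:
                keys.append(subset)
    return keys

schema = {"name", "SSN", "age", "hairColor", "phoneNumber"}
fds = [({"SSN"}, {"name", "age"}), ({"age"}, {"hairColor"})]

keys = candidate_keys(schema, fds)
print(keys)  # one key: SSN together with phoneNumber
```

Neither SSN nor phoneNumber appears on the right side of any dependency, so both must be part of every key.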
Other Example
R(A,B,C,D)
A → B, B → C
Correct Decompositions
A decomposition is lossless if we can recover:
R(A,B,C)
Decompose: R1(A,B), R2(A,C)
Recover: R'(A,B,C) = R1 ⋈ R2, which should be the same as R(A,B,C)
Correct Decompositions
Given R(A,B,C) s.t. A → B, the decomposition into R1(A,B), R2(A,C) is lossless
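The claim can be illustrated (not proved) on concrete instances. In this sketch the tuples are invented sample data; `decompose_and_rejoin` is a hypothetical helper that projects onto (A,B) and (A,C) and joins the pieces back on A.

```python
def decompose_and_rejoin(r):
    """Project R(A,B,C) onto R1(A,B) and R2(A,C), then join them back on A."""
    r1 = {(a, b) for a, b, c in r}
    r2 = {(a, c) for a, b, c in r}
    return {(a, b, c) for (a, b) in r1 for (a2, c) in r2 if a == a2}

# Instance that satisfies A → B: each A value has a single B value.
good = {(1, "x", "p"), (1, "x", "q"), (2, "y", "p")}
print(decompose_and_rejoin(good) == good)  # True

# Instance that violates A → B: A = 1 appears with two different B values.
bad = {(1, "x", "p"), (1, "y", "q")}
print(decompose_and_rejoin(bad) == bad)    # False
print(len(decompose_and_rejoin(bad)))      # 4
```

When A → B fails, the join pairs every B value with every C value for the same A, which creates tuples that were never in R.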
No problem so far. All local FDs are satisfied.
Let's put all the data back into a single table again:

Unit       Company   Product
Galaga99   UW        databases
Bingo      UW        databases
Relational Algebra
Formalism for creating new relations from existing ones
Its place in the big picture:
Declarative query language: SQL, relational calculus
Implementation
Relational Algebra
Five operators:
Union: ∪
Difference: -
Selection: σ
Projection: π
Cartesian Product: ×
2. Difference
Notation: R1 - R2
Example:
AllEmployees - RetiredEmployees
3. Selection
Returns all tuples which satisfy a condition
Notation: σ c (R)
Examples:
σ Salary > 40000 (Employee)
σ name = "Smith" (Employee)
Find all employees with salary more than $40,000:
σ Salary > 40000 (Employee)
[Tables: Employee input with DepartmentID values 1, 1, 2; the result keeps the row with DepartmentID 2 and Salary 45,000]
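Selection is a row filter. Here is a minimal sketch, assuming rows are Python dicts; the names, SSNs, and the two lower salaries are invented for illustration, since only part of the Employee table survives in these notes.

```python
def select(rows, condition):
    """Selection: keep the rows for which condition(row) is true."""
    return [row for row in rows if condition(row)]

# Hypothetical Employee instance; only DepartmentID values and the 45,000 salary come from the slide.
employee = [
    {"SSN": "999999999", "Name": "John", "DepartmentID": 1, "Salary": 30000},
    {"SSN": "777777777", "Name": "Tony", "DepartmentID": 1, "Salary": 32000},
    {"SSN": "888888888", "Name": "Alice", "DepartmentID": 2, "Salary": 45000},
]

high_paid = select(employee, lambda row: row["Salary"] > 40000)
print([(row["DepartmentID"], row["Salary"]) for row in high_paid])  # [(2, 45000)]
```

The output schema is the same as the input schema; only the set of rows changes.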
4. Projection
Eliminates columns, then removes duplicates
Notation: π A1,...,An (R)
Example: project social-security number and names:
π SSN, Name (Employee)
Output schema: Answer(SSN, Name)
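Both steps of projection (drop columns, then drop duplicates) fit in a few lines. This is a sketch with rows as Python dicts; the Employee rows are invented sample data and `project` is an illustrative helper name.

```python
def project(rows, attrs):
    """Projection with set semantics: keep attrs, then remove duplicate tuples."""
    seen = set()
    result = []
    for row in rows:
        values = tuple(row[a] for a in attrs)
        if values not in seen:
            seen.add(values)
            result.append(dict(zip(attrs, values)))
    return result

# Hypothetical Employee instance.
employee = [
    {"SSN": "999999999", "Name": "John", "DepartmentID": 1, "Salary": 30000},
    {"SSN": "777777777", "Name": "Tony", "DepartmentID": 1, "Salary": 32000},
    {"SSN": "888888888", "Name": "Alice", "DepartmentID": 2, "Salary": 45000},
]

answer = project(employee, ["SSN", "Name"])
departments = project(employee, ["DepartmentID"])
print(len(answer))   # 3
print(departments)   # [{'DepartmentID': 1}, {'DepartmentID': 2}]
```

Projecting on DepartmentID shows the duplicate removal: three input rows yield two output rows.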
5. Cartesian Product
Each tuple in R1 with each tuple in R2
Notation: R1 × R2
Example:
Employee × Dependents
Cartesian Product Example

Employee
Name   SSN
John   999999999
Tony   777777777

Dependents
EmployeeSSN
999999999
777777777

Employee × Dependents
Name   SSN         EmployeeSSN
John   999999999   999999999
John   999999999   777777777
Tony   777777777   999999999
Tony   777777777   777777777
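The example above can be reproduced with the standard library. This sketch stores each tuple as a Python tuple and uses `itertools.product` to pair every Employee tuple with every Dependents tuple; only the columns visible in the example are included.

```python
from itertools import product

# Employee(Name, SSN) and Dependents(EmployeeSSN), as in the example.
employee = [("John", "999999999"), ("Tony", "777777777")]
dependents = [("999999999",), ("777777777",)]

# Concatenate each pair of tuples: schema (Name, SSN, EmployeeSSN).
result = [e + d for e, d in product(employee, dependents)]
for row in result:
    print(row)
print(len(result))  # 4 = 2 * 2
```

The result always has |R1| * |R2| tuples, which is why the product is rarely used on its own and usually appears inside a join.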
Renaming
Changes the schema, not the instance
Notation: ρ B1,...,Bn (R)
Example:
ρ LastName, SocSocNo (Employee)
Output schema: Answer(LastName, SocSocNo)
Renaming Example
Employee
Name   SSN
John   999999999
Tony   777777777
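Renaming is easy to see when a relation is stored as a (schema, rows) pair. The sketch below assumes that representation; `rename` is an illustrative helper, and it returns the very same rows under new attribute names.

```python
def rename(relation, new_schema):
    """Renaming: replace the attribute names, leave the tuples untouched."""
    schema, rows = relation
    if len(new_schema) != len(schema):
        raise ValueError("new schema must have the same number of attributes")
    return (tuple(new_schema), rows)

employee = (("Name", "SSN"), [("John", "999999999"), ("Tony", "777777777")])

answer = rename(employee, ["LastName", "SocSocNo"])
print(answer[0])                 # ('LastName', 'SocSocNo')
print(answer[1] is employee[1])  # True: the instance is unchanged
```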
Natural Join
Notation: R1 ⋈ R2
Meaning: R1 ⋈ R2 = π A (σ C (R1 × R2))
Where:
The selection σ C checks equality of all common attributes
The projection eliminates the duplicate common attributes
Natural Join Example

Employee
Name   SSN
John   999999999
Tony   777777777

Dependents
SSN         Dname
999999999   Emily
777777777   Joe

Employee ⋈ Dependents = π Name, SSN, Dname (σ SSN=SSN2 (Employee × ρ SSN2, Dname (Dependents)))

Name   SSN         Dname
John   999999999   Emily
Tony   777777777   Joe
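The same join can be written directly, without spelling out the product, selection, and projection. This is a minimal nested-loop sketch with rows as Python dicts; `natural_join` is an illustrative name, and merging the two dicts is what removes the duplicate common attribute.

```python
def natural_join(r1, r2):
    """Natural join: pair tuples that agree on every attribute name they share."""
    result = []
    for t1 in r1:
        for t2 in r2:
            common = t1.keys() & t2.keys()
            if all(t1[a] == t2[a] for a in common):
                result.append({**t1, **t2})  # common attributes appear once
    return result

employee = [
    {"Name": "John", "SSN": "999999999"},
    {"Name": "Tony", "SSN": "777777777"},
]
dependents = [
    {"SSN": "999999999", "Dname": "Emily"},
    {"SSN": "777777777", "Dname": "Joe"},
]

joined = natural_join(employee, dependents)
for row in joined:
    print(row["Name"], row["SSN"], row["Dname"])
```

If the two relations share no attribute names, the condition is vacuously true and the function degenerates to the Cartesian product.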
Natural Join
R =
A   B
X   Y
X   Z
Y   Z
Z   V

S =
B   C
Z   U
V   W
Z   V

R ⋈ S =
A   B   C
X   Z   U
X   Z   V
Y   Z   U
Y   Z   V
Z   V   W
Natural Join
Given the schemas R(A, B, C, D), S(A, C, E), what is the schema of R ⋈ S?
Given R(A, B, C), S(D, E), what is R ⋈ S?
Given R(A, B), S(A, B), what is R ⋈ S?
Theta Join
A join that involves a predicate
R1 ⋈ θ R2 = σ θ (R1 × R2)
Here θ can be any condition
Eq-join
A theta join where θ is an equality
R1 ⋈ A=B R2 = σ A=B (R1 × R2)
Example:
Employee ⋈ SSN=SSN Dependents
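A theta join is a product followed by a filter, and an eq-join is the case where the filter is an equality. The sketch below assumes the two relations use distinct attribute names (here the Dependents column is called EmployeeSSN, an assumption made so that merged rows do not overwrite each other); `theta_join` is an illustrative name.

```python
def theta_join(r1, r2, theta):
    """Theta join: keep the pairs from the Cartesian product that satisfy theta."""
    return [{**t1, **t2} for t1 in r1 for t2 in r2 if theta(t1, t2)]

employee = [
    {"Name": "John", "SSN": "999999999"},
    {"Name": "Tony", "SSN": "777777777"},
]
dependents = [
    {"EmployeeSSN": "999999999", "Dname": "Emily"},
    {"EmployeeSSN": "777777777", "Dname": "Joe"},
]

# Eq-join: theta is an equality between one attribute from each side.
eq = theta_join(employee, dependents, lambda e, d: e["SSN"] == d["EmployeeSSN"])

# Any other predicate works as well, for example an inequality.
other = theta_join(employee, dependents, lambda e, d: e["SSN"] != d["EmployeeSSN"])

print([(r["Name"], r["Dname"]) for r in eq])     # [('John', 'Emily'), ('Tony', 'Joe')]
print([(r["Name"], r["Dname"]) for r in other])  # [('John', 'Joe'), ('Tony', 'Emily')]
```

Unlike the natural join, the eq-join keeps both compared columns in the output.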
Semijoin
R ⋉ S = π A1,...,An (R ⋈ S)
Where A1, ..., An are the attributes in R
Example:
Employee ⋉ Dependents
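A semijoin keeps the tuples of R that have at least one join partner in S, and nothing from S itself. A minimal sketch, with rows as Python dicts; the third employee (Ann) is invented so that one tuple has no partner, and `semijoin` is an illustrative name.

```python
def semijoin(r, s):
    """Semijoin: tuples of r that match some tuple of s on all shared attributes."""
    result = []
    for t in r:
        has_partner = any(
            all(t[a] == u[a] for a in t.keys() & u.keys()) for u in s
        )
        if has_partner and t not in result:
            result.append(t)
    return result

employee = [
    {"Name": "John", "SSN": "999999999"},
    {"Name": "Tony", "SSN": "777777777"},
    {"Name": "Ann", "SSN": "555555555"},  # hypothetical: has no dependents
]
dependents = [
    {"SSN": "999999999", "Dname": "Emily"},
    {"SSN": "777777777", "Dname": "Joe"},
]

with_dependents = semijoin(employee, dependents)
print([row["Name"] for row in with_dependents])  # ['John', 'Tony']
```

The output schema is exactly the schema of R, which is what the projection in the definition expresses.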
[Diagram: Employee and Dependents communicate over a network]
Employee ⋈ ssn=ssn (σ age>71 (Dependents))
T = π SSN (σ age>71 (Dependents))
R = Employee ⋉ T
Answer = R ⋈ σ age>71 (Dependents)
Complex RA Expressions
π name
⋈ buyer-ssn=ssn    ⋈ pid=pid
⋈ seller-ssn=ssn
π ssn (σ name=fred)
Person    Purchase    Person
π pid (σ name=gizmo)
Product
Operations on Bags
A bag = a set with repeated elements
All operations need to be defined carefully on bags
{a,b,b,c} ∪ {a,b,b,b,e,f,f} = {a,a,b,b,b,b,b,c,e,f,f}
{a,b,b,b,c,c} - {b,c,c,c,d} = {a,b,b}
σ C (R): preserve the number of occurrences
π A (R): no duplicate elimination
Cartesian product, join: no duplicate elimination
Important! Relational engines work on bags, not sets!
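Python's `collections.Counter` behaves like a bag, so the union and difference rules can be checked directly. This is a sketch: `+` adds occurrence counts (bag union) and `-` subtracts them while dropping anything that would fall to zero or below (bag difference).

```python
from collections import Counter

def bag(elements):
    """Build a bag (multiset) from a string of one-letter elements."""
    return Counter(elements)

def as_list(b):
    """Sorted list of elements, each repeated according to its count."""
    return sorted(b.elements())

# Bag union: occurrence counts add up.
union = bag("abbc") + bag("abbbeff")
print(as_list(union))       # ['a', 'a', 'b', 'b', 'b', 'b', 'b', 'c', 'e', 'f', 'f']

# Bag difference: counts are subtracted and never go below zero,
# so c (2 - 3) disappears and d (0 - 1) never appears.
difference = bag("abbbcc") - bag("bcccd")
print(as_list(difference))  # ['a', 'b', 'b']

# Projection on bags keeps duplicates: three rows in, three values out.
rows = [("John", 1), ("Tony", 1), ("Alice", 2)]
departments = [dept for _, dept in rows]
print(departments)          # [1, 1, 2]
```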
Reading assignment: 5.3 5.4
Find all direct and indirect relatives of Fred
Cannot express in RA!!!
Need to write a C program
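The query needs a loop that keeps following the relation until nothing new turns up, which is the part a fixed RA expression cannot do. The sketch below shows that loop in Python rather than C, purely for brevity; the `relative_of` data and the function name `all_relatives` are invented for illustration.

```python
def all_relatives(person, relative_of):
    """Direct and indirect relatives of person, found by repeated expansion."""
    found = set()
    frontier = {person}
    while frontier:  # stop when a round discovers nobody new
        next_frontier = set()
        for p in frontier:
            for q in relative_of.get(p, ()):
                if q != person and q not in found:
                    found.add(q)
                    next_frontier.add(q)
        frontier = next_frontier
    return found

# Hypothetical "is a direct relative of" data.
relative_of = {
    "Fred": ["Mary"],
    "Mary": ["Joe", "Fred"],
    "Joe": ["Lou"],
    "Ann": ["Bob"],
}

print(sorted(all_relatives("Fred", relative_of)))  # ['Joe', 'Lou', 'Mary']
```

Each round of the loop corresponds to one more join with the relation; the number of rounds depends on the data, so no single expression with a fixed number of joins covers every instance.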