Lecture 02 - Database Concepts
Learning Objectives
DATA NORMALIZATION: Discuss the problems or anomalies caused by unnormalized databases and the need for data normalization.
What is a Database?
COMM 335 1
©2021 COMM335 Teaching Team. This content is protected and may not be shared, uploaded, or distributed.
Data is shared by various programs
Potential problems:
  Data Redundancy
  Data Inconsistency
  Data Isolation
Byte = 8 Bits
Bit = 0 or 1
Primary Keys
Primary Key (PK) is an attribute or a collection of attributes whose value(s) uniquely and concisely identify each row in a table
  Composite (Concatenated) Key consists of two or more attributes
  Candidate Keys are all possible primary keys in one table
  Natural Key is used in real life and recognized as the identifier
  Surrogate (Synthetic) Key is a generated serial number, used when no suitable primary key is available or when a surrogate is preferred (e.g. MS Access "AutoNumber")

Foreign Keys
Values in one table may relate to records in other table(s)
Foreign Key (FK) is an attribute or a collection of attributes that corresponds to the primary key(s) of a related table

Students Table
Student# | Name | Address
1234567 | John Smith | West Mall
9385093 | Mary Cox | Main Mall
2040544 | Mike Luck | East Mall

Courses Table
Course# | Title | Credits
Comm437 | Database | 3.0
Comm335 | MS Access | 1.5
Comm436 | IS Analysis | 3.0

Grades Table
Student# | Course# | Grade
1234567 | Comm335 | 88
1234567 | Comm437 | 85
2040544 | Comm335 | 70

Students to Grades is 1:N and Courses to Grades is 1:N (1:N means one to many); Student# and Course# in the Grades table are foreign keys.
Enterprise databases include:
  Internal Databases: Operational Databases and Data Warehouses
  External Databases
    Accessed over the Internet
    Owned by other organizations, such as the SEC database, Gartner Research database, etc.
Operational Databases
  Databases used in the operations of the business, such as Accounting, HR, Marketing, Operations, etc.
  Most are Relational Databases
  Used for transaction processing, supporting cross-functional systems (SCM, HRM, CRM, ERP)
Data Warehouse
  Contains data collected from a variety of sources
  Data NOT used for routine business activities
  Used for data analytics and Business Intelligence (BI) purposes

To design a database, you need to have a conceptual (abstract) view of the entire database
  The conceptual view illustrates the different entities and the relationships between them
  Database designers use entity-relationship diagrams to visualize the database elements
  The data dictionary is a “blueprint” of the structure of the database and includes data elements, field types, programs that use each data element, outputs, and so on
MS Access Field Properties
Data Element Name | Description | Records in Which Contained | Source | Field Length | Field Type | Programs in Which Used | Outputs in Which Contained | Authorized Users
Customer ID | Unique identifier of each customer | Customer record, A/R record, sales record | Customer number listing | 10 | Numeric | Customer file update, A/R update, sales update, credit analysis | Customer status report, A/R aging report, sales report, credit report | No restrictions
Customer name | Complete name of customer | Customer record | Initial customer order | 20 | Alphanumeric | Customer file update, statement processing | Customer status report, monthly statement | No restrictions
Address | Street, city, | Customer | Credit | 30 | Alphanumeric | Customer file update, | Customer status | No restrictions
DDL builds the data dictionary, creates the structure of the database, and specifies record or field security constraints
Examples:
  CREATE statement is used to create a database or a table
  ALTER statement is used to change a relational database object, such as adding a new field to a specific table
  DROP statement is used to delete a table; an existing field is removed from a specific table with ALTER TABLE <table> DROP COLUMN <field>
DDL should be restricted to authorized administrators and programmers

DML changes database content, including data element creations, updates, insertions, and deletions
DML statements are used to manipulate data within a database
Examples:
  INSERT statement is used to insert data into a table
  UPDATE statement is used to update existing data within a table
  DELETE statement is used to delete records from a database table
DML access is only given to those who maintain data in the database
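The DDL/DML split above can be seen in a short script. This is a minimal sketch, assuming SQLite through Python's standard-library sqlite3 module rather than MS Access; the Courses rows come from the Primary Keys slide, and the credit change to Comm335 is a hypothetical update. SQLite has no CREATE DATABASE statement, so opening a connection plays that role here.

```python
import sqlite3


def ddl_dml_demo():
    """Run DDL and DML statements against a throwaway in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")  # SQLite has no CREATE DATABASE; connecting creates it

    # DDL: define the structure
    conn.execute("CREATE TABLE Courses (CourseNo TEXT NOT NULL PRIMARY KEY, Title TEXT)")
    conn.execute("ALTER TABLE Courses ADD COLUMN Credits REAL")  # add a new field

    # DML: change the content, not the structure
    conn.executemany(
        "INSERT INTO Courses (CourseNo, Title, Credits) VALUES (?, ?, ?)",
        [("Comm437", "Database", 3.0),
         ("Comm335", "MS Access", 1.5),
         ("Comm436", "IS Analysis", 3.0)],
    )
    # Hypothetical change, for illustration only
    conn.execute("UPDATE Courses SET Credits = ? WHERE CourseNo = ?", (3.0, "Comm335"))
    conn.execute("DELETE FROM Courses WHERE CourseNo = ?", ("Comm436",))
    conn.commit()

    rows = conn.execute(
        "SELECT CourseNo, Title, Credits FROM Courses ORDER BY CourseNo"
    ).fetchall()

    conn.execute("DROP TABLE Courses")  # DDL: delete the whole table
    conn.close()
    return rows


if __name__ == "__main__":
    for row in ddl_dml_demo():
        print(row)
```

Only the INSERT, UPDATE, and DELETE statements touch data; CREATE, ALTER, and DROP change the table definition itself, which is why the slide recommends restricting DDL more tightly.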
DQL contains powerful, easy-to-use commands that enable users to retrieve, sort, order, and display data
DQL statements are used to create database queries to retrieve data from the database
Example:
  SELECT statement is used to retrieve data
  FROM the database table(s)
  WHERE condition
  GROUP BY (Attributes)
  HAVING condition
  ORDER BY (Attributes)

Database Access
What you Know
  • Identification (user name)
  • Authentication (password)
What you Have
  • Smart Cards (embedded microchips)
  • Authentication App
What you Are
  • Biometrics: fingerprints, facial and retina scans
What you Do
  • Biometric signature pressure patterns, voiceprint recognition, gait/walking patterns
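The SELECT clauses listed under DQL can be combined in one query. This is a minimal sketch, assuming SQLite via Python's sqlite3 module; it reuses the Grades rows from the Primary Keys slide, and the function name and the default threshold of 75 are hypothetical.

```python
import sqlite3


def average_grade_by_course(min_average=75):
    """Return (CourseNo, number of grades, average grade), highest average first."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE Grades (StudentNo INTEGER NOT NULL, CourseNo TEXT NOT NULL, "
        "Grade INTEGER, PRIMARY KEY (StudentNo, CourseNo))"
    )
    conn.executemany(
        "INSERT INTO Grades VALUES (?, ?, ?)",
        [(1234567, "Comm335", 88), (1234567, "Comm437", 85), (2040544, "Comm335", 70)],
    )
    query = """
        SELECT CourseNo, COUNT(*) AS NumGrades, AVG(Grade) AS AvgGrade
        FROM Grades
        WHERE Grade IS NOT NULL          -- filters individual rows
        GROUP BY CourseNo                -- one output row per course
        HAVING AVG(Grade) > ?            -- filters whole groups
        ORDER BY AvgGrade DESC
    """
    rows = conn.execute(query, (min_average,)).fetchall()
    conn.close()
    return rows


if __name__ == "__main__":
    print(average_grade_by_course())    # [('Comm437', 1, 85.0), ('Comm335', 2, 79.0)]
    print(average_grade_by_course(80))  # [('Comm437', 1, 85.0)]
```

The design point is the difference between WHERE and HAVING: WHERE is applied to rows before grouping, while HAVING is applied to the groups produced by GROUP BY.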
Outline Notation is the standard format for expressing database design
  Relations (tables) = R
  Attributes (columns) = A, B, C, D, …, written R(A,B,C,D)
  Repeating group = {X}, written R(A,B,C,{D})
  Primary Key underlined = A
  Foreign Key dash underlined (or italic) = B
  Example: DEPARTMENT(DeptName, Location, Manager) and EMPLOYEE(EmployeeID, Name, Address, DeptName), where DeptName and EmployeeID are the primary keys and DeptName in EMPLOYEE is a foreign key

A relational database consists of a set of tables (relations)
  A table contains rows (records or tuples) and columns (fields or attributes)
  Primary Keys (PK): unique identifiers of the table, e.g. EmployeeID in EMPLOYEE(EmployeeID, Name, Address)
  Foreign Keys (FK): identifiers that enable a dependent table (many-side) to refer to its parent table (one-side of the relationship), e.g. DeptName in EMPLOYEE(EmployeeID, Name, Address, DeptName) refers to DEPARTMENT(DeptName, Location, Manager)
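Outline notation maps directly onto SQL table definitions: the underlined key becomes a PRIMARY KEY and the dash-underlined key becomes a REFERENCES (foreign key) clause. This is a minimal sketch, assuming SQLite via Python's sqlite3 module; the department and employee rows and the function names are hypothetical. SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`.

```python
import sqlite3


def build_company_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    # DEPARTMENT(DeptName, Location, Manager): DeptName is the primary key (one-side)
    conn.execute("""CREATE TABLE DEPARTMENT (
        DeptName TEXT NOT NULL PRIMARY KEY,
        Location TEXT,
        Manager  TEXT)""")
    # EMPLOYEE(EmployeeID, Name, Address, DeptName): DeptName is the foreign key (many-side)
    conn.execute("""CREATE TABLE EMPLOYEE (
        EmployeeID INTEGER NOT NULL PRIMARY KEY,
        Name       TEXT,
        Address    TEXT,
        DeptName   TEXT REFERENCES DEPARTMENT(DeptName))""")
    # Hypothetical sample data
    conn.executemany("INSERT INTO DEPARTMENT VALUES (?, ?, ?)", [
        ("Accounting", "Vancouver", "Lee"),
        ("Marketing", "Toronto", "Singh"),
    ])
    conn.executemany("INSERT INTO EMPLOYEE VALUES (?, ?, ?, ?)", [
        (1, "Ana", "West Mall", "Accounting"),
        (2, "Ben", "Main Mall", "Accounting"),
        (3, "Cy", "East Mall", "Marketing"),
    ])
    return conn


def employees_per_department(conn):
    """Follow the foreign key from each employee back to its parent department."""
    return conn.execute("""
        SELECT d.DeptName, d.Location, COUNT(e.EmployeeID)
        FROM DEPARTMENT AS d
        LEFT JOIN EMPLOYEE AS e ON e.DeptName = d.DeptName
        GROUP BY d.DeptName, d.Location
        ORDER BY d.DeptName""").fetchall()


if __name__ == "__main__":
    print(employees_per_department(build_company_db()))
```

Because DeptName sits on the many-side (EMPLOYEE), each department is stored once no matter how many employees it has.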
Data Normalization
  Designer begins by assuming that everything is initially stored in one large table
  Rules are then followed to decompose that initial table into a set of tables in third normal form (3NF), which are free of update, insert, and delete anomalies
Data Modeling
  Designer uses knowledge of business processes and information needs to create a diagram (entity-relationship diagram) that shows what to include in the database
  This diagram is used to create a set of relational tables that are already in 3NF
Following these rules allows databases to be normalized and solves the update, insert, and delete anomalies:
  1. Every column in a row must be single valued (atomic)
  2. The Primary Key cannot be null (empty), also known as entity integrity
  3. If a Foreign Key is not null, it must have a value that corresponds to the value of a primary key in another table, also known as referential integrity

Example (Customer and Sales tables):
  Rule 1: The Customer table can have only one single value in the State column
  Rule 2: In both the Sales and Customer tables, the primary key attributes Sales Invoice # and Customer # are NOT empty, ensuring entity integrity
  Rule 3: The foreign key Customer # in the Sales table has referential integrity with the Customer table (the customer must exist before sales)
  Rule 4: All other nonkey attributes in the Customer table are associated with the Customer #, such as Customer Name, Street, City, and State
  Invoice Item = Quantity (Sales-Inventory) × Unit Price (Inventory)
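Rules 1 to 3 can be enforced by the DBMS itself; Rule 4 is a design rule that no constraint checks. This is a minimal sketch, assuming SQLite via Python's sqlite3 module; CustomerNo and SalesInvoiceNo stand in for Customer # and Sales Invoice #, and all values and function names are hypothetical. NOT NULL is written out explicitly because SQLite, for historical reasons, allows NULL in most PRIMARY KEY columns unless told otherwise.

```python
import sqlite3

SCHEMA = """
CREATE TABLE Customer (
    CustomerNo   TEXT NOT NULL PRIMARY KEY,  -- Rule 2: entity integrity
    CustomerName TEXT,
    Street       TEXT,
    City         TEXT,
    State        TEXT                        -- Rule 1: one value per row in this column
);
CREATE TABLE Sales (
    SalesInvoiceNo TEXT NOT NULL PRIMARY KEY,
    CustomerNo     TEXT NOT NULL REFERENCES Customer(CustomerNo)  -- Rule 3: referential integrity
);
"""


def open_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when enabled
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn, sql, params):
    """Return None if the row is accepted, otherwise the IntegrityError message."""
    try:
        conn.execute(sql, params)
        return None
    except sqlite3.IntegrityError as exc:
        return str(exc)


if __name__ == "__main__":
    conn = open_db()
    add_customer = "INSERT INTO Customer VALUES (?, ?, ?, ?, ?)"
    print(try_insert(conn, add_customer, ("C100", "Jane Doe", "1 Main St", "Vancouver", "BC")))
    print(try_insert(conn, add_customer, (None, "No Key", "2 Main St", "Vancouver", "BC")))
    print(try_insert(conn, "INSERT INTO Sales VALUES (?, ?)", ("S1", "C100")))
    print(try_insert(conn, "INSERT INTO Sales VALUES (?, ?)", ("S2", "C999")))  # no such customer
```

The second and fourth inserts are rejected: one breaks entity integrity (a null primary key), the other breaks referential integrity (a sale for a customer who does not exist).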
Databases: Divide & Conquer
Quick steps to build an accounting database for Cafe Expresso:
  1. Identify and build database tables
  2. Identify and enter fields in each table

Step 1: Identify and Build Database Tables
Database tables can consist of data about:
  People (Customers)
  Things (Inventory Items)
  Transactions (Sales Orders)

Step 2: Identify and Enter Fields in each Table
Fields for three database tables for Cafe Expresso:

Step 3: Select the Primary Key for each Table
Primary Key is a field that uniquely identifies a record in a table:
Step 4: Identify and Build Relationships between Tables
There are three types of database table relationships:
  One-to-One Relationship (1:1): for each one record in one database table, there is one record in the related table
  One-to-Many Relationship (1:M): for each one record in one database table, there are many records in the related table
  Many-to-Many Relationship (M:N): each record in either table can relate to many records in the other table

Example: Identifying Relationships
  Q: What is the maximum number of orders a customer can place?
  A: Many
  Q: What is the maximum number of customers who can place a specific order?
  A: 1
  Therefore, Customer to Sales Order is a one-to-many (1:M) relationship
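The two "maximum number" questions can be turned into a small decision rule. This is a minimal sketch in plain Python; the function name and the "1"/"many" answer strings are assumptions for illustration.

```python
def relationship_type(max_b_per_a: str, max_a_per_b: str) -> str:
    """Classify the relationship between tables A and B.

    max_b_per_a: answer to "What is the maximum number of B records for one A record?"
    max_a_per_b: answer to "What is the maximum number of A records for one B record?"
    Answers are expected to be "1" or "many"; anything other than "1" counts as many.
    """
    a_many = max_a_per_b.strip().lower() != "1"
    b_many = max_b_per_a.strip().lower() != "1"
    if a_many and b_many:
        return "M:N"
    if b_many:
        return "1:M"  # one A record, many B records
    if a_many:
        return "M:1"  # the same 1:M relationship, read from B's side
    return "1:1"


if __name__ == "__main__":
    # A = Customer, B = Sales Order, using the answers from the example
    print(relationship_type(max_b_per_a="many", max_a_per_b="1"))  # 1:M
```

An "M:N" result is the signal to introduce an associative table, as described next.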
Resolving a many-to-many relationship:
  1. Create an associative table at the intersection of the many-to-many relationship
  2. Create two new one-to-many relationships to connect the associative table
  3. Create a Composite Primary Key for the associative table using the primary keys of the two tables it connects
The Sales Order Line table (associative table) eliminates the many-to-many relationship; it is connected to each of the original tables by a 1:M relationship
The Primary Key (Customer No) of the Customer table is the Foreign Key in the Sales Order table, connecting the two tables
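An associative table with a composite primary key, together with the Invoice Item calculation from the rules slide (Quantity × Unit Price), can be sketched as follows. This assumes SQLite via Python's sqlite3 module; the table and column names (SalesOrder, Inventory, SalesOrderLine, ItemNo), all rows, and the function names are hypothetical, and the Customer table is omitted for brevity.

```python
import sqlite3

SCHEMA = """
CREATE TABLE SalesOrder (
    SalesOrderNo TEXT NOT NULL PRIMARY KEY,
    CustomerNo   TEXT NOT NULL                 -- FK to Customer, omitted here
);
CREATE TABLE Inventory (
    ItemNo    TEXT NOT NULL PRIMARY KEY,
    UnitPrice REAL NOT NULL
);
-- Associative table: one row per (order, item) pair
CREATE TABLE SalesOrderLine (
    SalesOrderNo TEXT NOT NULL REFERENCES SalesOrder(SalesOrderNo),  -- 1:M from SalesOrder
    ItemNo       TEXT NOT NULL REFERENCES Inventory(ItemNo),         -- 1:M from Inventory
    Quantity     INTEGER NOT NULL,
    PRIMARY KEY (SalesOrderNo, ItemNo)                               -- composite primary key
);
"""


def build_db():
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    conn.executemany("INSERT INTO SalesOrder VALUES (?, ?)", [("SO1", "C100"), ("SO2", "C200")])
    conn.executemany("INSERT INTO Inventory VALUES (?, ?)", [("Latte", 4.5), ("Muffin", 3.0)])
    conn.executemany("INSERT INTO SalesOrderLine VALUES (?, ?, ?)",
                     [("SO1", "Latte", 2), ("SO1", "Muffin", 1), ("SO2", "Latte", 1)])
    return conn


def try_add_line(conn, order_no, item_no, quantity):
    """Return True if the line was added, False if a key constraint rejected it."""
    try:
        conn.execute("INSERT INTO SalesOrderLine VALUES (?, ?, ?)", (order_no, item_no, quantity))
        return True
    except sqlite3.IntegrityError:
        return False


def invoice_items(conn):
    """Invoice Item = Quantity (from the line table) * Unit Price (from Inventory)."""
    return conn.execute("""
        SELECT l.SalesOrderNo, l.ItemNo, l.Quantity * i.UnitPrice AS InvoiceItem
        FROM SalesOrderLine AS l
        JOIN Inventory AS i ON i.ItemNo = l.ItemNo
        ORDER BY l.SalesOrderNo, l.ItemNo""").fetchall()


if __name__ == "__main__":
    conn = build_db()
    print(invoice_items(conn))
    print(try_add_line(conn, "SO1", "Latte", 5))  # same (order, item) pair again: rejected
```

The composite key allows the same item on many orders and many items on one order, but never the same item twice on one order. The Invoice Item amount is computed in the query rather than stored, so it cannot drift out of sync with the unit price.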
Normalization
  Update Anomaly: changing data in a row forces changes to other rows because of duplication
  Insertion Anomaly: not able to add a row due to missing part(s) of the composite primary key
  Deletion Anomaly: deleting rows may cause a loss of data that would be needed for other future rows

ASSIGNMENT table has repeating rows about the project
  UPDATE problem: changing the project contact will require updating multiple rows, which can also lead to inconsistent data
ASSIGNMENT table contains the information about which employee is assigned to which project
ASSIGNMENT table has a composite primary key (emp, project_num)
  INSERTION problem: since primary key fields cannot be null, we cannot add a new project (project_num) without having an employee (emp) assigned to it
  DELETION problem: if we remove the row once an employee has finished the assignment (e.g., Employee 1001 working on the Smith&Co project), we will lose all information about this particular project

Database Anomalies are operational problems caused by poor database design
  Data redundancy occurs when you replicate the same field in multiple tables, other than to set up foreign keys
  The database includes functional dependencies whose determinants are not candidate keys, such as partial dependencies and transitive dependencies
  This results in data duplication and an unnecessary dependency between the entities, leading to:
    possible inconsistent data when repeated information is updated
    problems inserting new records because part of the primary key may be empty
    accidental loss of information as a by-product of a deletion
General rule of thumb: one table should NOT pertain to more than one entity type
  Solution: Decompose the ASSIGNMENT table and add a second table to record information about PROJECTS
  Resulting tables: EMPLOYEE, ASSIGNMENT, PROJECT
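The insertion and deletion problems of ASSIGNMENT, and how moving project facts into a PROJECT table removes them, can be reproduced in a few statements. This is a minimal sketch, assuming SQLite via Python's sqlite3 module; project P2, the name "New Co", and the contacts "Kim" and "Lee" are hypothetical, while Employee 1001 and Smith&Co come from the slide.

```python
import sqlite3


def anomaly_demo():
    """Compare one combined ASSIGNMENT table with its ASSIGNMENT + PROJECT decomposition."""
    results = {}

    # Before: one table holds assignment facts AND project facts
    flat = sqlite3.connect(":memory:")
    flat.execute("""CREATE TABLE ASSIGNMENT (
        emp TEXT NOT NULL, project_num TEXT NOT NULL,
        project_name TEXT, contact TEXT,
        PRIMARY KEY (emp, project_num))""")
    # Insertion anomaly: a new project with no employee leaves part of the key null
    try:
        flat.execute("INSERT INTO ASSIGNMENT VALUES (NULL, 'P2', 'New Co', 'Kim')")
        results["flat_new_project"] = "accepted"
    except sqlite3.IntegrityError:
        results["flat_new_project"] = "rejected"
    # Deletion anomaly: removing the only assignment also removes the project facts
    flat.execute("INSERT INTO ASSIGNMENT VALUES ('1001', 'P1', 'Smith&Co', 'Lee')")
    flat.execute("DELETE FROM ASSIGNMENT WHERE emp = '1001'")
    results["flat_project_after_delete"] = flat.execute(
        "SELECT project_name FROM ASSIGNMENT WHERE project_num = 'P1'").fetchall()
    flat.close()

    # After: project facts live in their own PROJECT table
    split = sqlite3.connect(":memory:")
    split.execute("PRAGMA foreign_keys = ON")
    split.executescript("""
        CREATE TABLE PROJECT (project_num TEXT NOT NULL PRIMARY KEY,
                              project_name TEXT, contact TEXT);
        CREATE TABLE ASSIGNMENT (emp TEXT NOT NULL,
                                 project_num TEXT NOT NULL REFERENCES PROJECT(project_num),
                                 PRIMARY KEY (emp, project_num));
    """)
    split.execute("INSERT INTO PROJECT VALUES ('P2', 'New Co', 'Kim')")  # no employee needed
    split.execute("INSERT INTO PROJECT VALUES ('P1', 'Smith&Co', 'Lee')")
    split.execute("INSERT INTO ASSIGNMENT VALUES ('1001', 'P1')")
    split.execute("DELETE FROM ASSIGNMENT WHERE emp = '1001'")  # project row survives
    results["split_project_after_delete"] = split.execute(
        "SELECT project_name FROM PROJECT WHERE project_num = 'P1'").fetchall()
    results["split_projects"] = split.execute(
        "SELECT project_num FROM PROJECT ORDER BY project_num").fetchall()
    split.close()
    return results


if __name__ == "__main__":
    for name, value in anomaly_demo().items():
        print(name, value)
```

The update anomaly disappears for the same reason: after the split, each project's contact is stored in exactly one PROJECT row.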
The value of one attribute (the determinant) determines the value of another attribute (the dependent)
  If I know the value of this attribute (or attributes), I can uniquely tell you the value of some other attribute(s)
  If I know the value of an employee's ID number, I can tell you their last name with certainty
For any relation R, attribute B is functionally dependent on attribute A if, for every valid instance of A, that value of A uniquely determines the value of B
  R(A,B)
  A → B (B is determined by A = A determines B)
  A: determinant; B: dependent
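The definition of A → B suggests a direct check: no two rows may agree on A while disagreeing on B. This is a minimal sketch in plain Python using the Grades rows from the Primary Keys slide; the function name is hypothetical. Sample data can only refute a dependency, never prove it, because the definition covers every valid instance, not just the rows at hand.

```python
def holds_fd(rows, lhs, rhs):
    """Return False if the rows contradict lhs -> rhs, True otherwise.

    rows: list of dicts; lhs and rhs: lists of attribute names.
    """
    seen = {}
    for row in rows:
        determinant = tuple(row[a] for a in lhs)
        dependent = tuple(row[b] for b in rhs)
        if seen.setdefault(determinant, dependent) != dependent:
            return False  # same determinant value, different dependent value
    return True


GRADES = [
    {"StudentNo": 1234567, "CourseNo": "Comm335", "Grade": 88},
    {"StudentNo": 1234567, "CourseNo": "Comm437", "Grade": 85},
    {"StudentNo": 2040544, "CourseNo": "Comm335", "Grade": 70},
]

if __name__ == "__main__":
    print(holds_fd(GRADES, ["StudentNo", "CourseNo"], ["Grade"]))  # True
    print(holds_fd(GRADES, ["StudentNo"], ["Grade"]))              # False: 88 and 85
```

Only the full composite key (Student#, Course#) determines Grade, which is why it is the primary key of the Grades table.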
If we know the values of the key fields of a table, we can find a unique row in the table
  Once we have that row, we know the value of all other non-key fields in that row
  empID → last_name, first_name
  The key fields functionally determine all other non-key fields in the table
Candidate Key is a unique identifier
  One of the Candidate Keys will become the Primary Key
  No proper subset of the Primary Key's fields is also a key

The Movie relation deals with movies and directors
  MOVIE(Title, Year, Genre, Genre-description, Director, Director-home-country)
  Identify Functional Dependencies
    Title → Year, Genre, Genre-description, Director, Director-home-country
    Genre → Genre-description
    Director → Director-home-country
  Identify the Candidate Key(s)
    Title
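Candidate keys can be found mechanically from the functional dependencies by computing attribute closures: a set of attributes is a candidate key if its closure covers the whole relation and it contains no smaller key. This is a brute-force sketch in plain Python for the MOVIE relation; it checks every subset, which is fine for a handful of attributes, and the function names are hypothetical.

```python
from itertools import combinations


def closure(attrs, fds):
    """Return every attribute functionally determined by attrs under the given FDs."""
    result = set(attrs)
    changed = True
    while changed:  # keep applying FDs until nothing new is added
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def candidate_keys(all_attrs, fds):
    """Return all minimal attribute sets whose closure is the whole relation."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for combo in combinations(sorted(all_attrs), size):
            if any(set(key) <= set(combo) for key in keys):
                continue  # contains a smaller key, so it is not minimal
            if closure(combo, fds) == set(all_attrs):
                keys.append(combo)
    return keys


MOVIE = ["Title", "Year", "Genre", "Genre-description", "Director", "Director-home-country"]
MOVIE_FDS = [
    (["Title"], ["Year", "Genre", "Genre-description", "Director", "Director-home-country"]),
    (["Genre"], ["Genre-description"]),
    (["Director"], ["Director-home-country"]),
]

if __name__ == "__main__":
    print(candidate_keys(MOVIE, MOVIE_FDS))       # [('Title',)]
    print(sorted(closure(["Genre"], MOVIE_FDS)))  # ['Genre', 'Genre-description']
```

Genre and Director each determine something, but their closures do not cover the relation, so they are determinants without being keys. These are exactly the dependencies that cause anomalies.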
Partial Dependency is a functional dependency within a table whose determinant is part of the primary key but NOT all of it
  If Members are allowed multiple Activities: ACTIVITY(MemberID, Activity, Fee), with composite primary key (MemberID, Activity)
  Activity → Fee (partial dependency): Fee depends on only part of the composite key (MemberID, Activity)
  NOTE: Partial Dependency can occur only in tables with Composite Primary Keys

  Sample ACTIVITY data:
  MemberID | Activity | Fee
  121 | Ceramics | $145
  121 | Swimming | $200
  202 | Zumba | $150
  202 | Cooking | $125
  199 | Zumba | $150
  175 | Poetry | $100
  215 | Tennis | $200
  215 | Sculpture | $150

Transitive Dependency is a functional dependency between attributes within the same table whose determinant is NOT the primary key or part of the primary key
  If Members are allowed only one Activity: ACTIVITY(MemberID, Activity, Fee), with primary key MemberID
  Activity → Fee (transitive dependency): Activity is NOT the primary key (MemberID)
  Solution: Split data into two tables
    ACTIVITY(Activity, Fee)
    MEMBER(MemberID, Activity)
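The split into ACTIVITY(Activity, Fee) and MEMBER(MemberID, Activity) can be checked on the sample rows above: each fee is stored once per activity, and joining the two tables back on Activity reproduces the original rows. This is a minimal sketch in plain Python; fees are whole dollars and the function names are hypothetical.

```python
ACTIVITY_ROWS = [
    (121, "Ceramics", 145), (121, "Swimming", 200),
    (202, "Zumba", 150), (202, "Cooking", 125),
    (199, "Zumba", 150), (175, "Poetry", 100),
    (215, "Tennis", 200), (215, "Sculpture", 150),
]


def split_activity(rows):
    """Decompose (MemberID, Activity, Fee) rows using the dependency Activity -> Fee."""
    fee_of = {}
    for _, activity, fee in rows:
        if fee_of.setdefault(activity, fee) != fee:
            raise ValueError(f"Activity -> Fee does not hold for {activity!r}")
    activity_table = sorted(fee_of.items())                        # ACTIVITY(Activity, Fee)
    member_table = sorted({(m, a) for m, a, _ in rows})            # MEMBER(MemberID, Activity)
    return activity_table, member_table


def rejoin(activity_table, member_table):
    """Natural join of MEMBER and ACTIVITY on Activity."""
    fee_of = dict(activity_table)
    return sorted((member, activity, fee_of[activity]) for member, activity in member_table)


if __name__ == "__main__":
    activities, members = split_activity(ACTIVITY_ROWS)
    print(len(activities), len(members))                         # 7 8
    print(rejoin(activities, members) == sorted(ACTIVITY_ROWS))  # True
```

Zumba's $150 fee now appears in one ACTIVITY row instead of two member rows, so a fee change is a single update, and deleting member 199 no longer risks losing the fee.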
A table is NOT in First Normal Form if it is keeping multiple values for a piece of information
  Figure: UNNORMALIZED table in which PLANT uses are multivalued
A table is in Second Normal Form if it is in 1NF AND we need ALL the fields in the key to determine the values of the nonkey fields
  1NF PLUS every nonkey attribute is fully functionally dependent on the ENTIRE primary key
  Every nonkey attribute must be defined by the entire primary key, not by only part of the primary key
  No partial dependencies
  Example: in ASSIGNMENT, emp and project_num form the composite primary key, but project_name and contact depend ONLY on project_num, so they move to a PROJECT table
  Solution: Remove those nonkey fields that are not dependent on the whole of the primary key. Create another table with these fields and the part of the primary key on which they do depend.
A table is in Third Normal Form if it is in 2NF AND no nonkey fields depend on other nonkey fields (no transitive dependencies)
  Figure: EMPLOYEE example
Source: William Kent (1983), "A Simple Guide to Five Normal Forms in Relational Database Theory," Communications of the ACM 26(2), Feb. 1983, 120-125.
Source: Prof Cavusoglu
Unnormalized PROJECT table:
  • The existence of a sub-table (i.e. EMPLOYEES) within the larger PROJECT table
  • Each record in the PROJECT table contains a subsidiary table indicating the EMPLOYEES who are assigned to that project, but some rows are missing project data!
First Normal Form (1NF):
  • All attributes are atomic (single value)
  • Selected composite primary key as (Project_Num, Emp_Num)
Figure: table linked to JOB via Foreign Key
Update anomalies can generate conflicting and obsolete database values
Insertion anomalies can result in unrecorded transactions and incomplete audit trails
Deletion anomalies can cause the loss of accounting records and the destruction of audit trails
This lecture presentation and the slides that accompany it are the
exclusive copyright of COMM 335 teaching team and may only be
used by students enrolled in COMM 335 course at the University of
British Columbia, Sauder School of Business. Unauthorized or
commercial use of these lectures, including uploading to sites off of
the University of British Columbia servers, is expressly prohibited!