Chapter 9. Database Design

This chapter discusses database design, which involves identifying data items to be stored in the database and determining how to store them. The chapter covers logical database design using entity-relationship-attribute (ERA) diagrams and normalization to reduce data redundancy and anomalies. Physical database design includes indexing, denormalization, and partitioning tables. The chapter provides an illustration of designing a database for a proposed appointment scheduling system.

Uploaded by

Asma Ali
Copyright
© All Rights Reserved

Chapter 9.

Database Design

Table of Contents

Chapter 9. Database Design
    Introduction
    Learning Outcomes
    Database Design Basics
    Identification of Attributes in a Database
    Logical Database Design
        E-R-A Diagram
    Normalization
        First Normal Form (1NF)
        Second Normal Form (2NF)
        Third Normal Form (3NF)
        Illustration of Normalization Procedure
    Physical Database Design
        Indexing Tables
        Denormalization
        Partitioning
    Database and Programming
        Structured Query Language
        Stored Procedure
        Trigger
    Illustration of Logical Database Design: Proposed AP System
    Chapter Summary
    End of Chapter Exercises
        Key Terms
        Self-Study Questions
        Review Questions
        Exercises

Chapter 9. Database Design

Introduction

In the previous chapter, we discussed how the physical architecture for the proposed system is

determined. The physical architecture includes several components such as database, programs, and

inputs/outputs, which need to be designed in the detailed design phase. Database design marks the first

major step within the design phase. Database design deals with the selection of the best structure for

storing the required data for the proposed system. The logical system specifications that facilitate

detailed design are available at the end of the high-level design stage. One of the logical system

specifications is the list of unique data items for the proposed system. This list is the primary input for the

detailed database design stage. The first step in the database design process is to identify data items

that should be stored in the database. The next step is to determine how those data items should be

stored.

Learning Outcomes

After studying this chapter, the reader will be able to understand:

• The database design procedure

• How data to be stored in the database are identified

• The following steps of the logical database design process:

    ◦ ERA diagram that identifies a system’s data model

    ◦ Converting an ERA diagram into database tables

    ◦ Normalization to avoid data anomalies

• Physical database design

• Integrating process model and database design

Database Design Basics

We assume that the database will be implemented using a relational database management

system. Although a database can follow a “hierarchical”, “network”, or relational model, most commercial

database management systems follow relational models. Further, a majority of databases are

implemented using relational database software. In a relational database system, a database is defined

as a collection of related tables. A table is a collection of related records (rows), and each record consists

of a set of related data items (fields). For most practical purposes, a field is the primitive data item in a

database. For example, employee and department could be two tables in a database used by a firm.

The database could store information about which employees belong to which departments, a relationship

between employees and departments. The employee table stores a record for each employee in the firm,

and each employee record could contain fields such as employee#, name, title, phone#, and

department_code. Similarly, the department table could contain department records with department_code

and department_name as fields. In a relational database, a relationship between two tables is established

by creating common fields in them. For instance, in the previous example, department_code is the

common field in the employee and department tables. Note that an employee record without the

department_code cannot relate an employee to the department to which the employee belongs. In

relational models, the relationship between the two tables is thus established with a field common to both

the tables. Such relationships can exist between two or more tables, and even within one table as

discussed later in this chapter. In contrast, hierarchical and network databases use pointers to

identify the relationships. Although these databases can perform faster than relational databases, they

lost favor with database designers because of the complexity of creating and maintaining them.
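
The employee and department example above can be sketched as follows, using Python's standard sqlite3 module. This is only an illustration under stated assumptions: the sample rows are invented, and the column names employee_no and phone_no stand in for employee# and phone#, which are not valid as bare SQL identifiers.

```python
import sqlite3

# Hypothetical sample data; table and field names follow the example in the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE department (
        department_code TEXT PRIMARY KEY,
        department_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        employee_no     TEXT PRIMARY KEY,   -- employee# in the text
        name            TEXT NOT NULL,
        title           TEXT,
        phone_no        TEXT,               -- phone# in the text
        department_code TEXT NOT NULL       -- the field common to both tables
    );
    INSERT INTO department VALUES ('D10', 'Accounting'), ('D20', 'Sales');
    INSERT INTO employee VALUES
        ('E1', 'John', 'Manager', '555-0100', 'D10'),
        ('E2', 'Mary', 'Analyst', '555-0101', 'D20');
""")

# The relationship is recovered by matching the common field in the two tables.
rows = conn.execute("""
    SELECT e.name, d.department_name
    FROM employee AS e
    JOIN department AS d ON d.department_code = e.department_code
    ORDER BY e.employee_no
""").fetchall()
print(rows)
```

No pointer from one record to another is stored; the match on department_code alone relates each employee to a department.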

Database design consists of two steps: logical database design and physical database design.

The objective of logical database design is to produce a database structure that will result in a high level

of data integrity. The logical database design process achieves this by reducing data redundancy and

maintenance anomalies. Data redundancy or data duplication not only increases the storage space, but

also, more importantly, makes data maintenance error-prone. For instance, if multiple copies of a

data item are stored in the database and the data item needs to be updated, then unless the update

procedure ensures that all copies of the data item are updated, the update will result in an inconsistent

database. Similarly, maintenance (insertion, deletion, and modification) anomalies occur when the

maintenance activity results in unintended consequences. For instance, in a database designed for a

university, assume that a table stores both course and faculty data. When a course is deleted from this

table (perhaps because the university has eliminated the course from its curriculum), then there is

potential to lose the data associated with the faculty member teaching that course. This will occur if that

faculty member’s data exist only as part of the record that is deleted. So, the faculty member data could

be deleted even though he/she is still a faculty member of the university! Such anomalies occur if the

database is poorly designed. One way to deal with the anomaly problem is to let the maintenance

programs take care of it. However, this approach makes maintenance programs complex. The cause of

such maintenance anomalies is poor database design, and hence the logical database design process

addresses database maintenance anomalies. The logical database design produces a “data model” of

the system in the form of an Entity-Relationship-Attribute (ERA) diagram. Please recall that dataflow

diagrams represent the process model of a system. In a process model, processes and the data used

and produced by each process are the focus of the representation. A process model does not represent

the relationships among the data. In contrast, a data model represents a system from the perspective of

its data and does not represent processes or the relationships among the processes. The process

model and the data model are two sides of the same coin, and provide two different views of the same

system.
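
The deletion and update anomalies described above can be reproduced with a small sketch. It assumes a hypothetical table, course_faculty, that mixes course and faculty data; the column names, faculty names, and office numbers are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Poor design: course data and faculty data share one table (hypothetical columns).
conn.execute("""
    CREATE TABLE course_faculty (
        course_no      TEXT PRIMARY KEY,
        course_title   TEXT,
        faculty_name   TEXT,
        faculty_office TEXT
    )
""")
conn.executemany(
    "INSERT INTO course_faculty VALUES (?, ?, ?, ?)",
    [
        ("MIS471", "Systems Analysis and Design", "Dr. Rao", "B-210"),
        ("MIS301", "Database Concepts", "Dr. Lee", "B-114"),
        ("MIS302", "Data Modeling", "Dr. Lee", "B-114"),  # faculty data repeated
    ],
)

def faculty_known(name):
    """Return True if any row still records this faculty member."""
    row = conn.execute(
        "SELECT COUNT(*) FROM course_faculty WHERE faculty_name = ?", (name,)
    ).fetchone()
    return row[0] > 0

# Deletion anomaly: removing a course also erases the only record of its teacher.
conn.execute("DELETE FROM course_faculty WHERE course_no = 'MIS471'")
rao_still_known = faculty_known("Dr. Rao")

# Update anomaly: the office is changed in only one of Dr. Lee's two rows.
conn.execute(
    "UPDATE course_faculty SET faculty_office = 'C-300' WHERE course_no = 'MIS301'"
)
lee_offices = {
    r[0] for r in conn.execute(
        "SELECT faculty_office FROM course_faculty WHERE faculty_name = 'Dr. Lee'"
    )
}
print(rao_still_known, sorted(lee_offices))
```

After the delete, the database no longer knows about Dr. Rao, and after the partial update it holds two different offices for Dr. Lee. Storing faculty data in a table of its own would prevent both problems.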

The objectives of physical database design are to facilitate database implementation and to

improve the database performance, e.g., database access time and speed of transaction processing.

The normalization process used in logical database design improves data integrity, but frequently degrades

database performance. One of the challenges of the database design process is to strike a good

balance between database integrity and database performance. The physical database design process aims

at achieving satisfactory performance without a significant compromise on database integrity. The physical

database design process includes techniques such as indexing, denormalization, and partitioning.

Identification of Attributes in a Database

In a database course, you might have learned to draw an ERA diagram and identify various

tables along with their respective attributes from a description of the system and without the benefit of a

process analysis and process model. The systems development cycle following the waterfall model has

the benefit of completing the process model that leads to identification of various data items needed to

support the system, even before the database design begins. Please recall the chapter on High Level

Design (Chapter 8), which identifies a list of data items required to support a system. The data items thus

obtained may be classified into basic, generated, and derived data items.

Basic data items are those data items that come from outside the system or are found in master

tables that are created with the system. Examples of basic data items include

customer number, customer name, and customer address, which come from the customer entity to an order

processing system. In this order processing system, data items such as item name and unit price are

taken from the master catalog file. Generated data items are those data items that are generated by various

processes within the system. Since these data items originate in the system, they should be stored in

the database. Examples of generated data items in an order processing system include

order number and order date. Derived data items are those data items that are derived as a function of

other data items that are available in the system. Examples of derived data items in an order processing

system include extension (= quantity * unit price) and total extension (= sum of all

extensions).
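
As a small worked example of the two derived data items just defined, the sketch below computes extension and total extension for one order. The order lines are invented, and prices are held in cents (an assumption made here to keep the arithmetic exact).

```python
# Hypothetical order lines: (item name, quantity, unit price in cents).
order_lines = [
    ("Stapler", 2, 1250),
    ("Paper ream", 5, 499),
    ("Pen box", 1, 875),
]

# extension = quantity * unit price, derived for each line rather than stored
extensions = [qty * unit_price for _, qty, unit_price in order_lines]

# total extension = sum of all extensions in the order
total_extension = sum(extensions)

print(extensions, total_extension)  # [2500, 2495, 875] 5870
```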

Whereas basic and generated data items are always stored, the decision whether to store

derived data items is debatable. Database design principles require that derived data items not be stored, to

avoid redundancy and to conserve storage space. However, if one of the data items from which it is derived has time-variant

values, then the derived data item needs to be stored. Since the data item extension discussed above

is derived from quantity and unit price, both available in the system, one may decide not to store it.

However, the unit price of an item may change over time, and an extension calculated at a

different time may give a value different from the original value. Consider an item that a customer buys from a

department store. A return of this item would have to be accompanied by the original receipt. The

procedures for the return and for accounting of such returns depend on how the unit price or extension of this

item is stored in the database. If the original transaction is available in the system, it can be reversed

upon return. A store following the principles of database design strictly would not have the unit price or

extension stored for the transaction. If the price has gone up since the purchase, the store would

have to return more money than the customer originally paid. On the other hand, if the unit price or

extension were stored for the transaction in the database, then the store would pay exactly what the

customer paid. A data item such as unit price, whose values vary with time, is called a temporal data item.

Since the design of temporal databases is complex, many designers store such data items with each

transaction. This results in some redundant data that needs to be carefully controlled. An

alternative that avoids such redundancy is to store time-variant data together with the period for which each value applies.

Thus, if the unit price of an item is stored with the dates for which that price is applicable, the redundant data item,

extension, need not be stored.
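
One possible way to associate a unit price with the dates for which it applies is sketched below. The table names item_price and order_line, the columns valid_from and valid_to, and the sample data are all assumptions made for illustration; they are one design among several.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each price row applies from valid_from up to (not including) valid_to.
    CREATE TABLE item_price (
        item_no    TEXT,
        valid_from TEXT,        -- ISO dates compare correctly as text
        valid_to   TEXT,
        unit_price INTEGER,     -- cents
        PRIMARY KEY (item_no, valid_from)
    );
    CREATE TABLE order_line (
        order_no   TEXT,
        item_no    TEXT,
        order_date TEXT,
        quantity   INTEGER,
        PRIMARY KEY (order_no, item_no)
    );
    INSERT INTO item_price VALUES
        ('I1', '2023-01-01', '2023-07-01', 1000),
        ('I1', '2023-07-01', '9999-12-31', 1200);
    INSERT INTO order_line VALUES ('O1', 'I1', '2023-03-15', 3);
""")

def extension(order_no, item_no):
    """Derive quantity * unit price using the price in force on the order date."""
    row = conn.execute("""
        SELECT l.quantity * p.unit_price
        FROM order_line AS l
        JOIN item_price AS p
          ON p.item_no = l.item_no
         AND l.order_date >= p.valid_from
         AND l.order_date <  p.valid_to
        WHERE l.order_no = ? AND l.item_no = ?
    """, (order_no, item_no)).fetchone()
    return row[0]

refund = extension("O1", "I1")
print(refund)  # 3000
```

Even though the price rose to 1200 in July, the extension for the March order is still derived from the price of 1000 that applied on the order date, so a return refunds what the customer originally paid.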

Another motivation to store derived data is to save processing. Certain data items that may be

derived after complex calculations and multiple table accesses should be stored to save processing time.

Even in the case of data items that involve moderate processing, it may be worthwhile to store a derived

data item if it is going to be used frequently. Derived data items that are time-invariant, simple to

calculate, and not frequently used need not be stored. The total extension discussed above is a good

example of such data items because it can be calculated as the sum of all extensions in a transaction.

In drawing an E-R-A diagram, a designer needs to select the attributes from the list identified in high level

design and associate each with an entity.

Logical Database Design

The logical database design involves drawing an E-R-A diagram to determine various tables in

the database, relationships among tables, and fields in each table. The ERA diagram also helps to

identify constraints that should be imposed on data. Normalization rules can be applied, if necessary, to

ensure tables are in at least 3rd Normal Form (3NF). After finalizing the logical database design, the two

models of the system, process model and data model, are integrated, so that each process and its inputs

and outputs are defined in terms of the newly formed tables. This section will discuss the various

elements of an E-R-A diagram, followed by an example of constructing an E-R-A diagram. An illustration

using the AP system will further exemplify the concepts and the integration of process model and data

model.

E-R-A Diagram

Entity: An entity is an identifiable thing or object that is distinguishable from other objects. The reader is

cautioned that the term entity is used differently in dataflow diagrams (DFDs) and E-R-A diagrams. An

entity in a DFD represents an external organization, customer, person, or system that interacts with

the system under consideration. An entity in an E-R-A diagram can be an organization, a place, a

department, or an event. In short, an entity in an E-R-A diagram is any object about which a system

needs to collect, process, and store various data. Example entities include employee with identification

number 123, Systems Analysis and Design course with course identification MIS471, etc. An entity could

represent real or abstract objects as well as activities (e.g., shipping, registration) and others. Entities can

be described at two levels of detail. An entity class represents a collection of similar entities; an entity

instance represents a particular entity. For example, an entity called ‘student’ represents a collection of

students; hence, student is an entity class. A student with the student# ‘93844737373’ represents a

particular student; and, hence, is an instance of the student class. In the E-R-A diagram, we generally

model entity classes. Because all instances of an entity class are similar, the database design process

treats all instances of an entity in the same manner. Consequently, there is no need to model entities at

the instance level. Further, modeling the data at the instance level will make the diagram unnecessarily

big and complex without adding any new information content. We show an entity class using a

rectangular box in an E-R-A diagram.

There are several types of entities such as basic/fundamental entities, associative entities, weak

entities, and super/sub entities. Later parts of this chapter explain these types of entities. Basic or

fundamental entities are independent entities that do not depend on another entity for their existence. The

entities Department, Employee, and Skill in Figure 9-2 are basic entities. Associative entities are formed

when two or more entities are involved in a many-to-many relationship. The entities Employee-Skill in

Figure 9-3, Pre-requisite in Figure 9-5, and Factory-Warehouse-Product in Figure 9-7 are associative

entities. Video copy in Figure 9-8 is a weak entity. Super and sub entities are illustrated in Figure 9-9.

Attribute: An entity is characterized by its attributes. An attribute can be defined as a property of an

entity. For instance, the entity class ‘employee’ can have attributes ‘employee#’, ‘name’, and ‘jobtitle’.

Note that all employee instances will have the same set of attributes. Every attribute of an entity instance

has an associated value. For example, the values of ‘employee#’, ‘name’, and ‘jobtitle’ of a particular

employee could be ‘E1’, ‘John’, and ‘Manager’, respectively. An attribute of an entity is shown in an

E-R-A diagram using an oval connected to the rectangle representing the entity.

At least one attribute serves as an identifier for the entity. For example, item number can be the

identifier for an item entity containing attributes such as item number, item name, and item price. Certain

entities need more than one attribute to form an identifier. For example, order number and item number together can

be the identifier for an order entity containing attributes such as order number, item number, and order quantity.

To identify how much of an item was ordered, a user needs to know both the order number and the item

number. An identifier uniquely identifies an instance. That is, no two entity instances will have the same

value (or combination of values) for the identifier. In some cases, an attribute of an entity instance could have more than one value.

We call such an attribute a multi-valued attribute. In general, it is good practice to avoid using

multi-valued attributes in an E-R-A diagram. Multi-valued attributes can be transformed into an equivalent

model with only single-valued attributes using relationships, which we discuss next.
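
The order example, where two attributes together identify an instance, can be sketched as a table with a two-column primary key. The table name order_item and the sample values are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE order_item (
        order_no       TEXT,
        item_no        TEXT,
        order_quantity INTEGER,
        PRIMARY KEY (order_no, item_no)   -- identifier made of two attributes
    )
""")
conn.execute("INSERT INTO order_item VALUES ('O1', 'I1', 5)")
conn.execute("INSERT INTO order_item VALUES ('O1', 'I2', 1)")  # same order, other item
conn.execute("INSERT INTO order_item VALUES ('O2', 'I1', 7)")  # same item, other order

try:
    # Repeating an existing (order_no, item_no) pair violates the identifier.
    conn.execute("INSERT INTO order_item VALUES ('O1', 'I1', 9)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# Both parts of the identifier are needed to find the quantity ordered.
quantity = conn.execute(
    "SELECT order_quantity FROM order_item WHERE order_no = ? AND item_no = ?",
    ("O1", "I1"),
).fetchone()[0]
print(duplicate_rejected, quantity)  # True 5
```

Neither order_no nor item_no is unique on its own; only the combination is.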

Relationship: A relationship models an association between instances of entities. The number of entities

involved may be one, two, three, or more. Relationships between two entities are commonplace. For

example, suppose we have employee and department as entities in an E-R-A diagram. We want to store

information about employees that work in a department (alternatively, the department that an employee

works in). This information is modeled in an E-R-A diagram using a relationship between the employee

and department entities. This is an example of a binary relationship. A binary relationship connects two

entity classes, or has a degree of two. Though binary relationships are the most frequently used

relationships in an E-R-A diagram, unary or recursive (an entity related to itself or degree 1), ternary

(three entities or degree 3), and relationships having a degree greater than three (n-ary) are possible. A

relationship is shown using crow’s foot notation as in Figure 9-1.

An E-R-A diagram also documents the cardinalities of relationships. Cardinalities model the

nature of associations between entity instances. We will explain cardinalities using examples. Consider

a simple E-R-A model that has three entities: Department, Employee, and Skill, as shown in Figure 9-2.

Suppose we want to store the following relationships:

(i) Several employees work in a department, but an employee works in one department only.

Each department has at least one employee.

(ii) An employee has several skills and needs at least one skill. In addition, several employees

can have the same or different skills. Furthermore, some skills required in the firm may not be held by any employee.

Each relationship can be modeled as a binary relationship that connects the two entity classes mentioned

in it. Now consider the relationship between department and employees. For a particular department

(department instance), there can be many employees, and there has to be at least one employee. In

other words, a department instance is related to many instances of employees and it has to be related to

at least one instance of employee. We say that the minimum and maximum cardinalities of this

relationship on the employee side are one and many respectively. We show the minimum and maximum

cardinalities using crow’s foot notation; the various forms of cardinality possible in a

relationship are shown in Figure 9-1. Similarly, we can verify that since each employee instance is

related to a maximum of one department instance and each employee instance should be related to a

minimum of one department instance, the maximum and minimum cardinalities on the department side are

both one.

In the relationship between employee and skill, since an employee instance can be related to

many skill instances and a skill instance can be related to many employee instances, the maximum

cardinality is many on both sides of the relationship. Since an employee needs at least one skill, the

minimum cardinality on the skill side is one. A skill may not be held by any employee, and

therefore the minimum cardinality on the employee side is zero. We show the minimum and maximum

cardinalities using crow’s foot notation as shown in Figure 9-2. Each entity would have at least one

attribute associated with it, and at least one identifier that acts as a primary key to identify an instance

uniquely.
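
The department and employee cardinalities discussed above can be carried into tables in more than one way. The sketch below shows one, assuming SQLite and invented sample data: the "exactly one department per employee" rule is declared on the employee table, while the "at least one employee per department" rule is checked with a query, because a simple column constraint cannot express it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE department (
        dept_no   TEXT PRIMARY KEY,
        dept_name TEXT NOT NULL
    );
    CREATE TABLE employee (
        employee_no   TEXT PRIMARY KEY,
        employee_name TEXT NOT NULL,
        -- NOT NULL gives minimum cardinality one on the department side
        -- and a single column gives maximum cardinality one
        dept_no TEXT NOT NULL REFERENCES department (dept_no)
    );
    INSERT INTO department VALUES ('D1', 'Accounting'), ('D2', 'Research');
    INSERT INTO employee VALUES ('E1', 'John', 'D1');
""")

try:
    # An employee without a department violates the minimum cardinality of one.
    conn.execute("INSERT INTO employee VALUES ('E2', 'Mary', NULL)")
    missing_dept_rejected = False
except sqlite3.IntegrityError:
    missing_dept_rejected = True

# Departments with no employee break rule (i) and are found with a query.
empty_departments = [r[0] for r in conn.execute("""
    SELECT d.dept_no
    FROM department AS d
    LEFT JOIN employee AS e ON e.dept_no = d.dept_no
    WHERE e.employee_no IS NULL
""")]
print(missing_dept_rejected, empty_departments)  # True ['D2']
```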

Maximum/Minimum Cardinality    Minimum/Maximum Cardinality    Representation
on Entity Side A               on Entity Side B               (symbol between A and B not shown)

One/One                        One/One                        A  B
One/One                        Zero/One                       A  B
One/One                        Zero/Many                      A  B
One/One                        One/Many                       A  B
One/Zero                       One/Many                       A  B

Figure 9-1. Representation of Cardinality using Crow’s Foot Notation

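[Figure 9-2 diagram: entity Department with attributes Dept No. and Dept Name; entity Employee with attributes Employee No. and Employee Name; entity Skill with attributes Skill Code and Skill Description]

Figure 9-2. A Simple E-R-A Diagram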

When two or more entities are involved in a many-to-many relationship, the relationship itself

may have attributes associated with it. Suppose each employee has a certain number of weeks of experience in

each of the skills he or she has. To identify an employee’s experience in a skill, we need to know both the

employee and the skill, so the attribute experience is associated with the relationship. In modified

E-R-A diagrams, a many-to-many relationship creates a new type of entity called an associative entity. Each

associative entity has its own identifier and attributes, such as experience in the above example.

Its identifier combines the identifiers of the entities that take part in the many-to-many relationship.

The associative entity Employee-Skill is represented as shown in the modified E-R-A diagram in Figure 9-3. It

shows that an employee may have experience in many skills but must have at least one skill. Similarly, a skill

may be held by many employees, and some skills may not be held by any employee.

[Figure 9-3 diagram: entities Department (Dept No., Dept Name), Employee (Employee No., Employee Name), and Skill (Skill Code, Skill Description), with the associative entity Employee-Skill (Employee No., Skill Code, Experience) between Employee and Skill]

Figure 9-3. Modified E-R-A Diagram for the Example in Figure 9-2.
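
A minimal sketch of the Employee-Skill associative entity as a table follows. Table and column names are adapted from the attribute labels in Figure 9-3; the sample employees, skills, and weeks of experience are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE employee (employee_no TEXT PRIMARY KEY, employee_name TEXT);
    CREATE TABLE skill (skill_code TEXT PRIMARY KEY, skill_description TEXT);
    CREATE TABLE employee_skill (
        employee_no TEXT REFERENCES employee (employee_no),
        skill_code  TEXT REFERENCES skill (skill_code),
        experience  INTEGER,                    -- weeks of experience
        PRIMARY KEY (employee_no, skill_code)   -- identifiers of both entities
    );
    INSERT INTO employee VALUES ('E1', 'John'), ('E2', 'Mary');
    INSERT INTO skill VALUES ('S1', 'Welding'), ('S2', 'Painting'), ('S3', 'Wiring');
    INSERT INTO employee_skill VALUES
        ('E1', 'S1', 40), ('E1', 'S2', 12), ('E2', 'S1', 8);
""")

def experience(employee_no, skill_code):
    """Weeks of experience for one employee in one skill, or None if not held."""
    row = conn.execute(
        "SELECT experience FROM employee_skill"
        " WHERE employee_no = ? AND skill_code = ?",
        (employee_no, skill_code),
    ).fetchone()
    return row[0] if row else None

# Skills that no employee holds (minimum cardinality zero on the employee side).
unused_skills = [r[0] for r in conn.execute("""
    SELECT s.skill_code FROM skill AS s
    WHERE NOT EXISTS (
        SELECT 1 FROM employee_skill AS es WHERE es.skill_code = s.skill_code
    )
""")]
print(experience("E1", "S2"), unused_skills)  # 12 ['S3']
```

The experience value belongs to neither employee nor skill alone, which is why it sits in the associative table.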

Cardinalities are shown for every relationship including recursive, ternary, and others. For

instance, consider the recursive relationship shown in Figure 9-4. In a university system, a course may

have one or more courses as its pre-requisites. The relationship shows the pre-requisite association that

exists among instances of course entity. Suppose that a course may have many pre-requisites, but not

all courses have pre-requisites. In addition, a course may serve as a pre-requisite for many other courses

and a course may not serve as a pre-requisite for any course. The maximum and minimum cardinalities

are many and zero, respectively, on both sides of the relationship. Since this recursive relationship is

many-to-many, it would create an associative entity and have attributes such as course number and its

pre-requisite, which is also a course number. The modified E-R-A diagram is shown in Figure 9-5.

[Figure 9-4 diagram: entity Course related to itself]

Figure 9-4. An Example of a Recursive Relationship

[Figure 9-5 diagram: entity Course and associative entity Pre-Requisite]

Figure 9-5. Modified E-R-A Diagram for the Recursive Relationship Example
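
The recursive pre-requisite relationship can be sketched as an associative table whose two key columns both refer to the course table. The column name pre_requisite_no and the sample courses (other than MIS471, which the text mentions) are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE course (course_no TEXT PRIMARY KEY, title TEXT);
    CREATE TABLE pre_requisite (
        course_no        TEXT REFERENCES course (course_no),
        pre_requisite_no TEXT REFERENCES course (course_no),  -- also a course number
        PRIMARY KEY (course_no, pre_requisite_no)
    );
    INSERT INTO course VALUES
        ('MIS101', 'Introduction to MIS'),
        ('MIS301', 'Database Concepts'),
        ('MIS471', 'Systems Analysis and Design');
    INSERT INTO pre_requisite VALUES
        ('MIS301', 'MIS101'),
        ('MIS471', 'MIS101'),
        ('MIS471', 'MIS301');
""")

def prerequisites_of(course_no):
    """Courses that must be taken before the given course."""
    return [r[0] for r in conn.execute(
        "SELECT pre_requisite_no FROM pre_requisite"
        " WHERE course_no = ? ORDER BY pre_requisite_no",
        (course_no,),
    )]

def courses_requiring(course_no):
    """Courses for which the given course serves as a pre-requisite."""
    return [r[0] for r in conn.execute(
        "SELECT course_no FROM pre_requisite"
        " WHERE pre_requisite_no = ? ORDER BY course_no",
        (course_no,),
    )]

print(prerequisites_of("MIS471"))   # ['MIS101', 'MIS301']
print(prerequisites_of("MIS101"))   # []
print(courses_requiring("MIS101"))  # ['MIS301', 'MIS471']
```

The empty list for MIS101 reflects the minimum cardinality of zero: not every course has a pre-requisite.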

In a ternary relationship, three entities are involved. Such relationships are needed when an association of three

entities is required to describe certain data. Figure 9-6 shows a ternary

relationship representing a shipping activity that transports products from factories to warehouses. In this

model, recording which product goes from which factory to which warehouse needs an association of the three

entities. Since zero to many products may be sent from zero to many factories to zero to many

warehouses, the minimum and maximum cardinalities are zero and many, respectively.

[Figure 9-6 diagram: entities Factory, Product, and Warehouse joined by the relationship Shipping Activity]

Figure 9-6. An Example of a Ternary Relationship

The many-to-many relationship in this example would create an associative entity with

factory number, warehouse number, product number, and the number of units of the product shipped as

its attributes. To find the number of units shipped, a user would need to know the factory number, the warehouse

number, and the product number. In other words, these three attributes form the identifier for this

relationship. The modified E-R-A diagram for the ternary relationship example is shown in Figure 9-7.

[Figure 9-7 diagram: entities Factory, Product, and Warehouse, each connected to the associative entity Factory-Warehouse-Product]

Figure 9-7. Modified E-R-A Diagram for the Ternary Relationship Example.
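
The associative entity for the ternary relationship can be sketched as a table with a three-part identifier. The column names and the shipment data below are assumptions made for illustration, and the three basic entities are reduced to their identifiers to keep the sketch short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE factory   (factory_no   TEXT PRIMARY KEY);
    CREATE TABLE warehouse (warehouse_no TEXT PRIMARY KEY);
    CREATE TABLE product   (product_no   TEXT PRIMARY KEY);
    CREATE TABLE factory_warehouse_product (
        factory_no    TEXT REFERENCES factory (factory_no),
        warehouse_no  TEXT REFERENCES warehouse (warehouse_no),
        product_no    TEXT REFERENCES product (product_no),
        units_shipped INTEGER,
        PRIMARY KEY (factory_no, warehouse_no, product_no)
    );
    INSERT INTO factory VALUES ('F1'), ('F2');
    INSERT INTO warehouse VALUES ('W1'), ('W2');
    INSERT INTO product VALUES ('P1');
    INSERT INTO factory_warehouse_product VALUES
        ('F1', 'W1', 'P1', 100),
        ('F1', 'W2', 'P1', 250),
        ('F2', 'W1', 'P1', 75);
""")

def units_shipped(factory_no, warehouse_no, product_no):
    """All three identifiers are needed to find the number of units shipped."""
    row = conn.execute("""
        SELECT units_shipped FROM factory_warehouse_product
        WHERE factory_no = ? AND warehouse_no = ? AND product_no = ?
    """, (factory_no, warehouse_no, product_no)).fetchone()
    return row[0] if row else None

print(units_shipped("F1", "W2", "P1"), units_shipped("F2", "W2", "P1"))  # 250 None
```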

In addition to the relationships discussed above, there are some special relationships such as weak

entities and super/sub entities in the modified E-R-A model.

Weak Entity: Weak entities are also known as dependent entities, as an instance in a weak entity is

dependent on an instance in another entity. In the absence of the instance in the latter entity, an instance

in the weak entity cannot exist. Let us assume a video store identifies each video by a video number,

title, year, and rating. Each video has multiple copies that are rented to various customers. Since a

customer can be related to a video copy and not to the original video, information about each copy would

have to be in a separate entity. Such an entity would have a video number, copy number of the video,

rental date, due date, and the customer who rented the video. In order to identify data about a video copy, the

user would need to know the video number and its copy number. A weak entity is shown in Figure 9-8.

[Figure 9-8 diagram: entity Video and weak entity Video Copy]

Figure 9-8. Illustration of Weak Entity
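
The dependence of Video Copy on Video can be sketched with a foreign key, as below. The column names are adapted from the attributes listed in the text; customer_no and the sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # required for SQLite to check references
conn.executescript("""
    CREATE TABLE video (
        video_no TEXT PRIMARY KEY, title TEXT, year INTEGER, rating TEXT
    );
    CREATE TABLE video_copy (
        video_no    TEXT NOT NULL REFERENCES video (video_no),
        copy_no     INTEGER NOT NULL,
        rental_date TEXT,
        due_date    TEXT,
        customer_no TEXT,
        PRIMARY KEY (video_no, copy_no)   -- the copy number alone is not unique
    );
    INSERT INTO video VALUES ('V1', 'Example Title', 1999, 'PG');
    INSERT INTO video_copy VALUES ('V1', 1, '2024-03-01', '2024-03-04', 'C7');
    INSERT INTO video_copy VALUES ('V1', 2, NULL, NULL, NULL);
""")

try:
    # A copy of a video that does not exist cannot be stored.
    conn.execute("INSERT INTO video_copy VALUES ('V9', 1, NULL, NULL, NULL)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

copies = conn.execute(
    "SELECT COUNT(*) FROM video_copy WHERE video_no = 'V1'"
).fetchone()[0]
print(orphan_rejected, copies)  # True 2
```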

An associative entity can be viewed as a weak entity dependent on two or more entities. Thus,

the associative entity, Employee-Skill in Figure 9-3 is dependent on the two entities Employee and Skill.

Super/Sub Entities: Normally, we will need to treat all instances of an entity in the same way. All

instances of an entity will have the same set of attributes and participate in same relationships. However,

sometimes there may be a need to model data that apply to only a subset of instances of an entity. We

use sub entities to model such scenarios. For instance, consider ‘student’ entity. Suppose we want to

model data that are specific only to undergraduate students and data specific only to graduate students.

For instance, for every graduate student, we may want to store the college where the student obtained

the undergraduate degree. We can model this by creating two sub entities, namely ‘undergraduate’ and

‘graduate’, as shown in Figure 9-9. Each of these entities represents a subset of the ‘student’ entity.

We call ‘student’ the super entity for these sub entities. Often, we also refer to a super entity and a sub

entity as parent and child, respectively. The parent-child relationship between a super entity and a sub

entity is also known as the ‘is-a’ relationship. That is, an instance of a sub entity is (also) an instance of the

super entity. Theoretically, a sub entity could have its own sub entities, thus forming a hierarchy of ‘is-a’

relationships.

One of the important properties of a parent-child relationship is inheritance. A sub entity inherits

all attributes and relationships of its parent. However, a parent does not inherit attributes and

relationships of any of its sub entities. In Figure 9-9, while every undergraduate and graduate student

also has a ‘name’ and ‘number’, only graduate students have the attribute ‘college’.

[Figure 9-9 diagram: super entity Student (Student No., Student Name) with sub entities Undergraduate and Graduate; Graduate has the attribute Undergraduate College]

Figure 9-9. An Example for Super and Sub Entities
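
One common way to carry a super/sub entity structure into tables is to give each sub entity a table of its own that shares the super entity's identifier. The sketch below assumes that approach; the student names and the college value are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE student (
        student_no   TEXT PRIMARY KEY,
        student_name TEXT NOT NULL
    );
    -- Each sub entity reuses the super entity's identifier ('is-a' relationship).
    CREATE TABLE undergraduate (
        student_no TEXT PRIMARY KEY REFERENCES student (student_no)
    );
    CREATE TABLE graduate (
        student_no            TEXT PRIMARY KEY REFERENCES student (student_no),
        undergraduate_college TEXT   -- applies to graduate students only
    );
    INSERT INTO student VALUES ('S1', 'Ana'), ('S2', 'Ben');
    INSERT INTO undergraduate VALUES ('S1');
    INSERT INTO graduate VALUES ('S2', 'State College');
""")

# A graduate student obtains name and number from the parent through the shared key.
graduates = conn.execute("""
    SELECT s.student_no, s.student_name, g.undergraduate_college
    FROM graduate AS g
    JOIN student AS s ON s.student_no = g.student_no
""").fetchall()
print(graduates)  # [('S2', 'Ben', 'State College')]
```

The join stands in for inheritance: attributes of the parent are stored once in student and reached through the shared student_no.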

Primary Key: Each basic entity in an E-R-A model would have primary key(s) that identify each instance.

An associative entity would have more than one primary key. The primary keys of the parent entities

become the primary keys for the associative entity. Since the basic entity, Employee has Employee No.

as its primary key and the basic entity, Skill has Skill Code as its primary key, the associative entity

Employee-Skill dependent on these two entities would have the primary keys, Employee No. and Skill

Code. The primary keys of the parent entity migrate as primary keys of the associative entity. A weak

entity also would have more than one primary key. One of the primary keys would have to be the primary

key of the entity on which weak entity is dependent. Thus, Video Number would be a primary key of the

weak entity, Video Copy. Since it cannot identify a copy uniquely, it needs a Copy Number in addition to

the Video Number as its primary keys. In super/sub entity case, the primary key(s) of the super entity

would migrate as primary key(s) for each sub entity. Thus, the primary key of the super entity Student,

Student No. also becomes the primary key for the two sub entities, Graduate and Undergraduate.

Primary key(s) in E-R-A diagram can be shown by underscoring the attributes selected as primary key(s).
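The super/sub entity case can be sketched in code. The following is a minimal illustration only, assuming Python's built-in sqlite3 module; the column types and the sample students are hypothetical and are not part of the chapter's example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE Student (StudentNo INTEGER PRIMARY KEY, StudentName TEXT);
-- Each sub entity reuses the super entity's key as its own primary key.
CREATE TABLE Undergraduate (
    StudentNo INTEGER PRIMARY KEY REFERENCES Student(StudentNo)
);
CREATE TABLE Graduate (
    StudentNo INTEGER PRIMARY KEY REFERENCES Student(StudentNo),
    College   TEXT
);
INSERT INTO Student VALUES (1, 'Ann'), (2, 'Raj');   -- hypothetical sample data
INSERT INTO Undergraduate VALUES (1);
INSERT INTO Graduate VALUES (2, 'State College');
""")

# A graduate "is a" student: the inherited attributes come from the parent row.
graduates = conn.execute("""
    SELECT s.StudentNo, s.StudentName, g.College
    FROM Graduate g JOIN Student s ON s.StudentNo = g.StudentNo
""").fetchall()
print(graduates)  # [(2, 'Raj', 'State College')]
```

Only the Graduate table carries College, while the name and number of every student are stored once in Student.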

Foreign Key: We discussed earlier that a common field in two or more related entities would help to

relate the entities. A customer can have many orders, and therefore, the entity, Customer would have

one-to-many relationship with the entity, Order. If Customer No. were the primary key for an instance in

the Customer entity, then to relate various orders of the customer, the Order entity would have

Customer No. as an attribute. In relational tables, this attribute would be called a foreign key. A foreign

key also helps to navigate a database in the reverse direction. For example, an order record can be

processed for its foreign key to find its parent record. In all one-to-many relationships, the primary key(s)

of the parent entity migrate as foreign key(s) in the child entities. A weak entity can have multiple

instances for each instance in its parent entity. The one-to-many relationship in weak entity also requires

a foreign key as above. An associative entity can be viewed as having multiple one-to-many

relationships. In Figure 9-3, a parent instance in Employee entity has one or more instances of

Employee-Skill entity. Therefore, Employee No., the primary key of Employee migrates as a foreign key

to Employee-Skill. Similarly, a parent instance in Skill entity has one or more instances of Employee-Skill

entity. Therefore, Skill Code, the primary key of Skill migrates as a foreign key to Employee-Skill.
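The Customer and Order relationship described above can be sketched as follows. This is an illustrative sketch only, assuming Python's built-in sqlite3 module. The table name CustomerOrder (used because ORDER is a reserved word in SQL), the column types, and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE Customer (
    CustomerNo   INTEGER PRIMARY KEY,
    CustomerName TEXT NOT NULL
);
CREATE TABLE CustomerOrder (
    OrderNo    INTEGER PRIMARY KEY,
    OrderDate  TEXT,
    CustomerNo INTEGER NOT NULL REFERENCES Customer(CustomerNo)  -- foreign key
);
INSERT INTO Customer VALUES (10, 'Acme');
INSERT INTO CustomerOrder VALUES (501, '2024-01-05', 10), (502, '2024-02-11', 10);
""")

# Parent to children: all orders of customer 10.
orders = conn.execute(
    "SELECT OrderNo FROM CustomerOrder WHERE CustomerNo = ? ORDER BY OrderNo", (10,)
).fetchall()

# Child to parent (the reverse direction): the customer who placed order 502.
parent = conn.execute(
    "SELECT c.CustomerName FROM CustomerOrder o "
    "JOIN Customer c ON c.CustomerNo = o.CustomerNo WHERE o.OrderNo = ?", (502,)
).fetchone()

# An order that points at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO CustomerOrder VALUES (?, ?, ?)", (503, "2024-03-01", 99))
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(orders, parent, orphan_rejected)  # [(501,), (502,)] ('Acme',) True
```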

Simple rules to draw the E-R-A diagram and identify the relational tables.

• Identify basic entities a.k.a. independent or fundamental entities.

• Identify other types of entities such as weak (a.k.a. existence or dependent) entities.

• Identify super-type and sub-type entities, if any.

• Determine relationships between basic entities (relationships may be unary, binary, ternary, etc.).

• Determine cardinality of relationships (1 to many; many to many, etc.).

• Identify minimum and maximum cardinalities.

• Many to many relationships will form associative (a.k.a. composite) entities.

• Determine the primary key(s) for all basic entities.

• Primary keys of participating basic entities become the primary keys of associative entities.

• Primary key of basic entity also becomes a part of the primary key(s) for the weak entity. A weak entity with multiple instances for an instance of a parent entity would need another attribute as an additional primary key. If none is available, introduce a new attribute to identify each instance uniquely.

• Primary key(s) of parent entity become the foreign key(s) in the child entity, in one to many relationships. Primary key(s) of basic entities also become foreign key(s) in weak entities and associative entities.

• Identify attributes associated with each entity to complete the E-R-A model.

• Each entity becomes a relational table.

We illustrate E-R-A modeling using the following simple case for a fictitious firm called Orangemen

Enterprises. Orangemen Enterprises is a software development company with the following project

management responsibilities in its various divisions.

1. There are several divisions in the company and each division has a number, name, and a manager.

2. A division can operate several projects but each project belongs to a single division. Some of the

divisions do not operate any projects at all. Each project has a number, name, number of person

hours, and project cost.

3. A division has many employees with at least one employee but each employee is assigned to only

one division. Each employee has a number, name, date of hiring, date of birth, certain IS skills (at

least one), and a number of weeks experience in each of these skills.

4. An employee may not have all of the skills needed in the company, and some skills required in the

company may not be held by any employee. Each employee should have at least one skill. A skill code

and description identify each skill.

5. An employee may be assigned to many projects or to none at all but each project has at least one

employee assigned to it. For each project assigned to employee, the company keeps track of the

number of hours put into the project by the employee.

6. An employee may or may not have dependents. Each dependent of an employee has a roll number,

name, and date of birth. A spouse, who is also an employee of the company, should be identified as

such but s/he should not be treated as a dependent of the employee. In addition, a dependent such

as a child is a dependent to only one of the married employees.

The first step in drawing the E-R-A diagram is identifying the entities. These are objects whose

data we want to store in the database. For Orangemen Enterprises, these are division, project,

employee, skill, dependent, and spouse. The data of interest for a division are given in (1) as number,

name, and manager. These are modeled as attributes of division. Similarly, (2), (3), (4), and (6) state the

attributes for the entities project, employee, skill, and dependent, respectively. The case also states

several relationships among entities. For instance, (2) states that division and project are related. The

maximum cardinality is many on the project side and one on the division side. The minimum cardinality is

zero on the project side and one on the division side. By analyzing each sentence given in the case

description, we can identify other relationships and cardinalities. Figure 9-10 shows the entities and the

relationships among them. Since project and employee entities, and employee and skill entities have

many-to-many relationships, the E-R-A model in Figure 9-10 is modified and shown in Figure 9-11. Using

the simple rules described in the previous section, various entities, attributes, primary key(s), and foreign

key(s) are identified and shown in the modified E-R-A model.

[Figure: E-R-A diagram relating the entities Division, Project, Employee, Dependent, Spouse, and Skill.]

Figure 9-10 E-R-A Diagram for Orangemen Enterprises

[Figure: modified E-R-A diagram with the entities Division, Project, Project-Employee, Employee, Dependent, Employee-Skill, Skill, and Spouse and their attributes. Legend: underline = primary key; italic = foreign key.]

Figure 9-11. Modified E-R-A Diagram for Orangemen Enterprises

Each entity, with its associated attributes, becomes a relational table as shown below:

Division (DivisionNumber, DivisionName, Manager)

Project (ProjectNumber, ProjectName, NumberOfPersonHours, ProjectCost, DivisionNumber)

Employee (EmployeeNumber, EmployeeName, DateOfHiring, DateOfBirth, DivisionNumber)

Skill (SkillCode, SkillDescription)

Dependent (EmployeeNumber, RollNumber, DependentName, DateOfBirth)

EmployeeSkill (EmployeeNumber, SkillCode, Experience)

ProjectEmployee (ProjectNumber, EmployeeNumber, NumberOfHours)

Spouse (EmployeeNumber, EmployeeNumber)
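Part of this preliminary design can be written as table definitions. The sketch below is illustrative only and assumes Python's built-in sqlite3 module; the column types are assumptions, since the case description does not state them. Project, ProjectEmployee, and Spouse are left out for brevity and would follow the same pattern.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on
conn.executescript("""
CREATE TABLE Division (
    DivisionNumber INTEGER PRIMARY KEY,
    DivisionName   TEXT,
    Manager        TEXT
);
CREATE TABLE Employee (
    EmployeeNumber INTEGER PRIMARY KEY,
    EmployeeName   TEXT,
    DateOfHiring   TEXT,
    DateOfBirth    TEXT,
    DivisionNumber INTEGER NOT NULL REFERENCES Division(DivisionNumber)
);
CREATE TABLE Skill (
    SkillCode        TEXT PRIMARY KEY,
    SkillDescription TEXT
);
-- Associative entity: the key combines the keys of both parents.
CREATE TABLE EmployeeSkill (
    EmployeeNumber INTEGER NOT NULL REFERENCES Employee(EmployeeNumber),
    SkillCode      TEXT    NOT NULL REFERENCES Skill(SkillCode),
    Experience     INTEGER,
    PRIMARY KEY (EmployeeNumber, SkillCode)
);
-- Weak entity: the parent's key plus RollNumber identifies a dependent.
CREATE TABLE Dependent (
    EmployeeNumber INTEGER NOT NULL REFERENCES Employee(EmployeeNumber),
    RollNumber     INTEGER NOT NULL,
    DependentName  TEXT,
    DateOfBirth    TEXT,
    PRIMARY KEY (EmployeeNumber, RollNumber)
);
""")

tables = [r[0] for r in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
# Column 5 of PRAGMA table_info is non-zero for primary key columns.
dependent_key = [r[1] for r in conn.execute("PRAGMA table_info(Dependent)") if r[5] > 0]
print(tables)
print(dependent_key)  # ['EmployeeNumber', 'RollNumber']
```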

The database design we get at this stage should be considered as a preliminary design. If the E-

R-A diagram is well designed, then the database design derived from the E-R-A diagram will also be well

designed and in 3rd normal form. However, it is better to check whether the tables are well designed at

this stage. We use normalization criteria to assess the database design.

Normalization

Normalization is a procedure that reduces data redundancy and mitigates maintenance

anomalies in a database design. Normalization procedure consists of several stages called normal

forms. As a design in a stage is transformed into a design in a higher normal form, the design is

improved. Normalization process works at the table level. That is, each table is assessed and improved

independently, even though all tables are part of the same database. Converting a table into a higher

normal form typically involves splitting the table into two or more tables. We discuss different normal forms

first followed by an illustration of the normalization procedure.

First Normal Form (1NF)

A table is said to be in 1NF if the table does not have multi-valued attributes (repeating groups).

For example, an order has several items. If order data and item data are in the same table, then the table

is not in 1NF.

For a table to be in 1NF, every field in every row of the table must hold a single value. For

example, consider the division table in Orangemen Enterprises. Each row in this table has the data for

one division. Since each division has only one number, only one name, and only one manager, each row in

this table will have only one value for each of these three fields. Therefore, the table division is in 1NF.

Suppose that a division can have many managers, then, in this design, rows corresponding to divisions

with many managers will have multiple values for the manager field. In that case, division will not be in

1NF. Do note that we created our E-R-A diagram and subsequently the database design based on the

assumption that a division has only one manager. If this assumption is changed, then the E-R-A diagram

will change and we will not come up with the current database design in the first place.

Second Normal Form (2NF)

A table is said to be in 2NF if the table is in 1NF and all non-key attributes depend on the whole set

of key attributes. 2NF is a concern only for tables whose primary key consists of two or more attributes.

Before we discuss the definition for 2NF, we need to define the concept of dependency among

attributes in a table. We say that an attribute, Y, depends on another attribute, X, if only one value of Y

can be associated with a given value of X. In other words, if we know the value of X in a table, say, Vx,

then in all rows that have Vx as the value of X, the value of Y is the same, say Vy. Again, consider the

division table. We can conclude that DivisionName depends on DivisionNumber because, given a

DivisionNumber, we know that there is only one row in the table with that value (DivisionNumber

is the key for the table), and in that row there will be only one value for DivisionName

because the attributes are single-valued in this table. Can we say that DivisionNumber depends on

DivisionName? Given a DivisionName, there could be multiple rows in the table with that value for the

DivisionName attribute because there is no restriction that DivisionName be unique for each division. In

each of these rows with the given value of DivisionName, there will be a different DivisionNumber because

DivisionNumber is unique in each row. Consequently, we may find many DivisionNumber values for a given DivisionName,

and so, DivisionNumber does not depend on DivisionName.
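The definition of dependency can be tested mechanically on sample data. The sketch below is a hypothetical helper, not part of the chapter: depends_on and the three sample rows are assumptions, and DivisionNumber stands for the key of the division table. A passing check only shows that the sample is consistent with a dependency; the dependency itself is a rule about all data the table may ever hold.

```python
def depends_on(rows, x, y):
    """Return True if attribute y depends on attribute x in the given rows,
    that is, every value of x is paired with exactly one value of y."""
    seen = {}
    for row in rows:
        if row[x] in seen and seen[row[x]] != row[y]:
            return False          # the same x value appears with two y values
        seen[row[x]] = row[y]
    return True

# Hypothetical sample rows; two divisions happen to share a name.
division = [
    {"DivisionNumber": 1, "DivisionName": "Systems"},
    {"DivisionNumber": 2, "DivisionName": "Support"},
    {"DivisionNumber": 3, "DivisionName": "Systems"},
]

print(depends_on(division, "DivisionNumber", "DivisionName"))  # True
print(depends_on(division, "DivisionName", "DivisionNumber"))  # False
```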

In a table in 2NF, every non-key attribute depends on the full set of keys. That is, no non-key

attribute can depend only on a part of the key. The question of partial dependency arises only when the

key has more than one attribute as in Dependent, EmployeeSkill, and ProjectEmployee. Consider the

table ProjectEmployee. There is only one non-key attribute, viz., NumberOfHours. In order to determine

whether the table is in 2NF, we need to determine whether NumberOfHours depends on both

ProjectNumber and EmployeeNumber. That is, NumberOfHours cannot depend only on ProjectNumber

or only on EmployeeNumber. Given a ProjectNumber, is there only one value for NumberOfHours?

Since a project can have multiple employees working in it, and each employee works a certain number of

hours in the project, each project is associated with multiple NumberOfHours, one for each employee

working in the project. Thus, NumberOfHours does not depend on just the ProjectNumber. A similar

reasoning can be used to verify that NumberOfHours does not depend just on the EmployeeNumber.

However, given a ProjectNumber and an EmployeeNumber, there can be only one value for

NumberOfHours. Thus, ProjectEmployee is in 2NF.
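The reasoning above can also be expressed as a query that looks for key values paired with more than one NumberOfHours. This is a sketch under stated assumptions: it uses Python's built-in sqlite3 module, the four sample rows are hypothetical, and violations is an illustrative helper name.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE ProjectEmployee (
    ProjectNumber INTEGER, EmployeeNumber INTEGER, NumberOfHours INTEGER,
    PRIMARY KEY (ProjectNumber, EmployeeNumber)
);
-- Hypothetical sample data.
INSERT INTO ProjectEmployee VALUES
    (1, 100, 40), (1, 101, 25), (2, 100, 10), (2, 102, 10);
""")

def violations(determinant):
    """Count determinant values that appear with more than one NumberOfHours.
    The determinant is a fixed column list written in this script, not user input."""
    sql = (f"SELECT COUNT(*) FROM (SELECT 1 FROM ProjectEmployee "
           f"GROUP BY {determinant} HAVING COUNT(DISTINCT NumberOfHours) > 1)")
    return conn.execute(sql).fetchone()[0]

print(violations("ProjectNumber"))                  # 1
print(violations("EmployeeNumber"))                 # 1
print(violations("ProjectNumber, EmployeeNumber"))  # 0
```

Neither part of the key determines NumberOfHours on its own in this sample, but the whole key does.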

Third Normal Form (3NF)

A table is in 3NF if the table is in 2NF, and the table does not contain a transitive dependency. In

other words, no non-key attribute depends on another non-key attribute.

A transitive dependency exists in a table if the table contains three attributes, X, Y, and Z, such

that Y is dependent on X and Z is dependent on Y. Consider a table that contains order number, order

date, order value, customer number, customer name, and customer address, and that order number is

the primary key. A customer name or address is dependent on a customer number and not on the order

number. Thus, both customer name and customer address violate the 3NF rule. To resolve, the

customer name and customer address should have a separate table with customer number as its primary

key. There are higher order normal forms. However, in most practical applications, it is sufficient if the

database is in 3NF. We discussed the three normal forms individually. The essence of 3NF can be

stated using the following easy-to-remember statement.

‘A table is in 3NF if every non-key attribute depends on the key, the whole key, and nothing else but the

key.’

Suppose we find that a table we derived is not in 3NF. How do we convert it into a 3NF design? We

illustrate the normalization procedure using the following example. Because the design for Orangemen

Enterprises is already well-designed, we use a different example for the normalization procedure.

Illustration of Normalization Procedure

Consider the following table design for storing purchase order data:

Order (Order#, OrderDate, Customer#, CustomerName, Product#, ProductName, Unit Price, Quantity)

(i) The first step in the normalization process is to identify the dependencies among attributes in the table.

We identify the following dependencies:

Order# --------- OrderDate, Customer#, CustomerName

Customer# ---------- CustomerName

Product# --------------- ProductName

Order#, Product# ------------ Unit Price, Quantity

In the above notation for dependency, each of the attributes on the right hand side depends on the

attribute(s) on the left hand side.

(ii) The next step is to convert each dependency into a table with the left hand side as the key of the table.

Thus, we have the following tables.

Order (Order#, OrderDate, Customer#, CustomerName)

Customer (Customer#, CustomerName)

Product (Product#, ProductName)

OrderProduct (Order#, Product#, Unit Price, Quantity)

(iii) Now we check for transitive dependency in each of the tables. We know that in table Order,

CustomerName depends on Customer#, a non-key attribute. Thus, there is a transitive dependency. We

put each dependency that relates only non-key attributes as a separate table and eliminate the right hand

side of the dependency from the original table. In the table Order, the only dependency that has only non-key

attributes is Customer# → CustomerName. If we put this as a separate table, we get the

Customer table, which we already have in our design. Consequently, there is no need for another table

that has Customer# and CustomerName. Then, we eliminate CustomerName from the original Order

table. Thus, we get the following Order table in 3NF.

Order (Order#, OrderDate, Customer#)

We can verify that other tables are already in 3NF. Thus, the database design in 3NF will be the

following.

Order (Order#, OrderDate, Customer#)

Customer (Customer#, CustomerName)

Product (Product#, ProductName)

OrderProduct (Order#, Product#, Unit Price, Quantity)

Again, the italicized attributes in the database design are the foreign keys.
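The decomposition can be checked by splitting a small unnormalized table and joining the pieces back together. The sketch below is illustrative only and assumes Python's built-in sqlite3 module. The sample rows are hypothetical, and the names OrderFlat, Orders, OrderNo, CustomerNo, and ProductNo are stand-ins chosen because # cannot be used in an unquoted SQL name and ORDER is a reserved word.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE OrderFlat (OrderNo, OrderDate, CustomerNo, CustomerName,
                        ProductNo, ProductName, UnitPrice, Quantity);
INSERT INTO OrderFlat VALUES
    (1, '2024-01-05', 10, 'Acme',  'P1', 'Bolt', 0.50, 100),
    (1, '2024-01-05', 10, 'Acme',  'P2', 'Nut',  0.20, 200),
    (2, '2024-01-09', 11, 'Bravo', 'P1', 'Bolt', 0.55,  50);

-- One table per dependency, with CustomerName removed from Orders.
CREATE TABLE Orders       AS SELECT DISTINCT OrderNo, OrderDate, CustomerNo FROM OrderFlat;
CREATE TABLE Customer     AS SELECT DISTINCT CustomerNo, CustomerName FROM OrderFlat;
CREATE TABLE Product      AS SELECT DISTINCT ProductNo, ProductName FROM OrderFlat;
CREATE TABLE OrderProduct AS SELECT OrderNo, ProductNo, UnitPrice, Quantity FROM OrderFlat;
""")

# Joining the four tables on their keys should give back the original rows.
rejoined = conn.execute("""
    SELECT o.OrderNo, o.OrderDate, o.CustomerNo, c.CustomerName,
           op.ProductNo, p.ProductName, op.UnitPrice, op.Quantity
    FROM OrderProduct op
    JOIN Orders   o ON o.OrderNo    = op.OrderNo
    JOIN Customer c ON c.CustomerNo = o.CustomerNo
    JOIN Product  p ON p.ProductNo  = op.ProductNo
    ORDER BY o.OrderNo, op.ProductNo
""").fetchall()
original = conn.execute(
    "SELECT * FROM OrderFlat ORDER BY OrderNo, ProductNo").fetchall()
print(rejoined == original)  # True
```

Each customer name and product name is now stored once, yet no information from the original table is lost.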

Physical Database Design

After the logical database design is completed, the next step is to design the physical database.

The primary purpose of physical database design is to improve the database performance. The ultimate

performance measure of a database system is the speed and accuracy with which queries and updates

are performed by the system. Though the database performance can be accurately measured only after

it is implemented, a number of decisions can be made in the design stage from the performance

perspective. Tuning a database by changing the physical database design is an activity that continues

after a system is implemented. We discuss some of the physical database design techniques below.

Indexing Tables

The logical database design specifies what data should be stored in each table. It does not

specify the implementation details, or how the records are stored within a table. Indexing decisions

specify some of these details. Indexing a table based on a field improves the access speed when the

table is searched using that field. It is similar to indexes in a textbook. To a new reader, who does not

know the contents of the book, the fastest way to locate a topic is to use the index to find the page

number, and then read each line on the page sequentially. Without an index, the reader would have to

start on page one and read sequentially until the topic is found. Indexing reduces the search space to

one page. When accessing a row in a table using its primary key, searching from the first row would be

time consuming. A table can be split into several pages (logically) and the page containing the

row is determined using the index.

The important indexing decision to be made is which tables should be indexed and which field or

combination of fields should be used to index the table. In determining which indexes to create, we begin

with the list of operations, both queries and update operations, to be done using the database. In a

typical database system, there will be a large number of data access and manipulation operations. Since

optimizing the performance of the database system for all database operations is impossible, we prioritize

the operations in the order of importance, and make indexing decisions starting from the most important

operation. We use the following guidelines in creating indexes.

1. We index every table on its primary key field as the key is often used to retrieve records from a table.

2. We index a table on foreign keys, if any, as the foreign keys are used to join the table with related

tables.

3. We index a table on attributes that are used in query operations. For instance, if product data are

frequently accessed using product name, the product table is indexed on ProductName attribute. We

prefer indexes that speed up more than one query.
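The third guideline can be tried out on a small table. The sketch below is illustrative only and assumes Python's built-in sqlite3 module; the index name idx_product_name, the helper plan, and the generated product rows are assumptions. It asks SQLite to describe how it would run the same query before and after the index is created. The exact wording of the description differs between SQLite versions, so no expected output is shown.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Product (ProductNo INTEGER PRIMARY KEY, ProductName TEXT)")
conn.executemany("INSERT INTO Product VALUES (?, ?)",
                 [(i, f"Product {i}") for i in range(1000)])  # hypothetical rows

def plan(sql, params=()):
    """Return SQLite's query-plan description for a statement as one string."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)  # the last column holds the detail text

query = "SELECT ProductNo FROM Product WHERE ProductName = ?"
before = plan(query, ("Product 42",))   # no index on ProductName: a full table scan
conn.execute("CREATE INDEX idx_product_name ON Product (ProductName)")
after = plan(query, ("Product 42",))    # the new index is used for the search

print(before)
print(after)
```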

The input-process-output (IPO) tables discussed in previous chapters provide valuable information for indexing.

Since each process in an IPO table describes how each table is accessed and describes its processing

mode, it can be used to identify the indexes. The frequency of each online or real-time process identified

from work measurement data indicates tables that need to be accessed more frequently.

Denormalization

Denormalization is the reverse of the normalization process. While normalization splits a table into

two or more tables, denormalization combines two or more tables. Normalization improves data integrity,

but it can also increase data access time because related data must be joined. Thus, when access time is critical, we may sacrifice

data integrity in order to improve data access time. Consider for example, a customer table that contains

customer number, name, address, city, state, and zip code. This table is not in 3NF because both city

and state depend on zip code. Normalization up to 3NF would lead to two tables: one containing

customer number, name, address, and zip code, and the other containing zip code, city, and state. A firm

having a small number of customers throughout the country might want to denormalize these two tables

because to generate address, the two tables have to be joined frequently. In addition, city, state, and zip

code data do not change often. A downside of this denormalization is that a city name might not be

spelled the same way in all records, and data entry may involve additional key strokes. Additional

artifacts would have to be devised to reduce data entry labor.

Good candidate tables to combine are those that are frequently joined to answer queries. For

instance, in a logical database design for order, there is a frequent need to generate a sales report that

contains for each product, the product name, the unit price, and quantity. This query requires the use of

Product and OrderProduct tables. Since using two tables in the same query often slows down the

database, we may choose to combine the two tables into the following one table.

NewOrderProduct (Order#, Product#, ProductName, Unit Price, Quantity)

Note that NewOrderProduct table is not in 3NF because ProductName depends only on Product# and

does not depend on the whole key.
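The trade-off can be seen in a small sketch. It is illustrative only and assumes Python's built-in sqlite3 module; the sample rows are hypothetical, and OrderNo and ProductNo stand in for Order# and Product#, which are not valid unquoted SQL names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Product (ProductNo TEXT PRIMARY KEY, ProductName TEXT);
CREATE TABLE OrderProduct (OrderNo INTEGER, ProductNo TEXT, UnitPrice REAL,
                           Quantity INTEGER, PRIMARY KEY (OrderNo, ProductNo));
INSERT INTO Product VALUES ('P1', 'Bolt'), ('P2', 'Nut');
INSERT INTO OrderProduct VALUES (1, 'P1', 0.50, 100), (1, 'P2', 0.20, 200),
                                (2, 'P1', 0.55, 50);

-- Denormalized copy: the join is done once, when the table is built.
CREATE TABLE NewOrderProduct AS
    SELECT op.OrderNo, op.ProductNo, p.ProductName, op.UnitPrice, op.Quantity
    FROM OrderProduct op JOIN Product p ON p.ProductNo = op.ProductNo;
""")

# The sales report now reads a single table.
report = conn.execute("""
    SELECT ProductName, UnitPrice, Quantity FROM NewOrderProduct
    ORDER BY ProductName, OrderNo
""").fetchall()

# The price of the shortcut: the name 'Bolt' is stored once per order line.
copies = conn.execute(
    "SELECT COUNT(*) FROM NewOrderProduct WHERE ProductNo = 'P1'").fetchone()[0]

print(report)  # [('Bolt', 0.5, 100), ('Bolt', 0.55, 50), ('Nut', 0.2, 200)]
print(copies)  # 2
```

If a product is renamed, every copy of its name in NewOrderProduct has to be updated, which is the kind of maintenance anomaly normalization is meant to avoid.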

Partitioning

Partitioning splits a table for implementation and storage purposes. Unlike normalization that

splits a table at the logical level, partitioning splits a table solely for implementation (or at the physical

level). Typically, large tables are partitioned so that smaller tables can be accessed to answer queries,

which will improve response time. Partitioning is also needed when a table has to be geographically

distributed on many servers. Records needed frequently for local use would be stored on a table in the

local server.

There are two ways of partitioning a table: horizontal and vertical. In horizontal partitioning, all

the partitions have identical attributes but each partition contains only a portion of the original number of

records. Horizontal partitioning reduces the table size and enables a DBMS to handle the table more

efficiently for processing. Accesses, updates, and joins with other tables can be faster with smaller

tables. Sometimes, a specific DBMS such as Access cannot handle large tables, and such tables have to be

horizontally partitioned to reduce their size.

In vertical partitioning, all the partitions have the same number of records but each partition

contains only a subset of the original attributes. In vertical partitioning, the primary key has to be

replicated in each partition. Vertical partitions are commonly employed in distributed databases.

Replicating a table in several remote locations requires elaborate processes to keep the tables

synchronized and consistent. If the local data items needed by each distributed site are disjoint, the

table can be split according to local needs. This reduces, and often removes, the need to keep the tables

synchronized.
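Both kinds of partitioning can be sketched on one small table. The example is illustrative only and assumes Python's built-in sqlite3 module; the Customer columns, the partition names, and the sample rows are hypothetical. A production DBMS would normally provide its own partitioning features, whereas this sketch simply copies rows into separate tables to show the idea.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerNo INTEGER PRIMARY KEY, Name TEXT,
                       State TEXT, CreditLimit REAL);
INSERT INTO Customer VALUES (1, 'Acme', 'NY', 5000), (2, 'Bravo', 'CA', 7500),
                            (3, 'Corex', 'NY', 1000);

-- Horizontal: same columns, disjoint subsets of the rows.
CREATE TABLE Customer_NY    AS SELECT * FROM Customer WHERE State = 'NY';
CREATE TABLE Customer_Other AS SELECT * FROM Customer WHERE State <> 'NY';

-- Vertical: same rows, subsets of the columns; the key is in both parts.
CREATE TABLE Customer_Contact AS SELECT CustomerNo, Name, State FROM Customer;
CREATE TABLE Customer_Credit  AS SELECT CustomerNo, CreditLimit FROM Customer;
""")

original = conn.execute("SELECT * FROM Customer ORDER BY CustomerNo").fetchall()

# Horizontal partitions are recombined with a union of their rows.
from_horizontal = conn.execute("""
    SELECT * FROM Customer_NY UNION ALL SELECT * FROM Customer_Other
    ORDER BY CustomerNo
""").fetchall()

# Vertical partitions are recombined with a join on the replicated key.
from_vertical = conn.execute("""
    SELECT a.CustomerNo, a.Name, a.State, b.CreditLimit
    FROM Customer_Contact a JOIN Customer_Credit b ON b.CustomerNo = a.CustomerNo
    ORDER BY a.CustomerNo
""").fetchall()

print(from_horizontal == original, from_vertical == original)  # True True
```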

Physical database design is an art and determining the optimal design requires a lot of

experimentation. However, modern database systems offer advanced tools such as index tuning wizards

and index advisors to help database designers and administrators in the physical design process.

Database and Programming

Databases have the advantage of program-data independence. Changing the data and data

structure should have very little impact on programs. Databases can be interfaced and manipulated by a

variety of programming languages. Programs written in languages such as Visual Basic and C++ can be

connected to databases to input and output data in a database. In addition, DBMSs have an easy to

learn and use high level language called Structured Query Language (SQL) to input, output, and

manipulate data.

Structured Query Language

Modern relational DBMSs such as Microsoft SQL Server, Oracle, and

Microsoft Access use SQL to manipulate a database. SQL has a number of commands like any other

programming language but SQL commands are structured like English and readable. For example, the

credit check process in an order processing system has to access the following customer table to obtain

“available credit” for the customer with CustomerNo, 374808:

Customer (CustomerNo, CustomerName, CustomerAddress, CustomerCity, CustomerState,

CustomerZip, AvailableCredit, CreditLimit).

The process can use the following SQL statement to obtain the “AvailableCredit”:

SELECT AvailableCredit
FROM Customer
WHERE CustomerNo = '374808';

The above SQL statement is equivalent to the statement in natural English “get the available

credit information from the customer table, where customer number is 374808.” Note that it needs just

three lines of simple code to retrieve data from the database. A program written in a third-generation

language would need many more statements to accomplish the same task. In addition to retrieving data as

shown above, SQL statements can be used to add data to a table using INSERT command, delete data

with DELETE command, create new tables with CREATE command, remove tables with DROP

command, and combine several tables using JOIN command.

Stored Procedure

The SQL program as written above and executed would be compiled each time before execution.

To save compilation time, speed up execution, and enable sharing of a SQL program, it can be stored in

the database server in its compiled state. Such compiled and stored SQL programs are known as Stored

Procedures. The SQL program example discussed in the previous section can obtain the available

credit for the customer with a specific customer number of 374808. The same program as a stored

procedure can be used to obtain the available credit for any customer by what is known as

parameterization. Any customer number can be passed to this stored procedure as a parameter, and the

stored procedure used to obtain the available credit for that customer. Business rules can be

standardized and used within the same system and in other systems. Stored procedures can provide

modularity because these procedures can be called from any program module. A stored procedure can

call another stored procedure in a nested operation. Since these are stored in the server, they save valuable

network bandwidth. Stored procedures can be scheduled to be executed as batch programs without

human intervention.
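Parameterization can be illustrated with a small sketch. SQLite, used here through Python's built-in sqlite3 module, does not support stored procedures, so the Python function get_available_credit below is only a stand-in that shows the same idea of one stored query reused with different parameter values. The function name and the sample customers are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Customer (CustomerNo TEXT PRIMARY KEY, CustomerName TEXT,
                       AvailableCredit REAL, CreditLimit REAL);
-- Hypothetical sample data.
INSERT INTO Customer VALUES ('374808', 'Acme', 1200.0, 5000.0),
                            ('374809', 'Bravo', 300.0, 1000.0);
""")

def get_available_credit(customer_no):
    """Stand-in for a stored procedure: one query, reusable for any customer."""
    row = conn.execute(
        "SELECT AvailableCredit FROM Customer WHERE CustomerNo = ?",  # ? is the parameter
        (customer_no,),
    ).fetchone()
    return None if row is None else row[0]

print(get_available_credit("374808"))  # 1200.0
print(get_available_credit("374809"))  # 300.0
```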

Trigger

A trigger is a stored procedure that automatically executes when a certain event takes place. In

DBMS terminology, a trigger is fired. Triggers add to the flexibility offered by stored procedures. In using

a trigger, a programmer needs to identify when a trigger should be fired. A trigger can be associated with

commands (events) such as INSERT, DELETE, and UPDATE. In such cases, it can be used AFTER the

event, BEFORE the event, or INSTEAD OF the event.

Let us discuss a simple example that can use a trigger. In an inventory control system, a

procurement order has to be placed each time the quantity on hand in the inventory table is less than or

equal to the reorder level (ROL). Since the quantity on hand will be reduced each time the inventory table is

updated for an issue of materials to production, a necessary task is to check whether the quantity on hand

is less than or equal to the reorder level. For each UPDATE of QUANTITYONHAND in INVENTORY, a trigger called

CHECKROL can be fired after the update.

Although update for issue, comparing quantity on hand with ROL, and placing a procurement

order can be combined into a single program, the activities are tightly coupled (discussed in Chapter 12

on Program Design). By having a separate program for each of the above, the programs become

modular. If the system needs to compare quantity on hand with ROL in any other place, the second program

can be reused. If a procurement order has to be placed for any other reason, the last program can be

reused.
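The reorder check can be sketched as an AFTER UPDATE trigger. The example is illustrative only and assumes SQLite's trigger syntax, run through Python's built-in sqlite3 module; other DBMSs use somewhat different syntax. The ProcurementOrder table, the column names, and the sample item are assumptions made for the sketch.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Inventory (ItemNo TEXT PRIMARY KEY, QuantityOnHand INTEGER,
                        ReorderLevel INTEGER);
CREATE TABLE ProcurementOrder (ItemNo TEXT, QuantityAtTrigger INTEGER);

-- Fired after each update of QuantityOnHand; the body runs only when
-- the new quantity is at or below the reorder level.
CREATE TRIGGER CheckROL
AFTER UPDATE OF QuantityOnHand ON Inventory
WHEN NEW.QuantityOnHand <= NEW.ReorderLevel
BEGIN
    INSERT INTO ProcurementOrder VALUES (NEW.ItemNo, NEW.QuantityOnHand);
END;

INSERT INTO Inventory VALUES ('A1', 100, 20);   -- hypothetical item
""")

def issue(item_no, quantity):
    """Issue materials to production by reducing the quantity on hand."""
    conn.execute(
        "UPDATE Inventory SET QuantityOnHand = QuantityOnHand - ? WHERE ItemNo = ?",
        (quantity, item_no))

issue("A1", 50)   # 50 left, above the reorder level: no order is placed
issue("A1", 35)   # 15 left, at or below the reorder level: an order is placed
orders = conn.execute("SELECT * FROM ProcurementOrder").fetchall()
print(orders)     # [('A1', 15)]
```

The program that issues materials never mentions the reorder level; the check lives in the trigger, which keeps the two activities loosely coupled.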

Illustration of Logical Database Design: Proposed AP System

Orangemen Enterprises and other examples used so far are examples of cases that have already

been structured sufficiently for us to draw an E-R-A model from the description. However, in reality, the

information required for drawing an E-R-A model is rarely as well structured as Orangemen Enterprises.

Fortunately, the DFD, its supplements, and the data dictionary for the proposed system contain the

information we require to draw the E-R-A diagram and to design the database. We illustrate this

procedure using the Accounts Payable system discussed in the previous chapter.

In Chapter 8, we derived the unique set of data items for the proposed AP system. We first

identify the data items that will be stored in the database. Table 9-1 shows the data items that will be

stored and reasons for choosing or not choosing to store an item in the database.

Data Item                     Reason for inclusion or exclusion from the database

Amount owed                   Basic data item as it is a verified data item coming from an external entity
Balance                       It is a master data that is updated for each check written as of certain date and time.
Check #                       Generated data item
Check status                  Generated data item
Date/time                     Generated data item
Delivered to store            Basic data item
Dr/cr                         Generated data item
Extension                     It is derived as a product of quantity and unit price. Since unit price may vary with time, this temporal data item is stored.
Invoice #                     Generated data item
Invoice date                  Generated data item
Invoice status                Generated data item
Item #                        Added to identify each item uniquely.
Item name                     Basic data item
Item quantity                 Basic data item
Payment due date              Basic data item
Sales person name             Basic data item
Sales person signature        Basic data item
Sales tax                     Basic data item coming from an external entity.
Shipping & handling charges   Basic data item coming from an external entity.
Total extension               Since this is derived as a simple sum of all extensions in an invoice, we need not store it.
Total outstanding amount      Derived and used only to answer vendor query
Total payments                Derived and used only to reconcile vendor invoices
Total purchases               Derived and used only to reconcile vendor invoices
Unit price                    Basic data item
Vendor #                      Basic data item
Vendor address                Basic data item
Vendor name                   Basic data item

Table 9-1. Data Items That Will Be Stored in the Database in the AP System

We can identify five entities in the proposed system. They are vendor, invoice, item, check, and

check book register. Vendor entity represents the collection of vendors from whom OG has bought

various items. An invoice entity represents a collection of invoices received from various vendors. Item

entity contains all items bought by OG. Check is the set of all checks written by OG to pay its vendors.

Check book register keeps a list of all checks, their date and time, and the corresponding balances. One

way to identify relationships, other than knowledge about the problem domain, is the structure of data

flows and data stores. For example, consider

Vendor Ledger: vendor # + vendor name + vendor address + {invoice # + invoice date + Amount Owed +

Check # + Payment Amount}

The above data structure contains data elements associated with three entities: vendor, invoice,

and check. Vendor #, vendor name, and vendor address are attributes of vendor. Invoice #, Invoice

date, and Amount owed are attributes of invoice. Check # and payment amount are attributes of check.

The data structure suggests that these entities are related. Further, it also shows that a vendor can have

multiple invoices (note the repeating group for invoice data for each vendor) and that an invoice is related

to only one check (note the absence of repeating group for check data for an invoice). By using similar

analysis, we can identify other relationships and maximum cardinalities. The data structure does not

indicate the minimum cardinalities; but we should determine those using domain knowledge. The E-R-A

diagram for the proposed AP system is shown in Figure 9-12. Since a credit entry (such as deposit) in

the checkbook register would not have a corresponding check, the cardinalities are as shown in the

model.
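The repeating-group reasoning applied to the Vendor Ledger structure can be sketched in code. The snippet below is only an illustration: the `OWNER` mapping restates which entity each data item belongs to, and the parser assumes at most one non-nested `{...}` group. All function names are hypothetical.

```python
import re

VENDOR_LEDGER = (
    "vendor # + vendor name + vendor address + "
    "{invoice # + invoice date + Amount Owed + Check # + Payment Amount}"
)

# Which entity each data item describes (restated from the discussion above).
OWNER = {
    "vendor #": "vendor", "vendor name": "vendor", "vendor address": "vendor",
    "invoice #": "invoice", "invoice date": "invoice", "amount owed": "invoice",
    "check #": "check", "payment amount": "check",
}


def split_items(text: str) -> list[str]:
    """Split 'a + b + c' into normalized item names, ignoring empty pieces."""
    return [part.strip().lower() for part in text.split("+") if part.strip()]


def parse_structure(definition: str) -> tuple[list[str], list[str]]:
    """Return (top-level items, items inside the {...} repeating group).

    Assumes at most one, non-nested repeating group.
    """
    match = re.search(r"\{(.*)\}", definition)
    if match is None:
        return split_items(definition), []
    return split_items(definition[: match.start()]), split_items(match.group(1))


def entities(items: list[str]) -> list[str]:
    # dict.fromkeys keeps first-seen order while dropping duplicates.
    return list(dict.fromkeys(OWNER[item] for item in items))


top, group = parse_structure(VENDOR_LEDGER)
parent = entities(top)[0]
repeated = entities(group)

# The first entity in the repeating group occurs many times per parent;
# other entities sharing that group occur once per group instance.
max_cardinality = {(parent, repeated[0]): "many"}
for other in repeated[1:]:
    max_cardinality[(repeated[0], other)] = "one"

print(max_cardinality)
# {('vendor', 'invoice'): 'many', ('invoice', 'check'): 'one'}
```

As in the text, this yields maximum cardinalities only; minimum cardinalities still have to come from domain knowledge.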

[Figure: E-R-A diagram relating the entities Vendor, Check, Checkbook Register, Invoice, Branch Ledger, and Item]

Figure 9-12. E-R-A Diagram for the Proposed AP System

The many-to-many relationship between Item and Invoice spawns an associative entity in the modified E-R-A diagram for the system. The data items associated with each entity, along with the primary and foreign keys, are shown in Figure 9-13. A new data item, Item #, has been added to identify the entity Item because item names may have multiple spellings and thus cause duplicate rows that cannot be easily identified.

[Figure: modified E-R-A diagram showing the attributes of Vendor, Check, Checkbook Register, Invoice, Branch Ledger, Item, and the associative entity Invoice-Item]

Figure 9-13. Modified E-R-A Diagram for Proposed AP System

If the modified E-R-A diagram has been drawn correctly, the tables associated with the respective entities should be in 3NF.

Vendor (vendor #, vendor name, vendor address)

Invoice (invoice #, invoice date, payment due date, delivered to store, sales tax, shipping & handling
charges, amount owed, invoice status, vendor#, SalespersonName)

Item (Item#, Item Name)

InvoiceItem (Invoice#, ItemNo., UnitPrice, ItemQuantity, Extn.)

Check (Check #, Check Date/Time, Check Status, Invoice#)

Checkbook (date/time, dr/cr, balance)

BranchLedger (SalespersonName, SalespersonSignature)
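One possible rendering of these relations as SQL tables is sketched below using Python's built-in `sqlite3` module. The column types, the snake_case names, the choice of primary keys, and the table name `ap_check` (used because CHECK is an SQL keyword) are assumptions made for illustration; the chapter itself lists only the attributes.

```python
import sqlite3

# Assumed DDL: types, key choices, and names are illustrative only.
SCHEMA = """
CREATE TABLE vendor (
    vendor_no      INTEGER PRIMARY KEY,
    vendor_name    TEXT,
    vendor_address TEXT
);
CREATE TABLE branch_ledger (
    salesperson_name      TEXT PRIMARY KEY,
    salesperson_signature BLOB
);
CREATE TABLE invoice (
    invoice_no         INTEGER PRIMARY KEY,
    invoice_date       TEXT,
    payment_due_date   TEXT,
    delivered_to_store TEXT,
    sales_tax          REAL,
    sh_charges         REAL,
    amount_owed        REAL,
    invoice_status     TEXT,
    vendor_no          INTEGER NOT NULL REFERENCES vendor(vendor_no),
    salesperson_name   TEXT REFERENCES branch_ledger(salesperson_name)
);
CREATE TABLE item (
    item_no   INTEGER PRIMARY KEY,
    item_name TEXT
);
CREATE TABLE invoice_item (          -- associative entity
    invoice_no    INTEGER NOT NULL REFERENCES invoice(invoice_no),
    item_no       INTEGER NOT NULL REFERENCES item(item_no),
    unit_price    REAL,
    item_quantity INTEGER,
    extension     REAL,
    PRIMARY KEY (invoice_no, item_no)
);
CREATE TABLE ap_check (
    check_no        INTEGER PRIMARY KEY,
    check_date_time TEXT,
    check_status    TEXT,
    invoice_no      INTEGER NOT NULL REFERENCES invoice(invoice_no)
);
CREATE TABLE checkbook (
    date_time TEXT PRIMARY KEY,
    dr_cr     TEXT,
    balance   REAL
);
"""

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite leaves FK checks off by default
conn.executescript(SCHEMA)

conn.execute("INSERT INTO vendor VALUES (1, 'Sample Vendor', '1 Example Road')")
conn.execute("INSERT INTO invoice (invoice_no, vendor_no) VALUES (100, 1)")

# An invoice that points at a non-existent vendor should be rejected.
try:
    conn.execute("INSERT INTO invoice (invoice_no, vendor_no) VALUES (101, 999)")
    fk_enforced = False
except sqlite3.IntegrityError:
    fk_enforced = True
print("foreign key enforced:", fk_enforced)
```

The composite primary key on `invoice_item` reflects the associative entity created for the many-to-many relationship between Item and Invoice.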

The table InvoiceItem appears to violate 2NF because one might expect Unit Price to depend only on Item Number. However, recall from our earlier discussion that UnitPrice may vary over time, so it must be identified with each Invoice# as well. All the tables are therefore in 3NF. The next step is the physical design of the database. At a minimum, we will create the following index files.

Table          Index Files

Vendor         Vendor #
Invoice        Invoice #, Vendor #
InvoiceItem    Invoice#, ItemNo.
Check          Check #, Invoice#
Checkbook      Check #

Table 9-2. Index Files for the Proposed AP System
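To make the effect of such index files concrete, the following sketch creates two of the indexes from Table 9-2 in SQLite and asks the query planner how it would look up a vendor's invoices. The table definitions are trimmed to the relevant columns, and the names `ap_check`, `idx_invoice_vendor`, and `idx_check_invoice` are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE invoice (
        invoice_no  INTEGER PRIMARY KEY,  -- looked up directly by SQLite, no extra index needed
        vendor_no   INTEGER NOT NULL,
        amount_owed REAL
    );
    CREATE TABLE ap_check (
        check_no   INTEGER PRIMARY KEY,
        invoice_no INTEGER NOT NULL
    );
    -- Secondary indexes on the foreign-key columns listed in Table 9-2.
    CREATE INDEX idx_invoice_vendor ON invoice (vendor_no);
    CREATE INDEX idx_check_invoice  ON ap_check (invoice_no);
    """
)

# EXPLAIN QUERY PLAN reports how SQLite intends to run the query;
# the last column of each row is a human-readable description.
plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT invoice_no FROM invoice WHERE vendor_no = ?", (1,)
).fetchall()
plan_text = " ".join(row[-1] for row in plan)
print(plan_text)
```

The printed plan should mention `idx_invoice_vendor`, indicating a search through the index instead of a scan of the whole table. The exact wording of the plan differs between SQLite versions.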

We do not denormalize the database design: the number of tables in the database is fairly small, so none of the queries will require joining a large number of tables.

After the database design is finalized, we need to revise our input-process-output tables because the number of data stores and the contents of each data store have changed. The revised input-process-output tables are given in Table 9-3.

(Please note that Item # has been added to the data dictionary to identify each item uniquely; therefore, invoices received from vendors should include the Item #.)

Process: Get Signature (online)

Input: Invoice (from Store); Branch Ledger
    Invoice: delivered to store (other data items including vendor # + vendor name + vendor address + invoice # + invoice date + payment due date + sales person signature + {item # + item name + item quantity + unit price + extension} + total extension + sales tax + shipping & handling charges + amount owed are not needed for the process but appear on the paper invoice)
    Branch Ledger: sales person name + sales person signature

Processing:
    Upon receipt of invoice, the bookkeeper will
        Get branch ledger table
        Obtain sales person name and sales person’s signature from the branch ledger table

Output: Sign (to Bookkeeper)
    Sign: sales person name + sales person’s signature

Process: Verify Signature (Manual)

Input: Invoice (from Store); Sign (from Bookkeeper)
    Invoice: delivered to store + sales person signature (other data items including vendor # + vendor name + vendor address + invoice # + invoice date + payment due date + {item # + item name + item quantity + unit price + extension} + total extension + sales tax + shipping & handling charges + amount owed are not needed for the process but appear on the paper invoice)
    Sign: sales person name + sales person’s signature

Processing:
    Compare the signatures on the invoice and the branch ledger.
    If signatures are not identical
        Return the invoice to store for confirmation
    Endif

Output: Verified Invoice (with Bookkeeper); Invoice (Returned to store)
    Verified Invoice: vendor # + vendor name + vendor address + invoice # + invoice date + payment due date + delivered to store + sales person signature + {item # + item name + item quantity + unit price + extension} + total extension + sales tax + shipping & handling charges + amount owed
    Invoice: vendor # + vendor name + vendor address + invoice # + invoice date + payment due date + delivered to store + sales person signature + {item # + item name + item quantity + unit price + extension} + total extension + sales tax + shipping & handling charges + amount owed

Process: Enter Invoice Information (online)

Input: Verified Invoice (Bookkeeper)
    Verified Invoice: vendor # + vendor name + vendor address + invoice # + invoice date + payment due date + delivered to store + sales person signature + {item # + item name + item quantity + unit price + extension} + total extension + sales tax + shipping & handling charges + amount owed

Processing:
    If the signatures match
        Enter Invoice data into Invoice record
        Invoice status = "outstanding"
        For each item in invoice
            Enter Item No., Unit Price, Item Quantity, and Extension in InvoiceItem record
        Next item
    Endif

Output: Invoice; InvoiceItem
    Invoice: Invoice # + invoice date + payment due date + delivered to store + sales tax + shipping & handling charges + amount owed + invoice status + vendor# + Sales person Name
    InvoiceItem: Invoice# + ItemNo. + UnitPrice + Item Quantity + Extension

Process: Prepare Check (Batch)

Input: Invoice; Checkbook
    Invoice: invoice # + amount owed + invoice status
    Checkbook: date/time + balance

Processing:
    Once in ten days,
    Get the Invoice table
    For each invoice in the invoice file with invoice status = “outstanding”
        If payment due date – current date <= 10
            Get checkbook table
            Get the last balance in the checkbook
            If amount owed <= balance
                Check date/time = current date/time
                Generate Check #
                Check Status = “No”
                Write Check #, Check Date/Time, Check Status, and Invoice # in Check record
                Dr/cr = Dr
                Balance = balance – amount owed
                Update checkbook record
                Invoice status = "in-process"
                Update invoice record
            Endif
        Endif
    Next invoice

Output: Invoice; Checkbook; Check
    Invoice: invoice # + invoice status
    Checkbook: date/time + dr/cr + balance
    Check: Check # + Check Date/time + Check Status + Invoice #
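The Prepare Check logic can be sketched in Python as follows. This is a minimal in-memory illustration, not the system's implementation: the dataclasses, the function `prepare_checks`, the starting check number, and the decision to record the debit as a new checkbook entry are all assumptions.

```python
from dataclasses import dataclass
from datetime import date, datetime, timedelta
from decimal import Decimal


@dataclass
class Invoice:
    invoice_no: int
    payment_due_date: date
    amount_owed: Decimal
    status: str = "outstanding"


@dataclass
class Check:
    check_no: int
    date_time: datetime
    status: str
    invoice_no: int


@dataclass
class CheckbookEntry:
    date_time: datetime
    dr_cr: str
    balance: Decimal


def prepare_checks(invoices, checkbook, now, first_check_no=1001):
    """Write a check for each outstanding invoice due within ten days,
    as long as the last checkbook balance covers the amount owed."""
    checks = []
    for invoice in invoices:
        if invoice.status != "outstanding":
            continue
        if (invoice.payment_due_date - now.date()).days > 10:
            continue
        balance = checkbook[-1].balance  # last balance in the register
        if invoice.amount_owed > balance:
            continue
        # Check Status "No" means the owner has not approved the check yet.
        check_no = first_check_no + len(checks)
        checks.append(Check(check_no, now, "No", invoice.invoice_no))
        # Assumption: "update checkbook record" is modeled as a new debit entry.
        checkbook.append(CheckbookEntry(now, "Dr", balance - invoice.amount_owed))
        invoice.status = "in-process"
    return checks


now = datetime(2024, 3, 1, 9, 0)
today = now.date()
invoices = [
    Invoice(1, today + timedelta(days=5), Decimal("400")),   # due soon, affordable
    Invoice(2, today + timedelta(days=30), Decimal("100")),  # not due yet
    Invoice(3, today + timedelta(days=3), Decimal("700")),   # exceeds remaining balance
]
checkbook = [CheckbookEntry(now - timedelta(days=1), "Cr", Decimal("1000"))]
checks = prepare_checks(invoices, checkbook, now)
print([c.invoice_no for c in checks], checkbook[-1].balance)  # [1] 600
```

Invoice 3 stays outstanding because the balance left after the first check (600) does not cover it; it would be picked up again in the next batch run.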

Process: Show Invoice & Check Information (online)

Input: Invoice; Vendor; InvoiceItem; Item; Check
    Invoice: invoice # + invoice date + payment due date + delivered to store + sales tax + shipping & handling charges + amount owed + invoice status + vendor# + SalespersonName
    Vendor: Vendor # + vendor name + vendor address
    Item: Item# + Item Name
    InvoiceItem: Invoice# + ItemNo. + UnitPrice + ItemQuantity + Extension
    Check: Check # + Check Date/time + Check Status + Invoice #

Processing:
    For each check w/Check Status = “No”
        Get Invoice#
        Get Invoice record
        Use Vendor # to get Vendor record
        Get InvoiceItem table
        For each Item for the Invoice#
            Get InvoiceItem record
            Get Item Table
            Using Item #, get item record
        Next Item
        Format & display Invoice
        Format & display Check

Output: Invoice (to Owner); Check (to Owner)
    Invoice: vendor # + vendor name + vendor address + invoice # + invoice date + payment due date + delivered to store + sales person signature + {item name + item quantity + unit price + extension} + total extension + sales tax + shipping & handling charges + amount owed + Invoice Status
    Check: Check # + Check Date + Vendor Name + vendor address + CheckStatus + Invoice #

Process: Decide Approval (Manual)

Input: Invoice (to Owner); Check (to Owner)
    Invoice: vendor # + vendor name + vendor address + invoice # + invoice date + payment due date + delivered to store + sales person signature + {item name + item quantity + unit price + extension} + total extension + sales tax + shipping & handling charges + amount owed + invoice status
    Check: Check # + Check Date + Vendor Name + vendor address + Check Amount + Invoice #

Processing:
    Verify whether invoice #, vendor name, and vendor address match in the check and the invoice
    Verify whether check amount = amount owed
    If the above four items match
        Decide whether or not to approve now
        If decided to approve
            Go to process “Approve Check”
        Endif
    Endif

Output: Approval (by Owner)
    Approval: Check # + (Yes|No)

Process: Approve Check (online)

Input: Approval (from Owner)
    Approval: Check # + Check Status (Yes|No)

Processing:
    Enter Check Status (approved) for the check #

Output: Check
    Check: Check # + Check status

Process: Mail Check (real-time)

Input: Check; Vendor; Invoice
    Check: Check # + Check Date/time + Check Status + Invoice #
    Invoice: invoice # + amount owed + Vendor #
    Vendor: Vendor # + vendor name + vendor address

Processing:
    Upon approving a check,
    If Check status = “Yes”
        Get Invoice #
        Get Invoice record
        Get Vendor record using Vendor #
        Check amount = Amount Owed
        Format & print Check
        Invoice status = "paid"
        Update Invoice record
    Endif
    (Stuff the check into a window envelope. Mail the check – Manual)

Output: Check (Vendor); Invoice; Vendor Ledger
    Check: Check # + Check Date + Vendor Name + vendor address + Check Amount + Invoice #
    Invoice: invoice # + invoice status

Process: Update Vendor Payable Ledger (Batch)

Input: Vendor Ledger

Processing:
    This process is not required because the outputs of this process are not stored in the database.

Output: Vendor Payable Ledger
    Vendor Payable Ledger: Vendor No. + Vendor Name + Total Purchases + Total Payments + Total Outstanding Amount

Process: Provide Invoice Status (online)

Input: Query (from Vendor); Vendor; Invoice
    Query: Vendor #
    Vendor: Vendor No. + Vendor Name
    Invoice: Invoice # + invoice date + amount owed + invoice status + vendor#

Processing:
    Upon receiving a query from a vendor, Get the Vendor No.
    Get Vendor record
    For each invoice of vendor w/ Invoice status = “Outstanding” or “In-Process”
        Get Invoice record
        If Invoice Status = Outstanding
            Write Invoice No., Invoice Date, Amount Owed, and Invoice Status
            Total outstanding amount = total outstanding amount + amount owed
        Elseif Invoice Status = In-process
            Write Invoice No., Invoice Date, Amount Owed, and Invoice Status
            Total amount in-process = total amount in-process + amount owed
        Endif
    Next invoice
    Format & display Response

Output: Response (Vendor)
    Response: Vendor No. + Vendor Name + {Invoice No. + Invoice Date + Amount Owed + Invoice Status} + Total Outstanding Amount + Total Amount in process
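A compact way to read the Provide Invoice Status logic is as a filter plus two running totals. The sketch below assumes lowercase status strings and hypothetical names (`Invoice`, `invoice_status_response`, the dictionary keys); it illustrates the derived totals and is not the system's code.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Invoice:
    invoice_no: int
    invoice_date: str
    amount_owed: Decimal
    status: str
    vendor_no: int


def invoice_status_response(vendor_no, vendor_names, invoices):
    """Build the Response structure for one vendor's query."""
    totals = {"outstanding": Decimal("0"), "in-process": Decimal("0")}
    lines = []
    for inv in invoices:
        if inv.vendor_no != vendor_no or inv.status not in totals:
            continue  # other vendors' invoices and paid invoices are skipped
        lines.append((inv.invoice_no, inv.invoice_date, inv.amount_owed, inv.status))
        totals[inv.status] += inv.amount_owed
    return {
        "vendor_no": vendor_no,
        "vendor_name": vendor_names[vendor_no],
        "invoices": lines,
        "total_outstanding": totals["outstanding"],
        "total_in_process": totals["in-process"],
    }


vendor_names = {7: "Sample Vendor"}
invoices = [
    Invoice(100, "2024-01-05", Decimal("250.00"), "outstanding", 7),
    Invoice(101, "2024-01-12", Decimal("80.00"), "in-process", 7),
    Invoice(102, "2023-12-01", Decimal("40.00"), "paid", 7),
    Invoice(103, "2024-01-20", Decimal("999.00"), "outstanding", 8),
]
response = invoice_status_response(7, vendor_names, invoices)
print(response["total_outstanding"], response["total_in_process"])  # 250.00 80.00
```

Both totals are computed at query time, consistent with Table 9-1, where the total outstanding amount is a derived item that is not stored.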

Process: Prepare cash flow statement (Batch)

Input: Invoice
    Invoice: Invoice # + invoice date + payment due date + amount owed + invoice status + vendor#

Processing:
    Every month,
    Get Invoice file
    For each invoice with invoice status = outstanding
        Write Payment due date, Invoice No., Invoice Date, Vendor No., and Amount Owed in cash flow statement
        Total amount owed = total amount owed + amount owed
    Next invoice
    Send cash flow statement to planning system

Output: cash flow statement (cash flow planning system)
    Cash flow Statement: Payment Due Date + {Invoice No. + Invoice Date + Vendor No. + Amount Owed} + Total Amount Owed

Table 9-3. Corrected IPO tables after database design
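Once the data lives in tables, a batch process such as Prepare cash flow statement reduces to a query against the Invoice table. The sketch below uses Python's `sqlite3` module with made-up rows. The column names and the ordering by payment due date are assumptions, since the IPO table specifies only which fields appear in the statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """CREATE TABLE invoice (
           invoice_no       INTEGER PRIMARY KEY,
           invoice_date     TEXT,
           payment_due_date TEXT,
           amount_owed      REAL,
           invoice_status   TEXT,
           vendor_no        INTEGER
       )"""
)
conn.executemany(
    "INSERT INTO invoice VALUES (?, ?, ?, ?, ?, ?)",
    [
        (100, "2024-01-05", "2024-02-04", 250.0, "outstanding", 7),
        (101, "2024-01-12", "2024-02-11", 80.0, "in-process", 7),
        (102, "2024-01-15", "2024-01-30", 120.0, "outstanding", 8),
    ],
)

# One statement line per outstanding invoice, earliest due date first.
statement = conn.execute(
    """SELECT payment_due_date, invoice_no, invoice_date, vendor_no, amount_owed
       FROM invoice
       WHERE invoice_status = 'outstanding'
       ORDER BY payment_due_date"""
).fetchall()

# Total amount owed is derived at report time, not stored.
(total_owed,) = conn.execute(
    "SELECT COALESCE(SUM(amount_owed), 0) FROM invoice "
    "WHERE invoice_status = 'outstanding'"
).fetchone()

for line in statement:
    print(line)
print("Total amount owed:", total_owed)  # Total amount owed: 370.0
```

The in-process invoice (101) is excluded because a check has already been prepared for it.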

In larger systems, you may find each table interacting with each process, which produces a complex DFD. In the design stage, we need information about each process and each data item that is input or output. Since the DFD representation is no longer useful in the design stage, we will discard it and use the input-process-output tables in subsequent stages.

Chapter Summary

• This chapter discussed the database design process.

• The database design process includes two stages: logical database design and physical database design.

• Logical database design shows the tables, the data elements in each table, and relationships between tables.

• A good logical database design results in high data integrity.

• The input to the logical database design process is the data dictionary prepared at the end of the analysis phase.

• The first step in the logical database design process is identifying the data elements that will be stored in the database.

• The next step is the creation of an E-R-A diagram.

• This chapter discussed the various elements of an E-R-A diagram and how to construct one.

• The next step in the logical database design process is the conversion of the E-R-A diagram into database tables.

• This chapter discussed the step-by-step procedure for converting an E-R-A diagram into a logical database design.

• The chapter discussed normalization as a tool to verify the quality of a logical database design.

• Once the logical database design is completed, the physical database design is done.

• The physical database design tunes the database to improve performance.

• This chapter discussed indexing, partitioning, and denormalization as methods to tune the database.

• After the database design is finalized, the input-process-output tables derived at the end of the analysis phase are modified to reflect the database design.

• Finally, this chapter illustrated the database design process using a case study.

End of Chapter Exercises

Key Terms

Associative Entity

Attribute

Binary Relationship

Denormalization

Entity

First Normal Form

Foreign Key

Indexing

Logical database design

Maximum Cardinality

Minimum Cardinality

Normalization

Partitioning

Physical database design

Primary Key

Recursive (Unary) Relationship

Relationship

Second Normal Form

SQL

Stored Procedure

Structured Query Language

Sub entity

Super Entity

Ternary Relationship

Third Normal Form

Trigger

Weak Entity

Self-Study Questions

1. Logical database design does not include

a. Preparing an ERA diagram

b. Normalization

c. Indexing

d. b and c

e. a and b.

2. Derived attributes are never stored in a database.

a. True

b. False

3. Data items identified in the data dictionary are modeled in an E-R-A diagram as

a. entities

b. attributes

c. relationships

d. super and sub entities

e. None of the above

4. In E-R-A diagrams, for a 1 to Many relationship,

a. The key of the entity corresponding to the one part is put in the entity corresponding to the many part.

b. The key of the entity corresponding to the many part is put in the entity corresponding to the one part.

c. Both a and b

d. An associative entity that has both keys as attributes is created

e. None of the above.

5. Normalization process results in

a. High data integrity

b. Reduced data duplication

c. Reduced maintenance anomalies

d. Smaller tables

e. All of the above

6. Normalization often increases access time

a. true

b. false

7. In horizontal partitioning,

a. Each partition will have a subset of attributes from the table

b. Each partition will have all attributes, but a subset of records

c. Each partition will have a subset of attributes and a subset of records.

d. Each partition will have a subset of tables from the database.

e. Each partition will have a subset of indexes.

8. Indexing of a table

a. Speeds up access time

b. Speeds up addition of records

c. Speeds up deletion of records

d. Speeds up modification of records

e. None of the above

9. Denormalization

a. Splits a table into two or more tables

b. Combines two or more tables into one table

c. Improves data integrity

d. Reduces data duplication

10. A table that has a transitive dependency violates the condition for

a. First normal form

b. Second normal form

c. Third normal form

d. All of the above

e. None of the above.

Review Questions

1. What is the significance of E-R-A diagram in systems design?

2. What do (i) entities, (ii) attributes, and (iii) relationships model in an E-R-A diagram?

3. Explain maximum and minimum cardinalities of a relationship using an example.

4. What are different types of entities? Give examples for each type.

5. How does normalization help in the database design process?

6. What is the essence of III NF?

7. What is the purpose of logical database design and physical database design?

8. Which fields in a table are good candidates for indexing?

9. Is there any disadvantage to normalizing a database? Explain with an example.

10. Under what conditions will a derived attribute be stored (or not stored) in a database?

11. Why do we say that DFDs and E-R-A diagrams of a system are like “two sides of the same coin”?

12. E-R-A diagrams capture more information about data than data documentation associated with

DFDs. Illustrate with examples.

Exercises

1. Design a relational database in III NF for the bank described below.

The BG bank serves the people of Bowling Green. A customer of the BG bank can have a checking account

or a savings account or both. A customer can have only one savings account, and one checking account.

BG bank serves two types of customers: individual and institutional. When the customer opens an account,

the bank obtains several data depending on the type of customer. For individual customers, data such as the

social security number, name, occupation, and salary are obtained. For institutional customers, data such as

the name, number of employees, revenue, and profit are collected. The customer is then given an account

number. The bank maintains, for each account, the account balance.

The institutional customers are assigned personal officers; one customer has one assigned officer.

However, an officer may handle multiple customers. The officer schedules meetings with the institutional

members frequently to discuss customer-related issues. The bank maintains data such as time, date, venue,

and topic for each meeting between a customer and an officer.

The bank also provides loans. Customers who obtain loans are called clients. The same client can

obtain several loans from the bank. The bank gives home and car loans. Each loan has a repayment

period. This period is fixed by the bank based on the client’s financial status, the type of loan, and a

number of other factors.

2. Consider the following data dictionary for a dentist’s office. Develop a database design in III NF.

Data Dictionary:

No.  Data item     Description

1.   family#       Family identification number
2.   fname         Last name of the family
3.   faddr         family address
4.   fbalance      family balance due
5.   patient#      patient identification number
6.   patname       patient name
7.   servcode      service code
8.   servdesc      service description
9.   servfee       service fee
10.  pat_servfee   amount of service fee owed by the patient
11.  ins_servfee   amount of service fee owed by the insurance company
12.  servdate      date when service was rendered
13.  insurname     name of the insurance company to which the family belongs
14.  insur_bal     balance owed by the insurance company

Assume the following:

A patient gets a service only once in a day. A family has only one insurance company. There is a fixed fee

for each service. The fee owed by the patient and the insurance company for a service depends on the

insurance company.

3. IT Services, Inc. is an engineering firm with approximately 1000 employees. A database is required to

keep track of all employees, their skills, and projects assigned and departments worked in. Every

employee has a unique number assigned by the firm. The firm needs to store his/her name and date-of-

birth. Each employee is given a job title. The employees are also categorized into different groups such

as professionals, and administrative assistants. The relevant data to be recorded for professionals is the

type of degree and for administrative assistants is their typing speed.

There are several departments, each with a unique name. An employee can report to only one

department. Each department has a phone number.

To procure various kinds of equipment, each department deals with many vendors. A vendor typically

supplies equipment to many departments. It is required to store the name and address of each vendor,

and the date of last meeting between a department and a vendor.

Many employees can work on a project. An employee can work on many projects. Each project is

carried out in a city. For each city, we are interested in its state and the population. An employee can

have many skills. An employee uses each skill she/he possesses in at least one project. Each skill is

assigned a number. A short description is required to be stored for each skill. Projects are distinguished

by project numbers. It is required to store the estimated cost of each project.

4. Draw the complete E-R-A diagram for the following situation.

The US Airlines Company publishes a monthly flight log report that tracks which type of aircraft, and

the number of hours flown by an individual pilot during the month. A separate report is prepared for each

pilot, and is used to monitor pilot flight proficiency for two types of aircraft (fixed-wing, and rotorcraft) which a

pilot may be qualified to fly. Pilots may fly a different aircraft in each trip. Each aircraft has a single crew chief

permanently assigned to perform maintenance on the aircraft, although a crew chief may crew more than one

aircraft. Each aircraft is identified by an aircraft number. Each aircraft also has a seating capacity. The pilots

have pilot license numbers. The report also specifies, for each aircraft, its characteristics such as number of

engines and the type of propeller in the case of fixed-wing aircraft, and the rotor speed in the case of rotorcraft.

List of Figures

Figure 9-1. Representation of Cardinality using Crow’s Foot Notation
Figure 9-2. A Simple E-R-A Diagram
Figure 9-3. Modified E-R-A Diagram for the Example in Figure 9-2
Figure 9-4. An Example of a Recursive Relationship
Figure 9-5. Modified E-R-A Diagram for the Recursive Relationship Example
Figure 9-6. An Example of a Ternary Relationship
Figure 9-7. Modified E-R-A diagram for the Ternary Relationship Example
Figure 9-8. Illustration of Weak Entity
Figure 9-9. An Example for Super and Sub Entities
Figure 9-10. E-R-A Diagram for Orangemen Enterprises
Figure 9-11. Modified E-R-A Diagram for Orangemen Enterprises
Figure 9-12. E-R-A Diagram for the Proposed AP System
Figure 9-13. Modified E-R-A Diagram for Proposed AP System

List of Tables

Table 9-1. Data Items That Will Be Stored in the Database in the AP System
Table 9-2. Index Files for the Proposed AP System
Table 9-3. Corrected IPO tables after database design

Chapter Index

Index entry as it will appear in the Book Index | String to search for in the text body

Associative Entity | many-to-many relationships create a new type of entity called associative entity
Attribute | An attribute can be defined as a property of an entity.
Binary Relationship | A binary relationship connects two entity classes, or has a degree of two.
Denormalization | Denormalization is the reverse of normalization process
Entity | An entity is an identifiable thing or object that is distinguishable from other objects
First Normal Form | A table is said to be in INF if the table does not have multi-valued attributes
Foreign Key | A foreign key also helps to navigate a database in the reverse direction.
Indexing | Indexing a table based on a field improves the access speed when the table is searched using that field.
Logical database design | The logical database design involves drawing an E-R-A diagram to determine various tables in the database, relationships among tables, and fields in each table.
Maximum Cardinality | We say that the minimum and maximum cardinalities of this relationship on the employee side are one and many respectively.
Minimum Cardinality | We say that the minimum and maximum cardinalities of this relationship on the employee side are one and many respectively.
Normalization | Normalization is a procedure that reduces data redundancy and mitigates maintenance anomalies in a database design.
Partitioning | Partitioning splits a table for implementation and storage purposes
Physical database design | The primary purpose of physical database design is to improve the database performance.
Primary Key | Each basic entity in an E-R-A model would have primary key(s) that identify each instance.
Recursive Relationship | an entity related to itself or degree 1
Relationship | A relationship models an association between instances of entities.
Second Normal Form | A table is said to be in 2NF if the table is 1NF and all non-key attributes depend on the whole set of keys.
SQL | In addition, DBMSs have an easy to learn and use high level language
Stored Procedure | The SQL program as written above and executed would
Structured Query Language | In addition, DBMSs have an easy to learn and use high level language
Sub entity | We use sub entities to model such scenarios.
Super Entity | we also refer to super entity and a sub entity as parent and child, respectively.
Ternary Relationship | In a ternary relationship, three or more entities will be involved
Third Normal Form | A table is in 3NF if the table is in 2NF, and the table does not contain a transitive dependency.
Trigger | A trigger is a stored procedure that automatically executes
Weak Entity | Weak entities are also known as dependent entities,

Additional Reading

1. C. J. Date, An Introduction to Database Systems, Seventh Edition, Addison Wesley, 2000.

2. R. Elmasri and S. Navathe, Fundamentals of Database Systems, Third Edition, Addison Wesley, 2000.

3. L. Sanders, Data Modeling, Boyd and Fraser Publishing Company, 1995.

