Unit-5-Handouts

This document outlines a unit on Data Modeling, focusing on defining data design concepts, applying data modeling techniques, and creating entity relationship diagrams (ERDs). It covers key fields, types of relationships, normalization processes, and cardinality notation to ensure effective data organization and documentation. The learning outcomes include the ability to draw ERDs, normalize tables, and understand the use of data dictionaries.
Unit 5. Data Modeling
Lesson 1. Data Modeling

LEARNING OUTCOMES
At the end of the unit, the students must have:

1. Defined data design concepts
2. Applied data modeling to system concepts
3. Drawn entity relationship diagrams (ERDs)
4. Normalized tables using normalization rules
5. Discussed how a data dictionary is used and what it contains
6. Completed the assigned task on time

INTRODUCTION
In this unit you will learn how to use a popular data-modeling tool, entity relationship diagrams (ERDs), to document the data that must be captured and stored by a system, independently of how that data is or will be used, that is, independently of specific inputs, outputs, and processing. You will also learn about a data analysis technique called normalization, which is used to ensure that a data model is a "good" data model.

LEARNING CONTENT
Data design terms include entity, table, file, record, tuple, and key field.
Entity is a person, place, thing, or event for which data is collected and maintained.
Table or file: data is organized into tables or files. A table, or file, contains a set of related records that store data about a specific entity. Tables and files are shown as two-dimensional structures that consist of vertical columns and horizontal rows. Each column represents a field, or characteristic of the entity, and each row represents a record, which is an individual instance, or occurrence, of the entity.
Field, also called an attribute, is a single characteristic or fact about an entity. For example, a customer entity might include the CustomerID, Firstname, Lastname, Address, City, State, Zip, and E-mail address fields.
Record, also called a tuple, is a set of related fields that describes one instance, or occurrence, of an entity, such as one customer, one order, or one product. A record might have a few fields or dozens of fields, depending on what information is needed.
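As a minimal illustration, the Python sketch below models a hypothetical CUSTOMER entity with the fields listed above. The class name, field names, and sample values are assumptions chosen for the example: the class stands for the table design, each object is one record (tuple), and each attribute is one field.

```python
from dataclasses import dataclass, astuple


@dataclass(frozen=True)
class Customer:
    """One record (tuple) of a hypothetical CUSTOMER table; each attribute is a field."""
    customer_id: int
    first_name: str
    last_name: str
    address: str
    city: str
    state: str
    zip_code: str
    email: str


# The table (file) is a collection of records, one per instance of the entity.
customer_table = [
    Customer(101, "Ana", "Cruz", "12 Main St.", "Springfield", "IL", "62701", "ana@example.com"),
    Customer(102, "Ben", "Reyes", "8 Oak Ave.", "Springfield", "IL", "62702", "ben@example.com"),
]

# A record is the set of related field values describing one customer.
first_record_fields = astuple(customer_table[0])
```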
Lecture Notes in CC 207 – System Analysis and Design
Property of WVSU
Key Fields
During the system design phase, key fields are used to organize, access, and maintain data structures. The four types of keys are:
1. Primary key is a field or combination of fields that uniquely and minimally identifies a particular member of an entity. For example, in a customer table the customer number is a unique primary key because no two customers can have the same customer number.
2. Candidate key is any field, or combination of fields, that could serve as the primary key. For example, if every employee has a unique employee number, then you could use either the employee number or the Social Security number as a primary key. Because you can select only one candidate key as the primary key, you should select the field that contains the least amount of data and is easiest to use.
3. Foreign key is a field that points to records in a different file in a database. It is a field in one table that must match a primary key value in another table in order to establish the relationship between the two tables.
4. Secondary key is a field that can be used to access or retrieve records. Secondary key values are not unique. For example, if you need to access records for only those customers in a specific ZIP code, you would use the ZIP code field as a secondary key.
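The four kinds of keys can be sketched with Python's built-in sqlite3 module. This is a minimal example under stated assumptions: the customer and customer_order tables, the email column used as the alternate candidate key, and all sample values are hypothetical. The PRIMARY KEY, UNIQUE, REFERENCES, and CREATE INDEX clauses show how a DBMS can enforce a primary key, a candidate key, and a foreign key, and speed up secondary-key lookups.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when this is on

conn.executescript("""
CREATE TABLE customer (
    customer_num INTEGER PRIMARY KEY,   -- primary key: unique and minimal
    email        TEXT NOT NULL UNIQUE,  -- candidate key that was not chosen as the primary key
    name         TEXT NOT NULL,
    zip          TEXT NOT NULL          -- secondary key: values may repeat
);
CREATE TABLE customer_order (
    order_num    INTEGER PRIMARY KEY,
    customer_num INTEGER NOT NULL REFERENCES customer(customer_num)  -- foreign key
);
-- An index lets the DBMS retrieve records by the secondary key efficiently.
CREATE INDEX idx_customer_zip ON customer(zip);

INSERT INTO customer VALUES
    (108, 'louise@example.com', 'Benedict', '92101'),
    (233, 'helen@example.com',  'Corelli',  '03060'),
    (254, 'jp@example.com',     'Gomez',    '92101');
INSERT INTO customer_order VALUES (40311, 108);
""")


def insert_rejected(sql: str) -> bool:
    """Return True if the DBMS refuses the row because a key constraint is violated."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


# Secondary-key lookup: several customers share one ZIP code.
same_zip = [row[0] for row in conn.execute(
    "SELECT customer_num FROM customer WHERE zip = ? ORDER BY customer_num", ("92101",))]
```

With this data, insert_rejected returns True for an order that points to a nonexistent customer, and for a customer row that repeats an existing customer number or e-mail address.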
Data modeling is a technique for organizing and documenting a system’s data. It is
sometimes called database modeling.
System Concept for Data Modeling
This unit covers two data-modeling techniques:
1. Entity Relationship Diagram (ERD)
2. Normalization
Entity Relationship Diagram (ERD) is a data model that uses several notations to depict data in terms of the entities and relationships described by that data. It is a model that shows the logical relationships and interaction among system entities. An ERD provides an overall view of the system and a blueprint for creating the physical structures. An information system must recognize the relationships among entities. For example, a customer entity can have several instances of an order entity, and an employee entity can have one instance, or none, of a spouse entity.
Entity is a class of persons, places, objects, events, or concepts about which we need to capture and store data. For example, entities might be customers, sales regions, products, or orders.
Attribute is a descriptive property or characteristic of an entity.
Relationship is a natural business association between one or more entities.
Degree is the number of entities that participate in a relationship.

Drawing an ERD
The first step is to list the entities that you identified during the system analysis phase and to consider the nature of the relationships that link them. At this stage, you can use a simplified method to show the relationships between entities.
Although there are different ways to draw ERDs, a popular method is to represent entities as rectangles and relationships as diamond shapes. The entity rectangles are labeled with singular nouns, and the relationship diamonds are labeled with verbs, usually in a top-to-bottom and left-to-right fashion. For example, in Figure 1, a DOCTOR entity treats a PATIENT entity. Unlike data flow diagrams, entity relationship diagrams depict relationships, not data or information flows.

DOCTOR ---- TREATS ---- PATIENT

Figure 1. In an entity relationship diagram, entities are labeled with singular nouns and relationships are labeled with verbs. The relationship is interpreted as a simple English sentence.

TYPES OF RELATIONSHIPS
Three types of relationships can exist between entities: one-to-one, one-to-many, and many-to-many.
A one-to-one relationship, abbreviated 1:1, exists when exactly one instance of the second entity occurs for each instance of the first entity. Figure 2 shows examples of several 1:1 relationships. A number 1 is placed alongside each of the two connecting lines to indicate the 1:1 relationship.

DOCTOR 1 ---- TREATS ---- 1 PATIENT
VEHICLE ID NO. 1 ---- ASSIGNED TO ---- 1 VEHICLE
SOCIAL SECURITY NO. 1 ---- ASSIGNED TO ---- 1 PERSON
DEPARTMENT HEAD 1 ---- CHAIRS ---- 1 DEPARTMENT

Figure 2. Examples of one-to-one (1:1) relationships.


A one-to-many relationship, abbreviated 1:M, exists when one occurrence of the first entity can relate to many instances of the second entity, but each instance of the second entity can associate with only one instance of the first entity. For example, the relationship between DEPARTMENT and EMPLOYEE is one-to-many: one department can have many employees, but each employee works in only one department at a time. Figure 3 shows several 1:M relationships. The line connecting the many entity is labeled with the letter M, and the number 1 labels the other connecting line. How many is many? Consider the 1:M relationship between INDIVIDUAL and AUTOMOBILE in Figure 3. One individual might own five automobiles, or one, or none. Thus, many can mean any number, including zero.

DEPARTMENT 1 ---- EMPLOYS ---- M EMPLOYEE
INDIVIDUAL 1 ---- OWNS ---- M AUTOMOBILE
CUSTOMER 1 ---- PLACES ---- M ORDER
FACULTY ADVISOR 1 ---- ADVISES ---- M STUDENT

Figure 3. Examples of one-to-many (1:M) relationships.
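A one-to-many relationship is commonly implemented by placing a foreign key on the "many" side. The sketch below uses Python's sqlite3 module with hypothetical DEPARTMENT and EMPLOYEE tables and made-up sample rows; the LEFT JOIN count shows that "many" can include zero.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (
    dept_num  INTEGER PRIMARY KEY,
    dept_name TEXT NOT NULL
);
-- The "many" side holds the foreign key, so each employee row
-- points to exactly one department (the "1" side).
CREATE TABLE employee (
    emp_num  INTEGER PRIMARY KEY,
    name     TEXT NOT NULL,
    dept_num INTEGER NOT NULL REFERENCES department(dept_num)
);
INSERT INTO department VALUES (1, 'Accounting'), (2, 'Sales'), (3, 'Research');
INSERT INTO employee VALUES (10, 'Ana', 1), (11, 'Ben', 1), (12, 'Carla', 2);
""")

# A LEFT JOIN keeps departments that have no employees, so "many" can be zero.
headcount = dict(conn.execute("""
    SELECT d.dept_name, COUNT(e.emp_num)
    FROM department AS d
    LEFT JOIN employee AS e ON e.dept_num = d.dept_num
    GROUP BY d.dept_num, d.dept_name
"""))
```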

A many-to-many relationship, abbreviated M:N, exists when one instance of the first entity can relate to many instances of the second entity, and one instance of the second entity can relate to many instances of the first entity. The relationship between STUDENT and CLASS, for example, is many-to-many: one student can take many classes, and one class can have many students enrolled. Figure 4 shows several M:N entity relationships. One of the connecting lines is labeled with the letter M, and the letter N labels the other connection.

STUDENT M ---- ENROLLS IN ---- N CLASS (associative entity: REGISTRATION)
PASSENGER M ---- RESERVES SEAT ON ---- N FLIGHT (associative entity: RESERVATION)
ORDER M ---- LISTS ---- N PRODUCT (associative entity: ORDER LINE)

Figure 4. Examples of many-to-many (M:N) relationships. Notice that the event or transaction that links the two entities is an associative entity with its own set of attributes and characteristics.
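In a relational database, an M:N relationship is typically resolved through its associative entity. As an illustrative sketch with Python's sqlite3 module, the hypothetical registration table below links student and class; the table and column names are assumptions, and the sample rows borrow a few values from the advising example later in this unit. The combination primary key allows each student-class pair only once, and the grade column shows that the associative entity carries its own attributes.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE student (student_num INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE class (class_num TEXT PRIMARY KEY, title TEXT NOT NULL);
-- Associative entity: one row per (student, class) pair. Its primary key combines
-- the two foreign keys, and it carries its own attribute (grade).
CREATE TABLE registration (
    student_num INTEGER NOT NULL REFERENCES student(student_num),
    class_num   TEXT    NOT NULL REFERENCES class(class_num),
    grade       TEXT,
    PRIMARY KEY (student_num, class_num)
);
INSERT INTO student VALUES (1035, 'Linda'), (3397, 'Sam');
INSERT INTO class VALUES ('CSC151', 'Computer Science 1'), ('ENG101', 'English Composition');
INSERT INTO registration VALUES
    (1035, 'CSC151', 'B'), (1035, 'ENG101', 'B'), (3397, 'ENG101', 'A');
""")


def classes_of(student_num):
    """One student relates to many classes through REGISTRATION."""
    return [r[0] for r in conn.execute(
        "SELECT class_num FROM registration WHERE student_num = ? ORDER BY class_num",
        (student_num,))]


def students_in(class_num):
    """One class relates to many students through REGISTRATION."""
    return [r[0] for r in conn.execute(
        "SELECT student_num FROM registration WHERE class_num = ? ORDER BY student_num",
        (class_num,))]
```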
CARDINALITY
After an analyst draws an initial ERD, he or she must define the relationships in more detail by using a technique called cardinality. Cardinality describes the numeric relationship between two entities and shows how instances of one entity relate to instances of another entity. For example, consider the relationship between two entities: CUSTOMER and ORDER. One customer can have one order, many orders, or none, but each order must have one and only one customer. An analyst can model this interaction by adding cardinality notation, which uses special symbols to represent the relationship.
A common method of cardinality notation is called crow's foot notation because of the shapes, which include circles, bars, and symbols, that indicate various possibilities. A single bar indicates one, a double bar indicates one and only one, a circle indicates zero, and a crow's foot indicates many. Figure 5 shows various cardinality symbols, their meanings, and the UML representations of the relationships. As you learned in Unit 4, the Unified Modeling Language (UML) is a widely used method of visualizing and documenting software system design.
Meaning                  UML Representation
One and only one         1
One or many              1..*
Zero, or one, or many    0..*

Figure 5. Crow's foot notation is a common method of indicating cardinality. The four examples show how you can use various symbols to describe the relationships between entities.
In Figure 6, four examples of cardinality notation are shown. In the first example, one and only one CUSTOMER can place anywhere from zero to many of the ORDER entity. In the second example, one and only one ORDER can include one ITEM ORDERED or many. In the third example, one and only one EMPLOYEE can have one SPOUSE or none. In the fourth example, one EMPLOYEE, or many employees, or none, can be assigned to one PROJECT, or many projects, or none.
Examples of Cardinality Notation
CUSTOMER ---- PLACES ---- ORDER
ORDER ---- INCLUDES ---- ITEM ORDERED
EMPLOYEE ---- HAS ---- SPOUSE
EMPLOYEE ---- ASSIGNED TO ---- PROJECT

Figure 6. In the first example of cardinality notation, one and only one CUSTOMER can place anywhere from zero to many of the ORDER entity. In the second example, one and only one ORDER can include one ITEM ORDERED or many. In the third example, one and only one EMPLOYEE can have one SPOUSE or none. In the fourth example, one EMPLOYEE, or many employees, or none, can be assigned to one PROJECT, or many, or none.
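Cardinality can be recorded as a minimum and a maximum count. The Python sketch below is a hypothetical helper, not a feature of any CASE tool: it renders each cardinality in UML multiplicity style and checks whether a given number of related instances is allowed. It also includes the UML form 0..1 (zero or one), which corresponds to the SPOUSE example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Multiplicity:
    """Minimum and maximum number of related instances; high=None means 'many'."""
    low: int
    high: Optional[int]

    def uml(self) -> str:
        """Render the multiplicity in UML style, e.g. '1', '0..1', '1..*', '0..*'."""
        if self.high == self.low:
            return str(self.low)
        upper = "*" if self.high is None else str(self.high)
        return f"{self.low}..{upper}"

    def allows(self, count: int) -> bool:
        """True if `count` related instances satisfy this cardinality."""
        return count >= self.low and (self.high is None or count <= self.high)


ONE_AND_ONLY_ONE = Multiplicity(1, 1)
ZERO_OR_ONE = Multiplicity(0, 1)
ONE_OR_MANY = Multiplicity(1, None)
ZERO_OR_MANY = Multiplicity(0, None)

# The four relationships of Figure 6, read from the first entity toward the second.
FIGURE_6 = {
    ("CUSTOMER", "ORDER"): ZERO_OR_MANY,
    ("ORDER", "ITEM ORDERED"): ONE_OR_MANY,
    ("EMPLOYEE", "SPOUSE"): ZERO_OR_ONE,
    ("EMPLOYEE", "PROJECT"): ZERO_OR_MANY,
}
```

For example, FIGURE_6[("EMPLOYEE", "SPOUSE")].allows(2) is False, while a customer with no orders is allowed because the CUSTOMER-to-ORDER minimum is zero.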

Most CASE products support the drawing of ERDs from entities in the data repository. Figure 7 shows part of a library system. Notice that crow's foot notation is used to show the nature of the relationships, which are described in both directions.

LIBRARY SYSTEM DATA MODEL (entities shown include USER, BOOK, CHECKOUT, and AUTHOR; relationship labels include Contains, Borrows book during, Is part of, Involves, has, and Writes)

Figure 7. An entity relationship diagram for a library system.

Figure 8. ERD Diagram

NORMALIZATION
Normalization is the process of creating table designs by assigning specific fields or attributes to each table in the database. A table design specifies the fields and identifies the primary key in a particular table or file. Working with a set of initial table designs, you use normalization to develop an overall database design that is simple, flexible, and free of data redundancy. Normalization involves applying a set of rules that can help you identify and correct inherent problems and complexities in your table design. The concept of normalization is based on the work of Edgar Codd, a British computer scientist who formulated the basic principles of relational database design.
The normalization process typically involves four stages: unnormalized design, first normal form, second normal form, and third normal form. The three normal forms constitute a progression in which third normal form represents the best design. Most business-related databases must be designed in third normal form.
STANDARD NOTATION FORMAT
Designing tables is easier if you use a standard notation format to show the table's structure, fields, and primary key. The standard notation format in the following examples starts with the name of the table, followed by a parenthetical expression that contains the field names separated by commas. The primary key field or fields are underlined, like this:
NAME (FIELD 1, FIELD 2, FIELD 3)

REPEATING GROUPS AND UNNORMALIZED DESIGNS
During data design, you must be able to recognize a repeating group of fields. A repeating group is a set of one or more fields that can occur any number of times in a single record, with each occurrence having different values.
Repeating groups often occur in manual documents prepared by users. For example, consider a school registration form with the student's information at the top of the form, followed by a list of courses the student is taking. If you were to design a table based on this registration form, the courses would represent a repeating group of values for each student.
An example of a repeating group is shown in Figure 9. The first two records in the ORDER table contain multiple products, which represent a repeating group of fields. Notice that in addition to the order number and date, the records with multiple products contain repetitions of the product number, description, and number ordered. You can think of a repeating group as a set of child (subsidiary) records contained within the parent (main) record.
Record#  ORDER-NUM  ORDER-DATE  PRODUCT-NUM  PRODUCT-DESC        NUM-ORDERED
1        40311      03112011    304          All-purpose gadget  7
                                633          Assembly            1
                                684          Super gizmo         4
2        40312      03112011    128          Steel widget        12
                                304          All-purpose gadget  3
3        40313      03112011    304          All-purpose gadget  144
Figure 9. In the ORDER table design, records 1 and 2 have repeating groups because they
contain several products. ORDER-NUM is the primary key for the ORDER table, and
PRODUCT-NUM serves as a primary key for the repeating group.

A table design that contains a repeating group is called unnormalized. The standard notation method for representing an unnormalized design is to enclose the repeating group of fields within a second set of parentheses. An example of an unnormalized table would look like this:
NAME (FIELD 1, FIELD 2, FIELD 3, (REPEATING FIELD 1, REPEATING FIELD 2))
Now review the unnormalized ORDER table design in Figure 9. Following the notation guidelines, you can describe the design as follows:
ORDER (ORDER-NUM, ORDER-DATE, (PRODUCT-NUM, PRODUCT-DESC, NUM-ORDERED))
The notation indicates that the ORDER table design contains five fields, which are listed within the outer parentheses. The ORDER-NUM field is underlined to show that it is the primary key. The PRODUCT-NUM, PRODUCT-DESC, and NUM-ORDERED fields are enclosed within an inner set of parentheses to indicate that they are fields within a repeating group. Notice that PRODUCT-NUM also is underlined because it acts as the primary key of the repeating group. If a customer orders three different products in one order, then the fields PRODUCT-NUM, PRODUCT-DESC, and NUM-ORDERED repeat three times, as shown for order 40311 in Figure 9.
FIRST NORMAL FORM
A table is in first normal form (1NF) if it does not contain a repeating group. To convert an unnormalized design to 1NF, you must expand the table's primary key to include the primary key of the repeating group.
For example, in the ORDER table shown in Figure 9, the repeating group consists of three fields: PRODUCT-NUM, PRODUCT-DESC, and NUM-ORDERED. Of the three fields, only PRODUCT-NUM can be a primary key because it uniquely identifies each instance of the repeating group. The product description cannot be a primary key because it might or might not be unique. For example, a company might sell a large number of parts with the same descriptive name, such as washer, relying on a coded part number to identify each washer size uniquely.
When you expand the primary key of the ORDER table to include PRODUCT-NUM, you eliminate the repeating group and the ORDER table is now in 1NF, as shown:
ORDER (ORDER-NUM, ORDER-DATE, PRODUCT-NUM, PRODUCT-DESC, NUM-ORDERED)
Figure 10 shows the ORDER table in 1NF. Notice that when you eliminate the repeating group, additional records emerge, one for each combination of a specific order and a specific product. The result is more records, but a greatly simplified design. In the new version, the repeating group for order number 40311 has become three separate records. Therefore, when a table is in 1NF, each record stores data about a single instance of a specific order and a specific product.
Also notice that the 1NF design shown in Figure 10 has a combination primary key. The primary key of the 1NF design cannot be the ORDER-NUM field alone, because the order number does not uniquely identify each product in a multiple-item order. Similarly, PRODUCT-NUM cannot be the primary key, because it appears more than once if several orders include the same product. Because each record must reflect a specific product in a specific order, you need both fields, ORDER-NUM and PRODUCT-NUM, to identify a single record uniquely. Therefore, the primary key is a combination of two fields: ORDER-NUM and PRODUCT-NUM.
Record#  ORDER-NUM  ORDER-DATE  PRODUCT-NUM  PRODUCT-DESC        NUM-ORDERED
1        40311      03112011    304          All-purpose gadget  7
2        40311      03112011    633          Assembly            1
3        40311      03112011    684          Super gizmo         4
4        40312      03112011    128          Steel widget        12
5        40312      03112011    304          All-purpose gadget  3
6        40313      03112011    304          All-purpose gadget  144

Figure 10. The ORDER table as it appears in 1NF. The repeating groups have been eliminated. Notice that the repeating group for order 40311 has become three separate records. The 1NF primary key is a combination of ORDER-NUM and PRODUCT-NUM, which uniquely identifies each record.
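The 1NF conversion and the combination-key argument can be checked with a short Python sketch. The names to_1nf and is_unique_key are hypothetical helpers, and the data is the ORDER sample from Figure 9 (with the Assembly product numbered 633, as in the later discussion). Keep in mind that sample data can only show that a field is not a key; whether a field combination is truly unique is a business rule.

```python
# Unnormalized ORDER records (Figure 9): "PRODUCTS" holds the repeating group.
unnormalized = [
    {"ORDER-NUM": 40311, "ORDER-DATE": "03112011", "PRODUCTS": [
        (304, "All-purpose gadget", 7), (633, "Assembly", 1), (684, "Super gizmo", 4)]},
    {"ORDER-NUM": 40312, "ORDER-DATE": "03112011", "PRODUCTS": [
        (128, "Steel widget", 12), (304, "All-purpose gadget", 3)]},
    {"ORDER-NUM": 40313, "ORDER-DATE": "03112011", "PRODUCTS": [
        (304, "All-purpose gadget", 144)]},
]


def to_1nf(orders):
    """Emit one flat record per combination of an order and a product."""
    return [
        {"ORDER-NUM": o["ORDER-NUM"], "ORDER-DATE": o["ORDER-DATE"],
         "PRODUCT-NUM": num, "PRODUCT-DESC": desc, "NUM-ORDERED": qty}
        for o in orders
        for num, desc, qty in o["PRODUCTS"]
    ]


def is_unique_key(rows, key_fields):
    """True if no two rows share the same values for key_fields."""
    keys = [tuple(row[f] for f in key_fields) for row in rows]
    return len(keys) == len(set(keys))


order_1nf = to_1nf(unnormalized)
```

In this sample, neither ORDER-NUM nor PRODUCT-NUM is unique on its own, but the pair is, which matches the combination primary key described above.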

SECOND NORMAL FORM


To understand second normal form (2NF), you must understand the concept of functional dependence. Field A is functionally dependent on field B if the value of field A depends on the value of field B. For example, in Figure 10, the ORDER-DATE value is functionally dependent on ORDER-NUM, because for a specific order number there can be only one date. In contrast, a product description is not dependent on the order number. For a particular order number, there might be several product descriptions, one for each item ordered.
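Functional dependence can be tested against sample rows with a small Python helper. The determines function below is a hypothetical sketch: it reports whether every value of the left-hand field or fields maps to exactly one value of the right-hand field in the Figure 10 data. A single counterexample disproves a dependency, but agreement in a sample cannot prove one; real dependencies come from the business rules.

```python
FIELDS = ("ORDER-NUM", "ORDER-DATE", "PRODUCT-NUM", "PRODUCT-DESC", "NUM-ORDERED")
ROWS = [dict(zip(FIELDS, r)) for r in [
    (40311, "03112011", 304, "All-purpose gadget", 7),
    (40311, "03112011", 633, "Assembly", 1),
    (40311, "03112011", 684, "Super gizmo", 4),
    (40312, "03112011", 128, "Steel widget", 12),
    (40312, "03112011", 304, "All-purpose gadget", 3),
    (40313, "03112011", 304, "All-purpose gadget", 144),
]]


def determines(rows, lhs, rhs):
    """True if every combination of values in fields `lhs` maps to exactly one
    value of field `rhs` in these rows (rhs is functionally dependent on lhs)."""
    seen = {}
    for row in rows:
        key = tuple(row[f] for f in lhs)
        if seen.setdefault(key, row[rhs]) != row[rhs]:
            return False
    return True
```

For example, determines(ROWS, ["ORDER-NUM"], "ORDER-DATE") is True, while determines(ROWS, ["ORDER-NUM"], "PRODUCT-DESC") is False because order 40311 lists three descriptions.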
A table design is in second normal form (2NF) if it is in 1NF and if all fields that are not part of the primary key are functionally dependent on the entire primary key. If any field in a 1NF table depends on only one of the fields in a combination primary key, then the table is not in 2NF.
Notice that if a 1NF design has a primary key that consists of only one field, the problem of partial dependence does not arise, because the entire primary key is a single field. Therefore, a 1NF table with a single-field primary key is automatically in 2NF.
Now reexamine the 1NF design for the ORDER table shown in Figure 10:
ORDER (ORDER-NUM, ORDER-DATE, PRODUCT-NUM, PRODUCT-DESC, NUM-ORDERED)
Recall that the primary key is a combination of the order number and the product number. The NUM-ORDERED field depends on the entire primary key, because NUM-ORDERED refers to a specific product number in a specific order. In contrast, the ORDER-DATE field depends only on the order number, which is only a part of the primary key. Similarly, the PRODUCT-DESC field depends only on the product number, which also is only a part of the primary key. Because some fields are not dependent on the entire primary key, the design is not in 2NF.
A standard process exists for converting a table from 1NF to 2NF. The objective is to break the original table into two or more new tables and reassign the fields so that each non-key field will depend on the entire primary key in its table. To accomplish this, you follow these steps:
1. First, create and name a separate table for each field in the existing primary key. For example, in Figure 10, the ORDER table's primary key has two fields, ORDER-NUM and PRODUCT-NUM, so you must create two tables. The ellipsis (…) indicates that fields will be assigned later. The result is:
ORDER (ORDER-NUM,…)
PRODUCT (PRODUCT-NUM,…)
2. Next, create a new table for each possible combination of the original primary key fields. In the Figure 10 example, you would create and name a new table with a combination primary key of ORDER-NUM and PRODUCT-NUM. This table describes individual lines in an order, so it is named ORDER-LINE, as shown:
ORDER-LINE (ORDER-NUM, PRODUCT-NUM,…)
3. Finally, study the three tables and place each field with its appropriate primary key, which is the minimal key on which it functionally depends. When you finish placing all the fields, remove any table that did not have any additional fields assigned to it. The remaining tables are the 2NF version of your original table. In the Figure 10 example, the three tables would be shown as:
ORDER (ORDER-NUM, ORDER-DATE)
PRODUCT (PRODUCT-NUM, PRODUCT-DESC)
ORDER-LINE (ORDER-NUM, PRODUCT-NUM, NUM-ORDERED)

Figure 11 shows the 2NF table design. By following the steps, you have converted the original 1NF table into three 2NF tables.
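The three steps amount to projecting the 1NF table onto each key and the fields that depend on it. The Python sketch below assumes a hypothetical project helper that keeps the listed fields and drops duplicate records; it then joins the three tables back together to confirm that no information from Figure 10 was lost.

```python
FIELDS = ("ORDER-NUM", "ORDER-DATE", "PRODUCT-NUM", "PRODUCT-DESC", "NUM-ORDERED")
ROWS_1NF = [dict(zip(FIELDS, r)) for r in [
    (40311, "03112011", 304, "All-purpose gadget", 7),
    (40311, "03112011", 633, "Assembly", 1),
    (40311, "03112011", 684, "Super gizmo", 4),
    (40312, "03112011", 128, "Steel widget", 12),
    (40312, "03112011", 304, "All-purpose gadget", 3),
    (40313, "03112011", 304, "All-purpose gadget", 144),
]]


def project(rows, fields):
    """Keep only `fields` and drop duplicate records, like a relational projection."""
    result = []
    for row in rows:
        record = tuple(row[f] for f in fields)
        if record not in result:
            result.append(record)
    return result


order = project(ROWS_1NF, ["ORDER-NUM", "ORDER-DATE"])
product = project(ROWS_1NF, ["PRODUCT-NUM", "PRODUCT-DESC"])
order_line = project(ROWS_1NF, ["ORDER-NUM", "PRODUCT-NUM", "NUM-ORDERED"])

# Joining the three 2NF tables on their keys rebuilds every 1NF record.
order_date = dict(order)
product_desc = dict(product)
rejoined = {(o, order_date[o], p, product_desc[p], n) for o, p, n in order_line}
```

The ORDER projection has three records and PRODUCT has four, so each order date and each product description is now stored exactly once.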
Why is it important to move from 1NF to 2NF? Four kinds of problems are found with 1NF designs that do not exist in 2NF:
• Consider the work necessary to change a particular product's description. Suppose 500 current orders exist for product number 304. Changing the product description involves modifying 500 records for product number 304. Updating all 500 records would be cumbersome and expensive.
• 1NF tables can contain inconsistent data. Because someone must enter the product description in each record, nothing prevents product number 304 from having different product descriptions in different records. In fact, if product number 304 appears in a large number of order records, some of the matching product descriptions might be inaccurate or improperly spelled. Even the presence or absence of a hyphen in the description All-purpose gadget would create consistency problems. If the data entry person must enter a term such as I01 Queue Controller numerous times, it certainly is possible that some inconsistency will result.
• Adding a new product is a problem. Because the primary key must include an order number and a product number, you need values for both fields in order to add a record. What value do you use for the order number when you want to add a new product that has not been ordered by any customer? You could use a dummy order number and then replace it with a real order number when the product is ordered, but that solution also creates difficulties.
• Deleting a product is a problem. If all the related records are deleted once an order is filled and paid for, what happens if you delete the only record that contains product number 633? The information about that product number and its description is lost.
Has the 2NF design eliminated all potential problems? To change a product description, now you can change just one PRODUCT record. Multiple, inconsistent values for the product description are impossible because the description appears in only one location. To add a new product, you simply create a new PRODUCT record, instead of creating a dummy order record. When you remove the last ORDER-LINE record for a particular product number, you do not lose that product number and its description because the PRODUCT record still exists. The four potential problems are eliminated, and the three 2NF designs are superior to both the original unnormalized table and the 1NF design.
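The following sqlite3 sketch replays the update, insert, and delete cases against the 2NF PRODUCT and ORDER-LINE tables. The table and column names, the revised description, and product 900 are hypothetical; the ORDER table is omitted because these cases do not involve it.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE product (product_num INTEGER PRIMARY KEY, product_desc TEXT NOT NULL);
CREATE TABLE order_line (
    order_num   INTEGER NOT NULL,
    product_num INTEGER NOT NULL REFERENCES product(product_num),
    num_ordered INTEGER NOT NULL,
    PRIMARY KEY (order_num, product_num)
);
INSERT INTO product VALUES
    (128, 'Steel widget'), (304, 'All-purpose gadget'), (633, 'Assembly'), (684, 'Super gizmo');
INSERT INTO order_line VALUES
    (40311, 304, 7), (40311, 633, 1), (40311, 684, 4),
    (40312, 128, 12), (40312, 304, 3), (40313, 304, 144);
""")

# Update: the description is stored once, so exactly one row changes.
updated = conn.execute(
    "UPDATE product SET product_desc = 'All-purpose gadget, large' WHERE product_num = 304"
).rowcount

# Insert: a new product needs no dummy order number (900 is a hypothetical product).
conn.execute("INSERT INTO product VALUES (900, 'New gizmo')")

# Delete: removing the only order line for product 633 keeps its PRODUCT record.
conn.execute("DELETE FROM order_line WHERE product_num = 633")
desc_633 = conn.execute(
    "SELECT product_desc FROM product WHERE product_num = 633").fetchone()[0]
```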

ORDER IN 2NF

Record#  ORDER-NUM  ORDER-DATE
1        40311      03112011
2        40312      03112011
3        40313      03112011

PRODUCT IN 2NF

Record#  PRODUCT-NUM  PRODUCT-DESC
1        128          Steel widget
2        304          All-purpose gadget
3        633          Assembly
4        684          Super gizmo

ORDER-LINE IN 2NF

Record#  ORDER-NUM  PRODUCT-NUM  NUM-ORDERED
1        40311      304          7
2        40311      633          1
3        40311      684          4
4        40312      128          12
5        40312      304          3
6        40313      304          144
Figure 11. ORDER, PRODUCT, and ORDER-LINE tables in 2NF. All fields are functionally dependent on the entire primary key.
THIRD NORMAL FORM
A popular rule of thumb is that the design is in 3NF if every non-key field depends on the
key, the whole key, and nothing but the key. As you will see, a 3NF design avoids
redundancy and data integrity problems that still can exist in 2NF designs.
Consider the following CUSTOMER table design, as shown in Figure 12:
CUSTOMER (CUSTOMER-NUM, CUSTOMER-NAME, ADDRESS, SALES-REP-NUM, SALES-REP-NAME)

In 2NF, the non-key field SALES-REP-NAME is functionally dependent on another non-key field, SALES-REP-NUM.

Record#  CUSTOMER-NUM  CUSTOMER-NAME     ADDRESS        SALES-REP-NUM  SALES-REP-NAME
1        108           Benedict, Louise  San Diego, CA  41             Kaplan, James
2        233           Corelli, Helen    Nashua, NH     22             McBride, Jon
3        254           Gomez, J.P.       Butte, MT      38             Stein, Ellen
4        431           Lee, M.           Snow Camp, NC  74             Roman, Harold
5        779           Paulski, Diane    Lead, SD       38             Stein, Ellen
6        800           Zuider, Z.        Greer, SC      74             Roman, Harold
Figure 12. 2NF design for the CUSTOMER table.
The table is in 1NF because it has no repeating groups. The design also is in 2NF because the primary key is a single field. But the table still has four potential problems similar to the four 1NF problems described earlier. Changing the name of a sales rep still requires changing every record in which that sales rep name appears. Nothing about the design prohibits a particular sales rep from having different names in different records. In addition, because the sales rep name is included in the CUSTOMER table, you must create a dummy CUSTOMER record to add a new sales rep who has not yet been assigned any customers. Finally, if you delete all the records for customers of sales rep number 22, you will lose that sales rep's number and name.
Those potential problems are caused because the design is not in 3NF. A table design is in third normal form (3NF) if it is in 2NF and if no non-key field is dependent on another non-key field. Remember that a non-key field is a field that is not a candidate key for the primary key. The CUSTOMER example in Figure 12 is not in 3NF because one non-key field, SALES-REP-NAME, depends on another non-key field, SALES-REP-NUM.
To convert the table to 3NF, you must remove all fields from the 2NF table that depend on another non-key field and place them in a new table that uses the non-key field as a primary key. In the CUSTOMER example, the SALES-REP-NAME field depends on another field, SALES-REP-NUM, which is not part of the primary key. Therefore, to reach 3NF, you must remove SALES-REP-NAME and place it into a new table that uses SALES-REP-NUM as the primary key. As shown in Figure 13, the third normal form produces two separate tables:
CUSTOMER (CUSTOMER-NUM, CUSTOMER-NAME, ADDRESS, SALES-REP-NUM)
SALES-REP (SALES-REP-NUM, SALES-REP-NAME)

Record#  CUSTOMER-NUM  CUSTOMER-NAME     ADDRESS        SALES-REP-NUM
1        108           Benedict, Louise  San Diego, CA  41
2        233           Corelli, Helen    Nashua, NH     22
3        254           Gomez, J.P.       Butte, MT      38
4        431           Lee, M.           Snow Camp, NC  74
5        779           Paulski, Diane    Lead, SD       38
6        800           Zuider, Z.        Greer, SC      74

Record#  SALES-REP-NUM  SALES-REP-NAME
1        41             Kaplan, James
2        22             McBride, Jon
3        38             Stein, Ellen
4        74             Roman, Harold

Figure 13. When the CUSTOMER table is transformed from 2NF to 3NF, the result is two tables: CUSTOMER and SALES-REP.
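The 2NF-to-3NF step for the CUSTOMER example can be sketched in Python. split_transitive is a hypothetical helper that moves SALES-REP-NAME into a separate SALES-REP mapping keyed by SALES-REP-NUM; it also raises an error if one sales rep number carries two different names, which is exactly the inconsistency the 2NF design allows.

```python
# CUSTOMER in 2NF (Figure 12):
# (CUSTOMER-NUM, CUSTOMER-NAME, ADDRESS, SALES-REP-NUM, SALES-REP-NAME)
customer_2nf = [
    (108, "Benedict, Louise", "San Diego, CA", 41, "Kaplan, James"),
    (233, "Corelli, Helen", "Nashua, NH", 22, "McBride, Jon"),
    (254, "Gomez, J.P.", "Butte, MT", 38, "Stein, Ellen"),
    (431, "Lee, M.", "Snow Camp, NC", 74, "Roman, Harold"),
    (779, "Paulski, Diane", "Lead, SD", 38, "Stein, Ellen"),
    (800, "Zuider, Z.", "Greer, SC", 74, "Roman, Harold"),
]


def split_transitive(rows):
    """Move SALES-REP-NAME, which depends on the non-key SALES-REP-NUM,
    into its own SALES-REP table keyed by SALES-REP-NUM."""
    customer, sales_rep = [], {}
    for num, name, address, rep_num, rep_name in rows:
        customer.append((num, name, address, rep_num))
        # If one rep number carried two names, the 2NF data was already inconsistent.
        if sales_rep.setdefault(rep_num, rep_name) != rep_name:
            raise ValueError(f"SALES-REP-NUM {rep_num} has conflicting names")
    return customer, sales_rep


customer_3nf, sales_rep_3nf = split_transitive(customer_2nf)
```

Each sales rep name is now stored once, so renaming a rep touches one entry instead of every customer record.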

A NORMALIZATION EXAMPLE
To show the normalization process, consider the familiar situation in which a faculty advisor, who represents an entity, can advise many students, each of whom can register for one or many courses. This situation involves several entities in a school advising system: ADVISOR, COURSE, and STUDENT.
Before you start the normalization process, you notice that the STUDENT table contains fields that relate to the ADVISOR and COURSE entities, so you decide to begin with the initial design for the STUDENT table, which is shown in Figure 14. Notice that the table design includes the student number, student name, total credits taken, grade point average (GPA), advisor number, advisor name, and, for every course the student has taken, the course number, course description, number of credits, and grade received.
The STUDENT table in Figure 14 is unnormalized, because it has a repeating group. The STUDENT table design can be written as:
STUDENT (STUDENT-NUMBER, STUDENT-NAME, TOTAL-CREDITS, GPA, ADVISOR-NUMBER, ADVISOR-NAME, (COURSE-NUMBER, COURSE-DESC, NUM-CREDITS, GRADE))

STUDENT-NO.  STUDENT-NAME  TOTAL-CREDITS  GPA    ADVISOR-NO.  ADVISOR-NAME  COURSE-NO.  COURSE-DESC.         NUM-CREDITS  GRADE
1035         Linda         47             3.647  49           Smith         CSC151      Computer Science 1   4            B
                                                                            MKT212      Marketing Mgt        3            A
                                                                            ENG101      English Composition  3            B
                                                                            CHM112      General Chemistry 1  4            A
                                                                            BUS105      Intro. to Business   2            A
3397         Sam           29             3.000  49           Smith         ENG101      English Composition  3            A
                                                                            MKT212      Marketing Mgt        3            C
                                                                            CSC151      Computer Science 1   4            B
4070         Kelly         14             2.214  23           Jones         CSC151      Computer Science 1   4            B
                                                                            CHM112      General Chemistry 1  4            C
                                                                            ENG101      English Composition  3            C
                                                                            BUS105      Intro. to Business   2            C

Figure 14. The STUDENT table is unnormalized because it contains a repeating group that
represents the courses each student has taken.
To convert the STUDENT record to 1NF, you must expand the primary key to include the
key of the repeating group, producing:
STUDENT (STUDENT-NUMBER, STUDENT-NAME, TOTAL-CREDITS, GPA, ADVISOR-NUMBER, ADVISOR-NAME, COURSE-NUMBER, COURSE-DESC, NUM-CREDITS, GRADE)

STUDENT-NO.  STUDENT-NAME  TOTAL-CREDITS  GPA    ADVISOR-NO.  ADVISOR-NAME  COURSE-NO.  COURSE-DESC.         NUM-CREDITS  GRADE
1035         Linda         47             3.647  49           Smith         CSC151      Computer Science 1   4            B
1035         Linda         47             3.647  49           Smith         MKT212      Marketing Mgt        3            A
1035         Linda         47             3.647  49           Smith         ENG101      English Composition  3            B
1035         Linda         47             3.647  49           Smith         CHM112      General Chemistry 1  4            A
1035         Linda         47             3.647  49           Smith         BUS105      Intro. To Business   2            A
3397         Sam           29             3.000  49           Smith         ENG101      English Composition  3            A
3397         Sam           29             3.000  49           Smith         MKT212      Marketing Mgt        3            C
3397         Sam           29             3.000  49           Smith         CSC151      Computer Science 1   4            B
4070         Kelly         14             2.214  23           Jones         CSC151      Computer Science 1   4            B
4070         Kelly         14             2.214  23           Jones         CHM112      General Chemistry 1  4            C
4070         Kelly         14             2.214  23           Jones         ENG101      English Composition  3            C
4070         Kelly         14             2.214  23           Jones         BUS105      Intro. To Business   2            C


Figure 15. The STUDENT table in 1NF. Notice that the primary key has been expanded to
include STUDENT-NUMBER and COURSE-NUMBER. Also, the repeating group has been
eliminated.
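The step from Figure 14 to Figure 15 can be illustrated with a short Python sketch. It is an illustration only, not part of the original method: the names UNF_STUDENTS and to_1nf are hypothetical, and each record is assumed to be a plain tuple in the column order of the figures, with the repeating group stored as a nested list of course tuples.

```python
# Unnormalized STUDENT records (Figure 14). The last element of each record is
# the repeating group: (COURSE-NO, COURSE-DESC, NUM-CREDITS, GRADE) tuples.
UNF_STUDENTS = [
    (1035, "Linda", 47, 3.647, 49, "Smith", [
        ("CSC151", "Computer Science 1", 4, "B"),
        ("MKT212", "Marketing Mgt", 3, "A"),
        ("ENG101", "English Composition", 3, "B"),
        ("CHM112", "General Chemistry 1", 4, "A"),
        ("BUS105", "Intro. To Business", 2, "A"),
    ]),
    (3397, "Sam", 29, 3.000, 49, "Smith", [
        ("ENG101", "English Composition", 3, "A"),
        ("MKT212", "Marketing Mgt", 3, "C"),
        ("CSC151", "Computer Science 1", 4, "B"),
    ]),
    (4070, "Kelly", 14, 2.214, 23, "Jones", [
        ("CSC151", "Computer Science 1", 4, "B"),
        ("CHM112", "General Chemistry 1", 4, "C"),
        ("ENG101", "English Composition", 3, "C"),
        ("BUS105", "Intro. To Business", 2, "C"),
    ]),
]


def to_1nf(unf_records):
    """Flatten the repeating group into one row per (student, course) pair.

    The six student fields are copied into every row, so the primary key
    becomes STUDENT-NO (index 0) plus COURSE-NO (index 6).
    """
    rows = []
    for record in unf_records:
        student_fields, courses = record[:6], record[6]
        for course in courses:
            rows.append(student_fields + course)
    return rows


ONE_NF = to_1nf(UNF_STUDENTS)

# The expanded key must still identify each row uniquely.
keys = [(row[0], row[6]) for row in ONE_NF]
assert len(keys) == len(set(keys))

for row in ONE_NF:
    print(row)
```

Running it prints the twelve rows of Figure 15. The student fields repeated on every row are the redundancy that the move to 2NF removes.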
Figure 15 shows the 1NF version of the sample STUDENT data. Do any of the fields in the
1NF STUDENT record depend on only a portion of the primary key? The student name, total
credits, GPA, advisor number, and advisor name all relate only to the student number and
have no relationship to the course number. The course description and number of credits
depend on the course number, but not on the student number. Only the GRADE field depends
on the entire primary key.
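The same question can be asked of the sample rows mechanically. The Python sketch below (the helper names determines and dependency_of are hypothetical) tests whether a column's value is fixed by STUDENT-NO alone, by COURSE-NO alone, or only by the whole key. Keep in mind that sample data can only disprove a dependency; whether a dependency really holds is a business rule that the analyst must confirm.

```python
# Columns of the 1NF STUDENT table (Figure 15); the primary key is
# STUDENT-NO + COURSE-NO.
COLUMNS = ("STUDENT-NO", "STUDENT-NAME", "TOTAL-CREDITS", "GPA", "ADVISOR-NO",
           "ADVISOR-NAME", "COURSE-NO", "COURSE-DESC", "NUM-CREDITS", "GRADE")

ONE_NF = [
    (1035, "Linda", 47, 3.647, 49, "Smith", "CSC151", "Computer Science 1", 4, "B"),
    (1035, "Linda", 47, 3.647, 49, "Smith", "MKT212", "Marketing Mgt", 3, "A"),
    (1035, "Linda", 47, 3.647, 49, "Smith", "ENG101", "English Composition", 3, "B"),
    (1035, "Linda", 47, 3.647, 49, "Smith", "CHM112", "General Chemistry 1", 4, "A"),
    (1035, "Linda", 47, 3.647, 49, "Smith", "BUS105", "Intro. To Business", 2, "A"),
    (3397, "Sam", 29, 3.000, 49, "Smith", "ENG101", "English Composition", 3, "A"),
    (3397, "Sam", 29, 3.000, 49, "Smith", "MKT212", "Marketing Mgt", 3, "C"),
    (3397, "Sam", 29, 3.000, 49, "Smith", "CSC151", "Computer Science 1", 4, "B"),
    (4070, "Kelly", 14, 2.214, 23, "Jones", "CSC151", "Computer Science 1", 4, "B"),
    (4070, "Kelly", 14, 2.214, 23, "Jones", "CHM112", "General Chemistry 1", 4, "C"),
    (4070, "Kelly", 14, 2.214, 23, "Jones", "ENG101", "English Composition", 3, "C"),
    (4070, "Kelly", 14, 2.214, 23, "Jones", "BUS105", "Intro. To Business", 2, "C"),
]


def determines(rows, lhs, rhs):
    """True if no two rows agree on column `lhs` but differ on column `rhs`."""
    i, j = COLUMNS.index(lhs), COLUMNS.index(rhs)
    seen = {}
    for row in rows:
        # setdefault records the first rhs value seen for this lhs value and
        # returns it; a different value later is a counterexample.
        if seen.setdefault(row[i], row[j]) != row[j]:
            return False
    return True


def dependency_of(rows, column):
    """Report which part of the composite key a non-key column depends on."""
    if determines(rows, "STUDENT-NO", column):
        return "STUDENT-NO only (partial)"
    if determines(rows, "COURSE-NO", column):
        return "COURSE-NO only (partial)"
    return "whole key"


for col in COLUMNS:
    if col not in ("STUDENT-NO", "COURSE-NO"):
        print(f"{col:14} {dependency_of(ONE_NF, col)}")
```

On this sample it reports the five student fields as depending on STUDENT-NO only, COURSE-DESC and NUM-CREDITS on COURSE-NO only, and GRADE on the whole key, which matches the analysis above.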
Following the 1NF-to-2NF conversion process described earlier, you would create a new
table for each field and combination of fields in the primary key, and place the other fields
with their appropriate key. The result is:
STUDENT (STUDENT-NUMBER, STUDENT-NAME, TOTAL-CREDITS, GPA, ADVISOR-NUMBER,
ADVISOR-NAME)
COURSE (COURSE-NUMBER, COURSE-DESC, NUM-CREDITS)
GRADE (STUDENT-NUMBER, COURSE-NUMBER, GRADE)
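Each 2NF table is a projection of the 1NF table onto its own columns, with duplicate rows removed. A minimal Python sketch, assuming the Figure 15 rows are held as tuples; the function name project is hypothetical:

```python
# Columns of the 1NF STUDENT table (Figure 15), in order.
COLUMNS = ("STUDENT-NO", "STUDENT-NAME", "TOTAL-CREDITS", "GPA", "ADVISOR-NO",
           "ADVISOR-NAME", "COURSE-NO", "COURSE-DESC", "NUM-CREDITS", "GRADE")

ONE_NF = [
    (1035, "Linda", 47, 3.647, 49, "Smith", "CSC151", "Computer Science 1", 4, "B"),
    (1035, "Linda", 47, 3.647, 49, "Smith", "MKT212", "Marketing Mgt", 3, "A"),
    (1035, "Linda", 47, 3.647, 49, "Smith", "ENG101", "English Composition", 3, "B"),
    (1035, "Linda", 47, 3.647, 49, "Smith", "CHM112", "General Chemistry 1", 4, "A"),
    (1035, "Linda", 47, 3.647, 49, "Smith", "BUS105", "Intro. To Business", 2, "A"),
    (3397, "Sam", 29, 3.000, 49, "Smith", "ENG101", "English Composition", 3, "A"),
    (3397, "Sam", 29, 3.000, 49, "Smith", "MKT212", "Marketing Mgt", 3, "C"),
    (3397, "Sam", 29, 3.000, 49, "Smith", "CSC151", "Computer Science 1", 4, "B"),
    (4070, "Kelly", 14, 2.214, 23, "Jones", "CSC151", "Computer Science 1", 4, "B"),
    (4070, "Kelly", 14, 2.214, 23, "Jones", "CHM112", "General Chemistry 1", 4, "C"),
    (4070, "Kelly", 14, 2.214, 23, "Jones", "ENG101", "English Composition", 3, "C"),
    (4070, "Kelly", 14, 2.214, 23, "Jones", "BUS105", "Intro. To Business", 2, "C"),
]


def project(rows, columns):
    """Keep only `columns` and drop duplicate rows, keeping first-seen order."""
    idx = [COLUMNS.index(c) for c in columns]
    # dict.fromkeys removes duplicates while preserving insertion order.
    return list(dict.fromkeys(tuple(row[i] for i in idx) for row in rows))


STUDENT = project(ONE_NF, ["STUDENT-NO", "STUDENT-NAME", "TOTAL-CREDITS", "GPA",
                           "ADVISOR-NO", "ADVISOR-NAME"])
COURSE = project(ONE_NF, ["COURSE-NO", "COURSE-DESC", "NUM-CREDITS"])
GRADE = project(ONE_NF, ["STUDENT-NO", "COURSE-NO", "GRADE"])

print(len(STUDENT), len(COURSE), len(GRADE))  # 3 5 12
```

The result has one row per student, one per course, and one per grade, so each student's and each course's details are now stored only once.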

STUDENT-NO.  STUDENT-NAME  TOTAL-CREDITS  GPA    ADVISOR-NO.  ADVISOR-NAME
1035         Linda         47             3.647  49           Smith
3397         Sam           29             3.000  49           Smith
4070         Kelly         14             2.214  23           Jones

COURSE-NO.  COURSE-DESC.         NUM-CREDITS
BUS105      Intro. To Business   2
CHM112      General Chemistry 1  4
CSC151      Computer Science 1   4
ENG101      English Composition  3
MKT212      Marketing Mgt        3

STUDENT-NO.  COURSE-NO.  GRADE
1035         CSC151      B
1035         MKT212      A
1035         ENG101      B
1035         CHM112      A
1035         BUS105      A
3397         ENG101      A
3397         MKT212      C
3397         CSC151      B
4070         CSC151      B
4070         CHM112      C
4070         ENG101      C
4070         BUS105      C
Figure 16. STUDENT, COURSE, and GRADE tables in 2NF. Notice that all fields are
functionally dependent on the entire primary key of their respective tables.
You now have converted the original 1NF STUDENT table to three tables, all in 2NF. In
each table, every non-key field depends on the entire primary key.
Figure 16 shows the 2NF STUDENT, COURSE, and GRADE designs and sample data.
Are all three tables in 3NF? COURSE and GRADE are in 3NF. STUDENT is not in 3NF,
however, because the ADVISOR-NAME field depends on the ADVISOR-NUMBER field, which
is not part of the STUDENT primary key. To convert STUDENT to 3NF, you remove the
ADVISOR-NAME field from the STUDENT table and place it, together with ADVISOR-NUMBER,
in a new ADVISOR table that has ADVISOR-NUMBER as its primary key. ADVISOR-NUMBER
stays in STUDENT so that each student remains linked to an advisor.
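The STUDENT-to-ADVISOR split can be sketched the same way. This Python example is illustrative; STUDENT_2NF and split_transitive are hypothetical names. The function raises an error if one advisor number appears with two different names, because that would contradict the rule that ADVISOR-NUMBER determines ADVISOR-NAME.

```python
# 2NF STUDENT rows (Figure 16):
# (STUDENT-NO, STUDENT-NAME, TOTAL-CREDITS, GPA, ADVISOR-NO, ADVISOR-NAME)
STUDENT_2NF = [
    (1035, "Linda", 47, 3.647, 49, "Smith"),
    (3397, "Sam", 29, 3.000, 49, "Smith"),
    (4070, "Kelly", 14, 2.214, 23, "Jones"),
]


def split_transitive(rows):
    """Move ADVISOR-NAME into an ADVISOR table keyed by ADVISOR-NO.

    STUDENT keeps ADVISOR-NO as a foreign key, so no information is lost.
    """
    students = []
    advisors = {}  # ADVISOR-NO -> ADVISOR-NAME, one entry per advisor
    for student_no, name, total_credits, gpa, advisor_no, advisor_name in rows:
        known = advisors.setdefault(advisor_no, advisor_name)
        if known != advisor_name:
            raise ValueError(
                f"advisor {advisor_no} has two names: {known!r}, {advisor_name!r}")
        students.append((student_no, name, total_credits, gpa, advisor_no))
    return students, sorted(advisors.items())


STUDENT_3NF, ADVISOR_3NF = split_transitive(STUDENT_2NF)
print(STUDENT_3NF)
print(ADVISOR_3NF)  # [(23, 'Jones'), (49, 'Smith')]
```

Smith is now stored once, so changing an advisor's name means updating a single ADVISOR row rather than every row of each student that advisor advises.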
Figure 17 shows the 3NF versions of the sample data for STUDENT, ADVISOR, COURSE,
and GRADE. The final 3NF design is:
STUDENT (STUDENT-NUMBER, STUDENT-NAME, TOTAL-CREDITS, GPA, ADVISOR-NUMBER)
ADVISOR (ADVISOR-NUMBER, ADVISOR-NAME)
COURSE (COURSE-NUMBER, COURSE-DESC, NUM-CREDITS)
GRADE (STUDENT-NUMBER, COURSE-NUMBER, GRADE)

STUDENT-NO.  STUDENT-NAME  TOTAL-CREDITS  GPA    ADVISOR-NO.
1035         Linda         47             3.647  49
3397         Sam           29             3.000  49
4070         Kelly         14             2.214  23

ADVISOR-NO.  ADVISOR-NAME
49           Smith
23           Jones

COURSE-NO.  COURSE-DESC.         NUM-CREDITS
BUS105      Intro. To Business   2
CHM112      General Chemistry 1  4
CSC151      Computer Science 1   4
ENG101      English Composition  3
MKT212      Marketing Mgt        3

STUDENT-NO.  COURSE-NO.  GRADE
1035         CSC151      B
1035         MKT212      A
1035         ENG101      B
1035         CHM112      A
1035         BUS105      A
3397         ENG101      A
3397         MKT212      C
3397         CSC151      B
4070         CSC151      B
4070         CHM112      C
4070         ENG101      C
4070         BUS105      C
Figure 17. STUDENT, ADVISOR, COURSE, and GRADE tables in 3NF. When STUDENT table
is transformed from 2NF to 3NF, the result is two tables: STUDENT and ADVISOR.
Figure 18 shows the complete ERD after normalization. Now there are four entities:
STUDENT, ADVISOR, COURSE, and GRADE, which is an associative entity. You can see that
the M:N relationship between STUDENT and COURSE has been converted into two 1:M
relationships: one between STUDENT and GRADE and the other between COURSE and
GRADE.
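How GRADE links the two entities can be shown in a small Python sketch. The dictionaries and the helper names courses_of and students_in are hypothetical; only the sample values come from Figure 17. Each GRADE row refers to exactly one student and one course, and following those references in either direction recovers the original M:N relationship.

```python
# 3NF sample data (Figure 17), reduced to the columns this example needs.
STUDENT = {1035: "Linda", 3397: "Sam", 4070: "Kelly"}  # STUDENT-NO -> STUDENT-NAME
COURSE = {                                             # COURSE-NO -> COURSE-DESC
    "BUS105": "Intro. To Business", "CHM112": "General Chemistry 1",
    "CSC151": "Computer Science 1", "ENG101": "English Composition",
    "MKT212": "Marketing Mgt",
}
GRADE = [  # associative entity: (STUDENT-NO, COURSE-NO, GRADE)
    (1035, "CSC151", "B"), (1035, "MKT212", "A"), (1035, "ENG101", "B"),
    (1035, "CHM112", "A"), (1035, "BUS105", "A"),
    (3397, "ENG101", "A"), (3397, "MKT212", "C"), (3397, "CSC151", "B"),
    (4070, "CSC151", "B"), (4070, "CHM112", "C"), (4070, "ENG101", "C"),
    (4070, "BUS105", "C"),
]

# Every GRADE row must point to an existing STUDENT and an existing COURSE:
# GRADE is the "many" side of both 1:M relationships.
assert all(s in STUDENT and c in COURSE for s, c, _ in GRADE)


def courses_of(student_no):
    """STUDENT 1:M GRADE, then GRADE M:1 COURSE: one student's courses."""
    return sorted(COURSE[c] for s, c, _ in GRADE if s == student_no)


def students_in(course_no):
    """COURSE 1:M GRADE, then GRADE M:1 STUDENT: one course's students."""
    return sorted(STUDENT[s] for s, c, _ in GRADE if c == course_no)


print(courses_of(3397))       # ['Computer Science 1', 'English Composition', 'Marketing Mgt']
print(students_in("CSC151"))  # ['Kelly', 'Linda', 'Sam']
```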

[Figure 18 diagram: ERD with entities ADVISOR, STUDENT, GRADE, and COURSE; relationship labels Advises, Receives, and Shows.]
Figure 18. The entity-relationship diagram for STUDENT, ADVISOR, and COURSE after
normalization. The GRADE entity was identified during the normalization process. GRADE is
an associative entity that links the STUDENT and COURSE tables.
To create 3NF designs, you must understand the nature of first, second, and third normal
forms. In your work as a systems analyst, you will encounter designs that are much more
complex than the examples in this unit. You also should know that normal forms beyond
3NF exist, but they rarely are used in business-oriented systems.
The Data Dictionary
A data dictionary is a specialized application of the kinds of dictionaries used as references
in everyday life. A data dictionary is a reference work of data about data (that is, metadata).
Systems analysts compile data dictionaries to guide them through analysis and design. A data
dictionary is a document that collects and coordinates specific data terms, and it confirms
what each term means to different people in the organization. One important reason for
maintaining a data dictionary is to keep data clean, which means that data must be
consistent. If you store data about a man's sex as "M" in one record, "Male" in a second
record, and as the number 1 in a third record, the data are not clean. A data dictionary
helps prevent this by defining one agreed format for each data item.
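As an illustration of how a dictionary entry supports clean data, the Python sketch below checks records against one agreed coding. The element name SEX, the allowed codes "M" and "F", and the function unclean_values are assumptions made for this example; the notes do not prescribe a dictionary format.

```python
# Hypothetical data dictionary entry: one agreed type and set of codes for SEX.
DATA_DICTIONARY = {
    "SEX": {"type": str, "allowed": {"M", "F"}},
}


def unclean_values(records, element):
    """Return (record index, value) pairs that break the dictionary's rule."""
    rule = DATA_DICTIONARY[element]
    return [
        (i, rec[element])
        for i, rec in enumerate(records)
        if not isinstance(rec[element], rule["type"])
        or rec[element] not in rule["allowed"]
    ]


# The three inconsistent codings from the text.
records = [{"SEX": "M"}, {"SEX": "Male"}, {"SEX": 1}]
print(unclean_values(records, "SEX"))  # [(1, 'Male'), (2, 1)]
```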
Automated data dictionaries are valuable for their capacity to cross-reference data items,
so that a necessary change can be made in every program that shares a common element.
This feature replaces changing programs on a haphazard basis, and it prevents waiting
until a program will not run because a change has not been implemented across all
programs sharing the updated item. Clearly, automated data dictionaries are important for
large systems that produce several thousand data elements requiring cataloging and
cross-referencing.
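Cross-referencing amounts to inverting a program-to-element list into an element-to-program index. In this hypothetical Python sketch, the program names ENROLL, GRADE-REPORT, and ADVISOR-LIST and the function build_cross_reference are invented for illustration:

```python
# Hypothetical list of the data elements each program uses.
PROGRAM_ELEMENTS = {
    "ENROLL": {"STUDENT-NUMBER", "COURSE-NUMBER"},
    "GRADE-REPORT": {"STUDENT-NUMBER", "STUDENT-NAME", "COURSE-NUMBER", "GRADE"},
    "ADVISOR-LIST": {"ADVISOR-NUMBER", "ADVISOR-NAME", "STUDENT-NAME"},
}


def build_cross_reference(program_elements):
    """Invert program -> elements into element -> sorted list of programs."""
    xref = {}
    for program, elements in program_elements.items():
        for element in elements:
            xref.setdefault(element, []).append(program)
    return {element: sorted(programs) for element, programs in xref.items()}


XREF = build_cross_reference(PROGRAM_ELEMENTS)
# If STUDENT-NAME is redefined, every program listed here must change together.
print(XREF["STUDENT-NAME"])  # ['ADVISOR-LIST', 'GRADE-REPORT']
```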
Need for Understanding the Data Dictionary
Many database management systems now come equipped with an automated data
dictionary. These dictionaries can be either elaborate or simple. Some computerized data
dictionaries automatically catalog data items when programming is done; others simply
provide a template to prompt the person filling in the dictionary to do so in a uniform
manner for every entry.
Despite the existence of automated data dictionaries, a systems analyst should understand
what data compose a data dictionary, the conventions used in data dictionaries, and how a
data dictionary is developed. Understanding the process of compiling a data dictionary can
aid a systems analyst in conceptualizing the system and how it works. The upcoming
sections allow the systems analyst to see the rationale behind what exists in automated data
dictionaries.
In addition to providing documentation and eliminating redundancy, a data dictionary may
be used to:
1. Validate the data flow diagram for completeness and accuracy.
2. Provide a starting point for developing screens and reports.
3. Determine the contents of data stored in files.
4. Develop the logic for data flow diagram processes.
5. Create XML (Extensible Markup Language).
Whereas a data dictionary contains information about data and procedures, a larger
collection of project information is called a repository. One of the benefits of using a CASE
tool to develop the data dictionary is the ability to develop a repository, or a shared collection
of project information and team contributions. The repository may contain the following:
1. Information about data maintained by the system including data flows, data stores,
record structures, elements, entities, and messages.
2. Procedural logic and use cases
3. Screen and report design
4. Data relationships, such as how one data structure is linked to another
5. Project requirements and final system deliverables
6. Project management information, such as delivery schedule, achievements, issues
that need resolving, and project users
The data dictionary is created by examining and describing the contents of the data flows,
data stores, and processes. Each data store and data flow should be defined and then
expanded to include the details of the elements it contains. The logic of each process should
be described using the data flowing into or out of the process. Omissions and other design
errors should be noted and resolved.
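The check for omissions described above can be sketched in Python. The data flow names, the set of defined elements, and the function find_omissions are hypothetical; a CASE tool would typically hold this information in its repository.

```python
# Hypothetical dictionary contents: elements that already have definitions.
DEFINED_ELEMENTS = {"STUDENT-NUMBER", "STUDENT-NAME", "COURSE-NUMBER", "GRADE"}

# Hypothetical data flows, each expanded into the elements it carries.
DATA_FLOWS = {
    "Grade Report": ["STUDENT-NUMBER", "STUDENT-NAME", "COURSE-NUMBER", "GRADE"],
    "Advisor Assignment": ["STUDENT-NUMBER", "ADVISOR-NUMBER", "ADVISOR-NAME"],
}


def find_omissions(flows, defined):
    """Map each flow name to the elements it uses that the dictionary lacks."""
    return {
        name: [e for e in elements if e not in defined]
        for name, elements in flows.items()
        if any(e not in defined for e in elements)
    }


print(find_omissions(DATA_FLOWS, DEFINED_ELEMENTS))
# {'Advisor Assignment': ['ADVISOR-NUMBER', 'ADVISOR-NAME']}
```

Each reported element is an omission to resolve, either by adding its definition to the dictionary or by correcting the data flow.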

LET’S DO IT

Name: ____________________ Course, Year, & Section: _________ Date Accomplished: ____

Activity Title: Entity Relationship Diagram and Normalization


Topic: University Library System Database
Date:
🎯 Objective
To apply concepts of Entity Relationship Modeling and Database Normalization (up to 3NF)
using a real-world scenario.
📚 Scenario: University Library System
The university library maintains information about books, students, and borrowing
transactions.
- Each book has a unique Book ID, Title, Author, Publisher, and Year Published.
- Each student has a Student ID, Name, Course, and Year Level.
- A borrowing transaction records which student borrowed which book, along with the Date
Borrowed and Date Returned.
- A student can borrow multiple books; a book can be borrowed multiple times.
🛠️ Instructions:
Part I: Entity Identification
Identify and list at least three entities from the scenario. For each entity, list:
- Attributes
- Primary Key (PK)
Part II: Draw the ERD
Create an Entity Relationship Diagram (ERD) that includes:
- Entities
- Attributes
- Primary and Foreign Keys
- Relationships with cardinalities
You may use software such as Draw.io, Lucidchart, or Microsoft Visio to create the ERD.
Part III: Unnormalized Form (UNF)
Simulate a single flat table that contains all the information. Include sample fields and data.
Part IV: Normalization Process
Normalize the UNF to:
- First Normal Form (1NF): Remove repeating groups.
- Second Normal Form (2NF): Remove partial dependencies.
- Third Normal Form (3NF): Remove transitive dependencies.
For each step, show the resulting tables and indicate:
- Table name
- Attributes
- Primary and Foreign Keys
📑 Output Requirements:
- ERD diagram (digital drawing)
- Table structures (from UNF to 3NF)
- Short explanation per normalization step
