SWE DataBase

The document provides an introduction to database management systems (DBMS). It discusses that a DBMS allows for the creation, maintenance, and use of databases through various computer programs. It also defines what a database is as an organized collection of related data for a particular purpose. The document then summarizes some key functions of a DBMS, including allowing multiple users to access the same database concurrently and providing query languages to simplify writing application programs and accessing information. It also briefly outlines some common database models and classifications of DBMSs.

IUC DOUALA-CAMEROON

INTRODUCTION
TO DATABASE
HND-Common Course
Year I / Semester II

COURSE FACILITATORS: NYAM STEPHANIE [email protected] &


TATSOPTEU E. ENDELLY [email protected]
2019/2020 Academic Year
INTRODUCTION
A database-management system (DBMS) is a software package whose
computer programs control the creation, maintenance, and use of a
database. Examples include Microsoft Access, MySQL, Oracle, and
PostgreSQL. A database is an organized collection of data related to a
particular subject or purpose, arranged so that the data can be retrieved or
processed. The primary goal of a DBMS is to provide a way to store and
retrieve database information that is both convenient and efficient. By
data, we mean known facts that can be recorded and that have implicit
meaning. For example, consider the names, telephone numbers, and
addresses of the people you know. You may have recorded this data in an
indexed address book, or you may have stored it on a flash drive, using a
personal computer and software such as Microsoft Access or Excel. The
database and the DBMS software together are called a database system.
Database systems are designed to manage large bodies of information.
Management of data involves both defining structures for storage of
information and providing mechanisms for the manipulation of
information. In addition, the database system must ensure the safety
of the information stored, despite system crashes or attempts at
unauthorized access. If data are to be shared among several users, the
system must avoid possible anomalous results.

Database System
Functions of a database management system
• It allows organizations to conveniently develop databases for various
applications by database administrators (DBAs) and other
specialists.
• A DBMS allows different user application programs to concurrently
access the same database. DBMSs may use a variety of database
models, such as the relational model or object model, to
conveniently describe and support applications.

• It typically supports query languages, which are in fact high-level
programming languages, dedicated database languages that
considerably simplify writing database application programs.
• Database languages also simplify the database organization as well
as retrieving and presenting information from it.
• A DBMS provides facilities for controlling data access, enforcing data
integrity, managing concurrency control, recovering the database
after failures and restoring it from backup files, as well as
maintaining database security.
Kinds of interactions supported by a DBMS
A DBMS supports the following kinds of interactions:
• Data definition
• Update
• Retrieval
• Administration
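The four kinds of interaction listed above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the `contact` table, its columns, and the sample values are assumptions made for the example, and a backup copy stands in for the much broader set of administration tasks.

```python
import sqlite3

# In-memory database; the table and column names are illustrative only.
conn = sqlite3.connect(":memory:")

# Data definition: describe the structure of the data.
conn.execute(
    "CREATE TABLE contact (id INTEGER PRIMARY KEY, name TEXT NOT NULL, phone TEXT)"
)

# Update: insert, modify, and delete rows.
conn.execute("INSERT INTO contact (name, phone) VALUES (?, ?)", ("Alice", "555-0100"))
conn.execute("INSERT INTO contact (name, phone) VALUES (?, ?)", ("Bob", "555-0101"))
conn.execute("UPDATE contact SET phone = ? WHERE name = ?", ("555-0199", "Alice"))
conn.execute("DELETE FROM contact WHERE name = ?", ("Bob",))
conn.commit()

# Retrieval: query the stored data.
rows = conn.execute("SELECT name, phone FROM contact ORDER BY name").fetchall()
print(rows)

# Administration (one small example): copy the database to a backup.
backup_conn = sqlite3.connect(":memory:")
conn.backup(backup_conn)
backup_count = backup_conn.execute("SELECT COUNT(*) FROM contact").fetchone()[0]
print(backup_count)
```

A multi-user DBMS would add further administration features, such as user accounts and access rights, which SQLite does not provide.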
Some Applications of DBMS
• Banking
• Airlines
• Universities
• Credit card transactions
• Telecommunication
• Finance
• Sales
• Manufacturing
• Human resources
Disadvantages of file processing system
The disadvantages of file processing systems are:
• Data redundancy and inconsistency.
• Difficulty in accessing data.
• Data isolation.
• Data integrity problems.
• Atomicity problems.
• Concurrent-access anomalies (uncontrolled simultaneous updates can corrupt data).
• Security problems.
Database Instances and Schemas
Databases change over time as information is inserted and deleted. The
collection of information stored in the database at a particular moment is
called an instance of the database. The overall design of the database
is called the database schema.
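The difference between schema and instance can be made concrete with a small sketch using Python's sqlite3 module. The `student` table and the names stored in it are assumptions made for the example. The stored table definition plays the role of the schema, and the rows present at a given moment are the instance.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (id INTEGER PRIMARY KEY, name TEXT)")


def schema(c):
    # The schema: the stored definition of the table (its design).
    query = "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'student'"
    return c.execute(query).fetchone()[0]


def instance(c):
    # The instance: the rows stored at this particular moment.
    return c.execute("SELECT id, name FROM student ORDER BY id").fetchall()


schema_before = schema(conn)
instance_before = instance(conn)

conn.execute("INSERT INTO student (name) VALUES ('Amina')")
conn.execute("INSERT INTO student (name) VALUES ('Paul')")

schema_after = schema(conn)
instance_after = instance(conn)

print(instance_before, instance_after)
print(schema_before == schema_after)  # the design did not change
```

The instance changes with every insertion or deletion, while the schema stays the same until the design itself is altered.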

Database systems have several schemas, partitioned according to the
levels of abstraction. The physical schema describes the database
design at the physical level, while the logical schema describes the
database design at the logical level. A database may also have several
schemas at the view level, sometimes called subschemas, that describe
different views of the database.
Of these, the logical schema is by far the most important, in terms of its
effect on application programs, since programmers construct applications
by using the logical schema. The physical schema is hidden beneath the
logical schema, and can usually be changed easily without affecting
application programs. Application programs are said to exhibit physical
data independence if they do not depend on the physical schema, and
thus need not be rewritten if the physical schema changes.
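As a rough illustration of physical data independence, the sketch below runs the same application query before and after a physical change. It assumes a hypothetical `employee` table and treats adding an index as the change to the physical schema; the query text is not modified.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, dept TEXT)")
conn.executemany(
    "INSERT INTO employee (name, dept) VALUES (?, ?)",
    [("Ada", "IT"), ("Ben", "HR"), ("Cleo", "IT")],
)

# The application only knows the logical schema (table and column names).
QUERY = "SELECT name FROM employee WHERE dept = ? ORDER BY name"


def run_application_query(c):
    return [row[0] for row in c.execute(QUERY, ("IT",))]


before = run_application_query(conn)

# Physical change: add an access structure (an index) on dept.
conn.execute("CREATE INDEX idx_employee_dept ON employee (dept)")

after = run_application_query(conn)
print(before, after)
```

The index may change how the DBMS finds the rows, but the application program and its results stay the same.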

DBMS ARCHITECTURE AND DATA INDEPENDENCE


Three important characteristics of the database approach are (1)
insulation of programs and data (program-data and program-operation
independence); (2) support of multiple user views; and (3) use of a
catalog to store the database description - schema (A catalog is a table
that contains the information such as structure of each file, the type and
storage format of each data item and various constraints on the data. The
information stored in the catalog is called Metadata). Here, we specify
an architecture for database systems, called the three-schema
architecture, which was proposed to help achieve and visualize these
characteristics. We then discuss the concept of data independence.
The Three-Schema Architecture
The goal of the three-schema architecture, illustrated below, is to separate
the user applications and the physical database. In this architecture,
schemas can be defined at the following three levels:
• The internal level has an internal schema, which describes the
physical storage structure of the database. The internal schema uses
a physical data model and describes the complete details of data
storage and access paths for the database.
• The conceptual level has a conceptual schema, which
describes the structure of the whole database for a community of
users. The conceptual schema hides the details of physical storage
structures and concentrates on describing entities, data types,
relationships, user operations, and constraints. A high-level data
model or an implementation data model can be used at this level.

• The external or view level includes a number of external schemas
or user views. Each external schema describes the part of the
database that a particular user group is interested in and hides the
rest of the database from that user group. A high-level data model
or an implementation data model can be used at this level.

The Three-level Architecture


Notice that the three schemas are only descriptions of data; the only data
that actually exists is at the physical level. In a DBMS based on the three-
schema architecture, each user group refers only to its own external
schema. Hence, the DBMS must transform a request specified on an
external schema into a request against the conceptual schema, and then
into a request on the internal schema for processing over the stored
database. If the request is a database retrieval, the data extracted from
the stored database must be reformatted to match the user’s external
view. The processes of transforming requests and results between levels
are called mappings.
Data Independence
The three-schema architecture can be used to explain the concept of data
independence, which can be defined as the capacity to change the
schema at one level of a database system without having to change the
schema at the next higher level. We can define two types of data
independence:

• Logical data independence is the capacity to change the
conceptual schema without having to change external schemas or
application programs. We may change the conceptual schema to
expand the database (by adding a record type or data item), or to
reduce the database (by removing a record type or data item). In
the latter case, external schemas that refer only to the remaining
data should not be affected. Only the view definition and the
mappings need be changed in a DBMS that supports logical data
independence. Application programs that reference the external
schema constructs must work as before, after the conceptual
schema undergoes a logical reorganization. Changes to constraints
can be applied also to the conceptual schema without affecting the
external schemas or application programs.
• Physical data independence is the capacity to change the
internal schema without having to change the conceptual (or
external) schemas. Changes to the internal schema may be needed
because some physical files had to be reorganized—for example, by
creating additional access structures—to improve the performance
of retrieval or update. If the same data as before remains in the
database, we should not have to change the conceptual schema.
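Logical data independence can be sketched in the same spirit. In this minimal example, a view plays the role of an external schema and a base table plays the role of the conceptual schema. The names `employee`, `employee_directory`, and `email` are assumptions chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Conceptual schema: the base table.
conn.execute("CREATE TABLE employee (id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.execute("INSERT INTO employee (name, salary) VALUES ('Ada', 1200.0)")

# External schema: a view that exposes names and hides salaries.
conn.execute("CREATE VIEW employee_directory AS SELECT id, name FROM employee")


def application_query(c):
    # The application refers only to the external schema.
    return c.execute("SELECT id, name FROM employee_directory ORDER BY id").fetchall()


before = application_query(conn)

# Expand the conceptual schema by adding a data item.
conn.execute("ALTER TABLE employee ADD COLUMN email TEXT")

after = application_query(conn)
view_columns = [d[0] for d in conn.execute("SELECT * FROM employee_directory").description]
print(before, after, view_columns)
```

The application query and the view definition are untouched by the change, which is the behaviour that logical data independence asks for.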

Types of Database System


Several criteria are normally used to classify DBMSs.
• The first is the data model on which the DBMS is based. The main
data model used in many current commercial DBMSs is the
relational data model. The object data model was
implemented in some commercial systems but has not had
widespread use. Many legacy (older) applications still run on database
systems based on the hierarchical and network data models.
The relational DBMSs are evolving continuously, and, in particular,
have been incorporating many of the concepts that were developed
in object databases. This has led to a new class of DBMSs called
object-relational DBMSs. We can hence categorize DBMSs based
on the data model: relational, object, object-relational,
hierarchical, network, and other.
• The second criterion used to classify DBMSs is the number of users
supported by the system. Single-user systems support only one
user at a time and are mostly used with personal computers.
Multiuser systems, which include the majority of DBMSs, support
multiple users concurrently.

• A third criterion is the number of sites over which the database is
distributed. A DBMS is centralized if the data is stored at a single
computer site. A centralized DBMS can support multiple users, but
the DBMS and the database themselves reside totally at a single
computer site. Below is a diagram of a centralized DBMS.

A distributed DBMS (DDBMS) can have the actual database and
DBMS software distributed over many sites, connected by a
computer network. A homogeneous DDBMS uses the same DBMS
software at every site (node), whereas a heterogeneous DDBMS
may use a different DBMS at each node. A recent trend
is to develop software to access several autonomous pre-existing
databases stored under heterogeneous DDBMSs. This leads to a
federated DBMS (or multidatabase system), in which the
participating DBMSs are loosely coupled and have a degree of local
autonomy. Many DBMSs use a client-server architecture. Below is a
diagram of a distributed DBMS.

Types of Database Users


End-users are the people whose jobs require access to the database for
querying, updating, and generating reports; the database primarily exists
for their use. The different types of end-users are:
• Casual end-users: access the database occasionally and need
different information each time.

• Naive or parametric end-users: include tellers, clerks, etc., and
make up a sizable portion of database end-users; their main job function
revolves around constantly querying and updating the database.
• Sophisticated end-users: include engineers, scientists, business
analysts, etc., who use the database for their complex requirements.
• Stand-alone users: maintain personal databases by using ready-
made program packages that provide easy-to-use menu-based or
graphics-based interfaces.
• Database Administrator (DBA): a person or a group of people
responsible for designing and managing the database system, i.e.
authorizing access to the database, monitoring its use, and
managing all the resources needed to support the use of the whole database
system. The responsibilities of the DBA and database designer are:
1. Planning for the database's future storage requirements.
2. Defining database availability and fault management
architecture.
3. Defining and creating environments for development
and new release installation.
4. Creating physical database storage structures after developers
have designed an application.
5. Constructing the database.
6. Determining and setting the size and physical locations of data files.
7. Evaluating new hardware and software purchases.
8. Providing database design and implementation.

1. FUNDAMENTAL OBJECTIVES/CHARACTERISTICS OF DATABASE


The database approach has some very characteristic features which are
discussed in detail below:
• Less redundancy: Data redundancy means unnecessary
duplication of data (the same data stored in more than one table).
File processing systems suffer from data redundancy, but in a DBMS
we can reduce it by means of the normalization process,
without affecting the original data. Normalization has two goals:
eliminating redundant data, an important
consideration for database designers, and
ensuring that data dependencies make sense (only storing related
data in a table). Both of these are worthy goals as they reduce the
amount of space a database consumes and ensure that data is
logically stored.

The same information may be duplicated in several places (files).


For example, the address and telephone number of a particular
customer may appear in a file that consists of savings-account
records and in a file that consists of checking-account records. This
redundancy leads to higher storage and access cost. In addition, it
may lead to data inconsistency; that is, the various copies of the
same data may no longer agree. For example, a changed customer
address may be reflected in savings-account records but not
elsewhere in the system.
• ACID Properties: ACID (an acronym for Atomicity Consistency
Isolation Durability) is a concept that database professionals
generally look for while evaluating relational databases and
application architectures. For a reliable database, all four of these
attributes should be achieved:
1. Atomicity is an all-or-none rule for database modifications.
2. Consistency guarantees that a transaction never leaves your
database in a half-finished state.
3. Isolation keeps transactions separated from each other until
they are finished.
4. Durability guarantees that the database will keep track of
pending changes in such a way that the server can recover
from an abnormal termination and committed transactions will
not be lost.
• Multiuser and Concurrent Access: A database system allows
several users to access the database concurrently. Answering
different questions from different users with the same (base) data
is a central aspect of an information system. Such concurrent use of
data increases the economy of a system. An example for concurrent

use is the flight database of an airline agency. The employees of
different branches can access the database concurrently and book
flights for their clients. Each travel agent sees on his interface if
there are still seats available for a specific flight or if it is already
fully booked.

Concurrency control is the process of managing simultaneous
operations against a database so that database integrity is not
compromised.
• Confidentiality/integrity: Data integrity is a byword for the
quality and the reliability of the data of a database system. In a
broader sense data integrity includes also the protection of the
database from unauthorised access (confidentiality) and
unauthorised changes. Data reflect facts of the real world.
• Data Persistence: Data persistence means that in a DBMS all data
is maintained as long as it is not deleted explicitly. The life span of
data needs to be determined directly or indirectly by the user and
must not be dependent on system features. Additionally, data once
stored in a database must not be lost. Changes of a database which
are done by a transaction are persistent. When a transaction is
finished even a system crash cannot put the data in danger.
• Transactions: A transaction is a bundle of actions which are done
within a database to bring it from one consistent state to a new
consistent state. In between, the data are inevitably inconsistent.
A transaction is atomic, which means that it cannot be divided up
any further. Within a transaction, either all or none of the actions are
carried out. Doing only a part of the actions would lead to an
inconsistent database state. One example of a transaction is the
transfer of an amount of money from one bank account to another.
The debit of the money from one account and the credit of it to
another account together make up a consistent transaction. This
transaction is also atomic. The debit or the credit alone would lead
to an inconsistent state. After finishing the transaction (debit and

credit), the changes to both accounts become persistent: the one
who gave the money now has less money in his account, while the
receiver now has a higher balance.
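The bank transfer described above can be sketched with Python's sqlite3 module. The `account` table, the balances, and the `transfer` helper are assumptions made for this example. A CHECK constraint forbids negative balances, so a transfer that would overdraw the source account fails and is rolled back as a whole, including the credit that had already been applied.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account ("
    " id INTEGER PRIMARY KEY,"
    " balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account (id, balance) VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(c, source, target, amount):
    """Credit the target and debit the source as one atomic transaction."""
    try:
        with c:  # commits if the block succeeds, rolls back if it raises
            c.execute("UPDATE account SET balance = balance + ? WHERE id = ?", (amount, target))
            c.execute("UPDATE account SET balance = balance - ? WHERE id = ?", (amount, source))
        return True
    except sqlite3.IntegrityError:
        # The debit violated the CHECK constraint; the credit was undone too.
        return False


def balances(c):
    return c.execute("SELECT id, balance FROM account ORDER BY id").fetchall()


ok = transfer(conn, 1, 2, 30)
after_ok = balances(conn)

failed = transfer(conn, 1, 2, 500)  # more than account 1 holds
after_failed = balances(conn)

print(ok, after_ok)
print(failed, after_failed)
```

After the failed transfer the balances are the same as before it started, so no money was created or lost by the half-finished work.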

2. DATA MODELS
Data Models
Underlying the structure of a database is the data model: a collection of
conceptual tools for describing data, data relationships, data semantics,
and consistency constraints. The data structures include the data
objects, the associations between data objects, and the rules which
govern operations on the objects. As the name implies, the data model
focuses on what data is required and how it should be organized rather
than what operations will be performed on the data. To use a common
analogy, the data model is equivalent to an architect's building plans.
A data model is independent of hardware or software constraints. Rather
than try to represent the data as a database would see it, the data model
focuses on representing the data as the user sees it in the "real world". It
serves as a bridge between the concepts that make up real-world events
and processes and the physical representation of those concepts in a
database. To illustrate the concept of a data model, we outline two data
models in this section: the entity-relationship model and the
relational model. Both provide a way to describe the design of a
database.
Methodology
There are two major methodologies used to create a data model:
• the Entity-Relationship (ER) approach and
• the Object Model.
This course uses the Entity-Relationship approach.

Data Modelling In the Context of Database Design


Database design is defined as: "design the logical and physical structure
of one or more databases to accommodate the information needs of the
users in an organization for a defined set of applications". The design
process roughly follows five steps:
1. Planning and analysis
2. Conceptual design
3. Logical design
4. Physical design

5. Implementation
Components of a Data Model
The data model gets its inputs from the planning and analysis stage.
Here the modeler, along with analysts, collects information about the
requirements of the database by reviewing existing documentation
and interviewing end-users.
The data model has two outputs. The first is an entity-relationship
diagram, which represents the data structures in a pictorial form. Because
the diagram is easily learned, it is a valuable tool to communicate the model
to the end-user. The second component is a data document (data dictionary). This is a
document that describes in detail the data objects, relationships, and rules
required by the database. The dictionary provides the detail required by
the database developer to construct the physical database.

A. Entity-Relationship Model
The ER model is a conceptual data model that views the real world as
entities and relationships. A basic component of the model is the Entity-
Relationship diagram, which is used to visually represent data objects.
The utility of the ER model is:
• It maps well to the relational model. The constructs used in the ER
model can easily be transformed into relational tables.
• It is simple and easy to understand with a minimum of training.
Therefore, the model can be used by the database designer to
communicate the design to the end user.
• In addition, the model can be used as a design plan by the database
developer to implement a data model in a specific database
management software.

Basic Constructs of Entity Relationship Modelling


The ER model views the real world as a construct of entities and
associations between entities.

(i) Entities
Entities are the principal data objects about which information is to be
collected. Entities are usually recognizable concepts, either concrete or
abstract, such as persons, places, things, or events, which have relevance
to the database. Some specific examples of entities are EMPLOYEES,
PROJECTS, INVOICES. An entity is analogous to a table in the relational
model. Entities are classified as independent or dependent (in some
methodologies, the terms used are strong and weak, respectively). An
independent entity is one that does not rely on another for
identification. A dependent entity is one that relies on another for
identification. An entity occurrence (also called an instance) is an
individual occurrence of an entity. An occurrence is analogous to a row in
the relational table. An entity type defines a collection of entities that
have the same attributes, while the set of all entities of the same type is
termed an entity set.

Customer is a strong entity type and the identifying entity for Order; Order is a weak
entity type, and cust-order is an identifying relationship.
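The vocabulary above (entity type, occurrence, entity set, independent and dependent entities) can be sketched with plain Python dataclasses. This is only an analogy, not a database: the attribute names `customer_id`, `order_no`, and `total`, and the sample values, are assumptions made for the example.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class Customer:
    """Entity type: an independent (strong) entity with its own identifier."""
    customer_id: int
    name: str


@dataclass(frozen=True)
class Order:
    """Entity type: a dependent (weak) entity identified through its customer."""
    customer_id: int  # borrowed from the identifying Customer
    order_no: int     # unique only among the orders of one customer
    total: float


# Entity occurrences (instances) collected into entity sets.
customers = {Customer(1, "Ada"), Customer(2, "Ben")}
orders = {Order(1, 1, 20.0), Order(2, 1, 35.0)}

# Both orders carry order_no 1, so that number alone identifies nothing;
# the pair (customer_id, order_no) does.
order_numbers = {o.order_no for o in orders}
order_identifiers = {(o.customer_id, o.order_no) for o in orders}

# The entity type fixes the attributes shared by all its occurrences.
customer_attributes = [f.name for f in fields(Customer)]
print(order_numbers, order_identifiers, customer_attributes)
```

In a relational design each entity type would become a table and each occurrence a row, as the text notes.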

Special Entity Types


• Associative entities (also known as intersection entities) are
entities used to associate two or more entities in order to reconcile
a many-to-many relationship.

An entity type that represents a relationship type


• Subtype entities are used in generalization hierarchies to
represent a subset of instances of their parent entity, called the
supertype, but which have attributes or relationships that apply only
to the subset.

Generalization Hierarchies
A generalization hierarchy is a form of abstraction that specifies that two
or more entities that share common attributes can be generalized into a
higher-level entity type called a supertype or generic entity. The lower-
level entities become the subtypes, or categories, of the supertype.
Subtypes are dependent entities.
Generalization occurs when two or more entities represent categories of
the same real-world object. For example, Wages_Employees and
Classified_Employees represent categories of the same entity, Employees.
In this example, Employees would be the supertype; Wages_Employees
and Classified_Employees would be the subtypes.
Subtypes can be either mutually exclusive (disjoint) or overlapping
(inclusive). A mutually exclusive category is when an entity instance can
be in only one category. The above example is a mutually exclusive
category. An employee can either be wages or classified but not both. An
overlapping category is when an entity instance may be in two or more
subtypes. An example would be a person who works for a university could
also be a student at that same university. The completeness constraint
requires that all instances of the subtype be represented in the supertype.
Generalization hierarchies can be nested. That is, a subtype of one
hierarchy can be a supertype of another. The level of nesting is limited
only by the constraint of simplicity. Subtype entities may be the parent
entity in a relationship but not the child.

Car and Truck share common attributes generalized into a higher-level Vehicle
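One common way to carry a generalization hierarchy into relational tables is to give each subtype table the same primary key as the supertype table. The sketch below applies this to the Employees example from the text; the column names (`category`, `hourly_rate`, `annual_salary`) are assumptions made for illustration, and other mappings are possible.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled
conn.executescript("""
CREATE TABLE employee (                -- supertype: the common attributes
    employee_id INTEGER PRIMARY KEY,
    name        TEXT NOT NULL,
    -- one category per employee models mutually exclusive subtypes
    category    TEXT NOT NULL CHECK (category IN ('wages', 'classified'))
);
CREATE TABLE wages_employee (          -- subtype: attributes of wage earners only
    employee_id INTEGER PRIMARY KEY REFERENCES employee (employee_id),
    hourly_rate REAL NOT NULL
);
CREATE TABLE classified_employee (     -- subtype: attributes of classified staff only
    employee_id   INTEGER PRIMARY KEY REFERENCES employee (employee_id),
    annual_salary REAL NOT NULL
);
""")

conn.execute("INSERT INTO employee VALUES (1, 'Ada', 'wages')")
conn.execute("INSERT INTO wages_employee VALUES (1, 12.5)")

# A subtype row with no matching supertype row is rejected.
try:
    conn.execute("INSERT INTO classified_employee VALUES (99, 30000.0)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

joined = conn.execute(
    "SELECT e.name, w.hourly_rate "
    "FROM employee e JOIN wages_employee w USING (employee_id)"
).fetchall()
print(orphan_rejected, joined)
```

The foreign key ensures that every subtype instance is represented in the supertype. This sketch does not check that the `category` value agrees with the subtype table that holds the row; a fuller design would need an extra rule for that.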
(ii) Relationships
A relationship represents an association between two or more entities.
Examples of relationships are:
• Employees are assigned to projects
• Projects have subtasks
• Departments manage one or more projects
Classifying Relationships
Relationships are classified by their degree, connectivity, cardinality,
direction, type, and existence. Not all modelling methodologies use all
these classifications.
• Degree of a Relationship: The degree of a relationship is the
number of entities associated with the relationship. The n-ary
relationship is the general form for degree n. Special cases are the
binary and ternary relationships, where the degree is 2 and 3, respectively.
Binary relationships, the association between two entities, are the
most common type in the real world. A recursive binary
relationship occurs when an entity is related to itself. An example
might be "some employees are married to other employees". A
ternary relationship involves three entities and is used when a
binary relationship is inadequate. Many modelling approaches
recognize only binary relationships. Ternary or n-ary relationships
are decomposed into two or more binary relationships.
• Connectivity and Cardinality: The connectivity of a relationship
describes the mapping of associated entity instances in the
relationship. The values of connectivity are "one" or "many". The
cardinality of a relationship is the actual number of related
occurrences for each of the two entities. The basic types of
connectivity for relationships are: one-to-one, one-to-many, and many-
to-many.
- A one-to-one (1:1) relationship is when at most one instance
of entity A is associated with one instance of entity B. For
example, "employees in the company are each assigned their
own office". That is, for each employee there exists a unique
office and for each office there exists a unique employee.
- A one-to-many (1:N) relationship is when for one instance
of entity A, there are zero, one, or many instances of entity B,
but for one instance of entity B, there is only one instance of
entity A. An example of a 1:N relationship is "a department has
many employees"; that is, each employee is assigned to one
department.
- A many-to-many (M:N) relationship is when for one instance
of entity A, there are zero, one, or many instances of entity B and

for one instance of entity B there are zero, one, or many instances
of entity A. An example is: "employees can be assigned to no
more than two projects at the same time; projects must have
at least three employees assigned". That is, a single employee can be
assigned to many projects; conversely, a single project can have
many employees assigned to it. Here the cardinality for the
relationship between employees and projects is two and the
cardinality between projects and employees is three. Many-to-
many relationships cannot be directly translated to relational
tables but instead must be transformed into two or more one-to-
many relationships using associative entities.

Frequently used cardinalities


• Direction: The direction of a relationship indicates the originating
entity of a binary relationship. The entity from which a relationship
originates is the parent entity; the entity where the relationship
terminates is the child entity. The direction of a relationship is
determined by its connectivity. In a one-to-one relationship the
direction is from the independent entity to a dependent entity. If
both entities are independent, the direction is arbitrary. With one-
to-many relationships, the entity occurring once is the parent. The
direction of many-to-many relationships is arbitrary.
• Type: An identifying relationship is one in which one of the child
entities is also a dependent entity. A non-identifying relationship
is one in which both entities are independent.
• Existence: Existence denotes whether the existence of an entity
instance is dependent upon the existence of another, related, entity
instance. The existence of an entity in a relationship is defined as
either mandatory (total participation) or optional (partial
participation). If an instance of an entity must always occur for an
entity to be included in a relationship, then it is mandatory. An
example of mandatory existence is the statement "every project
must be managed by a single department". If the instance of the
entity is not required, it is optional. An example of optional existence
is the statement, "employees may be assigned to work on projects".
Partial and total participation
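Two of the ideas above lend themselves to a short sketch: resolving a many-to-many relationship through an associative entity, and expressing mandatory versus optional existence. The table names (`department`, `employee`, `project`, `assignment`) and the sample rows are assumptions based on the examples in the text. The numeric cardinality limits (at most two projects, at least three employees) are not enforced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE department (dept_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE employee   (emp_id  INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE project (
    project_id INTEGER PRIMARY KEY,
    title      TEXT NOT NULL,
    -- mandatory existence: every project must be managed by a department
    dept_id    INTEGER NOT NULL REFERENCES department (dept_id)
);
-- Associative entity: splits the M:N relationship between employee and
-- project into two one-to-many relationships.
CREATE TABLE assignment (
    emp_id     INTEGER NOT NULL REFERENCES employee (emp_id),
    project_id INTEGER NOT NULL REFERENCES project (project_id),
    PRIMARY KEY (emp_id, project_id)
);
""")
conn.execute("INSERT INTO department VALUES (1, 'Research')")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Ada"), (2, "Ben"), (3, "Cleo")])
conn.executemany("INSERT INTO project VALUES (?, ?, ?)", [(10, "Payroll", 1), (20, "Website", 1)])
conn.executemany("INSERT INTO assignment VALUES (?, ?)", [(1, 10), (1, 20), (2, 10)])

# One employee, many projects.
projects_of_ada = [r[0] for r in conn.execute(
    "SELECT p.title FROM project p JOIN assignment a ON a.project_id = p.project_id "
    "WHERE a.emp_id = 1 ORDER BY p.title")]

# One project, many employees.
staff_of_payroll = [r[0] for r in conn.execute(
    "SELECT e.name FROM employee e JOIN assignment a ON a.emp_id = e.emp_id "
    "WHERE a.project_id = 10 ORDER BY e.name")]

# Optional existence: an employee may have no assignment at all.
unassigned = [r[0] for r in conn.execute(
    "SELECT name FROM employee WHERE emp_id NOT IN (SELECT emp_id FROM assignment)")]

# Mandatory existence: a project without a department is rejected.
try:
    conn.execute("INSERT INTO project VALUES (30, 'Orphan', NULL)")
    project_without_dept_rejected = False
except sqlite3.IntegrityError:
    project_without_dept_rejected = True

print(projects_of_ada, staff_of_payroll, unassigned, project_without_dept_rejected)
```

In this mapping, `department` is the parent of `project`, and both `employee` and `project` are parents of the associative entity `assignment`.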
(iii) Attributes
Attributes describe the entity with which they are associated. A particular
instance of an attribute is a value. For example, "John Peter" is one value
of the attribute Name. The domain of an attribute is the collection of
all possible values an attribute can have. The domain of Name is a
character string. Attributes can be classified as identifiers or
descriptors.
• Identifiers, more commonly called keys, uniquely identify an
instance of an entity.
• A descriptor describes a nonunique characteristic of an entity
instance.

From the ER diagram above, the entity customer has many attributes:
Email address is a key attribute, telephone number a multivalued
attribute, postal address a composite attribute, and the remaining are
atomic attributes.
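A possible relational mapping of these attribute kinds is sketched below. The key attribute becomes the primary key, the composite postal address is split into component columns, and the multivalued telephone number moves to its own table. The component names (`street`, `city`, `postal_code`) and all sample values are assumptions, since the original diagram is not reproduced here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
CREATE TABLE customer (
    email       TEXT PRIMARY KEY,   -- identifier (key attribute)
    name        TEXT NOT NULL,      -- descriptor: need not be unique
    street      TEXT,               -- composite postal address,
    city        TEXT,               --   stored as its component parts
    postal_code TEXT
);
CREATE TABLE customer_phone (       -- multivalued attribute: one row per number
    email TEXT NOT NULL REFERENCES customer (email),
    phone TEXT NOT NULL,
    PRIMARY KEY (email, phone)
);
""")
conn.execute("INSERT INTO customer VALUES ('ada@example.com', 'Ada', '1 Main St', 'Douala', '0000')")
conn.execute("INSERT INTO customer VALUES ('ada2@example.com', 'Ada', '2 Side St', 'Douala', '0000')")
conn.executemany(
    "INSERT INTO customer_phone VALUES (?, ?)",
    [("ada@example.com", "555-0100"), ("ada@example.com", "555-0101")],
)

# A second customer with the same identifier is rejected.
try:
    conn.execute("INSERT INTO customer VALUES ('ada@example.com', 'Eve', NULL, NULL, NULL)")
    duplicate_key_rejected = False
except sqlite3.IntegrityError:
    duplicate_key_rejected = True

# A descriptor value may repeat across occurrences.
same_name_count = conn.execute("SELECT COUNT(*) FROM customer WHERE name = 'Ada'").fetchone()[0]

phones = [r[0] for r in conn.execute(
    "SELECT phone FROM customer_phone WHERE email = 'ada@example.com' ORDER BY phone")]
print(duplicate_key_rejected, same_name_count, phones)
```

Two customers may share a name, but not an email address, which is what distinguishes a descriptor from an identifier.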
ER Notation
There is no standard for representing data objects in ER diagrams. Each
modelling methodology uses its own notation. All notational styles
represent entities as rectangular boxes and relationships as lines
connecting boxes. Each style uses a special set of symbols to represent
the cardinality of a connection. The symbols used for the basic ER
constructs in two common notations are:

(i) Martin Notation
• Entities are represented by labelled rectangles. The label is the
name of the entity. Entity names should be singular nouns.
• Relationships are represented by a solid line connecting two
entities. The name of the relationship is written above the line.
Relationship names should be verbs.
• Attributes, when included, are listed inside the entity
rectangle. Attributes which are identifiers are underlined.
Attribute names should be singular nouns.
• Cardinality of many is represented by a line ending in a crow's
foot. If the crow's foot is omitted, the cardinality is one.
• Existence is represented by placing a circle or a perpendicular
bar on the line. Mandatory existence is shown by the bar (looks like
a 1) next to the entity for which an instance is required. Optional
existence is shown by placing a circle next to the entity that is
optional.

(ii) Chen Notation


• Rectangles represent entity types
• Ellipses represent attributes
• Diamonds represent relationship types
• Lines link attributes to entity types and entity types to relationship types
• Double lines represent a total participation of that entity in the
relationship, while a single line represents a partial participation
• Primary key attributes are underlined
• Double Ellipses represent multi-valued attributes

Summary of ER Diagram symbols and meanings
Data Modelling As Part of Database Design
The data model is one part of the conceptual design process. The other
is typically the functional model. The data model focuses on what data
should be stored in the database while the functional model deals with
how the data is processed. To put this in the context of the relational
database, the data model is used to design the relational tables. The
functional model is used to design the queries which will access and perform
operations on those tables.
Data modelling is preceded by planning and analysis. The effort devoted
to this stage is proportional to the scope of the database. The planning
and analysis of a database intended to serve the needs of an enterprise
will require more effort than one intended to serve a small workgroup.
The information needed to build a data model is gathered during the
requirements analysis. Although not formally considered part of the data
modelling stage by some methodologies, in reality the requirements
analysis and the ER diagramming part of the data model are done at the
same time.
Requirements Analysis
The goals of the requirements analysis are:
• To determine the data requirements of the database in terms of
primitive objects
• To classify and describe the information about these objects
• To identify and classify the relationships among the objects
• To determine the types of transactions that will be executed on the
database and the interactions between the data and the transactions
• To identify rules governing the integrity of the data
The modeler, or modelers, works with the end users of an organization to
determine the data requirements of the database. Information needed for
the requirements analysis can be gathered in several ways:
• Review of existing documents: such documents include existing
forms and reports, written guidelines, job descriptions, personal
narratives, and memoranda. Paper documentation is a good way to
become familiar with the organization or activity you need to model.
• Interviews with end users: these can be a combination of
individual or group meetings. Try to keep group sessions to under
five or six people. If possible, try to have everyone with the same
function in one meeting. Use a blackboard, flip charts, or overhead
transparencies to record information gathered from the interviews.
• Review of existing automated systems: if the organization
already has an automated system, review the system design
specifications and documentation.

The requirements analysis is usually done at the same time as the data
modelling. As information is collected, data objects are identified and
classified as entities, attributes, or relationships; assigned names;
and defined using terms familiar to the end-users. The objects are then
modelled and analysed using an ER diagram. The diagram can be
reviewed by the modeler and the end-users to determine its completeness
and accuracy. If the model is not correct, it is modified, which sometimes
requires additional information to be collected. The review and edit cycle
continues until the model is certified as correct. Three points to keep in
mind during the requirements analysis are:
1. Talk to the end users about their data in "real-world" terms. Users
do not think in terms of entities, attributes, and relationships but
about the actual people, things, and activities they deal with daily.
2. Take the time to learn the basics about the organization and its
activities that you want to model. Having an understanding about
the processes will make it easier to build the model.
3. End-users typically think about and view data in different ways
according to their function within an organization. Therefore, it is
important to interview the largest number of people that time
permits.

Steps In Building the Data Model


While the ER model lists and defines the constructs required to build a data
model, there is no standard process for doing so. Some methodologies,
such as IDEF1X, specify a bottom-up development process where the model
is built in stages. Typically, the entities and relationships are modelled first,
followed by key attributes, and then the model is finished by adding
non-key attributes. Other experts argue that in practice, using a phased
approach is impractical because it requires too many meetings with the
end-users. The sequence used for this document is:
1. Identification of data objects and relationships
2. Drafting the initial ER diagram with entities and relationships
3. Refining the ER diagram
4. Adding key attributes to the diagram
5. Adding non-key attributes
6. Diagramming Generalization Hierarchies
7. Validating the model through normalization

8. Adding business and integrity rules to the Model
In practice, model building is not a strict linear process. As noted above,
the requirements analysis and the draft of the initial ER diagram often
occur simultaneously. Refining and validating the diagram may uncover problems
or missing information which require more information gathering and analysis.

Identifying Data Objects and Relationships


In order to begin constructing the basic model, the modeler must analyse
the information gathered during the requirements analysis for the purpose of:
• Classifying data objects as either entities or attributes
• Identifying and defining relationships between entities
• Naming and defining identified entities, attributes, and relationships
• Documenting this information in the data document
To accomplish these goals the modeler must analyse narratives from
users, notes from meetings, policy and procedure documents, and, if lucky,
design documents from the current information system.
Although it is easy to define the basic constructs of the ER model, it is not
an easy task to distinguish their roles in building the data model. What
makes an object an entity or attribute? For example, given the statement
"employees work on projects", should employees be classified as an entity
or an attribute? Very often, the correct answer depends upon the
requirements of the database. In some cases, employee would be an
entity; in others it would be an attribute. While the definitions of the
constructs in the ER Model are simple, the model does not address the
fundamental issue of how to identify them. Some commonly given
guidelines are:
• Entities contain descriptive information
• Attributes either identify or describe entities
• Relationships are associations between entities
These guidelines are discussed in more detail below.

(i) Entities:
There are various definitions of an entity:
(a) "Any distinguishable person, place, thing, event, or concept, about
which information is kept".
(b) "A thing which can be distinctly identified".
(c) "Any distinguishable object that is to be represented in a database".

(d) "...anything about which we store information (e.g. supplier,
machine tool, employee, utility pole, airline seat, etc.). For each
entity type, certain attributes are stored".
These definitions contain common themes about entities:
• An entity is a "thing", "concept", or "object". However, entities can
sometimes represent the relationships between two or more objects.
This type of entity is known as an associative entity.
• Entities are objects which contain descriptive information. If a data
object you have identified is described by other objects, then it is an
entity. If there is no descriptive information associated with the item,
it is not an entity. Whether or not a data object is an entity may
depend upon the organization or activity being modelled.
• An entity represents many things which share properties. They are
not single things. For example, King Lear and Hamlet are both plays
which share common attributes such as name, author, and cast of
characters. The entity describing these things would be PLAY, with
King Lear and Hamlet being instances of the entity.
• Entities which share common properties are candidates for being
converted to generalization hierarchies.
• Entities should not be used to distinguish between time periods. For
example, the entities 1st Quarter Profits, 2nd Quarter Profits, etc.
should be collapsed into a single entity called Profits. An attribute
specifying the time period would be used to categorize by time.
• Not everything the users want to collect information about will be
an entity. A complex concept may require more than one entity to
represent it. Other "things" users think important may not be entities.
(ii) Attributes:
Attributes are data objects that either identify or describe entities.
Attributes that identify entities are called key attributes. Attributes that
describe an entity are called non-key attributes. The process for identifying
attributes is similar to that for identifying entities, except that now you
look for and extract those names that appear to be descriptive noun phrases.
Validating Attributes
Attribute values should be atomic, that is, present a single fact. Having
disaggregated data allows simpler programming, greater reusability of
data, and easier implementation of changes. Normalization also depends

upon the "single fact" rule being followed. Common types of violations
include:
• Simple aggregation: a common example is Person Name which
concatenates first name, middle initial, and last name. Another is
Address which concatenates street address, city, and zip code.
When dealing with such attributes, you need to find out if there are
good reasons for decomposing them. For example, do the end users
want to use the person's first name in a form letter? Do they want
to sort by zip code?
• Complex codes: these are attributes whose values are codes
composed of concatenated pieces of information. An example is the
code attached to automobiles and trucks. The code represents
different pieces of information about the vehicle. Unless part of an
industry standard, these codes have no meaning to the end user.
They are very difficult to process and update.
• Text blocks: these are free-form text fields. While they have a
legitimate use, an over reliance on them may indicate that some
data requirements are not met by the model.
• Mixed domains: this is where a value of an attribute can have
different meanings under different conditions.
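The "simple aggregation" violation above can be illustrated by decomposing an aggregated Person Name into atomic parts. This is a minimal Python sketch under a strong assumption: names follow the layout "First M. Last" or "First Last". The names PersonName and split_person_name are hypothetical, and real names need more careful handling.

```python
from typing import NamedTuple


class PersonName(NamedTuple):
    first_name: str
    middle_initial: str
    last_name: str


def split_person_name(full_name: str) -> PersonName:
    """Naively split 'First M. Last' or 'First Last' into atomic parts."""
    parts = full_name.split()
    if len(parts) == 2:
        return PersonName(parts[0], "", parts[1])
    if len(parts) == 3:
        return PersonName(parts[0], parts[1].rstrip("."), parts[2])
    raise ValueError(f"unexpected name layout: {full_name!r}")


people = [split_person_name("Mary Smith"), split_person_name("John A. Peter")]
# With atomic attributes, sorting by last name needs no string parsing.
by_last_name = sorted(people, key=lambda p: p.last_name)
```

Once the parts are stored separately, requirements such as "use the first name in a form letter" or "sort by last name" need no further processing of the value.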
Derived Attributes and Code Values
Two areas where data modelling experts disagree are whether derived
attributes and attributes whose values are codes should be permitted in
the data model. Derived attributes are those created by a formula or
by a summary operation on other attributes called stored attributes.
Arguments against including derived data are based on the premise that
derived data should not be stored in a database and therefore should not
be included in the data model. The arguments in favour are:
• Derived data is often important to both managers and users and
therefore should be included in the data model
• It is just as important, perhaps more so, to document derived
attributes just as you would other attributes
• Including derived attributes in the data model does not imply how
they will be implemented
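The difference between stored and derived attributes can be sketched in Python as follows. The entity OrderLine and its attributes are hypothetical examples, not part of the course material. The derived values are computed on demand, which is only one of several possible implementations.

```python
from dataclasses import dataclass


@dataclass
class OrderLine:
    quantity: int       # stored attribute
    unit_price: float   # stored attribute

    @property
    def line_total(self) -> float:
        """Derived attribute: created by a formula on stored attributes."""
        return self.quantity * self.unit_price


lines = [OrderLine(2, 5.0), OrderLine(1, 12.5)]
# A derived attribute can also come from a summary operation over instances.
order_total = sum(line.line_total for line in lines)
```

Documenting line_total and order_total in the data model says nothing about whether they are later stored or recomputed.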
A coded value uses one or more letters or numbers to represent a fact.
For example, the attribute Gender might use the letters "M" and "F" as values
rather than "Male" and "Female". Those who are against this practice argue

that codes have no intuitive meaning to the end-users and add complexity
to processing data. Those in favour argue that many organizations have a
long history of using coded attributes, that codes save space, and improve
flexibility in that values can be easily added or modified by means of look-
up tables.
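A look-up table for coded values might be sketched as follows, using Python's built-in sqlite3 module. The table and column names (gender_code, person, description) are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE gender_code (
        code        TEXT PRIMARY KEY,   -- coded value, e.g. 'M'
        description TEXT NOT NULL       -- meaning shown to end users
    );
    CREATE TABLE person (
        person_id INTEGER PRIMARY KEY,
        name      TEXT NOT NULL,
        gender    TEXT REFERENCES gender_code(code)
    );
    INSERT INTO gender_code VALUES ('M', 'Male');
    INSERT INTO gender_code VALUES ('F', 'Female');
    INSERT INTO person VALUES (1, 'John Peter', 'M');
""")

# Joining against the look-up table turns the code back into readable text.
rows = conn.execute("""
    SELECT p.name, g.description
    FROM person AS p JOIN gender_code AS g ON g.code = p.gender
""").fetchall()
```

Adding or renaming a code then means changing one row of gender_code rather than every row of person.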
(iii) Relationships:
Relationships are associations between entities. Typically, a relationship is
indicated by a verb connecting two or more entities. For example:
“employees are assigned to projects”.
As relationships are identified they should be classified in terms of
cardinality, optionality, direction, and dependence. As a result of defining
the relationships, some relationships may be dropped and new
relationships added. Cardinality quantifies the relationships between
entities by measuring how many instances of one entity are related to a
single instance of another. To determine the cardinality, assume the
existence of an instance of one of the entities. Then determine how many
specific instances of the second entity could be related to the first. Repeat
this analysis reversing the entities. For example: Employees may be
assigned to no more than three projects at a time; every project has at
least two employees assigned to it.
Here the maximum cardinality of the relationship from employees to projects
is three; from projects to employees, the minimum cardinality is two. Because
an instance on each side can be related to more than one instance on the
other, this relationship can be classified as a many-to-many relationship.
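The procedure for determining cardinality (fix one instance, count the related instances, then reverse the entities) can be sketched on sample data. The assignment pairs and the function name cardinality_range below are hypothetical. In practice, cardinality comes from the business rules, not from whatever data happens to exist.

```python
from collections import Counter

# Hypothetical assignment data: (employee, project) pairs.
assignments = [
    ("Ann", "P1"), ("Ann", "P2"), ("Ann", "P3"),
    ("Bob", "P1"), ("Bob", "P2"),
    ("Eve", "P3"),
]


def cardinality_range(pairs, position):
    """Return (min, max) related instances per instance at `position`.

    Only instances that appear in at least one pair are counted, so a
    minimum of zero (an optional relationship) cannot be detected here.
    """
    counts = Counter(pair[position] for pair in pairs)
    return min(counts.values()), max(counts.values())


projects_per_employee = cardinality_range(assignments, 0)
employees_per_project = cardinality_range(assignments, 1)
```

In this sample both maxima are greater than one, which is what makes the relationship many-to-many.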
If a relationship can have a cardinality of zero, it is an optional
relationship. If it must have a cardinality of at least one, the
relationship is mandatory. Optional relationships are typically indicated
by conditional wording such as may. For example: An employee may be
assigned to a project. Mandatory relationships, on the other hand, are
indicated by words such as must have. For example: A student must
register for at least three courses each semester.
Mandatory and optional relationships are also known as participation
constraints:
• Total: The participation of an entity set E in a relationship set R is
said to be total if every entity in E participates in at least one
relationship in R.
• Partial: if only some entities in E participate in relationships in R,
the participation of entity set E in relationship R is said to be partial.
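The two definitions above amount to a simple set test. This is a minimal Python sketch. The function name participation and the sample data are assumptions for illustration.

```python
def participation(entity_set, relationship_pairs, position):
    """Classify participation of an entity set as 'total' or 'partial'."""
    participating = {pair[position] for pair in relationship_pairs}
    return "total" if set(entity_set) <= participating else "partial"


# Hypothetical data: Eve is not assigned to any project.
employees = {"Ann", "Bob", "Eve"}
projects = {"P1", "P2"}
works_on = {("Ann", "P1"), ("Bob", "P1"), ("Bob", "P2")}

employee_participation = participation(employees, works_on, 0)
project_participation = participation(projects, works_on, 1)
```

Here every project has at least one employee (total participation), while one employee has no project (partial participation).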

In the case of the specific relationship forms (1:1 and 1:M), there is always
a parent entity and a child entity. In one-to-many relationships, the
parent is always the entity with the cardinality of one. In one-to-one
relationships, the choice of the parent entity must be made in the context
of the business being modelled. If a decision cannot be made, the choice
is arbitrary.

Naming Data Objects


The names should have the following properties:
• Unique
• Have meaning to the end-user
• Contain the minimum number of words needed to uniquely and
accurately describe the object
For entities and attributes, names are singular nouns while relationship
names are typically verbs. Some authors advise against using
abbreviations or acronyms because they might lead to confusion about
what they mean. Others believe that using abbreviations or acronyms is
acceptable provided that they are universally used and understood within
the organization. You should also take care to identify and resolve
synonyms for entities and attributes. Synonyms can arise in large projects
where different departments use different terms for the same thing.

Recording Information in Design Document


The design document records detailed information about each object
used in the model. As you name, define, and describe objects, this
information should be placed in this document. If you are not using an
automated design tool, the document can be done on paper or with a
word processor. There is no standard for the organization of this document
but the document should include information about names, definitions,
and, for attributes, domains.
Two documents used in the IDEF1X method of modelling are useful for
keeping track of objects. These are the ENTITY-ENTITY matrix and the
ENTITY-ATTRIBUTE matrix. The ENTITY-ENTITY matrix is a
two-dimensional array for indicating relationships between entities. The names
of all identified entities are listed along both axes. As relationships are first
identified, an "X" is placed in the cell where the row of one entity meets the
column of the other to indicate a possible relationship between the entities involved.

As the relationship is further classified, the "X" is replaced with the
notation indicating cardinality. The ENTITY-ATTRIBUTE matrix is used
to indicate the assignment of attributes to entities. It is similar in form to
the ENTITY-ENTITY matrix except attribute names are listed on the rows.
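An ENTITY-ENTITY matrix of the kind described above can be sketched in Python as a nested dictionary. The entity names and the two relationships marked below are illustrative assumptions. IDEF1X does not prescribe any particular data structure.

```python
entities = ["DEPARTMENT", "EMPLOYEE", "PROJECT"]

# Relationships identified so far (pairs are illustrative).
relationships = [("DEPARTMENT", "PROJECT"), ("EMPLOYEE", "PROJECT")]

# Start with an empty matrix, then mark each possible relationship with "X".
matrix = {row: {col: "" for col in entities} for row in entities}
for a, b in relationships:
    matrix[a][b] = "X"
    matrix[b][a] = "X"   # the same relationship seen from the other entity

# Print the matrix with entity names along both axes.
width = max(len(name) for name in entities) + 2
print(" " * width + "".join(name.ljust(width) for name in entities))
for row in entities:
    cells = "".join(matrix[row][col].ljust(width) for col in entities)
    print(row.ljust(width) + cells)
```

As a relationship is classified further, the "X" in its cell would be replaced by the chosen cardinality notation.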

Developing the Basic Schema


Once entities and relationships have been identified and defined, the first
draft of the entity relationship diagram can be created. This section
introduces the ER diagram by demonstrating how to diagram binary
relationships. Recursive relationships are also shown.
Binary Relationships
The Figure shows examples of how to diagram one-to-one, one-to-many,
and many-to-many relationships.

• One-To-One: Figure A shows an example of a one-to-one diagram.
Reading the diagram from left to right represents the relationship
"every employee is assigned a workstation". Because every employee
must have a workstation, the symbol for mandatory existence (in
this case the crossbar) is placed next to the WORKSTATION entity.
Reading from right to left, the diagram shows that not all
workstations are assigned to employees. This condition may reflect
that some workstations are kept for spares or for loans. Therefore,
we use the symbol for optional existence, the circle, next to
EMPLOYEE. The cardinality and existence of a relationship must be
derived from the "business rules" of the organization. For example,
if all workstations owned by an organization were assigned to
employees, then the circle would be replaced by a crossbar to
indicate mandatory existence.
• One-To-Many: Figure B shows an example of a one-to-many
relationship between DEPARTMENT and PROJECT. In this diagram,
DEPARTMENT is considered the parent entity while PROJECT is the
child. Reading from left to right, the diagram represents the rule that
departments may be responsible for many projects. The optionality
of the relationship reflects the "business rule" that not all
departments in the organization will be responsible for managing

projects. Reading from right to left, the diagram tells us that every
project must be the responsibility of exactly one department.
• Many-To-Many: Figure C shows a many-to-many relationship
between EMPLOYEE and PROJECT. An employee may be assigned
to many projects; each project must have many employees. Note
that the association between EMPLOYEE and PROJECT is optional
because, at a given time, an employee may not be assigned to a
project. However, the relationship between PROJECT and
EMPLOYEE is mandatory because a project must have at least two
employees assigned. Many-To-Many relationships can be used in the
initial drafting of the model but eventually must be transformed into
two one-to-many relationships. The transformation is required
because many-to-many relationships cannot be represented directly
in the relational model. The process for resolving many-to-many
relationships is discussed in the next section.
Recursive relationships
A recursive relationship is one in which an entity is associated with itself. The Figure
below shows an example of the recursive relationship. An employee may
manage many employees and each employee is managed by one.
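The "manages" relationship can be sketched in Python with a mapping from each employee to that employee's manager. The names and data are hypothetical. Treating the top of the hierarchy as having no manager (None) is an assumption of this sketch.

```python
# Hypothetical data: each employee maps to the one employee who manages them.
managed_by = {"Bob": "Ann", "Eve": "Ann", "Ann": None}  # Ann has no manager


def direct_reports(manager):
    """Return the employees managed by `manager` (the 'many' side)."""
    return sorted(e for e, m in managed_by.items() if m == manager)


reports_of_ann = direct_reports("Ann")
```

Both sides of the relationship refer to the same entity, EMPLOYEE, which is what makes it recursive.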

B. Relational Model
The logical model is also called a Relational Model. The elements of a
database are Tables, Queries ...
• A Table: is a collection of data about a specific topic such as
products, students or suppliers. A table organizes data into columns
(fields) and rows (records or tuples).
• A field: is a piece of information about a subject. Each field is
arranged as a column in a table.
• A record: is complete information about a subject. A record is a
collection of fields and is presented as a row in a table of the database.

Relational Model Concepts
We shall represent a relation as a table with columns and rows. Each
column of the table has a name, or attribute. Each row is called a tuple.
• Domain: a set of atomic values that an attribute can take
• Attribute: name of a column in a particular table (all data is stored
in tables). Each attribute Ai must have a domain, dom(Ai).
• Relational Schema: The design of one table, containing the name
of the table (i.e. the name of the relation), and the names of all the
columns, or attributes. Example: STUDENT( Name, SID, Age, GPA)
• Degree of a Relation: the number of attributes in the relation's
schema.
• Tuple, t, of R(A1, A2, A3, …, An): an ORDERED set of values,
<v1, v2, v3, …, vn>, where each vi is a value from dom(Ai).
• Relation Instance, r(R): a set of tuples; thus,
r(R) = {t1, t2, t3, …, tm}

NOTES:
• The tuples in an instance of a relation are not considered to be
ordered; putting the rows in a different sequence does not change
the table.
• Once the schema, R( A1, A2, A3, …, An) is defined, the values, vi,
in each tuple, t, must be ordered as t = <v1, v2, v3, …, vn>
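These concepts map closely onto Python's tuples and sets. The sketch below uses the STUDENT schema from the example above. The sample rows are invented for illustration.

```python
from typing import NamedTuple


class Student(NamedTuple):
    """Relational schema STUDENT(Name, SID, Age, GPA)."""
    name: str
    sid: int
    age: int
    gpa: float


# A relation instance is a set of tuples: no duplicates, no row order.
r1 = {Student("Ann", 1, 20, 3.5), Student("Bob", 2, 22, 3.1)}
r2 = {Student("Bob", 2, 22, 3.1), Student("Ann", 1, 20, 3.5)}

degree = len(Student._fields)   # number of attributes in the schema
same_relation = (r1 == r2)      # a different row sequence is the same relation
```

Within each tuple the values stay in schema order, while the tuples themselves form an unordered set.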
Properties of relations
Properties of database relations are:
• relation name is distinct from all other relations
• each cell of relation contains exactly one atomic (single) value
• each attribute has a distinct name
• values of an attribute are all from the same domain
• order of attributes has no significance
• each tuple is distinct, there are no duplicate tuples
• order of tuples has no significance, theoretically.

Relational keys
There are two kinds of keys in relations. The first are identifying keys: the
primary key is the main concept, while two other keys – super key and
candidate key– are related concepts. The second kind is the foreign key.
Identity Keys
• Super Keys: A super key is a set of attributes whose values can be
used to uniquely identify a tuple within a relation. A relation may
have more than one super key, but it always has at least one: the
set of all attributes that make up the relation.
• Candidate Keys: A candidate key is a super key that is minimal;
that is, there is no proper subset that is itself a super key. A relation
may have more than one candidate key, and the different candidate
keys may have a different number of attributes. In other words, you
should not interpret 'minimal' to mean the super key with the fewest
attributes. A candidate key has two properties:
(i) in each tuple of R, the values of K uniquely identify that tuple
(uniqueness)
(ii) no proper subset of K has the uniqueness property
(irreducibility).
• Primary Key: The primary key of a relation is a candidate key
especially selected to be the key for the relation. In other words, it
is a choice, and there can be only one candidate key designated to
be the primary key.
Relationship between identity keys
The relationship between keys:
Super key ⊇ Candidate Key ⊇ Primary Key
• Foreign keys (FKs): The attribute(s) within one relation that
match a candidate key of another relation. A relation may have several
foreign keys, associated with different target relations. Foreign keys allow users
to link information in one relation to information in another relation. Without
FKs, a database would be a collection of unrelated tables.
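The definitions of super key and candidate key can be explored on sample data. The sketch below is illustrative only, with hypothetical attribute names and rows. Keys are properties of the schema, so a single instance can show that an attribute set is not a key but can never prove that it is one.

```python
from itertools import combinations

attributes = ("State", "LicensePlateNo", "VehicleID", "Model")
# Hypothetical rows, chosen so that plate numbers repeat across states.
rows = [
    ("TX", "ABC-123", "V1", "Sedan"),
    ("CA", "ABC-123", "V2", "Sedan"),
    ("TX", "XYZ-789", "V3", "Sedan"),
]


def is_unique(attr_set):
    """True if no two rows agree on every attribute in attr_set."""
    idx = [attributes.index(a) for a in attr_set]
    projected = [tuple(row[i] for i in idx) for row in rows]
    return len(set(projected)) == len(projected)


# Every attribute set with the uniqueness property is a super key.
super_keys = [
    set(combo)
    for size in range(1, len(attributes) + 1)
    for combo in combinations(attributes, size)
    if is_unique(combo)
]
# A candidate key is a super key with no proper subset that is a super key.
candidate_keys = [k for k in super_keys if not any(s < k for s in super_keys)]
```

For this sample the candidate keys have different sizes ({VehicleID} and {State, LicensePlateNo}), which shows why "minimal" does not mean "fewest attributes".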

Relational Model Constraints
Integrity Constraints
Each relational schema must satisfy the following four types of constraints.
• Domain constraints: The value of each attribute Ai must be an atomic
value from dom(Ai) for that attribute. The attribute Name in the example
is a bad design, because sometimes we may want to search for a
person using only their last name.
• Key Constraints:
Super key of R (SK): A set of attributes, SK, of R such that no two
tuples in any valid relational instance, r(R), will have the same value
for SK. Therefore, for any two distinct tuples, t1 and t2 in r(R),
t1[SK] != t2[SK].
Key of R: A minimal super key. That is, a super key, K, of R such
that the removal of ANY attribute from K will result in a set of
attributes that is not a super key. Example:
CAR(State, License-Plate-No, VehicleID, Model, Year, Manufacturer)
This schema has two keys:
K1 = {State, License-Plate-No} and K2 = {VehicleID}
Both K1 and K2 are super keys. K3 = {VehicleID, Manufacturer} is
a super key, but not a key (Why?). If a relation has more than one
key, we can select any one (arbitrarily) to be the primary key.
Primary Key attributes are underlined in the schema:
CAR(State, License-Plate-No, VehicleID, Model, Year, Manufacturer)
• Entity Integrity Constraints: The primary key attribute, PK, of
any relational schema R in a database cannot have null values in
any tuple. In other words, for each table in a DB, there must be a
key; for each key, every row in the table must have non-null
values. This is because PK is used to uniquely identify the individual
tuples. Mathematically, t[PK] != NULL for any tuple t ∈ r(R).
• Referential Integrity Constraints: are used to specify the
relationships between two relations in a database. Consider a
referencing relation, R1, and a referenced relation, R2. Tuples in the
referencing relation, R1, have attributes FK (called foreign key
attributes) that reference the primary key attributes of the
referenced relation, R2. A tuple, t1, in R1 is said to reference a tuple,
t2, in R2 if t1[FK] = t2[PK]. A referential integrity constraint can be
displayed in a relational database schema as a directed arc from the

referencing (foreign) key to the referenced (primary) key. Examples
are shown in the figures below:

The last record of the table Enrolled shows a violation of referential integrity.
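A referential integrity violation of this kind can be reproduced with Python's built-in sqlite3 module. The table layouts and sample rows below are assumptions, not the contents of the figures. Note that SQLite enforces foreign keys only after PRAGMA foreign_keys = ON is issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks FKs only when enabled
conn.executescript("""
    CREATE TABLE student (
        sid  INTEGER PRIMARY KEY,
        name TEXT NOT NULL
    );
    CREATE TABLE enrolled (
        sid    INTEGER NOT NULL REFERENCES student(sid),
        course TEXT NOT NULL,
        PRIMARY KEY (sid, course)
    );
    INSERT INTO student VALUES (1, 'Ann');
    INSERT INTO enrolled VALUES (1, 'Databases');
""")

try:
    # No student has sid 99, so this row would break referential integrity.
    conn.execute("INSERT INTO enrolled VALUES (99, 'Databases')")
    violation_detected = False
except sqlite3.IntegrityError:
    violation_detected = True
```

The DBMS rejects the offending row, so every sid in enrolled continues to match a primary key value in student.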

Entity Relationship -to- Relational Mapping


Now we are ready to lay down some informal methods to help us create
the Relational schemas from our ER models. These will be described in the
following steps:

1. For each regular entity, E, in the ER model, create a relation R that
includes all the simple attributes of E. Select the primary key for E,
and mark it.

2. For each weak entity type, W, in the ER model, with the Owner entity
type, E, create a relation R with all attributes of W as attributes of
R, plus the primary key of E. [Note: if there are identical tuples in
W which share the same owner tuple, then we need to create an
additional index attribute in W.]
3. For each binary relationship type, R, in the ER model, identify the
participating entity types, S and T.
• For 1:1 relationship between S and T, choose one relation, say S.
Include the primary key of T as a foreign key of S.
• For 1:N relationship between S and T, Let S be the entity on the
N side of the relationship. Include the primary key of T as a foreign
key in S.

• For M:N relationship between S and T, create a new relation, P, to
represent R. Include the primary keys of both S and T as foreign
keys of P.

4. For each multi-valued attribute, A, create a new relation, R, that
includes all attributes corresponding to A, plus the primary key
attribute, K, of the relation that represents the entity
type/relationship type that has A as an attribute.
5. For each n-ary relationship type, n > 2, create a new relation S.
Include as foreign key attributes in S the primary keys of the
relations representing each of the participating entity types. Also

include any simple attributes of the n-ary relationship type as
attributes of S.
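Steps 1 and 3 above can be sketched with Python's built-in sqlite3 module. The entities (department, employee, project), the relation name assignment and all column names are illustrative assumptions, not the University ER diagram.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Step 1: one relation per regular entity, with its primary key.
    CREATE TABLE department (
        department_id INTEGER PRIMARY KEY,
        name          TEXT NOT NULL
    );
    CREATE TABLE employee (
        employee_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    );
    -- Step 3, 1:N: the relation on the N side (project) receives the
    -- primary key of the 1 side (department) as a foreign key.
    CREATE TABLE project (
        project_id    INTEGER PRIMARY KEY,
        title         TEXT NOT NULL,
        department_id INTEGER NOT NULL REFERENCES department(department_id)
    );
    -- Step 3, M:N: a new relation holds the primary keys of both sides.
    CREATE TABLE assignment (
        employee_id INTEGER NOT NULL REFERENCES employee(employee_id),
        project_id  INTEGER NOT NULL REFERENCES project(project_id),
        PRIMARY KEY (employee_id, project_id)
    );
""")

tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
```

The many-to-many relationship between employee and project becomes two one-to-many relationships, each ending at the new assignment relation.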
Translation of the University ER Diagram to the Relational model

3. NORMALIZATION
• A Formal process of decomposing relations with anomalies to
produce smaller, well-structured and stable relations.
• Its objectives are to validate and improve a logical design so that it
satisfies certain constraints that avoid unnecessary duplication
of data.
Well-Structured Relations
• A relation that contains minimal data redundancy and allows users
to insert, delete, and update rows without causing data
inconsistencies.
• Goal is to avoid (minimize) anomalies
– Insertion Anomaly: adding new rows forces user to create
duplicate data
– Deletion Anomaly: deleting a row may cause loss of other data
representing completely different facts
– Modification Anomaly: changing data in a row forces changes
to other rows because of duplication

Pitfalls We Wish to Avoid

(i) Redundancy Versus Loss of Data
When designing our schema, we want to do so in such a way that we
minimize redundancy of data without losing any data. By redundancy, I
mean data that is repeated in different rows of a table or in different tables
in the database. Imagine that rather than having an employee table and
a department table, we have a single table called employeeDepartment.
We can accomplish this by adding a single departmentName column to
the employee table so that the schema looks like this:
employeeDepartment(employeeID, name, job, departmentID, departmentName)
For each employee who works in the Department with the number 128,
Research and Development, we will repeat the data "128, Research and
Development." This will be the same for each department in the company.
This schema design leads to redundantly storing the department name
over and over. We can change this design as shown here:
employee(employeeID, name, job, departmentID)
department(departmentID, name)
In this case, each department name is stored in the database only once,
rather than many times, minimizing storage space and avoiding
some problems. Note that we must leave the departmentID in the
employee table; otherwise, we lose information from the schema, and in
this case, we would lose the link between an employee and the
department the employee works for. In improving the schema, we must
always bear these twin goals in mind—that is, reducing repetition of
data without losing any information.
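The decomposition described above can be sketched in Python. This is only an illustration: the rows, names and values are invented, and the relations are modelled as plain sets of tuples rather than real database tables.

```python
# Hypothetical rows of
# employeeDepartment(employeeID, name, job, departmentID, departmentName)
employee_department = [
    (7513, "Nora Edwards", "Programmer", 128, "Research and Development"),
    (6651, "Ajay Patel", "Programmer", 128, "Research and Development"),
    (9842, "Ben Smith", "DBA", 42, "Finance"),
]

# employee(employeeID, name, job, departmentID): departmentID stays as the link
employee = {(eid, name, job, dept) for eid, name, job, dept, _ in employee_department}

# department(departmentID, name): each department name is now stored once
department = {(dept, dept_name) for _, _, _, dept, dept_name in employee_department}

# Joining the two tables on departmentID rebuilds the original rows
name_of = dict(department)
rejoined = {(eid, name, job, dept, name_of[dept]) for eid, name, job, dept in employee}

print(len(department))                       # number of stored department names
print(rejoined == set(employee_department))  # True when no information was lost
```

If departmentID were dropped from employee, the last step would be impossible, which is the loss of information the text warns about.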
(ii) Anomalies
Anomalies present a slightly more complex concept. Anomalies are
problems that arise in the data due to a flaw in the database design. There
are three types of anomalies that may arise, and we will consider how
they occur with the flawed schema described above.
• Insertion Anomalies: Insertion anomalies occur when we try to
insert data into a flawed table. Imagine that we have a new
employee starting at the company. When we insert the employee's
details into the employeeDepartment table, we must insert both his
department id and his department name. What happens if we insert
data that does not match what is already in the table, for example,
by entering an employee as working for Department 42,
Development? It will not be obvious which of the rows in the
database is correct. This is an insertion anomaly.
• Deletion Anomalies: Deletion anomalies occur when we delete
data from a flawed schema. Imagine that all the employees of
Department 128 leave on the same day (walking out in disgust,
perhaps). When we delete these employee records, we no longer
have any record that Department 128 exists or what its name is.
This is a deletion anomaly.
• Update Anomalies: Update anomalies occur when we change data
in a flawed schema. Imagine that a Department decides to change
its name. We must change this data for every employee who works
for this department. We might easily miss one. If we do miss one
(or more), this is an update anomaly.
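As a small illustration of the update anomaly, the sketch below renames a department in the flawed employeeDepartment table but misses one row, then looks for departments that now carry more than one name. The rows and the helper name conflicting_departments are assumptions made for this example.

```python
# Hypothetical rows of the flawed employeeDepartment table
rows = [
    {"employeeID": 1, "departmentID": 128, "departmentName": "Research and Development"},
    {"employeeID": 2, "departmentID": 128, "departmentName": "Research and Development"},
    {"employeeID": 3, "departmentID": 42, "departmentName": "Finance"},
]

# Department 128 is renamed, but only the first matching row is updated
rows[0]["departmentName"] = "R&D"


def conflicting_departments(rows):
    """Return the departments that are stored with more than one name."""
    names = {}
    for row in rows:
        names.setdefault(row["departmentID"], set()).add(row["departmentName"])
    return {dept: found for dept, found in names.items() if len(found) > 1}


conflicts = conflicting_departments(rows)
print(conflicts)  # department 128 now has two different names
```

With a separate department table the name would be stored once, so there would be only one row to change.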
(iii) Null Values
A final rule for good database design is that we should avoid schema
designs that have large numbers of empty attributes. For example, if we
want to note that one in every hundred or so of our employees has some
special qualification, we would not add a column to the employee table to
store this information because for 99 employees, this would be NULL. We
would instead add a new table storing only employeeIDs and qualifications
for those employees who have those qualifications.

Functional Dependencies
In order to be able to normalize a relation, we must first understand the
concept of dependency between attributes within a relation. There exist
various types of dependencies:
• Functional dependency: if "A" and "B" are attributes of relation
"R", "B" is functionally dependent on "A" (denoted "A" →"B"), if each
value of "A" in "R" is associated with exactly one value of "B" in "R".
Candidate Key:
– Attribute that uniquely identifies a row in a relation
– Could be a combination of (non-redundant) attributes
– Each non-key field is functionally dependent on every candidate key

• Full functional dependency: "B" is fully functionally dependent on
"A" if "A" is a composite key, "B" is functionally dependent on "A",
and "B" is not functionally dependent on any proper subset of "A".
• Transitive dependency: if "A" determines "B", and "B"
determines "C", then "C" is determined by (dependent on) "A". We
write: if "A" → "B" and "B" → "C", then "A" → "C".
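The definition of functional dependency can be tested against sample data. The sketch below is a minimal check, assuming a relation stored as a list of dictionaries; the function name fd_holds and the rows are invented for illustration. Note that sample rows can only disprove a dependency: a dependency that holds in the sample may still be broken by future data.

```python
def fd_holds(rows, lhs, rhs):
    """Return True if each value of the lhs attributes is associated
    with exactly one value of the rhs attributes in the given rows."""
    seen = {}
    for row in rows:
        key = tuple(row[a] for a in lhs)
        value = tuple(row[a] for a in rhs)
        # setdefault stores the first value seen for this key
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical employee training rows
rows = [
    {"EmpID": 100, "Salary": 48000, "CourseTitle": "SPSS"},
    {"EmpID": 100, "Salary": 48000, "CourseTitle": "Surveys"},
    {"EmpID": 140, "Salary": 52000, "CourseTitle": "Tax Acc"},
]

print(fd_holds(rows, ["EmpID"], ["Salary"]))       # EmpID → Salary
print(fd_holds(rows, ["EmpID"], ["CourseTitle"]))  # EmpID → CourseTitle
```

Here Salary depends on EmpID alone, which is only part of a composite key {EmpID, CourseTitle}, so it would be a partial rather than a full functional dependency.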

Partial and Transitive Dependencies

Consider the following Relation:

• Question: Is this a relation?


Answer: Yes - unique rows and no multivalued attributes
• Question: What’s the primary key?
Answer: Composite key= {EmpID, CourseTitle}
Anomalies in this Table
• Insertion: can’t enter a new employee without having the employee
take a class.
• Deletion: if we remove employee 140, we lose information about the
existence of a Tax Acc class.
• Modification: giving a salary increase to employee 100 forces us to
update multiple records.
Why do these anomalies exist?
Because two themes (entity types) were combined in this one relation. This
results in duplication and an unnecessary dependency between the entities.

The Normal Forms


The database community has developed a series of guidelines for ensuring
that databases are normalized. These are referred to as normal forms and
are numbered from one (the lowest form of normalization, referred to as
first normal form or 1NF) through five (fifth normal form or 5NF).

Steps in Normalization
(i) First Normal Form (1NF)
The characteristics of the 1NF are as follows:
• Only atomic attributes (simple, single-value)
• A primary key has been identified
• Every relation is in 1NF by definition
• 1NF example:

Representing Functional Dependencies

(ii) Second Normal Form (2NF)
The characteristics of the 2NF are as follows:
• 1NF and every non-key attribute is fully functionally dependent on
the ENTIRE primary key
– Every non-key attribute must be defined by the entire key, not by
only part of the key
– No partial functional dependencies
Functional Dependencies in Student

Can represent FDs with arrows as above, or
• {StudentId} →{StuName},
• {CourseId} → {CourseName}
• {StudentId,CourseId} → {Grade (and StuName, CourseName)}
Any partial FDs? Yes: StuName depends only on StudentId, and CourseName
depends only on CourseId. Therefore, NOT in 2nd Normal Form
2NF Normalizing
• How do we convert the partial dependencies into normal ones? By
breaking into more tables.

• Becomes … (notice above arrows mean functional dependency,
below they mean FK constraints)
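The split into more tables can be sketched as follows. The attribute names come from the functional dependencies listed for Student; the sample rows and the table names student, course and grade are assumptions for this illustration.

```python
# Hypothetical rows of Student(StudentId, StuName, CourseId, CourseName, Grade)
enrolment = [
    (1, "Ada", "DB101", "Databases", "A"),
    (1, "Ada", "NW201", "Networks", "B"),
    (2, "Bob", "DB101", "Databases", "B"),
]

# {StudentId} → {StuName}: moved to its own table, keyed by StudentId
student = sorted({(sid, sname) for sid, sname, _, _, _ in enrolment})

# {CourseId} → {CourseName}: moved to its own table, keyed by CourseId
course = sorted({(cid, cname) for _, _, cid, cname, _ in enrolment})

# {StudentId, CourseId} → {Grade}: Grade depends on the whole key
grade = sorted({(sid, cid, g) for sid, _, cid, _, g in enrolment})

print(student)
print(course)
print(grade)
```

In the grade table, StudentId and CourseId act as foreign keys to the other two tables, and every non-key attribute in each table now depends on that table's entire key.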

(iii) Third Normal Form (3NF)
The characteristics of the 3NF are as follows:
• 2NF and no transitive dependencies
• A transitive dependency is when a non-key attribute depends on
another non-key attribute.

• Note: This is called transitive, because the primary key is a
determinant for another attribute, which in turn is a determinant for
a third attribute.
• 3NF Example:

- Classroom →Capacity TRANSITIVE


- Any partial FDs? NO
- Any transitive FDs? YES!
- How do we eliminate it?
- By breaking into its own table
3NF Normalization

Further Normalization
• Boyce-Codd Normal form (BCNF)
– Slight difference with 3NF
– To be in 3NF but not in BCNF, a relation needs two composite candidate keys,
with one attribute of one key depending on one attribute of the other
– Not very common
– If a table contains only one candidate key, the 3NF and the BCNF
are equivalent.
• Fourth Normal Form (4NF)

– To break it, need to have multivalued dependencies, a
generalization of functional dependencies
• Usually, if you’re in 3NF you’re in BCNF, 4NF, …
BCNF Example
• Assume that
– For each subject, each student is taught by one Instructor
– Each Instructor teaches only one subject
– Each subject is taught by several Instructors

BCNF: Decompose into (Instructor, Course) and (Student, Instructor)


(iv) BCNF
• Boyce-Codd normal form (BCNF): A relation is in BCNF, if and only
if, every determinant is a candidate key.
• The difference between 3NF and BCNF is that for a functional
dependency A →B, 3NF allows this dependency in a relation if B is
a primary-key attribute and A is not a candidate key, whereas BCNF
insists that for this dependency to remain in a relation, A must be a
candidate key.
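The BCNF test can be sketched with the standard attribute-closure computation. The code below is a minimal sketch under stated assumptions: the function names are invented, only non-trivial dependencies are listed, and "determinant is a candidate key" is checked as "the closure of the determinant contains every attribute". The relation and dependencies follow the Student/Course/Instructor example above.

```python
def closure(attrs, fds):
    """Attribute closure of attrs under fds, a list of (lhs, rhs) frozensets."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result


def bcnf_violations(attributes, fds):
    """Return the dependencies whose determinant does not determine all attributes."""
    return [(lhs, rhs) for lhs, rhs in fds if closure(lhs, fds) != set(attributes)]


attributes = {"Student", "Course", "Instructor"}
fds = [
    (frozenset({"Student", "Course"}), frozenset({"Instructor"})),
    (frozenset({"Instructor"}), frozenset({"Course"})),
]

violations = bcnf_violations(attributes, fds)
print(violations)  # Instructor → Course: Instructor is not a candidate key

# After decomposition, (Instructor, Course) has no violating dependency
print(bcnf_violations({"Instructor", "Course"}, [fds[1]]))
```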

(v) Fourth Normal Form (4NF)
• A multi-valued dependency exists when
– There are at least 3 attributes A, B, C in a relation and
– For each value of A there is a well-defined set of values for B, and
a well-defined set of values for C,
– But the set of values for B is independent of the set of values for C
• 4NF = BCNF with no non-trivial multi-valued dependencies
• 4NF Example: Assume that
– Each subject is taught by many Instructors
– The same books are used in many subjects
– Each Instructor uses a different book

4NF: Decompose into (Course, Instructor) and (Course, Text)
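The effect of this decomposition can be illustrated in Python. The sketch assumes that the instructors and the texts of a course are independent of each other (the multi-valued dependency case); all values are invented.

```python
# Hypothetical decomposed relations
course_instructor = {("Databases", "Green"), ("Databases", "Brown"), ("Networks", "Black")}
course_text = {("Databases", "Text A"), ("Databases", "Text B"), ("Networks", "Text C")}

# The single unnormalized relation must pair every instructor of a course
# with every text of that course
combined = {
    (c, i, t)
    for (c, i) in course_instructor
    for (c2, t) in course_text
    if c == c2
}
print(len(combined))  # 2 * 2 rows for Databases plus 1 * 1 row for Networks

# Projecting the combined relation gives back the two smaller relations,
# so the decomposition loses no information
print({(c, i) for c, i, _ in combined} == course_instructor)
print({(c, t) for c, _, t in combined} == course_text)
```

The two smaller relations store six rows instead of five here, but the combined relation grows multiplicatively as instructors or texts are added to a course, while the decomposed relations grow only additively.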

Case Study: INVOICE Data


Consider the following INVOICE Data

1. Table with multivalued attributes, not in 1st normal form

2. Table with no multivalued attributes and unique rows, but this relation
is not well-structured
Anomalies in this Table
• Insertion: if new product is ordered for order 1007 of existing
customer, customer data must be re-entered, causing duplication.
• Deletion: if we delete the Dining Table from Order 1006, we lose
information concerning this item's finish and price.
• Update: changing the price of product ID 4 requires update in
several records
Functional dependency diagram for INVOICE

Therefore, NOT in 2nd Normal Form


Partial Dependencies Removed, (2NF) achieved.

Partial dependencies are removed, but there are still transitive
dependencies

Transitive Dependencies Removed, (3NF) achieved.

Steps of Database Development

4. RELATIONAL LANGUAGES
We have so far considered the structure of a database; the relations and
the associations between relations. In this section we consider how useful
data may be extracted and filtered from database tables. A relational
language is needed to express these queries in a well-defined way. A
relational language is an abstract language which provides the
database user with an interface through which they can specify data to be
retrieved according to certain selection criteria. The two main relational
languages are relational algebra and relational calculus. Relational
algebra, which we focus on here, provides the user with a set of operators
which may be used to create new (temporary) relations based on
information contained in existing relations. Relational calculus, on the
other hand, provides a set of key words to allow the user to make ad hoc
inquiries.

Relational Algebra
Relational algebra is a procedural language consisting of a set of
operators. Each operator takes one or more relations as its input and
produces one relation as its output. The seven basic relational algebra
operations are Selection, Projection, Joining, Union, Intersection,
Difference and Division. It is important to note that these operations
do not alter the database. The relation produced by an operation is
available to the user but it is not stored in the database by the operation.
(i) Selection (also called Restriction) Operation
The SELECT operator selects all tuples from some relation R whose
attribute values satisfy some condition c. The syntax is:
R1 = SELECT <c> (R) or R1 = σ<c>(R)
A new relation R1 containing the selected tuples is then created as output.
Suppose we have the relation STORES:

The relational operation: R1 = SELECT Location = 'Dublin' (STORES), which is
read as R1 = SELECT STORES WHERE Location = 'Dublin'
selects all tuples for stores that are located in Dublin and creates the new
relation R1 which appears as follows:

We can also impose conditions on more than one attribute. This is done
by connecting the conditions with the Boolean operators AND, OR and NOT.

Examples:

• Select Employee tuples whose Department No is 4, or whose Salary
is greater than 30,000
R1 = σ (DNO=4 OR SALARY>30000)(Employee)
• Select Employee tuples who either work in Department 4 and make
over 25,000, or work in Department 5 and make over 30,000
R1 = σ (DNO=4 AND SALARY>25000) OR (DNO=5 AND SALARY>30000)(Employee)
Notes:
• Degree of the relation resulting from a Select operation on R is the same
as that of R
• Number of tuples in the resulting relation is always less than or equal to
the number of tuples in R
• Select operation is commutative, i.e.
σ<cond1>(σ<cond2>(R)) = σ<cond2>(σ<cond1>(R))
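The selection operator can be sketched in Python as a filter over a list of tuples. This is an illustration only: the function name select and the Employee rows are assumptions, with DNO and SALARY taken from the examples above.

```python
def select(relation, condition):
    """Return the tuples of relation that satisfy condition (σ)."""
    return [t for t in relation if condition(t)]


# Hypothetical Employee relation
employee = [
    {"NAME": "Ann", "DNO": 4, "SALARY": 26000},
    {"NAME": "Ben", "DNO": 5, "SALARY": 32000},
    {"NAME": "Cat", "DNO": 4, "SALARY": 20000},
    {"NAME": "Dan", "DNO": 5, "SALARY": 28000},
]

# σ (DNO=4 OR SALARY>30000)(Employee)
r1 = select(employee, lambda t: t["DNO"] == 4 or t["SALARY"] > 30000)

# σ (DNO=4 AND SALARY>25000) OR (DNO=5 AND SALARY>30000)(Employee)
r2 = select(
    employee,
    lambda t: (t["DNO"] == 4 and t["SALARY"] > 25000)
    or (t["DNO"] == 5 and t["SALARY"] > 30000),
)

# Commutativity: the order of two selections does not change the result
in_dept_4 = lambda t: t["DNO"] == 4
well_paid = lambda t: t["SALARY"] > 25000
same = select(select(employee, in_dept_4), well_paid) == select(
    select(employee, well_paid), in_dept_4
)

print([t["NAME"] for t in r1], [t["NAME"] for t in r2], same)
```

The input list is never modified, which mirrors the point that relational algebra operations do not alter the database.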
(ii) Projection Operation
The projection operator constructs a new relation R3 from some existing
relation R by keeping only a specified list of attributes L of the existing
relation and eliminating duplicate tuples in the newly formed relation.
The syntax is: R3 = PROJECT L (R) or R3 = PROJ L (R) or R3 = πL(R)
R3 = PROJECT Store-ID, Location (STORES) is read as
R3 = PROJECT STORES OVER Store-ID, Location and results in:

Notes:
• Degree of projection is equal to the number of attributes in the
attributes list
• If the attribute list includes only non-key attributes, duplicate tuples
are likely to occur
• Project operation removes any duplicate tuples
• Number of tuples in the resulting relation is always less than or equal to
the number of tuples in R
• If projection list is a super key of R (i.e., includes some key of R),
resulting relation has same number of tuples as R
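A minimal Python sketch of projection, including the removal of duplicate tuples, is given below. The function name project, the Name attribute and all values are assumptions; only Store-ID and Location are attribute names taken from the text.

```python
def project(relation, attributes):
    """Keep only the listed attributes and drop duplicate tuples (π)."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:
            result.append(row)
    return result


# Hypothetical STORES relation
stores = [
    {"Store-ID": 1, "Name": "North", "Location": "Dublin"},
    {"Store-ID": 2, "Name": "South", "Location": "Cork"},
    {"Store-ID": 3, "Name": "East", "Location": "Dublin"},
]

r3 = project(stores, ["Store-ID", "Location"])  # includes the key: 3 tuples
locations = project(stores, ["Location"])       # non-key only: duplicates removed

print(len(r3), locations)
```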

(iii) Cartesian Product

Also known as cross product or cross join, it combines tuples from two
relations in a combinatorial fashion. The syntax is:
R3 = R1 * R2 (Alternatively, R3 = R1 × R2)
• Pair each tuple t1 of R1 with each tuple t2 of R2
• Concatenation t1|t2 is a tuple of R3
• R3 has one tuple for each combination of tuples from R1 and R2
- If R1 has n1 tuples and R2 has n2 tuples, R1 × R2 will have n1*n2 tuples
• Schema of R3 is the attributes of R1 and R2, in order
- i.e., R1(A1,A2,...,An) × R2(B1,B2,...,Bm) results in relation R3 with
n + m attributes R3(A1,A2,...,An, B1,B2,...,Bm)
• But beware of attribute A of the same name in R1 and R2
- Use R1.A and R2.A
Example 1: R3 = R1 * R2
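The Cartesian product can be sketched in Python as two nested loops. The relations R1(A, B) and R2(A, C) and the function name are invented for this illustration; the shared attribute A is renamed to R1.A and R2.A as suggested above.

```python
def cartesian_product(r1, r2, name1="R1", name2="R2"):
    """Pair every tuple of r1 with every tuple of r2 (R1 × R2)."""
    result = []
    for t1 in r1:
        for t2 in r2:
            row = {}
            # Attributes that appear in both relations get a prefix
            for a, v in t1.items():
                row[f"{name1}.{a}" if a in t2 else a] = v
            for a, v in t2.items():
                row[f"{name2}.{a}" if a in t1 else a] = v
            result.append(row)
    return result


r1 = [{"A": 1, "B": "x"}, {"A": 2, "B": "y"}]
r2 = [{"A": 1, "C": "p"}, {"A": 3, "C": "q"}, {"A": 4, "C": "r"}]

r3 = cartesian_product(r1, r2)
print(len(r3))  # 2 * 3 tuples
print(r3[0])
```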

(iv) Joining Operation


Joining is an operation for combining two relations into a single relation.
At the outset, it requires choosing the attributes to match the tuples in
each relation. Tuples in different relations but with the same value of
matching attributes are combined into a single tuple in the output relation.
The syntax is: R5 = R JOIN c R’
• Take the product R * R’.
• Then apply SELECT c to the result.
• The join condition c can be any Boolean-valued condition built
from comparison operators (=, ≠, <, >, ≥, ≤)

For example, with a new relation ITEMS:

… and our previous STORES relation:

if we joined ITEMS to STORES using the operator:
R5 = STORES JOIN Store-ID = Store-ID ITEMS, which is read as
R5 = JOIN STORES, ITEMS OVER Store-ID
the resulting relation R5 would appear as follows:

This relation resulted from a joining of ITEMS and STORES over the
common attribute Store-ID, i.e. any tuples of each relation which
contained the same value of Store-ID were joined together to form a single tuple.
Joining relations together based on equality of values of common
attributes is called an equijoin. When duplicate attributes are removed
from the result of an equijoin this is called a natural join, denoted:
R5 = R JOIN R’ or, alternatively, R5 = R ⋈ R’.
The example above is such a natural join - as Store-ID appears only once
in the result. Note that there is often a connection between keys (primary
and foreign) and the attributes over which a join is performed in order to
amalgamate information from multiple related tables in a database. In the
above example, ITEMS.Store-ID is a foreign key referencing the primary key
STORES.Store-ID. When we join on Store-ID the relationship between the
tables is expressed explicitly in the resulting output table. To illustrate, the
relationship between these relations can be expressed as an E-R diagram,
shown below.
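A natural join over common attributes can be sketched in Python as follows. The function name natural_join and the contents of STORES and ITEMS are assumptions; only the relation names and the Store-ID attribute come from the text.

```python
def natural_join(r, s):
    """Combine tuples of r and s that agree on all common attributes."""
    result = []
    for t1 in r:
        for t2 in s:
            common = set(t1) & set(t2)
            if all(t1[a] == t2[a] for a in common):
                # Merging the dictionaries keeps each common attribute once
                result.append({**t1, **t2})
    return result


# Hypothetical relations; ITEMS.Store-ID is a foreign key to STORES.Store-ID
stores = [
    {"Store-ID": 1, "Location": "Dublin"},
    {"Store-ID": 2, "Location": "Cork"},
]
items = [
    {"Item-ID": 10, "Description": "Kettle", "Store-ID": 1},
    {"Item-ID": 11, "Description": "Toaster", "Store-ID": 1},
    {"Item-ID": 12, "Description": "Lamp", "Store-ID": 3},
]

r5 = natural_join(stores, items)
for row in r5:
    print(row)
```

Store 2 has no items and item 12 refers to a store that is not listed, so neither appears in the result.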

Outer join
• The join operations we have seen so far
- Consider only matching tuples from the two relations
- Tuples with null values in join attribute are also eliminated
• There are situations where we want to
- keep all tuples in R, or
- Keep all tuples in S, or
- Keep all tuples in R and S,
- Whether or not they have matching tuples in the other relation
• Suppose we join R JOIN ϲ S.
• A tuple of R that has no tuple of S with which it joins is said to be
dangling.
- Similarly, for a tuple of S.
• Outer join preserves dangling tuples by padding them with a special
NULL symbol in the result.
Example

Types of outer join


• The R OUTERJOIN S operation we saw previously is an
example of FULL OUTER JOIN
- Keeps all tuples in R and S

• Outer join operations
- Left outer join
- Right outer join
Left outer join

Right outer join
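The padding of dangling tuples can be sketched in Python, with None standing in for NULL. The function name, its parameters and the sample data are assumptions for this illustration. A right outer join is obtained here by swapping the two operands, and a full outer join would combine the results of both.

```python
def left_outer_join(r, s, attr, s_attrs):
    """Join r and s on equality of attr, keeping every tuple of r.
    Dangling tuples of r are padded with None for the attributes s_attrs of s."""
    result = []
    for t1 in r:
        # A NULL (None) join value never matches anything
        matches = [t2 for t2 in s if t1[attr] is not None and t2[attr] == t1[attr]]
        if matches:
            result.extend({**t1, **t2} for t2 in matches)
        else:
            result.append({**t1, **{a: None for a in s_attrs}})
    return result


stores = [
    {"Store-ID": 1, "Location": "Dublin"},
    {"Store-ID": 2, "Location": "Cork"},
]
items = [
    {"Item-ID": 10, "Description": "Kettle", "Store-ID": 1},
    {"Item-ID": 12, "Description": "Lamp", "Store-ID": 3},
]

left = left_outer_join(stores, items, "Store-ID", ["Item-ID", "Description"])
right = left_outer_join(items, stores, "Store-ID", ["Location"])

print(left)   # store 2 is kept, padded with None
print(right)  # item 12 is kept, padded with None
```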

(v) Union, Intersection and Difference Operations


These are the standard set operators. The requirement for carrying out
any of these operations are that the two operand relations are union-
compatible - i.e. they have the same number of attributes (say n) and the
ith attribute of each relation (i = 1,…,n) must be from the same domain
(they do not have to have the same attribute names).
• The UNION operator builds a relation consisting of all tuples
appearing in either or both of two specified relations, (R∪S).
• The INTERSECT operator builds a relation consisting of all tuples
appearing strictly in both specified relations, (R∩S).
• The DIFFERENCE operator builds a relation consisting of all tuples
appearing in the first, but not the second of two specified relations, (R – S).

This may be represented diagrammatically as shown below.

As an exercise, find:
C = UNION(A,B), C = INTERSECTION(A,B) and C = DIFFERENCE(A,B).
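For comparison, the three set operators map directly onto Python's built-in set type. The relations below are invented and are not the A and B of the exercise; the helper union_compatible is an assumption and checks only the number of attributes, not their domains.

```python
def union_compatible(r, s):
    """Check that all tuples have the same number of attributes."""
    return len({len(t) for t in r | s}) <= 1


# Hypothetical union-compatible relations with two attributes each
r = {(1, "x"), (2, "y"), (3, "z")}
s = {(2, "y"), (4, "w")}

assert union_compatible(r, s)

union = r | s          # tuples in either or both relations
intersection = r & s   # tuples in both relations
difference = r - s     # tuples in r but not in s

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```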

(vi) Division Operation


In its simplest form, this operation has a binary relation R(X, Y) as the
dividend and a divisor that includes Y. The output is a set, S, of values of
X such that x ∈ S if there is a row (x, y) in R for each y value in the divisor.
As an example, suppose we have two relations R6 and R7:

The operation:
R8 = R6 / R7
will give the result:

This is because C3 is the only company for which there is a row with
Boston and New York. The other companies, C1 and C2, do not satisfy this
condition.
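Division in its simplest form can be sketched in Python as below. The function name divide and the rows of R6 are assumptions, chosen so that only C3 is paired with both Boston and New York, as in the explanation above.

```python
def divide(dividend, divisor):
    """dividend is a set of (x, y) pairs, divisor a set of y values.
    Return every x that is paired with each y in the divisor."""
    xs = {x for x, _ in dividend}
    return {x for x in xs if all((x, y) in dividend for y in divisor)}


# Hypothetical relations
r6 = {
    ("C1", "Boston"),
    ("C2", "New York"),
    ("C3", "Boston"),
    ("C3", "New York"),
}
r7 = {"Boston", "New York"}

r8 = divide(r6, r7)
print(r8)
```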
Aggregation Operators

• Aggregation operators are not operators of relational algebra
• Rather, they apply to entire columns of a table and produce a single result
• Most important examples
- SUM, AVG, COUNT, MIN, and MAX
• Example: Given R(A,B)

(vii) Grouping Operator


The syntax is R1 = GAMMA L (R2). L is a list of elements that are either:
1. Individual (grouping) attributes.
2. AGG(A), where AGG is one of the aggregation operators and A is an attribute.
Applying GAMMA L (R)
• Group R according to all the grouping attributes on list L.
- That is, form one group for each distinct list of values for those
attributes in R.
• Within each group, compute AGG(A) for each aggregation on list L.
• Result has grouping attributes and aggregations as attributes. One
tuple for each list of values for the grouping attributes and their
group’s aggregations.
Example: Grouping/Aggregation
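The grouping steps above can be sketched in Python. This is a minimal illustration: the function name gamma, the way aggregations are passed as (output name, function, attribute) triples, and the rows of R(A, B) are all assumptions.

```python
from collections import defaultdict


def gamma(relation, group_attrs, aggregations):
    """Group relation by group_attrs and compute each aggregation per group.
    aggregations is a list of (output_name, function, attribute) triples."""
    groups = defaultdict(list)
    for t in relation:
        groups[tuple(t[a] for a in group_attrs)].append(t)

    result = []
    for key, tuples in groups.items():
        row = dict(zip(group_attrs, key))
        for name, func, attr in aggregations:
            row[name] = func([t[attr] for t in tuples])
        result.append(row)
    return result


# Hypothetical relation R(A, B)
r = [{"A": 1, "B": 3}, {"A": 1, "B": 5}, {"A": 2, "B": 4}]

# GAMMA A, SUM(B), COUNT(B) (R)
r1 = gamma(r, ["A"], [("SUM(B)", sum, "B"), ("COUNT(B)", len, "B")])
print(r1)
```

The result has one tuple per distinct value of A, with the grouping attribute and the aggregations as its attributes.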

Relational calculus
Unlike relational algebra, which is a procedural query language that also
specifies how the data is fetched, relational calculus is a non-procedural
query language: it gives no description of how the query will be evaluated
or how the data will be fetched. It focuses only on what to retrieve, and
not on how to retrieve it. In brief, relational calculus is a higher-level
declarative language for specifying relational queries.
Types of Relational calculus
• Tuple Relational Calculus (TRC):
- The tuple relational calculus is used to select tuples from a
relation. In TRC, the filtering variable ranges over the tuples of a relation.
- The result can contain one or more tuples.
Syntax: {T | P (T)} or {T | Condition (T)}
Where T is the tuple variable for the result and P(T) is the condition used
to fetch T. For example:
1. {T.name | Author(T) AND T.article = 'database' }
OUTPUT: This query selects the tuples from the AUTHOR
relation. It returns a tuple with 'name' from Author who has
written an article on 'database'.
2. {T.name | Student(T) AND T.age > 17}
OUTPUT: This query selects the tuples from the Student relation.
It returns a tuple with 'name' from Student with age greater than 17.
• Domain Relational Calculus (DRC):
- In domain relational calculus, the filtering variables range over the
domains of attributes.
- Domain relational calculus uses the same operators as tuple
calculus. It uses logical connectives ∧ (and), ∨ (or) and ¬ (not).
- It uses Existential (∃) and Universal Quantifiers (∀) to bind the
variables.
Syntax: {a1, a2, a3, ..., an | P (a1, a2, a3, ... ,an)}
Where: a1, a2, ..., an are attributes (domain variables) and P stands for a
formula built from those attributes. For example:
1. {<name, age> | <name, age> ∈ Student ∧ age > 17}
Output: This query will return the names and ages of the
students in the table Student who are older than 17.
2. {l.SUPNR, l.SUPNAME | SUPPLIER(l) AND l.SUPSTATUS > 85}
Output: This will select the supplier number and supplier name
of all suppliers whose status is bigger than 85.
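The declarative style of relational calculus is close to set comprehensions in Python, which state what the result contains rather than how to loop over the data. The sketch below is only an analogy, with invented Student rows.

```python
# Hypothetical Student relation
student = [
    {"name": "Ada", "age": 19},
    {"name": "Bob", "age": 17},
    {"name": "Eve", "age": 21},
]

# Tuple style: {T.name | Student(T) AND T.age > 17}
# The variable t ranges over whole tuples of Student
trc_result = {t["name"] for t in student if t["age"] > 17}

# Domain style: {<name, age> | <name, age> ∈ Student ∧ age > 17}
# The variables name and age range over attribute values
student_pairs = {(t["name"], t["age"]) for t in student}
drc_result = {(name, age) for (name, age) in student_pairs if age > 17}

print(sorted(trc_result))
print(sorted(drc_result))
```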

