Database Chapter 2 by Hatem


DATABASE DESIGN

By: Dr. Hatem Moharram


INTRODUCTION TO
DATABASE DESIGN

- What are the steps in designing a database?


- Why is the ER model used to create an initial design?
- What are the main concepts in the ER model?
- What are guidelines for using the ER model effectively?
- How does database design fit within the overall design
framework for complex software within large enterprises?
- What is UML and how is it related to the ER model?
DATABASE DESIGN AND ER DIAGRAMS

The database design process can be divided into six steps.


The ER model is most relevant to the first three steps.

1- Requirements Analysis
2- Conceptual Database Design
3- Logical Database Design
4- Schema Refinement
5- Physical Database Design
6- Application and Security Design
1- Requirements Analysis
- The first step in designing a database application is to
understand:
• what data is to be stored in the database,
• what applications must be built on top of it, and
• what operations are most frequent and subject to
performance requirements.

- We must find out what the users want from the database.

- This is usually an informal process that involves:
• discussions with user groups,
• a study of the current operating environment and how it is
expected to change,
• analysis of any available documentation on existing
applications.
2- Conceptual Database Design
The information gathered in the requirements analysis
step is used to develop a high-level description of the data
to be stored in the database, along with the constraints
known to hold over this data.

- This step is often carried out using the ER model.

- The ER model is one of several high-level, or semantic,
data models used in database design.
Conceptual Database Design

- The goal is to create a simple description of the data
that closely matches how users and developers think of
the data.

- The description must enable a straightforward translation into a data
model supported by a commercial database system
(which, in practice, means the relational model).
3- Logical Database Design
We must choose a DBMS to implement our database design,
and convert the conceptual database design into a
database schema in the data model of the chosen DBMS.

For a relational DBMS, the task in the logical design step is to
convert an ER schema into a relational database schema.

The result is a conceptual schema, sometimes called the
logical schema, in the relational data model.
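The conversion from an ER schema to a relational schema can be sketched with Python's built-in sqlite3 module. This is only an illustration under assumptions: it uses the Employees, Departments, and Works_In example that appears later in these slides, it maps each entity set and each relationship set to one table, and the column types are guesses rather than part of the original design.

```python
import sqlite3

# Assumed mapping: one table per entity set, one table per relationship set.
# Column types are illustrative guesses.
SCHEMA = """
CREATE TABLE Employees (
    ssn  TEXT PRIMARY KEY,
    name TEXT,
    lot  INTEGER
);
CREATE TABLE Departments (
    did    INTEGER PRIMARY KEY,
    dname  TEXT,
    budget REAL
);
CREATE TABLE Works_In (
    ssn   TEXT REFERENCES Employees(ssn),
    did   INTEGER REFERENCES Departments(did),
    since TEXT,
    PRIMARY KEY (ssn, did)
);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)

# List the tables that make up the resulting relational schema.
tables = sorted(
    row[0]
    for row in conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'")
)
print(tables)
```

The key of the Works_In table combines the keys of the two participating entity sets, which matches the idea that a relationship is identified by the entities it involves.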
Beyond ER Design

A more careful analysis can often refine the logical
schema obtained at the end of Step 3.

We must consider performance criteria and design the
physical schema.

We must address security issues and ensure that users are
able to access the data they need, but not data that we
wish to hide from them.
4- Schema Refinement
The collection of relations in the relational database
schema must be analyzed to identify potential problems,
and then refined.

5- Physical Database Design
This step may involve:
- building indexes on some tables,
- clustering some tables,
- redesigning parts of the database schema obtained from
the earlier design steps.
6- Application and Security Design
Any software project that involves a DBMS must
consider aspects of the application that go beyond the
database itself.

Design methodologies like UML try to address the
complete software design and development cycle.

We must identify the entities (e.g., users, user groups,
departments) and processes involved in the application.
ENTITIES, ATTRIBUTES, AND ENTITY SETS
An entity is an object in the real world that is
distinguishable from other objects.

Examples include the following:
- the toy department,
- the manager of the toy department,
- the address of the manager of the toy department.

An entity set is a collection of similar entities.

An entity set is represented by a rectangle in the ER diagram.
[Figure: the Employees entity set drawn as a rectangle, with attributes ssn, name, and lot.]
Entity sets need not be disjoint.

Ex.: the collection of toy department employees and the
collection of appliance department employees may both
contain employee John Doe (who happens to work in both
departments).

We could also define an entity set called Employees that
contains both the toy and appliance department employee
sets.
An entity is described using a set of attributes.
All entities in a given entity set have the same attributes.
Ex.: the Employees entity set could use
name, social security number (ssn), and parking lot
(lot) as attributes.

In this case we will store the name, social security number,
and lot number for each employee.
We will not store, say, an employee's address (or gender or age).
An attribute is represented by an oval in the ER diagram.
[Figure: ER diagram of Employees (ssn, name, lot), the relationship set Works_In (since), and Departments (did, dname, budget).]


For each attribute associated with an entity set,
we must identify a domain of possible values.

Ex.: the domain associated with the attribute name of
Employees might be the set of 20-character strings.

If the company rates employees on a scale of 1 to 10 and
stores ratings in a field called rating, the associated domain
consists of integers 1 through 10.
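One possible way to enforce such domains is with CHECK constraints, sketched here with Python's sqlite3 module. The sketch makes two assumptions that the slides do not state: "20-character strings" is read as "at most 20 characters", and rating is stored as a column of the Employees table. The helper try_insert is a hypothetical name used only for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Employees (
        ssn    TEXT PRIMARY KEY,
        name   TEXT CHECK (length(name) <= 20),       -- assumed: at most 20 characters
        rating INTEGER CHECK (rating BETWEEN 1 AND 10)
    )
    """
)


def try_insert(ssn, name, rating):
    """Return True if the row satisfies every domain check, else False."""
    try:
        with conn:
            conn.execute("INSERT INTO Employees VALUES (?, ?, ?)", (ssn, name, rating))
        return True
    except sqlite3.IntegrityError:
        return False


ok = try_insert("111", "Hatem", 7)
bad_rating = try_insert("222", "Aly", 11)      # 11 is outside the domain 1..10
bad_name = try_insert("333", "x" * 21, 5)      # 21 characters exceed the name domain
print(ok, bad_rating, bad_name)
```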
For each entity set, we choose a key.

A key is a minimal set of attributes whose values uniquely
identify an entity in the set.

There could be more than one candidate key; if so, we
designate one of them as the primary key.
In the ER diagram, each attribute in the primary key is
underlined.
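The definition of a key as a minimal identifying set of attributes can be explored with a small Python sketch. The sample rows are made up for illustration, and the function names are hypothetical. Note the limitation: a single instance can only show that an attribute set is not a key; uniqueness in one instance does not prove that the set is a key for every legal instance.

```python
from itertools import combinations


def is_unique(rows, attrs):
    """True if no two rows agree on all of the given attributes."""
    seen = set()
    for row in rows:
        value = tuple(row[a] for a in attrs)
        if value in seen:
            return False
        seen.add(value)
    return True


def candidate_keys(rows, all_attrs):
    """Minimal attribute sets that are unique in this particular instance."""
    keys = []
    for size in range(1, len(all_attrs) + 1):
        for attrs in combinations(all_attrs, size):
            # Skip supersets of a key already found: they are not minimal.
            if any(set(k) <= set(attrs) for k in keys):
                continue
            if is_unique(rows, attrs):
                keys.append(attrs)
    return keys


# Made-up instance of Employees.
employees = [
    {"ssn": "225", "name": "ahmed", "lot": 5777},
    {"ssn": "226", "name": "aly", "lot": 5687},
    {"ssn": "227", "name": "ahmed", "lot": 5687},
]
keys = candidate_keys(employees, ["ssn", "name", "lot"])
print(keys)
```

In this instance both (ssn) and (name, lot) come out as minimal unique sets. Only the designer's knowledge of the real world tells us that ssn is a genuine key while (name, lot) is unique here by coincidence.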
[Figure: ER diagram of Employees (ssn, name, lot), Works_In (since), and Departments (did, dname, budget), with the primary key attributes underlined.]


RELATIONSHIPS AND RELATIONSHIP SETS
A relationship is an association
among two or more entities.

Ex.: we may have the relationship that Hatem works in the
Mathematics department.
A relationship set is a collection of similar
relationships.

A relationship set can be thought of as a set of n-tuples:
{(e1, ..., en) | e1 ∈ E1, ..., en ∈ En}

Each n-tuple denotes a relationship involving n entities e1
through en, where entity ei is in entity set Ei.
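The set-of-tuples view translates directly into code. The following Python sketch represents an instance of a binary relationship set as a set of pairs; the identifiers and values are illustrative, and is_valid_instance is a hypothetical helper.

```python
# Entity sets, represented by their key values (illustrative data).
employees = {"225", "226", "227"}   # ssn values
departments = {50, 52, 53}          # did values

# An instance of a relationship set: a set of (employee, department) pairs.
works_in = {("225", 50), ("226", 52), ("226", 53)}


def is_valid_instance(relationships, *entity_sets):
    """Check that the i-th component of every tuple belongs to the i-th entity set."""
    return all(
        len(rel) == len(entity_sets)
        and all(e in es for e, es in zip(rel, entity_sets))
        for rel in relationships
    )


valid = is_valid_instance(works_in, employees, departments)
invalid = is_valid_instance({("999", 50)}, employees, departments)
print(valid, invalid)
```

Because works_in is a set, the same (employee, department) pair cannot appear twice, which mirrors the rule that a relationship is identified by its participating entities.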
[Figure: ER diagram of Employees (ssn, name, lot), Works_In (since), and Departments (did, dname, budget).]

In the relationship set Works_In, each relationship
indicates a department in which an employee works.
[Figure: ER diagram in which Employees and Departments are connected by both a works-in relationship set and a Manages relationship set, each with a since attribute.]

Several relationship sets might involve the same
entity sets.

Ex.: we could also have a Manages relationship set
involving Employees and Departments.
[Figure: ER diagram of the ternary relationship set Works_In2 (since) among Employees (ssn, name, lot), Departments (did, dname, budget), and Locations (address, capacity).]

A relationship can also have descriptive attributes.

Descriptive attributes are used to record information about
the relationship, rather than about any one of the
participating entities.

Ex.: we may wish to record that Hatem works in the
Mathematics department as of January 1987.
An instance of a relationship set is a set of relationships.

An instance can be thought of as a 'snapshot' of the
relationship set at some instant in time.
[Figure: an instance of Works_In, linking employees (225 ahmed, 226 aly, 227 fahd, 228 saad) to departments (50 comp, 52 math, 53 stat), each link labeled with a since date.]
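Ex.: suppose that each department has offices in several
locations and we want to record the locations at which each
employee works. This relationship is ternary because we must
record an association between an employee, a department,
and a location.
[Figure: ER diagram of the ternary relationship set Works_In2 (since) among Employees (ssn, name, lot), Departments (did, dname, budget), and Locations (address, capacity).]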


The entity sets that participate in a relationship set
need not be distinct;
sometimes a relationship might involve two entities in the
same entity set.
[Figure: the Reports_To relationship set on Employees (ssn, name, lot), with the two participations labeled Supervisor and Subordinate.]
2.4 ADDITIONAL FEATURES OF THE ER MODEL
There are some constructs in the ER model that allow us to
describe some subtle properties of the data. The
expressiveness of the ER model is a big reason for its
widespread use.
2.4.1 Key Constraints
Consider the Works_In relationship set.
[Figure: ER diagram of Employees (ssn, name, lot), Works_In (since), and Departments (did, dname, budget).]
[Figure: an instance pairing employees (567 ahmed, 568 aly, 569 fahd, 570 tarek) with departments (5 computer, 6 math, 7 statistics), each pair labeled with a since date.]

An employee can work in several departments, and a
department can have several employees.

The Works_In relationship set is therefore said to be many-to-many.


Ex.: consider a relationship set called Manages between
the Employees and Departments entity sets such that:

1- each department has at most one manager,
2- a single employee is allowed to manage
more than one department.

The restriction that each department has at most one
manager is an example of a key constraint.

This restriction is indicated in the ER diagram by using an
arrow from Departments to Manages.
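A key constraint of this kind can be enforced in a relational schema by making did the key of the Manages table. The Python sqlite3 sketch below is one possible encoding, not the only one; the dates are made up and assign_manager is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Manages (
        ssn   TEXT,
        did   INTEGER PRIMARY KEY,  -- key constraint: at most one row per department
        since TEXT
    )
    """
)


def assign_manager(ssn, did, since):
    """Return True if the assignment is accepted, False if it violates the key."""
    try:
        with conn:
            conn.execute("INSERT INTO Manages VALUES (?, ?, ?)", (ssn, did, since))
        return True
    except sqlite3.IntegrityError:
        return False


first = assign_manager("567", 5, "1999-03-03")
same_employee_other_dept = assign_manager("567", 6, "1998-05-04")  # allowed
second_manager_same_dept = assign_manager("568", 5, "1992-02-06")  # rejected
print(first, same_employee_other_dept, second_manager_same_dept)
```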
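[Figure: ER diagram of Employees (ssn, name, lot), Manages (since), and Departments (did, dname, budget), with an arrow from Departments to Manages.]

A relationship set like Manages is said to be one-to-many: one
employee can be associated with many departments (as manager),
but each department has at most one manager.
[Figure: an instance of Manages, linking employees (567 ahmed, 568 aly, 569 fahd, 570 tarek) to departments (5 computer, 6 math, 7 statistics), each link labeled with a since date.]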
If we add the restriction that each employee can manage
at most one department to the Manages relationship set,
which would be indicated by adding an arrow from
Employees to Manages, we have a one-to-one relationship
set.
[Figure: ER diagram of Employees, Manages (since), and Departments, with arrows from both Employees and Departments to Manages.]
Key Constraints for Ternary Relationships
If an entity set E has a key constraint in a relationship set R,
each entity in an instance of E appears in at most one
relationship in (a corresponding instance of) R. To indicate
a key constraint on entity set E in relationship set R, we
draw an arrow from E to R.

[Figure: ER diagram of the ternary relationship set (since) among Employees (ssn, name, lot), Departments (did, dname, budget), and Locations (address, capacity), with an arrow from Employees indicating the key constraint.]

- Each employee works in at most one department and at
a single location.
- Each department can be associated with several
employees and locations.
- Each location can be associated with several
departments and employees.
[Figure: an instance of Works_In3 with a key constraint on Employees, linking employees (567 ahmed, 568 aly, 569 fahd, 570 tarek) to departments (5 computer, 6 math, 7 statistics) and locations (cairo, Reyad, maskat); each employee appears in at most one relationship.]
2.4.2 Participation Constraints

Does every department have a manager?

Suppose that every department is required to have a manager.

This requirement is an example of a participation
constraint.
The participation of the entity set Departments in the
relationship set Manages is said to be total.

A participation that is not total is said to be partial.
The participation of the entity set Employees in Manages
is partial, since not every employee gets to
manage a department.
If the participation of an entity set in a relationship set is
total, the two are connected by a thick line; independently,
the presence of an arrow indicates a key constraint.
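When Departments has both a key constraint and total participation in Manages, one common encoding is to store the manager inside the department's own table and forbid a missing value. The Python sqlite3 sketch below illustrates that idea; the table name Dept_Mgr, the column mgr_ssn, and the helper add_department are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript(
    """
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
    CREATE TABLE Dept_Mgr (
        did     INTEGER PRIMARY KEY,                      -- one row per department
        dname   TEXT,
        budget  REAL,
        mgr_ssn TEXT NOT NULL REFERENCES Employees(ssn),  -- total participation
        since   TEXT
    );
    INSERT INTO Employees VALUES ('567', 'ahmed', 5678);
    """
)


def add_department(did, dname, budget, mgr_ssn, since):
    """Return True if the department row is accepted, else False."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Dept_Mgr VALUES (?, ?, ?, ?, ?)",
                (did, dname, budget, mgr_ssn, since),
            )
        return True
    except sqlite3.IntegrityError:
        return False


with_manager = add_department(5, "computer", 578.0, "567", "1999-03-03")
without_manager = add_department(6, "math", 645.0, None, None)  # rejected: no manager
print(with_manager, without_manager)
```

The partial participation of Employees needs no constraint at all: an employee who manages nothing simply never appears in the mgr_ssn column.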

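[Figure: ER diagram of Employees (ssn, name, lot) and Departments (did, dname, budget) with relationship sets Works_In (since) and Manages (since); thick lines mark total participation and an arrow marks the key constraint.]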
2.4.3 Weak Entities
So far, we have assumed that the attributes associated with an
entity set include a key.

This assumption does not always hold.

Ex.: each book has several editions, and it makes no sense
to speak about an edition outside
the context of a specific book.
[Figure: Book (bid, btit, bauth) and edition (edid, date) connected by Published; the key of edition is marked "?".]
Ex.: the entity set transaction has attributes transaction-number,
date, and amount. Different transactions on
different accounts could share the same number,
so these attributes are not sufficient to form a primary key
(that is, to uniquely identify a transaction).
[Figure: account (Aid, Adate) and transaction (tnumber, date, amount) connected by On; the key of transaction is marked "?".]
Ex.: suppose that employees can purchase insurance
policies to cover their dependents.

We wish to record information about policies, including who
is covered by each policy, but this information is really our
only interest in the dependents of an employee.

If an employee quits, any policy owned by the employee is
terminated and we want to delete all the relevant policy and
dependent information from the database.
[Figure: Employees (ssn, name, lot) and Dependents (pname, age) connected by Policy (cost); the key of Dependents is marked "?".]
We might choose to identify a dependent by name alone in
this situation, since it is reasonable to expect that the
dependents of a given employee have different names.
Thus the attributes of the Dependents entity set might be
pname and age.

However, the attribute pname alone does not identify a dependent
uniquely. Recall that the key for Employees is ssn; thus we
might have two employees called Smethurst and each
might have a son called Joe.
A weak entity can be identified uniquely only by
considering some of its attributes in conjunction with
the primary key of another entity, which is called the
identifying owner.

The following restrictions must hold:

1- The owner entity set and the weak entity set must
participate in a one-to-many relationship set (one owner
entity is associated with one or more weak entities, but
each weak entity has a single owner). This relationship set
is called the identifying relationship set of the weak entity
set.

2- The weak entity set must have total participation in the
identifying relationship set.
A Dependents entity can be identified uniquely only if we
take the key of the owning Employees entity and the
pname of the Dependents entity.

The set of attributes of a weak entity set that uniquely
identify a weak entity for a given owner entity is called a
partial key of the weak entity set. In our example, pname
is a partial key for Dependents.
To underscore the fact that Dependents is a weak entity
and Policy is its identifying relationship, we draw both with
dark lines.

To indicate that pname is a partial key for Dependents, we
underline it using a broken line. This means that there may
well be two dependents with the same pname value.
[Figure: Employees (ssn, name, lot) and the weak entity set Dependents (pname, age) connected by the identifying relationship Policy (cost); Dependents and Policy are drawn with dark lines and pname has a broken underline.]
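The weak entity idea maps naturally to a table whose key combines the partial key with the owner's key, and whose rows disappear when the owner is deleted. The Python sqlite3 sketch below shows one way to express this; the table name Dep_Policy and the sample rows are assumptions for illustration, and the cascade relies on SQLite's foreign key enforcement being switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE Employees (ssn TEXT PRIMARY KEY, name TEXT, lot INTEGER);
    CREATE TABLE Dep_Policy (
        pname TEXT,
        age   INTEGER,
        cost  REAL,
        ssn   TEXT NOT NULL REFERENCES Employees(ssn) ON DELETE CASCADE,
        PRIMARY KEY (pname, ssn)  -- partial key plus the owner's key
    );
    INSERT INTO Employees VALUES ('111', 'Smethurst', 10), ('222', 'Smethurst', 11);
    INSERT INTO Dep_Policy VALUES ('Joe', 5, 100.0, '111'), ('Joe', 7, 120.0, '222');
    """
)

# Two dependents named Joe coexist because their owners differ.
# When an employee quits, the dependent and policy information goes too.
conn.execute("DELETE FROM Employees WHERE ssn = '111'")
conn.commit()
remaining = conn.execute("SELECT pname, ssn FROM Dep_Policy").fetchall()
print(remaining)
```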
2.4.4 Class Hierarchies

Sometimes it is natural to classify the entities in an entity
set into subclasses.

Ex.: we might want to talk about an Hourly_Emps entity set
and a Contract_Emps entity set to distinguish the basis on
which they are paid.

We might have attributes hours_worked and hourly_wages
defined for Hourly_Emps and an attribute contractid
defined for Contract_Emps.
[Figure: ISA hierarchy in which Employees (ssn, name, lot) has subclasses Hourly_Emps (hourly_wages, hours_worked) and Contract_Emps (contractid), connected through an ISA triangle.]

Every entity in one of these sets is also an Employees
entity and, as such, must have all the attributes of
Employees defined.

The attributes defined for an Hourly_Emps entity are the
attributes for Employees plus those of Hourly_Emps.

We say that the attributes for the entity set Employees are inherited
by the entity set Hourly_Emps and that Hourly_Emps ISA
(read is a) Employees.
We can specify two kinds of constraints with respect to ISA
hierarchies:
1- Overlap constraints: these determine
whether two subclasses are allowed to contain the same
entity.
Ex.: can Attishoo be both an Hourly_Emps entity and a
Contract_Emps entity? No.
Can he be both a Contract_Emps entity and a Senior_Emps
entity? Yes.
We denote this by writing 'Contract_Emps OVERLAPS
Senior_Emps.'

In the absence of such a statement, we assume by default
that entity sets are constrained to have no overlap.
2- Covering constraints: these determine whether the entities in
the subclasses collectively include all entities in the
superclass.

Ex.: does every Employees entity have to belong to one of
its subclasses? No.
Does every Motor_Vehicles entity have to be either a
Motorboats entity or a Cars entity? Yes.

We denote this by writing 'Motorboats AND Cars COVER
Motor_Vehicles.'

In the absence of such a statement, we assume by default
that there is no covering constraint.
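Both constraints are statements about sets, so they can be checked with ordinary set operations. The Python sketch below does this for made-up instances; apart from Attishoo, the entity names and the two helper functions are hypothetical.

```python
def overlaps(sub_a, sub_b):
    """True if the two subclasses share at least one entity."""
    return bool(sub_a & sub_b)


def covers(superclass, *subclasses):
    """True if every superclass entity belongs to at least one subclass."""
    return superclass <= set().union(*subclasses)


# Made-up instances.
employees = {"Attishoo", "Smiley", "Smethurst", "Guldu"}
hourly_emps = {"Smiley"}
contract_emps = {"Attishoo"}
senior_emps = {"Attishoo", "Guldu"}

motor_vehicles = {"boat1", "car1", "car2"}
motorboats = {"boat1"}
cars = {"car1", "car2"}

no_overlap_by_default = not overlaps(hourly_emps, contract_emps)
declared_overlap = overlaps(contract_emps, senior_emps)   # Contract_Emps OVERLAPS Senior_Emps
employees_covered = covers(employees, hourly_emps, contract_emps)
vehicles_covered = covers(motor_vehicles, motorboats, cars)  # Motorboats AND Cars COVER Motor_Vehicles
print(no_overlap_by_default, declared_overlap, employees_covered, vehicles_covered)
```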
2.4.5 Aggregation
A relationship set is an association between entity sets.

Sometimes, we have to model a relationship between a
collection of entities and relationships.

Aggregation allows us to indicate that a
relationship set participates in another
relationship set.
Ex.: suppose that:
1- we have an entity set called Projects,
2- each Projects entity is sponsored by one or more
departments,
3- a department that sponsors a project might assign
employees to monitor the sponsorship.

Monitors should be a relationship set that
associates a Sponsors relationship (rather than
a Projects or Departments entity) with an
Employees entity.
A dashed box around Sponsors (and its participating entity
sets) is used to denote aggregation.

Sponsors is treated as an
entity set for purposes of
defining the Monitors
relationship set.
[Figure: Projects (pid, started_on, pbudget), Sponsors (since), and Departments (did, dname, budget) enclosed in a dashed box; Monitors (until) connects the box to Employees (ssn, name, lot).]
When should we use aggregation?
We use it when we need to express a relationship among
relationships.
Can we not express relationships involving other
relationships without using aggregation?

Why not make Sponsors a ternary relationship?

- There are really two distinct relationships, each with
attributes of its own. The Monitors relationship has an
attribute until that records the date until when the
employee is appointed as the sponsorship monitor.
The attribute since of Sponsors is the date when the
sponsorship took effect.
- The use of aggregation versus a ternary relationship may
also be guided by certain integrity constraints.
2.5 CONCEPTUAL DESIGN WITH THE ER MODEL
Developing an ER diagram presents several choices,
including the following:
1- Should a concept be modeled as an entity or an attribute?
2- Should a concept be modeled as an entity or a
relationship?
3- What are the relationship sets and their participating
entity sets?
4- Should we use binary or ternary relationships?
5- Should we use aggregation?
2.5.1 Entity versus Attribute

Should a property be modeled as an attribute or as
an entity set?

Ex.: consider adding address information to the
Employees entity set. There are two options:

1- Use an attribute address. This option is appropriate if
we need to record only one address per employee, and it
suffices to think of an address as a string.

2- Create an entity set called Addresses and record
associations between employees and addresses using a
relationship (say, Has_Address).
[Figure: option 1, Employees (ssn, name, lot) with an attribute address; option 2, Employees connected by Has_Address to an entity set Address (AD_ID, city, state, country).]

The second option is needed in two situations:
- We have to record more than one address for an employee.
- We want to capture the structure of an address in our ER diagram.
For example, we might break down an address into city, state, and
country. We can then support queries such as "Find all employees
with an address in Madison, WI."
Ex.: consider the relationship set called Works_In4, which has
attributes from and to instead of since.
It records the interval during which an employee works for a
department.
Now suppose that it is possible for an employee to work in a
given department over more than one period.
This possibility is ruled out by the ER diagram's semantics,
because a relationship is uniquely identified by the
participating entities.
[Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Works_In4 (from, to).]

We can address this problem by introducing an entity set
called Duration, with attributes from and to.
[Figure: Works_In4 as a ternary relationship set among Employees (ssn, name, lot), Departments (did, dname, budget), and Duration (from, to).]
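The difference between the two designs shows up in the key of the corresponding table. In the Python sqlite3 sketch below, the first table is identified by the participating employee and department only, while the second also includes the Duration values in its key. The table name Works_In4_Duration, the column names from_date and to_date, and the sample periods are assumptions made for this illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    -- from and to as descriptive attributes: identified by (ssn, did) alone.
    CREATE TABLE Works_In4 (
        ssn TEXT, did INTEGER, from_date TEXT, to_date TEXT,
        PRIMARY KEY (ssn, did)
    );
    -- Duration participates as an entity, so its attributes join the key.
    CREATE TABLE Works_In4_Duration (
        ssn TEXT, did INTEGER, from_date TEXT, to_date TEXT,
        PRIMARY KEY (ssn, did, from_date, to_date)
    );
    """
)


def record_periods(table, periods):
    """Insert each period into the given table; return how many were accepted."""
    accepted = 0
    for period in periods:
        try:
            with conn:
                conn.execute(f"INSERT INTO {table} VALUES (?, ?, ?, ?)", period)
            accepted += 1
        except sqlite3.IntegrityError:
            pass
    return accepted


# The same employee works in the same department over two separate periods.
periods = [
    ("225", 50, "1995-01-01", "1996-12-31"),
    ("225", 50, "1998-01-01", "1999-12-31"),
]
without_duration = record_periods("Works_In4", periods)
with_duration = record_periods("Works_In4_Duration", periods)
print(without_duration, with_duration)
```

Only the design with Duration as a participating entity set can hold both periods.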
2.5.2 Entity versus Relationship

Consider the relationship set called Manages.
[Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Manages (since).]

Suppose that each department manager is given a
discretionary budget (dbudget), as in the next diagram, in which we have also
renamed the relationship set to Manages2.
[Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Manages2 (since, dbudget).]

What if the discretionary budget is a sum that covers all
departments managed by that employee?
In this case, each Manages2 relationship that involves a
given employee will have the same value in the dbudget
field, leading to redundant storage of the same information.
Another problem with this design is that it is misleading: it
suggests that the budget is associated with the relationship,
when it is actually associated with the manager.


Solution:
1- Create a new entity set called Managers (which can be
placed below Employees in an ISA hierarchy, to show that
every manager is also an employee).
2- Describe a Managers entity with the attributes since and
dbudget.
[Figure: Employees (ssn, name, lot) and Departments (did, dname, budget) connected by Works_In; Managers, placed below Employees through IS_A, is connected to Departments by Manages3.]

With both since and dbudget as attributes of Managers, each manager has
only one starting date (as manager), shared by all the departments managed.

As a variation, with dbudget as an attribute of Managers and since as an
attribute of Manages3, each manager may have a different
starting date (as manager) for each department.
2.5.3 Binary versus Ternary Relationships

Ex.: the following ER diagram models a situation in which:
- an employee can own several policies,
- each policy can be owned by several employees, and
- each dependent can be covered by several policies.
[Figure: ternary relationship set Covers among Employees (ssn, name, lot), Dependents (pname, age), and Policies (policyid, cost).]
Suppose that we have the following additional
requirements:
1- A policy cannot be owned jointly by two or more
employees.
2- Every policy must be owned by some employee.
3- Dependents is a weak entity set, and each dependent
entity is uniquely identified by taking pname in conjunction
with the policyid of a policy entity (which, intuitively, covers
the given dependent).
1- A policy cannot be owned jointly by two or more
employees.

The first requirement suggests that we impose a key
constraint on Policies with respect to Covers, but this
constraint has the unintended side effect that a policy can
cover only one dependent.
2- Every policy must be owned by some employee.

The second requirement suggests that we impose a total
participation constraint on Policies. This solution is
acceptable if each policy covers at least one dependent.

3- Dependents is a weak entity set, and each dependent
entity is uniquely identified by taking pname in conjunction
with the policyid of a policy entity.

The third requirement forces us to introduce an identifying
relationship that is binary.
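[Figure: revised design with two binary relationship sets: Purchaser between Employees (ssn, name, lot) and Policies (policyid, cost), and Beneficiary between Policies and Dependents (pname, age).]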
Ex.: (ternary relationship) entity sets Parts, Suppliers, and
Departments, and a relationship set Contracts (with
descriptive attribute qty) that involves all of them.
[Figure: ternary relationship set Contracts (qty) among Parts (Pn, Pname, lot), Suppliers (Sn, Sname), and Departments (Dn, Dname).]

A contract specifies that a supplier will supply (some
quantity of) a part to a department. This relationship
cannot be adequately captured by a collection of binary
relationships (without the use of aggregation).
With binary relationships, we can denote that a supplier
'can supply' certain parts, that a department 'needs' some
parts, or that a department 'deals with' a certain
supplier.
[Figure: Parts, Suppliers, and Departments connected pairwise by binary relationship sets such as supply and deals.]
No combination of these relationships expresses the
meaning of a contract adequately, for at least two reasons:
• The facts that supplier S can supply part P, that
department D needs part P, and that D will buy from S do
not necessarily imply that department D indeed buys part P
from supplier S!
• We cannot represent the qty of a contract cleanly.
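The first reason can be demonstrated with a few lines of Python. The sketch starts from a made-up set of contracts, splits it into the three binary relationship sets, and then recombines them. All identifiers (S1, P1, D1, and so on) and the variable names are hypothetical.

```python
# Ternary relationship set: the contracts that actually exist,
# as (supplier, part, department) triples. Illustrative data only.
contracts = {
    ("S1", "P1", "D2"),
    ("S1", "P2", "D1"),
    ("S2", "P1", "D1"),
}

# Project the ternary set onto three binary relationship sets.
can_supply = {(s, p) for s, p, d in contracts}
needs = {(d, p) for s, p, d in contracts}
deals_with = {(d, s) for s, p, d in contracts}

# Recombine: every (s, p, d) that is consistent with all three binary sets.
reconstructed = {
    (s, p, d)
    for s, p in can_supply
    for d, p2 in needs
    if p2 == p and (d, s) in deals_with
}

# Triples implied by the binary sets that are not real contracts.
spurious = reconstructed - contracts
print(spurious)
```

S1 can supply P1, D1 needs P1, and D1 deals with S1, yet there is no contract in which S1 supplies P1 to D1. The binary relationship sets cannot tell this triple apart from the real contracts.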
2.5.4 Aggregation versus Ternary Relationships

The choice between using aggregation or a ternary
relationship is mainly determined by the existence of a
relationship that relates a relationship set to an entity set
(or second relationship set).

The choice may also be guided by certain integrity
constraints that we want to express.
Ex.: consider the following ER diagram.
[Figure: Projects (pid, started_on, pbudget), Sponsors (since), and Departments (did, dname, budget) enclosed in a dashed box; Monitors (until) connects the box to Employees (ssn, name, lot).]

According to this diagram, a project can be
sponsored by any number of departments, a department
can sponsor one or more projects, and
each sponsorship is monitored by one or more employees.
If we don't need to record the until attribute of Monitors,
then we might reasonably use a ternary relationship, say,
Sponsors2.
[Figure: ternary relationship set Sponsors2 among Projects (pid, started_on, pbudget), Departments (did, dname, budget), and Employees (ssn, name, lot).]
Consider the constraint that each sponsorship (of a project
by a department) be monitored by at most one employee.
We cannot express this constraint in terms of the Sponsors2
relationship set.
On the other hand, we can easily express the constraint by
drawing an arrow from the aggregated relationship Sponsors
to the relationship Monitors. Thus, the presence of such a
constraint serves as another reason for using aggregation
rather than a ternary relationship set.
[Figure: the aggregation diagram for Projects, Sponsors (since), Departments, Monitors (until), and Employees, with an arrow from the aggregated Sponsors relationship to Monitors.]
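In relational terms, aggregation lets Monitors refer to a Sponsors relationship as a whole, and the at-most-one-monitor constraint becomes a key on that reference. The Python sqlite3 sketch below is one way to express this under simplifying assumptions: the Projects, Departments, and Employees tables are omitted for brevity, the dates are made up, and assign_monitor is a hypothetical helper.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript(
    """
    CREATE TABLE Sponsors (
        pid INTEGER, did INTEGER, since TEXT,
        PRIMARY KEY (pid, did)
    );
    -- Monitors refers to a Sponsors relationship, not to a project or department.
    CREATE TABLE Monitors (
        pid INTEGER, did INTEGER, ssn TEXT NOT NULL, until TEXT,
        PRIMARY KEY (pid, did),  -- at most one monitor per sponsorship
        FOREIGN KEY (pid, did) REFERENCES Sponsors(pid, did)
    );
    INSERT INTO Sponsors VALUES (1, 50, '1999-11-08');
    """
)


def assign_monitor(pid, did, ssn, until):
    """Return True if the monitor assignment is accepted, else False."""
    try:
        with conn:
            conn.execute(
                "INSERT INTO Monitors VALUES (?, ?, ?, ?)", (pid, did, ssn, until)
            )
        return True
    except sqlite3.IntegrityError:
        return False


first_monitor = assign_monitor(1, 50, "225", "2000-12-31")
second_monitor = assign_monitor(1, 50, "226", "2001-06-30")       # rejected by the key
no_such_sponsorship = assign_monitor(2, 50, "225", "2000-12-31")  # rejected by the foreign key
print(first_monitor, second_monitor, no_such_sponsorship)
```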
2.6 CONCEPTUAL DESIGN FOR LARGE ENTERPRISES
For a large enterprise, the design may require the efforts of
more than one designer and span data and application code
used by a number of user groups.

• An important aspect of the design process is that the design
takes into account all user requirements and is consistent.
• The usual approach is that the requirements of various user
groups are considered, any conflicting requirements are
somehow resolved, and a single set of global requirements
is generated at the end of the requirements analysis phase.
• Generating a single set of global requirements is a difficult
task.
An alternative approach is to develop separate conceptual
schemas for different user groups and then integrate these
conceptual schemas.

To integrate multiple conceptual schemas, we must establish
correspondences between entities, relationships, and
attributes, and we must resolve numerous kinds of conflicts.

This task is difficult in its own right.
