SWE DataBase
INTRODUCTION TO DATABASE
HND-Common Course
Year I / Semester II
Database System
Functions of a database management system
• It allows database administrators (DBAs) and other specialists to
conveniently develop databases for an organization's various
applications.
• A DBMS allows different user application programs to concurrently
access the same database. DBMSs may use a variety of database
models, such as the relational model or object model, to
conveniently describe and support applications.
2. DATA MODELS
Data Models
Underlying the structure of a database is the data model: a collection of
conceptual tools for describing data, data relationships, data semantics,
and consistency constraints. The data structures include the data
objects, the associations between data objects, and the rules which
govern operations on the objects. As the name implies, the data model
focuses on what data is required and how it should be organized rather
than what operations will be performed on the data. To use a common
analogy, the data model is equivalent to an architect's building plans.
A data model is independent of hardware or software constraints. Rather
than trying to represent the data as a database would see it, the data model
focuses on representing the data as the user sees it in the "real world". It
serves as a bridge between the concepts that make up real-world events
and processes and the physical representation of those concepts in a
database. To illustrate the concept of a data model, we outline two data
models in this section: the entity-relationship model and the
relational model. Both provide a way to describe the design of a
database.
Methodology
There are two major methodologies used to create a data model:
• the Entity-Relationship (ER) approach and
• the Object Model.
This course uses the Entity-Relationship approach.
A. Entity-Relationship Model
The ER model is a conceptual data model that views the real world as
entities and relationships. A basic component of the model is the
Entity-Relationship diagram, which is used to visually represent data objects.
The ER model is useful because:
• It maps well to the relational model. The constructs used in the ER
model can easily be transformed into relational tables.
• It is simple and easy to understand with a minimum of training.
Therefore, the model can be used by the database designer to
communicate the design to the end user.
• In addition, the model can be used as a design plan by the database
developer to implement a data model in a specific database
management software.
(i) Entities
Entities are the principal data objects about which information is to be
collected. Entities are usually recognizable concepts, either concrete or
abstract, such as persons, places, things, or events which have relevance
to the database.
Figure: Customer is a strong entity type and the identifying entity for Order; Order is a weak
entity type, and cust-order is the identifying relationship.
Figure: Car and Truck share common attributes, which are generalized into a higher-level entity, Vehicle.
(ii) Relationships
A Relationship represents an association between two or more entities. An
example of a relationship would be:
• Employees are assigned to projects
• Projects have subtasks
• Departments manage one or more projects
COURSE FACILITATORS: NYAM STEPHANIE & TATSOPTEU E. ENDELLY
Classifying Relationships
Relationships are classified by their degree, connectivity, cardinality,
direction, type, and existence. Not all modelling methodologies use all
these classifications.
• Degree of a Relationship: The degree of a relationship is the
number of entities associated with the relationship. The n-ary
relationship is the general form for degree n. Special cases are the
binary and ternary relationships, where the degree is 2 and 3, respectively.
Binary relationships, associations between two entities, are the
most common type in the real world. A recursive binary
relationship occurs when an entity is related to itself. An example
might be "some employees are married to other employees". A
ternary relationship involves three entities and is used when a
binary relationship is inadequate. Many modelling approaches
recognize only binary relationships; in these approaches, ternary or n-ary
relationships are decomposed into two or more binary relationships.
• Connectivity and Cardinality: The connectivity of a relationship
describes the mapping of associated entity instances in the
relationship. The values of connectivity are "one" or "many". The
cardinality of a relationship is the actual number of related
occurrences for each of the two entities. The basic types of
connectivity for relationships are: one-to-one, one-to-many, and
many-to-many.
- A one-to-one (1:1) relationship is when at most one instance
of an entity A is associated with one instance of entity B. For
example, "employees in the company are each assigned their
own office". That is, for each employee there exists a unique
office and for each office there exists a unique employee.
- A one-to-many (1:N) relationship is when for one instance
of entity A, there are zero, one, or many instances of entity B,
but for one instance of entity B, there is only one instance of
entity A. An example of a 1:N relationship is "a department has
many employees", i.e., each employee is assigned to one
department.
- A many-to-many (M:N) relationship is when for one instance
of entity A, there are zero, one, or many instances of entity B, and
for one instance of entity B, there are zero, one, or many instances of entity A.
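Connectivity is fixed by the business rules, but it can be illustrated by inspecting a sample of relationship instances. The following is a minimal Python sketch, assuming the instances are given as (A, B) pairs; the function name connectivity and the sample data are hypothetical. A sample can reveal a "many" side, but it can never prove that a side is "one" for all future data.

```python
from collections import Counter


def connectivity(pairs):
    """Guess the connectivity of a binary relationship from (a, b) instance pairs.

    Returns "1:1", "1:N", "N:1" or "M:N". This only describes the sample
    given; the real connectivity comes from the business rules.
    """
    pairs = set(pairs)                        # ignore repeated pairs
    per_a = Counter(a for a, _ in pairs)      # how many B instances each A has
    per_b = Counter(b for _, b in pairs)      # how many A instances each B has
    a_many = any(n > 1 for n in per_a.values())
    b_many = any(n > 1 for n in per_b.values())
    if a_many and b_many:
        return "M:N"
    if a_many:
        return "1:N"      # one A to many B; each B linked to at most one A
    if b_many:
        return "N:1"      # the same shape seen from the other side
    return "1:1"


# Hypothetical samples (employee-office, department-employee, employee-project).
offices = [("Alice", "Office 1"), ("Bob", "Office 2")]
departments = [("Sales", "Alice"), ("Sales", "Bob"), ("IT", "Carol")]
projects = [("Alice", "P1"), ("Alice", "P2"), ("Bob", "P1")]

print(connectivity(offices), connectivity(departments), connectivity(projects))
# prints: 1:1 1:N M:N
```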
In the ER diagram above, the entity Customer has several attributes:
email address is the key attribute, telephone number is a multivalued
attribute, postal address is a composite attribute, and the remaining ones are
atomic attributes.
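The four attribute kinds can be mirrored in a record type. The following minimal Python sketch takes only email address, telephone number and postal address from the text; the other field names (name, street, city, postal_code, phones) and the sample values are hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class PostalAddress:
    # Composite attribute: built from smaller component attributes.
    street: str
    city: str
    postal_code: str


@dataclass
class Customer:
    email: str                  # key attribute: unique for each customer
    name: str                   # atomic (simple, single-valued) attribute
    address: PostalAddress      # composite attribute
    phones: list[str] = field(default_factory=list)  # multivalued attribute


ann = Customer("ann@example.com", "Ann",
               PostalAddress("1 Main St", "Springfield", "12345"),
               ["555-0101", "555-0102"])
print(ann.address.city, len(ann.phones))  # prints: Springfield 2
```

In a relational design, the multivalued phones attribute would later be split out during normalization, since a table cell may hold only one atomic value.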
ER Notation
There is no standard for representing data objects in ER diagrams. Each
modelling methodology uses its own notation. All notational styles
represent entities as rectangular boxes and relationships as lines
connecting boxes. Each style uses a special set of symbols to represent
the cardinality of a connection. The symbols used for the basic ER
constructs are:
(i) Entities:
There are various definitions of an entity:
(a) "Any distinguishable person, place, thing, event, or concept, about
which information is kept".
(b) "A thing which can be distinctly identified".
(c) "Any distinguishable object that is to be represented in a database".
B. Relational Model
The relational model is a logical data model. The elements of a
database are tables, queries ...
• A table is a collection of data about a specific topic, such as
products, students or suppliers. A table organizes data into columns
(fields) and rows (records or tuples).
• A field is a piece of information about a subject. Each field is
arranged as a column in a table.
• A record is the complete information about a subject. A record is a
collection of fields and is presented as a row in a database table.
NOTES:
• The tuples in an instance of a relation are not considered to be
ordered: putting the rows in a different sequence does not change
the table.
• Once the schema R(A1, A2, A3, …, An) is defined, the values vi
in each tuple t must be ordered as t = <v1, v2, v3, …, vn>.
Properties of relations
Properties of database relations are:
• relation name is distinct from all other relations
• each cell of relation contains exactly one atomic (single) value
• each attribute has a distinct name
• values of an attribute are all from the same domain
• order of attributes has no significance
• each tuple is distinct; there are no duplicate tuples
• order of tuples has no significance (in theory).
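Several of these properties can be checked mechanically on one relation's data. The following minimal Python sketch (hypothetical function name and sample data) checks distinct attribute names, atomic values, domains (approximated here by Python types) and duplicate tuples; relation-name uniqueness and the two "no significance" properties are not properties of the data, so they are left out.

```python
def check_relation(attributes, tuples, domains):
    """Return a list of violated relation properties (empty if none).

    attributes: list of attribute names
    tuples:     list of tuples, one value per attribute
    domains:    dict mapping each attribute name to a Python type,
                used here as a stand-in for the attribute's domain
    """
    problems = []
    if len(set(attributes)) != len(attributes):
        problems.append("attribute names are not distinct")
    for t in tuples:
        if len(t) != len(attributes):
            problems.append(f"tuple {t!r} has the wrong number of values")
            continue
        for name, value in zip(attributes, t):
            # Each cell must hold one atomic value, not a collection.
            if isinstance(value, (list, set, dict, tuple)):
                problems.append(f"{name}={value!r} is not atomic")
            elif not isinstance(value, domains[name]):
                problems.append(f"{name}={value!r} is outside its domain")
    for i, t in enumerate(tuples):
        # Compare with earlier tuples using ==, so unhashable values are fine.
        if t in tuples[:i]:
            problems.append(f"duplicate tuple {t!r}")
    return problems


attrs = ["StudentID", "Name"]
doms = {"StudentID": int, "Name": str}
print(check_relation(attrs, [(1, "Ann"), (2, "Ben")], doms))  # prints: []
print(check_relation(attrs, [(1, "Ann"), (1, "Ann")], doms))
# prints: ["duplicate tuple (1, 'Ann')"]
```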
Table: The last record of the table Enrolled shows a violation of referential integrity.
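Referential integrity requires every foreign-key value to match an existing primary-key value (or to be null). The columns of Enrolled are not listed here, so this minimal Python sketch assumes hypothetical Students and Enrolled relations linked by a StudentID column; the function name dangling_references is also hypothetical.

```python
def dangling_references(child_rows, fk, parent_rows, pk):
    """Return the child rows whose foreign key matches no parent key.

    A null (None) foreign key is not treated as a violation.
    """
    parent_keys = {row[pk] for row in parent_rows}
    return [row for row in child_rows
            if row[fk] is not None and row[fk] not in parent_keys]


students = [{"StudentID": 1, "Name": "Ann"},
            {"StudentID": 2, "Name": "Ben"}]
enrolled = [{"StudentID": 1, "Course": "DB101"},
            {"StudentID": 2, "Course": "DB101"},
            {"StudentID": 9, "Course": "SE201"}]   # no student 9 exists

print(dangling_references(enrolled, "StudentID", students, "StudentID"))
# prints: [{'StudentID': 9, 'Course': 'SE201'}]
```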
3. NORMALIZATION
• A formal process of decomposing relations that have anomalies to
produce smaller, well-structured and stable relations.
• Its objectives are to validate and improve a logical design so that it
satisfies certain constraints that avoid unnecessary duplication
of data.
Well-Structured Relations
• A relation that contains minimal data redundancy and allows users
to insert, delete, and update rows without causing data
inconsistencies.
• The goal is to avoid (or minimize) anomalies:
– Insertion Anomaly: adding new rows forces the user to create
duplicate data
– Deletion Anomaly: deleting a row may cause loss of other data
representing completely different facts
– Modification Anomaly: changing data in a row forces changes
to other rows because of duplication
Functional Dependencies
In order to be able to normalize a relation, we must first understand the
concept of dependency between attributes within a relation. There exist
various types of dependencies:
• Functional dependency: if "A" and "B" are attributes of relation
"R", "B" is functionally dependent on "A" (denoted "A" →"B"), if each
value of "A" in "R" is associated with exactly one value of "B" in "R".
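A sample relation can refute a functional dependency but never prove it, because the dependency must hold for every valid instance of R. With that caveat, the definition translates directly into a check. A minimal Python sketch; the function name holds and the employee data are hypothetical.

```python
def holds(rows, lhs, rhs):
    """Check whether the functional dependency lhs -> rhs holds in rows.

    rows: list of dicts; lhs, rhs: tuples of attribute names.
    """
    seen = {}                                  # lhs value -> rhs value
    for row in rows:
        a = tuple(row[x] for x in lhs)
        b = tuple(row[x] for x in rhs)
        if seen.setdefault(a, b) != b:
            return False                       # one A value, two B values
    return True


employees = [
    {"EmpID": 1, "Dept": "Sales", "Building": "A"},
    {"EmpID": 2, "Dept": "Sales", "Building": "A"},
    {"EmpID": 3, "Dept": "IT", "Building": "B"},
]
print(holds(employees, ("Dept",), ("Building",)))   # prints: True
print(holds(employees, ("Building",), ("EmpID",)))  # prints: False
```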
Candidate Key:
– An attribute that uniquely identifies a row in a relation
– Could be a combination of (non-redundant) attributes
– Each non-key field is functionally dependent on every candidate key
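Uniqueness and minimality can be tested on sample data in the same spirit (again, data can only refute a key, not prove it). A minimal Python sketch with hypothetical function names and enrolment data; it treats a candidate key as a superkey none of whose proper subsets is also a superkey, which is the "non-redundant" condition above.

```python
from itertools import combinations


def is_superkey(rows, attrs):
    """True if no two rows share the same values for all of attrs."""
    values = [tuple(row[a] for a in attrs) for row in rows]
    return len(values) == len(set(values))


def is_candidate_key(rows, attrs):
    """A superkey with no redundant attribute (no proper subset is a superkey)."""
    if not is_superkey(rows, attrs):
        return False
    return not any(is_superkey(rows, subset)
                   for size in range(1, len(attrs))
                   for subset in combinations(attrs, size))


enrolments = [
    {"StudentID": 1, "Course": "DB101", "Grade": "A"},
    {"StudentID": 1, "Course": "SE201", "Grade": "B"},
    {"StudentID": 2, "Course": "DB101", "Grade": "A"},
]
print(is_candidate_key(enrolments, ("StudentID", "Course")))           # True
print(is_candidate_key(enrolments, ("StudentID", "Course", "Grade")))  # False
```

In this tiny sample, (StudentID, Grade) also passes the test, which shows why keys must come from the meaning of the data rather than from one instance.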
Steps in Normalization
(i) First Normal Form (1NF)
The characteristics of the 1NF are as follows:
• Only atomic attributes (simple, single-valued)
• A primary key has been identified
• Every relation is in 1NF by definition
• 1NF example:
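A common repair for a multivalued attribute is to give each value its own row. A minimal Python sketch, assuming a hypothetical Customer relation whose Phone attribute holds a list; after flattening, the primary key becomes the combination (Email, Phone).

```python
def to_1nf(rows, multivalued):
    """Replace a multivalued attribute by one row per value.

    Rows whose list is empty produce no output row.
    """
    flat = []
    for row in rows:
        for value in row[multivalued]:
            new_row = dict(row)            # copy the single-valued attributes
            new_row[multivalued] = value   # now one atomic value per cell
            flat.append(new_row)
    return flat


customers = [
    {"Email": "ann@example.com", "Phone": ["555-0101", "555-0102"]},
    {"Email": "ben@example.com", "Phone": ["555-0201"]},
]
for row in to_1nf(customers, "Phone"):
    print(row["Email"], row["Phone"])
```

When the relation has many other attributes, an alternative that avoids repeating them is to move the phone numbers into a separate table keyed by (Email, Phone).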
Further Normalization
• Boyce-Codd Normal form (BCNF)
– Slightly stronger than 3NF
– For a relation to be in 3NF but not in BCNF, it needs at least two overlapping
composite candidate keys, with an attribute of one key depending on an attribute of the other
– Not very common
– If a table contains only one candidate key, the 3NF and the BCNF
are equivalent.
• Fourth Normal Form (4NF)
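A relation is in BCNF when the determinant of every non-trivial functional dependency is a superkey. The following minimal Python sketch tests this with attribute closure. It uses a classic textbook case with hypothetical attribute names that matches the BCNF description above: R(Student, Course, Instructor) with {Student, Course} → Instructor and Instructor → Course has the overlapping composite candidate keys {Student, Course} and {Student, Instructor}, so it is in 3NF but not in BCNF.

```python
def closure(attrs, fds):
    """All attributes functionally determined by attrs under fds.

    fds: list of (lhs, rhs) pairs, each side a frozenset of names.
    """
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs          # lhs is covered, so add what it determines
                changed = True
    return result


def bcnf_violations(schema, fds):
    """Non-trivial FDs whose left-hand side is not a superkey of schema.

    Checking only the listed FDs is enough when they are stated on R itself.
    """
    return [(lhs, rhs) for lhs, rhs in fds
            if not rhs <= lhs                      # skip trivial FDs
            and closure(lhs, fds) != set(schema)]  # lhs is not a superkey


F = frozenset
schema = {"Student", "Course", "Instructor"}
fds = [(F({"Student", "Course"}), F({"Instructor"})),
       (F({"Instructor"}), F({"Course"}))]
print(bcnf_violations(schema, fds))
```

The single violation reported is Instructor → Course; decomposing on it would bring the design to BCNF.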
4. RELATIONAL LANGUAGES
We have so far considered the structure of a database: the relations and
the associations between relations. In this section we consider how useful
data may be extracted and filtered from database tables. A relational
language is needed to express these queries in a well-defined way. A
relational language is an abstract language which provides the
database user with an interface through which they can specify data to be
retrieved according to certain selection criteria. The two main relational
languages are relational algebra and relational calculus. Relational
algebra, which we focus on here, provides the user with a set of operators.
Relational Algebra
Relational algebra is a procedural language consisting of a set of
operators. Each operator takes one or more relations as its input and
produces one relation as its output. The seven basic relational algebra
operations are Selection, Projection, Joining, Union, Intersection,
Difference and Division. It is important to note that these operations
do not alter the database. The relation produced by an operation is
available to the user but it is not stored in the database by the operation.
(i) Selection (also called Restriction) Operation
The SELECT operator selects all tuples from a relation R whose attribute
values satisfy some condition c. The syntax is:
R1 = SELECT_c(R) or R1 = σ_c(R)
A new relation R1 containing the selected tuples is then created as output.
Suppose we have the relation STORES:
We can also impose conditions on more than one attribute. This is done
by connecting the clauses using Boolean operators (AND, OR, NOT).
Examples:
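The STORES table is not reproduced here, so this minimal Python sketch assumes hypothetical attributes Store_ID, City and Size; a relation is modelled as a list of dicts and the condition c as a Python function. Boolean operators in the condition map onto Python's and, or and not.

```python
def select(relation, condition):
    """sigma_c(R): a new relation with the tuples for which condition is true."""
    return [t for t in relation if condition(t)]


stores = [
    {"Store_ID": 1, "City": "Boston", "Size": 5000},
    {"Store_ID": 2, "City": "Boston", "Size": 2000},
    {"Store_ID": 3, "City": "New York", "Size": 8000},
]

r1 = select(stores, lambda t: t["City"] == "Boston")
r2 = select(stores, lambda t: t["City"] == "Boston" and t["Size"] > 3000)
r3 = select(stores, lambda t: not t["City"] == "Boston")
print([t["Store_ID"] for t in r1])  # prints: [1, 2]
print([t["Store_ID"] for t in r2])  # prints: [1]
print([t["Store_ID"] for t in r3])  # prints: [3]
```

The input list is never modified, in line with the point that relational algebra operations do not alter the database.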
Notes on the Projection operation:
• The degree of a projection is equal to the number of attributes in the
attribute list
• If the attribute list includes only non-key attributes, duplicate tuples
are likely to occur
• The project operation removes any duplicate tuples
• The number of tuples in the result is always less than or equal to
the number of tuples in R
• If the projection list is a superkey of R (i.e., includes some key of R),
the resulting relation has the same number of tuples as R
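These notes can be seen in a small sketch. This minimal Python version (hypothetical STORES data, relations as lists of dicts) keeps the listed attributes and drops duplicate tuples, so projecting on the non-key attribute City shrinks the result, while a list that includes the key Store_ID keeps every tuple.

```python
def project(relation, attributes):
    """pi_attributes(R): keep only the listed attributes, without duplicates."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:          # a relation never holds duplicate tuples
            result.append(row)
    return result


stores = [
    {"Store_ID": 1, "City": "Boston", "Size": 5000},
    {"Store_ID": 2, "City": "Boston", "Size": 2000},
    {"Store_ID": 3, "City": "New York", "Size": 8000},
]
print(project(stores, ["City"]))
# prints: [{'City': 'Boston'}, {'City': 'New York'}]
print(len(project(stores, ["Store_ID", "City"])))  # prints: 3
```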
This relation resulted from joining ITEMS and STORES over the
common attribute Store-ID: tuples of the two relations that
contained the same value of Store-ID were joined together to form a single tuple.
Joining relations based on equality of the values of common
attributes is called an equijoin. When duplicate attributes are removed
from the result of an equijoin, the result is called a natural join, denoted:
R5 = R JOIN R’, or alternatively R5 = R ⋈ R’.
The example above is such a natural join, as Store-ID appears only once
in the result. Note that there is often a connection between keys (primary
and foreign) and the attributes over which a join is performed in order to
amalgamate information from multiple related tables in a database. In the
above example, ITEMS.Store_ID is a foreign key referencing the primary key
STORES.Store_ID. When we join on Store_ID, the relationship between the
tables is expressed explicitly in the resulting output table. To illustrate, the
relationship between these relations can be expressed as an E-R diagram,
shown below.
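A natural join can be sketched as a nested loop that pairs tuples agreeing on every common attribute and keeps each shared attribute once. The ITEMS and STORES contents below are hypothetical; only the common attribute Store_ID comes from the text. A real DBMS uses faster join algorithms; the nested loop is simply the easiest correct one to read.

```python
def natural_join(r, s):
    """R ⋈ S: merge tuples that agree on all attributes common to R and S.

    With no common attributes every pair matches (a Cartesian product).
    """
    result = []
    for t in r:
        for u in s:
            common = t.keys() & u.keys()
            if all(t[a] == u[a] for a in common):
                result.append({**t, **u})   # common attributes appear once
    return result


items = [
    {"Item_ID": "I1", "Description": "Pen", "Store_ID": 1},
    {"Item_ID": "I2", "Description": "Ink", "Store_ID": 2},
    {"Item_ID": "I3", "Description": "Pad", "Store_ID": 1},
]
stores = [
    {"Store_ID": 1, "City": "Boston"},
    {"Store_ID": 2, "City": "New York"},
]
for row in natural_join(items, stores):
    print(row)
```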
As an exercise, find:
C = UNION(A,B), C = INTERSECTION(A,B) and C = DIFFERENCE(A,B).
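Relations A and B are not given here, so the sketch below does not solve the exercise; it only shows how the three operations behave on two hypothetical union-compatible relations (same attributes, here Store_ID and City), modelled as Python sets of tuples.

```python
# Two union-compatible relations: both have the attributes (Store_ID, City).
A = {("S1", "Boston"), ("S2", "New York"), ("S3", "Chicago")}
B = {("S2", "New York"), ("S4", "Denver")}

union = A | B          # tuples in A, in B, or in both (each appears once)
intersection = A & B   # tuples in both A and B
difference = A - B     # tuples in A but not in B; note that B - A differs

print(sorted(union))
print(sorted(intersection))  # prints: [('S2', 'New York')]
print(sorted(difference))    # prints: [('S1', 'Boston'), ('S3', 'Chicago')]
```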
The operation:
R8 = R6 / R7
will give the result:
This is because C3 is the only company for which there is both a row with
Boston and a row with New York. The other companies, C1 and C2, do not satisfy this
condition.
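Division answers "which X values are paired with every Y value in the divisor?". The rows of R6 and R7 are not listed here, so this minimal Python sketch assumes R6 pairs companies with cities and R7 holds the cities Boston and New York; the sample rows are hypothetical but reproduce the situation described, where only C3 has a row for both cities.

```python
def divide(r, s):
    """R / S for R(X, Y) and S(Y): the X values paired with every Y in S."""
    xs = {x for x, _ in r}
    return {x for x in xs if all((x, y) in r for y in s)}


r6 = {("C1", "Boston"), ("C2", "New York"), ("C2", "Chicago"),
      ("C3", "Boston"), ("C3", "New York"), ("C3", "Chicago")}
r7 = {"Boston", "New York"}
r8 = divide(r6, r7)
print(r8)  # prints: {'C3'}
```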
Aggregation Operators
Relational calculus
Unlike relational algebra, which is a procedural query language that
specifies both what data to fetch and how to fetch it, relational calculus
is a non-procedural query language: it does not describe how the
query will work or how the data will be fetched. It focuses only on what to do.
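The contrast can be made concrete with a single query. Assuming a hypothetical attribute City in the STORES relation, relational algebra states an operation to apply (a selection), while the tuple relational calculus form of relational calculus only describes the tuples wanted: the set of all tuples t of STORES whose City is Boston.

```latex
\begin{aligned}
&\text{Relational algebra:} && \sigma_{\mathit{City} = \text{`Boston'}}(\mathit{STORES})\\
&\text{Tuple relational calculus:} && \{\, t \mid t \in \mathit{STORES} \land t.\mathit{City} = \text{`Boston'} \,\}
\end{aligned}
```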