
Database 3rd Semester All Chapters

The document discusses different approaches to data handling, including manual, traditional file-based, and database approaches. It describes the key features and limitations of each approach. The database approach emphasizes data integration and sharing across an organization using a centralized database managed by a database management system.

Uploaded by

demeereta.13

CHAPTER ONE

Introduction to Database Systems


Database systems are designed to manage large data sets in an organization. Data
management involves both the definition and the manipulation of data, and ranges from
the simple representation of data to the design of structures for storing
information. It also covers the provision of mechanisms for manipulating
information.
Today, databases are essential to every business. They are used to maintain internal
records, to present data to customers and clients on the World Wide Web, and to support
many other commercial processes. Databases are likewise found at the core of many
modern organizations.
The power of databases comes from a body of knowledge and technology that has
developed over several decades and is embodied in specialized software called a database
management system, or DBMS. A DBMS is a powerful tool for creating and managing
large amounts of data efficiently and for allowing that data to persist safely over long
periods of time. These systems are among the most complex types of software available.
So, what is a database? In essence, a database is nothing more than a
collection of shared information that exists over a long period of time, often many years.
In common usage, the term database refers to a collection of data that is managed by a
DBMS.

Thus the DB course is about:


◼ How to organize data
◼ Supporting multiple users
◼ Efficient and effective data retrieval
◼ Secured and reliable storage of data
◼ Maintaining consistent data
◼ Making information useful for decision making

Data Handling approaches


Data management has passed through different levels of development, along with
the development of technology and services. These levels can best be described
in three categories. Even though each new level brings
an advantage and overcomes a problem of the previous one, all methods of data
handling are still in use to some extent.
Fundamentals of Database Systems Organized By Melkam A.
The three major levels are:

1. Manual Approach
2. Traditional File Based Approach
3. Database Approach
1. Manual Approach
In the manual approach, data storage and retrieval follows the primitive and traditional
way of information handling where cards and paper are used for the purpose. The data
storage and retrieval will be performed using human labour.

Files are used to store information for as many events and objects as the organization has.
• Each file, containing various kinds of information, is labelled and stored in
one or more cabinets.
• The cabinets can be kept in safe places for security purposes, based on the
sensitivity of the information they contain.
• Insertion and retrieval are done by searching first for the right cabinet, then for the
right file, and then for the information.
• One could have an indexing system to facilitate access to the data.

Limitations of the Manual approach


• Prone to error
• Difficult to update, retrieve, integrate
• You have the data but it is difficult to compile the information
• Limited to small size information
• Cross referencing is difficult

An alternative approach to data handling is a computerized way of dealing with
the information. The computerized approach can be either decentralized
or centralized, based on where the data resides in the system.

2. Traditional File Based Approach
After computers were introduced to the business community for data processing, the
need to use them for data storage and processing increased. There were, and still are,
several computer applications with file-based processing used for data
handling. Even though the approach has evolved over time, the basic structure is still similar,
if not identical.
• File-based systems were an early attempt to computerize the manual filing system.
• This approach is the decentralized computerized data handling method.
• A collection of application programs performs services for the end users. In such
systems, every application program that provides a service to end users defines and
manages its own data.
• Such systems have a number of programs for each of the different applications in
the organization.
• Since every application defines and manages its own data, the system is subject to
a serious data duplication problem.
• A file, in the traditional file-based approach, is a collection of records that contain
logically related data.

Limitations of the Traditional File Based approach


As business applications became more complex and demanded more flexible and reliable data
handling methods, the shortcomings of the file-based system became evident. These
shortcomings include, but are not limited to:
• Separation or isolation of data: information available in one application may not
be known to the others. Data synchronisation is done manually.
• Limited data sharing: every application maintains its own data.
• Lengthy development and maintenance time
• Duplication or redundancy of data (costing money and time, and causing loss of data
integrity)

• Data dependency on the application- data structure is embedded in the
application; hence, a change in the data structure needs to change the application
as well.
• Incompatible file formats or data structures (e.g. “C” and COBOL) between
different applications and programs, which create inconsistency and make the data difficult to
process jointly.
• Fixed query processing, which is defined during application development
The limitations of the traditional file-based data handling approach arise from two basic
causes.
• The definition of the data is embedded in the application program, which makes it
difficult to modify the data definition.
• There is no control over the access and manipulation of the data beyond that imposed by
the application programs.
The most significant problem experienced by the traditional file-based approach to data
handling can be formalized as what are called “update anomalies”. There are three types of
update anomalies:
Modification Anomalies: a problem experienced when one or more data values are
modified in one application program but not in others containing the same data set.
Deletion Anomalies: a problem encountered when a record set is deleted from one
application but remains untouched in other application programs.
Insertion Anomalies: a problem experienced whenever a new data item is to be
recorded and the recording is not made in all the applications. In addition, when the same data item
is inserted in different applications, encoding errors can cause the
new data item to be treated as a totally different object.
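
The anomalies described here can be made concrete with a small Python sketch. The two dictionaries are hypothetical stand-ins for the private files of two application programs (a registrar and a library application); the names and data are illustrative assumptions, not taken from the text.

```python
# Two applications, each keeping its own private copy of student data
# (hypothetical file contents, modelled here as dictionaries).
registrar_file = {"S001": {"name": "Abebe", "phone": "0911-000001"}}
library_file = {"S001": {"name": "Abebe", "phone": "0911-000001"}}


def update_phone(app_file, student_id, new_phone):
    """Update the phone number in ONE application's file only."""
    app_file[student_id]["phone"] = new_phone


def find_inconsistencies(file_a, file_b):
    """Return the IDs whose records differ between the two files,
    including IDs that exist in only one of them."""
    all_ids = set(file_a) | set(file_b)
    return sorted(i for i in all_ids if file_a.get(i) != file_b.get(i))


# Modification anomaly: the registrar updates its copy, the library does not.
update_phone(registrar_file, "S001", "0911-999999")

# Insertion anomaly: a new student is recorded in one application only.
registrar_file["S002"] = {"name": "Sara", "phone": "0911-000002"}

# A deletion anomaly would surface in the same check: a record removed
# from one file would still exist in the other.
inconsistent = find_inconsistencies(registrar_file, library_file)
print(inconsistent)
```

Because nothing outside the applications coordinates the two copies, the inconsistency is only found if someone explicitly compares the files.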

3. Database Approach
In the database approach, the system presents the user with a view of data organized as tables called
relations. Behind the scenes, there may be a complex data structure that allows rapid
response to a variety of queries. However, unlike the user of earlier systems, the user
of a relational system need not be concerned with the storage structure. Queries can
be expressed in a very high-level language, which greatly increases the efficiency of
database programmers. The database approach emphasizes the integration and sharing of
data throughout the organization.

Thus in Database Approach:


• Database is just a computerized record keeping system or a kind of electronic
filing cabinet.
• Database is a repository for collection of computerized data files.
• Database is a shared collection of logically related data and description of data
designed to meet the information needs of an organization. Since it is a shared
corporate resource, the database is integrated with minimum amount of or no
duplication.
• Database is a collection of logically related data where these logically related data
comprises entities, attributes, relationships, and business rules of an organization's
information.
• In addition to containing the data required by an organization, a database also contains
a description of the data, which is known as “Metadata”, “Data Dictionary”,
“Systems Catalogue”, “Data about Data” or sometimes “Data Directory”.
• Since a database contains information about the data (metadata), it is called a self-
descriptive collection of integrated records.
• The purpose of a database is to store information and to allow users to retrieve
and update that information on demand.
• A database is designed once and used simultaneously by many users.
• Unlike the traditional file-based approach, the database approach provides program-data
independence, that is, the separation of the data definition from the
application. Thus the application is not affected by changes made to the data
structure and file organization.
• Each database application performs some combination of creating, reading,
updating and deleting data (often abbreviated CRUD).
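
The create, read, update and delete operations in the last bullet can be sketched with Python's standard `sqlite3` module. This is a minimal illustration, assuming a hypothetical `student` table; the table, column names and data are not from the text.

```python
import sqlite3

# In-memory database so the sketch leaves no file behind.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Create: the table is defined once and shared by every application.
cur.execute("CREATE TABLE student (stud_no INTEGER PRIMARY KEY, name TEXT NOT NULL)")
cur.execute("INSERT INTO student (stud_no, name) VALUES (?, ?)", (1, "Abebe"))
cur.execute("INSERT INTO student (stud_no, name) VALUES (?, ?)", (2, "Sara"))

# Read
rows_after_insert = cur.execute(
    "SELECT stud_no, name FROM student ORDER BY stud_no"
).fetchall()

# Update
cur.execute("UPDATE student SET name = ? WHERE stud_no = ?", ("Sara T.", 2))

# Delete
cur.execute("DELETE FROM student WHERE stud_no = ?", (1,))

rows_at_end = cur.execute(
    "SELECT stud_no, name FROM student ORDER BY stud_no"
).fetchall()
conn.commit()
conn.close()

print(rows_after_insert)
print(rows_at_end)
```

The application code never states how or where the rows are stored; it names only the table and its columns, which is the program-data independence described above.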

Benefits of the database approach


• Data can be shared: two or more users can access and use same data instead of
storing data in redundant manner for each user.
• Improved accessibility of data: by using structured query languages, the users can
easily access data without programming experience.
• Redundancy can be reduced: isolated data is integrated in database to decrease the
redundant data stored at different applications.
• Quality data can be maintained: the different integrity constraints in the database
approach will maintain the quality leading to better decision making
• Inconsistency can be avoided: controlled data redundancy will avoid
inconsistency of the data in the database to some extent.
• Transaction support can be provided: the basic demands of any transaction support
system are implemented in a full-scale DBMS.
• Integrity can be maintained: data at different applications will be integrated
together with additional constraints to facilitate validity and consistency of shared
data resource.
• Security measures can be enforced: the shared data can be secured by having
different levels of clearance and other data security mechanisms.
• Improved decision support: the database will provide information useful for
decision making.
• Standards can be enforced: the different ways in which different units of an
organization use and deal with data can be balanced and standardized by using
the database approach.
• Compactness: since it is an electronic data handling method, the data is stored
compactly (no voluminous papers).

• Speed: data storage and retrieval is fast as it will be using the modern fast
computer systems.
• Less labour: unlike the other data handling methods, data maintenance will not
demand much resource.
• Centralized information control: since relevant data in the organization will be
stored at one repository, it can be controlled and managed at the central level.
Limitations and risk of Database Approach
➢ Introduction of new professional and specialized personnel.
➢ Complexity in designing and managing data
➢ The cost and risk during conversion from the old to the new system
➢ High cost to be incurred to develop and maintain the system
➢ Complex backup and recovery services from the user's perspective
➢ Reduced performance due to centralization and data independence
➢ High impact on the system when failure occurs to the central system.

Database Management System (DBMS)
A Database Management System (DBMS) is a software package that provides
EFFICIENT, CONVENIENT and SAFE MULTI-USER (many people or programs
accessing the same database, or even the same data, simultaneously) storage of and access to
MASSIVE amounts of PERSISTENT (data outlives the programs that operate on it) data. A
DBMS also provides a systematic method for creating, updating, storing and retrieving data
in a database, as well as services for controlling data access, enforcing data
integrity, managing concurrency and recovering from failures.

Having this in mind, a full scale DBMS should at least have the following services to
provide to the user.
➢ Data storage, retrieval and update in the database
➢ A user accessible catalogue
➢ Transaction support service: ALL or NONE transactions, which minimize data
inconsistency.
➢ Concurrency Control Services: access and update on the database by different
users simultaneously should be implemented correctly.
➢ Recovery Services: a mechanism for recovering the database after a failure must
be available.
➢ Authorization Services (Security): must support the implementation of access and
authorization service to database administrator and users.
➢ Support for Data Communication: should provide the facility to integrate with
data transfer software or data communication managers.
➢ Integrity Services: rules about the data and the changes made to it that ensure the
correctness and consistency of stored data, and the quality of data based on business
constraints.
➢ Services to promote independence between the data and the application
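
The ALL or NONE transaction service listed above can be illustrated with a short sketch using Python's standard `sqlite3` module. The `account` table, the `transfer` helper and the amounts are hypothetical, chosen only to show a transaction that either completes fully or leaves no trace.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE account (id INTEGER PRIMARY KEY, "
    "balance INTEGER NOT NULL CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100), (2, 50)])
conn.commit()


def transfer(connection, src, dst, amount):
    """Move `amount` from src to dst as one all-or-none unit (hypothetical helper)."""
    try:
        # `with connection` commits on success and rolls back on an exception.
        with connection:
            connection.execute(
                "UPDATE account SET balance = balance + ? WHERE id = ?", (amount, dst)
            )
            connection.execute(
                "UPDATE account SET balance = balance - ? WHERE id = ?", (amount, src)
            )
        return True
    except sqlite3.IntegrityError:
        return False


ok_small = transfer(conn, 1, 2, 30)   # fits the balance, so both updates are kept
ok_large = transfer(conn, 1, 2, 500)  # debit violates the CHECK, so the credit is undone too
balances = conn.execute("SELECT id, balance FROM account ORDER BY id").fetchall()
conn.close()
print(ok_small, ok_large, balances)
```

In the failing transfer the credit to account 2 has already been applied when the debit is rejected; the rollback removes it, so the database never shows a state in which money was created.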

DBMS and Components of DBMS Environment

Fig. General architecture of a DBMS

A DBMS is a software package used to design, manage, and maintain databases. Each
DBMS should have facilities to define the database, manipulate the content of the
database and control the database. These facilities help the designer, the user and
the database administrator to discharge their responsibilities in designing, using and
managing the database.
A DBMS provides the following facilities:

• Data Definition Language (DDL):


✓ Language used to define each data element required by the organization.
✓ Commands for setting up the schema, or the intension, of the database
✓ These commands are used to set up a database and to create, delete and alter tables,
with the facility of handling constraints

• Data Manipulation Language (DML):
✓ The core commands used by end users and programmers to store, retrieve,
and access the data in the database, e.g. SQL
✓ Since the data required by the user (the query) is extracted using this
type of language, it is also called a "Query Language"

• Data Control Language:


✓ Database is a shared resource that demands control of data access and
usage. The database administrator should have the facility to control the
overall operation of the system.
✓ Data Control Languages are commands that will help the Database
Administrator to control the database.
✓ The commands include granting or revoking privileges to access the database or
a particular object within the database, and storing or removing database
transactions (that is, committing them or rolling them back).
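
The difference between data definition and data manipulation commands can be sketched with Python's standard `sqlite3` module. The `employee` table and its data are hypothetical. SQLite does not implement GRANT and REVOKE, so the data control commands are not shown in runnable form here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# --- DDL: define and alter the schema, including constraints ---
conn.execute(
    "CREATE TABLE employee ("
    " emp_id INTEGER PRIMARY KEY,"
    " name TEXT NOT NULL,"
    " salary REAL CHECK (salary >= 0))"
)
conn.execute("ALTER TABLE employee ADD COLUMN department TEXT")

# --- DML: store and retrieve data ---
conn.execute(
    "INSERT INTO employee (emp_id, name, salary, department) VALUES (?, ?, ?, ?)",
    (1, "Hana", 5000.0, "Finance"),
)
names = [row[0] for row in conn.execute("SELECT name FROM employee WHERE salary > 1000")]

# A constraint declared in the DDL is enforced on later DML.
try:
    conn.execute("INSERT INTO employee (emp_id, name, salary) VALUES (2, 'Test', -1)")
    constraint_enforced = False
except sqlite3.IntegrityError:
    constraint_enforced = True

# --- DDL again: remove the table definition ---
conn.execute("DROP TABLE employee")
remaining_tables = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()
conn.close()
print(names, constraint_enforced, remaining_tables)
```

In a multi-user DBMS that supports data control commands, the administrator would additionally issue statements of the GRANT and REVOKE kind to decide which users may run the DML above.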

The DBMS is a software package that helps to design, manage, and use data using the
database approach. Taking a DBMS as a system, one can describe it with respect to its
environment or the other systems interacting with it. The DBMS environment has
five components. Designing and using a database involves the interaction or integration
of Hardware, Software, Data, Procedure and People.

1. Hardware: the components that one can touch and feel. These
comprise various types of personal computers, mainframes or server
computers to be used in a multi-user system, network infrastructure, and other
peripherals required in the system.

2. Software: the collection of commands and programs used to make the
hardware perform a function. This includes components like the DBMS
software, application programs, operating systems, network software, language
software and other relevant software.

3. Data: since the goal of any database system is to have better control of the data
and to make data useful, data is the most important component to the user of the
database. There are two categories of data in any database system:
operational data and metadata. Operational data is the data actually stored in the
system to be used by the user. Metadata is the data that is used to store
information about the database itself. The structure of the data in the database is
called the schema, which is composed of the entities, the properties of entities, the
relationships between entities and the business constraints.

4. Procedure: the rules and regulations on how to design and use a
database. This includes procedures such as how to log on to the DBMS, how to use its
facilities, how to start and stop the DBMS, how to make backups, how to handle
hardware and software failures, and how to change the structure of the database.

5. People: this component is composed of the people in the organization who are
responsible for, or play a role in, designing, implementing, managing, administering
and using the resources in the database. It ranges from people
with a high level of knowledge about the database and the design technology to
others with no knowledge of the system beyond using the data in the database.
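
The distinction between operational data and metadata drawn under the Data component can be seen directly in a small DBMS. The sketch below uses Python's standard `sqlite3` module with a hypothetical `course` table; `sqlite_master` and `PRAGMA table_info` are SQLite's own catalogue facilities, and other DBMSs expose their catalogues differently.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE course (code TEXT PRIMARY KEY, title TEXT NOT NULL, credits INTEGER)"
)
conn.execute("INSERT INTO course VALUES ('DB101', 'Fundamentals of Database Systems', 3)")

# Operational data: the rows users actually work with.
operational = conn.execute("SELECT code, credits FROM course").fetchall()

# Metadata: SQLite's catalogue table records the definition of each table.
catalogue = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, default value, pk).
columns = [(row[1], row[2]) for row in conn.execute("PRAGMA table_info(course)")]
conn.close()

print(operational)
print(catalogue)
print(columns)
```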

Database Development Life Cycle (DDLC)

Database design is one component of most information system development tasks, and it
involves several steps. Here more emphasis is given to the design phases
of the system development life cycle. The major steps in database design are:

1. Planning: identifying an information gap in an organization and proposing a
database solution to solve the problem.

2. Analysis: that concentrates more on fact finding about the problem or the
opportunity. Feasibility analysis, requirement determination and structuring, and
selection of best design method are also performed at this phase.

3. Design: in database development more emphasis is given to this phase. The
phase is further divided into three sub-phases.
a. Conceptual Design: a concise description of the data, the data types, the
relationships between data and the constraints on the data.
• There is no implementation or physical detail consideration.
• Used to elicit and structure all information requirements
b. Logical Design: a higher-level conceptual abstraction expressed in a selected
specific data model to implement the data structure.
• It is independent of any particular DBMS and has no other physical
considerations.
c. Physical Design: the physical implementation of the logical design of the
database with respect to the internal storage and file structure of the database
for the selected DBMS.
• Used to develop all technology and organizational specifications.

4. Implementation: the testing and deployment of the designed database for
use.

5. Operation and Support: administering and maintaining the operation of
the database system and providing support to users. Tuning the database
operations for best performance.

Roles in Database Design and Use

As people are one of the components of the DBMS environment, there is a group of roles
played by the different stakeholders in the design and operation of a database system.

Database Administrator (DBA)


• Responsible to oversee, control and manage the database resources (the database
itself, the DBMS and other related software)
• Authorizing access to the database
• Coordinating and monitoring the use of the database
• Responsible for determining and acquiring hardware and software resources
• Accountable for problems like poor security, poor performance of the system
• Is involved in all steps of database development
• This role can be further divided in big organizations that have a huge
amount of data and many user requirements.
a. Data Administrator (DA): is responsible for the management of data resources.
The DA is involved in database planning, development, and the maintenance of standards,
policies and procedures at the conceptual and logical design phases.
b. Database Administrator (DBA): this is a more technically oriented role. The DBA
is responsible for the physical realization of the database and is involved in
physical design, implementation, security and integrity control of the
database.

Database Designer (DBD)


✓ Identifies the data to be stored and chooses the appropriate structures to represent
and store the data.
✓ Should understand the user requirement and should choose how the user views the
database.
✓ Is involved in the design phase before the implementation of the database system.
✓ There are two kinds of database designers: one involved in the logical and
conceptual design and another involved in the physical design.

Application Programmer and Systems Analyst


The systems analyst determines the user requirements and how the user wants to view the
database.
• The application programmer implements these specifications as programs; code,
test, debug, document and maintain the application program.
• The application programmer determines the interface on how to retrieve, insert,
update and delete data in the database.
• The application could use any high level programming language according to the
availability, the facility and the required service.

End Users
End users are workers whose jobs require accessing the database frequently for various
purposes. There are different groups of users in this category.
• Naïve Users:
a. Sizable proportion of users
b. Unaware of the DBMS
c. Only access the database based on their access level and demand
d. Use standard and pre-specified types of queries.
• Sophisticated Users
a. Users familiar with the structure of the Database and facilities of the
DBMS.
b. Have complex requirements
c. Have higher level queries
d. Are most of the time engineers, scientists, business analysts, etc
• Casual Users
a. Users who access the database occasionally.
b. Need different information from the database each time.
c. Use sophisticated database queries to satisfy their needs.
d. Are most of the time middle to high level managers.

The people involved with a database can also be classified as “Actors on the Scene” and
“Workers behind the Scene”.

Actors on the Scene:


➢ Data Administrator
➢ Database Administrator
➢ Database Designer
➢ End Users

Workers behind the scene


➢ DBMS designers and implementers: who design and implement different DBMS
software.
➢ Tool Developers: experts who develop software packages that facilitate
database system design and use. Developers of prototyping, simulation and code
generation tools are examples. Independent software vendors can also be
categorized in this group.
➢ Operators and Maintenance Personnel: system administrators who are
responsible for actually running and maintaining the hardware and software of
the database system and the information technology facilities.

CHAPTER TWO

Database System Concepts and Architecture

Relational Data Model


The relational model uses a collection of tables to represent both data and the
relationships among those data.
A data model is a collection of concepts that can be used to describe the structure of a
database. A characteristic of the database approach is that it provides a level of data
abstraction, by hiding details of data storage that are not needed by most users.
The model provides the necessary means to achieve the abstraction.
The structure of a database is characterized by data types, relationships, and constraints
that hold for the data. Models also include a set of operations for specifying retrievals
and updates.
Data models are changing to include concepts to specify the behaviour of the database
application. This allows designers to specify a set of user defined operations that are
allowed.
The relational model is a combination of three components:
1. Structural Part: The structural part defines the database as a collection of relations.
2. Integrity Part: The database integrity is maintained in the relational model using
primary and foreign keys.
3. Manipulative Part: The relational algebra and relational calculus are the tools used to
manipulate data in the database.
Thus the relational model has a strong mathematical foundation.

Categories of Data Models


Data models can be categorized in multiple ways.
• High level (conceptual) data models – provide concepts close to the way users
perceive the data.
• Physical (low level) data models – provide concepts that describe the details of
how data is stored in the computer. They describe how data is stored in files by
representing record formats, record orderings and access paths. These concepts
are generally meant for the specialist, and not the end user.
• Representational (implementation) data models – provide concepts that may
be understood by the end user but are not far removed from the way data is organized.
Conceptual data models use concepts such as entities, attributes and relationships.
• Entity – represents a real world object or concept
• Attribute - represents property of interest that describes an entity, such as name
or salary.
• Relationship – represents an association among two or more entities.
Representational data models are used most frequently in commercial DBMSs. They
include relational data models, and legacy models such as network and hierarchical
models.
The key features of relational data model are as follows:
❖ Each row in the table is called a tuple.
❖ Each column in the table is called an attribute.
❖ The intersection of a row with a column holds a data value.
❖ In the relational model, rows can be in any order.
❖ In the relational model, attributes can be in any order.
❖ By definition, all rows in a relation are distinct. No two rows can be exactly the
same.
❖ Relations must have a key. A key can be a set of attributes.
❖ For each column of a table there is a set of possible values called its domain.
The domain is the set of valid values that can appear under that column.
❖ The degree of the relation is the number of attributes (columns) in the relation.
❖ The cardinality of the relation is the number of tuples (rows) in the relation.
Table and Relation
A common question that arises when one reads about the relational model is the difference
between a table and a relation.
For a table to be a relation, the following rules must hold:

• The intersection of a row with a column should contain a single value (atomic value).
• All entries in a column are of the same type.
• Each column has a unique name (column order is not significant).
• No two rows are identical (row order is not significant).
Example of the relational model: the table named movie

Movie Name   Director        Actor              Actress
Titanic      James Cameron   Leonardo DiCaprio  Kate Winslet
Autograph    Cheran          Cheran             Gopika
Roja         Maniratnam      Aravind Swamy      Madubala

In the relation above:
• The degree of the relation (i.e., the number of columns in the relation) = 4.
• The cardinality of the relation (i.e., the number of rows in the relation) = 3.
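
The degree and cardinality of the movie relation, and the mechanical parts of the table-versus-relation rules, can be checked with a few lines of Python. The function names are illustrative assumptions; the rule that all entries in a column share a type is left out for brevity.

```python
# The movie relation: a heading (attribute names) and a body (tuples).
attributes = ("Movie Name", "Director", "Actor", "Actress")
movie = [
    ("Titanic", "James Cameron", "Leonardo DiCaprio", "Kate Winslet"),
    ("Autograph", "Cheran", "Cheran", "Gopika"),
    ("Roja", "Maniratnam", "Aravind Swamy", "Madubala"),
]


def degree(attrs):
    """Number of attributes (columns)."""
    return len(attrs)


def cardinality(rows):
    """Number of tuples (rows)."""
    return len(rows)


def is_relation(attrs, rows):
    """Check unique column names, one value per column, atomic values, no duplicate rows."""
    if len(set(attrs)) != len(attrs):
        return False
    if any(len(row) != len(attrs) for row in rows):
        return False
    if any(isinstance(v, (list, tuple, set, dict)) for row in rows for v in row):
        return False
    return len(set(rows)) == len(rows)


movie_degree = degree(attributes)
movie_cardinality = cardinality(movie)
movie_is_relation = is_relation(attributes, movie)

# Repeating a row breaks the "no two rows are identical" rule.
with_duplicate = movie + [movie[0]]
duplicate_is_relation = is_relation(attributes, with_duplicate)
print(movie_degree, movie_cardinality, movie_is_relation, duplicate_is_relation)
```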
Concept of Key
Key is an attribute or group of attributes, which is used to identify a row in a relation.
Key can be broadly classified into:
1. Super key: A super key is a subset of attributes of an entity-set that uniquely identifies
the entities. Super keys represent a constraint that prevents two entities from ever having
the same value for those attributes.
Adding zero or more attributes to a candidate key generates a super key.
Every candidate key is a super key, but the converse is not true.
Example of super keys: imagine a table with the fields <Name>, <Age>, <SSN> and
<Phone Extension>. This table has many possible super keys. Three of these are <SSN>,
<Phone Extension, Name> and <SSN, Name>.
2. Candidate Key: a candidate key is a minimal super key, that is, a super key from which
no attribute can be removed without losing uniqueness. A candidate key can be any
column or combination of columns that can qualify as a unique key in the database. There
can be multiple candidate keys in one table, and each candidate key can qualify as the primary
key.
For example: STUD_NO in the STUDENT relation.
The value of a candidate key is unique and non-null for every tuple. For example, STUD_NO
and STUD_PHONE are both candidate keys for the relation STUDENT.

3. Primary Key: the primary key is a designated candidate key. The
primary key must not be null. When a relation has more than one candidate key,
one of them is chosen as the primary key.
For example: STUD_NO and STUD_PHONE are both candidate keys for the relation
STUDENT, but STUD_NO can be chosen as the primary key (only one out of many
candidate keys).
Example: consider the employee relation, which is characterized by the attributes
employee ID, employee name, employee age, employee experience, employee salary, etc.
In this employee relation:
✓ Super keys are employee ID alone or employee ID combined with any other
attributes, such as employee name, employee age or employee experience.
✓ Candidate keys: employee ID. Attributes such as employee name or employee age are
candidate keys only if their values are guaranteed to be unique for every employee.
✓ Primary key is employee ID.
Note: if we declare a particular attribute as the primary key, then that attribute value
cannot be NULL. It must also be distinct for every tuple.
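
The relationship between super keys and candidate keys can be explored with a short Python sketch over the <Name>, <Age>, <SSN>, <Phone Extension> example. The sample rows and function names are hypothetical. One caveat: a key is a constraint on every valid state of a relation, so uniqueness in one sample can rule a key out but cannot prove it.

```python
from itertools import combinations

attributes = ("Name", "Age", "SSN", "PhoneExt")
# Hypothetical sample tuples, for illustration only.
rows = [
    ("Almaz", 30, "111", "201"),
    ("Almaz", 30, "222", "202"),
    ("Bekele", 30, "333", "201"),
]


def is_superkey(attr_subset, attrs, data):
    """True if no two rows agree on all attributes in attr_subset."""
    idx = [attrs.index(a) for a in attr_subset]
    projected = [tuple(row[i] for i in idx) for row in data]
    return len(set(projected)) == len(projected)


def candidate_keys(attrs, data):
    """Minimal super keys of this sample, found by trying smaller subsets first."""
    found = []
    for size in range(1, len(attrs) + 1):
        for subset in combinations(attrs, size):
            if not is_superkey(subset, attrs, data):
                continue
            # Skip any super key that contains an already-found smaller key.
            if any(set(key) <= set(subset) for key in found):
                continue
            found.append(subset)
    return found


ssn_name_is_superkey = is_superkey(("SSN", "Name"), attributes, rows)
name_is_superkey = is_superkey(("Name",), attributes, rows)
keys = candidate_keys(attributes, rows)
print(ssn_name_is_superkey, name_is_superkey, keys)
```

Here <SSN, Name> is a super key but not a candidate key, because removing Name still leaves a unique identifier.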
4. Foreign key: a set of fields or attributes in one relation that is used to “refer” to a
tuple in another relation.
For example: STUD_NO in STUDENT_COURSE is a foreign key that refers to STUD_NO in the
STUDENT relation.
It is worth noting that, unlike the primary key of a given relation, a foreign key can
be NULL and may contain duplicate values, i.e. it need not satisfy the uniqueness
constraint.
For example: STUD_NO in the STUDENT_COURSE relation is not unique. It is
repeated in the first and third tuples. However, STUD_NO in the STUDENT relation is a
primary key, so it must always be unique and cannot be null.
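
The foreign key behaviour described above can be tried out with Python's standard `sqlite3` module. The column names other than STUD_NO and all of the data are assumptions made for illustration. SQLite enforces foreign keys only after `PRAGMA foreign_keys = ON` has been issued on the connection.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # enforcement is off by default in SQLite
conn.execute("CREATE TABLE student (stud_no INTEGER PRIMARY KEY, stud_name TEXT NOT NULL)")
conn.execute(
    "CREATE TABLE student_course ("
    " stud_no INTEGER REFERENCES student (stud_no),"
    " course_no TEXT NOT NULL)"
)
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Abebe"), (2, "Sara")])

# Duplicate and NULL foreign key values are both accepted:
# stud_no 1 appears in the first and third tuples, and the last tuple has no student.
conn.executemany(
    "INSERT INTO student_course VALUES (?, ?)",
    [(1, "C1"), (2, "C2"), (1, "C2"), (None, "C3")],
)

# A value with no matching STUDENT tuple is rejected.
try:
    conn.execute("INSERT INTO student_course VALUES (?, ?)", (99, "C1"))
    dangling_rejected = False
except sqlite3.IntegrityError:
    dangling_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM student_course").fetchone()[0]
conn.close()
print(dangling_rejected, row_count)
```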
QQQ: read more about constraints

Schemas, Instances and Database State


The description of a database is called the database schema. The schema is specified
during database design, and is not expected to change frequently.

Data models have conventions for displaying schemas as diagrams. A displayed schema
is called a schema diagram.
Each object in the schema is called a schema construct.
Schema diagrams display only some aspects of a schema, such as names and some
constraints.
The data in a database may change frequently, every time records are added or updated.
The data in the database at a given moment in time is called the database state or
snapshot.
Database Schema vs Database State
When a database is defined, the schema is specified to the DBMS. The database state at
this point is in the empty state, with no data.
The initial state of the database is when the database is first populated or loaded with the
initial data. Every time data is added/removed/updated, there is a new database state.
The DBMS is responsible for ensuring every state is a valid state, a state that satisfies the
structure and constraints specified in the schema.
The DBMS stores the descriptions of the schema constructs and constraints, called the
metadata, in the DBMS catalogue.
The schema is called the intension, and a database state is called an extension of the schema.
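A minimal sketch of the difference between schema and state, assuming Python's sqlite3 module and a hypothetical EMPLOYEE table. In SQLite the catalogue is exposed as the table sqlite_master; other DBMSs use different catalogue names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Defining the schema: the database is now in the empty state.
conn.execute("CREATE TABLE EMPLOYEE (emp_id INTEGER PRIMARY KEY, name TEXT NOT NULL)")


def state(connection):
    """Return the current database state (all EMPLOYEE rows)."""
    return connection.execute(
        "SELECT emp_id, name FROM EMPLOYEE ORDER BY emp_id"
    ).fetchall()


empty_state = state(conn)
conn.execute("INSERT INTO EMPLOYEE VALUES (1, 'Rick')")  # initial state
initial_state = state(conn)
conn.execute("UPDATE EMPLOYEE SET name = 'Maggie' WHERE emp_id = 1")  # new state
later_state = state(conn)

# The schema description (metadata) is kept in the catalogue and did not change.
catalog = conn.execute(
    "SELECT name, sql FROM sqlite_master WHERE type = 'table'"
).fetchall()
print(empty_state, initial_state, later_state)
print(catalog)
```

The three printed states differ, while the catalogue entry stays the same throughout.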

Three Schema Architecture and Data Independence


Recall from the previous chapters three of the main characteristics of database
systems:
1. Insulation of programs and data
2. Support of multiple views
3. Use of a catalogue to store the database description (schema)
The three schema architecture helps to achieve these characteristics.

Figure: Three Schema Architecture. The levels, from top to bottom, are:
EXTERNAL LEVEL: External View, External View
    (Mappings)
CONCEPTUAL LEVEL: Conceptual Schema
    (Mappings)
INTERNAL LEVEL: Internal Schema
STORED DATABASE


The goal of the three schema architecture is to separate the user applications and the
physical database. The schemas can be defined at the following levels:
1. The internal (physical) level – has an internal schema which describes the
physical storage structure of the database. Uses a physical data model and
describes the complete details of data storage and access paths for the database.

2. The conceptual (logical) level – has a conceptual schema which describes the
structure of the database for users. It hides the details of the physical storage
structures, and concentrates on describing entities, data types, relationships, user
operations and constraints. Usually a representational data model is used to
describe the conceptual schema.
3. The external or view level – includes external schemas or user views. Each
external schema describes the part of the database that a particular user group is
interested in and hides the rest of the database from that user group. It is represented
using a representational data model.

The three schema architecture is used to visualize the schema levels in a database. The
three schemas are only descriptions of data; the data actually exists only at the physical
level.
Each user group refers only to its own external schema. The DBMS must transform a
request specified on an external schema into a request against the conceptual schema, and
then into a request on the internal schema for processing over the database. The process
of transforming requests and results between levels is called mapping.
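As a rough illustration of an external schema, the sketch below defines a view on top of a base table using Python's sqlite3 module. The table EMPLOYEE, the view EMPLOYEE_DIRECTORY and the sample rows are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Conceptual level: the base table that holds all employee data.
conn.execute("CREATE TABLE EMPLOYEE (emp_id INTEGER PRIMARY KEY, name TEXT, salary REAL)")
conn.executemany(
    "INSERT INTO EMPLOYEE VALUES (?, ?, ?)",
    [(1, "Rick", 5000.0), (2, "Maggie", 6500.0)],
)
# External level: a view for a user group that must not see salaries.
conn.execute("CREATE VIEW EMPLOYEE_DIRECTORY AS SELECT emp_id, name FROM EMPLOYEE")

# A request on the external schema; the DBMS maps it onto the base table.
directory = conn.execute(
    "SELECT * FROM EMPLOYEE_DIRECTORY ORDER BY emp_id"
).fetchall()
print(directory)
```

The user of the view sees only emp_id and name; how the rows are stored on disk (the internal level) is hidden from both the view and the base table definition.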

Classification of DBMSs
1. Data Model Classification
o Relational data model
o Object data model
o Hierarchical data model
o Network data model
o Object relational data model
2. Number of Users
o Single User systems
o Multi User systems
3. Number of Sites
o Centralized – data is stored at single site.
o Distributed – database and DBMS software are stored over many sites connected
by a network
o Homogeneous – use same DBMS software at multiple sites.
4. Cost
o Low-end systems under $3000
o High-end systems, over $100,000

CHAPTER THREE
Database Modelling

The ER model is used to model the logical view of the system from a data perspective.
It consists of the following components:

• Entity
• Entity set and
• Entity type
The E/R model translates analyzed information into data requirements, and is used to
facilitate communications between the database architect and the future users of the new
system.
The E/R data model views the real world as a set of basic objects (entities) and
relationships among these objects. This represents the overall logical structure of the
database. An ER diagram is a graphical modeling tool that standardizes ER modeling. The
modeling is carried out with the help of a pictorial representation of entities, attributes,
and relationships. The three basic notions (building blocks) or elements of the E/R model
are:
1. Entity: represents existing real-world objects or concepts, such as places, objects,
events, persons, orders, customers, and so on.
An entity is an object that exists and is distinguishable from other objects. In other words,
the entity can be uniquely identified.
An entity might be:
✓ An object with physical existence (e.g., a lecturer, a student, a car)
✓ An object with conceptual existence (e.g., a course, a job, a position)
2. Relationship: represents associations between objects. A relationship is an association
of entities that includes one entity from each participating entity type, whereas a
relationship type is a meaningful association between entity types.
The examples of relationship types are:
✓ Teaches is the relationship type between LECTURER and STUDENT.
✓ Buying is the relationship between VENDOR and CUSTOMER.
✓ Treatment is the relationship between DOCTOR and PATIENT.
3. Attribute: describes the entity, such as the invoice date or the customer first name.
Attributes are properties of entity types. In other words, entities are described in a
database by a set of attributes.
The following are example of attributes:
✓ Brand, cost, and weight are the attributes of CELLPHONE.

✓ Roll number, name, and grade are the attributes of STUDENT.

Design principles of ER diagram


ER-DIAGRAM NOTATION FOR ER SCHEMAS

Entity sets/types
Entities with the same basic attributes are grouped or typed into an entity type. For
example, the entity types EMPLOYEE and PROJECT. An entity may belong to more
than one entity type.
For example, a staff member working in a particular department can pursue higher
education part-time.
Hence the same person is a LECTURER in one instance and a STUDENT in another
instance.

Classification of entity sets

Strong Entity: A strong (independent) entity is one that does not rely on other entities for
identification. Its existence does not depend on any other entity.
Example: Consider the example, student takes course. Here student is a strong entity.

In this example, course is considered a weak entity because, if there are no students to
take a particular course, then that course cannot be offered. The COURSE entity depends
on the STUDENT entity.
Weak (dependent) Entity: A weak entity is one whose existence depends on another entity.
It relies on other entities for identification. In many cases, a weak entity does
not have a primary key of its own.
Example: Consider the example, customer borrows loan. For every loan, there should be
at least one customer. The entity loan depends on the entity customer, hence loan is a
weak entity.
Associative Entity Type: A weak entity type that depends on two or more entity types for
its primary key.
An individual occurrence of an entity set is also known as an instance (object).

Key attributes
An attribute of an entity type for which each entity must have a unique value is called a key
attribute of the entity type. Example: SSN of an employee.
A key attribute may be composite. Example: Vehicle Tag Number is a key of the CAR
entity type with components (Number, State).
Attribute classification
Attributes are descriptive properties that are associated with an entity. A set of attributes
describe an entity. A particular instance of an attribute is called a value.
For example: “Employee Id” and “Name” are the attributes of the “EMPLOYEES” entity
set; and “Kevin Jones” is one value of the attribute “Name”.
Attributes can be broadly classified as follows:
1) Based on value:
Single Value Attribute: Single value attribute means, there is only one value associated
with that attribute.
Example: The examples of single value attribute are age of a person, Roll number of the
student, Registration number of a car, “Name” and “Gender” of the “EMPLOYEES”
entity set.
Representation of Single Value Attribute in ER Diagram:
Multi valued Attribute: More than one value can be associated with the attribute.
Representation of Multi valued Attribute in ER Diagram:
Example: The food items associated with the entity HOTEL are an example of a multi
valued attribute.
Derived Attribute: The value of a derived attribute can be derived (calculated) from the
values of other related stored attributes, entities, or general states.

In ER diagram, the derived attribute is represented by:

Example: 1. Age of a person can be derived from the date of birth of the person. In this
example, age is the derived attribute.

2. Experience of an employee in an organization can be derived from date of joining of
the employee.
3. CGPA of a student can be derived from GPA (Grade Point Average).
Stored Attributes: Stored attributes, on the other hand, are attributes whose values cannot
be calculated from other stored attributes and must therefore be stored directly.
Example: “Birth Date” of the “EMPLOYEES” entity set is a stored attribute, where as
“Age” is a derived attribute that can be calculated from the “Birth Date” and “Current
Date”.
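The derivation of “Age” from “Birth Date” and “Current Date” can be sketched as follows. The function name age_on is hypothetical, and the rule used (count completed years) is one common convention.

```python
from datetime import date


def age_on(birth_date: date, current_date: date) -> int:
    """Derive the attribute Age (completed years) from the stored Birth Date."""
    years = current_date.year - birth_date.year
    # Subtract one year if this year's birthday has not been reached yet.
    if (current_date.month, current_date.day) < (birth_date.month, birth_date.day):
        years -= 1
    return years


print(age_on(date(1990, 6, 15), date(2024, 6, 14)))  # day before the birthday
print(age_on(date(1990, 6, 15), date(2024, 6, 15)))  # on the birthday
```

Because the value can always be recomputed, only the birth date needs to be stored in the database.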
Null Value Attribute: In some cases, a particular entity may not have any applicable
value for an attribute. For such situations, a special value called the null value is used.
Null values arise in two situations: the value is not applicable, or the value is not known.
Example: In application forms, there is a column for the phone number. If a person does
not have a phone, then a null value is entered in that column.
2) Based on structure
Simple Attributes: Simple attributes, also known as atomic attributes, are attributes that
cannot be divided into subparts. They are mainly of primitive types.
Example: “Age” and “Gender” of the “EMPLOYEES” entity set.
Composite Attributes: Composite attributes are attributes that are composed of smaller
subparts (attributes) into which they can be subdivided.
Example: “Address” of the “EMPLOYEES” entity set that can be divided into “City”,
“Home Address”, “Phone”, and “P.O. Box”.
Attributes can be also classified as identifiers or descriptors.
Identifiers: more commonly called keys, uniquely identify an instance of an entity.
Example: “Employee Id” uniquely identifies an employee entity from the entity set.
Descriptor: describes a non-unique characteristic of an entity instance.
Example: “Name” is a descriptor for the “EMPLOYEES” entity set.
Relationship Sets
A Relationship represents an association between two or more entities.
An example of a relationship would be:
- “EMPLOYEES” are assigned to “TEAMS”
- “CUSTOMERS” Owns “PROJECTS”

- “TEAMS” works on “PROJECTS”
A Relationship Set is then a set of relationships of the same type.
The entities involved in the relationship are known as participating entities and the
function the entity plays in a relationship is called the entity’s role.
Example: In the Assigned relationship “EMPLOYEES” and “TEAMS” entity sets are
the participating entity sets; and the “EMPLOYEES” entity has a role as a
“Programmer” or “Team Leader” in the relationship.
Relationships are classified in terms of degree, connectivity, cardinality, and existence.
1. Degree: The degree of a relationship is the number of entity types associated with the
relationship.
• The n-ary (multi-way) relationship is the general form for degree n.
• Special cases are the binary, and ternary, where the degree is 2, and 3,
respectively
1.1 Unary Relationship: The unary relationship is otherwise known as a recursive
relationship. In a unary relationship the number of associated entity types is one; that is,
an entity type is related to itself.
Example: Player……captain of, Employee…..manager of…
Roles and Recursive Relation
When the same entity set appears more than once in a relationship, it is useful to add
labels to the connecting lines. These labels are called roles.

The relationship ‘Represents’ is a one-to-many Unary relationship. It connects with
only one table (relation/entity set).

1.2. Binary Relationship: In a binary relationship, two entities are involved. Consider
the example; each staff will be assigned to a particular department. Here the two entities
are STAFF and DEPARTMENT.

The relationship ‘Registers’ is a many-to-many Binary relationship. It connects
(associates) two entity sets, Student and Courses.
1.3. Ternary Relationship: In a ternary relationship, three entities are simultaneously
involved. Ternary relationships are required when binary relationships are not sufficient
to accurately describe the semantics of an association among three entities.
Example: Consider the example of employee assigned a project. Here we are considering
three entities EMPLOYEE, PROJECT, and LOCATION.
The relationship is “assigned-to.” Many employees will be assigned to one project hence
it is an example of one-to-many relationship.

The relationship ‘STC’ is a many-to-many Ternary relationship. It links the entity sets
Student, Courses and Teacher.

1.4. Quaternary Relationships: Quaternary relationships involve four entities.

An example of a quaternary relationship is “A professor teaches a course to students using
slides.” Here the four entities are PROFESSOR, SLIDES, COURSE, and STUDENT.
The relationship among the entities is “Teaches.”
2. Connectivity: The connectivity of a relationship describes the mapping of associated
entity instances in the relationship. The values of connectivity are “one” or “many”.
3. Cardinality: The cardinality of a relationship is the actual number of related
occurrences for each of the two entities. The basic types of cardinality for relationships
are: one-to-one, one-to-many, and many-to-many.
4. Existence: denotes whether the existence of an entity instance is dependent upon the
existence of another, related, entity instance. It describes whether an entity in a
relationship is optional or mandatory. Analyze your business rules to identify whether an
entity must exist in a relationship.
For example, your business rules might dictate that an address must be associated with a
name.
Such an association indicates a mandatory existence dependency for the relationship
between the name and address entities.
An example of an optional existence dependency can be a business rule that says a person
might or might not have children.
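To make the cardinality types concrete, the sketch below inspects a set of (left entity, right entity) pairs of a binary relationship and reports which type the sample data exhibit. The function classify_cardinality and the sample identifiers are hypothetical. Note that in a real design the cardinality comes from the business rules, not from one sample; "many-to-one" is simply one-to-many read from the other side.

```python
from collections import defaultdict


def classify_cardinality(pairs):
    """Classify a binary relationship instance given as (left, right) pairs."""
    left_to_right = defaultdict(set)
    right_to_left = defaultdict(set)
    for left, right in pairs:
        left_to_right[left].add(right)
        right_to_left[right].add(left)
    # Does some left entity relate to several right entities, and vice versa?
    left_has_many = any(len(rights) > 1 for rights in left_to_right.values())
    right_has_many = any(len(lefts) > 1 for lefts in right_to_left.values())
    if left_has_many and right_has_many:
        return "many-to-many"
    if left_has_many:
        return "one-to-many"
    if right_has_many:
        return "many-to-one"
    return "one-to-one"


# Student registers for Course: students take many courses, courses have many students.
registers = [("s1", "c1"), ("s1", "c2"), ("s2", "c1")]
# Department has Staff: each staff member belongs to a single department.
has_staff = [("D1", "st1"), ("D1", "st2"), ("D2", "st3")]
print(classify_cardinality(registers))
print(classify_cardinality(has_staff))
```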
Enhanced ER Diagram
Enhanced entity-relationship models, also known as extended entity-relationship models,
are advanced database diagrams very similar to regular ER diagrams.
Enhanced ERDs are high-level models that represent the requirements and complexities
of complex databases. In addition to the concepts that ordinary ER diagrams
encompass, EERDs include:
➢ Subtypes and super types (sometimes known as subclasses and super classes)
➢ Specialization and generalization
➢ Category or union type
➢ Attribute and relationship inheritance
Features of EER Model
EER creates a design that maps more accurately to database schemas.
It reflects the data properties and constraints more precisely.

It includes all modeling concepts of the ER model.
Diagrammatic technique helps for displaying the EER schema.
It includes the concept of specialization and generalization.
It is used to represent a collection of objects that is the union of objects of different
entity types.
A. Sub Class and Super Class
The sub class and super class relationship leads to the concept of inheritance. The
relationship between sub class and super class is denoted with a symbol in the diagram.


1. Super Class: A super class is an entity type that has a relationship with one or
more subtypes.
An entity cannot exist in the database merely by being a member of a sub class; it must
also be a member of the super class.
For example: The Shape super class has the sub groups Square, Circle, and Triangle.
2. Sub Class: A sub class is a group of entities with unique attributes. A sub class
inherits properties and attributes from its super class.
For example: Square, Circle, and Triangle are sub classes of the Shape super class.

B. Specialization and Generalization

1. Generalization: is the process of generalizing the entities which contain the
properties of all the generalized entities.
It is a bottom-up approach, in which two lower level entities combine to form a higher level
entity. Generalization is the reverse process of specialization. It defines a general entity
type from a set of specialized entity types. It minimizes the difference between the entities
by identifying the common features.
For example:

2. Specialization: is a process that defines a group of entities which is divided into sub
groups based on their characteristics.
It is a top-down approach, in which one higher level entity can be broken down into two
lower level entities. It maximizes the difference between the members of an entity by
identifying the unique characteristics or attributes of each member.
It defines one or more sub classes for the super class and also forms the
superclass/subclass relationship.
For example

C. Category or Union
A category represents a single super class/sub class relationship with more than one
super class. It can have total or partial participation.
For example: In car booking, a car owner can be a person, a bank (which holds a lien on
a car) or a company. The category (sub class) Owner is a subset of the union of the three
super classes Company, Bank, and Person.
A category member must exist in at least one of its super classes.

D. Aggregation
Aggregation is a process that represents a relationship between a whole object and its
component parts. It abstracts a relationship between objects and views the relationship
as an object. In aggregation, two related entities are treated as a single entity.

In the above example, the relationship between College and Course acts as an entity in
a relationship with Student.

CHAPTER FOUR
Enhanced Entity Relationship and Object Modelling
EER stands for Enhanced ER or Extended ER. The additional EER concepts are used to
model applications more completely and more accurately. EER includes some object-
oriented concepts, such as inheritance.
Sub class and super class: An entity type may have additional meaningful
subgroupings of its entities.
Example: EMPLOYEE may be further grouped into:
Based on the EMPLOYEE’s job: SECRETARY, ENGINEER, TECHNICIAN
Based on the EMPLOYEE’s position: MANAGER
Based on the EMPLOYEE’s method of pay: SALARIED_EMPLOYEE, HOURLY_EMPLOYEE
EER diagrams extend ER diagrams to represent these additional subgroupings, called
subclasses or subtypes.

EMPLOYEE is the superclass for each of these subclasses. These are called
superclass/subclass relationships:

Employee/secretary
Employee/technician
Employee/manager
These are also called IS-A relationships.
For example, SECRETARY IS-A EMPLOYEE and TECHNICIAN IS-A EMPLOYEE.
Constraints on specialization and generalization
If we can determine exactly those entities that will become members of each subclass by
a condition, the subclasses are called predicate-defined (or condition-defined) subclasses.
The condition is a constraint that determines subclass members.
If all subclasses in a specialization have the membership condition on the same attribute
of the superclass, the specialization is called an attribute-defined specialization.
The attribute is called the defining attribute of the specialization.
Example: Job Type is the defining attribute of the specialization {SECRETARY,
TECHNICIAN, ENGINEER} of EMPLOYEE.
If no condition determines membership, the subclass is called user-defined.
Membership in such a subclass is determined by the database users by applying an
operation to add an entity to the subclass; that is, membership is specified individually
for each entity in the superclass by the user.
Classification of Constraints
Disjointness Constraint: Specifies that the subclasses of the specialization must be
disjoint: an entity can be a member of at most one of the subclasses of the specialization.
Specified by d in the EER diagram.
If not disjoint, the specialization is overlapping: the same entity may be a member
of more than one subclass of the specialization. Specified by o in the EER diagram.
Completeness Constraint:
Total: specifies that every entity in the superclass must be a
member of some subclass in the specialization/generalization.
Shown in EER diagrams by a double line.
Partial: allows an entity not to belong to any of the subclasses.
Shown in EER diagrams by a single line.

Note: Generalization usually is total because the superclass is derived from the
subclasses.
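The two constraints can be checked on sample data with a short sketch. It assumes that the superclass is given as a set of entity identifiers and each subclass as a subset of it; the function check_specialization and the identifiers e1, e2, e3 are hypothetical.

```python
def check_specialization(superclass, subclasses):
    """Return (disjoint, total) for one specialization of a superclass.

    superclass: set of entity identifiers
    subclasses: dict mapping subclass name -> set of member identifiers
    """
    members = [entity for subclass in subclasses.values() for entity in subclass]
    # Disjoint: no entity is listed in more than one subclass.
    disjoint = len(members) == len(set(members))
    # Total: every superclass entity appears in at least one subclass.
    total = set(superclass) <= set(members)
    return disjoint, total


employees = {"e1", "e2", "e3"}
by_pay = {"SALARIED_EMPLOYEE": {"e1", "e2"}, "HOURLY_EMPLOYEE": {"e3"}}
by_job = {"SECRETARY": {"e1"}, "ENGINEER": {"e2"}}
by_role = {"ENGINEER": {"e1"}, "MANAGER": {"e1", "e2"}}

print(check_specialization(employees, by_pay))   # disjoint and total
print(check_specialization(employees, by_job))   # disjoint but partial
print(check_specialization(employees, by_role))  # overlapping and partial
```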

CHAPTER FIVE
Functional dependency and normalization
In this chapter, you will learn:
✓ What normalization is and what role it plays in the database design process
✓ About the normal forms 1NF, 2NF, 3NF
✓ How normal forms can be transformed from lower normal forms to higher normal
forms
✓ How normalization and ER modeling are used concurrently to produce a good
database design

Before discussing normalization, we first look at functional dependency, the most
important concept in relational schema design theory. In this section we formally define
the concept.
A functional dependency is a constraint between two sets of attributes from the
database.
Definition. A functional dependency, denoted by X → Y, between two sets of
attributes X and Y that are subsets of R specifies a constraint on the possible tuples that
can form a relation state r of R. The constraint is that, for any two tuples t1 and t2 in r
that have t1[X] = t2[X], they must also have t1[Y] = t2[Y].
This means that the values of the Y component of a tuple in r depend on, or are
determined by, the values of the X component; alternatively, the values of the X
component of a tuple uniquely (or functionally) determine the values of the Y component.
We also say that there is a functional dependency from X to Y, or that Y is functionally
dependent on X. The abbreviation for functional dependency is FD or f.d. The set of
attributes X is called the left-hand side of the FD, and Y is called the right-hand side.
Thus, X functionally determines Y in a relation schema R if, and only if, whenever two
tuples of r(R) agree on their X-value, they must necessarily agree on their Y-value. Note
the following:
1. If a constraint on R states that there cannot be more than one tuple with a given X-
value in any relation instance r(R)—that is, X is a candidate key of R—this implies that
X → Y for any subset of attributes Y of R (because the key constraint implies that no two
tuples in any legal state r(R) will have the same value of X). If X is a candidate key of R,
then X → R.
2. If X → Y in R, this does not say whether or not Y → X in R.
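The definition translates almost directly into a check over a relation state. The sketch below assumes that a relation state is a list of dictionaries (one per tuple); the function fd_holds and the sample rows are hypothetical. Keep in mind that a single state can only show that an FD is violated; that an FD holds in general is a property of the schema, decided by the meaning of the attributes.

```python
def fd_holds(rows, lhs, rhs):
    """Check whether the FD lhs -> rhs is satisfied by the given relation state."""
    seen = {}
    for row in rows:
        x_value = tuple(row[attr] for attr in lhs)
        y_value = tuple(row[attr] for attr in rhs)
        # Two tuples that agree on X must also agree on Y.
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


rows = [
    {"Emp_id": 101, "Emp_name": "Rick", "Emp_dep": "D001"},
    {"Emp_id": 101, "Emp_name": "Rick", "Emp_dep": "D002"},
    {"Emp_id": 123, "Emp_name": "Maggie", "Emp_dep": "D890"},
]
print(fd_holds(rows, ["Emp_id"], ["Emp_name"]))  # same id, same name in every tuple
print(fd_holds(rows, ["Emp_id"], ["Emp_dep"]))   # id 101 appears with two departments
```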
Normalization is a filtering or purification process that makes the design have successively
better quality. It is also a process of organizing the data in a database to avoid data
redundancy and the insertion, update, and deletion anomalies. Let’s discuss
anomalies first; then we will discuss normal forms with examples.
Anomalies in DBMS
There are three types of anomalies that occur when the database is not normalized. These
are – Insertion, update and deletion anomaly. Let’s take an example to understand this.

Emp_id    Emp_name    Emp_address    Emp_dep
101       Rick        Delhi          D001
101       Rick        Delhi          D002
123       Maggie      Agra           D890
166       Glenn       Chennai        D900
166       Glenn       Chennai        D004

The above table is not normalized. We will see the problems that we face when a table is
not normalized.
Update anomaly: In the above table we have two rows for employee Rick as he belongs
to two departments of the company. If we want to update the address of Rick then we
have to update it in both rows or the data will become inconsistent. If the address gets
updated in one row but not in the other then, as per the database, Rick would have two
different addresses, which is not correct and leads to inconsistent data.
Insert anomaly: Suppose a new employee joins the company, who is under training and
currently not assigned to any department. We would not be able to insert the data into
the table if the Emp_dep field doesn’t allow nulls.
Delete anomaly: Suppose at some point the company closes the department D890.
Deleting the rows that have Emp_dep as D890 would also delete the
information of employee Maggie since she is assigned only to this department.
To overcome these anomalies we need to normalize the data. In the next section we will
discuss normalization.
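As a preview of how normalization removes the three anomalies, the sketch below splits the table into two structures. This particular split, the names employee and emp_department, and the new values "Mumbai", "Carl" and "Pune" are assumptions made for illustration only.

```python
rows = [
    (101, "Rick", "Delhi", "D001"),
    (101, "Rick", "Delhi", "D002"),
    (123, "Maggie", "Agra", "D890"),
    (166, "Glenn", "Chennai", "D900"),
    (166, "Glenn", "Chennai", "D004"),
]

# One entry per employee: name and address are stored exactly once.
employee = {emp_id: (name, address) for emp_id, name, address, _ in rows}
# One entry per (employee, department) assignment.
emp_department = {(emp_id, dep) for emp_id, _, _, dep in rows}

# Update: Rick's address is changed in a single place.
employee[101] = ("Rick", "Mumbai")
# Insert: a trainee without a department can be stored.
employee[200] = ("Carl", "Pune")
# Delete: closing department D890 does not remove Maggie herself.
emp_department = {(e, d) for (e, d) in emp_department if d != "D890"}

print(employee)
print(sorted(emp_department))
```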
Database tables and Normalization
Normalization is a process for evaluating and correcting table structures to minimize data
redundancies. It works through a series of stages called normal forms:
✓ First normal form (1NF)
✓ Second normal form (2NF)
✓ Third normal form (3NF)

2NF is better than 1NF; 3NF is better than 2NF. For most business database design
purposes, 3NF is as high as we need to go in the normalization process. The highest level
of normalization is not always the most desirable.
The need for normalization
✓ To reduce the amount of storage needed to store data.
✓ To avoid unnecessary data.
Normalization process
✓ Each table represents a single subject
✓ No data item will be unnecessarily stored in more than one table
✓ All attributes in a table are dependent on the primary key
1NF: It states that the domain of an attribute must include only atomic (simple,
indivisible) values and that the value of any attribute in a tuple must be a single value
from the domain of that attribute. Hence, 1NF disallows having a set of values, a tuple of
values, or a combination of both as an attribute value for a single tuple. In other words,
1NF disallows relations within relations or relations as attribute values within tuples. The
only attribute values permitted by 1NF are single atomic (or indivisible) values.
We have three steps in 1NF:
Step1: Eliminate the Repeating Groups.
Example: consider a relation named ins-info (instructor information):
Ins-name    c-code
Prof x      ITec 01, ITec 02
Prof y      ITec 03

First normal form says each cell of a table should contain exactly one value. But in the
above table two values (ITec 01 and ITec 02) appear in a single cell, so it is better to
write them in separate rows.

Ins-name    c-code
Prof x      ITec 01
Prof x      ITec 02
Prof y      ITec 03

There are no repeating groups here; every cell holds a single value.


Step 2: Identify the Primary Key. The primary key must uniquely identify each tuple (row) of the relation.
Step 3: Identify All Dependencies, if any exist.
Example: first normal form dependency diagram.

First normal form describes a tabular format in which:
✓ All key attributes are defined
✓ There are no repeating groups in the table
✓ All attributes are dependent on primary key
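Step 1 (eliminating repeating groups) can be sketched in a few lines for the ins-info example. The sketch assumes the unnormalized relation is held as (instructor, list of course codes) pairs; the function name eliminate_repeating_groups is hypothetical.

```python
def eliminate_repeating_groups(rows):
    """Turn (name, [codes]) rows into one (name, code) row per course code."""
    return [(name, code) for name, codes in rows for code in codes]


ins_info = [
    ("Prof x", ["ITec 01", "ITec 02"]),  # two values in one cell: not in 1NF
    ("Prof y", ["ITec 03"]),
]
flat = eliminate_repeating_groups(ins_info)
for row in flat:
    print(row)
```

After this step every row holds exactly one course code, and the pair (Ins-name, c-code) can serve as the key of the relation.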
If a relation schema has more than one key, each is called a candidate key. One of the
candidate keys is arbitrarily designated to be the primary key, and the others are called
secondary keys. In a practical relational database, each relation schema must have a
primary key. If no candidate key is known for a relation, the entire relation can be treated
as a default super key. In Figure 5.1, {Ssn} is the only candidate key for EMPLOYEE, so
it is also the primary key. Definition. An attribute of relation schema R is called a prime
attribute of R if it is a member of some candidate key of R. An attribute is called
nonprime if it is not a prime attribute—that is, if it is not a member of any candidate key.
In Figure 5.1, both Ssn and Pnumber are prime attributes of WORKS_ON, whereas other
attributes of WORKS_ON are nonprime.

Figure 5.1. Simplified company relational database schema.


2NF: Second normal form (2NF) is based on the concept of full functional dependency.
A functional dependency X → Y is a full functional dependency if removal of any
attribute A from X means that the dependency does not hold anymore; that is, for any
attribute A ∈ X, (X − {A}) does not functionally determine Y. A functional dependency
X → Y is a partial dependency if some attribute A ∈ X can be removed from X and the
dependency still holds; that is, for some A ∈ X, (X − {A}) → Y.
A relation schema R is in 2NF if:
1. It satisfies 1NF.
2. Every nonprime attribute A in R is fully functionally dependent on the
primary key of R.
Example: In the relation Emp-proj with the attributes
Ssn, Pnumber, Hours, Ename, Pname, Plocation:
{Ssn, Pnumber} → Hours is a full dependency (neither Ssn → Hours nor
Pnumber → Hours holds).
However, the dependency {Ssn, Pnumber} → Ename is partial because Ssn →
Ename holds. So the above relation is in 1NF but not in 2NF because of the partial
dependency.
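The search for partial dependencies can be sketched as follows. The sample rows, the helper fd_holds and the function partial_dependencies are assumptions made for illustration; the check is done on one sample state, so it only indicates which partial dependencies are consistent with that data.

```python
from itertools import combinations


def fd_holds(rows, lhs, rhs):
    """Check whether the FD lhs -> rhs is satisfied by the given rows."""
    seen = {}
    for row in rows:
        x_value = tuple(row[a] for a in lhs)
        y_value = tuple(row[a] for a in rhs)
        if seen.setdefault(x_value, y_value) != y_value:
            return False
    return True


def partial_dependencies(rows, key, nonprime):
    """List (part_of_key, attribute) pairs where a proper subset of the key suffices."""
    found = []
    for size in range(1, len(key)):
        for part in combinations(key, size):
            for attr in nonprime:
                if fd_holds(rows, part, [attr]):
                    found.append((part, attr))
    return found


# Hypothetical sample state of Emp-proj.
emp_proj = [
    {"Ssn": 1, "Pnumber": 10, "Hours": 5, "Ename": "A", "Pname": "X", "Plocation": "L1"},
    {"Ssn": 1, "Pnumber": 20, "Hours": 8, "Ename": "A", "Pname": "Y", "Plocation": "L2"},
    {"Ssn": 2, "Pnumber": 10, "Hours": 7, "Ename": "B", "Pname": "X", "Plocation": "L1"},
    {"Ssn": 2, "Pnumber": 20, "Hours": 3, "Ename": "B", "Pname": "Y", "Plocation": "L2"},
]
violations = partial_dependencies(
    emp_proj, ["Ssn", "Pnumber"], ["Hours", "Ename", "Pname", "Plocation"]
)
for part, attr in violations:
    print(part, "->", attr)
```

In this sample, Hours depends on the whole key, while Ename depends on Ssn alone and Pname and Plocation depend on Pnumber alone. Moving each such attribute into a relation keyed by the part it depends on is the usual way to reach 2NF.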

3NF: Third normal form (3NF) is based on the concept of transitive dependency. A
functional dependency X → Y in a relation schema R is a transitive dependency if there
exists a set of attributes Z in R that is neither a candidate key nor a subset of any key of
R, and both X → Z and Z → Y hold.
According to Codd’s original definition, a relation schema R is in 3NF if:
1. It satisfies 2NF.
2. No nonprime attribute of R is transitively dependent on the primary key.
Example: In the relation EMP_DEPT, the dependency Ssn → Dmgr_ssn is transitive
through Dnumber because both the dependencies Ssn → Dnumber and
Dnumber → Dmgr_ssn hold and Dnumber is neither a key itself nor a subset of the key
of EMP_DEPT. Intuitively, we can see that the dependency of Dmgr_ssn on Dnumber is
undesirable in EMP_DEPT since Dnumber is not a key of EMP_DEPT.
The relation schema EMP_DEPT is in 2NF, since no partial dependencies on a key exist.
However, EMP_DEPT is not in 3NF because of the transitive dependency of Dmgr_ssn
(and also Dname) on Ssn via Dnumber. We can normalize EMP_DEPT by decomposing
it into the two 3NF relation schemas ED1 and ED2 shown above. Intuitively, we see that
ED1 and ED2 represent independent facts about employees and departments, both of
which are entities in their own right. A NATURAL JOIN operation on ED1 and ED2 will
recover the original relation EMP_DEPT without generating spurious tuples.
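The decomposition and the NATURAL JOIN can be tried on a small sample. The sketch assumes a simplified EMP_DEPT with the attributes Ssn, Ename, Dnumber, Dname and Dmgr_ssn, takes ED1 as (Ssn, Ename, Dnumber) and ED2 as (Dnumber, Dname, Dmgr_ssn), and uses hypothetical sample values and helper functions project and natural_join.

```python
def project(rows, attrs):
    """Relational projection: keep the given attributes and drop duplicate tuples."""
    result = []
    for row in rows:
        projected = {a: row[a] for a in attrs}
        if projected not in result:
            result.append(projected)
    return result


def natural_join(left, right):
    """Join tuples that agree on all attributes the two relations share."""
    result = []
    for l_row in left:
        for r_row in right:
            common = set(l_row) & set(r_row)
            if all(l_row[a] == r_row[a] for a in common):
                result.append({**l_row, **r_row})
    return result


emp_dept = [
    {"Ssn": "111", "Ename": "A", "Dnumber": 5, "Dname": "Research", "Dmgr_ssn": "333"},
    {"Ssn": "222", "Ename": "B", "Dnumber": 5, "Dname": "Research", "Dmgr_ssn": "333"},
    {"Ssn": "333", "Ename": "C", "Dnumber": 4, "Dname": "Admin", "Dmgr_ssn": "444"},
]
ed1 = project(emp_dept, ["Ssn", "Ename", "Dnumber"])
ed2 = project(emp_dept, ["Dnumber", "Dname", "Dmgr_ssn"])
rejoined = natural_join(ed1, ed2)
print(len(ed2), "department rows after decomposition")
print(rejoined == emp_dept)
```

Each department's name and manager are now stored once in ED2, and joining ED1 with ED2 on Dnumber gives back exactly the original tuples.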
Summary of Normal Forms
Normal form: First (1NF)
Test: Relation should have no multivalued attributes or nested relations.
Remedy (normalization): Form new relations for each multivalued attribute or nested
relation.

Normal form: Second (2NF)
Test: For relations where primary key contains multiple attributes, no nonkey attribute
should be functionally dependent on a part of the primary key.
Remedy (normalization): Decompose and set up a new relation for each partial key with
its dependent attribute(s). Make sure to keep a relation with the original primary key and
any attributes that are fully functionally dependent on it.

Normal form: Third (3NF)
Test: Relation should not have nonkey attribute functionally determined by another
nonkey attribute (or by a set of nonkey attributes). That is, there should be no transitive
dependency of a nonkey attribute on the primary key.
Remedy (normalization): Decompose and set up a relation that includes the nonkey
attribute(s) that functionally determine(s) other nonkey attribute(s).

CHAPTER SIX
Relational Algebra and Structured Query Language
Agenda
✓ Relational algebra
✓ Relational calculus (reading Assignment)
✓ Structured Query Language (SQL)

Relational algebra
Relational Algebra is a procedural query language that consists of a set of operations that
take one or two relations as input and produce a new relation as a result/output.

The relational algebra is a theoretical language with operations that work on one or more
relations to define another relation without changing the original relation.

Relational algebra operations can be nested. The output of each operation is a new
relation, which might be formed from one or more input relations and can itself be used
as the input of another operation.

The algebra operations enable a user to specify retrieval requests on a relational database.
A relation produced by one operation can be further manipulated using other operations
of the relational algebra. A sequence of relational algebra operations that produces a new
relation forms a relational algebra expression.
Knowledge of relational algebra helps us to understand query execution and
optimization in a relational database management system.
Fundamental Operations of Relational Algebra

Operations in relational algebra can be broadly classified into set operations and
database operations. The core relational algebra consists of the fundamental operations,
which can be grouped into two classes based on the number of relation operands of the
operator.
A unary operation operates on one relation (one operand), whereas a binary operation
operates on two relations (two operands).
1. Unary Operators.
• Selection (σ): Selects a subset of rows from relation.
• Projection (Π): Deletes unwanted columns from relation.
• Rename (ρ): Rename operator is used to give another name to a relation.
2. Binary Operators.
• Product (Cartesian product, X): Combines two relations by pairing every tuple
of reln. 1 with every tuple of reln. 2.
• Union (U): Tuples in reln. 1 or in reln. 2.
• Set Difference (–): Tuples in reln. 1, but not in reln. 2.
• Join operations (⋈): A join operation combines related tuples from
different relations, if and only if a given join condition is satisfied.
Unary Operations
A. Select Operation: The select operation selects a subset of tuples from a relation
instance that satisfies a given predicate (condition).
It is denoted by: σ C (R)
Where σ represents the SELECT operator, C is a Boolean expression of the select
condition, and R is the relation or relational algebra expression.
Example 1: To extract the Senior Managers from the “EMPLOYEES” relation, and to extract
the employees whose salary is at least 3000, the selection operations can be written as:
Employees (Emp_id, Dno, Name, BDate, Age, Gender, Position, Salary)
σ Position="Senior Manager" (Employees)
σ Salary>=3000 (Employees)
Example 2: σ (Dno=4 AND Salary>2500) OR (Dno=5 AND Salary>30000) (EMPLOYEE)
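
The selection examples above can be mimicked in a few lines of Python. This is only an illustrative sketch: the relation is modelled as a list of dictionaries, the helper name select is hypothetical, and the sample rows are invented (only the attribute names come from the text).

```python
# Hypothetical sample rows; only the attribute names come from the text.
employees = [
    {"Emp_id": 1, "Dno": 4, "Name": "Abebe", "Position": "Senior Manager", "Salary": 5200},
    {"Emp_id": 2, "Dno": 5, "Name": "Sara", "Position": "Clerk", "Salary": 2800},
    {"Emp_id": 3, "Dno": 4, "Name": "Kebede", "Position": "Analyst", "Salary": 3100},
]


def select(relation, condition):
    """sigma_C(R): return the tuples of R for which the condition C is true."""
    return [row for row in relation if condition(row)]


# sigma Position = "Senior Manager" (Employees)
senior_managers = select(employees, lambda r: r["Position"] == "Senior Manager")
# sigma Salary >= 3000 (Employees)
salary_3000_up = select(employees, lambda r: r["Salary"] >= 3000)
# sigma (Dno=4 AND Salary>2500) OR (Dno=5 AND Salary>30000) (EMPLOYEE)
example_2 = select(
    employees,
    lambda r: (r["Dno"] == 4 and r["Salary"] > 2500)
    or (r["Dno"] == 5 and r["Salary"] > 30000),
)

for result in (senior_managers, salary_3000_up, example_2):
    print([row["Name"] for row in result])
```

The input list is never modified: each call returns a new relation, which matches the statement that algebra operations do not change the original relation.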

B. Projection Operation: While the select operation is picking certain rows from a
relation, projection operation forms a new relation by picking certain columns in the
relation.
It is denoted by: Π A (R)
Where Π represents the PROJECT operator and A is a set of attributes in the relation R.
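Example: To extract only the Name and Position of each employee from the “EMPLOYEES”
relation:
Π Name, Position (Employees)
Operations can be nested. To extract the Name and Position of only those employees
whose salary is at least 3000:
Π Name, Position (σ Salary>=3000 (Employees))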
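
A minimal Python sketch of projection, under the same assumptions as before (relations as lists of dictionaries, invented sample rows, hypothetical helper name project). Because a relation is a set of tuples, the sketch also removes duplicate rows that appear once columns are dropped.

```python
# Hypothetical sample rows; only the attribute names come from the text.
employees = [
    {"Name": "Abebe", "Position": "Senior Manager", "Salary": 5200},
    {"Name": "Sara", "Position": "Clerk", "Salary": 2800},
    {"Name": "Kebede", "Position": "Clerk", "Salary": 3100},
]


def project(relation, attributes):
    """pi_A(R): keep only the listed attributes and drop duplicate tuples."""
    seen, result = set(), []
    for row in relation:
        values = tuple(row[a] for a in attributes)
        if values not in seen:  # a relation is a set, so duplicates are removed
            seen.add(values)
            result.append(dict(zip(attributes, values)))
    return result


# pi Name, Position (Employees)
names_positions = project(employees, ["Name", "Position"])
# pi Position (Employees): "Clerk" appears only once in the result
positions = project(employees, ["Position"])
# pi Name, Position (sigma Salary >= 3000 (Employees))
high_paid = project(
    [row for row in employees if row["Salary"] >= 3000], ["Name", "Position"]
)

print(names_positions)
print(positions)
print(high_paid)
```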
C. Rename Operation: Unlike relations in the relational model, the new relations derived
from a relational algebra expression do not have a name that allows us to refer to
them in other expressions. The rename operator can be used to explicitly name the
resulting relation of an expression.
It is denoted by: ρ S(A1, A2, …, An) (R)
Where ρ represents the RENAME operator, S is a name for the new relation, and A1,
A2, …, An are new names for the attributes in the relation R.
After the renaming, the new relation name and attribute names can be used like those of
an ordinary relation in a sequence of relational algebra expressions.
Alternative syntax: ρ (relation2, relation1), where relation2 is the new name and
relation1 is the old name.
To rename the STUDENT relation to STUDENT1, we can use the rename operator as follows:
ρ (STUDENT1, STUDENT)
To create a relation STUDENT_NAMES with ROLL_NO and NAME
from STUDENT, the rename operator can be combined with a projection:
ρ (STUDENT_NAMES, ∏(ROLL_NO, NAME)(STUDENT))
Rename (ρ) Example: Suppose we have a table customer. We fetch
customer_name and rename the resulting relation to CUST_NAMES:
ρ (CUST_NAMES, ∏(customer_name)(customer))
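
The customer example can be sketched in Python as follows. The dictionary named_relations stands in for the set of relation names known to the system, and rename_attributes is a hypothetical helper; the rows and the attribute names other than customer_name are invented for illustration.

```python
# Hypothetical rows; only the table name and customer_name come from the text.
customer = [
    {"customer_id": 1, "customer_name": "Almaz", "city": "Adama"},
    {"customer_id": 2, "customer_name": "Bekele", "city": "Hawassa"},
]


def rename_attributes(relation, new_names):
    """Return a copy of the relation with attributes renamed per new_names."""
    return [{new_names.get(a, a): v for a, v in row.items()} for row in relation]


# rho(CUST_NAMES, pi customer_name (customer)): name the result of an expression
named_relations = {}
named_relations["CUST_NAMES"] = [
    {"customer_name": row["customer_name"]} for row in customer
]

# Renaming an attribute as well: customer_name becomes name
renamed = rename_attributes(named_relations["CUST_NAMES"], {"customer_name": "name"})

print(named_relations["CUST_NAMES"])
print(renamed)
```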

Binary Operations
Cartesian product Operation (X): The Cartesian product operation (also known as
Cross Product, Cross Join or Product) is a binary set operation that generates a new
relation from two relations in a combinatorial fashion.
It is denoted by: R X S
Where X represents the PRODUCT operator and R and S are the relations to be combined.
The product operation is just like the product operation in set theory: it pairs each tuple
in R with every tuple in S.
This type of operation is helpful to merge columns from two relations. Generally, a
Cartesian product is rarely a meaningful operation when it is performed alone. However, it
becomes meaningful when it is followed by other operations, such as a selection.
Example: Consider the following tables.

Table A
column 1 | column 2
1 | 1
1 | 2

Table B
column 1 | column 2
1 | 1
1 | 3

A selection applied to the Cartesian product: σ column 2 = '1' (A X B)

Output: the above expression returns the rows of A X B whose column 2 has the
value 1.

σ column 2 = '1' (A X B)
column 1 | column 2
1 | 1
1 | 1
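
As a rough Python sketch of the product of A and B, each tuple of A can be concatenated with each tuple of B, which gives four tuples of four attributes each. The selection below assumes that "column 2" in the condition refers to column 2 of A; that reading is an interpretation, not something the text states.

```python
from itertools import product

# Tables A and B from the example, as lists of (column 1, column 2) tuples.
A = [(1, 1), (1, 2)]
B = [(1, 1), (1, 3)]

# A X B: every tuple of A paired with every tuple of B.
# Each result tuple is (A.column 1, A.column 2, B.column 1, B.column 2).
A_x_B = [a + b for a, b in product(A, B)]

# Selection on the product, assuming the condition means A.column 2 = 1.
selected = [t for t in A_x_B if t[1] == 1]

print(len(A_x_B), "tuples in A X B")
for t in selected:
    print(t)
```

The size of the product is always the number of tuples in A multiplied by the number of tuples in B, which is why a product is normally followed by a selection or replaced by a join.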

Union Operation (U)

The union operation on R and S, denoted by R U S, results in a relation that includes all
tuples that are in R, in S, or in both.
It is denoted by: R U S
For a union operation to be valid, the following conditions must hold:

• R and S must have the same number of attributes.
• Attribute domains need to be compatible.

Duplicate tuples are automatically removed from the result.

Example: Consider the following tables.

Table A
column 1 | column 2
1 | 1
1 | 2

Table B
column 1 | column 2
1 | 1
1 | 3

A ∪ B gives

Table A ∪ B
column 1 | column 2
1 | 1
1 | 2
1 | 3

Intersection Operation (∩): The intersection operation on R and S, denoted by R ∩ S,
results in a relation that includes all tuples that are in both R and S.

Example: from the above two relations A and B:

A ∩ B

Table A ∩ B
column 1 | column 2
1 | 1

Set Difference Operation (-): The result of the set difference operation on R and S,
denoted by R − S, is the set of tuples that are in R but not in S.

Example: from the above two tables:

A − B

Table A − B
column 1 | column 2
1 | 2

Note: For the set operations (Union, Intersection, and Set Difference), the two operand
relations R and S must have the same type of tuples (the same number of attributes, with
compatible domains). This condition is known as union compatibility.
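
The three set operations on tables A and B map directly onto Python's built-in set type. The helper union_compatible is a hypothetical, simplified check: it only compares the number of attributes and does not verify that the domains are compatible.

```python
# Tables A and B from the examples, as sets of (column 1, column 2) tuples.
A = {(1, 1), (1, 2)}
B = {(1, 1), (1, 3)}


def union_compatible(r, s):
    """Simplified check: all tuples in both relations have the same arity."""
    return len({len(t) for t in r | s}) <= 1


if not union_compatible(A, B):
    raise ValueError("A and B are not union compatible")

union = A | B          # tuples in A, in B, or in both (duplicates removed)
intersection = A & B   # tuples in both A and B
difference = A - B     # tuples in A but not in B

print(sorted(union))
print(sorted(intersection))
print(sorted(difference))
```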

Join Operations: Join operation combines two relations to form a new relation. The
tables should be joined based on a common column. The common column should be
compatible in terms of domain.

Types of Join Operation

Various forms of join operation are:

Inner joins:

• Theta join (left as self-study)
• Equi join
• Natural join

Outer joins:

• Left outer join
• Right outer join
• Full outer join

1. Natural Join: The natural join performs an equi join of the two relations R and S over
all common attributes. One occurrence of each common attribute is eliminated from the
result. In other words, a natural join removes the duplicate attributes.

In most systems a natural join requires that the attributes have the same name, in order to
identify the attributes to be used in the join. This may require a renaming mechanism:
if the join attributes do not have the same name, they can be renamed first, provided
that they are of the same domain.

Input: Two relations (tables) R and S

Notation: R ⋈ S

Purpose: Relate rows from the two tables and

– Enforce equality on all common attributes

– Eliminate one copy of each common attribute

Example of Natural Join Operation

• Consider two relations EMPLOYEE and DEPARTMENT.
• Let the common attribute of the two relations be DEPTNUMBER.
• The two relations are shown later:
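
A minimal Python sketch of a natural join of EMPLOYEE and DEPARTMENT over DEPTNUMBER. Only the relation names and the common attribute come from the text; the other attributes (EName, DName), the rows, and the helper name natural_join are hypothetical.

```python
# Hypothetical rows; only the relation names and DEPTNUMBER come from the text.
employee = [
    {"EName": "Hana", "DEPTNUMBER": 10},
    {"EName": "Yonas", "DEPTNUMBER": 20},
    {"EName": "Liya", "DEPTNUMBER": 30},
]
department = [
    {"DEPTNUMBER": 10, "DName": "Finance"},
    {"DEPTNUMBER": 20, "DName": "Sales"},
]


def natural_join(r, s):
    """R natural-join S: equality on all common attributes, one copy kept."""
    if not r or not s:
        return []
    common = [a for a in r[0] if a in s[0]]
    result = []
    for x in r:
        for y in s:
            if all(x[a] == y[a] for a in common):
                # Merging the dictionaries keeps a single copy of each
                # common attribute, as the natural join requires.
                result.append({**x, **y})
    return result


joined = natural_join(employee, department)
for row in joined:
    print(row)
```

Liya does not appear in the result because no DEPARTMENT tuple has DEPTNUMBER 30; an inner join silently drops such unmatched tuples.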

2. Equi Join: When a theta join uses only the equality comparison operator (=) in its join
condition, it is called an equi join.

Example of Equi Join: Given the two relations STAFF and DEPT, produce a list of staff
and the departments they work in.
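
The STAFF and DEPT query can be sketched in Python with a general theta join whose condition happens to be an equality. The attribute names, the rows, and the helper name theta_join are assumptions made for illustration.

```python
# Hypothetical rows and attribute names; only STAFF and DEPT come from the text.
staff = [
    {"Staff_no": "S1", "Name": "Hana", "Dept_no": "D1"},
    {"Staff_no": "S2", "Name": "Yonas", "Dept_no": "D2"},
]
dept = [
    {"Dno": "D1", "Dname": "Finance"},
    {"Dno": "D2", "Dname": "Sales"},
    {"Dno": "D3", "Dname": "IT"},
]


def theta_join(r, s, condition):
    """Pair every tuple of r with every tuple of s and keep pairs passing condition."""
    return [{**x, **y} for x in r for y in s if condition(x, y)]


# Equi join: the join condition uses only the equality operator.
staff_dept = theta_join(staff, dept, lambda x, y: x["Dept_no"] == y["Dno"])

for row in staff_dept:
    print(row["Name"], "works in", row["Dname"])
```

Unlike the natural join, the equi join keeps both join attributes (Dept_no and Dno here) in the result.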

3. Outer Join: In an outer join, matched pairs of tuples are retained, and unmatched tuples
are also kept, with the attributes of the other relation set to NULL.

Types of Outer Join

Consider two relations R and S

1. Left Outer Join: A left outer join is a join in which tuples from R that do not have
matching values in the common column of S are also included in the result relation.

2. Right Outer Join: Right outer join is a join in which tuples from S that do not have
matching values in the common column of R are also included in the result relation.

3. Full Outer Join: Full outer join is a join in which tuples from R that do not have
matching values in the common columns of S still appear and tuples in S that do not have
matching values in the common columns of R still appear in the resulting relation.
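
The three outer joins can be sketched with one hypothetical Python helper, outer_join, that joins on a single common attribute and pads unmatched tuples with None (standing in for NULL). The sample relations use the names PEOPLE and MENU with a common attribute Food; the other attributes and all rows are invented.

```python
# Hypothetical rows; Name and Day are invented attributes.
people = [
    {"Name": "Alice", "Food": "Pizza"},
    {"Name": "Bob", "Food": "Injera"},
]
menu = [
    {"Food": "Pizza", "Day": "Monday"},
    {"Food": "Salad", "Day": "Tuesday"},
]


def outer_join(r, s, attr, keep_left=True, keep_right=True):
    """Join r and s on attr, optionally keeping unmatched tuples of either side."""
    r_attrs = list(r[0]) if r else []
    s_attrs = list(s[0]) if s else []
    result, matched_s = [], set()
    for x in r:
        found = False
        for i, y in enumerate(s):
            if x[attr] == y[attr]:
                result.append({**x, **y})
                matched_s.add(i)
                found = True
        if not found and keep_left:
            # Unmatched tuple of r: pad the attributes of s with None (NULL).
            result.append({**x, **{a: None for a in s_attrs if a != attr}})
    if keep_right:
        for i, y in enumerate(s):
            if i not in matched_s:
                # Unmatched tuple of s: pad the attributes of r with None (NULL).
                result.append({**{a: None for a in r_attrs if a != attr}, **y})
    return result


left = outer_join(people, menu, "Food", keep_right=False)
right = outer_join(people, menu, "Food", keep_left=False)
full = outer_join(people, menu, "Food")

for name, rows in (("left", left), ("right", right), ("full", full)):
    print(name, rows)
```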

Example of Full Outer, Left Outer and Right Outer Join

Consider two relations PEOPLE and MENU, and determine their full outer, left outer, and
right outer joins.

1. The left outer join of PEOPLE and MENU on Food is represented in the relational
algebra as PEOPLE ⟕ PEOPLE.Food = MENU.Food MENU.

The result of the left outer join is shown above.

From this table, it is to be noted that all the tuples from the left table (in our case the
PEOPLE relation) appear in the result.

If there is any unmatched value, then a NULL value is returned.

2. The right outer join of PEOPLE and MENU on Food is represented in the relational
algebra as PEOPLE ⟖ PEOPLE.Food = MENU.Food MENU.

The result of the right outer join is also shown above.

From this table, it is clear that all tuples from the right-hand side relation (in our case the
right-hand relation is MENU) appear in the result.

3. The full outer join of PEOPLE and MENU on Food is represented in the relational
algebra as PEOPLE ⟗ PEOPLE.Food = MENU.Food MENU.

The result of the full outer join is shown above.

From this table, it is clear that tuples from both the PEOPLE and the MENU relations
appear in the result.

Advantages of Relational Algebra

• The relational algebra has a solid mathematical foundation.
• This mathematical foundation is the basis of many interesting developments and
theorems.
• If we have two expressions for the same query and the expressions are
proved to be equivalent, then a query optimizer can automatically substitute the
more efficient form.
• Moreover, the relational algebra is a high-level language which talks in terms of
properties of sets of tuples and not in terms of for-loops.

Limitations of Relational Algebra

• The relational algebra cannot do arithmetic. For example, finding the price of
10 litres of petrol after a 10% increase in the price of petrol cannot be done
using relational algebra.
• The relational algebra cannot sort or print results in various formats. For example,
arranging the product names in increasing order of their price cannot be done
using relational algebra.
• The relational algebra cannot perform aggregates. For example, finding how many
staff are working in a particular department cannot be done using relational
algebra.
• The relational algebra cannot modify the database. For example, increasing the
salary of all employees by 10% cannot be done using relational algebra.
• The relational algebra cannot compute “transitive closure.” In order to understand
the term transitive closure, consider the relation RELATIONSHIP, which
describes the relationship between persons.
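
To make the last limitation concrete, the sketch below computes a transitive closure in Python. It assumes, purely for illustration, that RELATIONSHIP holds (parent, child) pairs; the names are invented. The closure needs a loop that repeats the self-join until no new pairs appear, and the number of repetitions depends on the data, which is why no single fixed algebra expression can express it.

```python
# Hypothetical RELATIONSHIP relation as (parent, child) pairs.
relationship = {
    ("Almaz", "Bekele"),
    ("Bekele", "Chaltu"),
    ("Chaltu", "Dawit"),
}


def transitive_closure(pairs):
    """Repeat the self-join until no new (ancestor, descendant) pair appears."""
    closure = set(pairs)
    while True:
        new_pairs = {
            (a, d) for (a, b) in closure for (c, d) in closure if b == c
        }
        if new_pairs <= closure:  # fixed point reached: nothing new was derived
            return closure
        closure |= new_pairs


ancestors = transitive_closure(relationship)
for pair in sorted(ancestors):
    print(pair)
```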

Chapter 6 (cont.…)

Structured Query Language (SQL)

Structured Query Language (SQL) is a query language that is standardized by the
American National Standards Institute (ANSI) for most commercial relational database
management systems (RDBMS).

Ideally, a database language should allow a user to create the database and relation
structures; it should allow a user to perform basic data management tasks, such as the
insertion, modification and deletion of data from the relations; and it should allow a user
to perform both simple and complex queries to transform the raw data into information.

In addition, a database language must perform these tasks with minimal user effort, and
its command structure and syntax must be relatively easy to learn.

SQL is an example of a transform-oriented language, or a language designed to use
relations (tables) to transform inputs into required outputs.

As a language SQL has two components:

❖ A Data Definition Language (DDL) for defining the database structure, and
❖ A Data Manipulation Language (DML) for retrieving and updating data.

SQL contains only these definitional and manipulative commands; it does not contain
flow control commands. In other words, there are no IF..THEN..ELSE, GO TO, DO ...
WHILE or other commands to provide a flow of control.

1. Data Definition

The SQL data definition language allows us to create or destroy database objects
(schemas, domains, tables, views and indexes).

The main SQL data definition language statements are:

CREATE SCHEMA
DROP SCHEMA
CREATE DOMAIN
ALTER DOMAIN
DROP DOMAIN
CREATE TABLE
ALTER TABLE
DROP TABLE
CREATE VIEW
DROP VIEW

The SCHEMA is a collection of database objects that are in some way related to one
another. (All objects in a database are described in one schema or another).

The objects in a schema can be tables, views, domains, character sets, assertions (rules),
etc.

At present CREATE and DROP SCHEMA are not yet widely implemented.

In some implementations, the following statement is used instead of SCHEMA.

CREATE DATABASE database_name

So, this might be the first step before starting to create any object.

Data definition for Domains

Syntax

CREATE DOMAIN domain_name [AS] data_type
[DEFAULT default_value] [CHECK (search_condition)]

A domain is given a name, domain_name, and a data type (CHAR, VARCHAR,
INTEGER, NUMERIC, FLOAT, DATE, etc.), optionally with a default value and a
CHECK condition.

Example:

CREATE DOMAIN gender AS CHARACTER (1) CHECK (VALUE IN ('M', 'F'))

This creates a domain called gender that consists of a single character with either the
value 'M' or 'F'. When defining the Sex column we can use this domain.

DROP DOMAIN domain_name [RESTRICT|CASCADE]

This removes a domain defined in the system. With CASCADE, any table column that
is based on the domain is automatically changed to the underlying data type.
RESTRICT means do not remove the domain if it is used in any column definition.

Example:

DROP DOMAIN gender


• Creating Database

Syntax

CREATE DATABASE <database_name>

<database_name> is the name of the new database.

Example: CREATE DATABASE test

Table Creation and Modification

The CREATE TABLE command in the SQL statement is used to specify a new relation
in a database by giving it a name and listing its attributes.

Syntax

CREATE TABLE table_name
(column_name data_type [NOT NULL|NULL]
[DEFAULT default_value] [CHECK (search_condition)] ... {for each column; then,
after all columns are defined:}
[PRIMARY KEY (column(s))] [UNIQUE (column(s))]
[FOREIGN KEY (column(s)) REFERENCES target_table_name
[(list_of_candidate_key_columns)]
[ON UPDATE referential_integrity_action]
[ON DELETE referential_integrity_action]])

- <column_name> is the name of the column.

- <data_type> is one of the SQL-supported data types: CHAR (n), VARCHAR (n), INT,
SMALLINT, DECIMAL (i, j), DATE, TIME (DATETIME) …

- {column_constraint} is an optional constraint on the column, such as NULL, NOT
NULL, PRIMARY KEY, FOREIGN KEY, UNIQUE, DEFAULT …

Examples

1. Create a table with the fields name, age and sex.

CREATE TABLE new_table (name char (30) NOT NULL, age integer NOT
NULL CHECK (age > 0), sex char (1))

2. Create a table with fields: name, age, sex and city; and make the default value of city
'Addis Ababa' (expecting many records will hold this value) and a rule for male
members to be of age>30.

CREATE TABLE temp (name char (30) NOT NULL, age integer NOT NULL
CHECK (age > 0), sex char (1) CHECK (sex IN ('M', 'F')), city char (30) DEFAULT
'Addis Ababa', PRIMARY KEY (name), CHECK (sex <> 'M' OR age > 30))

3. The following statements define a foreign key rule for the employee table whose target
is the department table.

CREATE TABLE department (depid char (5) NOT NULL, depname char (40),
budget decimal (14, 2), PRIMARY KEY (depid), UNIQUE (depname))

CREATE TABLE employee (empid char (5) NOT NULL, empname char (40), depid
char (5), salary decimal (10, 2), PRIMARY KEY (empid), FOREIGN KEY (depid)
REFERENCES department ON UPDATE CASCADE ON DELETE CASCADE)
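
The department and employee definitions can be tried out with Python's built-in sqlite3 module. This is a sketch under a few assumptions: SQLite is used only because it ships with Python, the budget and salary columns are declared as DECIMAL, the referenced column is written out explicitly, and the inserted rows are invented. SQLite enforces foreign keys only after PRAGMA foreign_keys = ON.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when enabled

conn.execute("""
    CREATE TABLE department (
        depid   CHAR(5) NOT NULL,
        depname CHAR(40),
        budget  DECIMAL(14, 2),
        PRIMARY KEY (depid),
        UNIQUE (depname)
    )""")
conn.execute("""
    CREATE TABLE employee (
        empid   CHAR(5) NOT NULL,
        empname CHAR(40),
        depid   CHAR(5),
        salary  DECIMAL(10, 2),
        PRIMARY KEY (empid),
        FOREIGN KEY (depid) REFERENCES department (depid)
            ON UPDATE CASCADE ON DELETE CASCADE
    )""")

# Hypothetical sample rows.
conn.execute("INSERT INTO department VALUES ('D1', 'Finance', 100000)")
conn.execute("INSERT INTO employee VALUES ('E1', 'Hana', 'D1', 900)")

# An employee that refers to a non-existent department is rejected.
try:
    conn.execute("INSERT INTO employee VALUES ('E2', 'Yonas', 'D9', 800)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# ON DELETE CASCADE: deleting the department also deletes its employees.
conn.execute("DELETE FROM department WHERE depid = 'D1'")
remaining = conn.execute("SELECT COUNT(*) FROM employee").fetchone()[0]

print("invalid insert rejected:", rejected)
print("employees left after cascade:", remaining)
```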

Changing a Table definition (ALTER TABLE)

The ALTER TABLE command in SQL is used to change the structure/definition of an
existing table.

It is used mainly to:

• Add a new column to a table
• Drop a column from a table
• Add a new table constraint
• Drop a table constraint
• Set a default for a column
• Drop the default for a column

The basic format of the statement is:

ALTER TABLE table_name
[ADD [COLUMN] column_name data_type [NOT NULL] [DEFAULT default_value]
[CHECK (condition)]]
[ALTER [COLUMN] column_name data_type]
[DROP [COLUMN] column_name [RESTRICT|CASCADE]]
[ADD [CONSTRAINT [constraint_name]] table_constraint_definition]
[DROP CONSTRAINT constraint_name [RESTRICT|CASCADE]]
[ALTER [COLUMN] column_name SET DEFAULT default_value]
[ALTER [COLUMN] column_name DROP DEFAULT]

Where the parameters are as defined in CREATE TABLE. The
table_constraint_definition is one of the clauses:
PRIMARY KEY, UNIQUE, FOREIGN KEY or CHECK.

Examples

1. Change the field width of the empid column inside the employee table.

ALTER TABLE employee ALTER COLUMN empid character (20)

2. Add a new default value of 'AA' for the empid field inside the employee table.

ALTER TABLE employee ALTER COLUMN empid SET DEFAULT 'AA'

3. Tell the system that empid is the primary key inside the employee table.

ALTER TABLE employee ADD PRIMARY KEY (empid)
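
ALTER TABLE support differs between database systems. As a small runnable sketch, the Python sqlite3 example below adds a column with a default value; SQLite implements only part of ALTER TABLE and does not accept the ALTER COLUMN or ADD PRIMARY KEY forms shown in the examples, so those are not demonstrated. The city column and the sample row are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empid CHAR(5) PRIMARY KEY, empname CHAR(40))")
conn.execute("INSERT INTO employee VALUES ('E1', 'Hana')")  # hypothetical row

# Add a new column with a default value to the existing table.
conn.execute("ALTER TABLE employee ADD COLUMN city CHAR(30) DEFAULT 'Addis Ababa'")

# PRAGMA table_info lists one row per column; index 1 holds the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(employee)")]
existing_row = conn.execute("SELECT empid, empname, city FROM employee").fetchone()

print(columns)
print(existing_row)
```

Rows that existed before the change report the default value for the new column.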

Removing Tables (DROP TABLE)

To remove a table we use the DROP TABLE command as follows:

DROP TABLE table_name

2. Data Manipulation Language

The Data Manipulation Language (DML) is part of the SQL syntax for executing queries
to insert, retrieve, update, and delete records. The statements are:

- INSERT INTO - inserts new data into a database table.
- SELECT - extracts data from a database table.
- UPDATE - updates data in a database table.
- DELETE - deletes data from a database table.

SELECT statement

The purpose of the SELECT statement is to retrieve and display data from one or more
database tables.

It is the most frequently used SQL command. The general form of the SELECT statement
is:

SELECT [DISTINCT|ALL] {*|[column_expression [AS new_name]] [, ...]}
FROM table_expression [WHERE condition] [GROUP BY column_list]
[HAVING condition] [ORDER BY column_list]

column_expression represents a column or an expression. The sequence of
processing in a SELECT statement is:

• FROM specifies the table or tables used.
• WHERE filters the rows subject to some condition.
• GROUP BY forms groups of rows with the same column value.
• HAVING filters the groups subject to some condition.
• SELECT specifies which columns are to appear in the output.
• ORDER BY specifies the order of the output.

The order in which the clauses are written in a SELECT statement (SELECT, FROM,
WHERE, GROUP BY, HAVING, ORDER BY) cannot be changed. The only two mandatory
clauses are SELECT and FROM; the remainder are optional. The result of a query on a
table is another table.
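
The following Python sqlite3 sketch runs one query that uses every clause, so the processing sequence can be followed step by step. The employee table layout matches the earlier examples, but the rows and the thresholds are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empid TEXT PRIMARY KEY, empname TEXT, depid TEXT, salary INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        ("E1", "Hana", "D1", 900),
        ("E2", "Yonas", "D1", 1200),
        ("E3", "Liya", "D2", 700),
        ("E4", "Dawit", "D2", 850),
        ("E5", "Marta", "D3", 1500),
    ],
)

rows = conn.execute("""
    SELECT depid, COUNT(*) AS staff, AVG(salary) AS avg_salary  -- 5. choose output columns
    FROM employee                                               -- 1. source table
    WHERE salary > 800                                          -- 2. filter rows
    GROUP BY depid                                              -- 3. form groups
    HAVING AVG(salary) > 900                                    -- 4. filter groups
    ORDER BY avg_salary DESC                                    -- 6. sort the output
""").fetchall()

for row in rows:
    print(row)
```

WHERE removes Liya before grouping; HAVING then removes department D2, whose remaining average salary (850) is not above 900.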

The following examples show the different usage of this statement.

Since many SQL retrievals require all columns of a table, there is a quick way of
expressing all columns: using an asterisk (*) in place of the column names.

SELECT * FROM EMPLOYEE

The result table for the above two SELECT statements is:

2. Retrieve specific columns, in all rows.

SELECT empname, salary FROM employee

This shows all employees with their name and salary.

3. Using DISTINCT (returns only the values that are unique in a particular column, by
eliminating duplicate values).

SELECT DISTINCT depid FROM employee

4. Using a calculated field or an expression

SELECT empname, 'Yearly salary =' AS title, salary * 12 AS yearly FROM employee

This command displays the following table. Note the title and yearly column names in
the table heading.
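
Examples 3 and 4 can be checked with a short Python sqlite3 sketch. The rows are invented, and an ORDER BY clause is added to each query only to make the printed output predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empid TEXT PRIMARY KEY, empname TEXT, depid TEXT, salary INTEGER)"
)
# Hypothetical sample rows.
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [("E1", "Hana", "D1", 900), ("E2", "Yonas", "D1", 1200), ("E3", "Liya", "D2", 700)],
)

# Example 3: DISTINCT removes the duplicate D1.
distinct_deps = conn.execute(
    "SELECT DISTINCT depid FROM employee ORDER BY depid"
).fetchall()

# Example 4: a constant column and a calculated column, both named with AS.
cursor = conn.execute(
    "SELECT empname, 'Yearly salary =' AS title, salary * 12 AS yearly "
    "FROM employee ORDER BY empid"
)
headings = [column[0] for column in cursor.description]
yearly_rows = cursor.fetchall()

print(distinct_deps)
print(headings)
for row in yearly_rows:
    print(row)
```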

Row selection

The above examples show the use of the SELECT statement to retrieve all rows from a
table.

However, we often need to restrict the rows that are retrieved.

This can be achieved with the WHERE clause, which consists of the keyword WHERE
followed by a search condition that specifies the rows to be retrieved.

The five basic search conditions are as follows:

List all employees with a salary greater than 800

SELECT * FROM employee WHERE salary > 800
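
The Python sqlite3 sketch below tries one WHERE clause for each kind of search condition that SQL textbooks commonly list: comparison, range (BETWEEN), set membership (IN), pattern match (LIKE) and null test (IS NULL). That grouping is an assumption about which five conditions are meant, and the rows and the helper names are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empid TEXT PRIMARY KEY, empname TEXT, depid TEXT, salary INTEGER)"
)
# Hypothetical sample rows; Dawit has no department (NULL).
conn.executemany(
    "INSERT INTO employee VALUES (?, ?, ?, ?)",
    [
        ("E1", "Hana", "D1", 900),
        ("E2", "Yonas", "D1", 1200),
        ("E3", "Liya", "D2", 700),
        ("E4", "Dawit", None, 850),
    ],
)


def names(where_clause):
    """Run a fixed demo query and return the matching employee names."""
    sql = "SELECT empname FROM employee WHERE " + where_clause + " ORDER BY empid"
    return [row[0] for row in conn.execute(sql)]


comparison = names("salary > 800")
in_range = names("salary BETWEEN 700 AND 900")
membership = names("depid IN ('D2', 'D3')")
pattern = names("empname LIKE 'H%'")
null_test = names("depid IS NULL")

for label, result in [
    ("comparison", comparison),
    ("range", in_range),
    ("set membership", membership),
    ("pattern match", pattern),
    ("null", null_test),
]:
    print(label, result)
```

The helper builds the SQL text by concatenation only because the clauses are fixed strings in this demo; values that come from users should be passed as parameters instead.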

Sorting results (ORDER BY clause)

In general, the rows of an SQL query result table are not arranged in any particular order.
However, we can sort the results of a query using the ORDER BY clause in the SELECT
statement.

This clause consists of a list of column identifiers that the result is to be sorted on,
separated by commas.

1. List all employees ordered by the empname column

SELECT * FROM employee ORDER BY empname

2. List all employees ordered by the empname column (in reverse order)

SELECT * FROM employee ORDER BY empname DESC

3. List all employees by empname, and those with the same name by department Id.

SELECT * FROM employee ORDER BY empname ASC, depid ASC

ASC stands for ascending sort order (the default) and DESC stands for descending sort
order.

The INSERT INTO Statement

• The INSERT INTO statement is used to insert new rows into a table.

Syntax

INSERT INTO table_name VALUES (value1, value2…)

You can also specify the columns for which you want to insert data:

INSERT INTO table_name (column1, column2...) VALUES (value1, value2,....)

The Update Statement

The UPDATE statement is used to modify the data in a table.

Syntax

UPDATE table_name SET column_name = new_value WHERE column_name = some_value

The DELETE Statement

The DELETE statement is used to delete rows in a table.

Syntax

DELETE FROM table_name WHERE column_name = some_value
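
The three statements can be tried together in a short Python sqlite3 sketch. The table layout follows the earlier employee examples and the rows are invented; values are passed as ? parameters rather than pasted into the SQL text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE employee (empid TEXT PRIMARY KEY, empname TEXT, depid TEXT, salary INTEGER)"
)

# INSERT with a value for every column, in table order.
conn.execute("INSERT INTO employee VALUES (?, ?, ?, ?)", ("E1", "Hana", "D1", 900))
# INSERT naming only some columns; the others are left NULL.
conn.execute("INSERT INTO employee (empid, empname) VALUES (?, ?)", ("E2", "Yonas"))
partial = conn.execute(
    "SELECT depid, salary FROM employee WHERE empid = ?", ("E2",)
).fetchone()

# UPDATE and DELETE; rowcount reports how many rows were affected.
updated = conn.execute(
    "UPDATE employee SET salary = ? WHERE empid = ?", (1000, "E1")
).rowcount
deleted = conn.execute("DELETE FROM employee WHERE empid = ?", ("E2",)).rowcount
conn.commit()

rows = conn.execute("SELECT * FROM employee").fetchall()
print(partial, updated, deleted)
print(rows)
```

Note that an UPDATE or DELETE without a WHERE clause applies to every row of the table.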
