DBMS Module 1
MODULE 1
Chapter 1:
Databases and Database Users
Databases and database systems have become an essential component of everyday life in modern
society. In the course of a day, most of us encounter several activities that involve some interaction
with a database. For example, if we go to the bank to deposit or withdraw funds; if we make a hotel or
airline reservation; if we access a computerized library catalog to search for a bibliographic item; or if
we order a magazine subscription from a publisher, chances are that our activities will involve
someone accessing a database. Even purchasing items from a supermarket nowadays in many cases
involves an automatic update of the database that keeps the inventory of supermarket items.
The above interactions are examples of what we may call traditional database applications, where
most of the information that is stored and accessed is either textual or numeric. In the past few years,
advances in technology have been leading to exciting new applications of database systems.
Multimedia databases can now store pictures, video clips, and sound messages. Geographic
information systems (GIS) can store and analyze maps, weather data, and satellite images. Data
warehouses and on-line analytical processing (OLAP) systems are used in many companies to extract
and analyze useful information from very large databases for decision making. Real-time and active
database technology is used in controlling industrial and manufacturing processes. And database
search techniques are being applied to the World Wide Web to improve the search for information
that is needed by users browsing through the Internet.
1.1 Introduction
Importance: Database systems have become an essential component of life in modern society, in that
many frequently occurring events trigger the accessing of at least one database: bibliographic library
searches, bank transactions, hotel/airline reservations, grocery store purchases, online (Web) purchases,
etc.
Sahana M Page 1
MODULE –I NOTES DBMS
Definitions
The term database is often used, rather loosely, to refer to just about any collection of related data. E&N
say that, in addition to being a collection of related data, a database must have the following properties:
• It represents some aspect of the real (or an imagined) world, called the miniworld or universe
of discourse. Changes to the miniworld are reflected in the database. Imagine, for example, a
UNIVERSITY miniworld concerned with students, courses, course sections, grades, and course
prerequisites.
• It is a logically coherent collection of data, to which some meaning can be attached. (Logical
coherency requires, in part, that the database not be self-contradictory.)
• It has a purpose: there is an intended group of users and some preconceived applications that the
users are interested in employing.
• To summarize: a database has some source (i.e., the miniworld) from which data are derived,
some degree of interaction with events in the represented miniworld (at least insofar as the data
is updated when the state of the miniworld changes), and an audience that is interested in using
it.
An Aside: data vs. information vs. knowledge: Data is the representation of "facts" or "observations"
whereas information refers to the meaning thereof (according to some interpretation). Knowledge, on
the other hand, refers to the ability to use information to achieve intended ends.
Computerized vs. manual: Not surprisingly (this being a CS course), our concern will be with
computerized database systems, as opposed to manual ones, such as the card catalog-based systems that
were used in libraries in ancient times (i.e., before the year 2000). (Some authors wouldn't even
recognize a non-computerized collection of data as a database, but E&N do.)
Size/Complexity: Databases run the range from being small/simple (e.g., one person's recipe database)
to being huge/complex (e.g., Amazon's database that keeps track of all its products, customers, and
suppliers).
Security protection: preventing unauthorized or malicious access to the database. Given all its
responsibilities, it is not surprising that a typical DBMS is a complex piece of software.
A database management system (DBMS) is a collection of programs that enables users to create and
maintain a database. The DBMS is hence a general-purpose software system that facilitates the
processes of defining, constructing, and manipulating databases for various applications. Defining a
database involves specifying the data types, structures, and constraints for the data to be stored in the
database. Constructing the database is the process of storing the data itself on some storage medium
that is controlled by the DBMS. Manipulating a database includes such functions as querying the
database to retrieve specific data, updating the database to reflect changes in the miniworld, and
generating reports from the data.
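The three activities above (defining, constructing, and manipulating) can be illustrated with a minimal sketch. It uses Python's built-in sqlite3 module as a stand-in DBMS; the STUDENT table layout, its constraint, and the sample rows are assumptions made only for illustration.

```python
import sqlite3

# An in-memory SQLite database stands in for the DBMS (illustrative choice).
conn = sqlite3.connect(":memory:")

# Defining: specify data types, structures, and constraints.
conn.execute(
    """
    CREATE TABLE STUDENT (
        Student_number INTEGER PRIMARY KEY,
        Name           TEXT NOT NULL,
        Class          INTEGER CHECK (Class BETWEEN 1 AND 5)
    )
    """
)

# Constructing: store the data itself under the control of the DBMS.
conn.executemany(
    "INSERT INTO STUDENT (Student_number, Name, Class) VALUES (?, ?, ?)",
    [(17, "Smith", 1), (8, "Brown", 2)],
)

# Manipulating: update the database to reflect a change in the miniworld ...
conn.execute("UPDATE STUDENT SET Class = 2 WHERE Student_number = 17")

# ... and query it to retrieve specific data.
second_year = conn.execute(
    "SELECT Name FROM STUDENT WHERE Class = 2 ORDER BY Name"
).fetchall()
print(second_year)
```

The same three steps apply to any DBMS; only the connection call and some SQL details would differ.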
In traditional file processing, data definition is typically part of the application programs
themselves. Hence, these programs are constrained to work with only one specific database, whose
structure is declared in the application programs. For example, a PASCAL program may have record
structures declared in it; a C++ program may have "struct" or "class" declarations; and a COBOL
program has Data Division statements to define its files. Whereas file-processing software can
access only specific databases, DBMS software can access diverse databases by extracting the
database definitions from the catalog and then using these definitions.
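As a rough sketch of how general-purpose software can work with diverse databases by reading their definitions from the catalog, the function below discovers tables and columns at run time instead of declaring them in the program. It assumes SQLite, where the catalog is exposed through the sqlite_master table and PRAGMA table_info; the COURSE and STUDENT layouts are hypothetical.

```python
import sqlite3

def describe_tables(conn):
    """Return {table_name: [column names]} read from the DBMS catalog."""
    tables = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    description = {}
    for (table_name,) in tables:
        # PRAGMA table_info yields one row per column; index 1 is the column name.
        columns = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
        description[table_name] = [col[1] for col in columns]
    return description

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE COURSE (Course_number TEXT, Course_name TEXT)")
conn.execute("CREATE TABLE STUDENT (Student_number INTEGER, Name TEXT)")

catalog = describe_tables(conn)
print(catalog)
```

Nothing in describe_tables mentions COURSE or STUDENT, so the same function would work unchanged on a database with a different structure.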
1.2 Example
Let us consider an example that most readers may be familiar with: a UNIVERSITY
database for maintaining information concerning students, courses, and grades in a university
environment. Figure 01.02 shows the database structure and a few sample data for such a
database. The database is organized as five files, each of which stores data records of the same
type (Note 2). The STUDENT file stores data on each student; the COURSE file stores data on
each course; the SECTION file stores data on each section of a course; the GRADE_REPORT
file stores the grades that students receive in the various sections they have completed; and the
PREREQUISITE file stores the prerequisites of each course.
The main characteristics of the database approach versus the file-processing approach are the following.
Database Administrators
Database Designers
End Users
Database Administrators
In any organization where many persons use the same resources, there is a need for a
chief administrator to oversee and manage these resources. In a database environment, the
primary resource is the database itself and the secondary resource is the DBMS and related
software. Administering these resources is the responsibility of the database administrator
(DBA). The DBA is responsible for authorizing access to the database, for coordinating and
monitoring its use, and for acquiring software and hardware resources as needed.
Database Designers
Database designers are responsible for identifying the data to be stored in the database
and for choosing appropriate structures to represent and store this data. It is the responsibility of
database designers to communicate with all prospective database users, in order to understand
their requirements, and to come up with a design that meets these requirements.
End Users
End users are the people whose jobs require access to the database for querying, updating,
and generating reports; the database primarily exists for their use. There are several categories of
end users:
• Casual end users occasionally access the database, but they may need different information
each time. They use a sophisticated database query language to specify their requests and
are typically middle- or high-level managers or other occasional browsers.
• Naive or parametric end users make up a sizable portion of database end users. Their
main job function revolves around constantly querying and updating the database, using
standard types of queries and updates—called canned transactions—that have been
carefully programmed and tested.
For example, bank tellers check account balances and post withdrawals and deposits.
• Sophisticated end users include engineers, scientists, business analysts, and others who
thoroughly familiarize themselves with the facilities of the DBMS so as to implement
their applications to meet their complex requirements.
• Stand-alone users maintain personal databases by using ready-made program packages
that provide easy-to-use menu- or graphics-based interfaces. An example is the user of a
tax package that stores a variety of personal financial data for tax purposes.
System Analysts and Application Programmers (Software Engineers)
System analysts determine the requirements of end users, especially naive and
parametric end users, and develop specifications for canned transactions that meet these
requirements. Application programmers implement these specifications as programs; then
they test, debug, document, and maintain these canned transactions. Such analysts and
programmers (nowadays called software engineers) should be familiar with the full range
of capabilities provided by the DBMS to accomplish their tasks.
1.5 Workers behind the Scene
In addition to those who design, use, and administer a database, others are associated with
the design, development, and operation of the DBMS software and system environment.
These persons are typically not interested in the database itself. We call them the
"workers behind the scene," and they include the following categories.
DBMS system designers and implementers are persons who design and implement
the DBMS modules and interfaces as a software package. A DBMS is a complex
software system that consists of many components or modules, including modules for
implementing the catalog, query language, interface processors, data access,
concurrency control, recovery, and security.
Tool developers include persons who design and implement tools—the software
packages that facilitate database system design and use, and help improve
performance. Tools are optional packages that are often purchased separately. They
include packages for database design, performance monitoring, natural language or
graphical interfaces, prototyping, simulation, and test data generation.
Operators and maintenance personnel are the system administration personnel who
are responsible for the actual running and maintenance of the hardware and software
environment for the database system.
Controlling Redundancy
In traditional software development utilizing file processing, every user group maintains
its own files for handling its data-processing applications. For example, in the
UNIVERSITY database example, two groups of users might be the course registration personnel
and the accounting office. In the traditional approach, each group independently keeps files on
students. The accounting office also keeps data on registration and related billing information,
whereas the registration office keeps track of student courses and grades. Much of the data is
stored twice: once in the files of each user group. Additional user groups may further duplicate
some or all of the same data in their own files.
This redundancy in storing the same data multiple times leads to several problems. First,
there is the need to perform a single logical update—such as entering data on a new student—
multiple times: once for each file where student data is recorded. This leads to duplication of
effort. Second, storage space is wasted when the same data is stored repeatedly, and this problem
may be serious for large databases. Third, files that represent the same data may become
inconsistent. This may happen because an update is applied to some of the files but not to others.
In the database approach, the views of different user groups are integrated during
database design. For consistency, we should have a database design that stores each logical data
item—such as a student’s name or birth date—in only one place in the database. This does not
permit inconsistency, and it saves storage space.
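The inconsistency problem and its remedy can be sketched in a few lines of plain Python. Dictionaries stand in for files and for the integrated database; the student number, name, and addresses are made-up sample values.

```python
# File-processing style: each user group keeps its own copy of student data.
registration_file = {17: {"name": "Smith", "address": "12 Oak St"}}
accounting_file = {17: {"name": "Smith", "address": "12 Oak St"}}

# A single logical update (change of address) reaches only one of the files.
registration_file[17]["address"] = "98 Elm Ave"
inconsistent = registration_file[17]["address"] != accounting_file[17]["address"]

# Database style: each logical data item is stored in exactly one place,
# and both user groups read it from there.
student_table = {17: {"name": "Smith", "address": "12 Oak St"}}

def registration_view(student_number):
    return student_table[student_number]["address"]

def accounting_view(student_number):
    return student_table[student_number]["address"]

student_table[17]["address"] = "98 Elm Ave"
consistent = registration_view(17) == accounting_view(17)

print(inconsistent, consistent)
```

With one stored copy there is nothing to keep in step, so the two user groups cannot disagree about the address.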
Databases can be used to provide persistent storage for program objects and data
structures. This is one of the main reasons for the emergence of the object-oriented database
systems. Programming languages typically have complex data structures, such as record types in
PASCAL or class definitions in C++. The values of program variables are discarded once a
program terminates.
The persistent storage of program objects and data structures is an important function of
database systems. Traditional database systems often suffered from the so-called impedance
mismatch problem, since the data structures provided by the DBMS were incompatible with the
programming language’s data structures. Object-oriented database systems typically offer data
structure compatibility with one or more object-oriented programming languages.
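A small sketch may help make the impedance mismatch concrete. Here a Python object has to be flattened by hand into a relational row and rebuilt on reading; the Student class, the file name, and the table layout are assumptions for illustration, and SQLite is used only as a convenient stand-in for a traditional DBMS.

```python
import os
import sqlite3
import tempfile
from dataclasses import dataclass

@dataclass
class Student:
    student_number: int
    name: str

def save(path, student):
    # The object must be flattened into a row: this manual translation is
    # the kind of friction the impedance mismatch refers to.
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS STUDENT "
        "(Student_number INTEGER PRIMARY KEY, Name TEXT)"
    )
    conn.execute(
        "INSERT OR REPLACE INTO STUDENT VALUES (?, ?)",
        (student.student_number, student.name),
    )
    conn.commit()
    conn.close()

def load(path, student_number):
    conn = sqlite3.connect(path)
    row = conn.execute(
        "SELECT Student_number, Name FROM STUDENT WHERE Student_number = ?",
        (student_number,),
    ).fetchone()
    conn.close()
    return Student(*row) if row else None

with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "university.db")
    save(db_path, Student(17, "Smith"))
    # A later run of the program (simulated here by a fresh connection)
    # can rebuild the object from the stored row.
    restored = load(db_path, 17)

print(restored)
```

An object-oriented database system aims to remove the save/load translation layer by storing the object in a form compatible with the programming language.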
3. Object-Oriented databases
In the 1980s, with the emergence of object-oriented programming languages, it
became necessary to store and share complex structured objects. This led to the
development of object-oriented databases. They are used in specialized
applications such as engineering design, multimedia publishing, and
manufacturing systems.
Advantages
• They provide more general data structures.
• They incorporate many useful object-oriented paradigms, such as
abstract data types (ADTs), encapsulation, and inheritance.
4. Web based database applications
The World Wide Web provides a large network of interconnected computers. Users
can create documents, called web pages, using HTML (HyperText Markup Language)
and store them on web servers, from where web clients can access them.
Documents can be linked together through hyperlinks, which are pointers to other
documents.
5. File systems
Large numbers of records of similar structure were stored and maintained in large
organizations.
Drawbacks
Conceptual relationships were intermixed with the physical storage
and placement of records on disk. Although data access was efficient for the original
queries and transactions, these systems did not provide enough flexibility to access
records efficiently when new queries and transactions were identified.
When changes were made to the requirements of the application, it was
difficult to reorganize the database.
These systems provided only programming language interfaces, so
implementing new queries and transactions was time-consuming and expensive.
Chapter 2
OVERVIEW OF DATABASE LANGUAGES AND ARCHITECTURES
2.1 Data Models, Schemas, and Instances
One fundamental characteristic of the database approach is that it provides some level of data
abstraction.
Data abstraction generally refers to the suppression of details of data organization and storage, and
the highlighting of the essential features for an improved understanding of data.
One of the main characteristics of the database approach is to support data abstraction so that different
users can perceive data at their preferred level of detail.
A data model—a collection of concepts that can be used to describe the structure of a database—
provides the necessary means to achieve this abstraction.
By structure of a database we mean the data types, relationships, and constraints that apply to the data.
Most data models also include a set of basic operations for specifying retrievals and updates on the
database.
1. High-level (or conceptual) data models
- provide concepts that are close to the way many users perceive data
2. Low-level (or physical) data models
- provide concepts that describe the details of how data is stored on the computer storage media,
typically magnetic disks
- concepts provided by low-level data models are generally meant for computer specialists, not for
end users
3. Representational (or implementation) data models
- provide concepts that may be easily understood by end users but that are not too far removed from the
way data is organized in computer storage
- fall between high-level and low-level data models
- hide many details of data storage on disk but can be implemented on a computer system directly
Schemas
The description of a database is called the database schema, which is specified during database design
and is not expected to change frequently.
A displayed schema is called a schema diagram.
Each object in the schema, such as STUDENT or COURSE, is called a schema construct.
A schema diagram displays only some aspects of a schema, such as the names of record types and data
items, and some types of constraints.
The actual data in a database may change quite frequently. For example, it changes every time we add a
new student or enter a new grade.
The data in the database at a particular moment in time is called a database state or SNAPSHOT. It is
also called the current set of occurrences or instances in the database. In a given database state, each
schema construct has its own current set of instances; for example, the STUDENT construct will
contain the set of individual student entities (records) as its instances.
When we define a new database, we specify its database schema only to the DBMS. At this point, the
corresponding database state is the empty state with no data.
We get the initial state of the database when the database is first populated or loaded with the initial
data. From then on, every time an update operation is applied to the database, we get another database
state.
At any point in time, the database has a current state.
The DBMS is partly responsible for ensuring that every state of the database is a valid state—that is, a
state that satisfies the structure and constraints specified in the schema.
The DBMS stores the descriptions of the schema constructs and constraints—also called the meta-
data—in the DBMS catalog.
The schema is sometimes called the INTENSION, and a database state is called an EXTENSION of the
schema.
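The distinction between schema (intension) and state (extension) can be traced in a short sketch. It assumes SQLite through Python's sqlite3 module and a hypothetical two-column STUDENT table; the helper names schema and state are introduced here only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Student_number INTEGER PRIMARY KEY, Name TEXT)")

def schema(conn):
    # The stored CREATE statement describes the STUDENT schema construct.
    return conn.execute(
        "SELECT sql FROM sqlite_master WHERE type = 'table' AND name = 'STUDENT'"
    ).fetchone()[0]

def state(conn):
    # The current set of instances of the STUDENT construct.
    return conn.execute("SELECT * FROM STUDENT ORDER BY Student_number").fetchall()

schema_before = schema(conn)
empty_state = state(conn)            # empty state: schema defined, no data yet

conn.execute("INSERT INTO STUDENT VALUES (17, 'Smith')")
initial_state = state(conn)          # initial state after the first load

conn.execute("INSERT INTO STUDENT VALUES (8, 'Brown')")
later_state = state(conn)            # each update yields another state

schema_unchanged = schema(conn) == schema_before
print(empty_state, initial_state, later_state, schema_unchanged)
```

The state changes with every insertion while the schema stays the same, which matches the expectation that the schema changes rarely and the data changes often.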
Application requirements change occasionally, which is one of the reasons why software maintenance
is important. On such occasions, a change to a database's schema may be called for. An example would
be to add a Date_of_Birth field/attribute to the STUDENT table. Making changes to a database schema
is known as SCHEMA EVOLUTION. Most modern DBMSs support schema evolution operations that
can be applied while a database is operational.
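The Date_of_Birth example can be tried out directly. The sketch below assumes SQLite, whose ALTER TABLE ... ADD COLUMN statement is one way such a schema evolution operation is offered; the existing STUDENT columns and the sample row are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Student_number INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO STUDENT VALUES (17, 'Smith')")

# Schema evolution: add the new attribute while the existing data stays in place.
conn.execute("ALTER TABLE STUDENT ADD COLUMN Date_of_Birth TEXT")

# PRAGMA table_info yields one row per column; index 1 is the column name.
columns = [row[1] for row in conn.execute("PRAGMA table_info(STUDENT)")]
rows = conn.execute("SELECT * FROM STUDENT").fetchall()
print(columns)
print(rows)
```

Existing rows receive a NULL value for the new attribute until a date of birth is entered for them.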
The DBMS must transform a request specified on an external schema into a request against the
conceptual schema, and then into a request on the internal schema for processing over the stored
database. If the request is a database retrieval, the data extracted from the stored database must be
reformatted to match the user’s external view. The processes of transforming requests and results
between levels are called mappings. These mappings may be time-consuming, so some DBMSs—
especially those that are meant to support small databases—do not support external views.
Even in such systems, however, a certain amount of mapping is necessary to transform requests
between the conceptual and internal levels.
Data Independence
Data independence can be defined as the capacity to change the schema at one level of a
database system without having to change the schema at the next higher level.
We can define two types of data independence:
1. Logical data independence
- The capacity to change the conceptual schema without having to change external schemas or
application programs.
- The conceptual schema may be changed to expand the database, to change constraints, or to reduce the
database.
- Changes to constraints can be applied to the conceptual schema without affecting the external
schemas or application programs.
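Logical data independence can be illustrated with a small sketch in which an SQL view plays the role of an external schema. SQLite is assumed as the DBMS; the STUDENT table, the STUDENT_NAMES view, and the added Major column are hypothetical names chosen for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Student_number INTEGER PRIMARY KEY, Name TEXT)")
conn.execute("INSERT INTO STUDENT VALUES (17, 'Smith')")

# External schema: a view that the application program reads from.
conn.execute(
    "CREATE VIEW STUDENT_NAMES AS SELECT Student_number, Name FROM STUDENT"
)

def application_query(conn):
    # The application refers only to the external view, never to STUDENT itself.
    return conn.execute("SELECT Name FROM STUDENT_NAMES ORDER BY Name").fetchall()

before = application_query(conn)

# Expand the conceptual schema; the view and the application are left untouched.
conn.execute("ALTER TABLE STUDENT ADD COLUMN Major TEXT")
conn.execute("INSERT INTO STUDENT VALUES (8, 'Brown', 'CS')")

after = application_query(conn)
print(before, after)
```

The application keeps working after the conceptual schema is expanded because only the view definition connects it to the underlying table.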
In current DBMSs, the languages described below are usually not treated as distinct languages; instead,
DBMS packages integrate them into a single comprehensive language such as Structured Query Language (SQL).
Data definition language (DDL) is used by the DBA and by database designers to define both the
conceptual and internal schemas when no strict separation of levels is maintained.
Storage definition language (SDL) is used to specify the internal schema.
View definition language (VDL) is used to specify user views and their mappings to the conceptual schema.
Data manipulation language (DML) provides a set of operations such as retrieval, insertion, deletion, and
modification of the data.
Whenever DML commands, whether high level or low level, are embedded in a general-purpose
programming language, that language is called the host language and the DML is called the data
sublanguage. A high-level DML used in a standalone interactive manner is called a query language.
DBMS Interfaces
Menu-Based Interfaces for Web Clients or Browsing: These interfaces present the user with lists
of options (called menus) that lead the user through the formulation of a request.
Forms-Based Interfaces: A forms-based interface displays a form to each user. Users can fill out all of
the form entries to insert new data, or they can fill out only certain entries, in which case the DBMS
will retrieve matching data for the remaining entries.
Graphical User Interfaces: A GUI typically displays a schema to the user in diagrammatic form. The user
can then specify a query by manipulating the diagram. GUIs utilize both menus and forms. Most GUIs
use a pointing device.
Natural Language Interfaces: These interfaces accept requests written in English or some other language
and attempt to understand them.
Speech Input and Output: Speech is used as the input query and as the answer to a question
or result. The speech input is detected using a library of predefined words and used to set up the
parameters that are supplied to the queries.
Interfaces for Parametric Users: Parametric users, such as bank tellers, often have a small set of
operations that they must perform repeatedly.
Interfaces for the DBA: The DBA staff use privileged commands. These include commands for creating
accounts, setting system parameters, granting account authorization, changing a schema, and
reorganizing the storage structures of a database.
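The retrieval behaviour of a forms-based interface (fill in some entries, get back matching data for the rest) might be sketched as follows. This is only one possible reading, assuming SQLite, a hypothetical STUDENT table, and a hypothetical query_by_form helper; blank form entries are represented by None.

```python
import sqlite3

ALLOWED_FIELDS = ("Student_number", "Name", "Major")  # hypothetical form fields

def query_by_form(conn, form):
    """Retrieve the rows that match the entries the user filled in."""
    # Keep only known fields that were actually filled in (not None).
    filled = {k: v for k, v in form.items() if k in ALLOWED_FIELDS and v is not None}
    # Field names come from the fixed whitelist; values are bound as parameters.
    where = " AND ".join(f"{field} = ?" for field in filled) or "1 = 1"
    sql = (
        "SELECT Student_number, Name, Major FROM STUDENT "
        f"WHERE {where} ORDER BY Student_number"
    )
    return conn.execute(sql, tuple(filled.values())).fetchall()

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Student_number INTEGER, Name TEXT, Major TEXT)")
conn.executemany(
    "INSERT INTO STUDENT VALUES (?, ?, ?)",
    [(17, "Smith", "CS"), (8, "Brown", "CS"), (21, "Lee", "MATH")],
)

# The user fills in only the Major entry and leaves the others blank.
matches = query_by_form(conn, {"Student_number": None, "Name": None, "Major": "CS"})
print(matches)
```

The blank entries are then completed from the matching rows, which is the retrieval case described for forms-based interfaces.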
Many DBMSs have their own buffer management module to schedule disk read/write. The stored data
manager controls access to DBMS information that is stored on disk, whether it is part of the database or
the catalog.
Top half of the figure:
It shows the interfaces for:
- the DBA staff
- casual users, who work with interactive interfaces to formulate queries
- application programmers, who create programs using some host programming languages
- parametric users, who do data entry work by supplying parameters to predefined transactions
The DBA staff works on defining the database and tuning it by making changes to its definition
using the DDL and other privileged commands.
DBA staff:
The DDL compiler processes schema definitions, specified in the DDL, and stores descriptions
of the schemas (meta-data) in the DBMS catalog
The catalog includes information such as the names and sizes of files, names and data types of
data items, storage details of each file, mapping information among schemas, and constraints
Casual users:
They interact using some form of interface, which we call the interactive query interface.
Queries are parsed and validated for correctness of the query syntax, the names of files and data
elements, and so on by a query compiler that compiles them into an internal form.
This internal query is subjected to query optimization.
The query optimizer is concerned with the rearrangement and possible reordering of operations,
elimination of redundancies, and use of correct algorithms and indexes during execution.
It consults the system catalog for statistical and other physical information about the stored
data and generates executable code that performs the necessary operations for the query and
makes calls on the runtime processor.
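One way to observe an optimizer choosing between access paths is to ask the DBMS for its plan. The sketch below assumes SQLite and its EXPLAIN QUERY PLAN statement; the STUDENT table and the index name idx_student_name are hypothetical, and the exact wording of the plan text varies between SQLite versions, so only the presence of the index name is checked.

```python
import sqlite3

def plan(conn, sql, params=()):
    """Return the optimizer's plan steps as a list of strings."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return [row[-1] for row in rows]   # the last column holds the description

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE STUDENT (Student_number INTEGER, Name TEXT)")
query = "SELECT Student_number FROM STUDENT WHERE Name = ?"

# Without an index, the only available access path is a scan of the table.
plan_without_index = plan(conn, query, ("Smith",))

# After an index on Name is recorded in the catalog, the optimizer can use it.
conn.execute("CREATE INDEX idx_student_name ON STUDENT (Name)")
plan_with_index = plan(conn, query, ("Smith",))

uses_index = any("idx_student_name" in step for step in plan_with_index)
print(plan_without_index)
print(plan_with_index)
```

The query text is identical in both calls; only the catalog contents changed, which is why the optimizer consults the catalog before generating executable code.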
Application programmers:
They write programs in host languages such as Java, C, or C++ that are submitted to a precompiler.
The precompiler extracts DML commands from an application program.
These commands are sent to the DML compiler for compilation.
The rest of the program is sent to the host language compiler.
The object codes for the DML commands and the rest of the program are linked, forming a
canned transaction.
An example is a bank withdrawal transaction where the account number and the amount may be
supplied as parameters.
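The bank withdrawal example can be sketched as a canned transaction: a tested routine in a host language whose only inputs are the run-time parameters. The ACCOUNT table, the withdraw function, and the business rules (positive amount, no overdraft) are assumptions for illustration; Python is the host language and SQL is the data sublanguage.

```python
import sqlite3

def withdraw(conn, account_number, amount):
    """Canned transaction: withdraw `amount` from one account, or fail as a whole."""
    if amount <= 0:
        raise ValueError("amount must be positive")
    with conn:  # commits on success, rolls back if an exception is raised
        row = conn.execute(
            "SELECT Balance FROM ACCOUNT WHERE Account_number = ?",
            (account_number,),
        ).fetchone()
        if row is None:
            raise LookupError("unknown account")
        if row[0] < amount:
            raise ValueError("insufficient funds")
        conn.execute(
            "UPDATE ACCOUNT SET Balance = Balance - ? WHERE Account_number = ?",
            (amount, account_number),
        )
        return row[0] - amount

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ACCOUNT (Account_number INTEGER PRIMARY KEY, Balance INTEGER)")
conn.execute("INSERT INTO ACCOUNT VALUES (1001, 500)")
conn.commit()

# A parametric user supplies only the account number and the amount.
new_balance = withdraw(conn, 1001, 120)
print(new_balance)
```

The teller never writes a query; the SQL is fixed in advance and only the parameter values change from one execution to the next.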
In the lower part of Figure,
the runtime database processor executes
1. the privileged commands
2. the executable query plans, and
3. the canned transactions with runtime parameters.
It works with the system catalog and may update it with statistics
It also works with the stored data manager, which in turn uses basic operating system services
for carrying out low-level input/output (read/write) operations between the disk and main
memory
The runtime database processor also handles other aspects of data transfer, such as management of
buffers in main memory. Concurrency control and backup and recovery systems are integrated into the
working of the runtime database processor for purposes of transaction management.
Earlier architectures used mainframe computers to provide the main processing for all system
functions.
Over time, most users replaced their terminals with PCs and workstations.
At first, database systems used these computers in the same way as they had used display terminals,
so the DBMS itself was still a centralized DBMS in which all the DBMS functionality,
application program execution, and user interface processing were carried out on one machine.
Gradually, DBMS systems started to exploit the available processing power at the user side,
which led to client/server DBMS architectures.
The client/server architecture was developed to deal with computing environments in which a large
number of PCs, workstations, file servers, printers, database servers, Web servers, e-mail servers, and
other software and equipment are connected via a network.
The idea is to define specialized servers with specific functionalities.
It is possible to connect a number of PCs or small workstations as clients to a file server that
maintains the files of the client machines.
Another machine can be designated as a printer server by being connected to various printers; all
print requests by the clients are forwarded to this machine.
Web servers or e-mail servers also fall into the specialized server category. The resources
provided by specialized servers can be accessed by many client machines.
The client machines provide the user with the appropriate interfaces to utilize these servers, as
well as with local processing power to run local applications.
This concept can be carried over to other software packages, with specialized programs, such as a CAD
(computer-aided design) package, being stored on specific server machines and made accessible to
multiple clients.
Some machines would be client sites only, other machines would be dedicated servers, and
others would have both client and server functionality.
The concept of client/server architecture assumes an underlying framework that consists of many
PCs and workstations as well as a smaller number of mainframe machines, connected via LANs
and other types of computer networks.
A client is typically a user machine that provides user interface capabilities and local
processing.
A server is a system containing both hardware and software that can provide services to the
client machines, such as file access, printing, archiving, or database access.
In general, some machines install only client software, others only server software, and still
others may include both client and server software.
A standard called Open Database Connectivity (ODBC) provides an application programming
interface (API) that allows client-side programs to call the DBMS.
A second approach to two-tier client/server architecture was taken by some object-oriented
DBMSs, where the software modules of the DBMS were divided between client and server.
The server level may include the part of the DBMS software responsible for handling data
storage on disk pages, local concurrency control and recovery, buffering and caching of disk
pages, and other such functions.
The client level may handle the user interface, data dictionary functions, DBMS interactions with
programming language compilers, global query optimization, concurrency control and recovery
across multiple servers, and structuring of complex objects from the data in the buffers.
The architectures described here are called two-tier architectures because the software
components are distributed over two systems: client and server.
The advantages of this architecture are its simplicity and seamless compatibility with existing systems.
Many Web applications use an architecture called the three-tier architecture, which adds an
intermediate layer between the client and the database server
This intermediate layer or middle tier is called the application server or the Web server, depending
on the application
This server plays an intermediary role by running application programs and storing business
rules (procedures or constraints) that are used to access data from the database server. It can also
improve database security by checking a client’s credentials before forwarding a request to the
database server
Clients contain GUI interfaces and some additional application-specific business rules
The intermediate server accepts requests from the client, processes the request and sends
database queries and commands to the database server, and then acts as a conduit for passing
(partially) processed data from the database server to the clients
Thus, the user interface, application rules, and data access act as the three tiers
The presentation layer displays information to the user and allows data entry
The business logic layer handles intermediate rules and constraints before data is passed up to
the user or down to the DBMS
The bottom layer includes all data management services. The middle layer can also act as a Web
server, which retrieves query results from the database server and formats them into dynamic
Web pages that are viewed by the Web browser at the client side
If the business logic layer is divided into multiple layers, the result is called an n-tier architecture.
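The division of responsibilities among the three tiers can be sketched with three small classes running in one process. This is a toy model only: the class names, the ACCOUNT table, and the plain-text credential map are hypothetical, and in a real deployment each tier would typically run on a separate machine and credentials would never be stored this way.

```python
import sqlite3

class DatabaseServer:
    """Bottom tier: data management services."""
    def __init__(self):
        self.conn = sqlite3.connect(":memory:")
        self.conn.execute("CREATE TABLE ACCOUNT (Owner TEXT, Balance INTEGER)")
        self.conn.execute("INSERT INTO ACCOUNT VALUES ('alice', 500)")

    def balance_of(self, owner):
        row = self.conn.execute(
            "SELECT Balance FROM ACCOUNT WHERE Owner = ?", (owner,)
        ).fetchone()
        return None if row is None else row[0]

class ApplicationServer:
    """Middle tier: checks credentials and applies business rules."""
    def __init__(self, db, credentials):
        self.db = db
        self.credentials = credentials  # hypothetical user -> password map

    def get_balance(self, user, password):
        # The request is forwarded to the database server only after the check.
        if self.credentials.get(user) != password:
            raise PermissionError("invalid credentials")
        return self.db.balance_of(user)

class Client:
    """Top tier: presentation only."""
    def __init__(self, app_server):
        self.app_server = app_server

    def show_balance(self, user, password):
        try:
            balance = self.app_server.get_balance(user, password)
            return f"Balance for {user}: {balance}"
        except PermissionError:
            return "Access denied"

client = Client(ApplicationServer(DatabaseServer(), {"alice": "s3cret"}))
ok_page = client.show_balance("alice", "s3cret")
denied_page = client.show_balance("alice", "wrong")
print(ok_page)
print(denied_page)
```

The client never holds a database connection, so every request passes through the middle tier, which is where the security benefit described above comes from.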
CHAPTER 3
DATA MODELLING USING ENTITIES AND RELATIONSHIPS
The company is organized into departments. Each department has a unique name, a unique number,
and a particular employee who manages the department. We keep track of the start date when that
employee began managing the department. A department may have several locations.
A department controls a number of projects, each of which has a unique name, a unique number,
and a single location.
The database will store each employee’s name, Social Security number, address, salary, sex
(gender), and birth date. An employee is assigned to one department, but may work on several projects,
which are not necessarily controlled by the same department. It is required to keep track of the current
number of hours per week that an employee works on each project, as well as the direct supervisor of
each employee (who is another employee).
The database will keep track of the dependents of each employee for insurance purposes, including
each dependent’s first name, sex, birth date, and relationship to the employee.
An entity is a thing or object in the real world with an independent existence.
An entity may be
1. an object with a physical existence (for example, a particular person, car, house, or employee) or
2. an object with a conceptual existence (for instance, a company, a job, or a university course).
Each entity has attributes—the particular properties that describe it. For example, an EMPLOYEE
entity may be described by the employee’s name, age, address, salary, and job
The EMPLOYEE entity e1 has four attributes: Name, Address, Age, and Home_phone; their
values are ‘John Smith,’ ‘2311 Kirby, Houston, Texas 77001’, ‘55’, and ‘713-749-2630’,
respectively.
The COMPANY entity c1 has three attributes: Name, Headquarters, and President; their values
are ‘Sunco Oil’, ‘Houston’, and ‘John Smith’, respectively.
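As a rough illustration, the two entities above can be written as Python dictionaries that map each attribute name to its value. The choice of a dictionary, and of storing Age as an integer, is an assumption of this sketch and is not part of the ER model itself.

```python
# Each entity is modeled as a mapping from attribute name to value.
e1 = {
    "Name": "John Smith",
    "Address": "2311 Kirby, Houston, Texas 77001",
    "Age": 55,  # stored as an integer here; the notes show it as '55'
    "Home_phone": "713-749-2630",
}
c1 = {
    "Name": "Sunco Oil",
    "Headquarters": "Houston",
    "President": "John Smith",
}

for attribute, value in e1.items():
    print(f"{attribute} = {value}")
```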
Several types of attributes occur in the ER model:
1. simple versus composite
2. single-valued versus multivalued
3. stored versus derived
Attributes that are not divisible are called simple or atomic attributes.
For example, the attribute Age cannot be divided further.
Single-valued
Most attributes have a single value for a particular entity.
For example, Age is a single-valued attribute of a person.
Multivalued
An attribute is multivalued if an entity can have multiple values for that attribute. For example, the color of a car: Color = {black, red};
a person’s degrees: Degree = {BE, MTech, PhD}.
NULL Values
In some cases, a particular entity may not have an applicable value for an attribute.
For example, the Apartment_number attribute of an address applies only to addresses that are in
apartment buildings and not to other types of residences, such as single-family homes.
Similarly, a College_degrees attribute applies only to people with college degrees.
For such situations, a special value called NULL is used.
Complex Attributes
Composite and multivalued attributes can be nested arbitrarily.
Arbitrary nesting is represented by grouping components of a composite attribute between parentheses ( ) and
separating the components with commas, and by displaying multivalued attributes between
braces { }. Such attributes are called complex attributes.
For example, if a person can have more than one residence and each residence can have a single
address and multiple phones, an attribute Address_phone for a person can be specified as a complex attribute.
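One way to picture such nesting is as a Python structure in which a multivalued attribute becomes a list and a composite attribute becomes a dictionary. This is a sketch only: the component names (Street, City, State, Zip, Phone) and every value apart from the first address and phone number are assumptions chosen for illustration.

```python
# Braces { } (multivalued) become Python lists; parentheses ( ) (composite)
# become dicts. The second residence and extra phone numbers are invented.
person = {
    "Name": "John Smith",
    "Address_phone": [                      # multivalued: several residences
        {
            "Address": {                    # composite: one address per residence
                "Street": "2311 Kirby",
                "City": "Houston",
                "State": "Texas",
                "Zip": "77001",
            },
            "Phone": ["713-749-2630", "713-555-0100"],  # multivalued phones
        },
        {
            "Address": {
                "Street": "10 Main",
                "City": "Austin",
                "State": "Texas",
                "Zip": "78701",
            },
            "Phone": ["512-555-0101"],
        },
    ],
}

for residence in person["Address_phone"]:
    print(residence["Address"]["City"], len(residence["Phone"]))
```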
3.3.2 Entity Types, Entity Sets, Keys, and Value Sets
1. Entity Types and Entity Sets
An entity type defines a collection (or set) of entities that have the same attributes.
Each entity type in the database is described by its name and attributes.
The figure below shows two entity types, EMPLOYEE and COMPANY, and a list of some of the attributes for
each.
Fig: two entity types: EMPLOYEE and COMPANY, and a list of some of the attributes for each
The collection of all entities of a particular entity type in the database at any point in time is called
an entity set or entity collection. The entity set is usually referred to using the same name as the entity
type.
An entity type is represented in ER diagrams as a rectangular box.
Attribute names are enclosed in ovals and are attached to their entity type by straight lines.
Composite attributes are attached to their component attributes by straight lines.
Multivalued attributes are displayed in double ovals
The collection of entities of a particular entity type is grouped into an entity set, which is also called the
extension of the entity type.
An important constraint on the entities of an entity type is the key or uniqueness constraint on
attributes.
An entity type usually has one or more attributes whose values are distinct for each individual
entity in the entity set. Such an attribute is called a key attribute, and its values can be used to
identify each entity uniquely.
For example, the Name attribute is a key of the COMPANY entity type because no two companies are
allowed to have the same name.
For the PERSON entity type, a typical key attribute is Ssn.
Specifying that an attribute is a key of an entity type means that the preceding uniqueness property
must hold for every entity set of the entity type.
Hence, it is a constraint that prohibits any two entities from having the same value for the key
attribute at the same time.
Some entity types have more than one key attribute.
For example, each of the Vehicle_id and Registration attributes of the entity type CAR is a key in its own right.
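The uniqueness property can be checked mechanically on a given entity set. The sketch below assumes entities are stored as dictionaries; the function name satisfies_key, the Make attribute, and the sample values are hypothetical. Because a key must hold for every entity set of the type, a check like this can show that an attribute is not a key, but passing on one sample does not prove that it is.

```python
def satisfies_key(entity_set, key_attribute):
    """True if no two entities share a value for `key_attribute`."""
    values = [entity[key_attribute] for entity in entity_set]
    return len(values) == len(set(values))


# Hypothetical CAR entity set; the identifiers are invented.
cars = [
    {"Vehicle_id": "V1", "Registration": "TX-ABC-123", "Make": "Ford"},
    {"Vehicle_id": "V2", "Registration": "TX-XYZ-789", "Make": "Ford"},
]

print(satisfies_key(cars, "Vehicle_id"))    # True
print(satisfies_key(cars, "Registration"))  # True
print(satisfies_key(cars, "Make"))          # False: two cars share a make
```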
For a composite attribute A, the value set V is the power set of the Cartesian product of P(V1),
P(V2), . . . , P(Vn), where V1, V2, . . . , Vn are the value sets of the simple component attributes
that form A: V = P(P(V1) × P(V2) × . . . × P(Vn))
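To get a feel for the formula, it can be evaluated on very small value sets. The sketch below is a direct, brute-force reading of V = P(P(V1) × P(V2) × . . . × P(Vn)); the helper names and the one-element value sets are assumptions for illustration, and the approach is only practical for tiny inputs because the sizes grow very quickly.

```python
from itertools import chain, combinations, product


def power_set(values):
    """Return the power set of `values` as a set of frozensets."""
    items = list(values)
    subsets = chain.from_iterable(
        combinations(items, r) for r in range(len(items) + 1)
    )
    return {frozenset(s) for s in subsets}


def composite_value_set(*component_value_sets):
    """V = P(P(V1) x P(V2) x ... x P(Vn)) for the given simple value sets."""
    component_power_sets = [power_set(v) for v in component_value_sets]
    cartesian = set(product(*component_power_sets))
    return power_set(cartesian)


V1 = {"a"}
V2 = {"x"}
V = composite_value_set(V1, V2)
# P(V1) and P(V2) each have 2 elements, so the product has 2 * 2 = 4 tuples
# and its power set has 2**4 = 16 elements.
print(len(V))  # 16
```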
Whenever an attribute of one entity type refers to another entity type, some relationship exists. For
example, the attribute Manager of DEPARTMENT refers to an employee who manages the
department, and the attribute Controlling_department of PROJECT refers to the department that
controls the project. In the ER model, these references should not be represented as attributes but as
relationships.
3.4.1 Relationship Types, Sets, and Instances
A relationship type R among n entity types E1, E2, . . . , En defines a set of associations—or a
relationship set—among entities from these entity types
As with entity types and entity sets, a relationship type and its corresponding relationship set are
customarily referred to by the same name, R.
Mathematically, the relationship set R is a set of relationship instances ri, where each ri associates
n individual entities (e1, e2, . . . , en), and each entity ej in ri is a member of entity set Ej, 1 ≤ j ≤ n.
Hence, a relationship set is a mathematical relation on E1, E2, . . . , En; alternatively, it can be defined as a
subset of the Cartesian product of the entity sets E1 × E2 × . . . × En.
Each of the entity types E1, E2, . . . , En is said to participate in the relationship type R.
Similarly, each of the individual entities e1, e2, . . . , en is said to participate in the relationship instance ri = (e1,
e2, . . . , en).
Consider a relationship type WORKS_FOR between the two entity types EMPLOYEE and
DEPARTMENT, which associates each employee with the department for which the employee
works. Each relationship instance in the relationship set WORKS_FOR associates one
EMPLOYEE entity and one DEPARTMENT entity.
For example, the employees e1, e3, and e6 work for department d1;
the employees e2 and e4 work for department d2; and the employees e5 and e7 work for
department d3.
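The WORKS_FOR example can be written down directly as a set of pairs, which makes the "subset of the Cartesian product" definition concrete. The sketch assumes entities are represented simply by their labels (e1, d1, and so on); the helper employees_of is a hypothetical name introduced for the example.

```python
from itertools import product

EMPLOYEE = {"e1", "e2", "e3", "e4", "e5", "e6", "e7"}
DEPARTMENT = {"d1", "d2", "d3"}

# Relationship instances taken from the example in the text.
WORKS_FOR = {
    ("e1", "d1"), ("e3", "d1"), ("e6", "d1"),
    ("e2", "d2"), ("e4", "d2"),
    ("e5", "d3"), ("e7", "d3"),
}

# A relationship set is a subset of the Cartesian product of the entity sets.
assert WORKS_FOR <= set(product(EMPLOYEE, DEPARTMENT))


def employees_of(department):
    """Employees that participate in WORKS_FOR with the given department."""
    return sorted(e for (e, d) in WORKS_FOR if d == department)


print(employees_of("d1"))  # ['e1', 'e3', 'e6']
```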
In ER diagrams, relationship types are displayed as diamond-shaped boxes, which are connected by
straight lines to the rectangular boxes representing the participating entity types. The relationship
name is displayed in the diamond-shaped box
In a 1:1 relationship type such as MANAGES, an employee can manage at most one department and a department can have at most one
manager.
In an M:N relationship type such as WORKS_ON, an employee can work on several projects and a project can have several employees.
Cardinality ratios for binary relationships are represented on ER diagrams by displaying 1, M, and N
on the diamonds.
The participation constraint specifies whether the existence of an entity depends on its being
related to another entity via the relationship type
This constraint specifies the minimum number of relationship instances that each entity can
participate in and is sometimes called the minimum cardinality constraint
There are two types of participation constraints—total and partial
If a company policy states that every employee must work for a department, then an employee
entity can exist only if it participates in at least one WORKS_FOR relationship instance. Thus, the
participation of EMPLOYEE in WORKS_FOR is called total participation, meaning that every
entity in the total set of employee entities must be related to a department entity via WORKS_FOR.
Total participation is also called existence dependency.
We do not expect every employee to manage a department, so the participation of EMPLOYEE in
the MANAGES relationship type is partial, meaning that some or part of the set of employee
entities are related to some department entity via MANAGES, but not necessarily all.
In ER diagrams, total participation (or existence dependency) is displayed as a double line
connecting the participating entity type to the relationship, whereas partial participation is
represented by a single line
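Both kinds of structural constraint can be tested against a concrete relationship set. The following sketch uses small, invented entity sets and relationship instances; the function names participation and at_most_one are hypothetical helpers and are not standard terminology.

```python
EMPLOYEE = {"e1", "e2", "e3"}
DEPARTMENT = {"d1", "d2"}

# Hypothetical instances: (employee, department) pairs.
WORKS_FOR = {("e1", "d1"), ("e2", "d1"), ("e3", "d2")}
MANAGES = {("e1", "d1"), ("e3", "d2")}


def participation(entity_set, relationship_set, position):
    """Return 'total' if every entity appears in some instance, else 'partial'."""
    participating = {instance[position] for instance in relationship_set}
    return "total" if entity_set <= participating else "partial"


def at_most_one(relationship_set, position):
    """True if each entity at `position` appears in at most one instance."""
    seen = [instance[position] for instance in relationship_set]
    return len(seen) == len(set(seen))


print(participation(EMPLOYEE, WORKS_FOR, 0))  # total
print(participation(EMPLOYEE, MANAGES, 0))    # partial
# MANAGES is 1:1: each employee and each department appears at most once.
print(at_most_one(MANAGES, 0) and at_most_one(MANAGES, 1))  # True
```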
Relationship types can also have attributes, similar to those of entity types.
For example, to record the number of hours per week that a particular employee works on a
particular project, we can include an attribute Hours for the WORKS_ON relationship type
Similarly, we can include the date on which a manager started managing a department via an attribute Start_date
for the MANAGES relationship type.
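A relationship attribute such as Hours is determined by the combination of participating entities, not by either one alone. A minimal sketch, assuming each WORKS_ON instance is keyed by an (employee, project) pair and that all identifiers and numbers are invented:

```python
# WORKS_ON instances keyed by (employee, project); the value holds the
# relationship's own attributes. All identifiers and numbers are invented.
WORKS_ON = {
    ("e1", "p1"): {"Hours": 32.5},
    ("e1", "p2"): {"Hours": 7.5},
    ("e2", "p1"): {"Hours": 40.0},
}


def total_hours(employee):
    """Sum Hours over every project the employee works on."""
    return sum(
        attrs["Hours"] for (e, _), attrs in WORKS_ON.items() if e == employee
    )


print(total_hours("e1"))  # 40.0
```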
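A weak entity type normally has a partial key, which is the attribute that can uniquely identify
weak entities that are related to the same owner entity.
If we assume that no two dependents of the same employee ever have the same first name, the attribute
Name of DEPENDENT is the partial key.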
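The effect of a partial key can be seen on a small sample. In the sketch below the Ssn values, names, and the helper is_unique are assumptions for illustration: Name alone does not identify a dependent across the whole set, but Name combined with the owner's key does.

```python
# Hypothetical data: Ssn values and names are invented.
dependents = [
    {"Employee_ssn": "111", "Name": "Alice", "Relationship": "Daughter"},
    {"Employee_ssn": "111", "Name": "Bob", "Relationship": "Son"},
    {"Employee_ssn": "222", "Name": "Alice", "Relationship": "Spouse"},
]


def is_unique(rows, attributes):
    """True if the combination of `attributes` differs for every row."""
    keys = [tuple(row[a] for a in attributes) for row in rows]
    return len(keys) == len(set(keys))


print(is_unique(dependents, ["Name"]))                  # False
print(is_unique(dependents, ["Employee_ssn", "Name"]))  # True
```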
Choose names that convey the meanings attached to the different constructs in the schema.
Use singular names for entity types, rather than plural ones.
Use the convention that entity type and relationship type names are in uppercase letters, attribute
names have their initial letter capitalized, and role names are in lowercase letters.
Nouns appearing in the narrative tend to give rise to entity type names, and the verbs tend to
indicate names of relationship types.
Choose binary relationship names so that the ER diagram of the schema is readable from left to
right and from top to bottom.
The schema design process should be considered an iterative refinement process, where an initial design is
created and then iteratively refined until the most suitable design is reached. Some of the refinements that
are often used include the following:
A concept may be first modeled as an attribute and then refined into a relationship because it is
determined that the attribute is a reference to another entity type
Similarly, an attribute that exists in several entity types may be elevated or promoted to an
independent entity type. For example, suppose that each of several entity types in a UNIVERSITY
database, such as STUDENT, INSTRUCTOR, and COURSE, has an attribute Department in the
initial design; the designer may then choose to create an entity type DEPARTMENT with a single
attribute Dept_name and relate it to the three entity types (STUDENT, INSTRUCTOR, and
COURSE) via appropriate relationships
An inverse refinement to the previous case may be applied. For example, suppose an entity type
DEPARTMENT exists in the initial design with a single attribute Dept_name and is related to only
one other entity type, STUDENT. In this case, DEPARTMENT may be reduced or demoted to an
attribute of STUDENT.
3.7.4 ER diagrams for the company schema, with structural constraints specified using (min, max)
notation and role names
Choosing between Binary and Ternary (or Higher-Degree) Relationships
The ER diagram notation for a ternary relationship type is shown in Figure (a), which displays the
schema for the SUPPLY relationship type that was displayed at the entity set/relationship set or
instance level
Recall that the relationship set of SUPPLY is a set of relationship instances (s, j, p), where s is a
SUPPLIER who is currently supplying a PART p to a PROJECT j
In general, a relationship type R of degree n will have n edges in an ER diagram, one connecting R
to each participating entity type.
Figure (b) shows an ER diagram for three binary relationship types CAN_SUPPLY, USES, and
SUPPLIES
In general, a ternary relationship type represents different information than do three binary
relationship types
Consider the three binary relationship types CAN_SUPPLY, USES, and SUPPLIES. Suppose that
CAN_SUPPLY, between SUPPLIER and PART, includes an instance (s, p) whenever supplier s
can supply part p (to any project); USES, between PROJECT and PART, includes an instance (j, p)
whenever project j uses part p; and SUPPLIES, between SUPPLIER and PROJECT, includes an
instance (s, j) whenever supplier s supplies some part to project j. The existence of three
relationship instances (s, p), (j, p), and (s, j) in CAN_SUPPLY, USES, and SUPPLIES,
respectively, does not necessarily imply that an instance (s, j, p) exists in the ternary relationship
SUPPLY, because the meaning is different.
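A small counterexample makes the difference concrete. In this sketch the SUPPLY instances are invented, and the three binary relationship sets are derived from SUPPLY by projection (an assumption made only to keep the example short). All three binary instances exist for the triple (s1, j1, p1), yet that triple is not in SUPPLY.

```python
# Hypothetical ternary instances (supplier, project, part).
SUPPLY = {
    ("s1", "j1", "p2"),
    ("s1", "j2", "p1"),
    ("s2", "j1", "p1"),
}

# Binary relationship sets derived from SUPPLY by projection.
CAN_SUPPLY = {(s, p) for (s, j, p) in SUPPLY}
USES = {(j, p) for (s, j, p) in SUPPLY}
SUPPLIES = {(s, j) for (s, j, p) in SUPPLY}

# Candidate triple: all three binary instances exist...
s, j, p = "s1", "j1", "p1"
print((s, p) in CAN_SUPPLY, (j, p) in USES, (s, j) in SUPPLIES)  # True True True
# ...but the ternary instance does not.
print((s, j, p) in SUPPLY)  # False
```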
It is often tricky to decide whether a particular relationship should be represented as a relationship
type of degree n or should be broken down into several relationship types of smaller degrees
The designer must base this decision on the semantics or meaning of the particular situation being
represented
ER diagram UNIVERSITY DB
An ER diagram for an AIRLINE DB
An ER diagram for BANK DB
An ER diagram for MOVIE DB
Specialization is the process of defining a set of subclasses of an entity type; this entity type is called
the superclass of the specialization.
The set of subclasses that forms a specialization is defined on the basis of some distinguishing
characteristic of the entities in the superclass.
For example, the set of subclasses {SECRETARY, ENGINEER, TECHNICIAN} is a
specialization of the superclass EMPLOYEE that distinguishes among employee entities based on
the job type of each employee.
Generalization
One can think of a reverse process of abstraction in which we suppress the differences among several
entity types, identify their common features, and generalize them into a single superclass of which
the original entity types are special subclasses.
For example, consider the entity types CAR and TRUCK shown in the figure below. Because they
have several common attributes, they can be generalized into the entity type VEHICLE, as shown
in the figure.
Both CAR and TRUCK are now subclasses of the generalized superclass VEHICLE. We use the
term generalization to refer to the process of defining a generalized entity type from the given
entity types.
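The superclass/subclass idea maps loosely onto class inheritance in a programming language. The sketch below is an analogy only: the attribute names (vehicle_id, price, no_of_passengers, tonnage) and all values are assumptions, since the figure with the actual attributes is not reproduced in these notes.

```python
class Vehicle:
    """Generalized superclass holding the attributes CAR and TRUCK share."""

    def __init__(self, vehicle_id, price):
        self.vehicle_id = vehicle_id
        self.price = price


class Car(Vehicle):
    """Subclass: adds an attribute specific to cars."""

    def __init__(self, vehicle_id, price, no_of_passengers):
        super().__init__(vehicle_id, price)
        self.no_of_passengers = no_of_passengers


class Truck(Vehicle):
    """Subclass: adds an attribute specific to trucks."""

    def __init__(self, vehicle_id, price, tonnage):
        super().__init__(vehicle_id, price)
        self.tonnage = tonnage


fleet = [Car("V1", 20000, 5), Truck("V2", 45000, 10)]
# Every CAR and TRUCK entity is also a VEHICLE entity.
print(all(isinstance(v, Vehicle) for v in fleet))  # True
```

Read top-down, the hierarchy is a specialization of VEHICLE; read bottom-up, it is a generalization of CAR and TRUCK.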