Dbms Unit-1 Notes For Students
Department of ECE
B.Tech IV Year I Semester (R18 Syllabus)
(Professional Elective-IV)
UNIT–I (Syllabus)
A Historical Perspective, File Systems versus a DBMS, the Data Model, Levels of
UNIT-I
Databases have developed alongside computers and have changed accordingly since their
inception. Databases are a foundational element of computing: we interact with them every
day, often without knowing it.
From the earliest days of computers, storing and manipulating data has been a major
application focus. The initial computer applications focused on clerical tasks.
The history of database systems can be divided by decade:
1. 1950-1960:
In this decade, magnetic tapes were developed for data storage. Data processing tasks
such as payroll were automated, with data stored on tapes. Processing of data consisted
of reading data from one or more tapes and writing data to a new tape. Punched cards
were used for input and printers for output.
Data on tape could only be read in sequential order. Access to the data was through
low-level pointer operations, and storage details depended on the type of data to be
stored. A user needed to know the physical structure of the database in order to query for
information.
2. 1960-1970:
During this decade, hard disks came into use for data storage and changed the structure of
data processing, since hard disks allowed direct access to data: any location on disk could
be accessed. The first database concept was introduced in this decade by Charles
Bachman, who developed the Integrated Data Store (IDS), which was based on the
network data model.
Both IDS and IBM's Information Management System (IMS) were ‘navigational databases’.
Navigational databases required users to navigate through the entire database to find
the required information.
The hierarchical model was developed by IBM. In it, data is organized like a family tree.
Each data entry has a parent record, starting with a root record. The hierarchical model
is considered navigational because it is necessary to navigate from the top down to reach
the required information.
In the network model, navigation can go both from the top down and from the bottom up.
The network model was published by the Conference on Data Systems Languages
(CODASYL). It differed from the hierarchical model in that it allowed a record to have more
than one parent and child record.
3. 1970-1980:
During this decade, the relational database model was developed by E.F. Codd (Edgar
Frank Codd). In 1970 he published his paper “A Relational Model of Data for Large Shared
Data Banks”. This paper introduced the term ‘relational database’ and started the
development of a new way to store and access data.
Many of the database systems we use today are relational, and the relational model came
to be considered the standard database model. The relational model uses “declarative”
techniques, in which you ask the system for what you want instead of telling it how to
navigate to it.
Two major relational database system prototypes were created during this decade:
INGRES and System R.
INGRES was developed by Michael Stonebraker and Eugene Wong at the University of
California, Berkeley. INGRES, which stands for Interactive Graphics and Retrieval
System, was a relational database system. It used a query language called QUEL and
later led to the creation of systems such as
❖ MS SQL Server
❖ Sybase
System R was developed at IBM and used a query language called SEQUEL, the
forerunner of SQL.
In this decade, the computer scientist Peter Chen introduced a new database model known
as the ER Model (Entity-Relationship Model). The ER model is designed for modeling the
data of an application.
4. 1980-1990:
During this decade, SQL was adopted as the standard query language by ANSI (American
National Standards Institute) and ISO (International Organization for
Standardization), and we still use it today.
Another noteworthy event in the history of databases was the development and coming into
use of object-oriented database management systems (OODBMS). Object databases
view data as ‘objects’ and work with programming languages that
support the ‘object-oriented’ approach.
5. 1990-2000:
The World Wide Web was introduced in this decade, and use of the internet became
widespread. This allowed remote access to database systems, and users began to use
client-server databases.
Online businesses grew, resulting in a rise in demand for internet database
connectors and web development tools like Active Server Pages (ASP), Java Servlets,
FrontPage, Dreamweaver, Enterprise JavaBeans, etc.
MySQL was created in this decade, in 1995. It is an open source RDBMS
(Relational Database Management System) originally developed by the Swedish company
MySQL AB and now owned by Oracle. MySQL is still used by many organizations today.
It is based on SQL (Structured Query Language) and runs on Linux, UNIX and Windows.
MySQL is used for a wide range of purposes including data warehousing and e-commerce,
and by many popular websites including Facebook, Flickr, Twitter and YouTube.
6. 2000-2010:
The term NoSQL (“not only SQL”) came into use. It refers to databases that use data
models and query languages other than SQL to store and retrieve data.
NoSQL databases are useful for unstructured data, and they grew in popularity during the
2000s.
NoSQL allowed for faster processing of larger and more varied datasets. NoSQL
databases are more flexible than traditional relational databases.
7. 2010 onwards (big data, distributed databases and cyber security):
In this period, big data and non-relational databases came to prominence. Big data is a
combination of structured, semi-structured and unstructured data collected by
organizations that can be mined for information and used in machine learning
and other applications. Big data also refers to data that is so large or complex that it is
difficult or impossible to process by using traditional methods.
Cyber security also gained importance in this period. Cyber security is the practice of
defending computers, servers, mobile devices, electronic systems, networks (such as the
Internet) and data from malicious attacks. It is also known as
information technology security or electronic information security.
File System versus a DBMS
Storing and managing data is an important task for an individual as well as for a large
organization. There are various methods to store and manage data. Two of them are:
❖ File system
❖ DBMS.
File System: The file system was an early attempt to computerize the manual filing
system. It is basically a collection of application programs that perform services for the
end users. Each program within a file system defines and manages its own data. In this
system, a number of files are needed to perform various tasks. A file system is software
that manages the data files in a computer system.
DBMS: A DBMS is software to create and manage databases. It helps to easily store,
retrieve and manipulate data in a database. A DBMS provides more advantages than a
file system.
Comparison of the file system and the DBMS:

File system: The user has to write the programs that manage the data. Handling a file system is easier than handling a DBMS.
DBMS: A DBMS manages a collection of data, and the user is not required to write the programs that manage it.

File system: Data is stored in an unstructured format. Example: in a Notepad file, data is stored like
    1. Abc 20000 manager hyderabad
    2. Xyz 4000 director pune
DBMS: Data is stored in a tabular form, i.e., a structured format. Example:
    Empno   Name    Salary
    101     xyz     20000
    102     Abc     40000
    103     Sam     35000
    104     Smith   45000

File system: To access a data file, you need to know where the file is located on the hard disk: the drive (C:, D: or E:), the main directory name, the subdirectory name and the file name. You also need to write a program in C, C++ or Java that names the location of the file, compile the program and, if there are no errors, run it to access the data file.
DBMS: Data is accessed through a simple query.

File system: Data redundancy and data inconsistency exist. Data redundancy means that the same piece of information is stored in multiple locations, for example in many different files. Redundancy is greater in a file processing system.
DBMS: Related data resides in the same storage location and the same information need not be duplicated, which minimizes data redundancy and reduces data inconsistency. Because the database is centralized, redundancy and inconsistency are controlled and kept low.

File system: Offers less security.
DBMS: Offers high security.

File system: Data is isolated (stored on a standalone machine).
DBMS: Data can be shared (multiple users can access data from a single machine). A DBMS provides multiple user interfaces.

File system: There is no efficient query processing.
DBMS: Data can easily be queried using the SQL language.

File system: Multiple-user access to the information is difficult to provide.
DBMS: Multiple users can access the database at the same time.

File system: It is cheaper to design.
DBMS: It is relatively expensive to design.

File system: It has a simple structure.
DBMS: Its structure is relatively complex.

File system: It is very difficult to protect a file.
DBMS: It provides a good protection mechanism.
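The contrast between program-based file access and query-based access can be sketched in Python. This is only an illustration under stated assumptions: the employee rows are the sample values from the comparison, the function names are hypothetical, and an in-memory SQLite database (Python's standard `sqlite3` module) stands in for a full DBMS.

```python
import sqlite3

# Sample employee records, one per line, as a program would read them from a text file.
raw_lines = [
    "101 xyz 20000",
    "102 Abc 40000",
    "103 Sam 35000",
    "104 Smith 45000",
]


def high_earners_from_file(lines, threshold):
    """File-system style: the program itself must know the layout of every line."""
    names = []
    for line in lines:
        empno, name, salary = line.split()
        if int(salary) > threshold:
            names.append(name)
    return names


def high_earners_from_db(conn, threshold):
    """DBMS style: state what is wanted and let the DBMS locate the data."""
    cursor = conn.execute(
        "SELECT name FROM employee WHERE salary > ? ORDER BY empno", (threshold,)
    )
    return [row[0] for row in cursor]


# Load the same records into a table (hypothetical table and column names).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (empno INTEGER PRIMARY KEY, name TEXT, salary INTEGER)")
rows = []
for line in raw_lines:
    empno, name, salary = line.split()
    rows.append((int(empno), name, int(salary)))
conn.executemany("INSERT INTO employee VALUES (?, ?, ?)", rows)

file_result = high_earners_from_file(raw_lines, 30000)
db_result = high_earners_from_db(conn, 30000)
print(file_result)
print(db_result)
```

Both calls return the same names, but only the file version depends on the physical layout of each line; changing the record format would mean rewriting that program, while the query would stay the same.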
The Data Model
To model data is to give it a shape, that is, a structure for the stored data. A data model
makes it easier to understand the meaning of the data through that structure.
In simple words, a data model is “a collection of high-level data description constructs
that hide many low-level storage details”. A data model can also be defined as a collection
of conceptual tools for describing data, data relationships and consistency constraints. A
DBMS allows a user to define the stored data in terms of a data model.
A database model is a specification describing how a database is structured and used.
A data model is sometimes also referred to as a data structure, especially in the context
of programming languages.
There are various types of data models, but the relational model is the most widely used
model.
1. Hierarchical Model
2. Network Model
3. Relational Model
4. Object-Oriented Model
5. Object-Relational Model
6. Entity-Relationship Model
1. Hierarchical Model
The hierarchical model was the first DBMS model and is one of the oldest database models.
Its general shape is like an organizational chart (Example-2). The terms parent and child
are used in describing a hierarchical model. This model organizes the data in a
hierarchical tree structure: the hierarchy starts from the root, which holds the root data,
and expands in the form of a tree by adding child nodes to parent nodes.
The tree is the basic structure of this model. A tree is a data structure that consists of
a hierarchy of nodes, with a single node, called the root, at the highest level. A node
represents a particular entity. A node may have any number of children, but each child
node may have only one parent node. This kind of structure is often referred to as an
“inverted tree” (it grows from the top downwards). In this model the parent-to-child
relationship is one-to-many, while each child is related to exactly one parent.
In a hierarchical data model, records are arranged in a top-down structure. The nodes of
the tree represent data records, and the relationships are represented as links or pointers
between nodes. Example: To locate a particular record in a hierarchical database, you
have to start at the top of the tree with a parent record and trace down the tree to the child.
A hierarchical database represents data in a tree-like form. Data is stored as records
that are linked through parent and child levels. The relationship between records is
one-to-many: one parent node can have many child nodes, but each record has only
one parent. The first record of the data model is the root record.
In the above example, College is the root node, and it has two children. The
root record is always on Level 0 and is the first element to be traversed in the data model.
The children of the root record are on Level 1 and have the root as their parent. The
next level is Level 2, and so on.
Example:
The hierarchical model is based on the parent-child relationship. In this model,
one parent entity has several child entities. At the top, there should be only one
entity, which is called the root.
For example: an organization is the parent entity, called the root, and it has several child
entities like clerk, officer, and many more.
2. Parent-Child Relationship: Each child node has a parent node, but a parent
node can have more than one child node. Multiple parents are not allowed.
4. Pointers: Pointers are used to link the parent node with the child node and
to navigate between the stored data. Example: In Example-2, the
'Department' node points to three other nodes: the 'Course', 'Faculty' and
'Student' nodes.
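The single-parent rule and top-down navigation can be sketched as a small tree in Python. This is a minimal sketch, not how any hierarchical DBMS is implemented: the `Record` class and the helper functions are hypothetical names, and the 'Department' node with its 'Course', 'Faculty' and 'Student' children follows the example in the text.

```python
class Record:
    """A node in a hierarchical database: one parent, any number of children."""

    def __init__(self, name):
        self.name = name
        self.parent = None      # pointer to the single parent (None for the root)
        self.children = []      # pointers to the child records

    def add_child(self, child):
        # Multiple parents are not allowed in the hierarchical model.
        if child.parent is not None:
            raise ValueError(f"{child.name} already has a parent")
        child.parent = self
        self.children.append(child)
        return child


def find(root, name):
    """Navigate from the top of the tree downwards until the record is found."""
    if root.name == name:
        return root
    for child in root.children:
        hit = find(child, name)
        if hit is not None:
            return hit
    return None


def path_from_root(record):
    """Follow parent pointers upwards and report the path from the root."""
    path = []
    while record is not None:
        path.append(record.name)
        record = record.parent
    return list(reversed(path))


department = Record("Department")
for child_name in ("Course", "Faculty", "Student"):
    department.add_child(Record(child_name))

student = find(department, "Student")
print(path_from_root(student))  # ['Department', 'Student']
```

Trying to attach 'Student' to a second parent raises an error here, which mirrors the rule that a child record has exactly one parent.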
2. Network Model
This model is an extension of the hierarchical model. It was the most popular model
before the relational model. This model is the same as the hierarchical model; the only
difference is that a record can have more than one parent. The network model was
developed to overcome the limited scope of hierarchical model. It replaces the
hierarchical tree with a graph.
In the network model, multiple parent-child relationships are used. The network model uses
a network structure, which is a data structure of nodes and branches.
Unlike in the hierarchical model, there is no strict difference between parent and child
nodes: each node may be related to more than one node. Directed graphs
are used instead of a tree structure to represent the structure of the database.
The main difference between the network model and the hierarchical model is that the
network model permits a child node to have more than one parent node, whereas the
hierarchical model does not allow a child node to have multiple parent nodes.
Example-1 and Example-2: network model diagrams (figures not reproduced)
In the network model for the ‘UNIVERSITY’ system shown in the figure, the ‘Mathematics
Department’ node is associated with the ‘Computer Department’ node. Similarly, the
‘Computer Lab’ and ‘Library’ nodes are associated with both the ‘Mathematics Department’
and ‘Computer Department’ nodes.
2. Many paths: Because there are more relationships, there can be more than one path
to the same record. This makes data access fast and simple.
3. Circular Linked List: The operations on the network model are done with the help
of a circular linked list. The current position is maintained with the help of a
program, and this position navigates through the records according to the
relationship.
• The data can be accessed faster as compared to the hierarchical model. This is
because the data is more related in the network model and there can be more
than one path to reach a particular node. So the data can be accessed in many
ways.
• As more and more relationships need to be handled, the system might get
complex. So, a user must have detailed knowledge of the model to work with
it.
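The idea of multiple parents and multiple paths can be sketched as a directed graph in Python. This is an illustrative sketch under assumptions: the `Network` class is a hypothetical name, the 'University' root node is assumed, the links are assumed to contain no cycles, and the node names follow the UNIVERSITY example in the text.

```python
from collections import defaultdict


class Network:
    """Records linked as a directed graph: a record may have several parents."""

    def __init__(self):
        self.children = defaultdict(list)
        self.parents = defaultdict(list)

    def link(self, owner, member):
        """Add an owner-to-member relationship between two records."""
        self.children[owner].append(member)
        self.parents[member].append(owner)

    def paths(self, start, goal):
        """Return every path from start to goal (assumes the links have no cycles)."""
        if start == goal:
            return [[goal]]
        found = []
        for child in self.children.get(start, []):
            for tail in self.paths(child, goal):
                found.append([start] + tail)
        return found


university = Network()
university.link("University", "Mathematics Department")
university.link("University", "Computer Department")
for department in ("Mathematics Department", "Computer Department"):
    university.link(department, "Computer Lab")
    university.link(department, "Library")

print(university.parents["Library"])
for path in university.paths("University", "Library"):
    print(" -> ".join(path))
```

'Library' has two parents, so there are two paths to it from the root; in a hierarchical tree there would be exactly one.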
3. Relational Model
The relational model is the most popular and most widely used data model in DBMS. In
this model, the data is maintained in the form of two-dimensional tables: all the
information is stored in rows and columns. The table is the basic structure of the
relational model, and tables are also called relations.
This model was first described by E.F. (Edgar Frank) Codd in 1970. It is primarily used
by commercial data processing applications and is considered one of the most important
developments in database technology.
In the relational model, data is organized in terms of rows and columns in a table known as
a relation. The rows of a table are also known as tuples. A tuple represents a collection
of information that describes a person, place or thing, for example a student's roll number,
name and course. The columns are also known as attributes. An attribute
represents a characteristic of a person, place or thing, for example the Salary attribute
of an employee.
The number of tuples in a relation determines its cardinality and the number of attributes
in a relation determines its degree.
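Cardinality and degree can be read off a small relation directly. The sketch below uses Python's standard `sqlite3` module with an in-memory database; the `student` table and its rows are made-up sample data, chosen only to match the roll number, name and course attributes mentioned earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A relation with three attributes (columns); the rows below are hypothetical.
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, course TEXT)")
conn.executemany(
    "INSERT INTO student VALUES (?, ?, ?)",
    [
        (1, "Asha", "ECE"),
        (2, "Ravi", "ECE"),
        (3, "Meena", "CSE"),
        (4, "John", "EEE"),
    ],
)

cursor = conn.execute("SELECT * FROM student")
degree = len(cursor.description)      # number of attributes (columns)
cardinality = len(cursor.fetchall())  # number of tuples (rows)

print("degree =", degree)
print("cardinality =", cardinality)
```

Inserting or deleting a row changes the cardinality, while the degree changes only when the structure of the relation itself is altered.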
The relational database relates or connects data in different tables through the use of a
common field or attribute.
The relational data model allows data to be stored in tables called relations. The
relations are normalized, and the values in a normalized relation are atomic values. Each
row in a relation is called a tuple and is unique within the relation. The attributes are
the columns of the relation, and all the values in a column come from the same domain.
Popular examples of standard relational databases include Microsoft SQL Server, Oracle
Database, MySQL and IBM DB2.
• Tuples: Each row in the table is called a tuple. A row contains all the information
about one instance of the object. In the above example, each row has all the
information about a specific individual; for instance, the first row has information
about John.
• Attribute or field: Attributes are the properties that define the table or relation.
The values of an attribute should be from the same domain. In the above
example, we have different attributes of the employee like Salary, Mobile_no, etc.
• Simple: This model is simpler than the network and hierarchical
models.
• Scalable: This model can be easily scaled, as we can add as many rows and
columns as we want.
• Hardware Overheads: To hide the complexities and make things easier for
the user, this model requires more powerful computers and data storage
devices.
• Bad Design: The relational model is very easy to design and use, and users
don't need to know how the data is stored in order to access it. This ease of design
can lead to a poorly designed database, which slows down as the
database grows.
4. Object-Oriented Model
In the object-oriented model, both the data and the relationships are present in a single
structure known as an object. The only way in which one object can access the data of
another object is by invoking a method of that other object. This action is called sending
a message to the object. The object-oriented data model is one of the more developed
data models and can hold video, graphical files and audio.
We can store audio, video, images, etc. in the database, which is not practical in the
relational model: although a relational database can store audio and video, doing so is
not advised. In this model, two or more objects are connected through links. We use a
link to relate one object to other objects. This can be understood from the example given
below.
In the above example, we have two objects, Employee and Department. All the data and
relationships of each object are contained as a single unit. The attributes of the
employee, like Name and Job_title, and the methods which will be performed by that
object are stored as a single object. The two objects are connected through a common
attribute, i.e., the Department_id, and the communication between the two is done with the
help of this common id.
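The Employee and Department objects can be sketched as Python classes. This is only an illustration of objects that bundle data with methods and communicate by method calls; the method names, the sample values and the way the link is stored are assumptions, not part of any particular object database.

```python
class Employee:
    """Bundles an employee's data with the methods that operate on it."""

    def __init__(self, name, job_title):
        self.name = name
        self.job_title = job_title
        self.department_id = None   # the common attribute linking to a Department

    def describe(self):
        return f"{self.name} ({self.job_title})"


class Department:
    """Bundles a department's data with the methods that operate on it."""

    def __init__(self, department_id, dept_name):
        self.department_id = department_id
        self.dept_name = dept_name
        self._employees = []

    def add_employee(self, employee):
        # Link the two objects through the common Department_id.
        employee.department_id = self.department_id
        self._employees.append(employee)

    def staff_list(self):
        # "Sending a message": the department asks each employee object to
        # describe itself instead of reading the employee's data directly.
        return [employee.describe() for employee in self._employees]


ece = Department(10, "ECE")       # hypothetical sample values
john = Employee("John", "Lecturer")
ece.add_employee(john)

print(john.department_id)
print(ece.staff_list())
```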
5. Object-Relational Model
An object-relational model is a combination of an object-oriented model and a relational
model. It supports objects, classes, inheritance, etc., just like object-oriented models,
and has support for data types, tabular structures, etc., like the relational data model.
One of the major goals of the object-relational data model is to close the gap between
relational databases and the object-oriented practices frequently used in many
programming languages such as C++, C#, Java, etc.
It offers advanced features; for example, we can build complex data types according to
our requirements from the existing data types. The problem with this model is that it
can get complex and difficult to handle.
6. Entity-Relationship Model
Entity-Relationship Model or simply ER Model is a high-level data model diagram. In this
model, we represent the real-world problem in the pictorial form to make it easy to
understand. It is also very easy for the developers to understand the system by just
looking at the ER diagram. We use the ER diagram as a visual tool to represent an ER
Model.
1. Entity
2. Attribute
3. Relationship
Syntax and Example: ER diagrams (figures not reproduced)
In the above diagram, the entities are Teacher and Department. The attributes
of the Teacher entity are Teacher_Name, Teacher_id, Age, Salary and Mobile_Number. The
attributes of the Department entity are Dept_id and Dept_name. The two entities are
connected using a relationship: here, each teacher works for a department.
Features of ER Model
• This model helps the database designers to build the database and is widely used
in database design.
Advantages of ER Model
• Easy Conversion to any Model: This model maps well to the relational model
and can be easily converted to it by converting the ER model to
tables. This model can also be converted to other models like the network model,
hierarchical model, etc.
Disadvantages of ER Model
Levels of Abstraction in a DBMS:
There are three levels of data abstraction in a DBMS, which reduce the complexity of
the database. They are the physical level, the logical level and the view level.
1. Physical Level:
The physical schema, or internal level, is the lowest level of abstraction in the DBMS. It
describes how the data is actually stored in the database, including the complex low-level
data structures and access methods used by the database. This level deals with
the storage of the data for the whole database system and describes how a record is
actually stored.
The internal level contains the database storage files and binary
files, which are the actual storage of the database system. It depends on the hardware and
OS of the system.
The entire database is described in detail at the internal level, which makes it a very
complex level to understand. For example, a customer's information is presented in tables,
but the data is physically stored in blocks of storage measured in bytes, kilobytes,
megabytes, gigabytes, etc.
The database developer decides how the data is to be stored in the database. Whether
indices are to be created on the data is also decided at this level, by the database
application programmer.
2. Logical Level:
The logical level, or conceptual schema, is the intermediate level, the next higher level
above the physical level. It is also known as the conceptual level. It describes what data
is stored and the relationships that exist among the stored data.
It describes the entire data: which tables are to be created and what the links between
these tables are. This level is less complicated than the physical level, although some
complexity remains. It is used by database administrators and developers.
In short, the logical level contains fields and attributes along with their datatypes and the
relationships among the attributes, which can be logically implemented.
Example: Let us take an example where we use the relational model for storing the
data. To store the data of a student, the columns in the Student table will be
student_name, age, mail_id, roll_no, etc. We have to define all these at this level while
we are creating the database. Although the data itself is stored at the physical level, the
structure of tables like the Student table or Employee table is defined here at the
conceptual or logical level. How the tables are related to each other is also defined
here.
It is less complex than the physical level. So, overall, the logical level contains tables (fields
and attributes) and relationships among table attributes.
Example: Take the example of the university database. We need to store data about
Faculty and students.
Here we define the structure of the database and relationships among the data.
3. View Level:
The view level, or external schema, is the third and highest level of data abstraction. It
is also called the external level. There are different views at the view level, and each
view defines only a part of the entire data. This level describes the user's interaction
with the database system: it provides multiple views of the same database and tells the
application how the data should be shown to the user. The view level can be used by all
the users of the database. It is the least complex of the three levels and the easiest to
understand.
Example: If a student has a login-id and password in a university system, then the
student can view his marks, attendance, fee structure, etc. But the faculty of the
university will have a different view, with options like salary, editing the marks of a
student, entering the attendance of the students, etc. So, the student and the faculty
have different views.
By doing so, the security of the system also increases. In this example, the student can't
edit his marks but the faculty who is authorized to edit the marks can edit the student's
marks.
Similarly, the dean of the college will have some more authorization and he will access
his view. So, different users will have a different view according to the authorization they
have.
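Different views over the same stored data can be sketched with SQL views. The example below uses Python's standard `sqlite3` module; the table, view names and sample row are hypothetical, and SQLite's read-only views stand in here for the authorization rules that a multi-user DBMS would normally enforce with user accounts and privileges.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Logical level: one Student table holds all the data (hypothetical columns).
conn.execute(
    "CREATE TABLE student ("
    "roll_no INTEGER PRIMARY KEY, student_name TEXT, mail_id TEXT, "
    "marks INTEGER, attendance INTEGER)"
)
conn.execute("INSERT INTO student VALUES (1, 'Asha', 'asha@example.com', 82, 91)")

# View level: each kind of user sees only a part of the data.
conn.execute("CREATE VIEW student_view AS SELECT roll_no, marks, attendance FROM student")
conn.execute("CREATE VIEW faculty_view AS SELECT roll_no, student_name, marks FROM student")


def columns(connection, view_name):
    """Return the column names that a view exposes."""
    cursor = connection.execute(f"SELECT * FROM {view_name}")
    return [column[0] for column in cursor.description]


student_columns = columns(conn, "student_view")
faculty_columns = columns(conn, "faculty_view")

# Views in SQLite are read-only, so an edit attempted through the student's view fails.
try:
    conn.execute("UPDATE student_view SET marks = 100 WHERE roll_no = 1")
    update_allowed = True
except sqlite3.Error:
    update_allowed = False

marks = conn.execute("SELECT marks FROM student WHERE roll_no = 1").fetchone()[0]
print(student_columns, faculty_columns, update_allowed, marks)
```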
Data independence
Data independence is the capacity to change the schema at one level of the database system
without having to change the schema at the next higher level. In other words, it is the
ability to modify the schema (the design of a database) at one level of the database system
(e.g., the internal schema) without affecting the schema at the next higher level (e.g., the
conceptual schema).
One of the greatest advantages of a database is data independence. It means that we can
change the schema at one level without affecting the data at another level, and that we
can change the structure of a database without affecting the
data required by users and programs. This feature was not available in the file-oriented
approach.
Physical data independence is the ability to change the physical schema, or internal
schema, without changing the logical schema and without causing application programs to
be rewritten. Modifications at the physical level are occasionally necessary to improve
performance. It means we can change the physical storage without affecting the logical or
view level of the data.
For example:
A change to the internal level, such as using different storage devices, should be possible
without having to change the logical level or view level.
Logical data independence is the ability to change the logical level, or conceptual schema,
without changing the view level, or external schema, and without causing application
programs to be rewritten. Modifications at the logical level are necessary whenever the
logical structure of the database is altered.
Logical data independence is more difficult to achieve than physical data
independence, because application programs are heavily dependent on the logical
structure of the data that they access.
Logical data independence means that if we add new columns to a table or remove columns
from it, the user views and programs should not have to change, as long as they do not
use the affected columns.
For example: Consider two users A and B. Both are selecting the fields "Employee
Number" and "Employee Name". If user B adds a new column (e.g. salary) to the table, it
will not affect the external view for user A, though the logical schema of the database has
been changed for both users A and B.
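The effect described in this example can be checked with a small experiment. The sketch below uses Python's standard `sqlite3` module; the table name, view name and sample rows are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employee (emp_no INTEGER PRIMARY KEY, emp_name TEXT)")
conn.executemany("INSERT INTO employee VALUES (?, ?)", [(1, "Abc"), (2, "Sam")])

# User A's external view selects only Employee Number and Employee Name.
conn.execute("CREATE VIEW employee_view AS SELECT emp_no, emp_name FROM employee")


def read_view(connection):
    """What user A's program sees through the view."""
    return connection.execute("SELECT * FROM employee_view ORDER BY emp_no").fetchall()


before = read_view(conn)

# User B changes the logical schema by adding a salary column to the table.
conn.execute("ALTER TABLE employee ADD COLUMN salary INTEGER")

after = read_view(conn)
print(before)
print(after)
```

The view returns the same rows before and after the change, so user A's program does not need to be rewritten even though the table now has an extra column.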
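Comparison of logical and physical data independence:

Logical data independence: Application programs need not be changed if new fields are added to or deleted from the database.
Physical data independence: It is concerned with changes to the storage device.

Logical data independence: Example: add, modify or delete an attribute.
Physical data independence: Example: change in compression techniques, hashing algorithms, storage devices, etc.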
Structure of a DBMS
DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations such as insert,
delete, update and access on the database.
The components of DBMS perform these requested operations on the database and
provide necessary data to the users.
The Structure of DBMS can be classified into four components. They are:
1. DBMS Users
2. Query processor
3. Storage manager
4. Disk storage
1. DBMS users:
a) Naïve users
b) Application programmers
c) Sophisticated users
d) Database administrator
a) Naive users:
Naive users are unsophisticated users who interact with the system by using existing
application programs that were written previously.
For example: A bank teller who needs to transfer Rs. 5000 from account A to account
B invokes a program called transfer. This program asks the teller for the amount of money
to be transferred, the account from which the money is to be transferred, and the account
to which the money is to be transferred.
b) Application programmers:
Application programmers are computer professionals who write application programs.
Application programmers can choose software tools to develop user interfaces.
Application programmers are the developers who interact with the database by means
of DML queries. These DML queries are embedded in application programs written in
languages such as C, C++, Java, Pascal, etc.
• Games
• Accounting software
• Graphics software
• Media players, etc.
c) Sophisticated users:
Sophisticated users interact with the system without writing programs. Instead, they form
their requests in a database query language. They submit each such query to a query
processor, whose function is to break down DML statements into instructions that the
storage manager understands. Analysts who submit queries to explore data in the
database fall in this category.
2. Oracle RDBMS:
d) Database administrator:
2. Query processor:
• Games
• Accounting software
• Graphics software
• Media players
Object code:
Object code is a set of instruction codes that is understood by a computer at the lowest hardware
level. Object code is usually produced by a compiler that reads some higher level computer
language source instructions and translates them into equivalent machine language instructions.
Linker: The linker is a program that links the object modules of a program into a single
file: it combines the object code files into an executable. This process is called linking,
and linkers are also called link editors. Linking is the process of
collecting and combining pieces of code and data into a single file.
3. DML queries:
Data Manipulation Language (DML) queries are used to manipulate the data itself. DML
commands modify or retrieve the data records present in the database tables.
Some of the basic DML operations are data insertion (INSERT), data update (UPDATE), data
removal (DELETE) and data querying (SELECT).
The following are the DML commands which can be used for DML queries:
3. UPDATE: Command to change the existing data in the database to a new value
4. DELETE: Command to remove data from a table in the database
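The four DML operations named above can be tried out with a small sketch. It uses Python's built-in sqlite3 module and an in-memory database; the account table, its columns and the values are hypothetical and chosen only for illustration.

```python
import sqlite3

# In-memory database; the "account" table and its columns are hypothetical.
conn = sqlite3.connect(":memory:")
# CREATE TABLE is a DDL statement; it is needed here only to set up the example.
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER)")

# INSERT: add new records.
conn.execute("INSERT INTO account VALUES ('A', 10000)")
conn.execute("INSERT INTO account VALUES ('B', 2000)")

# UPDATE: change an existing value.
conn.execute("UPDATE account SET balance = balance + 500 WHERE acc_no = 'B'")

# DELETE: remove a record.
conn.execute("DELETE FROM account WHERE acc_no = 'A'")

# SELECT: query the data that remains.
rows = conn.execute("SELECT acc_no, balance FROM account").fetchall()
print(rows)
```

Only account B is left at the end, with the updated balance.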
DML compiler: The DML (Data Manipulation Language) compiler translates DML
statements written in a query language into low-level instructions that the query
evaluation engine understands.
A linker can also link a module against the system library. It takes object modules from the
assembler as input and produces an executable file as output for the loader.
5. DDL interpreter:
The DDL interpreter interprets DDL statements and records the definitions in the data
dictionary. The DML compiler translates DML statements in a query language into an evaluation
plan consisting of low-level instructions that the query evaluation engine understands.
The query evaluation engine executes the low-level instructions generated by the DML
compiler. In this way it carries out the requests (queries) received from end users via
application programs.
3. Storage Manager:
The storage manager is important because databases typically require a large amount of
storage space. A storage manager is a program module that provides the interface
between the low-level data stored in the database and the application programs and
queries submitted to the system. The storage manager is responsible for the interaction
with the file manager. The raw data are stored on the disk using the file system, which is
usually provided by a conventional operating system. The storage manager translates the
various DML statements into low-level file-system commands.
1. Buffer Manager
2. File Manager
4. Transaction Manager
1. Buffer Manager:
The buffer manager is responsible for fetching data from disk storage into main
memory and for deciding what data to cache in main memory. It handles the transfer of data
between secondary storage and main memory. The buffer manager is a critical
part of the database system, since it enables the database to handle data sizes that are
much larger than the size of main memory.
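The caching decision can be illustrated with a toy buffer pool. This is only a sketch under stated assumptions: the BufferPool class is hypothetical, the disk is simulated by a dictionary, and least-recently-used (LRU) eviction is one possible replacement policy, not one prescribed by these notes. A real buffer manager also writes modified pages back to disk.

```python
from collections import OrderedDict


class BufferPool:
    """Toy buffer manager: caches at most `capacity` pages and evicts
    the least recently used page when it is full."""

    def __init__(self, disk, capacity):
        self.disk = disk              # dict standing in for disk storage
        self.capacity = capacity
        self.frames = OrderedDict()   # pages currently held in main memory
        self.disk_reads = 0

    def get_page(self, page_id):
        if page_id in self.frames:
            self.frames.move_to_end(page_id)   # mark as most recently used
            return self.frames[page_id]
        if len(self.frames) >= self.capacity:
            self.frames.popitem(last=False)    # evict least recently used page
        self.disk_reads += 1                   # simulated fetch from disk
        self.frames[page_id] = self.disk[page_id]
        return self.frames[page_id]


disk = {n: f"page-{n}" for n in range(5)}
pool = BufferPool(disk, capacity=2)
for page_id in [0, 1, 0, 2, 0]:
    pool.get_page(page_id)
print(pool.disk_reads, list(pool.frames))
```

Five page requests cause only three disk reads, because page 0 stays cached while page 1 is evicted to make room for page 2.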
2. File manager:
The file manager manages the allocation of space on disk storage and the data structures
used to represent information stored on disk.
4. Transaction Manager:
The transaction manager ensures that the database remains in a consistent (correct)
state despite system failures, and that concurrent transaction executions proceed without
conflicting.
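The idea of keeping the database consistent when something goes wrong can be sketched with the money transfer used earlier for the bank teller. The example uses Python's sqlite3 module; the account table, the transfer function and the rule that a balance may not become negative are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 3000), ("B", 1000)])
conn.commit()


def transfer(conn, src, dst, amount):
    """Move `amount` from src to dst; either both updates happen or neither."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = ?",
                         (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = ?",
                         (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False


# A holds only 3000, so the debit violates the CHECK and the credit is undone.
ok = transfer(conn, "A", "B", 5000)
balances = dict(conn.execute("SELECT acc_no, balance FROM account"))
print(ok, balances)
```

The failed transfer leaves both balances exactly as they were, even though the credit to B had already been executed before the debit failed.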
4. Disk storage:
Disk storage holds components such as the data files, indices, the data dictionary and
statistical data.
1. Data dictionary:
The data dictionary contains all the information about the database. As the name suggests,
it is the dictionary of all the data items. It contains a description of all
the tables, views, materialized views, constraints, indexes, triggers, etc., that is,
information about the structure of every database object.
It is the repository that holds the metadata.
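As a concrete illustration, SQLite keeps its catalog in a table called sqlite_master, which can be queried like any other table. The sketch below assumes Python's sqlite3 module; the student table, index and view are hypothetical. Other database systems expose their data dictionary in different ways.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT)")
conn.execute("CREATE INDEX idx_student_name ON student (name)")
conn.execute("CREATE VIEW student_names AS SELECT name FROM student")

# sqlite_master is SQLite's catalog table: one row per schema object.
catalog = conn.execute(
    "SELECT type, name FROM sqlite_master ORDER BY name"
).fetchall()
print(catalog)
```

The result lists the index, the table and the view together with their types, without touching any of the student data itself.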
2. Indices:
Indices provide faster retrieval of data items.
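The effect of an index can be observed by asking the database how it plans to run a query. The sketch below uses SQLite's EXPLAIN QUERY PLAN statement through Python's sqlite3 module; the student table, the index name and the sample rows are hypothetical, and the exact wording of the plan text differs between SQLite versions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany("INSERT INTO student (name, city) VALUES (?, ?)",
                 [(f"student{i}", "Hyderabad") for i in range(1000)])


def plan(sql):
    """Return the human-readable plan lines SQLite reports for a query."""
    return [row[3] for row in conn.execute("EXPLAIN QUERY PLAN " + sql)]


query = "SELECT * FROM student WHERE name = 'student500'"
plan_before = plan(query)   # no index on name yet: the whole table is scanned
conn.execute("CREATE INDEX idx_student_name ON student (name)")
plan_after = plan(query)    # now the index on name can be used
print(plan_before)
print(plan_after)
```

Before the index exists the plan reports a scan of the table; afterwards it reports a search that uses idx_student_name.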
3. Data files:
Data files store the database itself, that is, the real data. They can be stored on magnetic
tapes, magnetic disks, or optical disks.
4. Statistical data:
Statistical data is statistical information about the data stored in the database, such as
the number of rows in a table. The query processor uses this information to choose an
efficient way to execute a query.
Database Design:
Database design is the process of constructing a stable database structure from an analysis
of user requirements. It is considered the most important task when adopting the database
approach. Database design groups the fields into different files and then establishes
meaningful associations between those files in an optimal manner, which helps to minimize
the response time while accessing and manipulating the database during its use. To
accomplish this, one should look at the user requirements and find the correct means of
logically representing them. Once the basic data needs are identified, the conceptual data
model (logical level) can be created.
❖ Repetition of information
❖ Inability to represent certain information
The database design process can be divided into six steps. They are
1. Requirement Analysis
2. Conceptual Design
3. Logical Design
4. Schema Refinement
5. Physical Design
6. Security Design.
1. Requirement Analysis:
The very first step in designing a database application is to understand what data is to be
stored in the database, what applications must be built on top of it, and what operations
are most frequent and subject to performance requirements.
2. Conceptual Design:
This step is carried out using the Entity-Relationship (ER) model. The goal of this step is
to create a simple description of the data that closely matches how users and developers
think of the data. This facilitates discussion among all the people involved in the design
process, even those who have no technical background. At the same time the initial design
must be sufficiently precise to enable a straightforward translation into a data model
supported by a commercial database system, that is, the relational model.
3. Logical Design:
In this step, we must choose a DBMS to implement our database design and convert the
conceptual database design into a database schema in the data model of the chosen
DBMS. We will consider only relational DBMSs and therefore, the task in the logical design
step is to convert an ER schema into a relational database schema.
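A minimal sketch of this conversion is shown below, using Python's sqlite3 module. It follows one common mapping convention: each entity set becomes a table, and a many-to-many relationship set becomes a table keyed by both participating entities. The student, course and enrolled names and their columns are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Each entity set becomes a table; its attributes become columns.
    CREATE TABLE student (roll_no INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE course  (course_id TEXT PRIMARY KEY, title TEXT);
    -- A many-to-many relationship set becomes a table keyed by both entities.
    CREATE TABLE enrolled (
        roll_no   INTEGER REFERENCES student (roll_no),
        course_id TEXT    REFERENCES course (course_id),
        PRIMARY KEY (roll_no, course_id)
    );
""")
tables = [row[0] for row in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name")]
print(tables)
```

The ER schema with two entity sets and one relationship set ends up as three relations.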
4. Schema Refinement:
The fourth step in database design is to analyze the collection of relations in the relational
database schema to identify potential problems and to refine it.
6. Security Design:
ER diagrams use different symbols: rectangles to represent entities, ovals to
define attributes and diamond shapes to represent relationships.
At first look, an ER diagram looks very similar to a flowchart. However, an ER diagram
includes many specialized symbols, and their meanings make this model unique.
ER Diagram Examples
For example, in a University database, we might have entities for Students, Courses, and
Lecturers. The Students entity can have attributes like Rollno, Name, and DeptID, and it
might have relationships with Courses and Lecturers.
Entities:
An entity is a real-world object that can be distinguished from all other objects. In an ER
diagram, an entity is represented by a rectangle, also called an entity box, with the name of
the entity written in the centre of the rectangle. When an ER diagram is mapped to the
relational model, an entity is mapped to a relational table in which each row represents an
entity instance.
Example: Professors, Students, Courses, Departments, etc are some of the entities of
a College Management System.
For example, each person in an enterprise is an entity. An entity has a set of properties,
and the values for some set of properties may uniquely identify an entity.
For example, a customer-id property with value 101 uniquely identifies one particular
customer.
Examples of entity:
For example, in a College database, the entities can be Professors, Students, Courses, etc.
Entities have attributes, which can be considered as properties describing them.
Example of Entities:
A university may have some departments. All these departments employ various lecturers
and offer several programs.
Some courses make up each program. Students register in a particular program and enroll
in various courses. A lecturer from the relevant department teaches each course, and each
lecturer teaches a different group of students.
Example: Professors, Students, Courses, Buildings, Departments, etc are some of the
entities of a College Management System.
Attributes:
An entity is represented by a set of attributes. Attributes are descriptive properties of an
entity. An attribute is defined as a property that describes a characteristic feature of
a particular entity. It can also be defined as a qualifier that provides additional
information about the entity. Generally, an attribute is an atomic value or unit of
information associated with an entity that helps to define it.
Attributes are the properties which define the entity type. For example, Roll_No, Name, DOB,
Age, Address, Mobile_No are the attributes which define the entity type Student. In an ER diagram,
an attribute is represented by an oval.
For each attribute associated with an entity set, we must identify a domain of possible
values. For example, the domain associated with the attribute name of ‘Employee’ might
be the set of 20-character strings. As another example, the domain associated with
‘Employee number’ might consist of the integers 1 through 10.
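One way to enforce such domains in a relational table is with CHECK constraints. The sketch below uses Python's sqlite3 module; the employee table, its column names and the try_insert helper are hypothetical, and the name domain is interpreted here as strings of at most 20 characters.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE employee (
        emp_no INTEGER PRIMARY KEY CHECK (emp_no BETWEEN 1 AND 10),
        name   TEXT NOT NULL CHECK (length(name) <= 20)
    )
""")


def try_insert(emp_no, name):
    """Return True if the row lies within the declared domains."""
    try:
        conn.execute("INSERT INTO employee VALUES (?, ?)", (emp_no, name))
        return True
    except sqlite3.IntegrityError:
        return False


results = [
    try_insert(1, "Asha"),      # both values are inside their domains
    try_insert(11, "Ravi"),     # employee number outside 1 through 10
    try_insert(2, "x" * 21),    # name longer than 20 characters
]
print(results)
```

Only the first row is accepted; the other two are rejected because a value falls outside its domain.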
Example:
Types of Attributes:
1. Simple attribute or Key attribute:
A simple attribute is an attribute composed of a single component with an independent
existence. Simple attributes cannot be further subdivided. Examples of simple attributes
are Roll-no, Age, etc.
2. Composite attribute:
An attribute composed of multiple components, each with an independent existence, is
called a composite attribute. Examples of composite attributes are
a) Name: which is composed of attributes such as First name, Middle name and Last name
b) Address: which is composed of other components such as street, city and pincode.
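The difference between simple and composite attributes can be sketched with Python dataclasses. The class and field names below are hypothetical; the last step shows one common way of flattening composite attributes into simple columns when an entity is stored in a table.

```python
from dataclasses import dataclass, asdict


@dataclass
class Name:            # composite attribute
    first: str
    middle: str
    last: str


@dataclass
class Address:         # composite attribute
    street: str
    city: str
    pincode: str


@dataclass
class Student:
    roll_no: int       # simple attribute
    age: int           # simple attribute
    name: Name
    address: Address


s = Student(1, 20, Name("A", "B", "C"), Address("Main Road", "Hyderabad", "500001"))

# Flatten each composite attribute into its simple components.
flat = {
    "roll_no": s.roll_no,
    "age": s.age,
    **{f"name_{k}": v for k, v in asdict(s.name).items()},
    **{f"address_{k}": v for k, v in asdict(s.address).items()},
}
print(list(flat))
```

Each component of Name and Address becomes a column of its own, while Roll-no and Age are already atomic.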
Entity Sets:
An entity set is a set of entities of the same type that share the same properties or
attributes. The set of all customers at a given bank can be defined as the entity set
customer.
An entity set is defined as a group of entities that have the same type or attributes. For
example, employees working in an organization are entities E1, E2, E3, E4, ...
which have similar attributes defined under a specific entity type called ‘Employee’.
Here, the group of entities {E1, E2, E3, E4, E5, ...} is referred to as an entity set.
If an entity set contains enough attributes for creating a primary key, then it is termed a
‘Strong entity set’. On the other hand, if an entity set does not contain enough attributes
for creating a primary key, then it is termed a ‘Weak entity set’.
All the rows of a relation (table) in an RDBMS together form an entity set. For example,
a student name and student ID describe a ‘Student’ entity, and the set of all such
entities is the ‘Student’ entity set.
An entity set can be divided into two types: a) Strong entity set b) Weak entity set
Strong entity set:
• It always has a primary key.
• It is represented by a rectangle symbol.
• Its primary key is represented by the underline symbol.
• A member of a strong entity set is called a dominant entity.
• The primary key is one of its attributes, which helps to identify its members.
• In the ER diagram, the relationship between two strong entity sets is shown by using a
diamond symbol.
• The line connecting a strong entity set with the relationship is single.
Weak entity set:
• It does not have enough attributes to build a primary key.
• It is represented by a double rectangle symbol.
• It contains a partial key, which is represented by a dashed underline symbol.
• A member of a weak entity set is called a subordinate entity.
• Its members are identified by a combination of the primary key of the strong entity set
and the partial key of the weak entity set.
• The relationship between a strong and a weak entity set is shown by using the double
diamond symbol.
• The line connecting the weak entity set to the identifying relationship is double.
Relationships:
A relationship defines an association among two or more entities. Consider two entities,
Student and Class. These two entities can be associated as: Student “studies”
in a Class. Here, “studies” is a relationship between the entities Student and
Class. Similarly, a Student is “Enrolled” in a Course. Here, “Enrolled” is a relationship
between the entities Student and Course.
A logical association among entities is called a relationship. A relationship tells how two
entities are related. Example: a Professor works for a Department.
Relationship is nothing but an association among two or more entities. E.g., Tom works
in the Chemistry department. Entities take part in relationships. We can often identify
relationships with verbs or verb phrases.
For example:
• You are attending this lecture
• I am giving the lecture
Just like entities, we can classify relationships according to relationship types:
• A student attends a lecture
• A lecturer is giving a lecture.
Example:
In the above diagram, the entities are Teacher and Department. The attributes
of the Teacher entity are Teacher_Name, Teacher_id, Age, Salary, Mobile_Number. The
attributes of the Department entity are Dept_id, Dept_name. The two entities are
connected using the relationship. Here, each teacher works for a department.
Relationship sets:
A relationship set is a set of relationships of the same type. As with entities, we may wish
to collect a set of similar relationships into a relationship set. A relationship set can be
thought of as a set of n-tuples:
{(e1, ..., en) | e1 ∈ E1, ..., en ∈ En}
Each n-tuple denotes a relationship involving n entities e1 through en, where entity ei is
in entity set Ei.
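The view of a relationship set as a set of n-tuples can be written down directly with Python sets. The entity names and the works_in relationship set below are hypothetical; here n = 2, so each relationship is a pair.

```python
# Hypothetical entity sets.
employees = {"e1", "e2", "e3"}
departments = {"d1", "d2"}

# The works_in relationship set: each 2-tuple is one relationship.
works_in = {("e1", "d1"), ("e2", "d1"), ("e3", "d2")}

# Every tuple must draw its i-th component from the i-th entity set.
valid = all(e in employees and d in departments for e, d in works_in)

# All employees related to department d1.
in_d1 = sorted(e for e, d in works_in if d == "d1")
print(valid, in_d1)
```

Because it is a set, the same pair of entities cannot appear in the relationship set twice.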
Features of ER Diagrams:
• Database Design: This model helps the database designers to build the database
and is widely used in database design.
Advantages of ER Diagrams:
Disadvantages of ER Diagrams:
1. Key Constraints
2. Participation Constraints
3. Weak Entities
4. Class Hierarchies
5. Aggregation
1. Key Constraint:
Key constraints are rules that define what data values are allowed in certain data
columns. They are an important database concept and are part of a database's schema
definition.
Certain restrictions must be placed on the level of association that an entity has with a
relationship. These restrictions are called key constraints.
Constraints are the rules that are to be followed while entering data into the columns of a database
table. Constraints ensure that the data entered by the user into columns is within the criteria
specified by the condition. For example, you may want to allow only unique IDs in the employee
table, or only ages under 18 in the student table.
Consider a relationship set called Manages between the Employees and Departments
entity sets such that each department has at most one manager, although a single
employee is allowed to manage more than one department. The restriction that each
department has at most one manager is an example of a key constraint, and it implies that
each Departments entity appears in at most one Manages relationship in any allowable
instance of Manages. This restriction is indicated in the ER diagram by using an arrow
from Departments to Manages.
Example:
A student can study in at most one college at a time. This restriction is called a key
constraint.
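The key constraint on the Manages relationship can be sketched in a relational table by making the department the key of the table. The example uses Python's sqlite3 module; the manages table and the dept_id and emp_id columns are hypothetical names.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE manages (
        dept_id INTEGER PRIMARY KEY,   -- key constraint: one row per department
        emp_id  INTEGER NOT NULL
    )
""")
conn.execute("INSERT INTO manages VALUES (1, 100)")
# The same employee managing a second department is allowed.
conn.execute("INSERT INTO manages VALUES (2, 100)")

try:
    # A second manager for department 1 violates the key constraint.
    conn.execute("INSERT INTO manages VALUES (1, 200)")
    second_manager_allowed = True
except sqlite3.IntegrityError:
    second_manager_allowed = False

row_count = conn.execute("SELECT COUNT(*) FROM manages").fetchone()[0]
print(second_manager_allowed, row_count)
```

Employee 100 manages two departments, but department 1 cannot be given a second manager.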
2. Participation constraints:
Entities can participate in a relationship either totally or partially. Participation constraints
can be classified into two types.
a) Total Participation
b) Partial Participation
a) Total Participation:
The participation of an entity set E in a relationship set R is said to be total if every entity
in E participates in at least one relationship in R. Otherwise, the participation is said to be
partial.
For example, all the students study in the college (total participation), but only a few of
them participate in other activities such as games (partial participation).
Diagram:
b) Partial Participation:
If only some entities in E participate in relationships R, the participation of entity set E in
relationship R is said to be partial.
Diagram:
3. Weak Entities:
A weak entity is one that depends on another entity for its existence.
If the existence of entity ‘x’ depends on the existence of entity ‘y’ then ‘x’ is said to be
existence dependent on ‘y’. If ‘y’ is deleted then ‘x’ ceases to exist. Entity ‘y’ is
said to be a dominant entity, and ‘x’ is said to be a subordinate entity.
An entity like order item is a good example of this. The order item is meaningless
without an order, so it depends on the existence of the order.
A weak entity cannot be identified uniquely as it does not have sufficient attributes to form a
primary key. It can be made uniquely identifiable by associating it with another entity
set called the identifying or owner entity set. The owner entity set is said to own the weak entity set.
The relationship between the two entity sets is called the identifying relationship. The identifying
relationship is always many-to-one from the weak entity set to the owner entity set, and the
participation of the weak entity set in it is total. A weak entity set has no primary key of its own.
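The order and order item case mentioned above can be sketched as two relational tables. The example uses Python's sqlite3 module; the table and column names are hypothetical, and line_no plays the role of the partial key. Deleting the owner removes its dependent weak entities.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite enforces foreign keys only when enabled
conn.executescript("""
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY);
    CREATE TABLE order_item (
        order_id INTEGER NOT NULL REFERENCES orders (order_id) ON DELETE CASCADE,
        line_no  INTEGER NOT NULL,          -- partial key
        product  TEXT,
        PRIMARY KEY (order_id, line_no)     -- owner's key plus partial key
    );
    INSERT INTO orders VALUES (1), (2);
    INSERT INTO order_item VALUES (1, 1, 'pen'), (1, 2, 'ink'), (2, 1, 'pad');
""")

# Deleting the owner entity also removes the weak entities that depend on it.
conn.execute("DELETE FROM orders WHERE order_id = 1")
remaining = conn.execute(
    "SELECT order_id, line_no, product FROM order_item").fetchall()
print(remaining)
```

Note that line_no 1 appears under both orders; it identifies an item only in combination with the owner's order_id.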
Example:
Consider the entity set ‘Loan’ and the entity set ‘Payment’ that keeps information about all
the payments that were made in connection
4. Class Hierarchies:
Class hierarchy is a method of classifying entities into subclasses, i.e., entities can be derived
from a parent class. The entities that represent the subclasses inherit the attributes of the
parent class entity and can also have attributes of their own.
For example:
Consider a “Person” entity set as the parent entity with attributes name, address, and age. The
two sub-classes of this entity set are “Student” and “Faculty”. The attributes of “Student” include
attributes of “Person” plus Course and the attributes of “Faculty” include attributes of “Person”
plus Lecture.
Therefore, it can be said that the attributes of “Person” are inherited by “Student” and “Faculty”
and that both these subclasses are related to “Person” by “ISA”. It is even possible to classify the
entity set “Person” based on a different criterion, such as senior_person, simply by adding a second
“ISA” node to the “Person” entity set.
The subclass-superclass relationship is an inheritance relationship and is often called an “ISA”
relationship because a member of the subclass “is a” member of the superclass. The relationship
between superclass and subclass is represented using class hierarchies.
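Attribute inheritance in an ISA hierarchy closely resembles class inheritance in a programming language. The sketch below models the Person, Student and Faculty example with Python dataclasses; the sample values are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Person:                 # superclass
    name: str
    address: str
    age: int


@dataclass
class Student(Person):        # Student ISA Person
    course: str


@dataclass
class Faculty(Person):        # Faculty ISA Person
    lecture: str


s = Student("Asha", "Hyderabad", 20, "DBMS")
f = Faculty("Ravi", "Hyderabad", 45, "Data Models")
print(isinstance(s, Person), isinstance(f, Person), s.name, f.lecture)
```

Both subclasses carry the name, address and age attributes of Person, and each adds one attribute of its own.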
The following are the two ways of representing a class hierarchy. They are
1. Specialization
2. Generalization
1. Specialization:
Specialization is the process of designating subgroupings within an entity set. Specialization is
a top-down process. Not all the entities within an entity set share all the attributes.
For example:
In a college database, faculty and student entities have some common attributes like name, street
and city. In addition, each has extra attributes: faculty has empid and salary, and student has
studentid and marks.
An entity set person can be defined with attributes name, street and city, and can be further
subdivided into student and faculty. This subgrouping is known as specialization.
It is a process of identifying the subsets of an entity set, each of which has different characteristic
features. In this process, the superclass is defined first, followed by the definition of its subclasses.
After that, the attributes and relationships associated with these subclasses are defined.
Thus specialization is the process of defining a set of subclasses of an entity type. This entity
type is termed the superclass of the specialization. The set of subclasses forming a
specialization is defined on the basis of some distinguishing characteristic of the entities in the
superclass.
2. Generalization:
Generalization is the reverse of specialization. The design approach may be top-down or
bottom-up. In the top-down approach (specialization), the high-level entities are identified and
then subdivided. In contrast, in the bottom-up approach (generalization), low-level entities are
grouped to form a high-level entity.
For example:
A designer may first identify the attributes of Student and Faculty and then group their common
attributes into a higher-level entity; this is known as generalization. The high-level entity is
called the superclass and each low-level entity is called a subclass.
In this example, the Person entity is a superclass of the Employee and Customer subclass entities. We
can say that the attributes of the Person entity are inherited by the Employee and Customer entities
and that Employee ISA Person.
5. Aggregation
Aggregation is an abstraction in which relationship sets are treated as higher-level entity sets that
can participate in relationships. Aggregation allows us to indicate that a relationship set
participates in another relationship set.
Aggregation is used to simplify the details of a given database, for instance by replacing a ternary
relationship with binary relationships. A ternary relationship is a relationship among three
entities.
The main difference between Entity and Attribute is that an entity is a real-world object that
represents data in RDBMS while an attribute is a property that describes an entity.
A Relational Database Management System (RDBMS) is a type of database management system
based on the relational model. It stores and manages data efficiently so that the data can be accessed
easily. An RDBMS stores data in tables, or relations. Each table consists of columns and rows. Before
creating a database, it is essential to design it. An ER diagram helps to accomplish that
task. Entity and Attribute are two concepts related to ER diagrams.
Entity and attribute are the most common terms of DBMS. The fundamental difference between
the entity and attribute is that an entity is an object that exists in a real-world and can be easily
distinguished among all other objects of real-world whereas, the attributes define the
characteristics or the properties of an entity on the basis of which it is easily distinguishable
among other entities of the real-world.
In the relational database, we collect the data in the form of a table. So, the rows of a table
represent the entities of the same type and the columns of a table are considered as attributes of
the entities present in that table.
The main difference between entity and relationship in DBMS is that the entity is a real-world
object while the relationship is an association between the entities. Also, in the ER diagram,
a rectangle represents an entity while a rhombus or diamond represents a relationship.
A Database Management System (DBMS) is a software program that stores, retrieves and
manipulates data in databases. A DBMS can manage multiple databases, and each database
consists of multiple tables. The tables are related to each other using relationships. An entity is
represented as a table in a DBMS, and it represents a real-world object. These entities are connected
to each other using relationships.
A ternary relationship is an association among three entities. The ternary relationship construct
is a single diamond connected to three entities. Sometimes a relationship is mistakenly modeled
as ternary when it could be decomposed into two or three equivalent binary relationships.