DBMS Unit-1
DBMS Unit-1
Data:
Facts that can be recorded.
It is raw or unprocessed form.
You cannot take decision based on data.
Data has no meaning.
Example(s):
Text, numbers, images, videos etc….
25, Karthik, Karimnagar.
Processing
Information
Example(s):
1
Database (db): Collection of similar or related data or
The related information when placed in an organized form makes a database or
organized collection of related information is known as database.
Example:
Various types of databases can be Traditional databases (text & numbers), Multimedia
databases (audio, video, movies, speech), Geographically Information System (satellite
images), Real time databases (store).
Data ware house: Data ware house is a kind of database in which the data is going to be
very huge and historical.
Example: Data stored about past 100 years of a company .stock market rates.
***Good decisions require good information that is derived from raw facts (Data).
Database Management systems: It is the software system that allows the user to define
and maintain the database and provide control access to the database or
DBMS is a collection of interrelated data and a set of programs to access data. The
primary goal of a DBMS is to provide a way to store and retrieve database information
that is both convenient and efficient.
2
Summary:
Functionalities:
1. Define: Specifying the data type, structure and constraints for the data to be
stored.
2. Construct: Process of storing data on some storage medium.
3. Manipulate: Querying the database to retrieve specific data, updating database
and generating reports.
4. Share: Allows multiple users and programs to access the database concurrently.
3
Database System Application:
A Historical Perspective:
4
1960s:
Charles Bachman designed first general – purpose DBMS, was called integrated
data store.
IDS formed the basis for the network data model.
Late 1960s:
1970:
1980s:
The relational model consolidated its position as the dominant DBMS paradigm,
and database systems continued to gain widespread use,
The SQL query language for relational databases, developed as part of IBM’s
project, is now the standard query language.
Several vendors (e.g IBMS’s DB2, Oracle 8, Informix) extended their systems
with the ability to store new data types such as images, videos and to ask more
complex queries.
Specialized systems have been developed by numerous vendors for creating data
warehouses.
Present era:
DBMS are entered the Internet age. While the first generation of websites stored
their data in operating system files, the use of a database accessed through a web
browser is becoming wide spread.
Queries are generated through web-accessible forms and answers are formatted
using a markup language such as HTML.
5
File Systems vs a DBMS
1. Data Redundancy:
There are no methods to validate the insertion of duplicate data in the system. Any user
can enter any data. File system doesn’t validate for the kind of data being entered nor
doesn’t validate for previous existence of the same data in the same file. Duplicate data in
the system is not appreciated as it is a waste of space and always lead to confusion and
misleading of data. When there are duplicate data in the file and if we need to update or
delete the record, we might end up in updating / deleting one of the records leaving the
other record in the file.
2. Data inconsistency:
For Example student and student_report files have student’s address in it, and there was a
change request for one particular student’s address. The program search only student file
for the address and updated it correctly. There is another program which prints the
student’s report and mails it to the address mentioned in the student_report file. There is a
mismatch in the actual address and his report is sent to his old address. This mismatch in
different copies of same data is called data inconsistency.
3. Data Isolation:
Imagine we have to generate a single report of student, who is studying in particular
class, his study report, his library book details and hostel information. All these
information are stored in different files. How do we get all these details in one report?
We have to write a program. But before writing the program , the programmer should
find out which all files have the information needed, what is the format of each file, how
to search data in each file etc. Once all these analysis is done, he writes a program. If
there is 2-3 files involved, programming would be bit simple. Imagine if there is lot many
files involved in it? It would require lot of effort for the programmer. Since all the data
are isolated from each other in different files, programming becomes difficult.
4. Security:
Each file can be password protected. But what if we have to give access to only few
records in the file? For Example, user has to be given access to view only their bank
account details in the file. This is very difficult in the file system.
6
5. Integrity:
If we need to check for certain insertion criteria while entering the data in to file, it is not
possible directly. We can do it writing program, say, if we have to restrict the students
above age 18 then it is by means program alone. There is no direct checking facility in
the file system. Hence these of integrity checks are not easy in file system.
6. Atomicity:
If there is any failure to insert, update, or delete in the file system, there is no mechanism
to switch back to the previous state. Consider a program to transfer 500 from the account
of A to the account of B. If a system failure occur during the execution of the program, it
is possible that the 500 as removed from the account A but was not credited to the
account B, resulting in an inconsistent database state. It is essential to database
consistency that either both the credit and debit occur, or that neither occur i.e the funds
transfer must be atomic-it must happen in its entirely or not at all. It is difficult to ensure
atomicity in a file processing system.
7. Limited data sharing:
Data are scattered in various files also different files may have different formats and these
files may be stored in different folders may be of different departments. So, due to this, it
is difficult to share data among different applications.
8. Data Mapping and Access:
Although all the related information is grouped and stored in different files, there is no
mapping between any two files. i.e.; any two dependent files are not linked. Even though
Student files and Student_Report files are related, they are two different files and they are
not linked by any means. Hence if we need to display student details along with his
report, we cannot directly pick from those two files. We have to write a lengthy program
to search the Student file first, get all details, then go Student_Report file and search for
his report. When there is a very huge amount of data, it is always a time-consuming task
to search for particular information from the file system. It is always an inefficient
method to search for the data.
7
S.No File System DBMS
1 Suitable for small organization. Suitable for both small and large organization.
2 It can have unstructured data Can have structured data.
3 Storing and retrieving of data can't be done DBMS is efficient to use as there are a wide variety
efficiently in a file system. of methods to store and retrieve data.
4 The file system doesn't have a crash
DBMS provides a crash recovery mechanism
recovery mechanism.
5 Protecting a file system is very difficult. DBMS offers good protection mechanism.
6 In a file management system, the
The redundancy of data is low in the DBMS system.
redundancy of data is greater.
7 Data inconsistency is higher in the file Data inconsistency is low in a database management
system. system.
8 It doesn't offer backup and recovery of DBMS system provides backup and recovery of data
data if it is lost. even if it is lost.
9 There is no efficient query processing in You can easily query data in a database using the
the file system. SQL language.
10 This system doesn't offer concurrency. DBMS system provides a concurrency facility.
11 Data cannot be shared Data can be shared.
8
Data Models:
1. Hierarchical Model.
2. Network Model
3. Entity – Relationship Model
4. Relational Model
Hierarchical Model:
This database model organizes data into a tree-like-structure, with a single root, to which all the
other data is linked. The hierarchy starts from the Root data, and expands like a tree, adding
child nodes to the parent nodes. In this model, a child node will only have a single parent node.
This model efficiently describes many real-world relationships like index of a book, recipes etc.
It has one-to-many relationship between two different types of data, for example, one department
can have many courses, many professors and many students.
9
Network Model:
This is an extension of the Hierarchical model. In this model data is organized more like a graph,
and are allowed to have more than one parent node.
In this database model data is more related as more relationships are established in this database
model. Also, as the data is more related, hence accessing the data is also easier and fast. This
database model was used to map many-to-many data relationships.
E-R Models are defined to represent the relationships into pictorial form to make it easier for
different stakeholders to understand.
This model is good to design a database, which can then be turned into tables in relational model.
Let's take an example, If we have to design a School Database, then Student will be an entity
with attributes name, age, address etc. As Address is generally complex, it can be another
entity with attributes street name, pincode, city etc, and there will be a relationship between
them.
10
Relational Model:
In this model, data is organised in two-dimensional tables and the relationship is maintained by
storing a common field.
This model was introduced by E.F Codd in 1970, and since then it has been the most widely used
database model, infact, we can say the only database model used around the world.
The basic structure of data in the relational model is tables. All the information related to a
particular type is stored in rows of that table.
Hence, tables are also known as relations in relational model.
11
Levels of Abstraction in a DBMS
Data abstraction is a process of hiding the implementation details (such as how the data are
stored and maintained) and representing only the essential features to simplify user's interaction
with the system.
To simplify user's interaction with the system, the complexity is hidden from the database users
through several levels of abstraction.
View Level:
Example: If we have a login-id and password in a university system, then as a student, we can
view our marks, attendance, fee structure, etc. But the faculty of the university will have a
12
different view. He will have options like salary, edit marks of a student, enter attendance of the
students, etc. So, both the student and the faculty have a different view. By doing so, the security
of the system also increases. In this example, the student can't edit his marks but the faculty who
is authorized to edit the marks can edit the student's marks. Similarly, the dean of the college or
university will have some more authorization and accordingly, he will have his view. So,
different users will have a different view according to the authorization they have.
Example: Let us take an example where we use the relational model for storing the data. We
have to store the data of a student, the columns in the student table will be student_name, age,
mail_id, roll_no etc. We have to define all these at this level while we are creating the database.
Though the data is stored in the database but the structure of the tables like the student table,
teacher table, books table, etc are defined here in the conceptual level or logical level. Also, how
the tables are related to each other is defined here. Overall, we can say that we are creating a
blueprint of the data at the conceptual level.
Physical Level:
It tells the actual location of the data that is being stored by the user. The Database
Administrators (DBA) decides that which data should be kept at which particular disk drive, how
the data has to be fragmented, where it has to be stored etc. They decide if the data has to be
centralized or distributed. Though we see the data in the form of tables at view level the data
13
here is actually stored in the form of files only. It totally depends on the DBA, how he/she
manages the database at the physical level.
Data Independence
Data independence is ability to modify a schema definition in one level without affecting a
schema definition in the next higher level. or Capacity of changing the schema at one level
without effecting the another level.
Physical Data Independence refers to the characteristic of changing the physical level without
affecting the logical level or conceptual level. Using this property we can easily change the
storage device of the database without affecting the logical schema.
14
Examples of changes under Physical Data Independence
Due to Physical independence, any of the below change will not affect the conceptual layer.
It refers to the characteristics of changing the logical level without affecting the external or
view level. This also helps in separating the logical level from the view level. If we do any
changes in the logical level then the user view of the data remains unaffected. The changes in
the logical level are required whenever there is a change in the logical structure of the
database.
Due to Logical independence, any of the below change will not affect the external layer.
15
Structure of a DBMS
DBMS (Database Management System) acts as an interface between the user and the
database. The user requests the DBMS to perform various operations such as insert,
delete, update and retrieval on the database.
The components of DBMS perform these requested operations on the database and
provide necessary data to the users.
16
Naive users:
Any user who does not have any knowledge about database can be in this category. There task is
to just use the developed application and get the desired results. For example: Clerical staff in
any bank is a naive user. They don’t have any dbms knowledge but they still use the database
and perform their given task.
Application programmers:
As its name shows, application programmers are the one who writes application programs that
uses the database. These application programs are written in programming languages like PHP,
Java etc. These programs meet the user requirement and made according to user requirements.
Retrieving information, creating new information and changing existing information is done by
these application programs.
Sophisticated Users:
They are database developers, who write SQL queries to select/insert/delete/update data. They do
not use any application or programs to request the database. They directly interact with the
database by means of query language like SQL. These users will be scientists, engineers,
analysts who thoroughly study SQL and DBMS to apply the concepts in their requirement.
Database Administrators:
The life cycle of database starts from designing, implementing to administration of it. A database
for any kind of requirement needs to be designed perfectly so that it should work without any
issues. Once all the design is complete, it needs to be installed. Once this step is complete, users
start using the database. The database grows as the data grows in the database. When the
database becomes huge, its performance comes down. Also accessing the data from the database
becomes challenge. There will be unused memory in database, making the memory inevitably
huge. These administration and maintenance of database is taken care by database Administrator
(DBA).A DBA has many responsibilities. A good performing database is in the hands of DBA.
17
Installing and upgrading the DBMS Servers:
DBA is responsible for installing a new DBMS server for the new projects. He is also
responsible for upgrading these servers as there are new versions comes in the market or
requirement. If there is any failure in up gradation of the existing servers, he should be
able revert the new changes back to the older version, thus maintaining the DBMS
working.
Performance tuning:
Since database is huge and it will have lots of tables, data, constraints and indices, there
will be variations in the performance from time to time. Also, because of some designing
issues or data growth, the database will not work as expected. It is responsibility of the
DBA to tune the database performance. He is responsible to make sure all the queries and
programs works in fraction of seconds.
Sometimes, users using oracle would like to shift to SQL server or MySQL. It is the
responsibility of DBA to make sure that migration happens without any failure, and there
is no data loss.
Proper backup and recovery programs needs to be developed by DBA and has to be
maintained him. This is one of the main responsibilities of DBA. Data/objects should be
backed up regularly so that if there is any crash, it should be recovered without much
effort and data loss.
Security:
DBA is responsible for creating various database users and roles, and giving them
different levels of access rights.
18
Documentation:
DBA should be properly documenting all his activities so that if he quits or any new
DBA comes in, he should be able to understand the database without any effort. He
should basically maintain all his installation, backup, recovery, security methods. He
should keep various reports about database performance.
Data Interpreter: It interprets DDL statements and records them in tables containing metadata.
DML Compiler: The DML commands such as insert, update, delete, retrieve from the
application program are sent to the DML compiler for compilation into object code for database
access. The object code is then optimized in the best way to execute a query by the query
optimizer and then send to the data manager.
Query Evaluation Engine: It executes low-level instructions generated by the DML compiler.
Buffer Manager: It is responsible for retrieving data from disk storage into main memory. It
enables the database to handle data sizes that are much larger than the size of main memory.
File Manager: It manages the allocation of space on the disk storage and the data structure used
to represent information stored on disk.
Authorization and Integrity Manager: Checks the authority of users to access data and
satisfaction of the integrity constraints.
Transaction Manager: It ensures that the database remains in a consistent state despite the
system failures and that concurrent transaction execution proceeds without conflicting.
Data dictionary: Data Dictionary, which stores metadata about the database, in particular the
schema of the database such as names of the tables, names of attributes of each table, length of
attributes, and number of rows in each table.
Indices: An index is a small table having two columns in which the 1st column contains a copy
of the primary or candidate key of a table and the second column a set of pointers holding the
address of the disk block.
19
Statistical data:
These statistics provide the optimizer with information about the state of the tables that will be
accessed by the SQL statement that is being optimized. The types of statistical information
stored in the system catalog include:
Information about tables including the total number of rows, information about
compression, and total number of pages.
Information about columns including number of discrete values for the column and the
distribution range of values stored in the column.
Information about table spaces including the number of active pages.
Current status of the index including whether an index exists or not, the organization of
the index (number of leaf pages and number of levels), the number of discrete values for
the index key, and whether the index is clustered.
Information about the table space and partitions.
20
ER Diagrams
1. Entity sets
2. Attributes
3. Relationship sets
Types of Entities:
Entity set: Entity set is a group of similar entities that share the same properties i.e
it represents schema / structure.
Symbol:
Student
Attributes: Attributes are the units that describe the characteristics / properties of
entities.
21
For each attribute there is a set of permitted values called domains.
In ER Diagram, represented by ellipse or oval.
Types of Attributes:
Example: DOB
Name
Student
Example:
L_name F_name
Name
Student Roll_No
Single attribute: Single attribute can have only one value at an instance of time.
Represented by oval
Example:
DOB
Student
Multi-valued attribute: Multi-valued attribute can have more than one value at an
instance of time.
Represented by double oval.
22
Example:
Phne_No
Student
Stored attribute: Stored attribute is an attribute which are physically stored in the
database.
Example:
DOB
Student
Derived Attribute: Derived attribute are the attributes that do not exist in the
physical database, but their values are derived from other attributes present in the
database.
Represented by dotted oval.
Example:
DOB
Student age
23
T1 S1
T2 S2
T3
. S3
. .
. .
. .
tn
sn
Example:
teaches Student
Teacher
Since
works Department
Employee
for
Case Study:
24
employee Works for department
E1. . D1
E2. .
. D2
E3. .
E4. . D3
E5. .
. D4
E6.
N Works 1
employe Department
for
Figure - 1
25
Rectangles Represents entity sets
26
Additional Features of ER Model:
1. Participation constraints
Total participation is represented using a double line between the entity set and
relationship set.
Example:
Works 1
Employee Department
for
Partial participation: It specifies that each entity in the entity set may or may not
participate in the relationship instance in that relationship set. It is also called as
optional participation.
Partial participation is represented using a single line between the entity set and
relationship set.
27
Example:
N
Works 1
employee Department
for
One to one
One to many
Many to one
Many to many
1. One to one:
Example: Every dept should have a hod and only one employee manages a
dept and an employee can manage only one dept.
28
2. Many to one: An entity in A is associated with at most one entity in B. An
entity in B, however can be associated with any number (zero or more) of
entities in A.
E1. . D1
E2. .
. D2
E3. .
E4. . D3
E5. .
E6. . D4
N
Works 1
employee Department
for
1 N
Has
Dept Employee
29
4. Many to many: An entity in A is associated with any number (zero or
more) of entities in B, and an entity in B is associated with any number (zero
or more) of entities in A.
Example: Every emp is supported to work at least one project and he can
work on many projects, a project is supported to have many emp and a
project should have at least one emp.
E1 . P1
.
E2 . .
. P2
E3 .
.
E4 . . P3
E5 . .
. P4
E6 . .
.
M N
Employee Works project
on
30
Key: Key is an attribute or collection of attributes that uniquely indentifies an
entity among entity set. For example the roll number of a student makes him or her
identifiable among students.
Roll No Name
Student
Strong entity: An entity type is strong if its existence does not depend on some
other entity type. Such entity is called Strong Entity.
Representation
Example:
31
Consider the ER diagram which consists of two entities student and course
Student entity is a strong entity because it consists of a key called student id
which is enough for accessing each record uniquely
The same way, course entity contains of course ID attribute which is capable
of uniquely accessing each row it is each course details
Weak Entity:
A weak entity is an entity set that does not have sufficient attributes for Unique
Identification of its records. Simply a weak entity is nothing but an entity which
does not have a key attribute.
It contains a partial key called as discriminator which helps in identifying a
group of entities from the entity set.
Discriminator is represented by underlining with a dashed line.
Representation
Example:
32
Note:
Example:
If we have a person entity type who has attributes such as Name, Phone_no,
Address. Now, suppose we are making software for a shop then the person can be
33
of two types. This Person entity can further be divided into two entity i.e.
Employee and Customer. How we will specialize it? We will specialize the Person
entity type to Employee entity type by adding attributes like Emp_no and Salary.
Also, we can specialize the Person entity type to Customer entity type by adding
attributes like Customer_no, Email, and Credit. These lower-level attributes will
inherit all the properties of the higher-level attribute. The Customer entity type will
also have attributes like Name, Phone_no, and Address which was present in the
higher-level entity.
Generalization:
It is the process of extracting shared characteristics from two or more classes and
combining them into a generalized super class, shared characteristics can be
attributes.
Reverse of Specialization
Bottom – up approach.
34
The attributes of Customer entity type are Name, Phone, Address, Credit,
Customer_id, and Email.
We can see that the three attributes i.e. Name, Phone, and Address are common
here. When we generalize these two entities, we form a new higher-level entity
type Person. The new entity type formed is a generalized entity. We can see in the
below E-R diagram that after generalization the new generalized entity Person
contains only the common attributes i.e. Name, Phone, and Address. Employee
entity contains only the specialized attribute like Employee_id and Salary.
Similarly, the Customer entity type contains only specialized attributes like
Customer_id, Credit, and Email. So from this example, it is also clear that when
we go from bottom to top, it is a Generalization and when we go from top to
bottom it is Specialization. Hence, we can say that Generalization is the reverse of
Specialization.
35
Aggregation: Aggregation is the process of merging. It completely different from
generation. In generation, we merge entities of same domain into one entity; In
this case we merge related entities into one entity.
Representation:
Example:
36
Before aggregation
Drawbacks:
After aggregation
37
Conceptual Design with ER Model
While identifying the attributes of an entity set, it is sometimes not clear whether a
property should be modeled as an attribute or as an entity set. For example,
consider adding address information to the Employees entity set. One option is to
use an attribute address. This option is appropriate if we need to record only one
address per employee, an alternate is to create an entity set called Addresses and to
record associations between employees and address using a relationship (say,
Has_address). This more complex alternative is necessary in two situations.
38