Dbms Notes
Database : A database is a collection of related data. By data, we mean known facts that can be
recorded and that have implicit meaning. A database has the following implicit properties
1. A database is a logically coherent collection of data with some inherent meaning.
2. A database is designed, built, and populated with data for a specific purpose. It has an
intended group of users and some preconceived application in which these users are
interested.
3. A database represents some aspect of the real world.
A database can be of any size and of varying complexity. A database may be generated
and maintained manually or by machine.
A database management system (DBMS) is a collection of programs that enables users to
create and maintain a database. A DBMS is hence a general-purpose software system that
facilitates the processes of defining, constructing and manipulating databases for various
applications. The database and software are together called a database system.
5. Restricting Unauthorized Access: - When multiple users share a database, it is likely that some
users are not authorized to access all information in the database. In addition, some users may be
permitted only to retrieve data, whereas others are allowed to both retrieve and update. Hence the
type of access operation can also be controlled. A DBMS should provide a security and
authorization subsystem, which is used by the DBA to create accounts and specify account
restrictions.
6. Providing Multiple Interfaces: - Because many types of users, with varying levels of technical
knowledge, use a database, a DBMS should provide a variety of user interfaces. The types of interfaces
include query languages for casual users, programming language interfaces for application
programmers, forms for parametric users, menu-driven interfaces for naive users, and natural
language interfaces.
7. Representing Complex Relationships among Data: - A database may include a variety of data
that are interrelated in many ways. A DBMS must have the capability to represent a variety of
complex relationships among the data, as well as to retrieve and update related data in an easy and
efficient manner.
8. Enforcing Integrity Constraints: - Most database applications have certain integrity constraints,
such as the data type of a data item, uniqueness constraints on a data item, and relationships between
records of different files. These integrity constraints must be specified during database design.
9. Provide Backup and Recovery: - A DBMS must provide facilities for recovering from hardware
or software failures. The backup and recovery subsystem of the DBMS is responsible for
recovery.
DBMS users: - Many persons are involved in the design, use and maintenance of a large
database. Different types of DBMS users are:
1. Database Administrator (DBA): -
2. Database Designer
3. End users
Three-Level architecture of DBMS: - The goal of the three-level architecture is to separate the
user applications from the physical database. In this architecture, schemas can be defined
at the following three levels:
[Figure: the three-level architecture, with the stored database beneath the internal level]
1. Internal Level: - The internal level has an internal schema, which describes the physical
storage structure of the database. The internal schema uses a physical data model and
describes the complete details of data storage and access paths for the database.
2. Conceptual Level: - The conceptual schema or level describes the structure of the whole
database for a community of users. The conceptual schema hides the details of physical
storage structures and concentrates on describing entities, data types, relationships, user
operations and constraints.
3. External or view level: - The external level includes a number of external schemas or user
views. Each external schema describes the part of the database that a particular user
group is interested in and hides the rest of the database from the user group.
Data Independence: - Data independence can be defined as the capacity to change the schema
at one level of a database system without having to change the schema at the next higher level.
There are two types of data independence :
1. Logical data independence: - It is the capacity to change the conceptual schema without
having to change external schemas or application programs. We may change the
conceptual schema to expand the database, to change constraints, or to reduce the database.
2. Physical data independence: - It is the capacity to change the internal schema without
having to change the conceptual schema. Hence, the external schemas need not be
changed as well. Changes to the internal schema may be needed because some physical
files had to be reorganized.
DBMS Languages: - The DBMS must provide appropriate languages and interfaces for each
category of users. Different types of DBMS languages are:
1. Data Definition Language (DDL): - It is used by the DBA and by database designers to
define both schemas. The DBMS will have a DDL compiler whose function is to process
DDL statements in order to identify descriptions of the schemas constructs and to store
the schema description in the DBMS catalog.
2. Data Manipulation Language (DML): - Once the database schemas are compiled and the
database is populated with data, users must have some means to manipulate the database.
Typical manipulations include retrieval, insertion, deletion and modification of the data.
The DBMS provides a set of operations or a language called the data manipulation
language (DML) for these purposes.
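The DDL and DML roles can be sketched with a small example. SQLite (through Python's sqlite3 module) stands in for a general-purpose DBMS here, and the Student table and its columns are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define the schema; the DBMS stores its description in the catalog
# (called sqlite_master in SQLite).
conn.execute("CREATE TABLE Student (RollNo INTEGER PRIMARY KEY, Name TEXT)")

# DML: insertion, modification, deletion and retrieval.
conn.execute("INSERT INTO Student VALUES (1, 'Asha'), (2, 'Ravi')")
conn.execute("UPDATE Student SET Name = 'Ravi K' WHERE RollNo = 2")
conn.execute("DELETE FROM Student WHERE RollNo = 1")

print(conn.execute("SELECT RollNo, Name FROM Student").fetchall())  # [(2, 'Ravi K')]

# The schema description kept in the catalog:
print(conn.execute("SELECT name FROM sqlite_master WHERE type = 'table'").fetchone()[0])  # Student
```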
Traditional Data Models: - A data model is a set of concepts that can be used to describe the
structure of database. By structure of a database, we mean the data types, relationships, and
constraints that should hold on the data. Most data models also include a set of operations for
specifying retrievals and updates on the database. There are three traditional data models:
a) Relational b) Hierarchical c) Network
Relational Data Model: - The Relational Data Model represents a database as a collection of
tables, which consist of rows and columns. In relational database terminology, a row is called a
tuple, a column name is called an attribute, and a table is called a relation. Most relational
databases have high-level query languages and support a limited form of user views.
Hierarchical Data Model: - In the hierarchical model there are two main data structuring
concepts : records and parent-child relationship. Records of the same type are grouped into
record types. A parent-child relationship type (PCR type) is a 1:N relationship between two
record types. The record type on the 1-side is called the parent record type and the one on the N-
side is called the child record type of the PCR type. Properties of a Hierarchical schema are
a) One record type, called the root of the hierarchical schema, does not participate as a
child record type in any PCR type.
b) Every record type except the root participates as a child record type in exactly one
PCR type
c) A record type can participate as parent record type in any number of PCR types
d) A record type that does not participate as parent record type in any PCR type is called
a leaf of the hierarchical schema
If a record type participates as parent in more than one PCR type, then its child record types are
ordered. The order is displayed, by convention, from left to right in a hierarchical diagram.
Entity Relationship Model (ER Model): - At the present time, the ER model is used
mainly during the process of database design.
1. Entities: - The basic object that the ER model represents is an entity, which is a “thing” in the
real world with an independent existence. An entity may be an object with a physical
existence – a particular person, car etc, or it may be an object with a conceptual existence
like a company, a job or a university course etc.
2. Weak Entity: - Some entity types may not have any key attributes of their own. This implies that
we may not be able to distinguish between some entities, because the combinations of values
of their attributes can be identical. Such an entity type is called a weak entity type. A weak entity
is identified by being related to specific entities from another entity type, in combination with
some of its attribute values. A weak entity type always has a total participation constraint with
respect to its identifying relationship. A weak entity type has a partial key, which is the set of
attributes that can uniquely identify weak entities related to the same owner entity.
3. Attribute: - Each entity has particular properties called attributes that describe it. For example
a student entity may be described by RollNo, Name, Class, Address etc. Different types of
attributes are
a) Composite Attribute: - An attribute that is composed of more basic attributes is called a
composite attribute. For example, the Address attribute of a student.
b) Atomic Attribute: - Attributes that are not divisible are called simple or atomic
attributes. For example, the RollNo attribute of a student.
c) Single-valued Attribute: - Most attributes have a single value for a particular entity; such
attributes are called single-valued attributes. For example, the Date_of_Birth attribute of a person.
d) Multivalued Attribute: - An attribute that has a set of values for the same entity is
called a multivalued attribute. A multivalued attribute may have lower and upper bounds on
the number of values for an individual entity. For example, the Subject attribute of a student.
e) Derived Attribute: - In some cases two or more attribute values are related, and the value of
one attribute is calculated from the value of another. Such an attribute is called a derived
attribute. For example, the Age attribute of a person can be calculated from the current date
and the person's date of birth.
f) Key Attribute: - An entity type usually has an attribute whose values are distinct for each
individual entity. Such an attribute is called a key attribute and its values can be used to
identify each entity uniquely. Sometimes several attributes together can form a key, meaning
that the combination of the attribute values must be distinct for each individual entity. Some
entity types have more than one key attribute. In this case, each of the keys is called a
candidate key. When a relation schema has several candidate keys, the choice of one to
become primary key is arbitrary, however, it is usually better to choose a primary key with a
single attribute or a small number of attributes.
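Candidate and primary keys can be declared directly in SQL. A minimal sketch in SQLite (through Python's sqlite3), with an invented Student schema in which RollNo is the chosen primary key and the composite (Name, Address) is declared as a second candidate key via UNIQUE:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Student (
        RollNo  INTEGER PRIMARY KEY,   -- chosen primary key
        Name    TEXT,
        Address TEXT,
        UNIQUE (Name, Address)         -- a composite candidate key
    )
""")
conn.execute("INSERT INTO Student VALUES (1, 'Asha', 'Pune')")
try:
    # A second tuple with the same (Name, Address) combination is rejected,
    # because candidate key values must be distinct for each entity.
    conn.execute("INSERT INTO Student VALUES (2, 'Asha', 'Pune')")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```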
A join operation with such a general join condition is called a theta join.
a) Equi Join: - The most common join operation involves a join condition with equality
comparisons only. Such a join, where the only comparison operator used is the equal sign, is
called an equijoin. The relation produced by an equijoin contains one or more pairs of attributes
that have identical values in every tuple, because the equality join condition is specified on
these attributes.
b) Natural join: - It is basically an equijoin but it eliminates the duplicate attribute in the
result. It is denoted by *.
c) Outer join: - Generally, the join operation selects only the tuples from the two relations that
satisfy the join condition. An outer join is instead used to keep all tuples in R, in S, or in both in
the result, whether or not they have matching tuples in the other relation. Different types of outer
joins are
i) Left outer join: - The left outer join keeps every tuple in the first or left relation; when no
matching tuple is found in the right relation, the right attributes are padded with null values.
ii) Right outer join: - The right outer join keeps every tuple in the second or right relation.
iii) Full outer join: - The full outer join keeps all tuples in both the left and right
relations; when no matching tuples are found, they are padded with null values as needed.
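A left outer join can be seen directly in SQL. The Dept and Emp tables below are invented for the example, with SQLite (through Python's sqlite3) used as the DBMS:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Dept (DeptNo INTEGER PRIMARY KEY, DName TEXT);
    CREATE TABLE Emp  (EmpNo  INTEGER PRIMARY KEY, EName TEXT, DeptNo INTEGER);
    INSERT INTO Dept VALUES (1, 'Sales'), (2, 'Research');
    INSERT INTO Emp  VALUES (10, 'Asha', 1);   -- no employee in Research
""")
# Every Dept tuple is kept; Research has no matching Emp tuple, so its
# EName column is filled with NULL.
rows = conn.execute("""
    SELECT d.DName, e.EName
    FROM Dept d LEFT OUTER JOIN Emp e ON e.DeptNo = d.DeptNo
    ORDER BY d.DeptNo
""").fetchall()
print(rows)  # [('Sales', 'Asha'), ('Research', None)]
```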
d) SET operations: - Set operations are the standard mathematical operations on sets. They apply
to the relational model because a relation is defined to be a set of tuples, and they are used
whenever we process the tuples in two relations as sets. Set operations are binary, that is, they
are applied to two sets. The two relations on which these operations are applied must be union
compatible. Union compatible means the two relations must have the same type of tuples. There
are three set operations:
i) UNION: - R ∪ S is the relation containing all tuples that are in R, in S, or in both.
ii) INTERSECTION: - R ∩ S is the relation containing all tuples that are in both R and S.
iii) DIFFERENCE: - R − S is the relation containing all tuples that are in R but not in S.
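In SQL these correspond to UNION, INTERSECT and EXCEPT. A sketch over two invented, union-compatible single-column relations (SQLite through Python's sqlite3):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE R (x INTEGER);
    CREATE TABLE S (x INTEGER);
    INSERT INTO R VALUES (1), (2), (3);
    INSERT INTO S VALUES (2), (3), (4);
""")

def q(sql):
    return [row[0] for row in conn.execute(sql)]

print(q("SELECT x FROM R UNION SELECT x FROM S ORDER BY x"))      # [1, 2, 3, 4]
print(q("SELECT x FROM R INTERSECT SELECT x FROM S ORDER BY x"))  # [2, 3]
print(q("SELECT x FROM R EXCEPT SELECT x FROM S ORDER BY x"))     # [1]
```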
Advantage of SQL: -
3. ALTER TABLE command: - To add attributes to an existing relation, we can use the ALTER
TABLE command.
ALTER TABLE tablename ADD attributename datatype ;
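For instance (SQLite through Python's sqlite3; the Student relation and the added Phone attribute are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (RollNo INTEGER PRIMARY KEY, Name TEXT)")

# ALTER TABLE tablename ADD attributename datatype;
conn.execute("ALTER TABLE Student ADD COLUMN Phone TEXT")

# Column names as reported by the catalog:
cols = [row[1] for row in conn.execute("PRAGMA table_info(Student)")]
print(cols)  # ['RollNo', 'Name', 'Phone']
```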
UPDATE command: - The UPDATE command is used to modify attribute values of one or more
selected tuples. Where-clause can be used to specify the tuples to be modified from a single
relation. The syntax is
UPDATE tablename
SET attributename= value
WHERE expression ;
Built-in Functions: -
a) COUNT:- The COUNT function returns the number of tuples or values specified in a
query.
b) SUM: - The SUM function returns the sum of the values of the specified attribute.
c) MAX: - The MAX function returns the maximum value of the specified attribute.
d) MIN: - The MIN function returns the minimum value of the specified attribute.
e) AVG: - The AVG function returns the average value of the specified attribute.
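All five functions can appear in one query. The Emp table and its salary values are invented for the example (SQLite through Python's sqlite3):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Emp (EName TEXT, Salary INTEGER);
    INSERT INTO Emp VALUES ('Asha', 3000), ('Ravi', 5000), ('Mia', 4000);
""")
# COUNT, SUM, MAX, MIN and AVG over the Salary attribute.
row = conn.execute("""
    SELECT COUNT(*), SUM(Salary), MAX(Salary), MIN(Salary), AVG(Salary) FROM Emp
""").fetchone()
print(row)  # (3, 12000, 5000, 3000, 4000.0)
```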
Integrity Constraints: - Integrity constraints are specified on a database schema and are
expected to hold on every database instance of that schema. There are three types of integrity
constraints :
i) Key Constraint: - Key constraints specify the candidate keys of each relation schema.
Candidate key values must be unique for every tuple in any relation instance of that
relation schema.
ii) Entity Integrity Constraint: - The entity integrity constraint states that no primary key
value can be null
iii) Referential Integrity Constraint: - It is a constraint that is specified between two
relations and is used to maintain the consistency among tuples of the two relations.
Informally, the referential integrity constraint states that a tuple in one relation that refers
to another relation must refer to an existing tuple in that relation.
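Referential integrity can be sketched with a foreign key. The Dept and Emp tables are invented; note that SQLite only enforces foreign keys once the corresponding pragma is enabled:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")   # SQLite checks FKs only when enabled
conn.executescript("""
    CREATE TABLE Dept (DeptNo INTEGER PRIMARY KEY);
    CREATE TABLE Emp  (EmpNo  INTEGER PRIMARY KEY,
                       DeptNo INTEGER REFERENCES Dept(DeptNo));
    INSERT INTO Dept VALUES (1);
""")
conn.execute("INSERT INTO Emp VALUES (10, 1)")      # refers to an existing Dept tuple
try:
    # Dept 9 does not exist, so this tuple violates referential integrity.
    conn.execute("INSERT INTO Emp VALUES (11, 9)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```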
Normalization
If a database design is not perfect, it may contain anomalies, which are like a bad dream
for any database administrator. Managing a database with anomalies is next to
impossible.
• Update anomalies − If data items are scattered and are not linked to each other properly,
then it could lead to strange situations. For example, when we try to update one data item
having its copies scattered over several places, a few instances get updated properly while a
few others are left with old values. Such instances leave the database in an inconsistent
state.
• Deletion anomalies − We tried to delete a record, but parts of it were left undeleted
because, without our being aware of it, the data was also saved somewhere else.
• Insertion anomalies − We tried to insert data in a record that does not exist at all.
Normalization is a method to remove all these anomalies and bring the database to a consistent
state.
First Normal Form
First Normal Form is defined in the definition of relations (tables) itself. This rule
defines that all the attributes in a relation must have atomic domains.
The values in an atomic domain are indivisible units.
To convert a relation (table) to First Normal Form, we re-arrange it so that
each attribute contains only a single, atomic value from its pre-defined domain.
Second Normal Form
If we follow second normal form, then every non-prime attribute should be fully
functionally dependent on the prime key attributes. That is, if X → A holds, then there
should not be any proper subset Y of X for which Y → A also holds.
We see in the Student_Project relation that the prime key attributes are Stu_ID and
Proj_ID. According to the rule, the non-key attributes, i.e. Stu_Name and Proj_Name, must be
dependent upon both and not on any of the prime key attributes individually. But we find that
Stu_Name can be identified by Stu_ID and Proj_Name can be identified by Proj_ID
independently. This is called partial dependency, which is not allowed in Second
Normal Form. We therefore break the relation in two, so that no partial dependency
remains.
Third Normal Form
For a relation to be in Third Normal Form, it must be in Second Normal form and the
following must satisfy −
• For any non-trivial functional dependency X → A, either:
o X is a superkey, or
o A is a prime attribute.
We find that in the above Student_detail relation, Stu_ID is the key and the only prime
attribute. City can be identified by Stu_ID as well as by Zip itself. Neither is Zip
a superkey nor is City a prime attribute. Additionally, Stu_ID → Zip and Zip → City
hold, so there exists a transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as follows
−
After the decomposition, Stu_ID is the super-key of the relation Student_Detail and Zip is the
super-key of the relation ZipCodes. So Stu_ID → Stu_Name, Zip and Zip → City,
which confirms that both relations are in BCNF.
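The decomposition can be written out as two tables that join back losslessly on Zip. The sample values are invented, and SQLite (through Python's sqlite3) is used for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Removing the transitive dependency Stu_ID -> Zip -> City:
    CREATE TABLE ZipCodes       (Zip TEXT PRIMARY KEY, City TEXT);
    CREATE TABLE Student_Detail (Stu_ID INTEGER PRIMARY KEY, Stu_Name TEXT,
                                 Zip TEXT REFERENCES ZipCodes(Zip));
    INSERT INTO ZipCodes       VALUES ('560001', 'Bangalore');
    INSERT INTO Student_Detail VALUES (1, 'Asha', '560001');
""")
# City is no longer stored redundantly per student; it is recovered by a join on Zip.
row = conn.execute("""
    SELECT s.Stu_Name, z.City
    FROM Student_Detail s JOIN ZipCodes z ON z.Zip = s.Zip
""").fetchone()
print(row)  # ('Asha', 'Bangalore')
```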
Database normalization is a database schema design technique by which an existing schema
is modified to minimize redundancy and dependency of data. Normalization splits a large table
into smaller tables and defines relationships between them to increase the clarity in
organizing data.
• The words normalization and normal form refer to the structure of a database.
• Normalization was developed by IBM researcher E.F. Codd in the 1970s.
• Normalization increases the clarity in organizing data in Database.
Normalization of a Database is achieved by following a set of rules called 'forms' in creating the
database.
Example:
Sample Employee table; it shows employees working with multiple departments:

Name    Age   Department
Melvin  32    Marketing
Melvin  32    Sales
Edward  45    Quality Assurance
Alex    36    Human Resource
Second Normal Form
The entity should be considered already in 1NF, and all attributes within the entity should
depend solely on the unique identifier of the entity.
Example:

Product table before normalization, with repeating product/brand pairs:

productID  product    brand
1          Monitor    Apple
2          Monitor    Samsung
3          Scanner    HP

Product table:

productID  product
1          Monitor
2          Scanner
3          Headphone

Brand table:

brandID  brand
1        Apple
2        Samsung
3        HP
4        JBL

Product_Brand table:

pbID  productID  brandID
1     1          1
2     1          2
3     2          3
4     3          4
Third Normal Form
The entity should be considered already in 2NF, and no column entry should be dependent
on any other entry (value) other than the key for the table. With 3NF, every table in the
database should have only one primary key.
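One plausible reading of the product/brand example as normalized tables, rebuilt with a junction table (the table names and the pbID key are assumptions; SQLite through Python's sqlite3):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Product (productID INTEGER PRIMARY KEY, product TEXT);
    CREATE TABLE Brand   (brandID   INTEGER PRIMARY KEY, brand   TEXT);
    CREATE TABLE Product_Brand (pbID INTEGER PRIMARY KEY,
                                productID INTEGER REFERENCES Product(productID),
                                brandID   INTEGER REFERENCES Brand(brandID));
    INSERT INTO Product VALUES (1,'Monitor'), (2,'Scanner'), (3,'Headphone');
    INSERT INTO Brand   VALUES (1,'Apple'), (2,'Samsung'), (3,'HP'), (4,'JBL');
    INSERT INTO Product_Brand VALUES (1,1,1), (2,1,2), (3,2,3), (4,3,4);
""")
# The original product/brand pairs are recovered through the junction table.
rows = conn.execute("""
    SELECT p.product, b.brand
    FROM Product_Brand pb
    JOIN Product p ON p.productID = pb.productID
    JOIN Brand   b ON b.brandID   = pb.brandID
    ORDER BY pb.pbID
""").fetchall()
print(rows)
# [('Monitor', 'Apple'), ('Monitor', 'Samsung'), ('Scanner', 'HP'), ('Headphone', 'JBL')]
```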
Well, this is a highly simplified explanation for Database Normalization. One can study this
process extensively though. After working with databases for some time, you'll automatically
create Normalized databases, as it's logical and practical.
Back up and Recovery: - A DBMS must provide facilities for recovering from hardware or
software failures. The backup and recovery subsystem of the DBMS is responsible for recovery.
For example, if the computer system fails in the middle of a complex update transaction, the
recovery subsystem is responsible for making sure that the database is restored to the state it was
in before the transaction started executing. Alternatively, the recovery subsystem could ensure
that the transaction is resumed from the point at which it was interrupted so that its full effect is
recorded in the database.
Transaction: - A transaction is an atomic unit of work that is either completed in its entirety or
not done at all. A transaction has the following stages :
1. BEGIN_TRANSACTION: - This marks the beginning of transaction execution.
2. READ OR WRITE: - These specify read or write operations on the database items that are
executed as part of a transaction.
3. END_TRANSACTION: - This specifies that READ or WRITE transaction operations have
ended and marks the end of transaction execution.
4. COMMIT_TRANSACTION: - This signals a successful end of the transaction, so that any
changes executed by the transaction can be safely committed to the database and will not be
undone.
5. ROLLBACK (OR ABORT): - This signals that the transaction has ended unsuccessfully, so
that any changes or effects that the transaction may have applied to the database must be undone.
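The stages map directly onto SQL statements. A sketch of one committed and one rolled-back transaction over an invented Account table (SQLite through Python's sqlite3, with autocommit behaviour disabled so BEGIN/COMMIT are explicit):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.isolation_level = None   # manage transactions explicitly
conn.execute("CREATE TABLE Account (no INTEGER PRIMARY KEY, balance INTEGER)")
conn.execute("INSERT INTO Account VALUES (1, 100), (2, 50)")

# Successful transaction: both writes take effect together.
conn.execute("BEGIN")                                                    # BEGIN_TRANSACTION
conn.execute("UPDATE Account SET balance = balance - 30 WHERE no = 1")   # WRITE
conn.execute("UPDATE Account SET balance = balance + 30 WHERE no = 2")   # WRITE
conn.execute("COMMIT")                                                   # COMMIT_TRANSACTION

# Failed transaction: its change is undone in its entirety.
conn.execute("BEGIN")
conn.execute("UPDATE Account SET balance = balance - 999 WHERE no = 1")
conn.execute("ROLLBACK")                                                 # ROLLBACK (ABORT)

print(conn.execute("SELECT balance FROM Account ORDER BY no").fetchall())
# [(70,), (80,)]
```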
1. Recovery techniques based on Deferred update: - The idea behind deferred update techniques
is to defer or postpone any actual updates to the database until the transaction completes its
execution successfully and reaches its commit point. During transaction execution, the updates
are recorded only in the log. After the transaction reaches its commit point and the log is force-
written to disk, the updates are recorded in the database. If the transaction fails before reaching
its commit point, there is no need to undo any operation.
2. Recovery techniques based on Immediate update : - In these techniques, when a transaction
issues an update command, the database can be updated immediately without any need to wait
for the transaction to reach its commit point. The update operation is recorded in the log.
Provision must be made for undoing the effect of update operations that have been applied to the
database by a failed transaction.
3. Shadow paging: - Shadow paging considers the database to be made up of a number of fixed-
size disk pages for recovery purposes. A directory is maintained to keep the information of each
page. When a transaction begins executing, the current directory – whose entries point to the
most recent or current database pages is copied into a shadow directory. The shadow directory is
then saved on disk while the current directory is used by the transaction.
During transaction execution, the shadow directory is never modified. When a write_item
operation is performed, a new copy of the modified database page is created, but the old copy of
the page is not overwritten. Instead, the new page is written elsewhere, on an unused disk block.
The current directory entry is modified to point to the new disk page, whereas the shadow
directory is not modified and continues to point to the old, unmodified disk page.
If the transaction completes successfully, the shadow directory is deleted to make
the updates permanent in the database. If the transaction fails to complete, the current
directory is deleted and the shadow directory becomes the current directory, bringing the
database back to its state before the execution of the transaction.
Security and the importance of data security: - Different issues of database security are:
1. Legal and ethical issues regarding the right to access certain information. Some information
may be deemed to be private and cannot be accessed legally by unauthorized persons.
2. Policy issues at the governmental, institutional, or corporate level as to what kinds of
information should not be made publicly available.
3. System-related issues such as the system levels at which various security functions should be
enforced.
4. The need in some organizations to identify multiple security levels and to categorize the data
and users based on these classifications.
Database security and the DBA: - Database administrator (DBA) is the central authority for
managing a database system. The DBA’s responsibilities include granting privileges to users
who need to use the system and classifying users and data in accordance with the policy of the
organization. The DBA has a DBA account in the DBMS, sometimes called a system or superuser
account. DBA privileged commands include commands for granting and revoking privileges
to individual accounts, users, or user groups and for performing the following types of actions:
1. Account creation: This action creates a new account and password for a user or a group of
users to enable access to the DBMS.
2. Privilege granting: This action permits the DBA to grant certain privileges to certain accounts.
3. Privilege revocation: This action permits the DBA to revoke (cancel) certain privileges that
were previously given to certain accounts.
4. Security level assignments: This action consists of assigning user accounts to the appropriate
security classification level.
Authentication verifies who you are, for example when you log in to a system or access mail.
Authorization verifies what you are allowed to do; for example, you may be allowed to log in
but not be authorized to browse a particular directory or the file system.
Usually a connection attempt must be both authenticated and authorized by the system, and
these two checks determine why connection attempts are accepted or denied.
Atomicity. A transaction must be an atomic unit of work (either all of its data
modifications are performed, or none of them is performed).
Consistency. When completed, a transaction must leave all data in a consistent state.
Isolation. Modifications made by concurrent transactions must be isolated from one
another; a transaction sees data either as it was before another concurrent transaction
modified it, or after that transaction has completed.
Durability. After a transaction has completed, its effects are permanently in place in the
system. The modifications persist even in the event of a system failure.
Q: Describe the three-level (ANSI-SPARC) architecture of a database system. Write
its advantages.
A:
Meta-data.
The database definition or descriptive information is also stored by the DBMS in the form of a database catalog or
dictionary; it is called meta-data.
Constructing the database is the process of storing the data on some storage
medium that is controlled by the DBMS.
Manipulating a database includes functions such as querying the database to
retrieve specific data, updating the database to reflect changes in the miniworld,
and generating reports from the data.
Sharing a database allows multiple users and programs to access the database
simultaneously.