Database Management Systems 2013-14
1. Introduction
Data: The input given to a computer in the form of numbers, letters
etc. is called data.
Database: A collection of data stored in a computer that contains
information relevant to an enterprise is called a database.
DBMS: A DBMS is a collection of interrelated data and a set of programs
to access and modify that data.
A DBMS provides a convenient and efficient way to store and retrieve
database information.
Applications of DBMS
1. Banking: For storing and processing customer information, accounts,
loans and banking information.
2. Universities: For student information, course registrations, marks,
grades, teacher information etc.
3. Telecommunication: For keeping records of calls made, generating
monthly bills, maintaining balances on prepaid cards and storing
information about the communication networks.
4. Human Resources: For managing information about employees, their
payroll, and the generation of paycheques.
5. Airlines: Airlines were among the first to use databases in a
geographically distributed manner.
6. Credit card transactions
7. Finance: For share market information.
8. Sales: For customer, product and purchase information.
9. Research: For analyzing data using data warehouses.
10. Hospitals: To manage patients' disease histories etc.
11. Manufacturing: For management of supply and tracking production of
items.
Purpose of DBMS
o Prior to database systems, file processing systems were used, in which
data was stored in operating system files.
o Application programs were developed to let users manipulate data
and add new information.
The conventional file processing system has many disadvantages:
1. Data redundancy and inconsistency:
Files and application programs are created by different programmers
as requirements arise over time. The same data may be duplicated in
several files. This redundancy leads to higher storage and access
cost.
Redundancy may also lead to inconsistency, when information
in one file is changed and the same information in another file is
left unchanged.
2. Difficulty in accessing data:
A conventional file processing environment does not allow needed
data to be retrieved in a convenient and efficient manner, e.g. a
subset of the contents of the files cannot easily be retrieved.
3. Data isolation:
Since data is scattered across different files, which may be in
different formats, writing applications to retrieve this data may be
difficult.
4. Integrity problems:
Data values stored in the database must satisfy certain
consistency constraints; these are hard to enforce properly without
a DBMS.
[Figure: the three levels of data abstraction, from top to bottom: view
level, logical level, physical level]
1. Physical level:
This is the lowest level of abstraction and describes how the data is
actually stored. The physical level describes complex low-level data
structures in detail.
2. Logical level:
The next level describes what data is stored in the database and what
relationships exist among the data. Database administrators, who must
decide what information to keep in the database, use the logical level of
abstraction. The logical level describes the entire database in terms of a
small number of relatively simple structures. Implementation of these
structures may involve complex physical-level structures.
3. View level:
The view level of abstraction exists to simplify user interaction with
the system. The view level describes only a part of the entire database.
The system may provide many views of the same database.
Database languages
A database system provides a Data Definition Language (DDL) to specify
the database schema and a Data Manipulation Language (DML) to express
database queries and updates. In practice the DDL and DML are parts of a
single language such as the Structured Query Language (SQL).
Data Manipulation Language (DML):
DMLs are of two types:
1. Procedural DML: The user has to specify what data is needed and how
to get that data.
2. Non-procedural or declarative DML: The user has to specify what
data is needed but need not specify how to get it.
A DML enables users to access or manipulate data as organized by the
appropriate data model.
A query is a statement requesting the retrieval of information
from the database. The portion of a DML that involves information
retrieval is called a query language. In common practice the terms DML and
query language are used synonymously.
SQL (Structured Query Language)
SQL is the most widely used query language. The query processor
component of the DBMS translates DML queries into a sequence of actions at
the physical level.
Types of DML statements
1. Insertion of new information into the database.
Ex: Insert into <table_name> values (<val1>, <val2>, ...)
2. Deletion of information from the database.
Ex: Delete from <table_name> where <condition>
3. Modification of information stored in the database.
Ex: Update <table_name> set <col_name> = <value> where <condition>
4. Retrieval of information.
Ex: Select col1, col2, ..., coln from table1, table2, ...
where <condition>
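The four kinds of DML statement can be tried out with Python's built-in
sqlite3 module. This is only a minimal sketch: the account table, its
columns and the sample rows are assumptions made for the example.

```python
import sqlite3

# In-memory database; the account table and its rows are illustrative only.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, balance INTEGER)")

# 1. Insertion of new information
conn.execute("INSERT INTO account VALUES ('A-51', 25000)")
conn.execute("INSERT INTO account VALUES ('A-15', 35260)")

# 2. Deletion of information
conn.execute("DELETE FROM account WHERE acc_no = 'A-15'")

# 3. Modification of stored information
conn.execute("UPDATE account SET balance = balance + 1000 WHERE acc_no = 'A-51'")

# 4. Retrieval of information
rows = conn.execute("SELECT acc_no, balance FROM account").fetchall()
print(rows)
```

Only the row for A-51 remains, with its balance raised by 1000.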
3. Assertions
A data value has to follow certain validity rules before it is
inserted into a particular column.
Ex: The balance in the account table of a bank database cannot be
negative.
Domain constraints and referential integrity are special forms of
assertions.
4. Authorization
The distinctions among database users that control their access are
expressed by authorization.
Transaction Manager:
When several operations on the database form a single logical unit
of work, either all of them or none of them must take effect; this
requirement is called atomicity. After the transaction the database must
again be in a correct state; this requirement is called consistency.
After the successful execution of a transaction all new values must
persist in the database, despite the possibility of system failure. This
persistence requirement is called durability.
A transaction is a collection of operations that performs a single
logical function in a database application. Each transaction is a unit of
both atomicity and consistency. In the case of a transfer of funds from
account A to B, one operation debits account A and another credits account
B. While the transaction is executing it may be necessary to allow
temporary inconsistency.
Ensuring the atomicity and durability properties is the
responsibility of the transaction-management component. Because of various
types of failures, a transaction may not always complete its execution
successfully. In this case the failure recovery component should detect the
failure and restore the database to its state before the transaction.
When several transactions update the database concurrently, the
consistency of data may not be preserved even though each individual
transaction is correct. The concurrency control manager controls the
interaction among the concurrent transactions to ensure the consistency of
the database.
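Atomicity of a funds transfer can be illustrated with Python's sqlite3
module. This is a sketch under stated assumptions: the account table, the
CHECK constraint on balance and the transfer helper are hypothetical names
invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint stands in for the rule "balance cannot be negative".
conn.execute("CREATE TABLE account (acc_no TEXT PRIMARY KEY, "
             "balance INTEGER CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [("A", 500), ("B", 100)])
conn.commit()

def transfer(conn, src, dst, amount):
    """Credit dst and debit src as one unit of work; undo both on failure."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? "
                         "WHERE acc_no = ?", (amount, dst))
            conn.execute("UPDATE account SET balance = balance - ? "
                         "WHERE acc_no = ?", (amount, src))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(conn, "A", "B", 200)       # succeeds
failed = transfer(conn, "A", "B", 1000)  # debit would make A negative
balances = dict(conn.execute("SELECT acc_no, balance FROM account"))
print(ok, failed, balances)
```

In the second call the credit to B has already been applied when the debit
fails, yet the rollback removes it, so neither half of the transfer remains.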
Database Users and Administrators
Database Users
Based on the way they interact with the DBMS, users are classified into 4
types:
1. Naive users: They are unsophisticated users who interact with the
system by invoking one of the application programs that have been written
previously. The typical user interface is a form; the user fills in the
appropriate fields of the form to interact with the database.
Ex: ATM users, bank cashiers
2. Application programmers: They develop user interfaces using
development tools that enable an application programmer to construct forms
and reports with minimum effort.
3. Sophisticated users: These users submit queries written in a DML to
explore the database.
4. Specialized users: These users write specialized database
applications, such as computer-aided design systems, which store complex
data such as graphics, audio and video.
Database Architecture
[Figure: database system architecture.
Users: naive users (tellers, agents, web users), application programmers,
sophisticated users (analysts), database administrators.
Query processor: compiler and linker, DML queries, DDL interpreter,
application program object code, DML compiler and organizer, query
evaluation engine.
Storage manager: buffer manager, file manager, authorization and integrity
manager, transaction manager.
Disk storage: indices, data, data dictionary, statistical data.]
[Figure: two-tier architecture. Client side: user and application. Across
the network, server side: database system.]
Three-tier architecture:
1. The client machine acts as a front end and communicates with the
application server through a forms interface.
2. The application server communicates with the database system to
access data.
[Figure: three-tier architecture. Client side: user and application. Across
the network, server side: application server and database system.]
2. E-R Model
The E-R model is very useful for mapping the meanings and interactions of
real-world enterprises onto a conceptual schema. The E-R model employs three
basic notions: entity sets, relationship sets and attributes.
Entity Set:
An entity is a thing or object in the real world that is
distinguishable from all other objects. Ex: a person in an enterprise.
An entity set is a set of entities of the same type that share the same
properties (or attributes).
Ex: the employees of an organization.
Entity sets need not be disjoint.
Ex: employee and customer are entity sets.
A person may be an entity in both entity sets, in only one of them, or in
neither. An entity is represented by a set of attributes, which are
descriptive properties possessed by each member of the entity set. Each
entity has its own value for each attribute.
Employee table
Id   Name           Address  Designation
101  Vivekananda    CTA      Principal
104  Nagaraj        DVG      Professor
108  Govindraju     CTA      Professor
109  Channakeshava  CTA      Asst Professor
…    …              …        …
Relationship set:
A relationship is an association among several entities.
Ex: Govindraj works in the maths department.
A relationship set is a set of relationships of the same type. It is a
mathematical relation on n ≥ 2 entity sets.
If E1, E2, ..., En are entity sets, then a relationship set R is a subset of
{(e1, e2, ..., en) | e1 ∈ E1, e2 ∈ E2, ..., en ∈ En}
where (e1, e2, ..., en) is a relationship.
The association between entity sets is referred to as participation: the
entity sets E1, E2, ..., En participate in relationship set R. A
relationship instance in an E-R schema represents an association between
the named entities in the real-world enterprise. The function that an
entity plays in a relationship is called the role of that entity.
When the same entity set participates in a relationship set more than
once, in different roles, the relationship set is called a recursive
relationship set.
Employee (E1)
Id   Name           Address  Designation
101  Mahesh         CTA      Principal
104  Naveen         DVG      Professor
108  Govindraj      CTA      Professor
109  Channakeshava  CTA      Asst Professor

Department (E2)
Dept id  Dept name
01       Administrative
02       Cs
03       Maths

Committees (E3)
C_ID  Committee Name
01    Admission
02    Time table
03    NSS
A relationship set that involves two entity sets, such as depositor
between customer and account, is a binary relationship set. Most of the
relationship sets in a database system are binary.
Attributes
The set of permitted values for an attribute is called the domain (or
value set) of that attribute.
Since an entity set may have several attributes, each entity can be
described by a set of (attribute name, data value) pairs, one pair for each
attribute of the entity set.
Ex: A particular employee entity may be described by the set
{(id, 101), (name, Mahesh), (address, CTA), (desg, Principal)}
Attributes can be characterized as follows.
Simple & Composite attributes:
A simple attribute has only one part and cannot be divided into
subparts. Ex: emp-id.
A composite attribute can be divided into subparts.
Ex: Name can be divided into first name, middle name and last name.
[Figure: composite attributes. Name: First Name, Middle Name, Last Name.
Address: Street, City, State, Postal Code.]
Derived Attribute:
The values of these attributes can be derived from the values of other
related attributes. Ex: Age can be derived from Date_of_Birth.
Null values:
An attribute takes a null value when an entity does not have a value for
it. Null is not equivalent to zero; null means the value is not known.
Constraints:
Mapping cardinalities, key constraints and participation constraints are
constraints to which the contents of a database must conform.
1. Mapping cardinalities:
A mapping cardinality expresses the number of entities to which another
entity can be associated via a relationship set.
A) One-to-one: An entity in A is associated with at most one entity in
B, and an entity in B is associated with at most one entity in A.
[Figure: four mapping-cardinality diagrams between entity sets A (a1 to a4)
and B (b1 to b4)]
2. Key Constraints:
A key is a property of an entity set rather than of an individual
entity. Individual entities in a database are conceptually distinct.
Therefore, to distinguish them, the values of the attributes of an entity
must be such that they uniquely identify the entity.
A key allows us to identify a set of attributes that suffices to
distinguish entities from each other. Keys also help to uniquely
identify relationships.
Keys on Entity Sets:
a) Super key: a set of one or more attributes that, taken
collectively, allows us to uniquely identify an entity in the entity set.
Ex: (Customer_id), (Customer_name, Address, Age), (Student_Name,
Class, Age).
b) Candidate key: a minimal super key, i.e. a super key no proper
subset of which is again a super key.
Ex: (Cus_id), (Cus_name, Street_no)
c) Primary key: the candidate key chosen by the database designer as
the principal means of identifying entities within an entity set.
Properties:
a) A candidate key must be chosen such that it has a minimal set of key
attributes.
b) No two entities of a set are permitted to have the same value on the
key attributes.
c) The primary key must be chosen such that its value changes very
rarely or never.
(An address is likely to change, so it should be avoided.)
Keys in Relationship sets:
Primary keys, candidate keys and super keys are defined for entity
sets; similar keys can be defined for relationship sets.
Let R be a relationship set involving entity sets E1, E2, ..., En.
Let primary-key[Ei] denote the primary key of entity set Ei. Assume
that the attribute names of all primary keys are unique and that each entity
set participates only once in the relationship.
If the relationship set R has no attributes associated with it, then the
set of attributes
primary-key[E1] ∪ primary-key[E2] ∪ … ∪ primary-key[En]
describes an individual relationship in R.
If the relationship set R has attributes {a1, a2, ..., am} associated with
it, then the set of attributes
primary-key[E1] ∪ primary-key[E2] ∪ … ∪ primary-key[En] ∪ {a1, a2, ..., am}
describes an individual relationship in R.
E-R diagram for the entity sets Employee and Department (many-to-one
relationship): phone_no is a multivalued attribute, address is a composite
attribute, age is a derived attribute and emp_id is the primary key.
[Figure: E-R diagram with attribute labels emp_id, emp_name, dob, age,
phone_no, address (street, locality, city, pincode), dept_id, dept_name,
location]
and one customer can have more than one account. access_date is a
relationship attribute.
[Figure: E-R diagram with attribute labels customer_name, customer_street,
customer_city, acc_no, balance, branch_name and the relationship attribute
access_date]
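[Figures: further E-R diagrams. Recoverable labels: emp_id, emp_name, dob,
address; job with title and level; dept_id, dept_name, location;
customer_name, customer_street, customer_city, acc_no, balance, branch_name;
relationship sets R, RB and RC among B, E and C; loan_payment, customer,
payment]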
Extended ER-features:
Specialization:
An entity set person with attributes (person_id, name, street and
city) may be subclassified further, depending on the database being
designed.
In the banking example we may classify person entities as employee and
customer, i.e. a person could be an employee, a customer, both or neither.
The process of designating subgroupings within an entity set is called
specialization.
In an E-R diagram, specialization is depicted by a triangle labeled ISA.
[Figure: ISA hierarchies. Person specializes into Employee and Customer,
with Secretary as a further lower-level entity set. Account specializes into
Savings Account (interest rate) and Checking Account (overdraft amount).]
Generalization:
Specialization represents a top-down approach to the design process: an
initial entity set is refined into successive levels of entity subgroups.
The design process by which multiple entity sets are synthesized into a
higher-level entity set on the basis of common features is a bottom-up
approach called generalization.
Generalization is thus the reverse of specialization.
Attribute inheritance:
Specialization and generalization classify entity sets as higher level
and lower level. The attributes of a higher-level entity set are inherited
by its lower-level entity sets; this is called attribute inheritance.
Example: The entity sets employee and customer inherit the attributes
person_id, name, address etc. of the person entity set.
If a higher-level entity set participates in a relationship set, its
lower-level entity sets also inherit participation in that relationship set.
Ex: [Figure: Account linked to Customer by the relationship set "Made by";
Account has an ISA triangle leading to SB and CA]
Whether the lower- and higher-level entity sets are arrived at by
generalization or by specialization, the outcome is the same. The attributes
and relationships of a higher-level entity set apply to all of its
lower-level entity sets.
The distinctive features of a lower-level entity set apply only within
that particular lower-level entity set. The E-R diagram of generalization /
specialization depicts a hierarchy of entity sets. If a lower-level
entity set is involved in only one ISA relationship with a higher-level
entity set, this is referred to as single inheritance.
Ex: [Figure: Person with an ISA triangle leading to Employee and Customer]
If a lower-level entity set is involved in more than one ISA relationship
with higher-level entity sets, this is referred to as multiple inheritance.
[Figure: Person and Employee, each with its own ISA triangle]
CONSTRAINTS ON GENERALIZATION
Constraints are required to model an enterprise more accurately.
Different constraints apply in different situations.
I. Constraints that determine which entities can be members of a
lower-level entity set:
1) Condition defined: Membership of a lower-level entity set is evaluated on
the basis of whether or not an entity satisfies an explicit condition or
predicate.
Ex: If the higher-level entity set account has the attribute
account_type, all entities that satisfy the condition
account_type = "savings account" belong to the lower-level entity set
savings account.
Since all the lower-level entities are evaluated on the basis of the
same attribute, this type of generalization is said to be attribute defined.
2) User defined: This constraint has no membership condition. The
database user assigns entities to a given entity set.
Ex: Assigning an employee to a committee or team.
II. Constraints on whether or not entities may belong to more than one
lower-level entity set within a single generalization. The lower-level
entity sets may be disjoint or overlapping.
1) Disjoint: A disjointness constraint requires that an entity belong to no
more than one lower-level entity set.
Ex: An account entity can be either a savings account or a checking
account, but not both.
[Figure: ISA triangle marked "Disjoint"]
2) Overlapping: The same entity may belong to more than one lower-level
entity set within a single generalization.
[Figure: Account with an ISA triangle]
Aggregation:
Aggregation is an abstraction through which relationships are treated as
higher-level entities.
The basic E-R model cannot express relationships among relationships.
Ex: Consider a ternary relationship works-on between employee, job and
branch, and suppose we want to record managers for the jobs performed by an
employee at a branch.
A binary relationship between employee and manager cannot represent
this comprehensively.
[Figure: E-R diagrams with Job, Manages and manager]
With aggregation, works-on is treated as a higher-level entity set, and a
binary relationship manages between the manager entity set and works-on
represents who manages which task.
ALTERNATIVE E-R NOTATIONS:
Previously used symbols in E-R notations
1) Entity set
2) Attributes
3) Weak entity set
4) Multi valued attribute
5) Relationship set
6) Derived attribute
7) Identifying relationship set for weak entity set
8) Total participation of entity set in relationship
9) Primary key
10) Discriminator
11) Many to many relationship
12) One to many relationship
13) One to one relationship
14) Cardinality limits
15) Role indicator
16) Generalization/Specialization
17) Total generalization
18) Disjoint generalization
Alternative E-R notations
Entity set E with attributes A1, A2, A3 and primary key.
[Figure: entity set E listing A1, A2 and A3; relationship R marked * at each
end, with an alternative drawing]
There is no single standard for E-R notation. The notation used so far is
called Chen's notation.
The US National Institute of Standards and Technology defined a
standard called IDEF1X in 1993, which uses crow's-foot notation.
3. RELATIONAL MODEL
Structure of relational databases
A relational database consists of a collection of tables, each of which
is assigned a unique name. A row in a table represents a relationship among
a set of values.
Basic structure:
D1      D2        D3
Emp_id  Emp_name  Emp_address
101     Raju      Dvg
102     Rama      Cta
:       :         :
KEYS:
A super key is a set of one or more attributes that allows us to
uniquely identify a tuple in the relation.
Ex: {customer_id}, {customer_id, customer_name}
A candidate key is a minimal super key, i.e. no proper subset of it is
again a super key.
Ex: {customer_id}, {customer_name, customer_street}
Primary key: the candidate key chosen by the database designer
as the principal means of identifying tuples within a relation.
A primary key, candidate key or super key is a property of the entire
relation rather than of individual tuples.
A candidate key must be chosen such that duplicate values never occur in
its attributes, and the primary key should be chosen such that its value
never or rarely changes.
Let R be a relation schema. A subset K ⊆ R is a super key for R if no
legal relation r(R) has two distinct tuples with the same values on all
attributes in K,
i.e. if t1 and t2 are in r
and t1 ≠ t2 then
t1[K] ≠ t2[K].
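The condition t1[K] ≠ t2[K] can be checked mechanically on a given set of
tuples. The sketch below assumes a relation is stored as a list of distinct
Python dictionaries; the function name is_superkey and the sample customer
rows are invented for illustration. A key is a constraint on every legal
instance, so a check on one instance can only refute a proposed key, never
prove it.

```python
def is_superkey(rows, key):
    """Return True if no two tuples in rows agree on all attributes in key."""
    seen = set()
    for t in rows:
        value = tuple(t[a] for a in key)  # t[K]
        if value in seen:                 # some earlier tuple has the same t[K]
            return False
        seen.add(value)
    return True

# Hypothetical sample instance of a customer relation.
customer = [
    {"customer_id": 1, "customer_name": "Raju", "customer_street": "Main"},
    {"customer_id": 2, "customer_name": "Rama", "customer_street": "Main"},
    {"customer_id": 3, "customer_name": "Raju", "customer_street": "Park"},
]

by_id = is_superkey(customer, ["customer_id"])
by_name = is_superkey(customer, ["customer_name"])
by_name_street = is_superkey(customer, ["customer_name", "customer_street"])
print(by_id, by_name, by_name_street)
```

Here customer_name alone fails because two tuples share the name Raju,
while the pair (customer_name, customer_street) distinguishes every tuple
in this instance.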
Foreign key:
[Schema diagram: relation r1 is account (account_no, branch_name, balance);
relation r2 is branch (branch_name, branch_city, assets)]
A relation r1 may include among its attributes the primary key of
another relation r2. This attribute is called a foreign key from r1
referencing r2.
Any insertion or update on relation r1 is checked for the existence of
that particular value in relation r2. If that particular value is not
present in r2, the operation is rejected.
[Schema diagram: loan (loan_no, branch_name, amount) and borrower
(cust_name, loan_no)]
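A foreign key check of this kind can be observed with Python's sqlite3
module. This is a sketch only: the column types and sample values are
assumptions, and SQLite enforces foreign keys only after the
PRAGMA foreign_keys = ON statement shown here.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # off by default in SQLite
conn.execute("CREATE TABLE branch (branch_name TEXT PRIMARY KEY, "
             "branch_city TEXT, assets INTEGER)")
conn.execute("CREATE TABLE account (account_no TEXT PRIMARY KEY, "
             "branch_name TEXT REFERENCES branch(branch_name), "
             "balance INTEGER)")

# Sample values are made up for the example.
conn.execute("INSERT INTO branch VALUES ('CTA', 'City-1', 900000)")
conn.execute("INSERT INTO account VALUES ('A-51', 'CTA', 25000)")  # accepted

try:
    # 'DVG' does not exist in branch, so the reference cannot be satisfied.
    conn.execute("INSERT INTO account VALUES ('A-15', 'DVG', 35260)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

stored = conn.execute("SELECT COUNT(*) FROM account").fetchone()[0]
print(rejected, stored)
```

The second insert is refused because its branch_name has no matching row
in branch, so only one account row is stored.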
Query languages:-
A query language is a language in which a user requests information
from the database. Query languages are categorized into two types:
1. Procedural and 2. Non-procedural
In a procedural language the user gives a sequence of operations on the
database to compute the desired result.
In a non-procedural language the user describes the desired information
without giving a specific procedure for obtaining it.
Relational algebra is procedural, whereas the tuple relational
calculus and domain relational calculus are non-procedural.
Fundamental Relational Algebra
Relational algebra consists of a set of operations that take one or
two relations as input and produce a new relation as their result.
The six fundamental operations in relational algebra are:
Unary operations - select, project and rename.
Binary operations - union, set difference and Cartesian product.
Apart from these fundamental operations there are other operations
such as set intersection, natural join, division and assignment; all of
these are defined in terms of the fundamental operations.
The result of a relational operation is itself a relation.
Select Operation:-
The select operation selects tuples that satisfy a given predicate.
The lower-case Greek letter sigma (σ) denotes selection, and
the predicate appears as a subscript to σ.
Relational algebra queries:
1. To select tuples from the loan relation whose branch name is "cta":
σbranch_name="cta"(loan)
2. To select tuples whose amount is more than 1200:
σamount>1200(loan)
We can use the comparison operators =, ≠, <, ≤, >, ≥ in the selection
predicate. To build more complex predicates we combine
conditions with the logical operators AND (∧), OR (∨) and NOT (¬).
Ex: To apply both conditions from the examples above:
σbranch_name="cta" ∧ amount>1200(loan)
Project operation:-
The project operation is a unary operation that returns its argument
relation with only the specified attributes, leaving the others out.
Projection is denoted by the upper-case Greek letter pi (Π); the
attributes we wish to see in the result are listed in the subscript.
Πloan_number,amount(loan)
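Selection and projection can be imitated in a few lines of Python. This is
an illustrative sketch, not a database implementation: a relation is modeled
as a list of dictionaries, and the helper names select and project and the
sample loan rows are assumptions made for the example.

```python
def select(relation, predicate):
    """Selection (sigma): keep the tuples for which predicate is true."""
    return [t for t in relation if predicate(t)]

def project(relation, attributes):
    """Projection (pi): keep the listed attributes and drop duplicate tuples."""
    result = []
    for t in relation:
        row = {a: t[a] for a in attributes}
        if row not in result:
            result.append(row)
    return result

# Hypothetical sample rows for the loan relation.
loan = [
    {"loan_number": "L-01", "branch_name": "cta", "amount": 25000},
    {"loan_number": "L-02", "branch_name": "dvg", "amount": 15000},
    {"loan_number": "L-03", "branch_name": "cta", "amount": 1000},
]

# Selection with branch_name = "cta" AND amount > 1200
big_cta = select(loan, lambda t: t["branch_name"] == "cta" and t["amount"] > 1200)
# Projection on loan_number and amount
numbers = project(big_cta, ["loan_number", "amount"])
branches = project(loan, ["branch_name"])
print(numbers, branches)
```

Projection removes duplicates because a relation is a set, which is why
branches contains "cta" only once.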
Union operation:-
Using the union operation we can combine the results of two relational
expressions into one.
Consider two tables:
1) Depositor (account_no, customer_name, balance)
Account_no  Customer_name  Balance
A-51        RNC            25000
A-15        CKM            35260
2) Borrower (loan_no, customer_name, amount)
Loan_no  Customer_name  Amount
L-21     DGR            20,000
L-42     NPN            15,000
The union operator can be used to combine the customer names from both
tables:
Πcustomer_name(borrower) ∪ Πcustomer_name(depositor)
For a union of two relations r ∪ s to be valid, two conditions must hold:
1) The relations r and s must be of the same arity, i.e. they must have
the same number of attributes.
2) For all i, the domain of the ith attribute of r and the domain of the
ith attribute of s must be the same.
r and s can be either database relations or temporary relations that
are the result of relational algebra expressions.
loan = (loan_number, branch_name, amount)
Then the relation schema for r = borrower × loan is
(borrower.customer_name, borrower.loan_number, loan.loan_number,
loan.branch_name, loan.amount)
However, only the attribute loan_number appears in both relations, so
dropping the relation name from the other attributes does not cause any
ambiguity. So the relation schema for r is:
(customer_name, borrower.loan_number, loan.loan_number, branch_name, amount)
Borrower                      Loan
Customer_name  Loan_number    Loan_number  Branch_name  Amount
C K M          l-01           l-01         C T A        25000
R N C          l-02           l-02         D V G        15000
D G R          l-03           l-03         C T A        25000
Rename Operation:
The rename operation, denoted by rho (ρ), lets us give names to the
results of relational algebra expressions.
ρx(E) returns the result of expression E under the name x.
If a relational algebra expression E has arity n, then the expression
ρx(A1,A2,…,An)(E)
returns the result of expression E under the name x and with the
attributes renamed to A1, A2, ..., An.
Ex: To find each employee's name along with the name of his or her
manager, where both are stored in the same table.
To find the manager of every employee, we first compute a
temporary relation m that is an exact copy of employee:
emp_id  emp_name        manager_id
101     Vivekananda     101
102     Sannamma        101
103     Shirahatti      102
104     Madhu           102
105     Shobha Dalawai  101
106     Nagaraj         105
Then we take the tuples in which manager_id of employee
is equal to emp_id of the temporary relation m:
Πemployee.emp_name,m.emp_name(σemployee.manager_id=m.emp_id(employee × ρm(employee)))
This gives the result:
employee.emp_name  m.emp_name
Vivekananda        Vivekananda
Sannamma           Vivekananda
Shirahatti         Sannamma
Madhu              Sannamma
Shobha Dalawai     Vivekananda
Nagaraj            Shobha Dalawai
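The same employee and manager query can be traced in Python. This is a
sketch in which a relation is a list of dictionaries; the rename helper is a
hypothetical function written for this example, and itertools.product plays
the role of the Cartesian product.

```python
from itertools import product

employee = [
    {"emp_id": 101, "emp_name": "Vivekananda", "manager_id": 101},
    {"emp_id": 102, "emp_name": "Sannamma", "manager_id": 101},
    {"emp_id": 103, "emp_name": "Shirahatti", "manager_id": 102},
    {"emp_id": 104, "emp_name": "Madhu", "manager_id": 102},
    {"emp_id": 105, "emp_name": "Shobha Dalawai", "manager_id": 101},
    {"emp_id": 106, "emp_name": "Nagaraj", "manager_id": 105},
]

def rename(relation, name):
    """Rename (rho): qualify every attribute with the new relation name."""
    return [{f"{name}.{a}": v for a, v in t.items()} for t in relation]

e = rename(employee, "employee")
m = rename(employee, "m")          # the temporary copy of employee

pairs = [
    (te["employee.emp_name"], tm["m.emp_name"])      # projection
    for te, tm in product(e, m)                      # Cartesian product
    if te["employee.manager_id"] == tm["m.emp_id"]   # selection
]
for emp_name, manager_name in pairs:
    print(emp_name, "|", manager_name)
```

Without the rename step the two copies of employee could not be told apart
inside the selection predicate.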
2. Natural join:
Consider two relations r(R) and s(S). The natural join of r and s,
denoted by r ⋈ s, is a relation on schema R ∪ S:
r ⋈ s = ΠR∪S(σr.A1=s.A1 ∧ r.A2=s.A2 ∧ … ∧ r.An=s.An(r × s))
where R ∩ S = {A1, A2, ..., An}.
The natural join of the borrower and loan tables,
Πcustomer_name,loan_number,amount(borrower ⋈ loan)
can be implemented by using
Πcustomer_name,borrower.loan_number,amount(σborrower.loan_number=loan.loan_number(borrower × loan))
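The definition of natural join translates almost directly into Python. In
this sketch relations are lists of dictionaries; natural_join is a
hypothetical helper, and the sample rows are modeled on the borrower and
loan tables used in these notes.

```python
def natural_join(r, s):
    """Combine tuples of r and s that agree on every attribute common to both."""
    result = []
    for tr in r:
        for ts in s:                      # every pair from the product r x s
            common = tr.keys() & ts.keys()
            if all(tr[a] == ts[a] for a in common):
                result.append({**tr, **ts})   # schema is the union of both
    return result

borrower = [
    {"customer_name": "C K M", "loan_number": "l-01"},
    {"customer_name": "R N C", "loan_number": "l-02"},
    {"customer_name": "D G R", "loan_number": "l-03"},
]
loan = [
    {"loan_number": "l-01", "branch_name": "C T A", "amount": 25000},
    {"loan_number": "l-02", "branch_name": "D V G", "amount": 15000},
    {"loan_number": "l-03", "branch_name": "C T A", "amount": 25000},
]

joined = natural_join(borrower, loan)
for t in joined:
    print(t["customer_name"], t["loan_number"], t["amount"])
```

The common attribute loan_number appears only once in each result tuple,
which is the difference between a natural join and a plain Cartesian
product followed by a selection.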
3. Division operator:
The division operator (÷) suits queries that include the phrase "for
all". We can find the branch names where the loan amount is more than
20000 by
r1 = Πbranch_name(σamount>20000(loan))
We can get customer names along with branch names by using
r2 = Πcustomer_name,branch_name(borrower ⋈ loan)
To find the customers in r2 who appear with every branch name in r1,
we use the division operator:
Πcustomer_name,branch_name(borrower ⋈ loan) ÷ Πbranch_name(σamount>20000(loan))
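Division returns the customers that are paired with every branch in the
divisor, not merely with some branch. The sketch below shows this on small
made-up relations; the divide helper and its keep and by parameters are
assumptions introduced for the example.

```python
def divide(r, s, keep, by):
    """Values of `keep` in r that occur together with every `by` value in s."""
    required = {t[by] for t in s}
    candidates = {t[keep] for t in r}
    return sorted(
        c for c in candidates
        if required <= {t[by] for t in r if t[keep] == c}
    )

# r2: (customer_name, branch_name) pairs; sample data is hypothetical.
r2 = [
    {"customer_name": "CKM", "branch_name": "CTA"},
    {"customer_name": "CKM", "branch_name": "DVG"},
    {"customer_name": "RNC", "branch_name": "DVG"},
    {"customer_name": "DGR", "branch_name": "CTA"},
]
# r1: the branch names that every qualifying customer must be paired with.
r1 = [{"branch_name": "CTA"}, {"branch_name": "DVG"}]

result = divide(r2, r1, keep="customer_name", by="branch_name")
print(result)
```

Only CKM has a loan at both CTA and DVG, so RNC and DGR are excluded even
though each of them matches one branch in r1.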
4. Assignment operator:
The assignment operator (←) works like assignment in a programming
language. We may assign the result of a relational algebra expression to a
temporary relation variable.
Ex: temp1 ← ΠR-S(r)
General form (generalized projection):
ΠF1,F2,…,Fn(E)
where E is any relational algebra expression, and each of F1, F2, …, Fn is
an arithmetic expression involving constants and attributes in the schema of
E. The expression may be simply an attribute or a constant, or it may be an
arithmetic expression that uses +, -, * and ÷ on numeric attributes,
numeric constants or expressions that produce numeric values.
Ex: Πempid,fname,salary*12(emp)
displays the employee id, first name and annual salary from the emp table.
2. Aggregation:
The aggregate operation permits the use of aggregate functions, such as
min or average, on sets of values.
Aggregate functions take a collection of values and return a single value as
a result. Ex: the sum function takes a collection of values as input, adds
them and returns the total. Similarly we can compute the average of a set
of values, count the values, or find the minimum or maximum among them.
General form: G1, G2, …, Gn 𝒢 F1(A1), F2(A2), …, Fm(Am) (E)
where E is any relational algebra expression; G1, G2, …, Gn is the list
of attributes on which to group; and each Fi is an aggregate function
applied to attribute Ai.
The tuples in the result of expression E are partitioned into groups in such
a way that:
1. All tuples in a group have the same values for G1, G2, …, Gn.
2. Tuples in different groups have different values for G1, G2, …, Gn.
As a special case of the aggregate operation, the list of attributes
G1, G2, …, Gn can be empty; this corresponds to aggregation without grouping
(simple aggregation).
Ex: To find the sum of the salaries of all employees in the emp table we
can write:
𝒢sum(salary)(emp)
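Grouping followed by an aggregate function can be sketched in Python as
follows. The aggregate helper, the dept attribute and the sample emp rows
are assumptions made for the example; an empty grouping list gives simple
aggregation over the whole relation.

```python
from collections import defaultdict

def aggregate(relation, group_by, func, attribute):
    """Partition tuples by the group_by attributes, then apply func per group."""
    groups = defaultdict(list)
    for t in relation:
        key = tuple(t[g] for g in group_by)   # tuples with equal key share a group
        groups[key].append(t[attribute])
    return {key: func(values) for key, values in groups.items()}

# Hypothetical sample rows for the emp relation.
emp = [
    {"empid": 1, "dept": "cs", "salary": 30000},
    {"empid": 2, "dept": "cs", "salary": 20000},
    {"empid": 3, "dept": "maths", "salary": 25000},
]

total = aggregate(emp, [], sum, "salary")           # no grouping attributes
per_dept = aggregate(emp, ["dept"], sum, "salary")  # grouped on dept
highest = aggregate(emp, ["dept"], max, "salary")
print(total, per_dept, highest)
```

With an empty grouping list every tuple falls into the single group keyed
by the empty tuple, which matches the simple aggregation case.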
4. SQL
Background of SQL
IBM developed the original version of SQL, originally called Sequel,
as part of the System R project in the early 1970s. The Sequel language has
evolved since then, and its name has changed to SQL (Structured Query
Language). Many database products now support SQL.
In 1986, the American National Standards Institute (ANSI) and the
International Organization for Standardization (ISO) published an SQL
standard called SQL-86. Extended standards followed: SQL-89, SQL-92,
SQL:1999 and SQL:2003.
SQL uses a combination of relational algebra and relational calculus
constructs.
SQL Language has Several Parts:
1) Data Definition Language (DDL).
The SQL DDL provides commands for defining relation schemas, deleting
relations and modifying relation schemas.
2) Interactive Data Manipulation Language (DML).
The SQL DML includes a query language based on both the relational
algebra and the tuple relational calculus. It
includes commands to insert, delete and modify tuples in the database.
3) Integrity.
The SQL DDL includes commands for specifying integrity constraints that
the data stored in the database must satisfy. Updates that violate
integrity constraints are not allowed.
4) View Definition.
The SQL DDL includes commands for defining views.
5) Transaction control:
SQL includes commands for specifying the beginning and end of
transactions.
6) Embedded SQL & Dynamic SQL:
Embedded and dynamic SQL define how SQL statements can be embedded
within general-purpose programming languages such as C, C++, Java, PL/I,
COBOL, Pascal and Fortran.
7) Authorization:
The SQL DDL includes commands for specifying access rights to relations
and views.
Data Definition:
The set of relations in the database are specified using DDL.
DDL defines the following:
Schema for each relation.
Domain of values associated with each attribute.
Integrity constraints.
The set of indices to be maintained for each relation.
The security and authorization information for each relation.
Physical storage structure for each relation on disk.
Basic domain types:
1. Char(n): A fixed-length character string with user-specified length
n. Full form: character.
2. Varchar(n): A variable-length character string with a maximum
length of n. Full form: character varying.
3. Int: A finite subset of the integers that is machine dependent. Full
form: integer.
4. Smallint: A machine-dependent subset of the integer domain.
2. Find loans whose amount is between 50000 and 100000.
Select loan_number
from loan
where amount between 50000 and 100000;
The between operator includes both end values. We can also use not
between to select values outside the range.
From clause:
The from clause by itself defines a Cartesian product of the relations
listed in the clause. A natural join is defined in terms of a Cartesian
product, so it can be written with a from clause and a where clause.
To select customer_name, loan_number and amount from the borrower and
loan tables:
Relational algebra expression: Πcustomer_name,loan_number,amount(borrower ⋈ loan)
SQL statement: Select customer_name, borrower.loan_number, amount
from borrower, loan
where borrower.loan_number = loan.loan_number;
If two relations have an attribute with the same name, we avoid
ambiguity by prefixing the attribute with its relation name using the dot
operator.
Tuple variables:
The “as” clause is particularly useful in defining the notion of tuple
variables. A tuple variable in SQL must be associated with a particular
relation. Tuple variables are defined in the “from” clause by way of the
“as” clause.
Ex: Select customer_name, T.loan_number, S.amount
from borrower as T, loan as S
where T.loan_number=S.loan_number;
Tuple variables are most useful for comparing two tuples in the same
relation. Ex: To find the names of branches whose assets are greater than
those of at least one branch located in “Bangalore”
select distinct T.branch_name
from branch as T, branch as S
where T.assets > S.assets and S.branch_city='Bangalore';
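The self-join above can be tried out with a minimal sketch in Python, using the standard sqlite3 module as a stand-in for the DBMS. The branch rows are hypothetical sample data; only the column names come from the notes.

```python
import sqlite3

# In-memory database with hypothetical branch data.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table branch (branch_name text, branch_city text, assets integer)"
)
conn.executemany(
    "insert into branch values (?, ?, ?)",
    [
        ("MG Road", "Bangalore", 500),
        ("Jayanagar", "Bangalore", 900),
        ("Shimoga", "Shimoga", 700),
        ("Davangere", "Davangere", 400),
    ],
)

# T and S are two tuple variables ranging over the same relation.
rows = conn.execute(
    """
    select distinct T.branch_name
    from branch as T, branch as S
    where T.assets > S.assets and S.branch_city = 'Bangalore'
    order by T.branch_name
    """
).fetchall()
branch_names = [name for (name,) in rows]
print(branch_names)
```

Each T tuple is compared with every S tuple, so a branch is listed as soon as it beats one Bangalore branch; distinct removes the repeats.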
String Operations:
Strings are specified by enclosing them in single quotes. To include a
single quote within a string, we write two single quotes.
Ex: the string “It’s right” is specified as 'It''s right'
Patterns are matched using the “like” operator. Patterns are described
using two special characters.
1. Percent(%): The % character matches any substring.
2. Underscore(_): The _ character matches any one character.
Example: list the names of customers who have ‘A’ in their names.
Select customer_name
From customer
Where customer_name like '%A%';
String functions
Upper() Converts the given string to upper case.
Lower() Converts the given string to lower case.
‘||’ is a concatenation operator to join strings.
Set Operations:
The SQL operations union, intersect and except correspond to the
relational algebra operations union ∪, intersection ∩ and set difference -.
The set operators automatically eliminate duplicates. If we want to retain
duplicates, we have to use union all, intersect all and except all.
Consider two queries:
Select customer_name from depositor;
and
Select customer_name from borrower;
Except operation:
To find all customers who have an account but no loan at the bank.
(Select distinct customer_name from depositor)
Except
(Select distinct customer_name from borrower)
To retain duplicates we write:
(Select customer_name from depositor)
Except all
(Select customer_name from borrower)
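The set operations can be compared side by side with a small sketch in Python using sqlite3. The rows are hypothetical. Note two SQLite-specific assumptions: it does not accept parentheses around the operands of a set operation, and it supports union all but not intersect all or except all.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table depositor (customer_name text, account_no text)")
conn.execute("create table borrower (customer_name text, loan_no text)")
conn.executemany(
    "insert into depositor values (?, ?)",
    [("Ramesh", "A-101"), ("Ramesh", "A-102"), ("Raju", "A-201"), ("Ravi", "A-301")],
)
conn.executemany(
    "insert into borrower values (?, ?)",
    [("Raju", "L-230"), ("Suresh", "L-400")],
)


def names(sql):
    """Run a one-column query and return its values as a sorted list."""
    return sorted(name for (name,) in conn.execute(sql))


union_names = names(
    "select customer_name from depositor union select customer_name from borrower"
)
intersect_names = names(
    "select customer_name from depositor intersect select customer_name from borrower"
)
except_names = names(
    "select customer_name from depositor except select customer_name from borrower"
)
# union all keeps duplicates: 4 depositor rows + 2 borrower rows.
union_all_count = conn.execute(
    "select count(*) from (select customer_name from depositor "
    "union all select customer_name from borrower)"
).fetchone()[0]

print(union_names, intersect_names, except_names, union_all_count)
```

Ramesh holds two accounts but appears only once in the except result, because the plain set operators eliminate duplicates.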
Aggregate functions:
Aggregate functions take a collection of values as input and return a
single value. SQL offers 5 built in aggregate functions.
Average : avg()
Minimum : min()
Maximum : max()
Total : sum()
Count : count()
Sum and avg operate only on numbers, but min, max and count can operate
on numeric as well as non-numeric data.
Example
Consider employee table with (emp_name, dept_id, salary)
1. To find total no of employees
Select count(*) from employee;
Or
Select count(emp_name) from employee;
2. To find no of employees in each dept
Select dept_id,count(dept_id)
from employee
group by dept_id;
3. To find average and total salary given to all employees
Select avg(salary),sum(salary)
From employee;
4. To find minimum & maximum salary given to all employees
Select min(salary),max(salary)
From employee;
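The group by clause places tuples with the same value on the grouping
attributes into one group; the aggregate is computed once per group.
The where clause is used to compare attributes of the relation; it is
applied to individual tuples before the groups are formed.
The having clause is used to compare aggregate function result values; it
is applied to each group after the groups are formed.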
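The order in which where, group by and having are applied can be seen in a short sketch in Python with sqlite3. The employee rows and the two threshold values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table employee (emp_name text, dept_id integer, salary integer)")
conn.executemany(
    "insert into employee values (?, ?, ?)",
    [
        ("Asha", 10, 30000),
        ("Bala", 10, 50000),
        ("Chitra", 20, 20000),
        ("Dinesh", 20, 24000),
        ("Esha", 30, 45000),
    ],
)

per_dept = conn.execute(
    """
    select dept_id, count(*), avg(salary)
    from employee
    where salary > 21000          -- tuple filter, applied before grouping
    group by dept_id
    having avg(salary) > 25000    -- group filter, applied after grouping
    order by dept_id
    """
).fetchall()
print(per_dept)
```

Chitra is removed by the where clause, which leaves department 20 with the single salary 24000; that group is then removed by the having clause.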
Null values:
A null value indicates that a value is unknown or does not exist. We can
use the keyword null to test for null values.
Ex: select loan_no
From loan
Where amount is null;
We can also use “is not null” to test for the absence of a null value.
The result of an arithmetic expression is null if any of its operands is
null. “is null” and “is not null” check for the presence and absence of a
null value in an expression. Any comparison (>, >=, <, <=, =, <>) that
involves a null value results in the truth value unknown, written as null
in the table below. Since predicates in the where clause can also be
combined with the logical operators and, or and not, these operators are
extended to handle null:
x        x AND null    x OR null
True     Null          True
False    False         Null
Null     Null          Null
NOT: Not of null is null.
A Boolean type data can take the values true, false and unknown (null) as
per SQL:1999.
All aggregate functions except count(*) ignore null values.
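The rules for null can be checked with a small sketch in Python using sqlite3. It relies on the way SQLite reports truth values (1 for true, 0 for false, NULL for unknown, which Python shows as None); the loan rows are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")


def evaluate(expr):
    """Evaluate a constant SQL expression and return its single value."""
    return conn.execute("select " + expr).fetchone()[0]


# SQLite shows true as 1, false as 0 and unknown as NULL (None in Python).
truth = {
    expr: evaluate(expr)
    for expr in [
        "1 and null", "0 and null", "null and null",
        "1 or null", "0 or null", "null or null",
        "not null", "5 + null", "5 > null",
    ]
}

conn.execute("create table loan (loan_no text, amount integer)")
conn.executemany(
    "insert into loan values (?, ?)",
    [("L-170", 3000), ("L-230", None), ("L-260", 5000)],
)
# count(*) counts every tuple; count(amount) and avg(amount) skip the null.
counts = conn.execute("select count(*), count(amount), avg(amount) from loan").fetchone()
null_loans = [n for (n,) in conn.execute("select loan_no from loan where amount is null")]

print(truth)
print(counts, null_loans)
```

The average is computed over the two known amounts only, so it is 4000 rather than 8000/3.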
Nested Subqueries
A subquery is a select-from-where expression nested within another
query. Subqueries are useful to perform tests for set membership, make set
comparisons and determine set cardinality.
1) Set membership: the ‘in’ connective tests for set membership, where the
set is a collection of values produced by a select clause. The ‘not in’
connective tests for the absence of set membership.
Ex: 1) To find all customers who are borrowers of the bank and also have
accounts.
Select distinct customer_name
From borrower
Where customer_name in (select customer_name from depositor);
Ex: 2) To find customers who have a loan at the bank but do not have an
account.
Select distinct customer_name
from borrower
Where customer_name not in (Select customer_name from depositor);
2) Set comparison:
To compare a value with a set we use two keywords, “some” and “all”. SQL
allows the comparisons >some, >=some, =some, <>some, <some and <=some.
Ex: >some means “greater than at least one”
The keyword “any” is synonymous with “some” in SQL.
=some is identical to “in”, but <>some is not the same as “not in”.
To find the names of all departments that pay some employee a salary
greater than the salary of at least one employee of the “physics”
department, consider the relation emp_sal(emp_name, dept_name, salary):
Select distinct dept_name
from emp_sal
where salary >some (select salary
from emp_sal
where dept_name='physics');
Similar to some we have the keyword “all”: <all, <=all, >=all, >all, =all
and <>all. <>all is identical to “not in”.
Ex: Find the dept that has the highest average salary.
Aggregate functions in SQL cannot be composed like max(avg(salary)).
Instead, we write a subquery that provides avg(salary) for each dept and
select the dept whose average is greater than or equal to all of them.
Select dept_name
From emp_sal
group by dept_name
having avg(salary) >=all (select avg(salary)
from emp_sal
group by dept_name);
3) Test for empty relations:
Using SQL we can test whether a subquery has any tuples in its result.
The “exists” construct returns true if the argument subquery is non-empty.
Find all the customers who have both an account and a loan at the bank.
Select customer_name
From borrower
Where exists (select *
from depositor
where depositor.customer_name=borrower.customer_name);
We can also test for the non-existence of tuples in a subquery by using the
“not exists” construct.
Find all customers who have an account at all branches located in CTA:
Select distinct S.customer_name
From depositor as S
Where not exists ((select branch_name
from branch
where branch_city='CTA')
except
(select A.branch_name
from depositor as D, account as A
where D.account_no=A.account_no
AND S.customer_name=D.customer_name));
The first subquery lists all branches in CTA and the second lists the
branches at which customer S has an account. If their difference is empty,
the customer has an account at every branch in CTA.
4) Test for the absence of duplicate tuples.
The unique construct returns the value true if the argument subquery
contains no duplicate tuples.
To find all customers who have at most one account at the ‘CTA’ branch.
Select T.customer_name
from depositor as T
where unique (select R.customer_name
from account, depositor as R
where T.customer_name=R.customer_name and
R.account_no=account.account_no and
account.branch_name='CTA');
If we use “not unique” in the above query instead of “unique” we can
find all customers who have at least two accounts at the “CTA” branch. It
is possible for unique to be true even if there are multiple copies of a
tuple, as long as at least one of the attributes of the tuple is null.
Complex Queries:
Simple queries in SQL consist of a single select-from-where statement,
possibly with group by and having clauses. We can compose complex queries
using 1. Derived relations and 2. with clause.
1. Derived relations:
A subquery expression can be used in the from clause of a select
statement. If we use such an expression, the result of the subquery must be
given a name; we can also rename the attributes using the ‘as’ clause.
Consider subquery to select branch names with average balance
(select branch_name,avg(balance)
from account
group by branch_name)
as branch_avg(branch_name,avg_bal);
To select the branch name and average balance of branches whose average
balance is more than 1200.
Select branch_name, avg_bal
from (select branch_name, avg(balance)
from account
group by branch_name)
as branch_avg(branch_name, avg_bal)
where avg_bal > 1200;
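2. With clause:
The with clause defines a temporary view whose definition is available
only to the query in which the with clause occurs. Queries such as the
above can be written without the “with” clause, but they would be more
complicated and harder to understand. Using the with clause makes the
query logic clearer. It also permits a view definition to be used in
multiple places within a query.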
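The two ways of composing the branch-average query can be compared in a minimal sketch in Python with sqlite3. The account rows are hypothetical. One assumption about SQLite: it does not accept a column list after the name of a derived relation, so the derived-relation version renames the average inside the subquery instead.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table account (account_no text, branch_name text, balance integer)")
conn.executemany(
    "insert into account values (?, ?, ?)",
    [
        ("A-101", "Shimoga", 1000),
        ("A-102", "Shimoga", 2000),
        ("A-201", "CTA", 800),
        ("A-202", "CTA", 1000),
        ("A-301", "Dvg", 3000),
    ],
)

# 1. Derived relation: the subquery in the from clause is named branch_avg.
derived = conn.execute(
    """
    select branch_name, avg_bal
    from (select branch_name, avg(balance) as avg_bal
          from account
          group by branch_name) as branch_avg
    where avg_bal > 1200
    order by branch_name
    """
).fetchall()

# 2. With clause: the same temporary relation, defined before the query.
with_clause = conn.execute(
    """
    with branch_avg (branch_name, avg_bal) as
        (select branch_name, avg(balance)
         from account
         group by branch_name)
    select branch_name, avg_bal
    from branch_avg
    where avg_bal > 1200
    order by branch_name
    """
).fetchall()

print(derived)
print(with_clause)
```

Both forms return the same tuples; the with clause only moves the definition of branch_avg ahead of the query that uses it.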
Views:
Views allow us to create a personalized collection of relations that
is better matched to a certain user’s intuition. Using views we can also
make some attributes accessible to certain users only and restrict access
to other data. Any relation that is not part of the logical model, but is
made visible to a user as a virtual relation, is called a “view”.
Modification of database
Deletion, updates and insertion are the modification operations on a
database.
1. Deletion:
Using a delete expression we can delete whole tuples, but we cannot
delete values of only particular attributes.
Syntax:
delete from r where p;
Where ‘p’ is a predicate and ‘r’ represents a relation. The delete
statement first finds all tuples t in ‘r’ for which p(t) is true and then
deletes them. If the where clause is omitted, all tuples in ‘r’ are deleted.
Delete from loan;
This command deletes all tuples from the loan relation. The delete
command operates on only one relation. If we want to delete tuples from
several relations we must use one delete command for each relation.
The predicate in the where clause may be as complex as the predicate
in the where clause of a select statement, i.e., comparison operators,
logical operators, set operators and subqueries can all be used.
To delete from the employee table the employee whose emp_id is 108
Delete from employee
Where emp_id=108;
To delete from the employee table the employees whose name starts with ‘p’
Delete from employee
Where first_name like 'p%';
To delete the employees whose salary is less than the average salary
Delete from employee
Where salary < (select avg(salary) from employee);
The average is computed once, before any tuple is deleted, so the result
does not depend on the order in which the tuples are processed.
Although we can delete tuples from only one relation, we can reference
any number of relations in the where clause using a nested subquery
(select-from-where).
2. Insertion:
The simple insert statement is a request to insert one tuple.
Ex: insert into account values (1031,'shimoga',1500);
This inserts one tuple into the account table with account_no=1031,
branch_name=shimoga and balance=1500.
If we want to give the values in a different order from the order of the
attributes in the table, we list the attributes explicitly:
Insert into account(branch_name,balance,account_no)
values('shimoga',1500,1031);
We can also insert tuples into a relation using the result of a query.
Consider a table with employee id, salary and dept
Ex: emp_dept(emp_id,salary,dept_id)
insert into emp_dept
Select emp_id,salary,dept_id
From employee
Where salary>25000;
The select statement of an insert must be evaluated completely before
any tuple is inserted. If insertions were carried out while the select was
still being evaluated, a request such as the following might insert an
unbounded number of tuples.
Ex: insert into account
Select * from account;
Update of a view:
A modification expressed on a view has to be translated into a
modification of the actual relations in the logical model of the database
in order to be executed. This may cause serious problems.
1. Consider the loan relation loan(loan_no,branch_name,amount). A view is
defined to select loan_no and branch_name.
Create view loan_branch as
Select loan_no, branch_name
From loan;
Suppose we try to insert into the loan_branch view:
Insert into loan_branch values ('101','CTA');
This insert statement has to insert the above values into the table
‘loan’, which has 3 attributes, but the statement supplies no value for
the ‘amount’ attribute. There are two possible ways to handle an insertion
through this view:
The DBMS may reject the operation.
OR
The DBMS may insert the tuple into the table with a null value for
‘amount’.
2. Consider one more relation. Borrower(cust_name, loan_no) a view selects
data from both tables.
Create view loan_info as
Transactions:
A transaction consists of a sequence of query and/or update statements. A
transaction begins implicitly when an SQL statement is executed. A
transaction must end with one of the following statements: commit or
rollback.
1. commit work: commits the current transaction, i.e., it makes the updates
performed by the transaction permanent in the database. After the
transaction is committed a new transaction is automatically started.
2. rollback work: causes the current transaction to be rolled back. It
undoes all the updates performed by the SQL statements in the transaction.
The keyword ‘work’ is optional in both statements.
Some operations must be atomic, i.e., two or more SQL statements need to
be executed as a single transaction, such as a transfer of funds from one
account to another. The SQL:1999 standard provides for this (although only
some SQL implementations support it): multiple SQL statements can be
enclosed between the keywords
Begin atomic … End
All the statements between the keywords then form a single
transaction.
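The effect of commit and rollback can be sketched in Python with sqlite3. This is an illustration under assumptions: the account table, the check constraint and the helper names transfer and balances are hypothetical, and the code relies on Python's default sqlite3 behaviour of opening a transaction implicitly before the first update statement.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table account (account_no text primary key, "
    "balance integer check (balance >= 0))"
)
conn.executemany("insert into account values (?, ?)", [("A", 1000), ("B", 2000)])
conn.commit()


def transfer(conn, source, target, amount):
    """Credit target and debit source as one transaction: both or neither."""
    try:
        conn.execute(
            "update account set balance = balance + ? where account_no = ?",
            (amount, target),
        )
        conn.execute(
            "update account set balance = balance - ? where account_no = ?",
            (amount, source),
        )
        conn.commit()      # make both updates permanent
        return True
    except sqlite3.IntegrityError:
        conn.rollback()    # undo the credit that was already applied
        return False


def balances(conn):
    return dict(conn.execute("select account_no, balance from account"))


ok = transfer(conn, "A", "B", 50)
after_commit = balances(conn)
failed = transfer(conn, "A", "B", 5000)  # debit would make A negative
after_rollback = balances(conn)
print(ok, after_commit, failed, after_rollback)
```

In the second call the credit to B succeeds and the debit from A violates the check constraint; the rollback removes the credit as well, so the balances are unchanged.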
Joined Relations:
Apart from joining tuples by Cartesian product, SQL provides other
mechanisms to join relations, such as condition joins, natural joins and
various forms of outer joins.
These join operations are typically used as subquery expressions in
the from clause.
Inner join:
Consider two relations
Loan(loan_no, branch_name, amount)
borrower(customer_name, loan_no)
To use inner join we can write the following in the from clause of the
select statements
Select………
from loan inner join borrower on
loan.loan_no = borrower.loan_no;
The expression computes the join of the loan and borrower relations. The
attributes of the result consist of the attributes of the left hand side
relation followed by the attributes of the right hand side relation.
Using the ‘as’ clause we can rename the attributes of the result.
Ex: loan inner join borrower
on loan.loan_no=borrower.loan_no
as Lb (loan_no, branch, amount, cust, cust_loan_no);
An inner join keeps only those tuples of each relation that have a
matching tuple in the other relation.
Natural join:
A natural join joins two relations without specifying a join condition,
but it requires that the attributes to be joined have the same name in
both relations.
Ex: loan_no in loan & loan_no in borrower.
Select …….
from loan natural inner join borrower;
This expression computes the natural join of the two relations using
the only common attribute name, loan_no.
A natural join is similar to an inner join, but its result contains the
loan_no attribute only once.
The attributes will be (loan_no, branch_name, amount, customer_name),
whereas an inner join with a join condition will produce loan_no twice:
(loan_no, branch_name, amount, customer_name, loan_no)
Loan                               Borrower
Loan_no  Branch_name  Amount       Cust_name  Loan_no
L-170    Dvg          3000         Ramesh     L-170
L-230    Cta          4000         Raju       L-230
L-260    Smg          2530         Ravi       L-155
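The difference between the two joins can be checked on the sample loan and borrower tuples with a short sketch in Python using sqlite3. The helper name run is hypothetical, and the borrower column is called customer_name as in the schema given earlier.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("create table loan (loan_no text, branch_name text, amount integer)")
conn.execute("create table borrower (customer_name text, loan_no text)")
conn.executemany(
    "insert into loan values (?, ?, ?)",
    [("L-170", "Dvg", 3000), ("L-230", "Cta", 4000), ("L-260", "Smg", 2530)],
)
conn.executemany(
    "insert into borrower values (?, ?)",
    [("Ramesh", "L-170"), ("Raju", "L-230"), ("Ravi", "L-155")],
)


def run(sql):
    """Return the result column names and the rows of a query."""
    cur = conn.execute(sql)
    columns = [d[0] for d in cur.description]
    return columns, cur.fetchall()


inner_cols, inner_rows = run(
    "select * from loan inner join borrower "
    "on loan.loan_no = borrower.loan_no order by loan.loan_no"
)
natural_cols, natural_rows = run(
    "select * from loan natural join borrower order by loan.loan_no"
)
print(inner_cols, inner_rows)
print(natural_cols, natural_rows)
```

Loan L-260 and borrower Ravi (loan L-155) have no partner in the other relation, so neither appears in either result.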
Integrity Constraints:
Integrity constraints ensure that changes made to the database by
authorized users do not result in a loss of data consistency. Integrity
constraints guard against accidental damage to the database.
Examples of integrity constraints:
1. An account balance cannot be null.
2. No two accounts can have the same account no.
3. Every account no in the depositor relation must have a matching
account_no in the account relation.
4. The hourly salary of a bank employee must be at least 6.00 per hour.
Constraints on a single relation
1)NOT NULL, 2)unique, 3)check(predicate)
Referential integrity is a constraint that forces a value in one
relation to depend on a value in another relation.
1. NOTNULL constraints:
The ‘NULL’ value is a member of all domains and is by default a legal
value for any attribute of a relation. But for some attributes it is
inappropriate to insert a null value.
Ex: 1)account_no of an account cannot be null
2)Balance in an account cannot be null.
The NOT NULL constraint can be specified in the create table statement
as follows.
Create table account
(Account_no char(10) not null,
Balance number(12,2) not null);
The NOT NULL specification prohibits the insertion of a null value for
the attribute. A primary key of a relation cannot be ‘NULL’, i.e., no
primary key attribute can contain a null value.
2. Unique constraints:
Unique (Aj1,Aj2,………,Ajm)
The unique specification says that the attributes Aj1,…….,Ajm form a
candidate key. No two tuples in the relation can be equal on all the
listed attributes. Candidate key attributes can have null values, unless
they are declared NOT NULL.
3. Check clause:
The check clause in SQL can be applied to relation declarations as well
as to domain declarations. In a relation declaration the clause check(p)
specifies a predicate ‘p’ that must be satisfied by every tuple in the
relation. A common use of the check clause is to ensure that attribute
values satisfy a specified condition.
Ex:1. check(salary>=1000)
Implies that the salary of an employee must be at least 1000.
2. To simulate an enumerated type
Create table student
(name char(15) not null,
combination char(5) check(combination in ('pcm','pmcs','cmcs','cbz')));
Check clause applied to a domain declaration.
Create domain hourly_wage number(5)
constraint wage_test check(value>=6);
Any attribute declared with the domain hourly_wage must have a value of at
least 6.
4. Referential integrity:
Referential integrity ensures that a value that appears in one relation
for a given set of attributes also appears for a certain set of attributes
in another relation.
A “foreign key” is one form of referential integrity constraint.
A branch relation contains branch information. An ‘account’ created in
the account relation contains a branch_name that should be listed in the
branch relation. This can be enforced using a foreign key.
Ex: the definition of account table
Create table account
(…………..
Foreign key(branch_name) references branch);
In general, let r1(R1) & r2(R2) be relations with primary keys K1 & K2
respectively. We say that a subset α of R2 is a foreign key referencing K1
in relation r1 if it is required that for every tuple t2 in r2 there must
be a tuple t1 in r1 such that t1[K1]=t2[α].
Requirements of this form are called referential integrity
constraints, or subset dependencies. By default in SQL a foreign key
references the primary key attributes of the referenced table.
Short form of defining foreign key:
Create table ………
(…………………….
Branch_name char (15) references branch ,
……………………);
When a referential integrity constraint is violated, the normal
procedure is to reject the action that caused the violation. However, a
foreign key clause can specify that if a delete or update on the
referenced relation violates the constraint, then instead of rejecting the
action the system takes steps to change the tuples in the referencing
relation to restore the constraint.
Ex: create table account
(……………
Foreign key (branch_name ) references branch
on delete cascade
on update cascade,
……………);
If any tuple in the referenced branch relation is deleted, the
corresponding rows in the account relation are deleted automatically.
Similarly, if the branch name of any tuple in the referenced branch table
is updated, the corresponding branch name in the referencing account table
is also updated.
We can also use ‘set null’ instead of cascade, which sets the
referencing value to null. We can also set the value to the default value
of that domain by ‘set default’.
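The constraints of this section can be exercised together in a minimal sketch in Python with sqlite3. The table contents and the helper name rejected are hypothetical. One SQLite-specific assumption: foreign keys are enforced only after the statement pragma foreign_keys = on has been issued.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("pragma foreign_keys = on")  # SQLite checks foreign keys only when enabled
conn.execute("create table branch (branch_name text primary key)")
conn.execute(
    """
    create table account (
        account_no  text primary key,
        branch_name text not null
                    references branch on delete cascade on update cascade,
        balance     integer not null check (balance >= 0)
    )
    """
)
conn.execute("insert into branch values ('CTA')")
conn.execute("insert into branch values ('Shimoga')")
conn.execute("insert into account values ('A-101', 'CTA', 500)")
conn.execute("insert into account values ('A-102', 'Shimoga', 700)")


def rejected(sql):
    """Return True if the statement violates an integrity constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


null_rejected = rejected("insert into account values ('A-103', 'CTA', null)")
check_rejected = rejected("insert into account values ('A-104', 'CTA', -1)")
fk_rejected = rejected("insert into account values ('A-105', 'Mysore', 100)")

# Cascading actions: the rename is propagated, the delete removes A-102.
conn.execute("update branch set branch_name = 'Chitradurga' where branch_name = 'CTA'")
conn.execute("delete from branch where branch_name = 'Shimoga'")
remaining = conn.execute("select account_no, branch_name from account").fetchall()
print(null_rejected, check_rejected, fk_rejected, remaining)
```

The three rejected inserts violate not null, check and the foreign key respectively; the last two statements show on update cascade and on delete cascade at work.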
Functional dependency
In a relation R, attribute ‘α’ is functionally dependent on attribute
β of R (written β->α) if and only if each β value in R has associated with
it precisely one ‘α’ value in R (at any one time).
A functional dependence is a special form of integrity constraint.
Ex: consider loan(loan_no, branch_name, Amount)
The relation loan satisfies: Loan.Loan_no->loan.Amount.
We mean that every legal extension of that relation satisfies that
constraint. Recognizing functional dependencies is an essential part of
understanding the meaning or semantics of the data.
Partial dependency:
A functional dependency α->β is called a partial dependency if there
is a proper subset γ of α such that γ->β. We say that β is partially
dependent on α.
Functional dependency:
Consider a relation schema R and let α ⊆ R and β ⊆ R. The functional
dependency α->β holds on schema R if, in any legal relation r(R), for all
pairs of tuples t1 and t2 in r such that t1[α]=t2[α],
it is also the case that t1[β]=t2[β].
Given a relation R, attribute Y of R is functionally dependent on
attribute X of R if and only if each X value in R has associated with it
precisely one Y value in R at any instance.
A key is a set of attributes that uniquely identifies an entire tuple.
A functional dependency allows us to express constraints that uniquely
identify the values of certain attributes.
Functional dependencies allow us to express constraints that we cannot
express with super keys.
Ex: bor_loan(customer_id, loan_no, branch_name, amount)
The candidate key is the combination of customer_id and loan_no. But the
functional dependency loan_no -> amount, branch_name also holds, although
loan_no alone is not a super key.
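The definition can be turned into a direct check on one relation instance. The following is a minimal sketch in Python; the function name satisfies and the bor_loan tuples are hypothetical. It tests a single instance only: a dependency holds on the schema only if every legal instance passes.

```python
def satisfies(relation, alpha, beta):
    """Return True if the relation (a list of dicts) satisfies alpha -> beta.

    Two tuples that agree on all attributes in alpha must also agree
    on all attributes in beta.
    """
    seen = {}
    for t in relation:
        key = tuple(t[a] for a in alpha)
        value = tuple(t[b] for b in beta)
        # Remember the first beta value seen for this alpha value.
        if seen.setdefault(key, value) != value:
            return False
    return True


# Hypothetical instance of bor_loan: loan L-170 is shared by two customers.
bor_loan = [
    {"customer_id": "C1", "loan_no": "L-170", "branch_name": "Dvg", "amount": 3000},
    {"customer_id": "C2", "loan_no": "L-170", "branch_name": "Dvg", "amount": 3000},
    {"customer_id": "C2", "loan_no": "L-230", "branch_name": "Cta", "amount": 4000},
]

fd_holds = satisfies(bor_loan, ["loan_no"], ["amount", "branch_name"])
loan_no_is_key = satisfies(bor_loan, ["loan_no"], ["customer_id"])
pair_is_key = satisfies(
    bor_loan, ["customer_id", "loan_no"], ["branch_name", "amount"]
)
print(fd_holds, loan_no_is_key, pair_is_key)
```

loan_no determines amount and branch_name in this instance, but it does not determine customer_id, which is why loan_no alone cannot serve as a key.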
Uses of functional dependencies:
1) To test relations to see whether they are legal under a given set of
functional dependencies.
If a relation r is legal under a set F of functional dependencies we
say that r satisfies ‘F’.
2) To specify constraints on the set of legal relations. We shall thus
concern ourselves with only those relations that satisfy a given set
of functional dependencies.
If we wish to constrain ourselves to relations on schema R that satisfy
a set F of functional dependencies we say that F holds on R.
Some functional dependencies are said to be ‘trivial’ because they are
satisfied by all relations.
EX: A->A
A functional dependency α->β is ‘trivial’ if β ⊆ α.
Closure of F (F+):
F+ is the closure of the set F, i.e., the set of all functional
dependencies that can be inferred from F. F+ is a superset of F.
[Figure: E-R diagram with entity sets customer, branch and employee,
relationship sets works_in and cust_bank, and attributes id, name, city,
address, type and emp_id]
To ensure our requirement we change it as follows.
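[Figure: revised E-R diagram with entity sets customer, branch and
employee, a single relationship set (labelled cust_bank / er_branch), and
attributes id, name, city, address, type and emp_id]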
This allows more than one personal banker for every customer. But this
schema is not in BCNF, because emp_id determines branch_name and emp_id is
not a super key.
But by using two relations
(customer_id, employee_id, type)
(employee_id, branch_name)
we can achieve BCNF. This is exactly the same as the first E-R diagram,
which uses the works_in relationship. We can express the constraint that a
customer may have at most one personal banker at a given branch by the
functional dependency: customer_id, branch_name->employee_id
In our BCNF design there is no schema that includes all the attributes
appearing in this functional dependency. Our design is not dependency
preserving.
not only for testing whether α is a super key but also for several other
tasks.
Let α be a set of attributes. We call the set of all attributes
functionally determined by α under a set F of functional dependencies the
closure of α under F. We denote it by α+.
Algorithm for computing closure of α under F
result:=α;
while (changes to result) do
    for each functional dependency β->γ in F do
    begin
        if β ⊆ result then result:=result ∪ γ;
    end
Algorithm working:
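Compute (AG)+ with the functional dependencies {A->B,A->C,CG->H,CG->I,
B->H}. We start with result=AG.
A->B causes us to include B in result. To see this fact we observe
that A->B is in F and A ⊆ result (which is AG), so result:=result ∪ B,
giving ABG.
A->C causes result to become ABCG.
CG->H causes result to become ABCGH.
CG->I causes result to become ABCGHI.
B->H adds nothing, because H is already in result. A second pass over F
makes no further change, so (AG)+ = ABCGHI.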
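The closure algorithm and the worked example can be reproduced with a short sketch in Python. The function name closure and the encoding of a dependency as a pair of attribute strings are choices made for this illustration; the schema R = (A, B, C, G, H, I) is assumed from the attributes that appear in F.

```python
def closure(attrs, fds):
    """Return the closure of attrs under fds.

    fds is a list of (lhs, rhs) pairs; each side is an iterable of
    single-character attribute names, e.g. ("CG", "H") for CG -> H.
    """
    result = set(attrs)
    changed = True
    while changed:                      # repeat until a full pass adds nothing
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


F = [("A", "B"), ("A", "C"), ("CG", "H"), ("CG", "I"), ("B", "H")]

ag_closure = closure("AG", F)
a_closure = closure("A", F)

R = set("ABCGHI")                       # assumed schema
ag_is_superkey = ag_closure == R        # AG determines every attribute
a_is_superkey = a_closure == R          # A alone does not reach G or I

print("".join(sorted(ag_closure)), "".join(sorted(a_closure)))
```

The same function answers the superkey test: a set of attributes is a super key exactly when its closure contains every attribute of R.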
Uses of attribute closure algorithm
To test if α is a super key, we compute α+ and check if α+ contains
all attributes of R.
We can check if a functional dependency α->β holds by checking whether
β ⊆ α+. That is, we compute α+ by using attribute closure and then check
if it contains β.
It gives us an alternative way to compute F+: for each γ ⊆ R, we find
the closure γ+, and for each S ⊆ γ+, we output a functional dependency
γ->S.
Lossless decomposition
Let R be a relation schema and F be a set of functional dependencies on R.
Let R1 and R2 form a decomposition of R.
Let r(R) be a relation with schema R.
We say that the decomposition is a lossless decomposition if for all legal
database instances
ΠR1(r)⋈ΠR2(r)=r.
In other words, if we project r onto R1 and R2 and compute the natural
join of the projection results, we get back exactly r.
R1 and R2 form a lossless decomposition of R if at least one of the
following functional dependencies is in F+:
R1∩R2->R1
R1∩R2->R2
That is, if R1∩R2 forms a super key of either R1 or R2, the decomposition
of R is a lossless decomposition. We can use attribute closure to test
efficiently for super keys.
A decomposition that is not a lossless decomposition is called a lossy
decomposition (lossy-join decomposition). A lossless decomposition is also
called a lossless join decomposition.
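The test R1∩R2->R1 or R1∩R2->R2 can be carried out with attribute closure. The sketch below is in Python; the function names and the two decompositions of bor_loan are hypothetical examples chosen for this illustration.

```python
def closure(attrs, fds):
    """Closure of a set of attributes under a list of (lhs, rhs) dependencies."""
    result = set(attrs)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if set(lhs) <= result and not set(rhs) <= result:
                result |= set(rhs)
                changed = True
    return result


def is_lossless(r1, r2, fds):
    """Apply the rule: the common attributes must determine all of R1 or all of R2."""
    common_closure = closure(set(r1) & set(r2), fds)
    return set(r1) <= common_closure or set(r2) <= common_closure


# bor_loan(customer_id, loan_no, branch_name, amount)
# with loan_no -> amount, branch_name
F = [(("loan_no",), ("amount", "branch_name"))]

good = is_lossless(
    ("loan_no", "branch_name", "amount"),   # R1
    ("customer_id", "loan_no"),             # R2, shares loan_no with R1
    F,
)
bad = is_lossless(
    ("customer_id", "branch_name"),         # R1
    ("loan_no", "branch_name", "amount"),   # R2, shares only branch_name
    F,
)
print(good, bad)
```

In the first decomposition the common attribute loan_no is a super key of R1, so the rule applies. In the second, branch_name determines nothing else, so the rule gives no guarantee and joining the projections can produce extra tuples.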
Dependency Preservation
Let F be a set of functional dependencies on a schema R, and let
R1,R2,…,Rn be a decomposition of R. The restriction of F to Ri is the set
Fi of all functional dependencies in F+ that include only attributes of Ri.
The restrictions F1,F2,…,Fn are the dependencies that can be checked
efficiently, because each can be tested on a single relation.
Let F’=F1∪F2∪……∪Fn
F’ is a set of functional dependencies on schema R, but in general F’≠F.
However, even if F’≠F, it may be that F’+=F+.
We say that a decomposition having the property F’+=F+ is a dependency
preserving decomposition.
Decomposition to 3NF:
The algorithm for finding a dependency-preserving lossless decomposition
into 3NF uses a set of dependencies Fc, a canonical cover of F.
Algorithm
Let Fc be a canonical cover for F;
i:=0;
for each functional dependency α->β in Fc do
    if none of the schemas Rj, j=1,2,…,i contains αβ
    then begin
        i:=i+1;
        Ri:=αβ;
    end
if none of the schemas Rj, j=1,2,…,i contains a candidate key for R
then begin
    i:=i+1;
    Ri:=any candidate key for R;
end
return (R1,R2,…,Ri)
This algorithm is also called the 3NF synthesis algorithm, since it
takes a set of dependencies and adds one schema at a time, instead of
decomposing the initial schema repeatedly.
R1=(customer_id, customer_name)
R2=(loan_no, cust_id, cust_street, cust_city)
If a customer has more than one address, we can no longer enforce the
functional dependency from the customer to the address. Both relations are
in BCNF, yet functional dependencies alone cannot deal with this problem.
To deal with it we must define a new form of constraint called a
“multivalued dependency”.
Multivalued dependency:
Consider the functional dependency A->B: we cannot have two tuples with
the same A value but different B values. A multivalued dependency A->>B
does not rule out such tuples; instead, it requires that other tuples of a
certain form be present in the relation.
Functional dependencies are referred to as equality generating
dependencies, and multivalued dependencies are referred to as tuple
generating dependencies.
Let R be a relation schema and let α ⊆ R and β ⊆ R. The multivalued
dependency α->>β holds on R if, in any legal relation r(R), for all pairs
of tuples t1 & t2 in r such that t1[α]=t2[α], there exist tuples t3 & t4 in
r such that
t1[α]=t2[α]=t3[α]=t4[α]
t3[β]=t1[β]
t3[R-β]=t2[R-β]
t4[β]=t2[β]
t4[R-β]=t1[R-β]
      α          β            R-α-β
t1    a1…ai      ai+1…aj      aj+1……an
t2    a1…ai      bi+1…bj      bj+1……bn
t3    a1…ai      ai+1…aj      bj+1……bn
t4    a1…ai      bi+1…bj      aj+1……an
The above table gives a tabular picture of t1, t2, t3 & t4. The
multivalued dependency α->>β says that the relationship between α and β is
independent of the relationship between α and R-β.
If the multivalued dependency α->>β is satisfied by all relations on
schema R, then α->>β is a trivial multivalued dependency on schema R. Thus
α->>β is trivial if β ⊆ α or β ∪ α = R.
5. Transaction Management.
Concept:
A collection of operations that forms a single logical unit of work is
called a transaction.
A transaction is a unit of program execution that accesses and
possibly updates various data items.
A transaction is initiated by a user program written in a high-level
data manipulation language or programming language such as SQL, C++ or
Java, in which the transaction is delimited by begin transaction and end
transaction statements.
Transactions access data using two operations:
Read(x): transfers data item x from the database to a local buffer belonging
to the transaction.
Write(x): transfers the data item x from the local buffer of the transaction
to the database.
ACID Properties of transaction to ensure integrity of data:
1. Atomicity: suppose a transaction Ti has to transfer an amount of $50
from account A, which has $1000, to account B, which has $2000. This
operation is a single transaction from the customer's view, but it includes
two operations from the database view: 1. Write(A) to debit $50 from
account A, and 2. Write(B) to credit $50 to account B. Suppose the system
fails soon after write(A) is executed but before write(B); then account A
will have $950 and B will have $2000. Thus the database state is
inconsistent.
Atomicity ensures that either all operations of the transaction are
executed or none of them is.
A transaction management component ensures atomicity; the component
keeps track of the old values of any data on the disk on which a transaction
performs a write, and if the transaction does not complete its execution the
database system restores the old values.
2. Consistency: execution of a transaction in isolation (that is, with no
other transaction executing concurrently) preserves the consistency of the
database.
Ensuring consistency for an individual transaction is the
responsibility of the application programmer who codes the transaction.
3. Isolation: even though multiple transactions may execute
concurrently, the system guarantees that for every pair of transactions Ti &
Tj, it appears to Ti that either Tj finished execution before Ti started or
Tj started execution after Ti finished. Thus each transaction is unaware of
the other transactions executing concurrently in the system.
The concurrency control component ensures isolation.
[Figure: state diagram of a transaction; the surviving labels are the
states Active, Failed and Aborted]
Concurrent Executions.
Allowing multiple transactions to update data concurrently causes several
complications with the consistency of data. It is far easier to insist that
transactions run serially than to ensure consistency among concurrently
executing transactions. However, there are two good reasons for allowing
concurrency.
1. Improved throughput and resource utilization: A transaction consists
of many activities: I/O activity, CPU activity, disk activity etc. The
CPU is idle while a transaction is in I/O activity; this time may
be exploited by allowing some other transaction to execute its CPU
activity. When the first transaction is executing disk activity and the
second is executing I/O activity, a third transaction may be allowed
to execute its CPU activity. As we can execute more transactions in a
given time, the throughput of the system increases. Correspondingly the
processor and disk utilization also increase.
T2: read(A);
temp:=A*0.1;
A:=A-temp;
write(A);
read(B);
B:=B+temp;
write(B);
Suppose A and B contain $1000 and $2000 respectively at the initial time.
Different possible schedules for the execution of the two transactions may
be written.
The execution sequences are called schedules; they represent the
chronological order in which instructions are executed in the system.
Schedule 1: serial schedule (T1 followed by T2)
T1                  T2
read(A);
A:=A-50;
write(A);
read(B);
B:=B+50;
write(B);
                    read(A);
                    temp:=A*0.1;
                    A:=A-temp;
                    write(A);
                    read(B);
                    B:=B+temp;
                    write(B);
Schedule 2: serial schedule (T2 followed by T1)
T1                  T2
                    read(A);
                    temp:=A*0.1;
                    A:=A-temp;
                    write(A);
                    read(B);
                    B:=B+temp;
                    write(B);
read(A);
A:=A-50;
write(A);
read(B);
B:=B+50;
write(B);
switching and execute the second transaction for some time, and so on; the
CPU time is shared among all the transactions. Various instructions from
both transactions may be interleaved; one of the possible schedules is
shown below.
Schedule 3: concurrent schedule
T1                  T2
read(A);
A:=A-50;
write(A);
                    read(A);
                    temp:=A*0.1;
                    A:=A-temp;
                    write(A);
read(B);
B:=B+50;
write(B);
                    read(B);
                    B:=B+temp;
                    write(B);
The above schedule preserves the database consistency. However, not all
concurrent executions result in a correct state of the database. Consider
the following schedule:
Schedule 4: concurrent schedule with database inconsistency
T1                  T2
read(A);
A:=A-50;
                    read(A);
                    temp:=A*0.1;
                    A:=A-temp;
                    write(A);
                    read(B);
write(A);
read(B);
B:=B+50;
write(B);
                    B:=B+temp;
                    write(B);
Before executing the above schedule account A has $1000 and account B
has $2000. After the transactions are executed account A will have $950,
and account B will have $2100. The sum A+B is now $3050 instead of $3000,
so the value of A+B is not preserved.
If control of concurrent execution is left to the operating system,
many schedules are possible, including schedules similar to the one above
that lead to an inconsistent database state. Hence the concurrency control
component of the database system ensures that concurrent transactions do
not leave the database in an inconsistent state.
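The four schedules can be replayed with a small simulation, sketched here in Python. The helper names (read, write, compute, run) are hypothetical; each transaction has a private buffer, as in the Read(x) and Write(x) operations described earlier, and the 10% of A is computed with integer division so that the arithmetic stays exact.

```python
def read(item):
    def step(db, buf):
        buf[item] = db[item]        # database -> local buffer
    return step


def write(item):
    def step(db, buf):
        db[item] = buf[item]        # local buffer -> database
    return step


def compute(update):
    def step(db, buf):
        update(buf)                 # works on the local buffer only
    return step


T1 = [read("A"), compute(lambda b: b.update(A=b["A"] - 50)), write("A"),
      read("B"), compute(lambda b: b.update(B=b["B"] + 50)), write("B")]
T2 = [read("A"), compute(lambda b: b.update(temp=b["A"] // 10)),
      compute(lambda b: b.update(A=b["A"] - b["temp"])), write("A"),
      read("B"), compute(lambda b: b.update(B=b["B"] + b["temp"])), write("B")]


def run(order, initial):
    """Execute the next step of the named transaction for each entry in order."""
    db = dict(initial)
    programs = {"T1": iter(T1), "T2": iter(T2)}
    buffers = {"T1": {}, "T2": {}}
    for txn in order:
        next(programs[txn])(db, buffers[txn])
    return db


start = {"A": 1000, "B": 2000}
s1 = run(["T1"] * 6 + ["T2"] * 7, start)
s2 = run(["T2"] * 7 + ["T1"] * 6, start)
s3 = run(["T1"] * 3 + ["T2"] * 4 + ["T1"] * 3 + ["T2"] * 3, start)
s4 = run(["T1"] * 2 + ["T2"] * 5 + ["T1"] * 4 + ["T2"] * 2, start)
for name, db in [("schedule 1", s1), ("schedule 2", s2),
                 ("schedule 3", s3), ("schedule 4", s4)]:
    print(name, db, "A+B =", db["A"] + db["B"])
```

The two serial schedules end in different states, but both keep A+B at 3000. Schedule 3 ends in the same state as schedule 1, while schedule 4 loses the update of A made by T2 and ends with A+B = 3050.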
Serializability:
A transaction is a program, so it is computationally difficult to
determine exactly what operations a transaction performs and how the
operations of various transactions interact. We therefore consider only two
operations, read and write: Read(Q) and Write(Q) are instructions on data
item Q.
[Figure: two precedence graphs, one with the edge T1 -> T2 and one with the
edge T2 -> T1]
The single edge T1 -> T2 implies that all the instructions of T1 are
executed before the first instruction of T2. Similarly, the edge T2 -> T1
implies that all the instructions of T2 are executed before the first
instruction of T1.
[Figure: precedence graph with edges T1 -> T2 and T2 -> T1]
The precedence graph contains the edge T1 -> T2 because T1 executes
read(A) before T2 executes write(A). It also contains the edge T2 -> T1
because T2 executes read(B) before T1 executes write(B).
If the precedence graph for a schedule S has a cycle, then S is not
conflict serializable. If the graph contains no cycles, then S is conflict
serializable.
[Figure: precedence graph with nodes Tj, Tk and Tm]
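The cycle test can be sketched in a few lines. The code below is an
illustration only. It assumes the usual textbook rule for conflicts (two
operations conflict when they belong to different transactions, access the
same data item, and at least one of them is a write), and the function names
and the encoding of a schedule as (transaction, operation, item) triples are
hypothetical.

```python
from itertools import combinations

def precedence_graph(schedule):
    """Return the set of edges (Ti, Tj) implied by conflicting operations.

    `schedule` lists (transaction, operation, item) in execution order.
    """
    edges = set()
    # combinations() keeps input order, so the first triple is the earlier one.
    for (ti, op_i, q_i), (tj, op_j, q_j) in combinations(schedule, 2):
        if ti != tj and q_i == q_j and "write" in (op_i, op_j):
            edges.add((ti, tj))
    return edges

def has_cycle(edges):
    """Depth-first search for a cycle in the directed graph."""
    graph = {}
    for u, v in edges:
        graph.setdefault(u, set()).add(v)
        graph.setdefault(v, set())
    visiting, done = set(), set()

    def visit(node):
        if node in visiting:
            return True            # back edge: a cycle exists
        if node in done:
            return False
        visiting.add(node)
        if any(visit(nxt) for nxt in graph[node]):
            return True
        visiting.discard(node)
        done.add(node)
        return False

    return any(visit(node) for node in graph)

# Read/write skeletons of Schedule 3 and Schedule 4.
schedule3 = [
    ("T1", "read", "A"), ("T1", "write", "A"),
    ("T2", "read", "A"), ("T2", "write", "A"),
    ("T1", "read", "B"), ("T1", "write", "B"),
    ("T2", "read", "B"), ("T2", "write", "B"),
]
schedule4 = [
    ("T1", "read", "A"),
    ("T2", "read", "A"), ("T2", "write", "A"), ("T2", "read", "B"),
    ("T1", "write", "A"), ("T1", "read", "B"), ("T1", "write", "B"),
    ("T2", "write", "B"),
]

for name, s in (("Schedule 3", schedule3), ("Schedule 4", schedule4)):
    g = precedence_graph(s)
    print(name, sorted(g), "conflict serializable:", not has_cycle(g))
```

For Schedule 3 the graph has only the edge T1 -> T2, so the schedule is
conflict serializable. For Schedule 4 it has both T1 -> T2 and T2 -> T1,
which form a cycle.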
Defence Mechanisms
Human factors: inadvertent assignment of authorization to the wrong class
of users can result in security violations; the authorizer is
Authorization:
Authorization is the culmination of the administrative policies of the
organization, expressed as a set of rules that can be used to determine
which user has what type of access to which portion of the database.
The person who is in charge of specifying the authorization is usually
called the authorizer. The authorizer can be distinct from the DBA.
Authorization is usually maintained in the form of a table called an
access matrix, whose rows are called subjects and whose columns are called
objects.
Object: an object is something that needs protection. A data field, a
record, or a file can be considered an object; views are also considered
objects.
Granularity: the granularity of protection can be chosen to be a file, a
record or a data item. The smaller the protected object, the finer the
degree to which protection can be specified; but the finer the granularity,
the larger the authorization matrix and the greater the overhead of
enforcing database security.
Subject: a subject is a user who is given some rights to access a data
object.
Privileges or access types:
1) Read: allows only reading the object.
2) Insert: allows inserting a new occurrence of the object type.
3) Delete: allows deleting an existing occurrence of the object type.
4) Update: allows the subject to change the value of an occurrence of the
object.
5) Add: allows the subject to add new object types (e.g. new relations).
6) Drop: allows the subject to drop/delete existing object types from the
database.
7) Alter: allows the subject to add/delete attributes to/from an existing
type or relation.
8) Propagate access control: determines whether the subject is allowed to
propagate the right over the object to other subjects.
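An access matrix with these privileges can be sketched as a sparse table
keyed by (subject, object). The following is a minimal illustration under
stated assumptions: the subjects, objects and function names are
hypothetical, and the rule that a grantor needs both the privilege and the
propagate right is one reasonable reading of item 8, not a definitive
implementation.

```python
# Sparse access matrix: only non-empty cells are stored.
# Subjects ("clerk", "manager") and objects are made-up examples.
access_matrix = {
    ("clerk", "cust_loan"): {"read"},
    ("manager", "loan"): {"read", "insert", "update", "propagate"},
}

def is_allowed(subject, obj, privilege):
    """Look up the cell for (subject, obj) and test for the privilege."""
    return privilege in access_matrix.get((subject, obj), set())

def grant(grantor, grantee, obj, privilege):
    """Pass a privilege on, but only if the grantor holds it and may propagate."""
    held = access_matrix.get((grantor, obj), set())
    if privilege not in held or "propagate" not in held:
        return False
    access_matrix.setdefault((grantee, obj), set()).add(privilege)
    return True

print(is_allowed("clerk", "cust_loan", "read"))       # clerk may read the view
print(is_allowed("clerk", "loan", "read"))            # but not the base relation
print(grant("manager", "teller", "loan", "read"))     # manager holds 'propagate'
print(grant("clerk", "teller", "cust_loan", "read"))  # clerk does not
```

Storing only the non-empty cells keeps the table small even when the
granularity is fine and the full matrix would be large.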
Granting of privileges:
A user who has been granted some form of authorization may be allowed
to pass on this authorization to other users. The passing of authorization
from one user to another can be represented by an 'authorization graph'.
The nodes of this graph are users, and an edge from user-i to user-j
indicates that user-i has granted the privilege to user-j.
[Figure: authorization graph with user nodes including U2, U3 and U5]
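An authorization graph can be sketched as a list of (grantor, grantee)
edges. The example below is illustrative only. It assumes the common
textbook rule that a user holds a privilege exactly when there is a path to
that user from the root node (the DBA); the root name "DBA", the particular
edges and the function name are assumptions, not taken from these notes.

```python
# Hypothetical grants of a single privilege, e.g. update on loan.
grants = [
    ("DBA", "U1"), ("DBA", "U2"), ("DBA", "U3"),
    ("U1", "U4"), ("U1", "U5"), ("U2", "U5"),
]

def authorized_users(grants, root="DBA"):
    """Users reachable from the root by following grant edges."""
    graph = {}
    for grantor, grantee in grants:
        graph.setdefault(grantor, []).append(grantee)
    reached, stack = {root}, [root]
    while stack:
        for nxt in graph.get(stack.pop(), []):
            if nxt not in reached:
                reached.add(nxt)
                stack.append(nxt)
    return reached - {root}

print(sorted(authorized_users(grants)))

# If the DBA's grant to U1 is withdrawn, U4 is no longer reachable,
# while U5 is still reachable through U2.
after_revoke = [edge for edge in grants if edge != ("DBA", "U1")]
print(sorted(authorized_users(after_revoke)))
```

Under this rule, removing one edge can take the privilege away from every
user whose only path from the root ran through that edge.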
Example:
revoke update (amount) on loan from U1, U2, U3
revoke references (branch_name) on branch from U1
Authorization on views
A view provides a user with a personalized model of the database. A view
can hide data that a user does not need to see.
Views simplify system usage because they restrict the user's attention
to the data of interest.
Although a user may be denied direct access to a relation, that user
may be allowed to access part of that relation through a view. Thus a
combination of relation-level security and view-level security limits a
user's access to precisely the data that the user needs.
Ex: a clerk in the bank needs to know the names of all customers who have
a loan at each branch, and nothing more. The clerk is denied direct access
to the loan and borrower relations but can be given access to the view
cust_loan:
create view cust_loan as
    (select branch_name, customer_name
     from borrower, loan
     where borrower.loan_no = loan.loan_no)
The creator of the view must have read authorization on both the borrower
and loan relations.
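What the clerk would see through such a view can be tried out with Python's
built-in sqlite3 module. This is only a sketch: the column types, the amount
column and the sample rows are invented for illustration, and SQLite itself
has no grant/revoke, so the example shows what the view exposes rather than
how access is enforced.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    create table loan (loan_no text primary key, branch_name text, amount integer);
    create table borrower (customer_name text, loan_no text);

    -- Made-up sample data.
    insert into loan values ('L-11', 'Downtown', 900), ('L-15', 'Uptown', 1500);
    insert into borrower values ('Asha', 'L-11'), ('Ravi', 'L-15');

    create view cust_loan as
        select branch_name, customer_name
        from borrower, loan
        where borrower.loan_no = loan.loan_no;
""")

cursor = conn.execute("select * from cust_loan order by customer_name")
columns = [d[0] for d in cursor.description]
rows = cursor.fetchall()
print(columns)
print(rows)
```

The view exposes only branch_name and customer_name; the loan number and
the amount never appear in its result.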
Security of data: data must be protected while they are being
transmitted. Data may also need to be protected from intruders who are able
to bypass operating system security. Encryption is one of the techniques
used to enforce security.
Encryption: encryption transforms the data so that they are stored in a
form different from the original.
Ex: consider substituting each character with the next character