Multi-User Relational Database - UA
RELATIONAL DATABASE MANAGEMENT
NTA LEVEL 6
Introduction to RDBMS
OVERVIEW
• A database is “a collection of related data”, where data is a collection of facts and figures that can be processed to produce information.
• For example, if we have data about the marks obtained by all students, we can then conclude about toppers and average marks.
• A database management system stores data in such a way that it becomes easier to retrieve, manipulate, and produce information.
A database has:
• Some inherent meaning (represents some partial view of a portion of the real world)
• A largely varying size (from a personal list of addresses to the National Register of Persons)
• A physical organization of varying complexity (from a manual personal list, managed with simple files, to huge multi-user databases with geographically distributed data and access)
• Logically-coordinated objectives: data is defined once for a community of users and accessed by various applications with specific needs
Data Management
• Data management involves both the storage of information and the provision of mechanisms for the manipulation of information.
• In addition, the system should also provide for the safety of the information stored, despite system crashes or attempts at unauthorized access.
• Two broad approaches to data management are:
1. File-based systems
2. Database systems.
File-based Systems
• File-based systems are an early approach to data management where data is
stored and managed in flat files, such as text files, CSV files, or binary files.
These systems were widely used before the advent of modern database
management systems (DBMS). These could be accessed by a computer operator.
• Files of archived data were called tables because they looked like tables used in
traditional file keeping.
When to Use File-Based Systems
[Diagram: Accounts, HR, and Production departments, each maintaining its own separate data files]
Such isolated files lead to problems such as:
• Data dependence
• Data duplication
• Some of these issues could be traced back to the disadvantages of File-based systems.
• A database allows quick and easy management of data. For example, a company may maintain
details of its employees in various databases.
• At any point in time, data can be retrieved from the database, new data can be added into the databases, and data can be searched based on some criteria in these databases.
• Data storage can be achieved even using simple manual files. For instance, a college has to
maintain information about teachers, students, subjects, and examinations.
• The registers or files are bulky, consume a lot of space, and hence, cannot be kept for many years.
Advantages of database systems
• The main problem with the earlier DBMS packages was that the data was stored in the flat file
format.
• So, the information about different objects was maintained separately in different physical files.
• Hence, the relations between these objects, if any, had to be maintained in a separate physical
file.
• Thus, a single package would consist of too many files and would require vast functionality to integrate them into a single system.
DBMS
DBMS Functions
• DBMS: a collection of general-purpose, application-independent programs providing
services to;
1. Define the structure of a database, i.e., data types and constraints that the data will have to
satisfy
2. Manage the storage of data, safely for long periods of time, on some storage medium
controlled by the DBMS
3. Manipulate a database, with efficient user interfaces to query the database to retrieve specific data, update the database to reflect changes in the world, and generate reports from the data
4. Manage database usage: users with their access rights, performance optimization, sharing of
data among several users, security from accidents or unauthorized use
• The design of a DBMS depends on its architecture. It can be:
1. Centralized
2. Decentralized
3. Hierarchical
• The architecture of a DBMS can be seen as either single-tier or multi-tier.
• An n-tier architecture divides the whole system into related but independent n
modules, which can be independently modified, altered, changed, or replaced.
Database Models
• Logical
• Network Model
• Relational Model
• Entity-Relationship Model
• Object-Oriented Model
• Hierarchical Database Model
• Physical
• Inverted Index
• Flat File
Flat File Data Model
• In this model, the database consists of only one table or file.
• This model is used for simple databases - for example, to store the roll numbers,
names, subjects, and marks of a group of students.
Hierarchical Data Model
• In this model, data is organized in a tree-like structure. A parent record can have several children, but a child can have only one parent.
• To find data stored in this model, the user needs to know the structure of the
tree.
• This model is very efficient when a database contains a large volume of data.
• For example, a bank's customer account system fits the hierarchical model
well because each customer's account is subject to a number of transactions.
Network Data Model
• In the network model, data is stored in sets, instead of the hierarchical tree format.
• The set theory of the network model does not use a single-parent tree hierarchy.
• Integrated Database Management System (IDMS) from Computer Associates International Inc.
and Raima Database Manager (RDM) Server by Raima Inc. are examples of a Network DBMS.
• The network model, together with the hierarchical data model, was a major data model for implementing numerous commercial DBMSs.
Example
The components of network models
• The components of the language used with network models are as follows:
1. A Data Definition Language (DDL) that is used to create and remove databases and
database objects. It enables the database administrator to define the schema
components.
2. A sub-schema DDL that enables the database administrator to define the database
components.
3. A Data Manipulation Language (DML), which is used to insert, retrieve, and modify
database information. All database users use these commands during the routine
operation of the database.
4. Data Control Language (DCL) is used to administer permissions on the databases and
database objects.
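• As an illustration, the same three categories survive in modern SQL; a minimal sketch, assuming a hypothetical Students table and a clerk user:
    -- DDL: define the schema
    CREATE TABLE Students (
        roll_number INT PRIMARY KEY,
        student_name VARCHAR(50)
    );
    -- DML: insert and query data
    INSERT INTO Students (roll_number, student_name) VALUES (1, 'Adam');
    SELECT student_name FROM Students WHERE roll_number = 1;
    -- DCL: administer permissions
    GRANT SELECT ON Students TO clerk;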
Disadvantages Of The Network Model
• The lack of a query facility meant programmers spent a lot of time producing even the simplest reports.
• This led to the development of what came to be called the Relational Model database.
• In the Relational Model, unlike the Hierarchical and Network models, there are no physical
links.
• All data is maintained in the form of tables consisting of rows and columns.
• Data in two tables is related through common columns and not physical links.
[Tables: Students (Roll Number, Student Name) and Marks (Roll Number, Marks)]
The Students table displays the Roll Number and the Student Name, and the Marks
table displays the Roll Number and Marks obtained by the students.
• The relational database model gives the programmer time to concentrate on the logical
view of the database rather than being bothered about the physical view.
• One of the reasons for the popularity of the relational databases is the querying flexibility.
• An RDBMS uses SQL to translate the user query into the technical code required to
retrieve the requested data.
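• As a sketch, a user's request for each student's marks could be expressed as follows (assuming the Students and Marks tables above share the Roll Number column):
    SELECT Students.roll_number, Students.student_name, Marks.marks
    FROM Students
    JOIN Marks ON Marks.roll_number = Students.roll_number;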
• Relational model is so easy to handle that even untrained people find it easy to generate
handy reports and queries, without giving much thought to the need to design a proper
database.
Disadvantages Of The Relational Model
• As noted above, the very ease of use of the relational model can tempt untrained users into building poorly designed databases.
The people who work with an RDBMS include:
• Database Designer
• End User
The components of an RDBMS
1. Entity
• An entity is a person, place, thing, object, event, or even a concept, which can be
distinctly identified. For example, the entities in a university are students, faculty
members, and courses.
• Each entity has certain characteristics known as attributes. For example, the
student entity might include attributes such as student number, name, and grade.
Each attribute should be named appropriately.
• A grouping of related entities becomes an entity set. Each entity set is given a
name. The name of the entity set reflects the contents.
• Thus, the attributes of all the students of the university will be stored in an entity
set called Student.
2. Tables and their Characteristics
• The terms entity set and table are often used interchangeably.
• Each table must have a key known as primary key that uniquely identifies each row.
• All values in a column must conform to the same data format. For example, if the attribute is
assigned a decimal data format, all values in the column representing that attribute must be in
decimals.
• Each column has a specific range of values known as the attribute domain.
Database Schema
• A database schema is the skeleton structure of the database. It defines how the data is organized and how the relations among them are associated.
• A database schema defines its entities and the relationship among them.
• It’s the database designers who design the schema to help programmers understand
the database and make it useful.
Example
Schema Categories
• A database schema can be divided broadly into two categories:
• Physical schema − pertains to the actual storage of the data and its form of storage.
• Logical schema − defines all the logical constraints to be applied on the data, such as tables, views, and integrity constraints.
• Once the database is operational, it is very difficult to make any changes to it.
• A database instance is a state of operational database with data at any given time.
• A DBMS ensures that its every instance (state) is in a valid state, by diligently following all
the validations, constraints, and conditions that the database designers have imposed.
Data Independence
• The ability to modify a schema definition at one level without affecting the schema definition at the next higher level is called data independence.
• For example, a change to the internal schema, such as using different file
organization or storage structures, storage devices, or indexing strategy, should be
possible without having to change the conceptual or external schemas.
Logical data independence
• Logical data independence is the ability to modify the logical schema without causing application programs to be rewritten.
• Modifications at the logical level are necessary whenever the logical structure of the database is altered (for example, when money-market accounts are added to a banking system).
• Logical data independence means that if we add some new columns to a table or remove some columns from it, the user views and programs should not have to change.
• For example, consider two users A and B, both selecting the fields "EmployeeNumber" and "EmployeeName".
• If user B adds a new column (e.g. salary) to the table, it will not affect the external view for user A, though the internal schema of the database has been changed for both users A and B.
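• A minimal sketch of this scenario in SQL (the table, view, and column names are assumptions): if user A reads through a view, user B's new column never reaches A's external view:
    CREATE TABLE Employee (
        EmployeeNumber INT PRIMARY KEY,
        EmployeeName   VARCHAR(50)
    );
    -- User A's external view exposes only the two original fields
    CREATE VIEW EmployeeViewA AS
    SELECT EmployeeNumber, EmployeeName FROM Employee;
    -- User B adds a salary column; EmployeeViewA is unaffected
    ALTER TABLE Employee ADD salary DECIMAL(10,2);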
Data Modelling
• A data model is a group of conceptual tools that describes data, its relationships, and
semantics.
• Data Modeling is the process of applying an appropriate data model to the data, in
order to organize and structure it.
• Data models help database developers to define the relational tables, primary and
foreign keys, stored procedures, and triggers required in the database.
Steps for Data Modelling
• Conceptual: the data modeler identifies the highest level of relationships in the data.
• Logical: the data modeler describes the data and its relationships in detail.
• Physical: the data modeler specifies how the logical model is to be realized physically.
Conceptual Data Model
• A conceptual data model identifies the highest-level relationships between the
different entities.
• Describes the semantics of a domain, being the scope of the model. For
example, it may be a model of the interest area of an organization or
industry.
• Includes the important entities and the relationships among them.
• No attribute is specified.
Logical Data Model
• A logical data model describes the data in as much detail as possible, without regard to how it will be physically implemented.
• The steps for designing the logical data model are as follows: specify primary keys for all entities, find the relationships between entities, find all attributes for each entity, resolve many-to-many relationships, and normalization.
Logical vs Conceptual
• Comparing the logical data model shown above with the conceptual data model diagram,
we see the main differences between the two:
• In a logical data model, primary keys are present, whereas in a conceptual data model,
no primary key is present.
• In a logical data model, all attributes are specified within an entity.
• Relationships between entities are specified using primary keys and foreign keys in a
logical data model.
• In a conceptual data model, the relationships are simply stated, not specified, so we
simply know that two entities are related, but we do not specify what attributes are
used for this relationship
Physical Data Model
• Physical data model represents how the model will be built in the database.
• A physical database model shows all table structures, including column name, column data type,
column constraints, primary key, foreign key, and relationships between tables.
• Physical considerations may cause the physical data model to be quite different from the logical
data model.
• For example, data type for a column may be different between MySQL and SQL Server.
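• A hedged sketch of that difference (the Customer table is an assumption): the same auto-numbered key and timestamp column spelled in each dialect:
    -- MySQL
    CREATE TABLE Customer (
        customer_id INT AUTO_INCREMENT PRIMARY KEY,
        created_at  DATETIME
    );
    -- SQL Server
    CREATE TABLE Customer (
        customer_id INT IDENTITY(1,1) PRIMARY KEY,
        created_at  DATETIME2
    );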
Example
Steps For Physical Data Model Design
• Convert entities into tables.
• Convert relationships into foreign keys.
• Convert attributes into columns.
• Modify the physical data model based on physical constraints and requirements.
Physical vs Logical
• Comparing the physical data model shown above with the logical data model diagram, we see the main differences between the two:
• Entity names have become table names, and attributes have become column names.
• Data types can be different depending on the actual database being used.
Entity Relationship Model
• At view level, the ER model is considered a good option for designing databases.
• The ER model is based on the following concepts:
• Entity / Entity Set
• Attributes
• Relationship / Relationship Set
Entity
• An entity is a real-world object that exists physically and is distinguishable
from other objects.
• For example, a Students set may contain all the students of a school; likewise
a Teachers set may contain all the teachers of a school from all faculties.
• Entity sets that do not have enough attributes to establish a primary key are
called weak entity sets.
• Entity sets that have enough attributes to establish a primary key are called
strong entity sets.
• For example, an assignment and a student can be considered as two separate entities.
Attributes
• Entities are represented by means of their properties, called attributes.
• Simple attribute
• Simple attributes are atomic values, which cannot be divided further. For example, a
student's phone number is an atomic value of 10 digits.
• Composite attribute
• Composite attributes are made of more than one simple attribute. For example, a
student's complete stud_name may be broken into stud_first_name + stud_last_name.
• Derived attribute
• Derived attributes are the attributes that do not exist in the physical database, but
their values are derived from other attributes present in the database. For example,
average_salary
• Single-value attribute
• Single-value attributes contain a single value for an entity. For example, last name, since a student usually has one last name that uniquely identifies him/her.
• Multi-value attribute
• Multi-value attributes may contain more than one value. For example, a person can have more than one phone number or email address.
• Stored Attribute
• An attribute which cannot be derived from another attribute is known as a stored attribute. For example, a person's date of birth is stored, while age can be derived from it.
• Key Attribute
• It is an attribute that has a distinct value for each entity/element in an entity set. For example, Roll Number is a key attribute in a Student entity set.
• For example, the roll_number of a student makes him/her identifiable among students.
• Super Key − A set of attributes (one or more) that collectively identifies an entity in an entity set.
• Candidate Key − A minimal super key is called a candidate key. An entity set may have more than one candidate key.
• Primary Key − A primary key is one of the candidate keys, chosen by the database designer to uniquely identify the entity set.
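• A minimal sketch in SQL (names assumed): roll_number and email are both candidate keys; roll_number is chosen as the primary key, while email stays enforced as an alternate key:
    CREATE TABLE Student (
        roll_number  INT PRIMARY KEY,        -- chosen candidate key
        email        VARCHAR(100) UNIQUE,    -- remaining candidate (alternate) key
        student_name VARCHAR(50)
    );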
Degree of Relationship
• The number of participating entity sets gives the degree of a relationship:
• Self-relationships = degree 1
• Binary = degree 2
• Ternary = degree 3
• n-ary = degree n
Self-relationships
• Relationships between entities of the same entity set are called self-
relationships.
• For example, a manager and his team member, both belong to the employee
entity set.
• Thus, the relation, 'works for', exists between two different employee entities
of the same employee entity set.
Binary relationships
• Relationships that exist between entities of two different entity sets are called binary
relationships.
• The relation exists between two different entities, which belong to two different entity
sets.
Ternary relationships
• A ternary relationship exists among entities of three different entity sets. For example, the relation 'works' exists between all three: the employee, the department, and the location.
ER Model to Relational Model
Mapping Cardinalities
• One-to-one
• One-to-many
• Many-to-one
• Many-to-many
One-to-one
• One entity from entity set A can be associated with at most one entity of entity set B and vice versa.
• Consider the relationship between a vehicle and its registration. Every vehicle has a unique registration.
No two vehicles can have the same registration details. The relation is one-to-one, that is, one vehicle-one
registration.
One-to-many
• This kind of mapping exists when an entity of one set can be associated with more than one
entity of another entity set.
• A customer can have more than one vehicle. Therefore, the mapping is a one to many mapping,
that is, one customer - one or more vehicles.
Many-to-One
• This kind of mapping exists when many entities of one set are associated with a single entity of another set. This association is made irrespective of whether the latter entity is already associated with other entities of the former entity set.
• Consider the relation between a vehicle and its manufacturer. Every vehicle has only one manufacturing
company or coalition associated to it under the relation, 'manufactured by', but the same company or coalition
can manufacture more than one kind of vehicle.
Many-to-Many
• This kind of mapping exists when any number of entities of one set can be associated with any
number of entities of the other entity set.
• Consider the relation between a bank's customer and the customer's accounts. A customer can
have more than one account and an account can have more than one customer associated with it
in case it is a joint account or similar. Therefore, the mapping is many-to-many, that is, one or
more customers associated with one or more accounts.
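• In a relational schema, a many-to-many mapping is typically realized with a third, linking table; a sketch with assumed names:
    CREATE TABLE Customer (customer_id INT PRIMARY KEY);
    CREATE TABLE Account  (account_id  INT PRIMARY KEY);
    -- One row per customer-account pairing; joint accounts simply add rows
    CREATE TABLE Customer_Account (
        customer_id INT REFERENCES Customer(customer_id),
        account_id  INT REFERENCES Account(account_id),
        PRIMARY KEY (customer_id, account_id)
    );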
Entity-Relationship Diagrams
• The E-R diagram is a graphical representation of the E-R model. The E-R diagram, with
the help of various symbols, effectively represents various components of the E-R
model.
Example
• The symbols cover entities, attributes (including multivalued attributes), relationships, and the cardinalities:
• One-to-one (1:1)
• One-to-many (1:N)
• Many-to-many (M:N)
Participation Constraints
• Total Participation − each entity in the entity set is involved in the relationship.
• Partial Participation − not all entities in the entity set are involved in the relationship.
Generalization
• Going up in the entity hierarchy, grouping entities by their shared properties, is called generalization.
• For example, a particular student named Mira can be generalized along with all the
students.
• The reverse is called specialization where a person is a student, and that student is
Mira.
Generalization Example
• Pigeon, house sparrow, crow and dove can all be
generalized as Birds.
Specialization
• Specialization is the opposite of generalization. In
specialization, a group of entities is divided into sub-
groups based on their characteristics.
Codd's Rules
• Rule 4: Active Online Catalog
• The structure description of the entire database must be stored in an online catalog,
known as data dictionary, which can be accessed by authorized users. Users can use
the same query language to access the catalog which they use to access the database
itself.
• Rule 5: Comprehensive Data Sub-Language Rule
• A database can only be accessed using a language having linear syntax that supports
data definition, data manipulation, and transaction management operations. This
language can be used directly or by means of some application. If the database allows
access to data without any help of this language, then it is considered as a violation.
• Rule 6: View Updating Rule
• All the views of a database, which can theoretically be updated, must also be
updatable by the system.
• Rule 7: High-Level Insert, Update, and Delete Rule
• A database must support high-level insertion, updation, and deletion. This must not be
limited to a single row, that is, it must also support union, intersection and minus operations
to yield sets of data records.
• Rule 8: Physical Data Independence
• The data stored in a database must be independent of the applications that access the
database. Any change in the physical structure of a database must not have any impact on
how the data is being accessed by external applications.
• Rule 9: Logical Data Independence
• The logical data in a database must be independent of its user’s view (application). Any change in logical data must not affect the applications using it. For example, if two tables are merged or one is split into two different tables, there should be no impact or change on the user application. This is one of the most difficult rules to apply.
• Rule 10: Integrity Independence
• A database must be independent of the application that uses it. All its integrity constraints can be
independently modified without the need of any change in the application. This rule makes a
database independent of the front-end application and its interface.
The Relational Model and First-Order Logic
• First-order logic uses quantified variables over (non-logical) objects. It allows the
use of sentences that contain variables, so that rather than propositions such as
Socrates is a man one can have expressions in the form X is a man where X is a
variable.
• The content of the database at any given time is a finite (logical) model of the
database, i.e. a set of relations, one per predicate variable, such that all predicates
are satisfied.
• A request for information from the database (a database query) is also a predicate.
Concepts
• Tables − In the relational data model, relations are saved in the format of tables. This format stores the relation among entities. A table has rows and columns, where rows represent records and columns represent the attributes.
• Tuple − A single row of a table, which contains a single record for that relation is called a tuple.
• Relation Instance − A finite set of tuples in the relational database system represents relation
instance. Relation instances do not have duplicate tuples.
• Relation Schema − A relation schema describes the relation name (table name), attributes, and
their names.
• Relation Key − Each row has one or more attributes, known as relation key, which can identify the
row in the relation (table) uniquely.
• Attribute Domain − Every attribute has some pre-defined value scope, known as attribute domain.
Example
Constraints
• Every relation has some conditions that must hold for it to be a valid relation.
• Key constraints
• Domain constraints
• Referential integrity constraints
• For example:
• The domain of Marital status has a set of possibilities: Married, Single, Divorced
• The domain of day Shift has the set of all possible days : {Mon, Tue, Wed…}.
• The domain of Salary is the set of all floating-point numbers greater than 0 and less than
200,000.
• The domain of First Name is the set of character strings that represents names of people.
• In summary, a Domain is a set of acceptable values that a column is allowed to contain. This is
based on various properties and the data type for the column.
• Age cannot be less than zero and telephone numbers cannot contain a digit outside 0-9.
• A foreign key is a key attribute of a relation that can be referred to in another relation (referential integrity constraint).
• For example, a table called Employee has a primary key called employee_id.
• Another table called Employee Details has a foreign key which references employee_id in order to relate the two tables.
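• A sketch of these constraint types together in SQL (table and column names are assumptions):
    CREATE TABLE Employee (
        employee_id INT PRIMARY KEY                        -- key constraint
    );
    CREATE TABLE Employee_Details (
        detail_id   INT PRIMARY KEY,
        employee_id INT REFERENCES Employee(employee_id),  -- referential integrity
        age         INT CHECK (age >= 0),                  -- domain constraint
        salary      DECIMAL(9,2) CHECK (salary > 0 AND salary < 200000)
    );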
Normalization
• Normalization involves decomposing a table into less redundant (and smaller) tables
without losing information, and then linking the data back together by defining foreign
keys in the old table referencing the primary keys of the new ones.
• Without Normalization, it becomes difficult to handle and update the database, without
facing data loss.
• Insertion, Updating and Deletion Anomalies are very frequent if Database is not
Normalized.
Objectives
• A basic objective of the first normal form defined by Codd in 1970 was to permit data to be queried and manipulated using a “universal data sub-language” grounded in first-order logic.
• The objectives of normalization beyond 1NF (First Normal Form) were stated as follows by Codd:
1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
2. To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by. — E.F. Codd, "Further Normalization of the Data Base Relational Model"
Deletion Anomalies
• The "Faculty and Their Courses" table (described under Insertion Anomalies below) suffers from this
type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses,
we must delete the last of the records on which that faculty member appears, effectively also
deleting the faculty member, unless we set the Course Code to null in the record itself.
Insertion Anomalies
• There are circumstances in which certain facts cannot be recorded at all.
• For example, each record in a "Faculty and Their Courses" table might contain a Faculty ID,
Faculty Name, Faculty Hire Date, and Course Code—thus we can record the details of any faculty
member who teaches at least one course, but we cannot record the details of a newly hired
faculty member who has not yet been assigned to teach any courses except by setting the
Course Code to null.
Repetition Anomalies
• The data such as Project_id, Project_name, Grade, and Salary repeat many times.
• This repetition hampers both, performance during retrieval of data and the storage
capacity.
• The normal forms covered below are: 1NF, 2NF, 3NF, and BCNF.
First Normal Form
• As per First Normal Form,
A. No two rows of data may contain repeating groups of information, i.e. each set of columns must have a unique value, such that multiple columns cannot be used to fetch the same row.
B. Each table should be organized into rows, and each row should have a primary key that
distinguishes it as unique.
• The Primary key is usually a single column, but sometimes more than one column can be
combined to create a single primary key.
• A single column cannot hold multiple values for a row; rather than that, we must separate such data into multiple rows.
• Using the First Normal Form, data redundancy increases, as there will be many columns
with same data in multiple rows but each row as a whole will be unique.
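• A sketch of the idea using the Adam example from this section (column names assumed): the repeating subjects are separated into multiple rows:
    -- Violates 1NF: 'Biology, Maths' crammed into one subjects column
    -- Student(name, age, subjects)
    -- In 1NF: one subject per row; each row as a whole is unique
    CREATE TABLE Student_Subject (
        student_name VARCHAR(50),
        age          INT,
        subject      VARCHAR(50)
    );
    INSERT INTO Student_Subject VALUES ('Adam', 15, 'Biology');
    INSERT INTO Student_Subject VALUES ('Adam', 15, 'Maths');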
Second Normal Form (2NF)
• As per the Second Normal Form, there must not be any partial dependency of any column on the primary key.
• It means that for a table that has concatenated primary key, each column in the table that is not
part of the primary key must depend upon the entire concatenated key for its existence.
• If any column depends only on one part of the concatenated key, then the table fails Second
normal form.
• In example of First Normal Form there are two rows for Adam, to include multiple subjects that
he has opted for. While this is searchable, and follows First normal form, it is an inefficient use
of space.
• Also in the above Table in First Normal Form, while the candidate key is {Student, Subject}, Age
of Student only depends on Student column, which is incorrect as per Second Normal Form.
• To achieve second normal form, it would be helpful to split out the subjects into an
independent table, and match them up using the student names as foreign keys.
• In the Student table, the candidate key will be the Student column, because all other columns, i.e. Age, are dependent on it.
• Now, both the above tables qualify for Second Normal Form and will never suffer from update anomalies.
• There are a few complex cases in which a table in Second Normal Form still suffers update anomalies; to handle those scenarios, Third Normal Form is there.
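• A sketch of that split (names assumed): Age moves with Student, and subjects are matched back via the student name acting as the foreign key:
    CREATE TABLE Student (
        student_name VARCHAR(50) PRIMARY KEY,
        age          INT                     -- now depends on the whole key
    );
    CREATE TABLE Subject (
        student_name VARCHAR(50) REFERENCES Student(student_name),
        subject      VARCHAR(50),
        PRIMARY KEY (student_name, subject)
    );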
Third Normal Form (3NF)
• Third Normal Form requires that every non-prime attribute of a table be dependent on the primary key; in other words, there should not be a case where a non-prime attribute is determined by another non-prime attribute.
• So this transitive functional dependency should be removed from the table and also the table must be
in Second Normal form.
• In this table, Student_id is the primary key, but street, city, and state depend upon Zip.
• The dependency between Zip and the other fields is called a transitive dependency.
• Hence to apply 3NF, we need to move the street, city and state to new table, with Zip as primary key.
• New Student_Detail Table :
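• A sketch of the decomposition (names and types assumed): the fields that depend on Zip move to their own table:
    CREATE TABLE Address (
        zip    VARCHAR(10) PRIMARY KEY,
        street VARCHAR(100),
        city   VARCHAR(50),
        state  VARCHAR(50)
    );
    CREATE TABLE Student_Detail (
        student_id   INT PRIMARY KEY,
        student_name VARCHAR(50),
        zip          VARCHAR(10) REFERENCES Address(zip)  -- no more transitive dependency
    );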
Boyce-Codd Normal Form (BCNF)
• Boyce-Codd Normal Form is a higher version of the Third Normal Form.
• This form deals with certain type of anomaly that is not handled by 3NF.
• A 3NF table which does not have multiple overlapping candidate keys is said to
be in BCNF.
Denormalization
• The idea behind denormalization is to add redundant data where we think it will help us the most.
• We can use extra attributes in an existing table, add new tables, or even create instances
of existing tables.
• The usual goal is to decrease the running time of select queries by making data more
accessible to the queries or by generating summarized reports in separate tables.
When to Use Denormalization
• Read-Heavy Workloads:
• When the database is primarily used for querying and reading data (e.g., reporting systems, analytics, or
dashboards).
• Performance Bottlenecks:
• When normalized tables cause performance issues due to excessive joins or complex queries.
• Data Warehousing:
• When precomputing and storing aggregated or derived data (e.g., totals, averages) can save processing time
during queries.
• Simplified Queries:
• When the application requires simpler queries for ease of development or maintenance.
• Real-Time Systems:
• In real-time systems where low-latency responses are critical, denormalization can reduce query execution time.
Why Use Denormalization
• Improved Read Performance:
• By reducing the number of joins and simplifying queries, denormalization significantly speeds up read operations.
• Simplified Queries:
• Denormalized tables can make queries easier to write and understand, especially for complex analytical queries.
• Scalability:
• In applications like e-commerce, social media, or content management systems, where reads far outnumber writes, denormalization helps scale the system.
• Optimized Query Patterns:
• Denormalization allows tailoring the database schema to specific query patterns, improving efficiency for those use cases.
• Reduced Computation:
• By precomputing and storing redundant data, the database avoids recalculating or aggregating data during queries, reducing CPU and memory usage.
What Are the Disadvantages of Denormalization?
• Data Redundancy:
• Storing redundant data increases storage requirements.
• Data Consistency:
• Updates, inserts, or deletes become more complex and slower, as the same data may
need to be updated in multiple places.
• Increased Maintenance:
• Ensuring data consistency across denormalized tables requires careful application
logic or triggers.
• Risk of Anomalies:
• Without proper management, denormalization can lead to data anomalies (e.g.,
inconsistent or outdated data).
Example
[Figure: the same data shown as one denormalized table and as separate normalized tables]
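• A minimal sketch of the contrast (assumed names): normalized, an order only references its customer; denormalized, the customer name is copied into the order row so reads avoid the join:
    -- Normalized
    CREATE TABLE Customer (customer_id INT PRIMARY KEY, customer_name VARCHAR(50));
    CREATE TABLE Orders (
        order_id    INT PRIMARY KEY,
        customer_id INT REFERENCES Customer(customer_id)
    );
    -- Denormalized: redundant copy of customer_name, which must be kept in sync
    CREATE TABLE Orders_Denormalized (
        order_id      INT PRIMARY KEY,
        customer_id   INT,
        customer_name VARCHAR(50)
    );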
JOIN
• A SQL join clause combines columns from one or more tables in a relational database. It
creates a set that can be saved as a table or used as it is.
• A JOIN is a means for combining columns from one (self-table) or more tables by using
values common to each.
• The ANSI-standard SQL join types are:
1. INNER,
2. LEFT OUTER,
3. RIGHT OUTER,
4. FULL OUTER,
5. and CROSS.
• As a special case, a table (base table, view, or joined table) can JOIN to itself in a self-join.
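• A sketch of two of these join types on the Students and Marks tables used earlier (names assumed):
    -- INNER: only students that have matching marks rows
    SELECT s.roll_number, s.student_name, m.marks
    FROM Students s
    INNER JOIN Marks m ON m.roll_number = s.roll_number;
    -- LEFT OUTER: every student; marks is NULL where no match exists
    SELECT s.roll_number, s.student_name, m.marks
    FROM Students s
    LEFT OUTER JOIN Marks m ON m.roll_number = s.roll_number;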
Example
Theta (θ) Join
• Theta join combines tuples from different relations provided they satisfy the theta
condition. The join condition is denoted by the symbol θ.
• Notation
• R1 ⋈θ R2
• R1 and R2 are relations having attributes (A1, A2, .., An) and (B1, B2,.. ,Bn) such that
the attributes don’t have anything in common, that is R1 ∩ R2 = Φ.
• Equijoin
• When Theta join uses only equality (=) comparison operator, it is said to be equijoin.
Example
Natural Join (⋈)
• A natural join can be performed only if there is at least one common attribute between the two relations. In addition, the attributes must have the same name and domain.
• An inner join includes only those tuples with matching attributes and the rest
are discarded in the resulting relation.
• Therefore, we need to use outer joins to include all the tuples from the
participating relations in the resulting relation.
Left Outer Join: (R ⟕ S)
• All the tuples from the left relation, R, are included in the resulting relation. If there are tuples in R without any matching tuple in S, then the S-attributes of the resulting relation are made NULL.
Right Outer Join: (R ⟖ S)
• All the tuples from the right relation, S, are included in the resulting relation. If there are tuples in S without any matching tuple in R, then the R-attributes of the resulting relation are made NULL.
Full Outer Join: ( R ⟗S)
• All the tuples from both participating relations are included in the resulting relation. If there
are no matching tuples for both relations, their respective unmatched attributes are made
NULL.
Database Organization
• This topic will deal with the physical organization of the database.
• Primary Storage − The memory storage that is directly accessible to the CPU comes under this category: CPU registers, cache memory, and main memory (RAM).
• Secondary Storage − Secondary storage devices are used to store data for future use or as
backup. Secondary storage includes memory devices that are not a part of the CPU chipset or
motherboard, for example, magnetic disks, optical disks (DVD, CD, etc.), hard disks, flash
drives, and magnetic tapes.
• Tertiary Storage − Tertiary storage is used to store huge volumes of data. Since such storage
devices are external to the computer system, they are the slowest in speed. These storage
devices are mostly used to take the back up of an entire system. Optical disks and magnetic
tapes are widely used as tertiary storage.
Memory Hierarchy
• A computer system has a well-defined hierarchy of memory.
• A CPU has direct access to its main memory as well as its inbuilt registers.
• The access speed of the main memory is obviously lower than the CPU speed.
• Cache memory provides the fastest access time and it contains data that is most
frequently accessed by the CPU.
• Larger storage devices offer slow speed and they are less expensive, however they
can store huge volumes of data as compared to CPU registers or cache memory.
Magnetic Disks
• Hard disk drives are the most common secondary storage devices in present computer systems.
• These are called magnetic disks because they use the concept of magnetization to store information.
• These disks are placed vertically on a spindle. A read/write head moves in between the disks and is used to
magnetize or de-magnetize the spot under it.
• A hard disk plate has many concentric circles on it, called tracks. Every track is further divided into sectors.
RAID
• RAID (Redundant Array of Independent Disks) consists of an array of disks in which multiple disks are connected together to achieve different goals. The standard levels are:
• RAID 0
• RAID 1
• RAID 2
• RAID 3
• RAID 4
• RAID 5
RAID 0
• This configuration has striping but no redundancy of data.
• If one disk fails, then that affects the entire array and the
chances for data loss or corruption increases.
RAID 1
• With RAID 1, data is copied seamlessly and simultaneously, from one disk to another, creating
a replica, or mirror.
• It's the simplest way to implement fault tolerance and it's relatively low cost.
• A minimum of two disks is required for RAID 1 hardware implementations. With software RAID
1, instead of two physical disks, data can be mirrored between volumes on a single disk.
• One additional point to remember is that RAID 1 cuts total disk capacity in half: If a server with
two 1TB drives is configured with RAID 1, then total storage capacity will be 1TB not 2TB.
Example
RAID 2
• RAID 2 is similar to RAID 5, but instead of disk striping using parity, striping occurs at
the bit-level.
• RAID 2 is seldom deployed because costs to implement are usually prohibitive (a typical
setup requires 10 disks) and gives poor performance with some disk I/O operations.
• RAID 2 records Error Correction Code using Hamming distance for its data, striped on
different disks.
• Like level 0, each data bit in a word is recorded on a separate disk, and the ECC codes of the data words are stored on a different set of disks.
RAID 3
• RAID 3, which is rarely used in practice, consists of byte-level striping with a dedicated parity disk.
• One of the characteristics of RAID 3 is that it generally cannot service multiple requests simultaneously,
which happens because any single block of data will, by definition, be spread across all members of the
set and will reside in the same location.
• Therefore, any I/O operation requires activity on every disk and usually requires synchronized spindles.
• For this reason, RAID 3 is best for single-user systems with long record applications.
RAID 4
• RAID 4 is a configuration in which disk striping happens at the block level, rather than at the byte level as in RAID 3.
• Note that level 3 uses byte-level striping, whereas level 4 uses block-level striping.
• As a result of its layout, RAID 4 provides good performance of random reads, while the
performance of random writes is low due to the need to write all parity data to a single disk.
• Both level 3 and level 4 require at least three disks to implement RAID.
RAID 5
• RAID 5 is by far the most common RAID configuration for business servers and enterprise NAS devices.
• This RAID level provides better performance than mirroring as well as fault tolerance.
• With RAID 5, data and parity (which is additional data used for recovery) are striped across three or more disks.
• If a disk gets an error or starts to fail, data is recreated from this distributed data and parity block— seamlessly and
automatically.
• Another benefit of RAID 5 is that it allows many NAS and server drives to be "hot-swappable" meaning in case a drive
in the array fails, that drive can be swapped with a new drive without shutting down the server or NAS.
• RAID 5 arrays are generally considered to be a poor choice for use on write-intensive systems because of the
performance impact associated with writing parity information.
• When a disk does fail, it can take a long time to rebuild a RAID 5 array. Performance is usually degraded during the
rebuild time and the array is vulnerable to an additional disk failure until the rebuild is complete.
Example
RAID 6
• RAID 6 is an extension of level 5. In this level, two independent parities are
generated and stored in distributed fashion among multiple disks.
• Two parities provide additional fault tolerance. This level requires at least four
disk drives to implement RAID.
File Structure
• Relative data and information is stored collectively in file formats.
• A disk drive is formatted into several blocks that can store records. File records are mapped onto those disk blocks.
• Operations on database files can be broadly classified into two categories:
• Update Operations
• Retrieval Operations
• Update operations change data values by insertion, deletion, or update. Retrieval operations, on the other hand, do not alter the data but retrieve them after optional conditional filtering.
File Operations
• Open − A file can be opened in one of the two modes, read mode or write mode.
• Locate − Every file has a file pointer, which tells the current position where the data is to be read or written.
This pointer can be adjusted accordingly. Using find (seek) operation, it can be moved forward or backward.
• Read − By default, when files are opened in read mode, the file pointer points to the beginning of the file.
• Write − Users can select to open a file in write mode, which enables them to edit its contents; this can be deletion, insertion, or modification. The file pointer can be located at the time of opening or can be dynamically changed if the operating system allows it.
• Close − This is the most important operation from the operating system’s point of view. When a request to close a file is generated, the operating system
• removes all the locks (if in shared mode),
• saves the data (if altered) to the secondary storage media, and
• releases all the buffers and file handlers associated with the file.
File Organization
• File Organization defines how file records are mapped onto disk blocks.
Heap File Organization
• Heap File does not support any ordering, sequencing, or indexing on its own.
• Records are placed in file in the same order as they are inserted. A new record is inserted in the
last page of the file; if there is insufficient space in the last page, a new page is added to the file.
• Once the data block is full, the next record is stored in the new block. This new block need not be
the very next block. This method can select any block in the memory to store the new records. It is
similar to pile file in the sequential method, but here data blocks are not selected sequentially.
Example
• Retrieval
• When a record has to be retrieved from the database, in this method, we need to traverse from the
beginning of the file till we get the requested record.
• Deletion
• Similarly if we want to delete or update a record, first we need to search for the record.
• Again, searching a record is similar to retrieving it- start from the beginning of the file till the record is
fetched.
• In addition, while deleting a record, the record will be deleted from the data block, but the space will not be freed and cannot be re-used.
• Hence, as the number of records increases, the occupied memory also increases, and the efficiency decreases.
• For the database to perform better, the DBA has to free this unused memory periodically.
Advantages of Heap File Organization
• Very good method of file organization for bulk insertion. i.e.; when
there is a huge number of data needs to load into the database at a
time, then this method of file organization is best suited.
• They are simply inserted one after the other in the memory blocks.
Disadvantages of Heap File Organization
• As the file size grows, linear search for a record becomes time consuming.
Sequential File Organization
• Here, each record is stored one after the other in a sequential manner. There are two ways to implement this method:
1. Records are stored one after the other as they are inserted into the tables.
• This method is called pile file method. When a new record is inserted, it is placed at the end of
the file.
• In the case of any modification or deletion of record, the record will be searched in the
memory blocks.
• Once it is found, it will be marked for deletion, and the new record will be inserted.
Example
• In the diagram above, R1, R2, R3 etc. are the records. They contain all the attribute of a
row. i.e.; when we say student record, it will have his id, name, address, course, DOB
etc. Similarly R1, R2, R3 etc can be considered as one full set of attributes.
• In the second method, records are sorted (either ascending or
descending) each time they are inserted into the system.
• This method is called sorted file method.
• Sorting of records may be based on the primary key or on
any other columns.
• Whenever a new record is inserted, it is placed at the end of the file, and then the file is sorted – ascending or descending based on the key value – so that the record lands in the correct position.
• In the case of an update, the record is updated, and the file is then sorted to place the updated record in the right place. The same is the case with delete.
Advantages of Sequential File Organization
• When there are large volumes of data, this method is very fast and efficient.
• This method is helpful when most of the records have to be accessed, like calculating the grade of every student or generating the salary slips, where we use all the records.
• These files can be stored on magnetic tapes, which are comparatively cheap.
Disadvantages of Sequential File
Organization
• Sorted file method always involves the effort for sorting the
record.
Hash File Organization
• In this method, records are stored at block addresses generated by a hash function.
• The memory location where these records are stored is called a data block or data bucket.
• The hash function can use any of the column value to generate the address.
• Most of the time, hash function uses primary key to generate the hash index – address of the data
block.
• Hash function can be simple mathematical function to any complex mathematical function.
• We can even consider primary key itself as address of the data block. That means each row will be
stored at the data block whose address will be same as primary key. This implies how simple a hash
function can be in database.
Example
Static Hashing
• In static hashing, the hash function always generates the same bucket address for a given key.
• That means, if we want to generate the address for EMP_ID = 103 using the mod (5) hash function, it always results in the same bucket address, 3.
• Hence number of data buckets in the memory for this static hashing remains constant throughout.
• In our example, we will have five data buckets in the memory used to store the data.
• Searching a record
• Using the hash function, the data bucket address is generated for the hash key, and the record is then retrieved from that location. i.e., if we want to retrieve the whole record for ID 104, and the hash function is mod (5) on ID, the address generated would be 4. We then go directly to address 4 and retrieve the whole record for ID 104. Here ID acts as a hash key.
• Inserting a record
• When a new record needs to be inserted into the table, we will generate a address for the new record
based on its hash key. Once the address is generated, the record is stored in that location.
• Delete a record
• Using the hash function we will first fetch the record which is supposed to be deleted. Then we will remove
the records for that address in memory.
• Update a record
• Data record marked for update will be searched using static hash function and then record in that address
is updated.
Bucket overflow
• Suppose we have to insert some records into the file.
• But the data bucket address generated by the hash function is full or the data already exists in that address.
• How do we insert the data? This situation in the static hashing is called bucket overflow.
• This is one of the critical situations/ drawback in this method. Where will we save the data in this case? We
cannot lose the data.
The methods used to overcome bucket overflow are:
• Closed hashing
• Open hashing
• Quadratic probing
• Double hashing
Closed hashing
• In this method we introduce a new data bucket with same address and link it after the full data bucket.
These methods of overcoming the bucket overflow are called closed hashing or overflow chaining.
• Consider we have to insert a new record R2 into the tables. The static hash function generates the data
bucket address as ‘AACDBF’. But this bucket is full to store the new data. What is done in this case is a
new data bucket is added at the end of ‘AACDBF’ data bucket and is linked to it. Then new record R2 is
inserted into the new bucket. Thus it maintains the static hashing address. It can add any number of
new data buckets, when it is full.
Open Hashing
• In this method, next available data block is used to enter the new record, instead of overwriting on
the older one.
• This method is called Open Hashing or linear probing.
• In the below example, R2 is a new record which needs to be inserted. But the hash function
generates address as 237.
• But it is already full.
• So the system searches next available data bucket, 238 and assigns R2 to it.
Other Methods
• Quadratic probing
• This is similar to linear probing, but the difference between the old and the new bucket is not linear; we use a quadratic function to determine the new bucket address.
• Double Hashing
• This is also another method of linear probing. Here the difference is fixed like in
linear probing, but this fixed difference is calculated by using another hash
function. Hence the name is double hashing.
Dynamic Hashing
• This hashing method is used to overcome the problems of static hashing – bucket overflow. In this method of hashing,
data buckets grows or shrinks as the records increases or decreases.
• This method of hashing is also known as extendable hashing method. Let us see an example to understand this
method.
• Consider three records, R1, R2 and R4, in the table. These records generate addresses 100100, 010110 and 110110 respectively.
• This method of storing considers only part of this address – especially only the first one bit – to store the data. So it tries to load the three of them at addresses 0 and 1.
• When a new record R3 has to be inserted and no bucket is free under the 1-bit addresses, the address size is changed to 2 bits rather than 1 bit, and the existing data is then updated to have 2-bit addresses.
• Now we can see that address of R1 and R2 are changed to reflect the new address and R3 is also
inserted. As the size of the data increases, it tries to insert in the existing buckets. If no buckets are
available, the number of bits is increased to consider larger address, and hence increasing the buckets.
If we delete any record and if the data can be stored with lesser buckets, it shrinks the bucket size.
Advantages of Dynamic hashing
• Performance does not come down as the data grows in the system.
• Since it grows and shrinks with the data, memory is well utilized.
• Good for dynamic databases where data grows and shrinks frequently.
Disadvantages of Dynamic hashing
• The bucket address table needs continual maintenance; this is because the address of the data will keep changing as buckets grow and shrink.
• Bucket overflow can still occur, but it might take longer to reach this situation than in static hashing.
Clustered File Organization
• In all the file organization methods described above, each file contains a single table and is stored in its own way in memory.
• In real life situation, retrieving records from single table is comparatively less.
• Most of the cases, we need to combine/join two or more related tables and retrieve the data.
• In such cases, above all methods will not be faster to give the result.
• Those methods have to traverse each table at a time and then combine the results of each
to give the requested result.
• This is obvious that the time taken for this is more. So what could be done to overcome this
situation?
• In this method two or more table which are frequently used to join and get the
results are stored in the same file called clusters.
• These files will have two or more tables in the same data block and the key columns
which map these tables are stored only once.
• This method hence reduces the cost of searching for various records in different files.
• All the records are found at one place and hence making search efficient.
• Here data are sorted based on the primary key or the key with which we are
searching the data.
• The key with which we are joining the tables is known as cluster key.
Clustering of tables is done when:
1. The tables are frequently joined using the same joining condition.
2. If tables are joined only once in a while, or a full table scan of any one of the tables is involved in the query, then we do not cluster the tables.
• Indexed Clusters: - Here records are stored based on the cluster key, and all records with the same cluster key value are stored together.
• Hash Clusters: - This is also similar to the indexed cluster. Here, instead of storing the records based on the cluster
key, we generate the hash key value for the cluster key and store the records with same hash key value together
in the memory disk.
Advantages of Clustered File Organization
• When two or more tables are frequently joined with the same joining condition, this method retrieves the related records efficiently.
Disadvantages of Clustered File
Organization
• This method is not suitable for very large databases, since the performance of the cluster degrades as the data grows.
• We cannot use these clusters if the joining condition changes; if it does, traversing the file takes a lot of time.
• This method is not suitable for less frequently joined tables or tables with 1:1
conditions.
Indexing
• Indexing is a data structure technique to efficiently retrieve records from the database files based on some attributes on
which the indexing has been done.
• Primary Index − Primary index is defined on an ordered data file. The data file is ordered on a key field.
• Secondary Index − Secondary index may be generated from a field which is a candidate key and has a unique value in every record, or a non-key with duplicate values.
• Clustering Index − Clustering index is defined on an ordered data file. The data file is ordered on a non-key field.
Dense Index
• In a dense index, there is an index record for every search key value in the database.
• This makes searching faster but requires more space to store index records itself.
• Index records contain search key value and a pointer to the actual record on the
disk.
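• In SQL, such an index is typically requested as follows (names assumed); whether it is kept dense or sparse is an internal choice of the DBMS:
    CREATE INDEX idx_student_name ON Student (student_name);
    -- This lookup can now follow the index instead of scanning the whole table
    SELECT * FROM Student WHERE student_name = 'Mira';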
Sparse Index
• In sparse index, index records are not created for every search key.
• An index record here contains a search key and an actual pointer to the data on the disk.
• To search a record, we first proceed by index record and reach at the actual location of the
data.
• If the data we are looking for is not where we directly reach by following the index, then the
system starts sequential search until the desired data is found.
Multilevel Index
• Multilevel index is stored on the disk along with the actual database
files.
• As the size of the database grows, so does the size of the indices.
Multi-level Index helps in breaking down the index into several smaller
indices in order to make the outermost level so small that it can be saved in a
single disk block, which can easily be accommodated anywhere in the main
memory.
B+ Tree
• A B+ tree is similar to a binary search tree, but a node can have more than two children.
• We can observe here that it divides the records into two and splits into left node and right node.
• Left node will have all the values less than or equal to root node and the right node will have values
greater than root node.
• The intermediary nodes at level 2 will have only the pointers to the leaf nodes.
• The values shown in the intermediary nodes are only the pointers to next level.
• All the leaf nodes will have the actual records in a sorted order.
• If we have to search for any record, it is found at a leaf node.
• Searching any record takes the same time, because all leaf nodes are equidistant from the root.
• Hence searching a record does not take much time.
• Suppose a B+ tree has an order of n (the number of branches – the tree structure above has 5 branches altogether, hence order 5). Then each intermediary node can have n/2 to n children, and each leaf node can hold n/2 to n-1 values.
• For order 5, an intermediary node can thus have 3 to 5 children, and a leaf node can hold 3 to 4 values.
The main goal of B+ tree is:
• Sorted Intermediary and leaf nodes:
• Since it is a balanced tree, all nodes should be sorted.
• No overflow pages:
• B+ tree allows all the intermediary and leaf nodes to be partially filled – it will have some percentage defined while
designing a B+ tree.
• This percentage up to which nodes are filled is called fill factor.
• If a node is filled beyond the fill factor limit, it is called an overflow page.
• If a node is too empty, it is called underflow.
• In our example above, the intermediary node with 108 is an underflow; the leaf nodes are not partially filled, hence they are overflow pages.
Searching a record in B+ Tree
• Suppose we want to search 65 in the below B+ tree structure.
• First we will fetch for the intermediary node which will direct to the leaf node that can contain
record for 65.
Inserting a record in B+ Tree
• Suppose we want to insert a record 60 in the structure below. It will go to the 3rd leaf node, after 55. Since it is a balanced tree and that leaf node is
already full, we cannot insert the record there.
• But it should be inserted there without affecting the fill factor, balance and order.
• So the only option here is to split the leaf node. But how do we split the nodes?
• The 3rd leaf node should have values (50, 55, 60, 65, 70) and its current root
node is 50.
• We will split the leaf node in the middle so that its balance is not altered.
• So we can group (50, 55) and (60, 65, 70) into 2 leaf nodes.
• If these two has to be leaf nodes, the intermediary node cannot branch from 50.
• It should have 60 added to it and then we can have pointers to new leaf node.
Delete in B+ tree
• Suppose we want to delete 15. We will traverse to the 1st leaf node and simply delete 15 from that node.
• There is no need for any re-arrangement, as the tree is balanced and 15 does not appear in an intermediary node.
B+ Tree Extensions
• As the number of records grows in the database, the
intermediary and leaf nodes needs to be split and spread
widely to keep the balance of the tree.
• Since it is a balanced tree, it searches for the position of the record in the file, and then it fetches/inserts/deletes the record.
• Below is the simple example of how student details are stored in B+ tree index files.
Example
• Suppose we have a new student Bryan. Where will he fit in the file? He will fit in the 1st leaf node. Since this leaf node is not
full, we can easily add him in the node.
• But what happens if we want to insert another student Ben to this file? Some re-arrangement to the nodes is needed to
maintain the balance of the file.
• As the file grows in the database, the performance remains the same. This is
because all the records are maintained at leaf node and all the nodes are at
equi-distance from root.
• Even though insertion and deletion are a little complicated, they can be done in fractions of a second.
• Leaf nodes are allowed to be only partially (half) filled, since records are larger than pointers.
B Tree index Files
• B tree index file is similar to B+ tree index files, but it uses binary search concepts.
• In this method, each root will branch to only two nodes and each intermediary node will also have the
data.
• Since all intermediary nodes also have records, it reduces the traversing till leaf node for the data.
Advantages
B Tree: Since each node also holds records, we might not be required to traverse till the leaf node.
B+ Tree: Good space utilization, as intermediary nodes contain only pointers to the records and only leaf nodes contain records.
Disadvantages
B Tree: If the tree is very big, then we have to traverse through most of the nodes to get the records.
B+ Tree: If there is any rearrangement of nodes during insertion or deletion, then it would be an overhead.
• Index-organized tables: - Here the data itself acts as an index, and the whole record is stored in the B+ index file.
• Descending Indexes: - Here index key is stored in the descending order in B+ tree files.
• Reverse key indexes: - In this method of indexing, the index key column value is stored in the
reverse order. For example, say index is created on STD_ID in the STUDENT table. Suppose STD_ID has
values 100,101,102 and 103. Then the reverse key index would be 001, 101,201 and 301 respectively.
• B+ tree Cluster Index: - Here, cluster key of the table is used in the index. Thus, each index in this
method will point to set of records with same cluster keys.
Transaction
Transaction
• A transaction can be defined as a group of tasks. A single task is the minimum processing unit which cannot be divided further.
• Let’s take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from A's account to B's account. This
very simple and small transaction involves several low-level tasks.
• A’s Account
• Open_Account(A)
• Old_Balance = A.balance
• New_Balance = Old_Balance - 500
• A.balance = New_Balance
• Close_Account(A)
• B’s Account
• Open_Account(B)
• Old_Balance = B.balance
• New_Balance = Old_Balance + 500
• B.balance = New_Balance
• Close_Account(B)
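• A minimal sketch of the transfer as a SQL transaction (table and column names are assumptions; the exact syntax varies slightly by DBMS), so the two updates either both happen or neither does:
    BEGIN TRANSACTION;
    UPDATE Account SET balance = balance - 500 WHERE account_no = 'A';
    UPDATE Account SET balance = balance + 500 WHERE account_no = 'B';
    COMMIT;   -- or ROLLBACK; if any step fails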
ACID Properties
• A transaction is a very small unit of a program, and it may contain several low-level tasks.
• A transaction in a database system must maintain the following properties, commonly known as the ACID properties:
1. Atomicity,
2. Consistency,
3. Isolation,
4. and Durability
• Atomicity − A transaction must be treated as an atomic unit: either all of its operations are executed or none are. There must be no state in which the transaction is left partially completed.
• Consistency − The database must remain in a consistent state after any transaction. No transaction
should have any adverse effect on the data residing in the database.
• Isolation − If multiple transactions execute simultaneously, each should be carried out as if it were the only transaction in the system; no transaction should alter or affect any other transaction.
• Durability − The database should be durable enough to hold all its latest updates even if the system fails
or restarts.
• If a transaction updates a chunk of data in a database and commits, then the database will hold the
modified data.
• If a transaction commits but the system fails before the data could be written on to the disk, then that data
will be updated once the system springs back into action.
Serializability
• When multiple transactions are executed by the operating system in a multiprogramming environment, there is a possibility that instructions of one transaction are interleaved with those of some other transaction.
• Schedule − A chronological execution sequence of transactions is called a schedule.
• A schedule can have many transactions in it, each comprising a number of instructions/tasks.
• Serial Schedule − It is a schedule in which transactions are aligned in such a way that one
transaction is executed first.
• When the first transaction completes its cycle, then the next transaction is executed.
• This type of schedule is called a serial schedule, as transactions are executed in a serial manner.
• In a multi-transaction environment, serial schedules are considered as a benchmark.
• This execution does no harm if two transactions are mutually independent and
working on different segments of data; but in case these two transactions are
working on the same data, then the results may vary.
• Now suppose that there are two data values, 0 and 10, and two transactions: transaction A doubles each value, while transaction B adds 1 to each value.
• If these transactions are run one after the other, the new values will be 1 and 21 if transaction A is run first (double, then add 1), or 2 and 22 if transaction B is run first (add 1, then double).
• But what if the order in which the two transactions are run is different for each value? If
transaction A is run first on the first value and transaction B is run first on the second value,
the new values are 1 and 22.
• The transactions are serializable if 1, 21 and 2, 22 are the only possible results.
• Result Equivalence
• If two schedules produce the same result after execution, they are said to be result equivalent. They may yield
the same result for some value and different results for another set of values. That's why this equivalence is not
generally considered significant.
• View Equivalence
• Two schedules are view equivalent if the transactions in both schedules perform similar actions in a similar manner.
• For example −
• If T reads the initial data in S1, then it also reads the initial data in S2.
• If T reads the value written by J in S1, then it also reads the value written by J in S2.
• If T performs the final write on the data value in S1, then it also performs the final write on the data value in
S2.
Conflict Equivalence
• Two operations are said to be conflicting if they have all of the following properties −
They belong to different transactions.
They access the same data item.
At least one of them is a 'write' operation.
• Two schedules having multiple transactions with conflicting operations are said to be conflict equivalent if and only if −
Both the schedules contain the same set of transactions.
The order of conflicting pairs of operations is maintained in both the schedules.
• Note − View equivalent schedules are view serializable and conflict equivalent schedules are
conflict serializable.
States of Transactions
• A transaction in a database can be in one of the following states −
• Active − In this state, the transaction is being executed.
• Partially Committed − When a transaction executes its final operation, it is said to be in a partially committed state.
• Failed − A transaction is said to be in a failed state if any of the checks made by the database recovery system fails. A
failed transaction can no longer proceed further.
• Aborted − If any of the checks fails and the transaction has reached a failed state, then the recovery manager rolls back
all its write operations on the database to bring the database back to its original state where it was prior to the execution
of the transaction.
• The database recovery module can select one of the two operations after a transaction aborts −
Re-start the transaction, or
Kill the transaction.
• Committed − If a transaction executes all its operations successfully, it is said to be committed. All its effects are now
permanently established on the database system.
Concurrency Control
• In a multiprogramming environment where multiple transactions can be executed simultaneously, it is highly important to control the concurrency of transactions.
• Lock-based protocols are one mechanism: a transaction cannot read or write data until it acquires an appropriate lock on it. Locks are of two kinds −
• Binary Locks − A lock on a data item can be in two states; it is either locked or unlocked.
• Shared/exclusive − This type of locking mechanism differentiates the locks based on
their uses.
If a lock is acquired on a data item to perform a write operation, it is an exclusive
lock. Allowing more than one transaction to write on the same data item would
lead the database into an inconsistent state.
Read locks are shared because no data value is being changed.
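• Many SQL dialects expose these two lock modes directly. A minimal sketch in MySQL/InnoDB syntax, reusing the hypothetical ACCOUNTS table from the transfer example above:

START TRANSACTION;
-- shared (read) lock: other transactions may also read the row, but none may write it
SELECT BALANCE FROM ACCOUNTS WHERE ACC_ID = 'A' LOCK IN SHARE MODE;
-- exclusive (write) lock: no other transaction may lock or write the row until commit
SELECT BALANCE FROM ACCOUNTS WHERE ACC_ID = 'B' FOR UPDATE;
UPDATE ACCOUNTS SET BALANCE = BALANCE + 500 WHERE ACC_ID = 'B';
COMMIT;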
Types Of Lock Protocols
• There are four types of lock protocols available −
• Simplistic lock-based protocols allow transactions to obtain a lock on every object before a 'write' operation is
performed. Transactions may unlock the data item after completing the ‘write’ operation.
• Pre-claiming protocols evaluate their operations and create a list of data items on which they need locks.
• Before initiating an execution, the transaction requests the system for all the locks it needs beforehand.
• If all the locks are granted, the transaction executes and releases all the locks when all its operations are over.
If all the locks are not granted, the transaction rolls back and waits until all the locks are granted.
• Two-Phase Locking 2PL
• This locking protocol divides the execution phase of a transaction into three parts.
In the first part, when the transaction starts executing, it seeks permission for the locks it requires.
The second part is where the transaction acquires all the locks.
• As soon as the transaction releases its first lock, the third phase starts.
• In this phase, the transaction cannot demand any new locks; it only releases the acquired locks.
• Two-phase locking has two phases, one is growing, where all the locks are being acquired by the transaction; and the second
phase is shrinking, where the locks held by the transaction are being released.
• To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to an exclusive lock.
• Strict Two-Phase Locking
• After acquiring all the locks in the first phase, the transaction continues to execute normally.
• But in contrast to 2PL, Strict-2PL does not release a lock after using it.
• Strict-2PL holds all the locks until the commit point and releases all the locks at a time.
• The most commonly used concurrency protocol is the timestamp based protocol. This
protocol uses either system time or logical counter as a timestamp.
• Lock-based protocols manage the order between the conflicting pairs among transactions at
the time of execution, whereas timestamp-based protocols start working as soon as a
transaction is created.
• Every transaction has a timestamp associated with it, and the ordering is determined by the
age of the transaction. A transaction created at 0002 clock time would be older than all other
transactions that come after it. For example, any transaction 'y' entering the system at 0004
is two seconds younger and the priority would be given to the older one.
• In addition, every data item is given the latest read and write-timestamp. This lets the
system know when the last ‘read and write’ operation was performed on the data item.
Timestamp Ordering Protocol
• The timestamp-ordering protocol ensures serializability among transactions in
their conflicting read and write operations.
• This is the responsibility of the protocol system that the conflicting pair of tasks
should be executed according to the timestamp values of the transactions.
• If a transaction Ti issues a read(X) operation −
If TS(Ti) < W-timestamp(X): the operation is rejected.
If TS(Ti) >= W-timestamp(X): the operation is executed.
• If a transaction Ti issues a write(X) operation −
If TS(Ti) < R-timestamp(X): the operation is rejected.
If TS(Ti) < W-timestamp(X): the operation is rejected and Ti is rolled back; otherwise, the operation is executed.
Deadlock
• A deadlock is a situation in which transactions wait for each other in a cycle so that none can proceed. For example, assume a set of transactions {T0, T1, T2, ..., Tn}. T0 needs a resource X to complete its task.
• Resource X is held by T1, and T1 is waiting for a resource Y, which is held by T2; T2, in turn, is waiting for a resource Z, which is held by T0.
• Thus, all the processes wait for each other to release resources.
• In case a system is stuck in a deadlock, the transactions involved in the deadlock are either rolled back or
restarted.
Deadlock Prevention
• To prevent any deadlock situation in the system, the DBMS aggressively inspects all the
operations, where transactions are about to execute.
• The DBMS inspects the operations and analyzes if they can create a deadlock situation.
• If it finds that a deadlock situation might occur, then that transaction is never allowed
to be executed.
• There are deadlock prevention schemes that use timestamp ordering mechanism of
transactions in order to predetermine a deadlock situation.
Wait-Die Scheme
Wound-Wait Scheme
Wait-Die Scheme
• In this scheme, if a transaction requests to lock a resource (data item), which is already held
with a conflicting lock by another transaction, then one of the two possibilities may occur −
• If TS(Ti) < TS(Tj) − that is Ti, which is requesting a conflicting lock, is older than Tj − then
Ti is allowed to wait until the data-item is available.
• If TS(Ti) > TS(Tj) − that is, Ti is younger than Tj − then Ti dies. Ti is restarted later with a random delay but with the same timestamp.
• This scheme allows the older transaction to wait but kills the younger one.
Wound-Wait Scheme
• In this scheme, if a transaction requests to lock a resource (data item), which is already held with conflicting
lock by some another transaction, one of the two possibilities may occur −
• If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back − that is Ti wounds Tj. Tj is restarted later with a
random delay but with the same timestamp.
• If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.
• This scheme allows the younger transaction to wait; but when an older transaction requests an item held by a younger one, the older transaction forces the younger one to abort and release the item.
• In both the cases, the transaction that enters the system at a later stage is aborted.
Deadlock Avoidance
• Methods like "wait-for graph" are available but they are suitable
for only those systems where transactions are lightweight having
fewer instances of resource.
First, do not allow any request for an item, which is already locked by
another transaction. This is not always feasible and may cause starvation,
where a transaction indefinitely waits for a data item and can never acquire
it.
The second option is to roll back one of the transactions. It is not always
the older one. With the help of some relative algorithm, a transaction is
• In addition, the system keeps a record of all the transactions that are currently being executed.
• Recovery
• When the system recovers from a failure, it can restore the latest dump.
• It can maintain a redo-list and an undo-list as checkpoints.
• It can recover the system by consulting undo-redo lists to restore the state of all transactions
up to the last checkpoint.
Database Backup & Recovery from Catastrophic
Failure
• A catastrophic failure is one where a stable, secondary storage device gets corrupted. With the storage device, all the valuable data stored inside it is lost. We have two different strategies to recover data from such a catastrophic failure −
• Remote backup; Here a backup copy of the database is stored at a remote location from where it
can be restored in case of a catastrophe.
• Alternatively, database backups can be taken on magnetic tapes and stored at a safer place. This
backup can later be transferred onto a freshly installed database to bring it to the point of backup.
• Databases that have grown large are too bulky to be backed up frequently. In such cases, we have techniques by which we can restore a database just by looking at its logs. So, all we need to do here is take a backup of the logs at frequent intervals. The database itself can be backed up once a week, while the logs, being very small, can be backed up every day or as frequently as possible.
Remote Backup
• In a remote backup scheme, two copies of the database are maintained. One of them is directly connected to the system, and the other one is kept at a remote place as a backup.
• As soon as the primary database storage fails, the backup system senses
the failure and switches the user system to the remote storage.
• Sometimes this is so instant that the users can’t even realize a failure.
Data Recovery
• In computing, data recovery is a process of salvaging inaccessible data from
corrupted or damaged secondary storage, removable media or files, when the
data they store cannot be accessed in a normal way.
• The data is most often salvaged from storage media such as internal or
external hard disk drives (HDDs), solid-state drives (SSDs), USB flash drives,
magnetic tapes, CDs, DVDs, RAID subsystems, and other electronic devices.
• Failures are generally classified into the following categories −
• Transaction failure
• System crash
• Disk failure
Storage Structure
• We have already described the storage system. In brief, the storage structure can be divided into
two categories −
• Volatile storage − As the name suggests, a volatile storage cannot survive system crashes.
• Volatile storage devices are placed very close to the CPU; normally they are embedded onto
the chipset itself.
• For example, main memory and cache memory are examples of volatile storage. They are
fast but can store only a small amount of information.
• Non-volatile storage − These memories are made to survive system crashes. They are huge in data storage capacity, but slower in accessibility. Examples include hard disks, magnetic tapes, flash memory, and non-volatile (battery backed-up) RAM.
Recovery and Atomicity
• When a DBMS recovers from a crash, it should check the states of all the transactions that were being executed.
• A transaction may be in the middle of some operation; the DBMS must ensure the atomicity of the transaction in this case. It should check whether the transaction can be completed now or whether it needs to be rolled back.
• There are two types of techniques that can help a DBMS recover while maintaining the atomicity of a transaction −
• Maintaining the logs of each transaction, and writing them onto some stable storage before actually modifying the database.
• Maintaining shadow paging, where the changes are made on volatile memory, and the actual database is updated later.
Log-based Recovery
• Log is a sequence of records, which maintains the records of actions performed by a transaction. It is
important that the logs are written prior to the actual modification and stored on a stable storage media,
which is failsafe.
• When a transaction enters the system and starts execution, it writes a log record about it: <Tn, Start>.
• When the transaction modifies an item X, it writes a log record <Tn, X, V1, V2>, stating that it has changed the value of X from V1 to V2.
• When the transaction finishes, it logs <Tn, commit>.
• The database can be modified using two approaches −
Deferred database modification − All logs are written on to stable storage, and the database is updated only when a transaction commits.
Immediate database modification − Each log record is followed by an actual database modification; that is, the database is modified immediately after every operation.
Checkpoint
• Keeping and maintaining logs in real time and in a real environment may fill up all the memory space available in the system, and as time passes the log file may grow too big to handle.
• At the time of recovery, it would then become hard for the recovery system to backtrack through all the logs and start recovering.
• To ease this situation, most modern DBMSs use the concept of 'checkpoints'. A checkpoint declares a point before which the DBMS was in a consistent state and all transactions were committed.
SQL Overview
• SQL (Structured Query Language) is a computer language for storing, manipulating and retrieving data stored in a relational database.
• All relational database management systems, such as MySQL, MS Access, Oracle, Sybase, Informix, PostgreSQL and SQL Server, use SQL as the standard database language.
• Allows users to define the data in database and manipulate that data.
• Allows to embed within other languages using SQL modules, libraries & pre-compilers.
• The classic query engine handles all non-SQL queries, but the SQL query engine does not handle logical files.
SQL Architecture:
SQL Commands
• The standard SQL commands to interact with relational databases are CREATE, SELECT, INSERT, UPDATE, DELETE and DROP.
• The Data Definition Language (DDL) manages table and index structure.
• The most basic items of DDL are the CREATE, ALTER, RENAME and DROP statements.
• The Data Manipulation Language (DML) is the subset of SQL used to add, update
and delete data.
• The acronym CRUD (Create, Read, Update, Delete) refers to all of the major functions that need to be implemented in a relational database application for it to be considered complete.
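• As a sketch, the four CRUD functions map onto SQL statements as follows (the table and values are illustrative):

INSERT INTO CUSTOMERS (ID, NAME) VALUES (8, 'Sita');  -- Create
SELECT ID, NAME FROM CUSTOMERS WHERE ID = 8;          -- Read
UPDATE CUSTOMERS SET NAME = 'Gita' WHERE ID = 8;      -- Update
DELETE FROM CUSTOMERS WHERE ID = 8;                   -- Delete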
• DCL is the simplest of the SQL subsets, as it consists of only three commands: GRANT, REVOKE and DENY.
• Combined, these three commands provide administrators with the flexibility to set and remove database permissions as needed.
• Popular RDBMS implementations include:
• MySQL
• MS SQL Server
• ORACLE
• MSACCESS
SQL Syntax
• SQL syntax can vary slightly between different database systems (MySQL, PostgreSQL, SQL Server,
Oracle, etc.).
• Understanding the basic clauses (SELECT, FROM, WHERE, ORDER BY, etc.) is essential for writing correct queries.
SQL Data Types
• SQL Server offers six categories of data types for your use: exact numerics, approximate numerics, date and time, character strings, Unicode character strings, and binary strings.
• An operator is a reserved word or a character used primarily in an SQL statement's WHERE clause.
• Operators are used to specify conditions in an SQL statement and to serve as conjunctions for multiple conditions in a statement.
• Arithmetic operators
• Comparison operators
• Logical operators
• Tables: Create, Drop, Delete
• SQL Constraints
• SQL Joins
• SQL Unions
• SQL Alias
• SQL Index
• SQL Views
SQL CREATE Database
• Syntax:
• CREATE DATABASE DatabaseName;
• A database name should always be unique within the RDBMS.
• Example:
• SQL> CREATE DATABASE testDB;
• Make sure you have admin privilege before creating any database.
• Once a database is created, you can check it in the list of databases as follows:
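• The listing command is dialect-specific; in MySQL, for example:

SHOW DATABASES;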
• Syntax:
• DROP DATABASE DatabaseName;
• Example:
• DROP DATABASE testDB;
SQL USE Statement
• When you have multiple databases in your SQL schema, then before starting an operation you would need to select the database where all the operations will be performed.
• The SQL USE statement is used to select any existing database in SQL schema.
• Syntax:
• USE DatabaseName;
• Now, if you want to work with the AMROOD database, then you can execute the following SQL command:
• USE AMROOD;
SQL CREATE Table
• Creating a basic table involves naming the table and defining its columns and each column's data type. Some of the fields can be given NOT NULL constraints, showing that these fields cannot be NULL while creating records in the table.
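• The slide's CREATE TABLE example is not reproduced; a sketch of the CUSTOMERS table that the later examples in these notes assume:

CREATE TABLE CUSTOMERS(
   ID INT NOT NULL,
   NAME VARCHAR (20) NOT NULL,
   AGE INT NOT NULL,
   ADDRESS CHAR (25),
   SALARY DECIMAL (18, 2),
   PRIMARY KEY (ID)
);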
Create Table Using another Table
• Syntax:
• CREATE TABLE NEW_TABLE_NAME AS
• SELECT [column1, column2, ...columnN]
• FROM EXISTING_TABLE_NAME
• [WHERE condition];
• Following is an example, which would create a table SALARY using the CUSTOMERS table, having the fields customer ID and customer SALARY:
• CREATE TABLE SALARY AS
• SELECT ID, SALARY
• FROM CUSTOMERS;
• You can verify whether your table has been created successfully by looking at the message displayed by the SQL server, or by using the DESC command to show the table's structure.
• The SQL DROP TABLE statement is used to remove a table definition and all
data, indexes, triggers, constraints, and permission specifications for that
table.
• Syntax:
• DROP TABLE table_name;
• The SQL INSERT INTO Statement is used to add new rows of data to a table in the
database.
• Syntax:
• INSERT INTO TABLE_NAME (column1, column2, column3, ...columnN) VALUES (value1, value2, value3, ...valueN);
• You may not need to specify the column(s) name in the SQL query if you are adding values for all the columns of the table:
• INSERT INTO TABLE_NAME VALUES (value1, value2, value3, ...valueN);
Example:
• Following statements would create four records in CUSTOMERS table:
• You can create a record in CUSTOMERS table using second syntax as follows:
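• The statements themselves are not reproduced on the slide; a sketch with illustrative values (the names, ages, addresses, and salaries are assumptions, chosen to be consistent with later examples in these notes):

INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY)
VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.00);
INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY)
VALUES (2, 'Khilan', 25, 'Delhi', 1500.00);
INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY)
VALUES (3, 'Kaushik', 23, 'Kota', 2000.00);
INSERT INTO CUSTOMERS (ID, NAME, AGE, ADDRESS, SALARY)
VALUES (4, 'Chaitali', 25, 'Mumbai', 6500.00);

-- second syntax: the column list may be omitted when values are supplied for every column
INSERT INTO CUSTOMERS VALUES (5, 'Hardik', 27, 'Bhopal', 8500.00);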
• You can also populate one table from another table, using a SELECT in place of the VALUES clause:
• INSERT INTO first_table_name [(column1, column2, ...columnN)]
• SELECT column1, column2, ...columnN
• FROM second_table_name
• [WHERE condition];
SQL SELECT Query
• The SQL SELECT statement is used to fetch data from a database table; it returns the data in the form of a result table. These result tables are called result-sets.
• Syntax:
• SELECT column1, column2, ...columnN FROM table_name;
• Following is an example, which would fetch the ID, Name and Salary fields of the customers available in the CUSTOMERS table:
• SELECT ID, NAME, SALARY FROM CUSTOMERS;
SQL WHERE Clause
• The SQL WHERE clause is used to specify a condition while fetching data from a single table or joining multiple tables.
• Only if the given condition is satisfied does the query return a specific value from the table. You would use the WHERE clause to filter the records and fetch only the necessary ones.
• The WHERE clause is not only used in the SELECT statement; it is also used in the UPDATE and DELETE statements, which we examine in subsequent sections.
• Syntax:
• SELECT column1, column2, ...columnN
• FROM table_name
• WHERE [condition];
Example:
• Consider the CUSTOMERS table populated with the records shown earlier.
• Following is an example, which would fetch the ID, Name and Salary fields from the CUSTOMERS table where the salary is greater than 2000:
• SELECT ID, NAME, SALARY
• FROM CUSTOMERS
• WHERE SALARY > 2000;
• Here, it is important to note that all strings should be given inside single quotes (''), whereas numeric values should be given without any quotes, as in the example below:
• SELECT ID, NAME, SALARY
• FROM CUSTOMERS
• WHERE NAME = 'Hardik';
• These operators provide a means to make multiple comparisons with different operators in the same SQL
statement.
• The AND operator allows the existence of multiple conditions in an SQL statement's WHERE clause.
• Syntax:
• SELECT column1, column2, ...columnN
• FROM table_name
• WHERE [condition1] AND [condition2] ... AND [conditionN];
• Following is an example, which would fetch the ID, Name and Salary fields from the CUSTOMERS table where the salary is greater than 2000 AND the age is less than 25 years:
• SELECT ID, NAME, SALARY
• FROM CUSTOMERS
• WHERE SALARY > 2000 AND AGE < 25;
• The OR operator is used to combine multiple conditions in an SQL statement's WHERE clause.
• Syntax:
• SELECT column1, column2, ...columnN
• FROM table_name
• WHERE [condition1] OR [condition2] ... OR [conditionN];
• Following is an example, which would fetch the ID, Name and Salary fields from the CUSTOMERS table where the salary is greater than 2000 OR the age is less than 25 years:
• SELECT ID, NAME, SALARY
• FROM CUSTOMERS
• WHERE SALARY > 2000 OR AGE < 25;
• The SQL UPDATE Query is used to modify the existing records in a table.
• You can use WHERE clause with UPDATE query to update selected rows, otherwise all
the rows would be affected.
• Syntax:
• UPDATE table_name
• SET column1 = value1, column2 = value2, ..., columnN = valueN
• WHERE [condition];
Example:
• Consider the CUSTOMERS table populated with the records shown earlier.
• UPDATE CUSTOMERS
• SET ADDRESS = 'Pune'
• WHERE ID = 6;
Example 2
• If you want to modify all the ADDRESS and SALARY column values in the CUSTOMERS table, you do not need to use the WHERE clause, and the UPDATE query would be as follows:
• UPDATE CUSTOMERS
• SET ADDRESS = 'Pune', SALARY = 1000.00;
• The SQL DELETE Query is used to delete the existing records from a table.
• You can use WHERE clause with DELETE query to delete selected rows,
otherwise all the records would be deleted.
• Syntax:
• DELETE FROM table_name
• WHERE [condition];
• Example: to delete a customer whose ID is 6 −
• DELETE FROM CUSTOMERS
• WHERE ID = 6;
SQL LIKE Clause
• The SQL LIKE clause is used to compare a value to similar values using wildcard operators.
• There are two wildcards used in conjunction with the LIKE operator: the percent sign (%), which represents zero, one, or multiple characters, and the underscore (_), which represents a single character.
• Syntax:
• SELECT column-list FROM table_name WHERE column LIKE 'pattern';
• The following SQL statement selects all customers with a City starting with "ber":
• The following SQL statement selects all customers with a City containing the
pattern "es":
• The following SQL statement selects all customers with a City starting with any character,
followed by "erlin":
• The following SQL statement selects all customers with a City starting with "L", followed by
• The following SQL statement selects all customers with a City starting with "a", "b", or "c":
• The following SQL statement selects all customers with a City NOT starting with "b", "s", or "p":
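• The statements themselves are not reproduced on the slides; a sketch assuming a Customers table with a City column (the table and column names are assumptions, and the bracketed character-list wildcards are SQL Server syntax):

SELECT * FROM Customers WHERE City LIKE 'ber%';    -- starts with "ber"
SELECT * FROM Customers WHERE City LIKE '%es%';    -- contains "es"
SELECT * FROM Customers WHERE City LIKE '_erlin';  -- any single character, then "erlin"
SELECT * FROM Customers WHERE City LIKE 'L_n_on';  -- "L", any char, "n", any char, "on"
SELECT * FROM Customers WHERE City LIKE '[abc]%';  -- starts with "a", "b", or "c"
SELECT * FROM Customers WHERE City LIKE '[^bsp]%'; -- does NOT start with "b", "s", or "p"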
SQL TOP Clause
• The SQL TOP clause is used to fetch a TOP N number or X percent of records from a table. Note that not all database systems support the TOP clause.
• For example, MySQL supports the LIMIT clause to fetch a limited number of records, while Oracle uses ROWNUM.
• Syntax:
• The basic syntax of the TOP clause with a SELECT statement would be as follows:
• SELECT TOP number|percent column_name(s)
• FROM table_name
• [WHERE condition];
• Example:
• SELECT TOP 3 * FROM CUSTOMERS;
• The equivalent query in MySQL: SELECT * FROM CUSTOMERS LIMIT 3;
• The SQL ORDER BY clause is used to sort the data in ascending or descending
order, based on one or more columns. Some database sorts query results in
ascending order by default.
• Syntax:
• SELECT column-list
• FROM table_name
• [WHERE condition]
• [ORDER BY column1, column2, ...columnN] [ASC | DESC];
• Following is an example, which would sort the result in descending order by NAME:
• SELECT * FROM CUSTOMERS
• ORDER BY NAME DESC;
SQL GROUP BY Clause
• The SQL GROUP BY clause is used in collaboration with the SELECT statement to arrange identical data into groups.
• The GROUP BY clause follows the WHERE clause in a SELECT statement and precedes the ORDER BY clause.
• Syntax:
• The basic syntax of the GROUP BY clause is given below. The GROUP BY clause must follow the conditions in the WHERE clause and must precede the ORDER BY clause if one is used.
• SELECT column1, column2
• FROM table_name
• WHERE [ conditions ]
• GROUP BY column1, column2
• [ORDER BY column1, column2];
• If you want to know the total salary for each customer, then the GROUP BY query would be as follows:
• SELECT NAME, SUM(SALARY) FROM CUSTOMERS
• GROUP BY NAME;
• Now consider a case where the CUSTOMERS table contains records with duplicate names.
• Again, if you want to know the total salary for each customer, the GROUP BY query would be as follows:
• SELECT NAME, SUM(SALARY) FROM CUSTOMERS
• GROUP BY NAME;
SQL Distinct Keyword
• The SQL DISTINCT keyword is used in conjunction with the SELECT statement to eliminate all duplicate records, fetching only unique ones.
• There may be a situation when you have multiple duplicate records in a table. While
fetching such records, it makes more sense to fetch only unique records instead of
fetching duplicate records.
• Syntax:
• SELECT DISTINCT column1, column2, ...columnN
• FROM table_name
• [WHERE condition];
• First, consider the following SELECT query, which returns duplicate salary records:
• SELECT SALARY FROM CUSTOMERS ORDER BY SALARY;
• Now, let us use the DISTINCT keyword with the above SELECT query and see the result:
• SELECT DISTINCT SALARY FROM CUSTOMERS
• ORDER BY SALARY;
Example 2
• To fetch the rows in your own preferred order, based on the ADDRESS column, the SELECT query would be as follows:
• SELECT * FROM CUSTOMERS
• ORDER BY
• (CASE ADDRESS
•   WHEN 'DELHI' THEN 1
•   WHEN 'BHOPAL' THEN 2
•   WHEN 'KOTA' THEN 3
•   ELSE 100 END) ASC, ADDRESS DESC;
SQL Constraints
• Constraints are rules enforced on the data columns of a table. They are used to limit the type of data that can go into a table.
• This ensures the accuracy and reliability of the data in the database.
• Column-level constraints are applied only to one column, whereas table-level constraints are applied to the whole table.
• Commonly used constraints include NOT NULL, DEFAULT, UNIQUE, PRIMARY KEY, FOREIGN KEY, CHECK, and INDEX; examples of the first three are sketched below.
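• As a sketch, the first three of these constraints applied to column definitions (the DEFAULT value is illustrative):

CREATE TABLE CUSTOMERS(
   ID INT NOT NULL,                         -- NOT NULL: the column cannot hold a NULL value
   NAME VARCHAR (20) NOT NULL,
   AGE INT NOT NULL UNIQUE,                 -- UNIQUE: no two rows may have the same AGE
   ADDRESS CHAR (25),
   SALARY DECIMAL (18, 2) DEFAULT 5000.00,  -- DEFAULT: used when no value is supplied
   PRIMARY KEY (ID)
);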
• You can also use the following syntax, which supports naming the constraint and applying it across multiple columns as well:
• ALTER TABLE CUSTOMERS ADD CONSTRAINT myUniqueConstraint UNIQUE(AGE, SALARY);
• If you are using MySQL, then you can use the following syntax:
• ALTER TABLE CUSTOMERS MODIFY AGE INT NOT NULL UNIQUE;
PRIMARY Key:
• A primary key is a field in a table which uniquely identifies each row/record. A table can have only one primary key, which may consist of single or multiple fields.
• When multiple fields are used as a primary key, they are called a composite key.
• If a table has a primary key defined on any field(s), then you can not have two
records having the same value of that field(s).
• Note: You would use these concepts while creating database tables.
Create Primary Key:
• Here is the syntax to define ID attribute as a primary key in a CUSTOMERS table.
• CREATE TABLE CUSTOMERS(
• ID INT NOT NULL,
• NAME VARCHAR (20) NOT NULL,
• AGE INT NOT NULL,
• ADDRESS CHAR (25) ,
• SALARY DECIMAL (18, 2),
• PRIMARY KEY (ID)
• );
• To create a PRIMARY KEY constraint on the "ID" column when CUSTOMERS table already
exists, use the following SQL syntax:
• ALTER TABLE CUSTOMERS ADD PRIMARY KEY (ID);
FOREIGN Key:
• A foreign key is a key used to link two tables together. This is sometimes called a
referencing key.
• We can take the primary key field from one table and insert it into the other table, where it becomes a foreign key; i.e., a foreign key is a column or a combination of columns whose values match a primary key in a different table.
• The relationship between the two tables matches the primary key in one of the tables with a foreign key in the second table.
• If a table has a primary key defined on any field(s), then you cannot have two records having the same value in that field(s).
• ORDERS table:
• CREATE TABLE ORDERS (
• ID INT NOT NULL,
• DATE DATETIME,
• CUSTOMER_ID INT references CUSTOMERS(ID),
• AMOUNT double,
• PRIMARY KEY (ID)
• );
• If the ORDERS table has already been created and the foreign key has not yet been defined, use the syntax for specifying a foreign key by altering the table:
• ALTER TABLE ORDERS
• ADD FOREIGN KEY (Customer_ID) REFERENCES CUSTOMERS (ID);
DROP a FOREIGN KEY Constraint:
• Example (the exact syntax varies by implementation; MySQL requires the constraint name):
• ALTER TABLE ORDERS DROP FOREIGN KEY constraint_name;
CHECK Constraint:
• The CHECK constraint ensures that every value in a column satisfies a given condition.
• For example, the following SQL creates a new table called CUSTOMERS and adds five columns. Here, we add a CHECK on the AGE column, so that you cannot have any customer below 18 years:
• CREATE TABLE CUSTOMERS(
•   ID INT NOT NULL,
•   NAME VARCHAR (20) NOT NULL,
•   AGE INT NOT NULL CHECK (AGE >= 18),
•   ADDRESS CHAR (25),
•   SALARY DECIMAL (18, 2),
•   PRIMARY KEY (ID)
• );
INDEX:
• The INDEX is used to create and retrieve data from the database very quickly.
• When index is created, it is assigned a ROWID for each row before it sorts out the data.
• Proper indexes are good for performance in large databases, but you need to be careful while creating index.
• Selection of fields depends on what you are using in your SQL queries.
• You can create an index on a single column or multiple columns using the following syntax:
• CREATE INDEX index_name
• ON table_name (column1, column2, ...columnN);
• To create an INDEX on the AGE column, to optimize searching customers by a particular age, the SQL syntax is as follows:
• CREATE INDEX idx_age
• ON CUSTOMERS ( AGE );
Dropping Constraints:
• Any constraint that you have defined can be dropped using the ALTER TABLE command with the DROP
CONSTRAINT option.
• For example, to drop the primary key constraint in the EMPLOYEES table, you can use the following command:
• ALTER TABLE EMPLOYEES DROP CONSTRAINT EMPLOYEES_PK;
• Some implementations may provide shortcuts for dropping certain constraints. For example, to drop the primary key constraint for a table in Oracle, you can use the following command:
• ALTER TABLE EMPLOYEES DROP PRIMARY KEY;
• Some implementations allow you to disable constraints. Instead of permanently dropping a constraint
from the database, you may want to temporarily disable the constraint, and then enable it later.
SQL Joins
• The SQL Joins clause is used to combine records from two or more tables in a database.
• A JOIN is a means for combining fields from two tables by using values common to each.
• Example: let us join these two tables in our SELECT statement as follows:
• SELECT ID, NAME, AMOUNT, DATE
• FROM CUSTOMERS
• JOIN ORDERS
• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
LEFT JOIN
• The SQL LEFT JOIN returns all rows from the left table, even if there are no matches in the right
table.
• This means that if the ON clause matches 0 (zero) records in right table, the join will still return a row
in the result, but with NULL in each column from right table.
• This means that a left join returns all the values from the left table, plus matched values from the
right table or NULL in case of no matching join predicate.
• Syntax:
• SELECT table1.column1, table2.column2 ...
• FROM table1
• LEFT JOIN table2
• ON table1.common_field = table2.common_field;
• Example:
• SELECT ID, NAME, AMOUNT, DATE
• FROM CUSTOMERS
• LEFT JOIN ORDERS
• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
RIGHT JOIN
• The SQL RIGHT JOIN returns all rows from the right table, even if there are no matches in the left table.
• This means that if the ON clause matches 0 (zero) records in left table, the join will still return a row in
the result, but with NULL in each column from left table.
• This means that a right join returns all the values from the right table, plus matched values from the left
table or NULL in case of no matching join predicate.
• Syntax:
• SELECT table1.column1, table2.column2 ...
• FROM table1
• RIGHT JOIN table2
• ON table1.common_field = table2.common_field;
Example:
• SELECT ID, NAME, AMOUNT, DATE
• FROM CUSTOMERS
• RIGHT JOIN ORDERS
• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
FULL JOIN
• The SQL FULL JOIN combines the results of both left and right outer joins.
• The joined table will contain all records from both tables, and fill in NULLs
for missing matches on either side.
• Syntax:
• SELECT ID, NAME, AMOUNT, DATE
• FROM CUSTOMERS
• FULL JOIN ORDERS
• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
• Note: MySQL does not support FULL JOIN directly; the same result can be produced by taking the UNION of a LEFT JOIN and a RIGHT JOIN.
SELF JOIN
• The SQL SELF JOIN is used to join a table to itself as if the table were two tables,
temporarily renaming at least one table in the SQL statement.
• Syntax:
• SELECT a.column_name, b.column_name ...
• FROM table1 a, table1 b
• WHERE a.common_field = b.common_field;
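• As a sketch, a self join pairing each customer with every customer who earns more (the comparison column is illustrative):

SELECT a.ID, b.NAME, a.SALARY
FROM CUSTOMERS a, CUSTOMERS b
WHERE a.SALARY < b.SALARY;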
SQL UNION Clause
• The SQL UNION clause/operator is used to combine the results of two or more SELECT statements without returning any duplicate rows.
• To use UNION, each SELECT must have the same number of columns selected, the same number of column expressions, the same data types, and have them in the same order, but they do not have to be the same length.
• Syntax:
• SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition]
• UNION
• SELECT column1 [, column2 ] FROM table1 [, table2 ] [WHERE condition];
• Example:
• SELECT ID, NAME, AMOUNT, DATE
• FROM CUSTOMERS
• LEFT JOIN ORDERS
• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID
• UNION
• SELECT ID, NAME, AMOUNT, DATE
• FROM CUSTOMERS
• RIGHT JOIN ORDERS
• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
The UNION ALL Clause:
• The UNION ALL operator is used to combine the results of two SELECT statements including duplicate rows.
• The same rules that apply to UNION apply to the UNION ALL operator.
• Syntax:
• SELECT column1 [, column2 ]
• FROM table1 [, table2 ]
• [WHERE condition]
• UNION ALL
• SELECT column1 [, column2 ]
• FROM table1 [, table2 ]
• [WHERE condition];
Example:
• SELECT ID, NAME, AMOUNT, DATE
• FROM CUSTOMERS
• LEFT JOIN ORDERS
• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID
• UNION ALL
• SELECT ID, NAME, AMOUNT, DATE
• FROM CUSTOMERS
• RIGHT JOIN ORDERS
• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
INTERSECT Clause
• The SQL INTERSECT clause/operator is used to combine two SELECT statements, but returns rows only
from the first SELECT statement that are identical to a row in the second SELECT statement. This
means INTERSECT returns only common rows returned by the two SELECT statements.
• Just as with the UNION operator, the same rules apply when using the INTERSECT operator. MySQL does not support the INTERSECT operator.
• Syntax:
• The basic syntax of INTERSECT is as follows:
• SELECT column1 [, column2 ]
• FROM table1 [, table2 ]
• [WHERE condition]
• INTERSECT
• SELECT column1 [, column2 ]
• FROM table1 [, table2 ]
• [WHERE condition]
Example:
• SELECT ID, NAME, AMOUNT, DATE
• FROM CUSTOMERS
• LEFT JOIN ORDERS
• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID
• INTERSECT
• SELECT ID, NAME, AMOUNT, DATE
• FROM CUSTOMERS
• RIGHT JOIN ORDERS
• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
EXCEPT Clause
• The SQL EXCEPT clause/operator is used to combine two SELECT statements and returns rows
from the first SELECT statement that are not returned by the second SELECT statement.
• This means EXCEPT returns only the rows that are not available in the second SELECT statement.
• Just as with the UNION operator, the same rules apply when using the EXCEPT operator. MySQL does not support the EXCEPT operator.
• Syntax:
• The basic syntax of EXCEPT is as follows:
• SELECT column1 [, column2 ]
• FROM table1 [, table2 ]
• [WHERE condition]
• EXCEPT
• SELECT column1 [, column2 ]
• FROM table1 [, table2 ]
• [WHERE condition]
Example:
• SELECT ID, NAME, AMOUNT, DATE
• FROM CUSTOMERS
• LEFT JOIN ORDERS
• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID
• EXCEPT
• SELECT ID, NAME, AMOUNT, DATE
• FROM CUSTOMERS
• RIGHT JOIN ORDERS
• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
SQL Alias Syntax
• You can rename a table or a column temporarily by giving another name known as alias.
• The use of table aliases means to rename a table in a particular SQL statement.
• The renaming is a temporary change and the actual table name does not change in the database.
• The column aliases are used to rename a table's columns for the purpose of a particular SQL
query.
• Syntax:
• The basic syntax of table alias is as follows:
• SELECT column1, column2....
• FROM table_name AS alias_name
• WHERE [condition];
• The basic syntax of column alias is as follows:
• SELECT column_name AS alias_name
• FROM table_name
• WHERE [condition];
Example:
• SELECT C.ID, C.NAME, C.AGE, O.AMOUNT
• FROM CUSTOMERS AS C, ORDERS AS O
• WHERE C.ID = O.CUSTOMER_ID;
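• A sketch of column aliases on the same tables (the alias names are illustrative):

SELECT ID AS CUSTOMER_ID, NAME AS CUSTOMER_NAME
FROM CUSTOMERS
WHERE SALARY IS NOT NULL;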
SQL Views
• A view is nothing more than an SQL statement that is stored in the database with an associated name; it is a composition of a table in the form of a predefined SQL query.
• A view can contain all rows of a table or selected rows from a table.
• A view can be created from one or many tables which depends on the written SQL query to create a
view.
• Views, which are kind of virtual tables, allow users to do the following:
• Structure data in a way that users or classes of users find natural or intuitive.
• Restrict access to the data such that a user can see and (sometimes) modify exactly what they
need and no more.
• Summarize data from various tables which can be used to generate reports.
Creating Views:
• Database views are created using the CREATE VIEW statement. Views can be created from
a single table, multiple tables, or another view.
• To create a view, a user must have the appropriate system privilege according to the specific
implementation.
• You can include multiple tables in your SELECT statement in very similar way as you use
them in normal SQL SELECT query.
Syntax:
• CREATE VIEW view_name AS
• SELECT column1, column2 ...
• FROM table_name
• [WHERE condition];
Example:
• CREATE VIEW CUSTOMERS_VIEW AS
• SELECT NAME, AGE
• FROM CUSTOMERS;
Updating a View:
• A view can be updated under certain conditions. For example, the following query updates the age of Ramesh through CUSTOMERS_VIEW:
• UPDATE CUSTOMERS_VIEW
• SET AGE = 35
• WHERE NAME = 'Ramesh';
SQL HAVING Clause
• The HAVING clause enables you to specify conditions that filter which group results appear in the final output.
• The WHERE clause places conditions on the selected columns, whereas the HAVING clause places conditions on groups created by the GROUP BY clause.
• Syntax:
• SELECT column1, column2
• FROM table1, table2
• WHERE [ conditions ]
• GROUP BY column1, column2
• HAVING [ conditions ]
• ORDER BY column1, column2;
• Example: to list the ages that appear two or more times in the CUSTOMERS table −
• SELECT AGE, COUNT(AGE)
• FROM CUSTOMERS
• GROUP BY AGE
• HAVING COUNT(AGE) >= 2;