
MULTI-USER RELATIONAL DATABASE MANAGEMENT
NTA LEVEL 6
Introduction to RDBMS
OVERVIEW
• A database is "a collection of related data", and data is a collection of facts and figures that can be processed to produce information.
• Mostly, data represents recordable facts.
• Data aids in producing information, which is based on facts.
• For example, if we have data about the marks obtained by all students, we can then draw conclusions about toppers and average marks.
• A database management system stores data in such a way that it becomes easier to retrieve, manipulate, and produce information.


Database
• Database = a collection of related data with:
1. A logically coherent structure (can be characterized as a whole)
2. Some inherent meaning (represents some partial view of a portion of the real world)
3. A specific purpose, an intended group of users, and applications
4. A largely varying size (from a personal list of addresses to the National Register of Persons)
5. A scope or content of varying breadth (from a personal list of addresses to a multimedia encyclopedia)
6. A physical organization of varying complexity (from a manual personal list, managed with simple files, to huge multi-user databases with geographically distributed data and access)
7. Logically coordinated objectives: data is defined once for a community of users and accessed by various applications with specific needs
Data Management
• Data management deals with managing large amounts of information, which involves both the storage of information and the provision of mechanisms for its manipulation.
• In addition, the system should ensure the safety of the stored information under various circumstances, such as multiple-user access.
• The two approaches to managing data are:
1. File-based systems
2. Database systems
File-based Systems
• File-based systems are an early approach to data management in which data is stored and managed in flat files, such as text files, CSV files, or binary files. These systems were widely used before the advent of modern database management systems (DBMS), and the files were typically accessed by a computer operator.
• Files of archived data were called tables because they looked like the tables used in traditional record keeping.
When to Use File-Based Systems
1. Small, simple applications:
• When the data requirements are minimal, and the application does not need advanced features like transactions or concurrency control.
2. Temporary or prototype systems:
• For quick prototyping or temporary solutions before migrating to a database system.
3. Data exchange or interoperability:
• When data needs to be shared between systems in a standardized format (e.g., CSV, JSON).
4. Low budget or resource constraints:
• When there is no budget or infrastructure for a full-fledged database system.
How it works?
[Diagram: the Accounts, HRM, and Production departments each keep a separate pool of Accounts, HR, and Production data.]
a) Each department maintains its own set of data.
b) There is no link between those data pools.
Advantages of file-based systems
• No need for external storage.
• No need for a highly technical person to handle the database.
• Processing speed is high as compared to a DBMS.
• Low cost (e.g., equipment).

Disadvantages of file-based systems
• Provide less security.
• High complexity in updating the database.
• Data separation and isolation.
• Data dependence.
• Data duplication.
• Incompatible data (different file formats).
• Lack of flexibility in organizing and querying the data.
• Increased number of different application programs.

Database Systems
• Database systems evolved in the late 1960s to address common issues in data-intensive applications handling large volumes of data.
• Some of these issues can be traced back to the disadvantages of file-based systems.
• Databases are used to store data in an efficient and organized manner.
• A database allows quick and easy management of data. For example, a company may maintain details of its employees in various databases.
• At any point in time, data can be retrieved from the database, new data can be added, and data can be searched based on some criteria.
• Data storage can be achieved even with simple manual files. For instance, a college has to maintain information about teachers, students, subjects, and examinations. But such registers or files are bulky, consume a lot of space, and hence cannot be kept for many years.
Advantages of database systems

• The amount of redundancy in the stored data can be reduced

• No more inconsistencies in data

• The stored data can be shared

• Standards can be set and followed

• Data Integrity can be maintained

• Security of data can be implemented


Comparison with File-based Systems

Feature             | File-Based Systems             | Database Systems
Data Management     | Manual, application-controlled | Centralized, DBMS-controlled
Data Integrity      | Limited or none                | Enforced through constraints
Data Redundancy     | High                           | Minimized through normalization
Scalability         | Poor                           | High
Security            | Limited                        | Robust (authentication, access control)
Concurrency Control | Difficult to manage            | Built-in mechanisms
Query Capabilities  | Limited                        | Advanced (SQL, indexing, etc.)

Database Management System (DBMS)
• A DBMS can be defined as a collection of related records and a set of programs that access and
manipulate these records.

• A DBMS enables the user to enter, store, and manage data.

• The main problem with the earlier DBMS packages was that the data was stored in the flat file
format.

• So, the information about different objects was maintained separately in different physical files.

• Hence, the relations between these objects, if any, had to be maintained in a separate physical
file.

• Thus, a single package would consist of too many files, and vast functionality was needed to integrate them into a single system.
DBMS
DBMS Functions
• DBMS: a collection of general-purpose, application-independent programs providing
services to;
1. Define the structure of a database, i.e., data types and constraints that the data will have to
satisfy

2. Manage the storage of data, safely for long periods of time, on some storage medium
controlled by the DBMS

3. Manipulate a database, with efficient user interfaces to query the database to retrieve specific data, update the database to reflect changes in the world, and generate reports from the data

4. Manage database usage: users with their access rights, performance optimization, sharing of
data among several users, security from accidents or unauthorized use

5. Monitor and analyze database usage


Characteristics
• Real-world entity
• Relation-based tables
• Isolation of data and application
• Less redundancy
• Consistency
• Query Language
• ACID (Atomicity, Consistency, Isolation, and Durability) properties
• Multiuser and Concurrent Access
• Multiple views
• Security
DBMS Users
• A typical DBMS has users with different rights and permissions who use it for
different purposes. Some users retrieve data and some back it up.

• The users of a DBMS can be broadly categorized as follows:


DBMS Architecture
• The design of a DBMS depends on its architecture.
• It can be:
1. Centralized
2. Decentralized
3. Hierarchical
• The architecture of a DBMS can be seen as either:
1. Single-tier
2. Multi-tier

• An n-tier architecture divides the whole system into related but independent n
modules, which can be independently modified, altered, changed, or replaced.
Database Models
Database Models
• Logical
• Network Model
• Relational Model
• Entity-Relationship Model
• Object-Oriented Model
• Hierarchical Database Model

• Physical
• Inverted Index
• Flat File
Flat File Data Model
• In this model, the database consists of only one table or file.

• This model is used for simple databases - for example, to store the roll numbers,
names, subjects, and marks of a group of students.

• This model cannot handle very complex data.

• It can cause redundancy when data is repeated more than once.


Hierarchical Data Model

• In the Hierarchical Model, different records are inter-related through hierarchical or tree-like structures.
• In this model, relationships are thought of in terms of children and parents.
• A parent record can have several children, but a child can have only one parent.
• To find data stored in this model, the user needs to know the structure of the tree.
• The Windows Registry is an example of a hierarchical database, storing configuration settings and options on Microsoft Windows operating systems.
Example
Advantages of the hierarchical model
• The advantages of a hierarchical model are as follows:
• Data is held in a common database, so data sharing becomes easier, and security is provided and enforced by the DBMS.
• Data independence is provided by the DBMS, which reduces the effort and cost of maintaining programs.
• This model is very efficient when a database contains a large volume of data.
• For example, a bank's customer account system fits the hierarchical model well because each customer's account is subject to a number of transactions.
Network Data Model
• In the network model, data is stored in sets, instead of the hierarchical tree format.

• This solves the problem of data redundancy.

• The set theory of the network model does not use a single-parent tree hierarchy.

• It allows a child to have more than one parent.

• Thus, the records are physically linked through linked-lists.

• Integrated Database Management System (IDMS) from Computer Associates International Inc.
and Raima Database Manager (RDM) Server by Raima Inc. are examples of a Network DBMS.

• The network model, together with the hierarchical data model, was a major data model for implementing numerous commercial DBMSs.
Example
The components of network models
• The components of the language used with network models are as follows:

1. A Data Definition Language (DDL) that is used to create and remove databases and
database objects. It enables the database administrator to define the schema
components.

2. A sub-schema DDL that enables the database administrator to define the database
components.

3. A Data Manipulation Language (DML), which is used to insert, retrieve, and modify
database information. All database users use these commands during the routine
operation of the database.

4. Data Control Language (DCL) is used to administer permissions on the databases and
database objects.
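These same three language categories survive directly in modern SQL. A minimal sketch, assuming a hypothetical Employee table and an already-existing clerk_role (neither name comes from the original slides):

```sql
-- DDL: define schema objects
CREATE TABLE Employee (
    employee_id INT PRIMARY KEY,
    name        VARCHAR(50) NOT NULL,
    dept        VARCHAR(30)
);

-- DML: insert, retrieve, and modify data
INSERT INTO Employee (employee_id, name, dept) VALUES (1, 'Asha', 'Accounts');
SELECT name FROM Employee WHERE dept = 'Accounts';
UPDATE Employee SET dept = 'HRM' WHERE employee_id = 1;

-- DCL: administer permissions on the objects
GRANT SELECT ON Employee TO clerk_role;
REVOKE SELECT ON Employee FROM clerk_role;
```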
Advantages Of The Network Model
• The advantages of such a structure are specified as follows:
• The relationships are easier to implement in the network database model than in the hierarchical model.
• This model enforces database integrity.
• This model achieves sufficient data independence.
Disadvantages Of The Network Model
• The disadvantages are specified as follows:
• The databases in this model are difficult to design.
• The programmer has to be very familiar with the internal structures to access the database.
• The model provides a navigational data access environment. Hence, to move from A to E in the sequence A-B-C-D-E, the user has to move through B, C, and D to get to E.
• This model is difficult to implement and maintain. Computer programmers, rather than end users, utilize this model.
Relational Data Model
• As the information needs grew and more sophisticated databases and applications were
required, database design, management, and use became too cumbersome.

• The lack of a query facility meant that programmers spent a lot of time producing even the simplest reports.

• This led to the development of what came to be called the Relational Model database.

• The term 'Relation' is derived from the set theory of mathematics.

• In the Relational Model, unlike the Hierarchical and Network models, there are no physical
links.

• All data is maintained in the form of tables consisting of rows and columns.

• Data in two tables is related through common columns and not physical links.

• Operators are provided for operating on rows in tables.


Example
Example 2

Marks Table
Students Table
The Students table displays the Roll Number and the Student Name, and the Marks
table displays the Roll Number and Marks obtained by the students.

Displaying Student Names and Marks Obtained Above 50
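A minimal SQL sketch of that report, assuming the two tables are named Students(roll_number, student_name) and Marks(roll_number, marks) as described above (the exact names are illustrative):

```sql
SELECT s.student_name, m.marks
FROM   Students s
JOIN   Marks    m ON m.roll_number = s.roll_number  -- related via the common column
WHERE  m.marks > 50;
```

Note that the two tables are related only through the shared roll_number column, not through any physical link.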


Advantages Of The Relational Model

• The relational database model gives the programmer time to concentrate on the logical
view of the database rather than being bothered about the physical view.

• One of the reasons for the popularity of the relational databases is the querying flexibility.

• Most of the relational databases use Structured Query Language (SQL).

• An RDBMS uses SQL to translate the user query into the technical code required to
retrieve the requested data.

• The relational model is so easy to handle that even untrained people find it easy to generate handy reports and queries, without giving much thought to the need to design a proper database.
Disadvantages Of The Relational Model
• Though the model hides all the complexities of the system, it tends to be slower than the other database systems.
• Even so, compared to all other models, the relational data model is the most popular and widely used.
Relational Database Management System (RDBMS)
• The Relational Model is an attempt to simplify database structures.
• It represents all data in the database as simple row-column tables of data values.
• An RDBMS is a software program that helps to create, maintain, and manipulate a relational database.
• A relational database is a database divided into logical units called tables, where tables are related to one another within the database.
Example
Terms related to RDBMS
• There are certain terms that are most often used in an RDBMS. These are described as follows:
1. Data is presented as a collection of relations.
2. Each relation is depicted as a table.
3. Columns are attributes.
4. Rows ('tuples') represent entities.
5. Every table has a set of attributes that, taken together, form a 'key' (technically, a 'superkey'), which uniquely identifies each entity.
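As an illustrative sketch of how these terms map onto SQL (the Student relation and its attributes are hypothetical):

```sql
-- One relation (table) with three attributes (columns);
-- roll_number is the key that uniquely identifies each tuple.
CREATE TABLE Student (
    roll_number INT PRIMARY KEY,
    name        VARCHAR(50),
    grade       CHAR(1)
);

-- Each inserted row is one tuple, representing one entity:
INSERT INTO Student (roll_number, name, grade) VALUES (101, 'Adam', 'A');
```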
Example
RDBMS Users

• Database Administrator (DBA)

• Database Designer

• System Analysts and Application Programmers

• DBMS Designers and Implementers

• End User
The components of an RDBMS
1. Entity

• An entity is a person, place, thing, object, event, or even a concept, which can be
distinctly identified. For example, the entities in a university are students, faculty
members, and courses.

• Each entity has certain characteristics known as attributes. For example, the
student entity might include attributes such as student number, name, and grade.
Each attribute should be named appropriately.

• A grouping of related entities becomes an entity set. Each entity set is given a
name. The name of the entity set reflects the contents.

• Thus, the attributes of all the students of the university will be stored in an entity
set called Student.
2. Tables and their Characteristics

• A table contains a group of related entities, that is, an entity set.

• The terms entity set and table are often used interchangeably.

• A table is also called a relation.

• The rows are known as tuples.

• The columns are known as attributes.


The Characteristics Of A Table
• A two-dimensional structure composed of rows and columns is perceived as a table.

• Each tuple represents a single entity within the entity set.

• Each column has a distinct name.

• Each row/column intersection represents a single data value.

• Each table must have a key, known as the primary key, that uniquely identifies each row.

• All values in a column must conform to the same data format. For example, if the attribute is
assigned a decimal data format, all values in the column representing that attribute must be in
decimals.

• Each column has a specific range of values known as the attribute domain.

• Each row carries information describing one entity occurrence.

• The order of the rows and columns is immaterial in a DBMS.


Differences between a DBMS and an RDBMS
Data Schema
• A database schema is the skeleton structure that represents the logical view of the entire database.
• It defines how the data is organized and how the relations among the data are associated.
• It formulates all the constraints that are to be applied on the data.
• A database schema defines its entities and the relationships among them.
• It contains a descriptive detail of the database, which can be depicted by means of schema diagrams.
• It is the database designers who design the schema to help programmers understand the database and make it useful.
Example
Schema Categories
• A database schema can be divided broadly into two categories:
• Physical Database Schema: This schema pertains to the actual storage of data and its form of storage, like files, indices, etc. It defines how the data will be stored in secondary storage.
• Logical Database Schema: This schema defines all the logical constraints that need to be applied on the data stored. It defines tables, views, and integrity constraints.
Database Instance
• A database state at a specific time, defined by the currently existing content, relationships, and their attributes, is called a database instance.
• The following illustration shows that a database schema can be looked at like a template or building plan for one or several database instances.
Schema vs Instance
• The database schema is the skeleton of a database. It is designed when the database doesn't exist at all.
• Once the database is operational, it is very difficult to make any changes to it.
• A database schema does not contain any data or information.
• A database instance is a state of an operational database, with data, at a given point in time.
• It contains a snapshot of the database.
• Database instances tend to change with time.
• A DBMS ensures that every instance (state) is in a valid state by diligently following all the validations, constraints, and conditions that the database designers have imposed.
Data Independence
• The ability to modify the schema definition at one level without affecting the schema definition at the next higher level is called data independence.
• There are two levels of data independence:
• Physical data independence
• Logical data independence

Physical data independence
• Physical data independence is the ability to modify the physical schema without causing application programs to be rewritten.
• Modifications at the physical level are occasionally necessary to improve performance.
• It means we can change the physical storage level without affecting the conceptual or external view of the data.
• The new changes are absorbed by mapping techniques.
• For example, a change to the internal schema, such as using a different file organization or storage structure, storage device, or indexing strategy, should be possible without having to change the conceptual or external schemas.
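A small SQL illustration of the idea, reusing the hypothetical Student table from earlier: creating an index changes only the physical access path, so queries written against the logical schema keep working unchanged.

```sql
-- Written against the logical schema only:
SELECT name FROM Student WHERE roll_number = 101;

-- A purely physical change; the query above (and any application
-- issuing it) does not need to be rewritten:
CREATE INDEX idx_student_roll ON Student (roll_number);
```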
Logical data independence
• Logical data independence is the ability to modify the logical schema without causing application programs to be rewritten.
• Modifications at the logical level are necessary whenever the logical structure of the database is altered (for example, when money-market accounts are added to a banking system).
• Logical data independence means that if we add some new columns to a table or remove some columns from it, the user views and programs should not change.
• For example, consider two users A and B, both selecting the fields "EmployeeNumber" and "EmployeeName".
• If user B adds a new column (e.g., salary) to his table, it will not affect the external view for user A, though the internal schema of the database has been changed for both users A and B.
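A hedged sketch of the same scenario in SQL (the Employee table, the emp_basic view, and the salary column are illustrative; ALTER TABLE syntax varies slightly between products):

```sql
-- User A reads through a view exposing only two columns:
CREATE VIEW emp_basic AS
SELECT EmployeeNumber, EmployeeName FROM Employee;

-- User B extends the base table with a salary column:
ALTER TABLE Employee ADD COLUMN salary DECIMAL(10,2);

-- emp_basic still returns exactly the same columns, so user A's
-- applications keep working unchanged:
SELECT * FROM emp_basic;
```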
Data Modelling
Data Modelling
• A data model is a group of conceptual tools that describes data, its relationships, and semantics.
• Data modelling is the process of applying an appropriate data model to the data in order to organize and structure it.
• Data modelling is as essential to database development as planning and designing are to any project development.
• Data models help database developers to define the relational tables, primary and foreign keys, stored procedures, and triggers required in the database.
Steps for Data Modelling
1. Conceptual Data Modeling
• The data modeler identifies the highest level of relationships in the data.
2. Logical Data Modeling
• The data modeler describes the data and its relationships in detail, creating a logical model of the database.
3. Physical Data Modeling
• The data modeler specifies how the logical model is to be realized physically.
Conceptual Data Model
• A conceptual data model identifies the highest-level relationships between the
different entities.

• Features of conceptual data model include:

• Describes the semantics of a domain, being the scope of the model. For
example, it may be a model of the interest area of an organization or
industry.
• Includes the important entities and the relationships among them.

• No attribute is specified.

• No primary key is specified.


• We can see that the only information shown in the conceptual data model is the entities that describe the data and the relationships between those entities.
• No other information is shown through the conceptual data model.


Logical Data Model
• A logical data model describes the data in as much detail as possible, without regard to how it will be physically implemented in the database.

• Features of a logical data model include:


• Includes all entities and relationships among them.
• All attributes for each entity are specified.
• The primary key for each entity is specified.
• Foreign keys (keys identifying the relationship between different entities)
are specified.
• Normalization occurs at this level.
Steps For Designing The Logical Data Model

• The steps for designing the logical data model are as follows:

• Specify primary keys for all entities.

• Find the relationships between different entities.

• Find all attributes for each entity.

• Resolve many-to-many relationships.

• Normalization.
Logical vs Conceptual

• Comparing the logical data model shown above with the conceptual data model diagram,
we see the main differences between the two:
• In a logical data model, primary keys are present, whereas in a conceptual data model,
no primary key is present.
• In a logical data model, all attributes are specified within an entity.

• No attributes are specified in a conceptual data model.

• Relationships between entities are specified using primary keys and foreign keys in a
logical data model.
• In a conceptual data model, the relationships are simply stated, not specified, so we simply know that two entities are related, but we do not specify what attributes are used for this relationship.
Physical Data Model
• Physical data model represents how the model will be built in the database.

• A physical database model shows all table structures, including column name, column data type,
column constraints, primary key, foreign key, and relationships between tables.

• Features of a physical data model include:

• Specification of all tables and columns.

• Foreign keys are used to identify relationships between tables.

• Denormalization may occur based on user requirements.

• Physical considerations may cause the physical data model to be quite different from the logical
data model.

• Physical data model will be different for different RDBMS.

• For example, data type for a column may be different between MySQL and SQL Server.
Example
Steps For Physical Data Model Design

• The steps for physical data model design are as follows:

• Convert entities into tables.

• Convert relationships into foreign keys.

• Convert attributes into columns.

• Modify the physical data model based on physical constraints /

requirements.
Physical vs Logical

• Comparing the physical data model shown above with the logical data model

diagram, we see the main differences between the two:

• Entity names are now table names.

• Attributes are now column names.

• Data type for each column is specified.

• Data types can be different depending on the actual database being used.
Entity Relationship Model

• The ER model defines the conceptual view of a database.

• It works around real-world entities and the associations among them.

• At view level, the ER model is considered a good option for designing databases.

• Data models can be classified into three different groups:

• Object-based logical models

• Record-based logical models

• Physical model

• The Entity-Relationship (E-R) model belongs to the first classification.


Example
The model is based on a simple idea.
• Data can be perceived as real-world objects called entities and the relationships that exist
between them.
• For example, the data about employees working for an organization can be perceived as a
collection of employees and a collection of the various departments that form the
organization.
• Both employee and department are real-world objects.
• An employee belongs to a department. Thus, the relation 'belongs to' links an employee to a
particular department.
Basic components of ER Model
• An E-R model consists of five basic components. They are as follows:
• Entity

• Entity Set

• Attributes

• Relationship

• Relationship set
Entity
• An entity is a real-world object that exists physically and is distinguishable from other objects.
• For example, employee, department, student, customer, vehicle, and account are entities.
• An entity set is a collection of similar types of entities. An entity set may contain entities with attributes sharing similar values.
• For example, a Students set may contain all the students of a school; likewise, a Teachers set may contain all the teachers of a school from all faculties.
• Entity sets need not be disjoint.

Attributes
• Attributes are features that an entity has. Attributes help distinguish every entity from another.
• For example, the attributes of a student would be roll_number, name, stream, semester, and so on.
• There exists a domain or range of values that can be assigned to attributes.
• For example, a student's name cannot be a numeric value; it has to be alphabetic. A student's age cannot be negative, etc.
Entity Sets
• Entity set types:
• Entity sets that do not have enough attributes to establish a primary key are called weak entity sets.
• Entity sets that have enough attributes to establish a primary key are called strong entity sets.
• For example, an assignment and a student can be considered as two separate entities.
• The assignment entity is described by the attributes assignment_number and subject.
• The student entity is described by roll_number, name, and semester.
• The assignment entities can be grouped to form an assignment entity set, and the student entities can be grouped to form a student entity set. The entity sets are associated by the relation 'submitted by'.
• The attributes assignment_number and subject are not enough to identify an assignment entity uniquely.
• The roll_number attribute alone is enough to uniquely identify any student entity. Therefore, roll_number is a primary key for the student entity set.
• The assignment entity set is a weak entity set, since it lacks a primary key.
• The student entity set is a strong entity set due to the presence of the roll_number attribute.
Types of Attributes

• Simple attribute
• Simple attributes are atomic values, which cannot be divided further. For example, a
student's phone number is an atomic value of 10 digits.

• Composite attribute
• Composite attributes are made of more than one simple attribute. For example, a
student's complete stud_name may be broken into stud_first_name + stud_last_name.

• Derived attribute
• Derived attributes are the attributes that do not exist in the physical database, but
their values are derived from other attributes present in the database. For example,
average_salary
• Single-value attribute
• Single-value attributes contain a single value. For example: Social_Security_Number, or stu_LastName, since a student usually has one last name.
• Multi-value attribute
• Multi-value attributes may contain more than one value. For example, a person can have more than one phone number, email_address, etc.
• Stored Attribute
• An attribute that cannot be derived from other attributes is known as a stored attribute. For example, the BirthDate of an employee.
• Key Attribute
• A key attribute has a distinct value for each entity/element in an entity set. For example, Roll number in a Student entity type.


• These attribute types can come together in a way like:

• simple single-valued attributes

• simple multi-valued attributes

• composite single-valued attributes

• composite multi-valued attributes
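As a rough sketch of how these attribute types typically land in SQL tables (all names hypothetical): composite attributes are stored as their simple parts, and multi-valued attributes are usually moved to a separate table with one row per value.

```sql
CREATE TABLE Student (
    roll_number     INT PRIMARY KEY,   -- key attribute
    stud_first_name VARCHAR(30),       -- part of the composite stud_name
    stud_last_name  VARCHAR(30),       -- part of the composite stud_name
    birth_date      DATE               -- stored attribute (age can be derived from it)
);

-- Multi-valued attribute: one row per phone number.
CREATE TABLE StudentPhone (
    roll_number INT REFERENCES Student (roll_number),
    phone       VARCHAR(15),
    PRIMARY KEY (roll_number, phone)
);
```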


Entity-Set and Keys
• A key is an attribute or collection of attributes that uniquely identifies an entity among an entity set.
• For example, the roll_number of a student makes him/her identifiable among students.
• Super Key − A set of attributes (one or more) that collectively identifies an entity in an entity set.
• Candidate Key − A minimal super key is called a candidate key. An entity set may have more than one candidate key.
• Primary Key − A primary key is one of the candidate keys chosen by the database designer to uniquely identify the entity set.
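A small illustrative example (hypothetical columns): roll_number and email are both candidate keys; the designer picks roll_number as the primary key, while any superset such as (roll_number, name) remains a super key but is not minimal.

```sql
CREATE TABLE Student (
    roll_number INT PRIMARY KEY,              -- the chosen candidate key
    email       VARCHAR(80) UNIQUE NOT NULL,  -- remaining (alternate) candidate key
    name        VARCHAR(50)
);
```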


Relationship
• The association among entities is called a
relationship.
• For example, an employee works at a department, a
student enrolls in a course. Here, Works at and
Enrolls are called relationships.
• Relationship Set

• A set of relationships of similar type is called a


relationship set.
• Like entities, a relationship too can have attributes.

• These attributes are called descriptive attributes.


Degree of Relationship
• The number of participating entities in a relationship defines the degree of the relationship.
• Self-relationships
• Binary = degree 2
• Ternary = degree 3
• n-ary = degree n
Self-relationships
• Relationships between entities of the same entity set are called self-
relationships.

• For example, a manager and his team member, both belong to the employee
entity set.

• The team member works for the manager.

• Thus, the relation, 'works for', exists between two different employee entities
of the same employee entity set.
Binary relationships
• Relationships that exist between entities of two different entity sets are called binary
relationships.

• For example, an employee belongs to a department.

• The relation exists between two different entities, which belong to two different entity
sets.

• The employee entity belongs to an employee entity set.

• The department entity belongs to a department entity set.


Ternary relationships
• Relationships that exist between three entities of different entity sets are
called ternary relationships.

• For example, an employee works in the accounts department at the regional


branch.

• The relation, 'works' exists between all three, the employee, the department,
and the location.
ER Model to Relational Model
Mapping Cardinalities
• Relationships can also be classified as per mapping cardinalities.
• The different mapping cardinalities are as follows:
• One-to-one
• One-to-many
• Many-to-one
• Many-to-many
One-to-one
• One entity from entity set A can be associated with at most one entity of entity set B, and vice versa.
• Consider the relationship between a vehicle and its registration. Every vehicle has a unique registration. No two vehicles can have the same registration details. The relation is one-to-one, that is, one vehicle - one registration.
One-to-many
• This kind of mapping exists when an entity of one set can be associated with more than one
entity of another entity set.

• Consider the relation between a customer and the customer's vehicles.

• A customer can have more than one vehicle. Therefore, the mapping is a one to many mapping,
that is, one customer - one or more vehicles.
Many-to-One
• This kind of mapping exists when many entities of one set are associated with an entity of another set. This association is made irrespective of whether the latter entity is already associated with one or more entities of the former entity set.

• Consider the relation between a vehicle and its manufacturer. Every vehicle has only one manufacturing
company or coalition associated to it under the relation, 'manufactured by', but the same company or coalition
can manufacture more than one kind of vehicle.
Many-to-Many
• This kind of mapping exists when any number of entities of one set can be associated with any
number of entities of the other entity set.

• Consider the relation between a bank's customer and the customer's accounts. A customer can
have more than one account and an account can have more than one customer associated with it
in case it is a joint account or similar. Therefore, the mapping is many-to-many, that is, one or
more customers associated with one or more accounts.
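In a relational schema, a many-to-many mapping such as this is commonly realized with a third "junction" table holding pairs of keys. A minimal sketch with hypothetical names:

```sql
CREATE TABLE Customer (
    customer_id INT PRIMARY KEY,
    name        VARCHAR(50)
);

CREATE TABLE Account (
    account_number INT PRIMARY KEY,
    balance        DECIMAL(12,2)
);

-- Each row pairs one customer with one account; a joint account
-- simply appears in several rows.
CREATE TABLE CustomerAccount (
    customer_id    INT REFERENCES Customer (customer_id),
    account_number INT REFERENCES Account (account_number),
    PRIMARY KEY (customer_id, account_number)
);
```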
Entity-Relationship Diagrams
• The E-R diagram is a graphical representation of the E-R model. The E-R diagram, with
the help of various symbols, effectively represents various components of the E-R
model.
Example
[The slide shows the E-R diagram symbols for entities, attributes, multivalued attributes, composite attributes, and derived attributes.]
Relationship Notations

• A relationship where two entities are participating is called a binary relationship.

• One-to-one (1:1)

• One-to-many (1:N)

• Many-to-many (N:N)
Participation Constraints
• Total participation − Each entity is involved in the relationship. Total participation is represented by double lines.
• Partial participation − Not all entities are involved in the relationship. Partial participation is represented by single lines.
ER Diagram
Steps to construct an E-R diagram are as follows:
• Step 1: Gather data
• The bank is a collection of accounts used by customers to save money.
• Step 2: Identify entities
• 1. Customer
• 2. Account
• Step 3: Identify the attributes
• 1. Customer: customer_name, customer_address, customer_contact
• 2. Account: account_number, account_owner, balance_amount
• Step 4: Sort entity sets
• 1. Customer entity set: weak entity set
• 2. Account entity set: strong entity set
• Step 5: Sort attributes
• 1. Customer entity set: customer_address - composite, customer_contact - multi-valued
• 2. Account entity set: account_number - primary key, account_owner - multi-valued
• Step 6: Identify relations
• A customer 'saves in' an account. The relation is 'saves in'.
Bank ER Diagram
Generalization Aggregation
• The ER Model has the power of expressing database entities in a conceptual hierarchical manner.
• Going up in this structure is called generalization, where entities are clubbed together to represent a more generalized view.
• For example, a particular student named Mira can be generalized along with all the students.
• The entity shall be a student, and further, the student is a person.
• The reverse is called specialization, where a person is a student, and that student is Mira.
Generalization Example
• Pigeon, house sparrow, crow and dove can all be
generalized as Birds.
Specialization
• Specialization is the opposite of generalization. In specialization, a group of entities is divided into sub-groups based on their characteristics.
• Take a group 'Person' for example. A person has a name, date of birth, gender, etc.
• These properties are common to all persons, all human beings.
• But in a company, persons can be identified as employee, employer, customer, or vendor, based on what role they play in the company.
Inheritance
• Inheritance is an important feature of generalization and specialization. It allows lower-level entities to inherit the attributes of higher-level entities.
• For example, the attributes of a Person class, such as name, age, and gender, can be inherited by lower-level entities such as Student or Teacher.
Codd's 12 Rules
• These rules can be applied to any database system that manages stored data using only its relational capabilities. This statement is the foundation rule (Rule 0), which acts as a base for all the other rules.
• Rule 1: Information Rule
• The data stored in a database, be it user data or metadata, must be a value of some table cell. Everything in a database must be stored in a table format.
• Rule 2: Guaranteed Access Rule
• Every single data element (value) is guaranteed to be accessible logically with a combination of table-name, primary-key (row value), and attribute-name (column value). No other means, such as pointers, can be used to access data.
• Rule 3: Systematic Treatment of NULL Values
• The NULL values in a database must be given a systematic and uniform treatment. This is a very important rule because a NULL can be interpreted as one of the following: data is missing, data is not known, or data is not applicable.
• Rule 4: Active Online Catalog

• The structure description of the entire database must be stored in an online catalog,
known as data dictionary, which can be accessed by authorized users. Users can use
the same query language to access the catalog which they use to access the database
itself.

• Rule 5: Comprehensive Data Sub-Language Rule

• A database can only be accessed using a language having linear syntax that supports
data definition, data manipulation, and transaction management operations. This
language can be used directly or by means of some application. If the database allows
access to data without any help of this language, then it is considered as a violation.

• Rule 6: View Updating Rule

• All the views of a database, which can theoretically be updated, must also be
updatable by the system.
• Rule 7: High-Level Insert, Update, and Delete Rule
• A database must support high-level insertion, updating, and deletion. This must not be limited to a single row; that is, it must also support union, intersection, and minus operations to yield sets of data records.
• Rule 8: Physical Data Independence
• The data stored in a database must be independent of the applications that access the database. Any change in the physical structure of a database must not have any impact on how the data is accessed by external applications.
• Rule 9: Logical Data Independence
• The logical data in a database must be independent of its user's view (application). Any change in logical data must not affect the applications using it. For example, if two tables are merged or one is split into two different tables, there should be no impact or change on the user application. This is one of the most difficult rules to apply.
• Rule 10: Integrity Independence
• A database must be independent of the application that uses it. All its integrity constraints can be
independently modified without the need of any change in the application. This rule makes a
database independent of the front-end application and its interface.

• Rule 11: Distribution Independence


• The end-user must not be able to see that the data is distributed over various locations. Users should
always get the impression that the data is located at one site only. This rule has been regarded as
the foundation of distributed database systems.

• Rule 12: Non-Subversion Rule


• If a system has an interface that provides access to low-level records, then the interface must not be
able to subvert the system and bypass security and integrity constraints.
Relation Data Model
Relation Data Model
• The relational model (RM) for database management is an approach to managing
data using a structure and language consistent with first-order predicate logic, first
described in 1969 by Edgar F. Codd.

• First-order logic uses quantified variables over (non-logical) objects. It allows the
use of sentences that contain variables, so that rather than propositions such as
Socrates is a man one can have expressions in the form X is a man where X is a
variable.

• In the relational model of a database, all data is represented in terms of tuples, grouped into relations.

• A database organized in terms of the relational model is a relational database.


Purpose of Relational Model
• The purpose of the relational model is to provide a declarative method for specifying data and queries.
• The relational model's central idea is to describe a database as a collection of predicates over a finite set of predicate variables, describing constraints on the possible values and combinations of values.
• The content of the database at any given time is a finite (logical) model of the database, i.e. a set of relations, one per predicate variable, such that all predicates are satisfied.
• A request for information from the database (a database query) is also a predicate.
Concepts
• Tables − In the relational data model, relations are saved in the format of tables. This format stores the relation among entities. A table has rows and columns, where rows represent records and columns represent attributes.
• Tuple − A single row of a table, which contains a single record for that relation, is called a tuple.
• Relation Instance − A finite set of tuples in the relational database system represents a relation instance. Relation instances do not have duplicate tuples.
• Relation Schema − A relation schema describes the relation name (table name), attributes, and their names.
• Relation Key − Each row has one or more attributes, known as the relation key, which can identify the row in the relation (table) uniquely.
• Attribute Domain − Every attribute has some pre-defined value scope, known as the attribute domain.
Example
Constraints
• Every relation has some conditions that must hold for it to be a valid relation.
• These conditions are called Relational Integrity Constraints.
• There are three main integrity constraints:
• Key constraints
• Domain constraints
• Referential integrity constraints

Key Constraints
• There must be at least one minimal subset of attributes in the relation which can identify a tuple uniquely.
• This minimal subset of attributes is called a key for that relation.
• If there is more than one such minimal subset, each is called a candidate key.
• Key constraints enforce the following:
• In a relation with a key attribute, no two tuples can have identical values for the key attributes.
• A key attribute cannot have NULL values.
• Key constraints are also referred to as Entity Constraints.
Domain Constraints
• A domain is the original set of atomic values used to model data. By atomic, we mean that each value in the domain is indivisible as far as the relational model is concerned.

• For example:

• The domain of Marital status has a set of possibilities: Married, Single, Divorced

• The domain of day Shift has the set of all possible days : {Mon, Tue, Wed…}.

• The domain of Salary is the set of all floating-point numbers greater than 0 and less than
200,000.

• The domain of First Name is the set of character strings that represents names of people.

• In summary, a Domain is a set of acceptable values that a column is allowed to contain. This is
based on various properties and the data type for the column.

• Age cannot be less than zero and telephone numbers cannot contain a digit outside 0-9.

• Every attribute is bound to have a specific range of values.


Referential integrity Constraints
• Referential integrity constraints work on the concept of Foreign Keys.
• A foreign key is a key attribute of a relation that can be referred to in another relation.
• The referential integrity constraint states that if a relation refers to a key attribute of a different or the same relation, then that key element must exist.
• For example, a table called Employee has a primary key called employee_id.
• Another table called Employee Details has a foreign key which references employee_id in order to uniquely identify the relationship between the two tables.
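A hedged SQL sketch combining the three constraint types around the Employee example above (column names and domains are illustrative):

```sql
CREATE TABLE Employee (
    employee_id INT PRIMARY KEY,                       -- key constraint: unique, non-NULL
    first_name  VARCHAR(40) NOT NULL,
    age         INT CHECK (age >= 0),                  -- domain constraint
    salary      DECIMAL(10,2)
                CHECK (salary > 0 AND salary < 200000) -- domain constraint
);

CREATE TABLE EmployeeDetails (
    employee_id INT PRIMARY KEY,
    address     VARCHAR(120),
    FOREIGN KEY (employee_id)
        REFERENCES Employee (employee_id)              -- referential integrity
);
```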
Normalization
Normalization
• Normalization involves decomposing a table into less redundant (and smaller) tables
without losing information, and then linking the data back together by defining foreign
keys in the old table referencing the primary keys of the new ones.

• The objective is to isolate data so that additions, deletions, and modifications of an


attribute can be made in just one table and then propagated through the rest of the
database using the defined foreign keys.

• Without normalization, it becomes difficult to handle and update the database without facing data loss.
• Insertion, update, and deletion anomalies are very frequent if the database is not normalized.
Objectives
• A basic objective of the first normal form defined by Codd in 1970 was to permit data to be queried and manipulated using a "universal data sub-language" grounded in first-order logic.
• The objectives of normalization beyond 1NF (First Normal Form) were stated as follows by Codd:
1. To free the collection of relations from undesirable insertion, update and deletion dependencies;
2. To reduce the need for restructuring the collection of relations, as new types of data are introduced, and thus increase the life span of application programs;
3. To make the relational model more informative to users;
4. To make the collection of relations neutral to the query statistics, where these statistics are liable to change as time goes by. (E.F. Codd, "Further Normalization of the Data Base Relational Model")
• The sections below give details of each of these objectives.


Update anomalies
• The same information can be expressed on multiple rows; therefore updates to
the table may result in logical inconsistencies.

• For example, each record in an "Employees' Skills" table might contain an


Employee ID, Employee Address, and Skill; thus a change of address for a
particular employee will potentially need to be applied to multiple records (one
for each skill).
Deletion Anomalies
• Under certain circumstances, deletion of data representing certain facts necessitates deletion
of data representing completely different facts.

• The "Faculty and Their Courses" table described in the previous example suffers from this
type of anomaly, for if a faculty member temporarily ceases to be assigned to any courses,
we must delete the last of the records on which that faculty member appears, effectively also
deleting the faculty member, unless we set the Course Code to null in the record itself.
Insertion Anomalies
• There are circumstances in which certain facts cannot be recorded at all.

• For example, each record in a "Faculty and Their Courses" table might contain a Faculty ID,
Faculty Name, Faculty Hire Date, and Course Code—thus we can record the details of any faculty
member who teaches at least one course, but we cannot record the details of a newly hired
faculty member who has not yet been assigned to teach any courses except by setting the
Course Code to null.
Repetition Anomalies
• Data such as Project_id, Project_name, Grade, and Salary may repeat many times.
• This repetition hampers both performance during retrieval of data and the storage capacity.
• This repetition of data is called the repetition anomaly.

Normalization Rules
• Normalization rules are divided into the following normal forms:
• First Normal Form
• Second Normal Form
• Third Normal Form
• BCNF
First Normal Form
• As per First Normal Form:
A. No two rows of data may contain a repeating group of information, i.e. each set of columns must have a unique value, such that multiple columns cannot be used to fetch the same row.
B. Each table should be organized into rows, and each row should have a primary key that distinguishes it as unique.
• The primary key is usually a single column, but sometimes more than one column can be combined to create a single primary key.
• For example, consider a table which is not in First Normal Form.
• In First Normal Form, no row may have a column in which more than one value is saved, e.g. separated with commas.
• Rather than that, we must separate such data into multiple rows.
• The Student table following 1NF will be:
• Primary key: (Student, Subject)
• Under First Normal Form, data redundancy increases, as there will be many columns with the same data in multiple rows, but each row as a whole will be unique.
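A hedged sketch of the 1NF shape described above, using the Student/Subject example (names and types are illustrative):

```sql
-- Not in 1NF: several subjects packed into one column value,
-- e.g. ('Adam', 15, 'Biology, Maths').

-- In 1NF: one value per row/column intersection; the primary key
-- becomes the (student, subject) pair.
CREATE TABLE StudentSubject (
    student VARCHAR(50),
    age     INT,
    subject VARCHAR(30),
    PRIMARY KEY (student, subject)
);

INSERT INTO StudentSubject VALUES ('Adam', 15, 'Biology');
INSERT INTO StudentSubject VALUES ('Adam', 15, 'Maths');
```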
Second Normal Form (2NF)
• As per the Second Normal Form, there must not be any partial dependency of any column on the primary key.
• It means that, for a table that has a concatenated primary key, each column in the table that is not part of the primary key must depend upon the entire concatenated key for its existence.
• If any column depends only on one part of the concatenated key, then the table fails Second Normal Form.
• In the example for First Normal Form there are two rows for Adam, to include the multiple subjects that he has opted for. While this is searchable, and follows First Normal Form, it is an inefficient use of space.
• Also, in the above table in First Normal Form, while the candidate key is {Student, Subject}, the Age of a Student depends only on the Student column, which is incorrect as per Second Normal Form.
• To achieve Second Normal Form, it would be helpful to split out the subjects into an independent table, and match them up using the student names as foreign keys.
• The new Student table following 2NF will be:
• In the Student table the candidate key will be the Student column, because all other columns, i.e. Age, are dependent on it.
• The new Subject table introduced for 2NF will be:
• In the Subject table the candidate key will be the {Student, Subject} columns.
• Now, both of the above tables qualify for Second Normal Form and will never suffer from update anomalies.
• Although there are a few complex cases in which a table in Second Normal Form still suffers update anomalies; to handle those scenarios, Third Normal Form exists.
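A sketch of the 2NF split just described (illustrative DDL): Age depends on Student alone, so it moves to its own table, and the subject table keeps only columns that depend on the whole {Student, Subject} key.

```sql
CREATE TABLE Student (
    student VARCHAR(50) PRIMARY KEY,
    age     INT                        -- depends on the whole (single-column) key
);

CREATE TABLE Subject (
    student VARCHAR(50) REFERENCES Student (student),
    subject VARCHAR(30),
    PRIMARY KEY (student, subject)     -- no non-key column depends on part of it
);
```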
Third Normal Form (3NF)
• Third Normal Form requires that every non-prime attribute of a table be dependent on the primary key; in other words, there should not be a case where a non-prime attribute is determined by another non-prime attribute.
• Such transitive functional dependencies should be removed from the table, and the table must also be in Second Normal Form.
• For example, consider a table with the following fields.
• In this table Student_id is the primary key, but street, city and state depend upon Zip.
• The dependency between zip and the other fields is called a transitive dependency.
• Hence, to apply 3NF, we need to move street, city and state to a new table, with Zip as its primary key.
• The new Student_Detail table:
• The advantages of removing transitive dependency are:
• The amount of data duplication is reduced.
• Data integrity is achieved.
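An illustrative version of that 3NF split in SQL (names and types assumed, not taken from the slides):

```sql
-- street, city and state depend on zip, so they move out:
CREATE TABLE ZipCode (
    zip    VARCHAR(10) PRIMARY KEY,
    street VARCHAR(60),
    city   VARCHAR(40),
    state  VARCHAR(40)
);

-- Student_Detail now holds only attributes that depend directly
-- on the primary key student_id:
CREATE TABLE Student_Detail (
    student_id INT PRIMARY KEY,
    name       VARCHAR(50),
    zip        VARCHAR(10) REFERENCES ZipCode (zip)
);
```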


Boyce and Codd Normal Form (BCNF)
• Boyce and Codd Normal Form is a higher version of the Third Normal Form.
• This form deals with a certain type of anomaly that is not handled by 3NF.
• A 3NF table which does not have multiple overlapping candidate keys is said to be in BCNF.
• For a table R to be in BCNF, the following conditions must be satisfied:
• R must be in Third Normal Form,
• and, for each functional dependency (X -> Y), X should be a super key.

Denormalization
• Denormalization is a strategy used on a previously normalized database to increase performance.
• The idea behind it is to add redundant data where we think it will help us the most.
• We can use extra attributes in an existing table, add new tables, or even create instances of existing tables.
• The usual goal is to decrease the running time of select queries by making data more accessible to the queries or by generating summarized reports in separate tables.
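A small hedged example of the trade-off (Orders and OrderItems are hypothetical tables): the normalized form computes totals with a join and an aggregate, while the denormalized form stores a redundant total that must be maintained at write time.

```sql
-- Normalized: the total is derived on every read.
SELECT o.order_id, SUM(i.quantity * i.unit_price) AS total
FROM   Orders o
JOIN   OrderItems i ON i.order_id = o.order_id
GROUP  BY o.order_id;

-- Denormalized: a redundant column makes reads cheap, at the cost
-- of keeping it consistent whenever order items change.
ALTER TABLE Orders ADD COLUMN total_amount DECIMAL(12,2);
SELECT order_id, total_amount FROM Orders;
```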
When to Use Denormalization
• Read-Heavy Workloads:

• When the database is primarily used for querying and reading data (e.g., reporting systems, analytics, or
dashboards).

• Performance Bottlenecks:

• When normalized tables cause performance issues due to excessive joins or complex queries.

• Data Warehousing:

• When precomputing and storing aggregated or derived data (e.g., totals, averages) can save processing time
during queries.

• Simplified Query Logic:

• When the application requires simpler queries for ease of development or maintenance.

• Real-Time Systems:

• In real-time systems where low-latency responses are critical, denormalization can reduce query execution time.
Why Use Denormalization
• Improved Read Performance:

• By reducing the number of joins and simplifying queries, denormalization significantly speeds up read operations.

• Reduced Complexity for Queries:

• Denormalized tables can make queries easier to write and understand, especially for complex analytical queries.

• Scalability for Read-Heavy Applications:

• In applications like e-commerce, social media, or content management systems, where reads far outnumber writes,
denormalization helps scale the system.

• Optimized for Specific Use Cases:

• Denormalization allows tailoring the database schema to specific query patterns, improving efficiency for those use
cases.

• Reduced Load on the Database:

• By precomputing and storing redundant data, the database avoids recalculating or aggregating data during queries,
reducing CPU and memory usage.
What Are the Disadvantages of Denormalization?

• Data Redundancy:
• Storing redundant data increases storage requirements.
• Data Consistency:
• Updates, inserts, or deletes become more complex and slower, as the same data may
need to be updated in multiple places.
• Increased Maintenance:
• Ensuring data consistency across denormalized tables requires careful application
logic or triggers.
• Risk of Anomalies:
• Without proper management, denormalization can lead to data anomalies (e.g.,
inconsistent or outdated data).
Example
[The slide contrasts a normalized schema with its denormalized equivalent.]
JOIN
• A SQL join clause combines columns from one or more tables in a relational database. It
creates a set that can be saved as a table or used as it is.

• A JOIN is a means for combining columns from one (self-table) or more tables by using
values common to each.

• ANSI-standard SQL specifies five types of JOIN:


1. INNER,

2. LEFT OUTER,

3. RIGHT OUTER,

4. FULL OUTER

5. and CROSS.

• As a special case, a table (base table, view, or joined table) can JOIN to itself in a self-join.
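A hedged sketch of the common variants, assuming hypothetical tables Employee(emp_id, name, dept_id, manager_id) and Department(dept_id, dept_name):

```sql
-- INNER JOIN: only rows with matching dept_id on both sides.
SELECT e.name, d.dept_name
FROM   Employee e
INNER JOIN Department d ON e.dept_id = d.dept_id;

-- LEFT OUTER JOIN: every employee, with NULL dept_name where unmatched.
SELECT e.name, d.dept_name
FROM   Employee e
LEFT OUTER JOIN Department d ON e.dept_id = d.dept_id;

-- CROSS JOIN: every employee paired with every department.
SELECT e.name, d.dept_name
FROM   Employee e
CROSS JOIN Department d;

-- Self-join: each employee paired with their manager from the same table.
SELECT w.name AS worker, m.name AS manager
FROM   Employee w
JOIN   Employee m ON m.emp_id = w.manager_id;
```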
Example
Theta (θ) Join
• Theta join combines tuples from different relations provided they satisfy the theta
condition. The join condition is denoted by the symbol θ.

• Notation

• R1 ⋈θ R2

• R1 and R2 are relations having attributes (A1, A2, ..., An) and (B1, B2, ..., Bn) such that the attributes don't have anything in common, that is, R1 ∩ R2 = Φ.

• Theta join can use all kinds of comparison operators.

• Equijoin

• When Theta join uses only equality (=) comparison operator, it is said to be equijoin.
Example
Natural Join (⋈)
• Natural join does not use any comparison operator.
• It does not concatenate the way a Cartesian product does.
• We can perform a Natural Join only if there is at least one common attribute that exists between the two relations.
• In addition, the attributes must have the same name and domain.
• Natural join acts on those matching attributes where the values of the attributes in both relations are the same.
Example
Outer Joins
• Theta Join, Equijoin, and Natural Join are called inner joins.

• An inner join includes only those tuples with matching attributes and the rest
are discarded in the resulting relation.

• Therefore, we need to use outer joins to include all the tuples from the
participating relations in the resulting relation.

• There are three kinds of outer joins −


• left outer join,
• right outer join,
• and full outer join.
Left Outer Join: ( R ⟕ S )

• All the tuples from the Left relation, R, are included in the
resulting relation.

• If there are tuples in R without any matching tuple in the Right


relation S, then the S-attributes of the resulting relation are made
NULL.
Example
(figure: sample Courses and HoD tables)
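A minimal Python sketch of R ⟕ S using the slide's Courses and HoD table names (the column names and values are invented for illustration; None stands in for NULL):

```python
courses = [{"dept": "CS", "course": "DBMS"},
           {"dept": "EE", "course": "Circuits"}]
hods    = [{"dept": "CS", "hod": "Dr. Mushi"}]

def left_outer_join(r, s, key):
    # Attribute names of S other than the join key, used for NULL padding.
    s_attrs = [a for a in s[0] if a != key]
    result = []
    for t1 in r:
        matches = [t2 for t2 in s if t2[key] == t1[key]]
        if matches:
            result.extend({**t1, **t2} for t2 in matches)
        else:
            # No matching tuple in S: the S-attributes are made NULL (None).
            result.append({**t1, **{a: None for a in s_attrs}})
    return result

print(left_outer_join(courses, hods, "dept"))
# [{'dept': 'CS', 'course': 'DBMS', 'hod': 'Dr. Mushi'},
#  {'dept': 'EE', 'course': 'Circuits', 'hod': None}]
```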
Right Outer Join: ( R⟖S )
• All the tuples from the Right relation, S, are included in the resulting relation.

• If there are tuples in S without any matching tuple in R, then the R-attributes of
resulting relation are made NULL.
Full Outer Join: ( R ⟗S)
• All the tuples from both participating relations are included in the resulting relation. If there
are no matching tuples for both relations, their respective unmatched attributes are made
NULL.
Database Organization
Database Organization

• This topic will deal with the physical organization of the database.

1. Storage and File Structure

2. Indexing and Hashing

3. Transaction and Concurrency

4. Backup and Recovery


DBMS - Storage System
• Databases are stored in file formats, which contain records.

• At physical level, the actual data is stored in electromagnetic


format on some device.

• These storage devices can be broadly categorized into three types



• Primary Storage − The memory storage that is directly accessible to the CPU comes under
this category. CPU's internal memory (registers), fast memory (cache), and main memory (RAM)
are directly accessible to the CPU, as they are all placed on the motherboard or CPU chipset.
This storage is typically very small, ultra-fast, and volatile. Primary storage requires continuous
power supply in order to maintain its state. In case of a power failure, all its data is lost.

• Secondary Storage − Secondary storage devices are used to store data for future use or as
backup. Secondary storage includes memory devices that are not a part of the CPU chipset or
motherboard, for example, magnetic disks, optical disks (DVD, CD, etc.), hard disks, flash
drives, and magnetic tapes.

• Tertiary Storage − Tertiary storage is used to store huge volumes of data. Since such storage
devices are external to the computer system, they are the slowest in speed. These storage
devices are mostly used to take the back up of an entire system. Optical disks and magnetic
tapes are widely used as tertiary storage.
Memory Hierarchy
• A computer system has a well-defined hierarchy of memory.

• A CPU has direct access to it main memory as well as its inbuilt registers.

• The main memory is much slower than the CPU, i.e., its access time cannot keep up with the CPU speed.

• To minimize this speed mismatch, cache memory is introduced.

• Cache memory provides the fastest access time and it contains data that is most
frequently accessed by the CPU.

• The memory with the fastest access is the costliest one.

• Larger storage devices offer slow speed and they are less expensive, however they
can store huge volumes of data as compared to CPU registers or cache memory.
Magnetic Disks
• Hard disk drives are the most common secondary storage devices in present computer systems.

• These are called magnetic disks because they use the concept of magnetization to store information.

• Hard disks consist of metal disks coated with magnetizable material.

• These disks are placed vertically on a spindle. A read/write head moves in between the disks and is used to
magnetize or de-magnetize the spot under it.

• A magnetized spot can be recognized as 0 (zero) or 1 (one).

• Hard disks are formatted in a well-defined order to store data efficiently.

• A hard disk plate has many concentric circles on it, called tracks. Every track is further divided into sectors.

• A sector on a hard disk typically stores 512 bytes of data.


Redundant Array of Independent Disks
• RAID or Redundant Array of Independent Disks, is a technology to connect multiple secondary storage
devices and use them as a single storage media.

• RAID consists of an array of disks in which multiple disks are connected together to achieve different
goals.

• RAID levels define the use of disk arrays.

• RAID 0

• RAID 1

• RAID 2

• RAID 3

• RAID 4

• RAID 5
RAID 0
• This configuration has striping but no redundancy of data.

• It offers the best performance but no fault-tolerance.

• If one disk fails, then that affects the entire array and the
chances for data loss or corruption increases.

• Each disk receives a block of data to write/read in parallel. It


enhances the speed and performance of the storage device.

• There is no parity and backup in Level 0.

• A minimum of two disks is required.


RAID 1
• RAID 1 is a fault-tolerance configuration known as "disk mirroring."

• With RAID 1, data is copied seamlessly and simultaneously, from one disk to another, creating
a replica, or mirror.

• If one disk gets fried, the other can keep working.

• It's the simplest way to implement fault tolerance and it's relatively low cost.

• The downside is that RAID 1 causes a slight drag on performance.

• A minimum of two disks is required for RAID 1 hardware implementations. With software RAID
1, instead of two physical disks, data can be mirrored between volumes on a single disk.

• One additional point to remember is that RAID 1 cuts total disk capacity in half: If a server with
two 1TB drives is configured with RAID 1, then total storage capacity will be 1TB not 2TB.
Example
RAID 2
• RAID 2 is similar to RAID 5, but instead of block-level striping with parity, striping occurs at
the bit level.

• RAID 2 is seldom deployed because costs to implement are usually prohibitive (a typical
setup requires 10 disks) and gives poor performance with some disk I/O operations.

• RAID 2 records Error Correction Code using Hamming distance for its data, striped on
different disks.

• Like level 0, each data bit in a word is recorded on a separate disk, and the ECC codes of the
data words are stored on a different set of disks.
RAID 3
• RAID 3, which is rarely used in practice, consists of byte-level striping with a dedicated parity disk.

• One of the characteristics of RAID 3 is that it generally cannot service multiple requests simultaneously,
which happens because any single block of data will, by definition, be spread across all members of the
set and will reside in the same location.

• Therefore, any I/O operation requires activity on every disk and usually requires synchronized spindles.

• For this reason, RAID 3 is best for single-user systems with long record applications.
RAID 4
• RAID 4 is a configuration in which disk striping happens at the block level, rather than at the
byte level as in RAID 3.

• Note that level 3 uses byte-level striping, whereas level 4 uses block-level striping.

• As a result of its layout, RAID 4 provides good performance of random reads, while the
performance of random writes is low due to the need to write all parity data to a single disk.

• Both level 3 and level 4 require at least three disks to implement RAID.
RAID 5
• RAID 5 is by far the most common RAID configuration for business servers and enterprise NAS devices.

• This RAID level provides better performance than mirroring as well as fault tolerance.

• With RAID 5, data and parity (which is additional data used for recovery) are striped across three or more disks.

• If a disk gets an error or starts to fail, data is recreated from this distributed data and parity block— seamlessly and
automatically.

• Another benefit of RAID 5 is that it allows many NAS and server drives to be "hot-swappable" meaning in case a drive
in the array fails, that drive can be swapped with a new drive without shutting down the server or NAS.

• RAID 5 arrays are generally considered to be a poor choice for use on write-intensive systems because of the
performance impact associated with writing parity information.

• When a disk does fail, it can take a long time to rebuild a RAID 5 array. Performance is usually degraded during the
rebuild time and the array is vulnerable to an additional disk failure until the rebuild is complete.
Example
RAID 6
• RAID 6 is an extension of level 5. In this level, two independent parities are
generated and stored in distributed fashion among multiple disks.

• Two parities provide additional fault tolerance. This level requires at least four
disk drives to implement RAID.
File Structure
• Relative data and information is stored collectively in file formats.

• A file is a sequence of records stored in binary format.

• A disk drive is formatted into several blocks that can store records. File records

are mapped onto those disk blocks.


File Operations

• Operations on database files can be broadly classified into two categories −

• Update Operations

• Retrieval Operations

• Update operations change the data values by insertion, deletion, or update.

• Retrieval operations, on the other hand, do not alter the data but retrieve

them after optional conditional filtering.

• In both types of operations, selection plays a significant role.


• Several operations, which can be done on files.

• Open − A file can be opened in one of the two modes, read mode or write mode.

• Locate − Every file has a file pointer, which tells the current position where the data is to be read or written.

This pointer can be adjusted accordingly. Using find (seek) operation, it can be moved forward or backward.

• Read − By default, when files are opened in read mode, the file pointer points to the beginning of the file.

• Write − User can select to open a file in write mode, which enables them to edit its contents. It can be

deletion, insertion, or modification. The file pointer can be located at the time of opening or can be

dynamically changed if the operating system allows to do so.

• Close − This is the most important operation from the operating system’s point of view. When a request to

close a file is generated, the operating system

• removes all the locks (if in shared mode),

• saves the data (if altered) to the secondary storage media, and

• releases all the buffers and file handlers associated with the file.
File Organization
• File Organization defines how file records are mapped onto disk blocks.

• We have four types of File Organization to organize file records −


Heap File Organization
• When a file is created using Heap File Organization, the Operating System allocates memory area
to that file without any further accounting details.

• File records can be placed anywhere in that memory area.

• It is the responsibility of the software to manage the records.

• Heap File does not support any ordering, sequencing, or indexing on its own.

• Records are placed in file in the same order as they are inserted. A new record is inserted in the
last page of the file; if there is insufficient space in the last page, a new page is added to the file.

• This makes insertion very efficient.

• Once the data block is full, the next record is stored in the new block. This new block need not be
the very next block. This method can select any block in the memory to store the new records. It is
similar to pile file in the sequential method, but here data blocks are not selected sequentially.
Example
• Retrieval

• When a record has to be retrieved from the database, in this method, we need to traverse from the
beginning of the file till we get the requested record.

• Hence, fetching records from very large tables is time consuming.

• This is because there is no sorting or ordering of the records.

• We need to check all the data.

• Deletion

• Similarly if we want to delete or update a record, first we need to search for the record.

• Again, searching a record is similar to retrieving it- start from the beginning of the file till the record is
fetched.

• In addition, while deleting a record, the record will be deleted from the data block.

• But the space will not be freed and cannot be re-used. Hence, as the number of records increases,
the memory used also keeps increasing, and the efficiency decreases.

• For the database to perform better, DBA has to free this unused memory periodically.
Advantages of Heap File Organization

• Very good method of file organization for bulk insertion, i.e., when a huge amount of data
needs to be loaded into the database at one time, this method of file organization is best
suited.

• They are simply inserted one after the other in the memory blocks.

• It is suited for very small files, as fetching records from them is fast.

• As the file size grows, however, the linear search for a record becomes time
consuming.
Disadvantages of Heap File Organization

• This method is inefficient for larger databases as it takes time to

search/modify the record.

• Proper memory management is required to boost the performance.

• Otherwise there would be lots of unused memory blocks lying and

memory size will simply be growing.


Sequential File Organization
• It is one of the simple methods of file organization.

• Here each file/records are stored one after the other in a sequential manner.

• This can be achieved in two ways:

1. Records are stored one after the other as they are inserted into the tables.
• This method is called pile file method. When a new record is inserted, it is placed at the end of
the file.

• In the case of any modification or deletion of a record, the record will be searched for in the
memory blocks.

• Once it is found, it will be marked for deletion and the new record block is entered.
Example

• In the diagram above, R1, R2, R3 etc. are the records. They contain all the attribute of a
row. i.e.; when we say student record, it will have his id, name, address, course, DOB
etc. Similarly R1, R2, R3 etc can be considered as one full set of attributes.
• In the second method, records are sorted (either ascending or
descending) each time they are inserted into the system.
• This method is called sorted file method.
• Sorting of records may be based on the primary key or on
any other columns.
• Whenever a new record is inserted, it is placed at the end of
the file and the file is then sorted – ascending or descending
based on the key value – so that the record lands in the
correct position.
• In the case of an update, the record is updated and the file is
then sorted to place the updated record in the right place.
The same applies to delete.
Advantages of Sequential File Organization

• The design is very simple compared to other file organizations.

• Not much effort is involved in storing the data.

• When there are large volumes of data, this method is very fast and efficient.

• This method is helpful when most of the records have to be accessed, as in

calculating the grades of students or generating salary slips, where all the

records are used for the calculation.

• This method is good in case of report generation or statistical calculations.

• These files can be stored in magnetic tapes which are comparatively cheap.
Disadvantages of Sequential File
Organization
• The sorted file method always involves the effort of sorting the
records.

• Each time an insert/update/delete transaction is performed, the file

is re-sorted.

• Hence identifying the record, inserting/updating/deleting it, and

then sorting the file always takes some time and may make the
system slow.
Hash File Organization
• Hash File organization method is the one where data is stored at the data blocks whose address is
generated by using hash function.

• The memory location where these records are stored is called a data block or data bucket.

• This data bucket is capable of storing one or more records.

• The hash function can use any of the column value to generate the address.

• Most of the time, hash function uses primary key to generate the hash index – address of the data
block.

• Hash function can be simple mathematical function to any complex mathematical function.

• We can even consider primary key itself as address of the data block. That means each row will be
stored at the data block whose address will be same as primary key. This implies how simple a hash
function can be in database.
Example

• Using primary key


Types Of Hash File

• There are two types of hash file organizations –

• Static

• and Dynamic Hashing


Static Hashing
• In this method of hashing, the resultant data bucket address will be always same.

• That means, if we want to generate the address for EMP_ID = 103 using a mod (5) hash function, it
always results in the same bucket address 3.

• There will not be any changes to the bucket address here.

• Hence number of data buckets in the memory for this static hashing remains constant throughout.

• In our example, we will have five data buckets in the memory used to store the data.
• Searching a record

• Using the hash function, the data bucket address is generated for the hash key. The record is then retrieved
from that location. i.e., if we want to retrieve the whole record for ID 104, and the hash function is mod (5) on
the ID, the address generated would be 4. Then we directly go to address 4 and retrieve the whole record
for ID 104. Here ID acts as the hash key.

• Inserting a record

• When a new record needs to be inserted into the table, we generate an address for the new record
based on its hash key. Once the address is generated, the record is stored in that location.

• Delete a record

• Using the hash function we will first fetch the record which is supposed to be deleted. Then we will remove
the records for that address in memory.

• Update a record

• Data record marked for update will be searched using static hash function and then record in that address
is updated.
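A minimal Python sketch of static hashing with the slide's mod (5) function (the records are invented; each bucket is modelled as a small dictionary):

```python
# Static hashing: the bucket address is always key % 5, and the number
# of buckets stays constant throughout.

NUM_BUCKETS = 5
buckets = {i: {} for i in range(NUM_BUCKETS)}   # bucket no. -> {key: record}

def address(key):
    return key % NUM_BUCKETS        # EMP_ID 103 -> bucket 3, 104 -> bucket 4

def insert(key, record):
    buckets[address(key)][key] = record

def search(key):
    return buckets[address(key)].get(key)

def delete(key):
    buckets[address(key)].pop(key, None)

insert(103, {"name": "Asha"})
insert(104, {"name": "Juma"})
print(address(103), search(104))    # 3 {'name': 'Juma'}
delete(103)
print(search(103))                  # None
```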
Bucket overflow
• Suppose we have to insert some records into the file.

• But the data bucket address generated by the hash function is full or the data already exists in that address.

• How do we insert the data? This situation in the static hashing is called bucket overflow.

• This is one of the critical situations/ drawback in this method. Where will we save the data in this case? We
cannot lose the data.

• There are various methods to overcome this situation.

• Most commonly used methods are listed below:

• Closed hashing

• Open Hashing

• Quadratic probing

• Double Hashing
Closed hashing
• In this method we introduce a new data bucket with same address and link it after the full data bucket.
These methods of overcoming the bucket overflow are called closed hashing or overflow chaining.

• Consider we have to insert a new record R2 into the tables. The static hash function generates the data
bucket address as ‘AACDBF’. But this bucket is full to store the new data. What is done in this case is a
new data bucket is added at the end of ‘AACDBF’ data bucket and is linked to it. Then new record R2 is
inserted into the new bucket. Thus it maintains the static hashing address. It can add any number of
new data buckets, when it is full.
Open Hashing
• In this method, next available data block is used to enter the new record, instead of overwriting on
the older one.
• This method is called Open Hashing or linear probing.
• In the below example, R2 is a new record which needs to be inserted. But the hash function
generates address as 237.
• But it is already full.
• So the system searches next available data bucket, 238 and assigns R2 to it.
Other Methods

• Quadratic probing

• This is similar to linear probing, but here the difference between the old and the new
bucket address is not fixed: a quadratic function of the probe number is used to
determine the new bucket address.

• Double Hashing

• This is another variant of linear probing. Here the difference is fixed, as in
linear probing, but this fixed difference is calculated by using a second hash
function – hence the name double hashing. (Both probe sequences are sketched below.)
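A minimal Python sketch of the three probe sequences (the table size, keys, and second hash function are invented for illustration):

```python
# Probe sequences for resolving a full bucket. i is the probe number:
# i = 0 is the home bucket, later i values try alternative buckets.

TABLE_SIZE = 11

def linear_probe(h, i):
    return (h + i) % TABLE_SIZE              # fixed step of 1

def quadratic_probe(h, i):
    return (h + i * i) % TABLE_SIZE          # step grows quadratically

def double_hash_probe(h, i, key):
    step = 1 + (key % (TABLE_SIZE - 1))      # a second hash gives the step
    return (h + i * step) % TABLE_SIZE

key = 27
h = key % TABLE_SIZE                          # home bucket = 5
for i in range(3):
    print(linear_probe(h, i), quadratic_probe(h, i), double_hash_probe(h, i, key))
# linear:    5 6 7   quadratic: 5 6 9   double: 5 2 10
```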
Dynamic Hashing
• This hashing method is used to overcome the problems of static hashing – bucket overflow. In this method of hashing,
data buckets grows or shrinks as the records increases or decreases.

• This method of hashing is also known as extendable hashing method. Let us see an example to understand this
method.

• Consider three records R1, R2 and R4 in the table. These records hash to the addresses 100100, 010110
and 110110 respectively.

• This method of storing considers only part of this address – here only the first bit – to store the data. So it
loads the three records into buckets 0 and 1.

• Now suppose a new record R3 arrives. What will happen to R3 here? There is no bucket space for R3.


• The bucket has to grow dynamically to accommodate R3.

• So it extends the address to 2 bits rather than 1 bit, and then updates the existing data to have 2-bit
addresses.

• Then it tries to accommodate R3.

• Now we can see that address of R1 and R2 are changed to reflect the new address and R3 is also
inserted. As the size of the data increases, it tries to insert in the existing buckets. If no buckets are
available, the number of bits is increased to consider larger address, and hence increasing the buckets.
If we delete any record and if the data can be stored with lesser buckets, it shrinks the bucket size.
Advantages of Dynamic hashing
• Performance does not come down as the data grows in the system.

• It simply increases the memory size to accommodate the data.

• Since it grows and shrinks with the data, memory is well utilized.

• There will not be any unused memory lying.

• Good for dynamic databases where data grows and shrinks frequently.
Disadvantages of Dynamic hashing

• As the data size increases, the bucket size is also increased.

• These addresses will be maintained in bucket address tables.

• This is because, the address of the data will keep changing as buckets grow
and shrink.

• When there is a huge increase in data, maintaining this bucket address


table becomes tedious.

• Bucket overflow situation will occur in this case too.

• But it takes longer to reach this situation than in static hashing.
Clustered File Organization

• In all the file organization methods described above, each file contains a single table, and each is
stored in its own way in memory.

• In real-life situations, retrieving records from a single table is comparatively rare.

• In most cases, we need to combine/join two or more related tables and retrieve the data.

• In such cases, none of the above methods gives the result quickly.

• Those methods have to traverse each table separately and then combine the results of each
to give the requested result.

• The time taken for this is obviously higher. So what could be done to overcome this
situation?
• In this method two or more table which are frequently used to join and get the
results are stored in the same file called clusters.

• These files will have two or more tables in the same data block and the key columns
which map these tables are stored only once.

• This method hence reduces the cost of searching for various records in different files.

• All the records are found at one place and hence making search efficient.

• Here data are sorted based on the primary key or the key with which we are
searching the data.

• Also, clusters are formed based on the join condition.

• The key with which we are joining the tables is known as cluster key.
Clustering of tables is done when

1. There is a frequent need for joining the tables with the same


condition.

2. If the tables are joined only once in a while, or a full table scan of
any one of the tables is involved in the query, then we do not
cluster the tables.

3. If there is a 1:M relationship between the tables, then we can


cluster the tables.
Types Of Cluster File Organization
• Indexed Clusters: - Here records are grouped based on the cluster key and stored together. Our example below
to illustrate STUDENT-COURSE cluster is an indexed cluster. The records are grouped based on the cluster key –
COURSE_ID and all the related records are stored together.

• Hash Clusters: - This is also similar to indexed cluster. Here instead of storing the records based on the cluster
key, we generate the hash key value for the cluster key and store the records with same hash key value together
in the memory disk.
Advantages of Clustered File Organization

• This method is best suited when there is frequent request for

joining the tables with same joining condition.

• When there is a 1:M mapping between the tables, this method works

efficiently
Disadvantages of Clustered File
Organization
• This method is not suitable for very large databases since the performance of

this method on them is low.

• We cannot use these clusters if the joining condition changes; if it does,

traversing the file takes a lot of time.

• This method is not suitable for less frequently joined tables or for tables with 1:1

conditions.
Indexing
• Indexing is a data structure technique to efficiently retrieve records from the database files based on some attributes on
which the indexing has been done.

• Indexing in database systems is similar to what we see in books.

• Indexing is defined based on its indexing attributes.

• Indexing can be of the following types −


• Primary Index − Primary index is defined on an ordered data file. The data file is ordered on a key field. The key
field is generally the primary key of the relation.

• Secondary Index − Secondary index may be generated from a field which is a candidate key and has a unique
value in every record, or a non-key with duplicate values.

• Clustering Index − Clustering index is defined on an ordered data file. The data file is ordered on a non-key field.

• Ordered Indexing is of two types −


• Dense Index
• Sparse Index
Dense Index

• In dense index, there is an index record for every search key value in the database.

• This makes searching faster but requires more space to store index records itself.

• Index records contain search key value and a pointer to the actual record on the
disk.
Sparse Index
• In sparse index, index records are not created for every search key.

• An index record here contains a search key and an actual pointer to the data on the disk.

• To search a record, we first proceed by index record and reach at the actual location of the
data.

• If the data we are looking for is not where we directly reach by following the index, then the
system starts sequential search until the desired data is found.
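A minimal Python sketch of a sparse index lookup (the block contents are invented; one index entry is kept per block, and the search scans sequentially inside the chosen block):

```python
# Sparse index: one index entry per block, not per record. To find a key,
# locate the last index entry <= key, then scan that block sequentially.
import bisect

blocks = [[10, 20, 30], [40, 50, 60], [70, 80, 90]]     # sorted data file
index  = [(blk[0], i) for i, blk in enumerate(blocks)]  # (first key, block no.)
index_keys = [k for k, _ in index]

def find(key):
    pos = bisect.bisect_right(index_keys, key) - 1      # last entry <= key
    if pos < 0:
        return None
    # Sequential search inside the chosen block.
    return key if key in blocks[index[pos][1]] else None

print(find(50))   # 50   (index points at block 1, scan finds it)
print(find(55))   # None (right block reached, key absent)
```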
Multilevel Index

• Index records comprise search-key values and data pointers.

• Multilevel index is stored on the disk along with the actual database
files.

• As the size of the database grows, so does the size of the indices.

• There is an immense need to keep the index records in the main


memory so as to speed up the search operations.

• If single-level index is used, then a large size index cannot be kept in


memory which leads to multiple disk accesses.
Example

Multi-level Index helps in breaking down the index into several smaller
indices in order to make the outermost level so small that it can be saved in a
single disk block, which can easily be accommodated anywhere in the main
memory.
B+ Tree
• B+ tree is similar to binary search tree, but it can have more
than two leaf nodes.

• It stores all the records only at the leaf node. Intermediary


nodes will have pointers to the leaf nodes. They do not contain
any data/records.

• Consider a student table below.

• The key value here is STUDENT_ID. And each record contains


the details of each student along with its key value and the
index/pointer to the next value. In a B+ tree it can be
represented as below.
• Please note that the leaf node 100 means, it has name and address of student with ID 100, as we
saw in R1, R2, R3 etc. above.

• We can observe here that it divides the records into two and splits into left node and right node.

• Left node will have all the values less than or equal to root node and the right node will have values
greater than root node.

• The intermediary nodes at level 2 will have only the pointers to the leaf nodes.

• The values shown in the intermediary nodes are only the pointers to next level.

• All the leaf nodes will have the actual records in a sorted order.
• If we have to search for any record, they are all found at leaf node.

• Hence searching any record will take same time because of equidistance of the leaf
nodes.

• Also they are all sorted.

• Hence searching a record is like a sequential search and does not take much time.

• Suppose a B+ tree has an order of n (the number of branches – the tree
structure above has 5 branches altogether, hence order 5); then each
intermediary node can have n/2 to n children and each leaf node can hold
n/2 to n-1 values.

• In our example above, n = 5, i.e., there are 5 branches from the root.

• Then each intermediary node can have 3 to 5 children, and each leaf node
can hold 3 to 4 values.
The main goal of B+ tree is:
• Sorted Intermediary and leaf nodes:
• Since it is a balanced tree, all nodes should be sorted.

• Fast traversal and Quick Search:


• One should be able to traverse through the nodes very fast. That means, if we have to search for any particular
record, we should be able to pass through the intermediary nodes very easily.
• This is achieved by sorting the pointers at intermediary nodes and the records in the leaf nodes.
• Any record should be fetched very quickly. This is made by maintaining the balance in the tree and keeping all the
nodes at same distance.

• No overflow pages:
• A B+ tree allows all the intermediary and leaf nodes to be only partially filled – some percentage is
defined while designing the B+ tree.
• This percentage up to which nodes are filled is called the fill factor.
• If a node exceeds the fill factor limit, it is called an overflow page.
• If a node is too empty, it is called underflow.
• In our example above, the intermediary node with 108 is underflow, and the leaf nodes are not partially filled, hence they are overflow pages.
Searching a record in B+ Tree
• Suppose we want to search 65 in the below B+ tree structure.

• First we search the intermediary node that will direct us to the leaf node that can contain
the record for 65.

• So we follow the branch between the 50 and 75 entries in the intermediary node.

• Then we are redirected to the third leaf node at the end.

• Here DBMS will perform sequential search to find 65.
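A minimal Python sketch of that walk (the node layout and keys are invented and simplified; internal nodes hold separator keys and child pointers, and only leaves hold records):

```python
class Node:
    def __init__(self, keys, children=None, records=None):
        self.keys = keys            # separator keys (internal) or key values (leaf)
        self.children = children    # list of child Nodes, or None for a leaf
        self.records = records      # leaf payloads, or None for internal nodes

leaf1 = Node([10, 25],     records=["r10", "r25"])
leaf2 = Node([50, 55],     records=["r50", "r55"])
leaf3 = Node([60, 65, 70], records=["r60", "r65", "r70"])
root  = Node([50, 60], children=[leaf1, leaf2, leaf3])

def search(node, key):
    while node.children is not None:        # walk down the internal nodes
        i = 0
        while i < len(node.keys) and key >= node.keys[i]:
            i += 1
        node = node.children[i]
    # At the leaf: sequential search among the sorted key values.
    return node.records[node.keys.index(key)] if key in node.keys else None

print(search(root, 65))   # 'r65' -- via the branch for keys >= 60
print(search(root, 99))   # None  -- rightmost leaf reached, key absent
```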


Insertion in B+ tree
• Suppose we have to insert a record 60 in below structure.

• It will go to 3rd leaf node after 55. Since it is a balanced tree and that leaf node is
already full, we cannot insert the record there.

• But it should be inserted there without affecting the fill factor, balance and order.

• So the only option here is to split the leaf node. But how do we split the nodes?
• The 3rd leaf node should have values (50, 55, 60, 65, 70) and its current root
node is 50.

• We will split the leaf node in the middle so that its balance is not altered.

• So we can group (50, 55) and (60, 65, 70) into 2 leaf nodes.

• If these two have to be leaf nodes, the intermediary node cannot branch from 50.

• It should have 60 added to it, and then we can have pointers to the new leaf nodes.
Delete in B+ tree

• Suppose we have to delete 15 from above tree.

• We will traverse to the 1st leaf node and simply delete 15 from that node.

• There is no need for any re-arrangement, as the tree remains balanced and 15 does not appear in the
intermediary node.
B+ Tree Extensions
• As the number of records grows in the database, the
intermediary and leaf nodes need to be split and spread
widely to keep the tree balanced.

• This is called as B+ tree extensions.

• As it spreads out widely, the searching of records becomes


faster.

• The main goal of creating B+ tree is faster traversal of


records.

• As the branches spreads out, it requires less I/O on disk to


get the record.

• A record that needs to be fetched is retrieved in a logarithmic


fraction of time.
• Searching, inserting and deleting a record is done in the same way we have seen
above.

• Since it is a balanced tree, it searches for the position of the record in the file, and
then it fetches/inserts/deletes the record.

• In case it finds that tree will be unbalanced because of insert/delete/update, it does


the proper re-arrangement of nodes so that definition of B+ tree is not changed.

• Below is the simple example of how student details are stored in B+ tree index files.
Example
• Suppose we have a new student Bryan. Where will he fit in the file? He will fit in the 1st leaf node. Since this leaf node is not
full, we can easily add him in the node.

• But what happens if we want to insert another student Ben to this file? Some re-arrangement to the nodes is needed to
maintain the balance of the file.

• Same thing happens when we perform delete too.


Benefits of B+ Tree index files

• As the file grows in the database, the performance remains the same. This is
because all the records are maintained at leaf node and all the nodes are at
equi-distance from root.

• In addition, if there is any overflow, it automatically re-organizes the structure.

• Even though insertion and deletion are little complicated, it can be done in
fraction of seconds.

• Leaf nodes are allowed to be only partially/half filled, since records are larger than pointers.
B Tree index Files
• B tree index file is similar to B+ tree index files, but it uses binary search concepts.

• In this method, each root will branch to only two nodes and each intermediary node will also have the
data.

• And leaf node will have lowest level of data.

• However, in this method also, records will be sorted.

• Since all intermediary nodes also have records, it reduces the traversing till leaf node for the data.

• A simple B tree can be represented as below:


Example of Simple Insert
• Insert bryan
Difference between B Tree and B+ Tree Index Files

1. B Tree: each node has only two branches and each node holds some records, so there is no
   need to traverse till the leaf node to get the data.
   B+ Tree: intermediary nodes contain only pointers/addresses to the leaf nodes; all leaf
   nodes hold the records, and all are at the same distance from the root.

2. B Tree: it has more height compared to width.
   B+ Tree: its width is more compared to height.

3. B Tree: the number of nodes at any intermediary level 'l' is 2^l; each of the intermediary
   nodes has only 2 sub-nodes.
   B+ Tree: each intermediary node can have n/2 to n children; only the root node has 2 children.

4. B Tree: records are in sorted order.
   B+ Tree: records are in sorted order.

5. B Tree: even the leaf node level has 2^l nodes, hence the total number of nodes in the
   B Tree is 2^(l+1) - 1.
   B+ Tree: a leaf node stores (n-1)/2 to n-1 values.
Advantages

1. B Tree: it might have fewer nodes compared to a B+ tree, as each node holds data.
   B+ Tree: it automatically adjusts the nodes to fit a new record, and similarly re-organizes
   the nodes on delete if required; hence it never violates the B+ tree definition.

2. B Tree: since each node has records, a search may not need to traverse till the leaf node.
   B+ Tree: good space utilization, as intermediary nodes contain only pointers to the records
   and only leaf nodes contain records.
Disadvantages

1. B Tree: if the tree is very big, we may have to traverse through most of the nodes to get
   the records.
   B+ Tree: any rearrangement of nodes during insertion or deletion is an overhead.

2. B Tree: insertion and deletion of nodes involve re-arrangements, as in a B+ tree, but they
   are more complicated since the binary nodes have to be balanced.

3. B Tree: implementation of a B tree is a little more difficult than that of a B+ tree.
B+ Tree indexing
• This is the standard index in the database where primary key or the most frequently used search key
column in the table used to index. It has the same feature as discussed above. Hence it is efficient in
retrieving the data. These indexes can be stored in different forms in a B+ tree. Depending on the way they
are organized, there are 4 types of B+ tree indexes.

• Index-organized tables: - Here the data itself acts as an index and the whole record is stored in the B+ index
file.

• Descending Indexes: - Here index key is stored in the descending order in B+ tree files.

• Reverse key indexes: - In this method of indexing, the index key column value is stored in the
reverse order. For example, say index is created on STD_ID in the STUDENT table. Suppose STD_ID has
values 100,101,102 and 103. Then the reverse key index would be 001, 101,201 and 301 respectively.

• B+ tree Cluster Index: - Here, cluster key of the table is used in the index. Thus, each index in this
method will point to set of records with same cluster keys.
Transaction
Transaction
• A transaction can be defined as a group of tasks. A single task is the minimum processing unit which cannot be divided further.

• Let’s take an example of a simple transaction. Suppose a bank employee transfers Rs 500 from A's account to B's account. This
very simple and small transaction involves several low-level tasks.

• A’s Account
• Open_Account(A)
• Old_Balance = A.balance
• New_Balance = Old_Balance - 500
• A.balance = New_Balance
• Close_Account(A)

• B’s Account
• Open_Account(B)
• Old_Balance = B.balance
• New_Balance = Old_Balance + 500
• B.balance = New_Balance
• Close_Account(B)
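A hedged sketch of the same transfer as one atomic transaction, using Python's sqlite3 (the account ids and balances are invented):

```python
import sqlite3

con = sqlite3.connect(":memory:")
con.execute("CREATE TABLE account (id TEXT PRIMARY KEY, balance INTEGER)")
con.executemany("INSERT INTO account VALUES (?, ?)", [("A", 1000), ("B", 200)])
con.commit()

try:
    # Both updates succeed together or not at all (atomicity).
    con.execute("UPDATE account SET balance = balance - 500 WHERE id = 'A'")
    con.execute("UPDATE account SET balance = balance + 500 WHERE id = 'B'")
    con.commit()            # durability: changes persist after commit
except sqlite3.Error:
    con.rollback()          # failure midway: undo all the low-level steps

print(con.execute("SELECT id, balance FROM account ORDER BY id").fetchall())
# [('A', 500), ('B', 700)]
```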
ACID Properties

• A transaction is a very small unit of a program and it may contain several low level tasks.

• A transaction in a database system must maintain

1. Atomicity,
2. Consistency,
3. Isolation,
4. and Durability

• Commonly known as ACID properties − in order to ensure accuracy, completeness, and


data integrity.
• Atomicity − This property states that a transaction must be treated as an atomic unit, that is, either all of
its operations are executed or none. There must be no state in a database where a transaction is left
partially completed.

• Consistency − The database must remain in a consistent state after any transaction. No transaction
should have any adverse effect on the data residing in the database.

• Isolation − If multiple transactions are executing simultaneously, each should be processed as if it
were the only transaction in the system; no individual transaction should alter or affect
another.

• Durability − The database should be durable enough to hold all its latest updates even if the system fails
or restarts.

• If a transaction updates a chunk of data in a database and commits, then the database will hold the
modified data.

• If a transaction commits but the system fails before the data could be written on to the disk, then that data
will be updated once the system springs back into action.
Serializability
• When multiple transactions are being executed by the operating system in a multiprogramming
environment, there are possibilities that instructions of one transaction are interleaved with
those of some other transaction.
• Schedule − A chronological execution sequence of a transaction is called a schedule.

• A schedule can have many transactions in it, each comprising of a number of instructions/tasks.

• Serial Schedule − It is a schedule in which transactions are aligned in such a way that one
transaction is executed first.

• When the first transaction completes its cycle, then the next transaction is executed.

• Transactions are ordered one after the other.

• This type of schedule is called a serial schedule, as transactions are executed in a serial manner.
• In a multi-transaction environment, serial schedules are considered as a benchmark.

• The execution sequence of an instruction in a transaction cannot be changed, but


two transactions can have their instructions executed in a random fashion.

• This execution does no harm if two transactions are mutually independent and
working on different segments of data; but in case these two transactions are
working on the same data, then the results may vary.

• This ever-varying result may bring the database to an inconsistent state.

• To resolve this problem, we allow parallel execution of a transaction schedule, if its


transactions are either serializable or have some equivalence relation among them.
Example
• For example, suppose transaction A multiplies data values by 2 and transaction B adds 1 to
data values.

• Now suppose that there are two data values: 0 and 10.

• If these transactions are run one after the other, the new values will be 1 and 21 if
transaction A is run first, or 2 and 22 if transaction B is run first.

• But what if the order in which the two transactions are run is different for each value? If
transaction A is run first on the first value and transaction B is run first on the second value,
the new values are 1 and 22.

• If this order is reversed, the new values are 2 and 21.

• The transactions are serializable if 1, 21 and 2, 22 are the only possible results.

• The transactions are not serializable if 1, 22 or 2, 21 is a possible result.


Equivalence Schedules
• An equivalence schedule can be of the following types −

• Result Equivalence

• If two schedules produce the same result after execution, they are said to be result equivalent. They may yield
the same result for some value and different results for another set of values. That's why this equivalence is not
generally considered significant.

• View Equivalence

• Two schedules would be view equivalence if the transactions in both the schedules perform similar actions in a
similar manner.

• For example −
• If T reads the initial data in S1, then it also reads the initial data in S2.
• If T reads the value written by J in S1, then it also reads the value written by J in S2.
• If T performs the final write on the data value in S1, then it also performs the final write on the data value in
S2.
Conflict Equivalence
• Two operations would be conflicting if they have the following properties −

 Both belong to separate transactions.

 Both access the same data item.

 At least one of them is a "write" operation.

• Two schedules having multiple transactions with conflicting operations are said to be conflict
equivalent if and only if −
 Both the schedules contain the same set of Transactions.

 The order of conflicting pairs of operation is maintained in both the schedules.

• Note − View equivalent schedules are view serializable and conflict equivalent schedules are
conflict serializable.

• All conflict serializable schedules are view serializable too.


States of Transactions
• A transaction in a database can be in one of the following states −
• Active − In this state, the transaction is being executed. This is the initial state of every transaction.

• Partially Committed − When a transaction executes its final operation, it is said to be in a partially committed state.

• Failed − A transaction is said to be in a failed state if any of the checks made by the database recovery system fails. A
failed transaction can no longer proceed further.

• Aborted − If any of the checks fails and the transaction has reached a failed state, then the recovery manager rolls back
all its write operations on the database to bring the database back to its original state where it was prior to the execution
of the transaction.

• Transactions in this state are called aborted.

• The database recovery module can select one of the two operations after a transaction aborts −

 Re-start the transaction

 Kill the transaction

• Committed − If a transaction executes all its operations successfully, it is said to be committed. All its effects are now
permanently established on the database system.
Concurrency Control
• In a multiprogramming environment where multiple transactions can be executed
simultaneously, it is highly important to control the concurrency of transactions.

• We have concurrency control protocols to ensure atomicity, isolation, and


serializability of concurrent transactions.

• Concurrency control protocols can be broadly divided into two categories

• Lock based protocols

• Time stamp based protocols


Lock-based Protocols
• Database systems equipped with lock-based protocols use a mechanism by which any
transaction cannot read or write data until it acquires an appropriate lock on it.

• Locks are of two kinds −

• Binary Locks − A lock on a data item can be in two states; it is either locked or
unlocked.
• Shared/exclusive − This type of locking mechanism differentiates the locks based on
their uses.
 If a lock is acquired on a data item to perform a write operation, it is an exclusive
lock. Allowing more than one transaction to write on the same data item would
lead the database into an inconsistent state.
 Read locks are shared because no data value is being changed.
Types Of Lock Protocols
• There are four types of lock protocols available −

• Simplistic Lock Protocol

• Simplistic lock-based protocols allow transactions to obtain a lock on every object before a 'write' operation is
performed. Transactions may unlock the data item after completing the ‘write’ operation.

• Pre-claiming Lock Protocol

• Pre-claiming protocols evaluate their operations and create a list of data items on which they need locks.

• Before initiating an execution, the transaction requests the system for all the locks it needs beforehand.

• If all the locks are granted, the transaction executes and releases all the locks when all its operations are over.
If all the locks are not granted, the transaction rolls back and waits until all the locks are granted.
• Two-Phase Locking 2PL

• This locking protocol divides the execution phase of a transaction into three parts.
 In the first part, when the transaction starts executing, it seeks permission for the locks it requires.

 The second part is where the transaction acquires all the locks.

• As soon as the transaction releases its first lock, the third phase starts.

• In this phase, the transaction cannot demand any new locks; it only releases the acquired locks.

• Two-phase locking has two phases, one is growing, where all the locks are being acquired by the transaction; and the second
phase is shrinking, where the locks held by the transaction are being released.

• To claim an exclusive (write) lock, a transaction must first acquire a shared (read) lock and then upgrade it to an exclusive lock.
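A minimal Python sketch of the two-phase discipline itself (lock conflicts between transactions are omitted; the point is only that no new lock may be acquired once the first lock has been released):

```python
class TwoPhaseTxn:
    def __init__(self):
        self.locks = set()
        self.shrinking = False      # False = growing phase

    def lock(self, item):
        if self.shrinking:
            raise RuntimeError("2PL violation: no new locks after first unlock")
        self.locks.add(item)        # growing phase: acquire locks

    def unlock(self, item):
        self.shrinking = True       # first release starts the shrinking phase
        self.locks.discard(item)

t = TwoPhaseTxn()
t.lock("X"); t.lock("Y")   # growing phase
t.unlock("X")              # shrinking phase begins
# t.lock("Z")              # would raise: forbidden under 2PL
```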
• Strict Two-Phase Locking

• The first phase of Strict-2PL is same as 2PL.

• After acquiring all the locks in the first phase, the transaction continues to execute normally.

• But in contrast to 2PL, Strict-2PL does not release a lock after using it.

• Strict-2PL holds all the locks until the commit point and releases all the locks at a time.

• Strict-2PL does not have cascading abort as 2PL does.


Timestamp-based Protocols

• The most commonly used concurrency protocol is the timestamp based protocol. This
protocol uses either system time or logical counter as a timestamp.

• Lock-based protocols manage the order between the conflicting pairs among transactions at
the time of execution, whereas timestamp-based protocols start working as soon as a
transaction is created.

• Every transaction has a timestamp associated with it, and the ordering is determined by the
age of the transaction. A transaction created at 0002 clock time would be older than all other
transactions that come after it. For example, any transaction 'y' entering the system at 0004
is two seconds younger and the priority would be given to the older one.

• In addition, every data item is given the latest read and write-timestamp. This lets the
system know when the last ‘read and write’ operation was performed on the data item.
Timestamp Ordering Protocol
• The timestamp-ordering protocol ensures serializability among transactions in
their conflicting read and write operations.

• This is the responsibility of the protocol system that the conflicting pair of tasks
should be executed according to the timestamp values of the transactions.

• The timestamp of transaction Ti is denoted as TS(Ti).

• Read time-stamp of data-item X is denoted by R-timestamp(X).

• Write time-stamp of data-item X is denoted by W-timestamp(X).


• Timestamp ordering protocol works as follows −

• If a transaction Ti issues a read(X) operation −

 If TS(Ti) < W-timestamp(X): operation rejected.

 If TS(Ti) >= W-timestamp(X): operation executed, and all data-item timestamps updated.

• If a transaction Ti issues a write(X) operation −

 If TS(Ti) < R-timestamp(X): operation rejected.

 If TS(Ti) < W-timestamp(X): operation rejected and Ti rolled back.

 Otherwise, operation executed.
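A minimal Python sketch of these checks (timestamps are plain integers; returning False stands for rejecting the operation and rolling Ti back):

```python
def read_ok(ts_ti, item):
    if ts_ti < item["W"]:                 # a younger transaction already wrote X
        return False                      # operation rejected, Ti rolled back
    item["R"] = max(item["R"], ts_ti)     # update the read timestamp
    return True

def write_ok(ts_ti, item):
    if ts_ti < item["R"] or ts_ti < item["W"]:
        return False                      # operation rejected, Ti rolled back
    item["W"] = ts_ti                     # update the write timestamp
    return True

X = {"R": 0, "W": 0}                      # R-timestamp(X), W-timestamp(X)
print(read_ok(5, X), X)    # True  {'R': 5, 'W': 0}
print(write_ok(3, X))      # False: TS(Ti)=3 < R-timestamp(X)=5
print(write_ok(7, X), X)   # True  {'R': 5, 'W': 7}
```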


Thomas' Write Rule

• This rule states if TS(Ti) < W-timestamp(X), then the operation is rejected and
Ti is rolled back.

• Time-stamp ordering rules can be modified to make the schedule view


serializable.

• Instead of making Ti rolled back, the 'write' operation itself is ignored.


Deadlock
• In a multi-process system, deadlock is an unwanted situation that arises in a shared resource environment,
where a process indefinitely waits for a resource that is held by another process.

• For example, assume a set of transactions {T0, T1, T2, ...,Tn}. T0 needs a resource X to complete its task.

• Resource X is held by T1, and T1 is waiting for a resource Y, which is held by T2.

• T2 is waiting for resource Z, which is held by T0.

• Thus, all the processes wait for each other to release resources.

• In this situation, none of the processes can finish their task.

• This situation is known as a deadlock.

• Deadlocks are not healthy for a system.

• In case a system is stuck in a deadlock, the transactions involved in the deadlock are either rolled back or
restarted.
Deadlock Prevention
• To prevent any deadlock situation in the system, the DBMS aggressively inspects all the
operations, where transactions are about to execute.

• The DBMS inspects the operations and analyzes if they can create a deadlock situation.

• If it finds that a deadlock situation might occur, then that transaction is never allowed
to be executed.

• There are deadlock prevention schemes that use timestamp ordering mechanism of
transactions in order to predetermine a deadlock situation.
 Wait-Die Scheme

 Wound-Wait Scheme
Wait-Die Scheme
• In this scheme, if a transaction requests to lock a resource (data item), which is already held
with a conflicting lock by another transaction, then one of the two possibilities may occur −

• If TS(Ti) < TS(Tj) − that is Ti, which is requesting a conflicting lock, is older than Tj − then
Ti is allowed to wait until the data-item is available.

• If TS(Ti) > TS(Tj) − that is, Ti is younger than Tj − then Ti dies. Ti is restarted later with a
random delay but with the same timestamp.

• This scheme allows the older transaction to wait but kills the younger one.
Wound-Wait Scheme
• In this scheme, if a transaction requests to lock a resource (data item), which is already held with conflicting
lock by some another transaction, one of the two possibilities may occur −

• If TS(Ti) < TS(Tj), then Ti forces Tj to be rolled back − that is Ti wounds Tj. Tj is restarted later with a
random delay but with the same timestamp.

• If TS(Ti) > TS(Tj), then Ti is forced to wait until the resource is available.

• This scheme, allows the younger transaction to wait; but when an older transaction requests an item held by
a younger one, the older transaction forces the younger one to abort and release the item.

• In both the cases, the transaction that enters the system at a later stage is aborted.
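A minimal Python sketch of the two decision rules (timestamps are plain integers, and a smaller timestamp means an older transaction; Ti requests a lock that Tj already holds):

```python
def wait_die(ts_i, ts_j):
    # Older requester waits; younger requester dies and restarts later.
    return "Ti waits" if ts_i < ts_j else "Ti dies (restart, same timestamp)"

def wound_wait(ts_i, ts_j):
    # Older requester wounds (rolls back) the younger holder; younger waits.
    return "Tj wounded (rolled back)" if ts_i < ts_j else "Ti waits"

print(wait_die(1, 5))     # Ti waits
print(wait_die(5, 1))     # Ti dies (restart, same timestamp)
print(wound_wait(1, 5))   # Tj wounded (rolled back)
print(wound_wait(5, 1))   # Ti waits
```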
Deadlock Avoidance

• Aborting a transaction is not always a practical approach.

• Instead, deadlock avoidance mechanisms can be used to detect


any deadlock situation in advance.

• Methods like the "wait-for graph" are available, but they are suitable
only for those systems where transactions are lightweight and hold
few instances of each resource.

• In a bulky system, deadlock prevention techniques may work well.


Wait-for Graph
• This is a simple method available to track if any deadlock situation
may arise.

• For each transaction entering into the system, a node is created.

• When a transaction Ti requests for a lock on an item, say X, which is


held by some other transaction Tj, a directed edge is created from Ti
to Tj.

• If Tj releases item X, the edge between them is dropped and Ti locks


the data item.

• The system maintains this wait-for graph for every transaction


waiting for some data items held by others.

• The system keeps checking if there's any cycle in the graph.
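A minimal Python sketch of the wait-for graph with a depth-first cycle check (the transaction names and edges are invented):

```python
# An edge Ti -> Tj means Ti waits for an item held by Tj.
# A cycle in the graph signals a deadlock.

def has_cycle(graph):
    visited, on_stack = set(), set()
    def dfs(node):
        visited.add(node); on_stack.add(node)
        for nxt in graph.get(node, []):
            if nxt in on_stack or (nxt not in visited and dfs(nxt)):
                return True
        on_stack.discard(node)
        return False
    return any(dfs(n) for n in graph if n not in visited)

print(has_cycle({"T0": ["T1"], "T1": ["T2"], "T2": []}))      # False: a chain
print(has_cycle({"T0": ["T1"], "T1": ["T2"], "T2": ["T0"]}))  # True: deadlock
```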


• Here, we can use either of the two following approaches −

 First, do not allow any request for an item which is already locked by
another transaction. This is not always feasible and may cause starvation,
where a transaction indefinitely waits for a data item and can never
acquire it.

 The second option is to roll back one of the transactions. It is not always
feasible to roll back the younger transaction, as it may be more important
than the older one. With the help of some relative algorithm, a transaction
is chosen to be aborted. This transaction is known as the victim and the
process is known as victim selection.


Data Backup
• Backup is the activity of copying files or databases so that they will be
preserved in case of equipment failure or other catastrophe.

• Backup is usually a routine part of the operation of large businesses


with mainframes as well as the administrators of smaller business
computers.

• For personal computer users, backup is also necessary but often


neglected.

• The retrieval of files you backed up is called restoring them.


Loss of Volatile Storage
• A volatile storage like RAM stores all the active logs, disk buffers, and
related data.

• In addition, it stores all the transactions that are being currently executed.

• What happens if such a volatile storage crashes abruptly? It would


obviously take away all the logs and active copies of the database.

• It makes recovery almost impossible, as everything that is required to


recover the data is lost.
• Following techniques may be adopted in case of loss of volatile storage −
• We can have checkpoints at multiple stages so as to save the contents of the database
periodically.
• A state of active database in the volatile memory can be periodically dumped onto a stable
storage, which may also contain logs and active transactions and buffer blocks.
• <dump> can be marked on a log file, whenever the database contents are dumped from a
non-volatile memory to a stable one.

• Recovery
• When the system recovers from a failure, it can restore the latest dump.
• It can maintain a redo-list and an undo-list as checkpoints.
• It can recover the system by consulting undo-redo lists to restore the state of all transactions
up to the last checkpoint.
Database Backup & Recovery from Catastrophic
Failure
• A catastrophic failure is one where a stable, secondary storage device gets corrupt. With the
storage device, all the valuable data that is stored inside is lost. We have two different strategies
to recover data from such a catastrophic failure −

• Remote backup; Here a backup copy of the database is stored at a remote location from where it
can be restored in case of a catastrophe.

• Alternatively, database backups can be taken on magnetic tapes and stored at a safer place. This
backup can later be transferred onto a freshly installed database to bring it to the point of backup.

• Large, mature databases are too bulky to be frequently backed up. In such cases, we have techniques
where we can restore a database just by looking at its logs. So, all that we need to do here is to
take a backup of all the logs at frequent intervals of time. The database can be backed up once a
week, and the logs being very small can be backed up every day or as frequently as possible.
Remote Backup

• Remote backup provides a sense of security in case the primary location


where the database is located gets destroyed.

• Remote backup can be offline or real-time or online.

• In case it is offline, it is maintained manually.


• Online backup systems are more real-time and lifesavers for database
administrators and investors.

• An online backup system is a mechanism where every bit of the real-time


data is backed up simultaneously at two distant places.

• One of them is directly connected to the system and the other one is kept
at a remote place as backup.

• As soon as the primary database storage fails, the backup system senses
the failure and switches the user system to the remote storage.

• Sometimes the switchover is so instant that users do not even realize a failure has occurred.
Data Recovery
• In computing, data recovery is a process of salvaging inaccessible data from
corrupted or damaged secondary storage, removable media or files, when the
data they store cannot be accessed in a normal way.

• The data is most often salvaged from storage media such as internal or
external hard disk drives (HDDs), solid-state drives (SSDs), USB flash drives,
magnetic tapes, CDs, DVDs, RAID subsystems, and other electronic devices.

• Recovery may be required due to physical damage to the storage device or
logical damage to the file system that prevents it from being mounted by the
host operating system (OS).
Crash Recovery

• A DBMS is a highly complex system with hundreds of transactions
being executed every second.

• The durability and robustness of a DBMS depend on its complex
architecture and its underlying hardware and system software.

• If it fails or crashes amid transactions, the system is expected to
follow some sort of algorithm or technique to recover the lost data.
Failure Classification

• To see where the problem has occurred, we generalize a failure into

various categories, as follows −

• Transaction failure

• System Crash

• Disk Failure
Storage Structure

• We have already described the storage system. In brief, the storage structure can be divided into
two categories −

• Volatile storage − As the name suggests, a volatile storage cannot survive system crashes.

• Volatile storage devices are placed very close to the CPU; normally they are embedded onto
the chipset itself.

• For example, main memory and cache memory are examples of volatile storage. They are
fast but can store only a small amount of information.

• Non-volatile storage − These memories are made to survive system crashes.

• They are huge in data storage capacity, but slower in accessibility. Examples may include
hard-disks, magnetic tapes, flash memory, and non-volatile (battery backed up) RAM.
Recovery and Atomicity

• When a system crashes, it may have several transactions being
executed and various files opened for them to modify the data items.

• Transactions are made of various operations, which are atomic in nature.

• But according to the ACID properties of a DBMS, atomicity of transactions as a
whole must be maintained: either all the operations are executed, or none.
• When a DBMS recovers from a crash, it should maintain the following −

• It should check the states of all the transactions, which were being executed.

• A transaction may be in the middle of some operation; the DBMS must ensure the

atomicity of the transaction in this case.

• It should check whether the transaction can be completed now or it needs to be

rolled back.

• No transactions would be allowed to leave the DBMS in an inconsistent state.


• There are two types of techniques, which can help a DBMS in recovering as well as
maintaining the atomicity of a transaction −

• Maintaining the logs of each transaction, and writing them onto some stable
storage before actually modifying the database.

• Maintaining shadow paging, where the changes are done on a volatile memory,
and later, the actual database is updated.
Log-based Recovery
• A log is a sequence of records that maintains the record of actions performed by a transaction. It is
important that the logs are written prior to the actual modification and stored on a stable storage medium,
which is failsafe.

• Log-based recovery works as follows −

• The log file is kept on a stable storage media.

• When a transaction enters the system and starts execution, it writes a log about it.

• <Tn, Start>

• When the transaction modifies an item X, it writes a log record as follows −

• <Tn, X, V1, V2>

• It reads: transaction Tn has changed the value of X from V1 to V2.

• When the transaction finishes, it logs −

• <Tn, commit>
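
• As an illustration (a hypothetical sequence using the notation above), a transaction T1 that debits 100 from account A (1000 → 900) and credits it to account B (500 → 600) would leave the following records on stable storage:

• <T1, Start>
• <T1, A, 1000, 900>
• <T1, B, 500, 600>
• <T1, commit>

• On recovery, a transaction whose <Tn, commit> record is present in the log can safely be redone; one with only a <Tn, Start> record must be undone.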
• The database can be modified using two approaches −

• Deferred database modification − All logs are written on to the stable
storage, and the database is updated only when a transaction commits.

• Immediate database modification − Each log record is followed by an
actual database modification.

• That is, the database is modified immediately after every operation.
Recovery with Concurrent Transactions
• When more than one transaction are being executed in parallel, the logs are
interleaved.

• At the time of recovery, it would become hard for the recovery system to
backtrack all logs, and then start recovering.

• To ease this situation, most modern DBMS use the concept of 'checkpoints'.

• Checkpoint

• Keeping and maintaining logs in real time and in real environment may fill out all
the memory space available in the system.

• As time passes, the log file may grow too big to be handled at all.

• A checkpoint is a mechanism where all the previous logs are removed from the system and stored permanently on a storage disk. A checkpoint declares a point before which the DBMS was in a consistent state and all transactions were committed.


Structured Query Language
SQL Overview
• What is SQL?

• SQL is Structured Query Language, a computer language for storing, manipulating
and retrieving data stored in relational databases.

• SQL is the standard language for Relational Database Systems.

• All relational database management systems like MySQL, MS Access, Oracle, Sybase, Informix,
PostgreSQL and SQL Server use SQL as the standard database language.

• They also use different dialects, such as:


• MS SQL Server using T-SQL,
• Oracle using PL/SQL,
• MS Access version of SQL is called JET SQL (native format) etc.
Why SQL?

• Allows users to access data in relational database management systems.

• Allows users to describe the data.

• Allows users to define the data in database and manipulate that data.

• Allows to embed within other languages using SQL modules, libraries & pre-compilers.

• Allows users to create and drop databases and tables.

• Allows users to create view, stored procedure, functions in a database.

• Allows users to set permissions on tables, procedures and views


SQL Process:
• When you are executing an SQL command for any RDBMS, the system determines the
best way to carry out your request and SQL engine figures out how to interpret the task.

• There are various components included in the process.

• These components are


• Query Dispatcher,
• Optimization Engines,
• Classic Query Engine
• and SQL Query Engine, etc.

• The classic query engine handles all non-SQL queries, but the SQL query engine won't handle
logical files.
SQL Architecture:
SQL Commands

• The standard SQL commands to interact with relational databases are CREATE, SELECT,

INSERT, UPDATE, DELETE and DROP.

• These commands can be classified into groups based on their nature:

• DDL -Data Definition Language:

• DML -Data Manipulation Language:

• DCL -Data Control Language:

• DQL -Data Query Language:


DDL -Data Definition Language:

• The Data Definition Language (DDL) manages table and index structure.

• The most basic items of DDL are the CREATE, ALTER, RENAME and DROP

statements:

• CREATE creates an object (a table, for example) in the database.

• DROP deletes an object in the database, usually irretrievably.

• ALTER modifies the structure of an existing object in various ways—for example,

adding a column to an existing table.
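
• A minimal DDL sketch (illustrative only; the CUSTOMERS table mirrors the one used later in this deck):

• CREATE TABLE CUSTOMERS (ID INT NOT NULL, NAME VARCHAR(20) NOT NULL, PRIMARY KEY (ID));
• ALTER TABLE CUSTOMERS ADD AGE INT;   -- add a column to the existing table
• DROP TABLE CUSTOMERS;                -- delete the table, usually irretrievably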


DML -Data Manipulation Language

• The Data Manipulation Language (DML) is the subset of SQL used to add, update
and delete data.

• The acronym CRUD refers to all of the major functions that need to be
implemented in a relational database application to consider it complete.

• Each letter in the acronym can be mapped to a standard SQL statement:
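
• Create → INSERT, Read → SELECT, Update → UPDATE, Delete → DELETE.

• A minimal sketch of the four CRUD operations (assuming the CUSTOMERS table used elsewhere in this deck):

• INSERT INTO CUSTOMERS (ID, NAME, AGE) VALUES (1, 'Ramesh', 32);  -- Create
• SELECT ID, NAME, AGE FROM CUSTOMERS;                             -- Read
• UPDATE CUSTOMERS SET AGE = 33 WHERE ID = 1;                      -- Update
• DELETE FROM CUSTOMERS WHERE ID = 1;                              -- Delete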


DCL -Data Control Language
• The Data Control Language (DCL) is a subset of the Structured Query Language (SQL) that

allows database administrators to configure security access to relational databases.

• DCL is the simplest of the SQL subsets, as it consists of only three commands: GRANT,

REVOKE, and DENY (DENY is not in the SQL standard; it is offered by some implementations, such as SQL Server).

• Combined, these three commands provide administrators with the flexibility to set and remove

database permissions in an extremely granular fashion.
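
• A minimal DCL sketch (app_user is a hypothetical user; privilege names and syntax details vary slightly by RDBMS):

• GRANT SELECT, INSERT ON CUSTOMERS TO app_user;   -- allow reading and adding rows
• REVOKE INSERT ON CUSTOMERS FROM app_user;        -- withdraw the INSERT privilege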


DQL -Data Query Language

• Data Query Language is used to extract data from the database.

• It doesn't modify any data in the database. It describes only one
query: SELECT.
SQL RDBMS Databases

• The following are popular RDBMS products; comparing their basic features helps in choosing one:

• MySQL

• MS SQL Server

• ORACLE

• MSACCESS
SQL Syntax
• SQL syntax can vary slightly between different database systems (MySQL, PostgreSQL, SQL Server,

Oracle, etc.).

• Case-sensitivity may also vary between systems.

• Understanding the basic clauses (SELECT, FROM, WHERE, ORDER BY, etc.) is essential for writing

effective SQL queries.

• Joins are critical for combining data from multiple tables.

• Aggregate functions and GROUP BY are used for summarizing data.

• Indexes improve query performance.

• Views simplify complex queries and improve data security.


Basic SQL Syntax:
• SQL SELECT Statement:

• SQL DISTINCT Clause:

• SQL WHERE Clause

• SQL AND/OR Clause:


• SQL IN Clause

• SQL BETWEEN Clause

• SQL LIKEClause

• SQL ORDER BY Clause


• SQL GROUP BY Clause

• SQL COUNT Clause

• SQL HAVING Clause


• SQL CREATE TABLE Statement:

• SQL DROP TABLE Statement:

• SQL CREATE INDEX Statement

• SQL DROP INDEX Statement:


• SQL DESC Statement:

• SQL TRUNCATE TABLE Statement:

• SQL ALTER TABLE Statement:

• SQLALTER TABLE Statement (Rename):

• SQL INSERT INTO Statement:

• SQL UPDATE Statement:


• SQL DELETE Statement:

• SQL CREATE DATABASE Statement:

• SQL DROP DATABASE Statement:

• SQL USE Statement:

• SQL COMMIT Statement:

• SQL ROLLBACK Statement:


SQL Data Types
• An SQL data type is an attribute that specifies the type of data of any object.

• SQL Server offers the following categories of data types for your use:

• Exact Numeric Data Types:

• Approximate Numeric Data Types:

• Date and Time Data Types:

• Character Strings Data Types:

• Unicode Character Strings Data Types:

• Binary Data Types:

• Misc Data Types:


Exact Numeric Data Types
Approximate Numeric Data Types
Date and Time Data Types
Character Strings Data Types
Unicode Character Strings Data Types
Binary Data Types
Misc Data Types
SQL Operators

• An operator is a reserved word or a character used primarily in an SQL statement's WHERE clause

to perform operation(s), such as comparisons and arithmetic operations.

• Operators are used to specify conditions in an SQL statement and to serve as conjunctions for

multiple conditions in a statement.

• Arithmetic operators

• Comparison operators

• Logical operators

• Operators used to negate conditions
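
• A small sketch combining the operator families in one query (assuming the CUSTOMERS table used later in this deck):

• SELECT ID, NAME, SALARY * 12 AS ANNUAL_SALARY   -- arithmetic operator (*)
• FROM CUSTOMERS
• WHERE SALARY >= 2000                            -- comparison operator (>=)
• AND NOT (AGE < 25 OR ADDRESS IS NULL);          -- logical operators (AND, NOT, OR, IS NULL)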


SQL Arithmetic Operators
SQL Comparison Operators
SQL Logical Operators
Working with SQL
• Database: Create, Drop, Delete

• Tables: Create, Drop, Delete

• SQL Query: Insert, Select, Update, Delete

• SQL Clause: Top, Where, Or, And, Order By, Group By, Having, Distinct

• SQL Constraints

• SQL Joins

• SQL Unions

• SQL Alias

• SQL Index

• SQL Views
SQL CREATE Database
• Syntax:
• CREATE DATABASE DatabaseName;
• A database name should always be unique within the RDBMS.

• Example:
• SQL> CREATE DATABASE testDB;
• Make sure you have admin privilege before creating any database.

• Once a database is created, you can check it in the list of databases as follows:

• SQL> SHOW DATABASES;


DROP or DELETE Database

• Syntax:
• DROP DATABASE DatabaseName;

• A database name should always be unique within the RDBMS.

• Example:

• If you want to delete an existing database <testDB>, then the DROP
DATABASE statement would be as follows:
• SQL> DROP DATABASE testDB;
SQL SELECT Database
• When you have multiple databases in your SQL Schema, then before starting your

operation, you would need to select a database where all the operations would be
performed.

• The SQL USE statement is used to select any existing database in SQL schema.

• Syntax:

• USE DatabaseName;

• Now, if you want to work with AMROOD database, then you can execute the

following SQL command and start working with AMROOD database:

• SQL> USE AMROOD;


SQL CREATE Table
• Syntax:

• CREATE TABLE table_name(


• column1 datatype,
• column2 datatype,
• column3 datatype,
• .....
• columnN datatype,
• PRIMARY KEY( one or more columns )

• );
Create Table Using another Table
• Syntax:

• CREATE TABLE NEW_TABLE_NAME AS


• SELECT [ column1, column2...columnN ]
• FROM EXISTING_TABLE_NAME
• [ WHERE ]
• Example:

• Following is an example, which would create a table SALARY using CUSTOMERS table and having fields customer ID and customer SALARY:

• SQL> CREATE TABLE SALARY AS


• SELECT ID, SALARY
• FROM CUSTOMERS;
Example:
• Following is an example, which creates a CUSTOMERS table with ID as the primary key; the NOT NULL

constraints show that these fields cannot be NULL while creating records in this table:

• SQL> CREATE TABLE CUSTOMERS(

• ID INT NOT NULL,

• NAME VARCHAR (20) NOT NULL,

• AGE INT NOT NULL,

• ADDRESS CHAR (25) ,

• SALARY DECIMAL (18, 2),

• PRIMARY KEY (ID)

• );

• You can verify whether your table has been created successfully by looking at the message displayed by the SQL

server; otherwise, you can use the DESC command as follows:

• SQL> DESC CUSTOMERS;


SQL DROP or DELETE Table

• The SQL DROP TABLE statement is used to remove a table definition and all
data, indexes, triggers, constraints, and permission specifications for that
table.

• Syntax:

• DROP TABLE table_name;


• Example

• SQL> DROP TABLE CUSTOMERS;


SQL INSERT Query

• The SQL INSERT INTO Statement is used to add new rows of data to a table in the
database.

• Syntax:

• There are two basic syntaxes of INSERT INTO statement as follows:

• INSERT INTO TABLE_NAME (column1, column2, column3,...columnN)

• VALUES (value1, value2, value3,...valueN);

• You may not need to specify the column(s) name in the SQL query if you are adding values
for all the columns of the table.
• INSERT INTO TABLE_NAME VALUES (value1,value2,value3,...valueN);
Example:
• Following statements would create four records in CUSTOMERS table:

• INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)

• VALUES (1, 'Ramesh', 32, 'Ahmedabad', 2000.00 );

• INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)

• VALUES (2, 'Khilan', 25, 'Delhi', 1500.00 );

• INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)

• VALUES (3, 'kaushik', 23, 'Kota', 2000.00 );

• INSERT INTO CUSTOMERS (ID,NAME,AGE,ADDRESS,SALARY)

• VALUES (4, 'Chaitali', 25, 'Mumbai', 6500.00 );

• You can create a record in CUSTOMERS table using second syntax as follows:

• INSERT INTO CUSTOMERS

• VALUES (7, 'Muffy', 24, 'Indore', 10000.00 );


Populate one table using another table:
• You can populate data into a table through a SELECT statement over another
table, provided the other table has the set of fields required to populate the
first table.

• Here is the syntax:


• INSERT INTO first_table_name [(column1, column2, ... columnN)]

• SELECT column1, column2, ...columnN

• FROM second_table_name

• [WHERE condition];
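
• For example, a sketch reusing the CUSTOMERS and SALARY tables from the CREATE TABLE examples above:

• INSERT INTO SALARY (ID, SALARY)
• SELECT ID, SALARY
• FROM CUSTOMERS
• WHERE SALARY > 2000;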
SQL SELECT Query

• SQL SELECT Statement is used to fetch the data from a database table which returns data
in the form of result table. These result tables are called result-sets.

• Syntax:

• SELECT column1, column2, columnN FROM table_name;


• If you want to fetch all the fields available in the table, then you can use the following
syntax:

• SELECT * FROM table_name;


Example:
• Consider the CUSTOMERS table having the following records:

• Following is an example, which would fetch ID, Name and Salary fields of the customers available in CUSTOMERS
table:

• SQL> SELECT ID, NAME, SALARY FROM CUSTOMERS;


• Relational Algebra:
• Π ID, NAME, SALARY (CUSTOMERS)
• Relational Calculus:

• Tuple form: { T.ID, T.NAME, T.SALARY | CUSTOMERS(T) }

• Domain form: { <id, name, salary> | <id, name, salary> ∈ CUSTOMERS }
SQL WHERE Clause
• The SQL WHERE clause is used to specify a condition while fetching the data from single
table or joining with multiple tables.

• Only if the given condition is satisfied does it return specific values from the table. You
would use the WHERE clause to filter the records, fetching only the necessary ones.

• The WHERE clause is not only used in the SELECT statement; it is also used in UPDATE,
DELETE statements, etc., which we would examine in subsequent chapters.

• Syntax:

• SELECT column1, column2, columnN

• FROM table_name

• WHERE [condition]
Example:
• Consider the CUSTOMERS table having the following records:

• Following is an example, which would fetch ID, Name and Salary fields from the CUSTOMERS table where salary
is greater than 2000:

• SELECT ID, NAME, SALARY

• FROM CUSTOMERS

• WHERE SALARY > 2000;


Example 2
• Following is an example, which would fetch ID, Name and Salary fields from the
CUSTOMERS table for a customer with name Hardik.

• Here, it is important to note that all strings should be given inside single
quotes (''), whereas numeric values should be given without any quotes, as in the
above example:
• SELECT ID, NAME, SALARY

• FROM CUSTOMERS

• WHERE NAME = 'Hardik';


SQL AND and OR Operators
• The SQL AND and OR operators are used to combine multiple conditions to narrow data in an SQL
statement.

• These two operators are called conjunctive operators.

• These operators provide a means to make multiple comparisons with different operators in the same SQL
statement.

• The AND Operator:

• The AND operator allows the existence of multiple conditions in an SQL statement's WHERE clause.

• Syntax:

• The basic syntax of AND operator with WHERE clause is as follows:


• SELECT column1, column2, columnN

• FROM table_name

• WHERE [condition1] AND [condition2]...AND [conditionN];


Example:
• Consider the CUSTOMERS table having the following records:

• Following is an example, which would fetch ID, Name and Salary fields from the
CUSTOMERS table where salary is greater than 2000 AND age is less than 25 years:
• SELECT ID, NAME, SALARY

• FROM CUSTOMERS

• WHERE SALARY > 2000 AND age < 25;


The OR Operator:

• The OR operator is used to combine multiple conditions in an SQL statement's WHERE

clause.

• Syntax:

• The basic syntax of OR operator with WHERE clause is as follows:

• SELECT column1, column2, columnN

• FROM table_name

• WHERE [condition1] OR [condition2]...OR [conditionN]


Example:

• Consider the CUSTOMERS table having the following records:

• Following is an example, which would fetch ID, Name and Salary fields from the CUSTOMERS table
where salary is greater than 2000 OR age is less than 25 years:

• SELECT ID, NAME, SALARY

• FROM CUSTOMERS

• WHERE SALARY > 2000 OR age < 25;


SQL UPDATE Query

• The SQL UPDATE Query is used to modify the existing records in a table.

• You can use WHERE clause with UPDATE query to update selected rows, otherwise all
the rows would be affected.

• Syntax:

• The basic syntax of UPDATE query with WHERE clause is as follows:

• UPDATE table_name

• SET column1 = value1, column2 = value2...., columnN = valueN

• WHERE [condition];
Example:
• Consider the CUSTOMERS table having the following records:

• Following is an example, which would update ADDRESS for a customer whose ID is 6:

• UPDATE CUSTOMERS
• SET ADDRESS = 'Pune'
• WHERE ID = 6;
Example 2

• If you want to modify all ADDRESS and SALARY column values in CUSTOMERS
table, you do not need to use WHERE clause and UPDATE query would be as
follows:
• UPDATE CUSTOMERS

• SET ADDRESS = 'Pune', SALARY = 1000.00;


SQL DELETE Query

• The SQL DELETE Query is used to delete the existing records from a table.

• You can use WHERE clause with DELETE query to delete selected rows,
otherwise all the records would be deleted.

• Syntax:

• The basic syntax of DELETE query with WHERE clause is as follows:


• DELETE FROM table_name
• WHERE [condition];
Example:
• Consider the CUSTOMERS table having the following records:

• Following is an example, which would DELETE a customer, whose ID is 6:

• DELETE FROM CUSTOMERS

• WHERE ID = 6;
SQL LIKE Clause
• The SQL LIKE clause is used to compare a value to similar values using wildcard operators.

• There are two wildcards used in conjunction with the LIKE operator:

• The percent sign (%)

• The underscore (_)


• The percent sign represents zero, one, or multiple characters. The underscore represents
exactly one character.

• The symbols can be used in combinations.

• Syntax:

• The basic syntax of % and _ is as follows:
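
• A sketch of common patterns (illustrative values):

• WHERE SALARY LIKE '200%'   -- finds values that start with 200
• WHERE SALARY LIKE '%200%'  -- finds values that have 200 in any position
• WHERE SALARY LIKE '_00%'   -- finds values that have 00 in the second and third positions
• WHERE SALARY LIKE '2_%_%'  -- finds values that start with 2 and are at least 3 characters long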


Examples
Example:
• Let us take a real example, consider the CUSTOMERS table having the
following records:

• Following is an example, which would display all the records from the
CUSTOMERS table where SALARY starts with 200:
• SELECT * FROM CUSTOMERS

• WHERE SALARY LIKE '200%';


SQL Wildcards
• In SQL, wildcard characters are used with the SQL LIKE operator.

• SQL wildcards are used to search for data within a table.

• With SQL, the wildcards are:

• % − a substitute for zero or more characters
• _ − a substitute for a single character
• [charlist] − sets and ranges of characters to match
• [!charlist] − matches only a character NOT specified within the brackets

Using the SQL % Wildcard

• The following SQL statement selects all customers with a City starting with "ber":

• SELECT * FROM Customers

WHERE City LIKE 'ber%';

• The following SQL statement selects all customers with a City containing the

pattern "es":

• SELECT * FROM Customers

WHERE City LIKE '%es%';


Using the SQL _ Wildcard

• The following SQL statement selects all customers with a City starting with any character,

followed by "erlin":

• SELECT * FROM Customers

WHERE City LIKE '_erlin';

• The following SQL statement selects all customers with a City starting with "L", followed by

any character, followed by "n", followed by any character, followed by "on":

• SELECT * FROM Customers

WHERE City LIKE 'L_n_on';


Using the SQL [charlist] Wildcard
• The following SQL statement selects all customers with a City starting with "b", "s", or "p":

• SELECT * FROM Customers


WHERE City LIKE '[bsp]%';

• The following SQL statement selects all customers with a City starting with "a", "b", or "c":

• SELECT * FROM Customers


WHERE City LIKE '[a-c]%';

• The following SQL statement selects all customers with a City NOT starting with "b", "s", or "p":

• SELECT * FROM Customers


WHERE City LIKE '[!bsp]%';
or
SELECT * FROM Customers
WHERE City NOT LIKE '[bsp]%';
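
• Note: the [charlist] wildcard is supported by MS SQL Server and MS Access, but not by MySQL's LIKE. In MySQL, a regular expression gives the same effect (a sketch):

• SELECT * FROM Customers
WHERE City REGEXP '^[bsp]';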
SQL TOP Clause
• The SQL TOP clause is used to fetch a TOP N number or X percent records from a table.

• Note: Not all databases support the TOP clause.

• For example, MySQL supports the LIMIT clause to fetch a limited number of records, and
Oracle uses ROWNUM to fetch a limited number of records.

• Syntax:

• The basic syntax of TOP clause with SELECT statement would be as follows:

• MS SQL Server and MS Access:


• SELECT TOP number|percent column_name(s)
• FROM table_name
• WHERE [condition]
SQL SELECT TOP Equivalent in MySQL and Oracle

• MySQL Syntax

• SELECT column_name(s)
FROM table_name
LIMIT number;

• Example

• SELECT *
FROM Persons
LIMIT 5;

• Oracle Syntax

• SELECT column_name(s)
FROM table_name
WHERE ROWNUM <= number;

• Example

• SELECT *
FROM Persons
WHERE ROWNUM <= 5;

Using TOP in MS SQL Server / MS Access

• SELECT TOP 50 PERCENT * FROM Customers;

• SELECT TOP 5 * FROM Customers;
SQL ORDER BY Clause

• The SQL ORDER BY clause is used to sort the data in ascending or descending
order, based on one or more columns. Most databases sort query results in
ascending order by default when no ASC or DESC is specified.

• Syntax:

• The basic syntax of ORDER BY clause is as follows:


• SELECT column-list
• FROM table_name
• [WHERE condition]
• [ORDER BY column1, column2, .. columnN] [ASC | DESC];
Example
• Consider the CUSTOMERS table having the following records:

• Following is an example, which would sort the result in descending order by NAME:

• SELECT * FROM CUSTOMERS

• ORDER BY NAME DESC;


SQL Group By
• The SQL GROUP BY clause is used in collaboration with the SELECT statement to arrange identical

data into groups.

• The GROUP BY clause follows the WHERE clause in a SELECT statement and precedes the ORDER BY

clause.

• Syntax:

• The basic syntax of GROUP BY clause is given below. The GROUP BY clause must follow the conditions

in the WHERE clause and must precede the ORDER BY clause if one is used.

• SELECT column1, column2

• FROM table_name

• WHERE [ conditions ]

• GROUP BY column1, column2

• ORDER BY column1, column2


Example
• Consider the CUSTOMERS table having the following records

• If you want to know the total amount of salary for each customer, then the GROUP BY query
would be as follows:
• SELECT NAME, SUM(SALARY) FROM CUSTOMERS

• GROUP BY NAME;
• Now, let us have following table where CUSTOMERS table has the following records
with duplicate names:

• Now again, if you want to know the total amount of salary for each customer, then the
GROUP BY query would be as follows:
• SELECT NAME, SUM(SALARY) FROM CUSTOMERS

• GROUP BY NAME;
SQL Distinct Keyword
• The SQL DISTINCT keyword is used in conjunction with the SELECT statement to eliminate
all duplicate records, fetching only unique records.

• There may be a situation when you have multiple duplicate records in a table. While
fetching such records, it makes more sense to fetch only unique records instead of
fetching duplicate records.

• Syntax:

• The basic syntax of DISTINCT keyword to eliminate duplicate records is as follows:


• SELECT DISTINCT column1, column2,.....columnN
• FROM table_name
• WHERE [condition]
Example:
• Consider the CUSTOMERS table having the following records:

• Now, let us use DISTINCT keyword with the above SELECT query and see
the result:
• SELECT DISTINCT SALARY FROM CUSTOMERS

• ORDER BY SALARY;
Example 2
• To fetch the rows in your own preferred order, the SELECT query would be as follows:

• SELECT * FROM CUSTOMERS

• ORDER BY

• (CASE ADDRESS

• WHEN 'DELHI' THEN 1

• WHEN 'BHOPAL' THEN 2

• WHEN 'KOTA' THEN 3

• WHEN 'AHMADABAD' THEN 4

• WHEN 'MP' THEN 5

• ELSE 100 END) ASC, ADDRESS DESC;


SQL Constraints
• Constraints are rules enforced on the data columns of a table.

• These are used to limit the type of data that can go into a table.

• This ensures the accuracy and reliability of the data in the database.

• Constraints could be column level or table level.

• Column level constraints are applied only to one column, whereas table level constraints are applied to the whole table.

• Following are commonly used constraints available in SQL:


• NOT NULL Constraint: Ensures that a column cannot have NULL value.
• DEFAULT Constraint: Provides a default value for a column when none is specified.
• UNIQUE Constraint: Ensures that all values in a column are different.
• PRIMARY Key: Uniquely identifies each row/record in a database table.
• FOREIGN Key: References a row/record in another database table to link the two tables.
• CHECK Constraint: The CHECK constraint ensures that all values in a column satisfy certain conditions.
• INDEX: Use to create and retrieve data from the database very quickly.
NOT NULL Constraint:
• By default, a column can hold NULL values. If you do
not want a column to have a NULL value, then you need
to define such constraint on this column specifying that
NULL is now not allowed for that column.

• A NULL is not the same as no data; rather, it represents
unknown data.

• Example:

• For example, the following SQL creates a new table
called CUSTOMERS and adds five columns, three of
which (ID, NAME and AGE) specify not to accept NULLs:
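
• A sketch of that table (mirroring the CUSTOMERS definition used elsewhere in this deck):

• CREATE TABLE CUSTOMERS(
• ID INT NOT NULL,
• NAME VARCHAR (20) NOT NULL,
• AGE INT NOT NULL,
• ADDRESS CHAR (25),
• SALARY DECIMAL (18, 2),
• PRIMARY KEY (ID)
• );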
Example 2

• If CUSTOMERS table has already been created, then to add a

NOT NULL constraint to SALARY column in Oracle and

MySQL, you would write a statement similar to the following:

• ALTER TABLE CUSTOMERS

• MODIFY SALARY DECIMAL (18, 2) NOT NULL;


DEFAULT Constraint:

• The DEFAULT constraint provides a default value


to a column when the INSERT INTO statement
does not provide a specific value.

• Example:

• For example, the following SQL creates a new
table called CUSTOMERS and adds five columns.

• Here, the SALARY column is set to 5000.00 by default,
so in case the INSERT INTO statement does not
provide a value for this column, then by default
this column would be set to 5000.00.
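
• A sketch of that table (same CUSTOMERS layout, with a DEFAULT on SALARY):

• CREATE TABLE CUSTOMERS(
• ID INT NOT NULL,
• NAME VARCHAR (20) NOT NULL,
• AGE INT NOT NULL,
• ADDRESS CHAR (25),
• SALARY DECIMAL (18, 2) DEFAULT 5000.00,
• PRIMARY KEY (ID)
• );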
UNIQUE Constraint:

• The UNIQUE Constraint prevents two records from
having identical values in a particular column. In the
CUSTOMERS table, for example, you might want to
prevent two or more people from having an identical age.

• Example:

• For example, the following SQL creates a new table
called CUSTOMERS and adds five columns. Here, the AGE
column is set to UNIQUE, so that you cannot have two
records with the same age:
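
• A sketch of that table (same CUSTOMERS layout, with UNIQUE on AGE):

• CREATE TABLE CUSTOMERS(
• ID INT NOT NULL,
• NAME VARCHAR (20) NOT NULL,
• AGE INT NOT NULL UNIQUE,
• ADDRESS CHAR (25),
• SALARY DECIMAL (18, 2),
• PRIMARY KEY (ID)
• );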
• If CUSTOMERS table has already been created, then to add a UNIQUE constraint

to AGE column, you would write a statement similar to the following:

• ALTER TABLE CUSTOMERS

• MODIFY COLUMN AGE INT NOT NULL UNIQUE;

• You can also use the following syntax, which supports naming the constraint in

multiple columns as well:

• ALTER TABLE CUSTOMERS

• ADD CONSTRAINT myUniqueConstraint UNIQUE(AGE, SALARY);


DROP a UNIQUE Constraint:

• To drop a UNIQUE constraint, use the following SQL:

• ALTER TABLE CUSTOMERS

• DROP CONSTRAINT myUniqueConstraint;

• If you are using MySQL, then you can use the following syntax:

• ALTER TABLE CUSTOMERS

• DROP INDEX myUniqueConstraint;


PRIMARY Key:

• A primary key is a field in a table which uniquely identifies each row/record in a
database table.

• Primary keys must contain unique values.

• A primary key column cannot have NULL values.

• A table can have only one primary key, which may consist of single or multiple fields.

• When multiple fields are used as a primary key, they are called a composite key.

• If a table has a primary key defined on any field(s), then you can not have two
records having the same value of that field(s).

• Note: You would use these concepts while creating database tables.
Create Primary Key:
• Here is the syntax to define ID attribute as a primary key in a CUSTOMERS table.
• CREATE TABLE CUSTOMERS(
• ID INT NOT NULL,
• NAME VARCHAR (20) NOT NULL,
• AGE INT NOT NULL,
• ADDRESS CHAR (25) ,
• SALARY DECIMAL (18, 2),
• PRIMARY KEY (ID)
• );

• To create a PRIMARY KEY constraint on the "ID" column when the CUSTOMERS table already
exists, use the following SQL syntax:
• ALTER TABLE CUSTOMERS ADD PRIMARY KEY (ID);
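
• To define a composite primary key on multiple columns, a named-constraint sketch (PK_CUSTID is an illustrative name; it assumes no primary key is defined yet):

• ALTER TABLE CUSTOMERS
• ADD CONSTRAINT PK_CUSTID PRIMARY KEY (ID, NAME);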
FOREIGN Key:

• A foreign key is a key used to link two tables together. This is sometimes called a

referencing key.

• The primary key field from one table is inserted into the other table, where it becomes a

foreign key; i.e., a Foreign Key is a column or a combination of columns whose values

match a Primary Key in a different table.

• The relationship between 2 tables matches the Primary Key in one of the tables

with a Foreign Key in the second table.


Example:
• CUSTOMERS table:
• CREATE TABLE CUSTOMERS(
• ID INT NOT NULL,
• NAME VARCHAR (20) NOT NULL,
• AGE INT NOT NULL,
• ADDRESS CHAR (25) ,
• SALARY DECIMAL (18, 2),
• PRIMARY KEY (ID)
• );

• ORDERS table:
• CREATE TABLE ORDERS (
• ID INT NOT NULL,
• DATE DATETIME,
• CUSTOMER_ID INT references CUSTOMERS(ID),
• AMOUNT double,
• PRIMARY KEY (ID)
• );

• If ORDERS table has already been created, and the foreign key has not yet been, use the syntax for specifying a
foreign key by altering a table.
• ALTER TABLE ORDERS
• ADD FOREIGN KEY (Customer_ID) REFERENCES CUSTOMERS (ID);
DROP a FOREIGN KEY Constraint:

• To drop a FOREIGN KEY constraint, use the following SQL (most implementations require
the constraint name; <constraint_name> is a placeholder):

• ALTER TABLE ORDERS

• DROP FOREIGN KEY <constraint_name>;


CHECK Constraint:
• The CHECK Constraint enables a condition to check the value being entered into a record. If the condition evaluates to
false, the record violates the constraint and isn’t entered into the table.

• Example:

• For example, the following SQL creates a new table called CUSTOMERS and adds five columns. Here, we add a CHECK
with AGE column, so that you can not have any CUSTOMER below 18 years:

• CREATE TABLE CUSTOMERS(

• ID INT NOT NULL,

• NAME VARCHAR (20) NOT NULL,

• AGE INT NOT NULL CHECK (AGE >= 18),

• ADDRESS CHAR (25) ,

• SALARY DECIMAL (18, 2),

• PRIMARY KEY (ID)

• );
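
• If the CUSTOMERS table already exists, a CHECK constraint can also be added afterwards (a sketch with an illustrative constraint name):

• ALTER TABLE CUSTOMERS
• ADD CONSTRAINT myCheckConstraint CHECK (AGE >= 18);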
INDEX:
• An INDEX is used to retrieve data from the database very quickly.

• An index can be created using a single column or a group of columns in a table.

• When an index is created, it is assigned a ROWID for each row before it sorts out the data.

• Proper indexes are good for performance in large databases, but you need to be careful while creating an index.

• The selection of fields depends on what you are using in your SQL queries.

• You can create an index on a single column or multiple columns using the following syntax:

• CREATE INDEX index_name

• ON table_name ( column1, column2.....);

• To create an INDEX on AGE column, to optimize the search on customers for a particular age, following is the SQL

syntax:

• CREATE INDEX idx_age

• ON CUSTOMERS ( AGE );
Dropping Constraints:
• Any constraint that you have defined can be dropped using the ALTER TABLE command with the DROP

CONSTRAINT option.

• For example, to drop the primary key constraint in the EMPLOYEES table, you can use the following

command:

• ALTER TABLE EMPLOYEES DROP CONSTRAINT EMPLOYEES_PK;

• Some implementations may provide shortcuts for dropping certain constraints. For example, to drop the

primary key constraint for a table in Oracle, you can use the following command:

• ALTER TABLE EMPLOYEES DROP PRIMARY KEY;

• Some implementations allow you to disable constraints. Instead of permanently dropping a constraint

from the database, you may want to temporarily disable the constraint, and then enable it later.
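
• For example, in Oracle (a sketch; exact syntax varies by implementation):

• ALTER TABLE EMPLOYEES DISABLE CONSTRAINT EMPLOYEES_PK;
• ALTER TABLE EMPLOYEES ENABLE CONSTRAINT EMPLOYEES_PK;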
SQL Joins

• The SQL Joins clause is used to combine records from two or more tables in a database.

• A JOIN is a means for combining fields from two tables by using values common to each.

• For example, let us join these two tables in our SELECT statement as follows:

• SELECT ID, NAME, AGE, AMOUNT

• FROM CUSTOMERS, ORDERS

• WHERE CUSTOMERS.ID = ORDERS.CUSTOMER_ID;


INNER JOIN
• The most frequently used and important of the joins is the INNER JOIN. It is also
referred to as an EQUIJOIN when the join-predicate uses equality comparisons.
• The INNER JOIN creates a new result table by combining column values of two tables (table1
and table2) based upon the join-predicate.
• The query compares each row of table1 with each row of table2 to find all pairs of rows
which satisfy the join-predicate.
• When the join-predicate is satisfied, column values for each matched pair of rows of A and B
are combined into a result row.
• Syntax:
• The basic syntax of INNER JOIN is as follows:
• SELECT table1.column1, table2.column2...
• FROM table1
• INNER JOIN table2
• ON table1.common_field = table2.common_field;
Example:
• SELECT ID, NAME, AMOUNT, DATE

• FROM CUSTOMERS

• INNER JOIN ORDERS

• ON CUSTOMERS.ID =
ORDERS.CUSTOMER_ID;
LEFT JOIN
• The SQL LEFT JOIN returns all rows from the left table, even if there are no matches in the right
table.

• This means that if the ON clause matches 0 (zero) records in right table, the join will still return a row
in the result, but with NULL in each column from right table.

• This means that a left join returns all the values from the left table, plus matched values from the
right table or NULL in case of no matching join predicate.

• Syntax:

• The basic syntax of LEFT JOIN is as follows:


• SELECT table1.column1, table2.column2...
• FROM table1
• LEFT JOIN table2
• ON table1.common_field = table2.common_field;
Example:

• SELECT ID, NAME, AMOUNT, DATE

• FROM CUSTOMERS

• LEFT JOIN ORDERS

• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
RIGHT JOIN
• The SQL RIGHT JOIN returns all rows from the right table, even if there are no matches in the left table.

• This means that if the ON clause matches 0 (zero) records in left table, the join will still return a row in
the result, but with NULL in each column from left table.

• This means that a right join returns all the values from the right table, plus matched values from the left
table or NULL in case of no matching join predicate.

• Syntax:

• The basic syntax of RIGHT JOIN is as follows:

• SELECT table1.column1, table2.column2...

• FROM table1

• RIGHT JOIN table2

• ON table1.common_field = table2.common_field;
Example:
• SELECT ID, NAME, AMOUNT, DATE

• FROM CUSTOMERS

• RIGHT JOIN ORDERS

• ON CUSTOMERS.ID =
ORDERS.CUSTOMER_ID;
FULL JOIN
• The SQL FULL JOIN combines the results of both left and right outer joins.

• The joined table will contain all records from both tables, and fill in NULLs
for missing matches on either side.

• Syntax:

• The basic syntax of FULL JOIN is as follows:


• SELECT table1.column1, table2.column2...
• FROM table1
• FULL JOIN table2
• ON table1.common_field = table2.common_field;
Example:
• SELECT ID, NAME, AMOUNT, DATE

• FROM CUSTOMERS

• FULL JOIN ORDERS

• ON CUSTOMERS.ID =
ORDERS.CUSTOMER_ID;
SELF JOIN
• The SQL SELF JOIN is used to join a table to itself as if the table were two tables,
temporarily renaming at least one table in the SQL statement.

• Syntax:

• The basic syntax of SELF JOIN is as follows:

• SELECT a.column_name, b.column_name...

• FROM table1 a, table1 b

• WHERE a.common_field = b.common_field;


Example

• SELECT a.ID, b.NAME, a.SALARY

• FROM CUSTOMERS a, CUSTOMERS b

• WHERE a.SALARY < b.SALARY;


CARTESIAN JOIN

• The CARTESIAN JOIN or CROSS JOIN returns the Cartesian product of
the sets of records from the two or more joined tables.

• Thus, it equates to an inner join where the join-condition always evaluates
to True or where the join-condition is absent from the statement.

• Syntax:

• The basic syntax of the CARTESIAN JOIN is as follows:


• SELECT table1.column1, table2.column2...
• FROM table1, table2 [, table3 ]
Example:

• SELECT ID, NAME, AMOUNT, DATE

• FROM CUSTOMERS, ORDERS;


SQL Unions Clause
• The SQL UNION clause/operator is used to combine the results of two or more SELECT statements without
returning any duplicate rows.

• To use UNION, each SELECT must have the same number of columns selected, the same number of column
expressions, the same data type, and have them in the same order, but they do not have to be the same
length.

• Syntax:

• The basic syntax of UNION is as follows:


• SELECT column1 [, column2 ]
• FROM table1 [, table2 ]
• [WHERE condition]
• UNION
• SELECT column1 [, column2 ]
• FROM table1 [, table2 ]
• [WHERE condition]
Example:
• SELECT ID, NAME, AMOUNT, DATE

• FROM CUSTOMERS

• LEFT JOIN ORDERS

• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID

• UNION

• SELECT ID, NAME, AMOUNT, DATE

• FROM CUSTOMERS

• RIGHT JOIN ORDERS

• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
The UNION ALL Clause:
• The UNION ALL operator is used to combine the results of two SELECT statements including duplicate rows.

• The same rules that apply to UNION apply to the UNION ALL operator.

• Syntax:

• The basic syntax of UNION ALL is as follows:

• SELECT column1 [, column2 ]

• FROM table1 [, table2 ]

• [WHERE condition]

• UNION ALL

• SELECT column1 [, column2 ]

• FROM table1 [, table2 ]

• [WHERE condition]
Example:
• SELECT ID, NAME, AMOUNT, DATE

• FROM CUSTOMERS

• LEFT JOIN ORDERS

• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID

• UNION ALL

• SELECT ID, NAME, AMOUNT, DATE

• FROM CUSTOMERS

• RIGHT JOIN ORDERS

• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
INTERSECT Clause
• The SQL INTERSECT clause/operator is used to combine two SELECT statements, but returns rows only
from the first SELECT statement that are identical to a row in the second SELECT statement. This
means INTERSECT returns only common rows returned by the two SELECT statements.
• Just as with the UNION operator, the same rules apply when using the INTERSECT operator. MySQL
does not support the INTERSECT operator (it was only added in MySQL 8.0.31).
• Syntax:
• The basic syntax of INTERSECT is as follows:
• SELECT column1 [, column2 ]
• FROM table1 [, table2 ]
• [WHERE condition]
• INTERSECT
• SELECT column1 [, column2 ]
• FROM table1 [, table2 ]
• [WHERE condition]
Example:
• SELECT ID, NAME, AMOUNT, DATE

• FROM CUSTOMERS

• LEFT JOIN ORDERS

• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID

• INTERSECT

• SELECT ID, NAME, AMOUNT, DATE

• FROM CUSTOMERS

• RIGHT JOIN ORDERS

• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
EXCEPT Clause
• The SQL EXCEPT clause/operator is used to combine two SELECT statements and returns rows
from the first SELECT statement that are not returned by the second SELECT statement.
• This means EXCEPT returns only rows, which are not available in second SELECT statement.
• Just as with the UNION operator, the same rules apply when using the EXCEPT operator. MySQL
does not support the EXCEPT operator (it was only added in MySQL 8.0.31).
• Syntax:
• The basic syntax of EXCEPT is as follows:
• SELECT column1 [, column2 ]
• FROM table1 [, table2 ]
• [WHERE condition]
• EXCEPT
• SELECT column1 [, column2 ]
• FROM table1 [, table2 ]
• [WHERE condition]
Example:
• SELECT ID, NAME, AMOUNT, DATE

• FROM CUSTOMERS

• LEFT JOIN ORDERS

• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID

• EXCEPT

• SELECT ID, NAME, AMOUNT, DATE

• FROM CUSTOMERS

• RIGHT JOIN ORDERS

• ON CUSTOMERS.ID = ORDERS.CUSTOMER_ID;
SQL Alias Syntax
• You can rename a table or a column temporarily by giving it another name, known as an alias.
• The use of table aliases means to rename a table in a particular SQL statement.
• The renaming is a temporary change and the actual table name does not change in the database.
• The column aliases are used to rename a table's columns for the purpose of a particular SQL
query.
• Syntax:
• The basic syntax of table alias is as follows:
• SELECT column1, column2....
• FROM table_name AS alias_name
• WHERE [condition];
• The basic syntax of column alias is as follows:
• SELECT column_name AS alias_name
• FROM table_name
• WHERE [condition];
Example:
• SELECT C.ID, C.NAME, C.AGE, O.AMOUNT

• FROM CUSTOMERS AS C, ORDERS AS O

• WHERE C.ID = O.CUSTOMER_ID;

• SELECT ID AS CUSTOMER_ID, NAME AS CUSTOMER_NAME

• FROM CUSTOMERS

• WHERE SALARY IS NOT NULL;


SQL -Using Views
• A view is nothing more than a SQL statement that is stored in the database with an associated name.

• A view is actually a composition of a table in the form of a predefined SQL query.

• A view can contain all rows of a table or select rows from a table.

• A view can be created from one or many tables which depends on the written SQL query to create a
view.

• Views, which are kind of virtual tables, allow users to do the following:

• Structure data in a way that users or classes of users find natural or intuitive.

• Restrict access to the data such that a user can see and (sometimes) modify exactly what they
need and no more.

• Summarize data from various tables which can be used to generate reports.
Creating Views:
• Database views are created using the CREATE VIEW statement. Views can be created from
a single table, multiple tables, or another view.

• To create a view, a user must have the appropriate system privilege according to the specific
implementation.

• The basic CREATE VIEW syntax is as follows:


• CREATE VIEW view_name AS
• SELECT column1, column2.....
• FROM table_name
• WHERE [condition];

• You can include multiple tables in your SELECT statement in very similar way as you use
them in normal SQL SELECT query.
Example:
• CREATE VIEW CUSTOMERS_VIEW AS

• SELECT name, age

• FROM CUSTOMERS;

• UPDATE CUSTOMERS_VIEW

• SET AGE = 35

• WHERE name='Ramesh';

• DELETE FROM CUSTOMERS_VIEW

• WHERE age = 22;

• DROP VIEW CUSTOMERS_VIEW;


SQL HAVING CLAUSE
• The HAVING clause enables you to specify conditions that filter which group
results appear in the final results.

• The WHERE clause places conditions on the selected columns, whereas the
HAVING clause places conditions on groups created by the GROUP BY clause.

• Syntax:

• The following is the position of the HAVING clause in a query:


• SELECT
• FROM
• WHERE
• GROUP BY
• HAVING
• ORDER BY
• The HAVING clause must follow the GROUP BY clause in a
query and must also precede the ORDER BY clause if used.

• The following is the syntax of the SELECT statement, including
the HAVING clause:

• SELECT column1, column2

• FROM table1, table2

• WHERE [ conditions ]

• GROUP BY column1, column2

• HAVING [ conditions ]

• ORDER BY column1, column2


Example:
• SELECT AGE, COUNT(AGE)

• FROM CUSTOMERS

• GROUP BY AGE

• HAVING COUNT(AGE) >= 2;


END
