0_DBMS short text
DATABASE MANAGEMENT SYSTEMS
Subject Code: 3132
Computer Engineering
Semester 3
Lecture Notes
submitted by:
Lubna K.
Lecturer CT
MODULE I
Field - A field consists of a grouping of characters (single alphabetic, numeric, or other symbol).
A data field represents an attribute (a characteristic or quality) of some entity (object, person,
place, or event).
Record - Related fields of data are grouped to form a record. Thus, a record represents a
collection of attributes that describe an entity. Fixed-length records contain a fixed number
of fixed-length data fields. Variable-length records contain a variable number of fields and field
lengths.
File - A group of related records is known as a data file, or table. Files are frequently classified
by the application for which they are primarily used, such as a payroll file or an inventory file,
or the type of data they contain, such as a document file or a graphical image file. Files are also
classified by their permanence, for example, a master file versus a transaction file. A
transaction file would contain records of all transactions occurring during a period, whereas a
master file contains all the permanent records.
Database -
It is a collection of interrelated data.
Database can be software based or hardware based, with one sole purpose, storing data.
These can be stored in the form of tables.
A database can be of any size and varying complexity.
A database may be generated and manipulated manually or it may be computerized.
Example: A Customer database consists of the fields cname, cno, and ccity.
ADVANTAGES OF DBMS
Segregation of application programs.
Minimal data duplication (data redundancy).
Easy retrieval of data using a query language.
Reduced development time and maintenance needs.
With cloud datacenters, we now have database management systems capable of storing
very large volumes of data.
Seamless integration with application programming languages, which makes it much easier
to add a database to almost any application or website.
DISADVANTAGES OF DBMS
Complexity.
Commercially licensed DBMSs are generally costly, although open-source systems such as MySQL
are available.
They are large in size.
APPLICATIONS OF DBMS
COMPONENTS OF DBMS
The database management system can be divided into
five major components, they are:
1. Hardware
2. Software
3. Data
4. Procedures
5. Database Access Language
Let's have a simple diagram to see how they all fit
together to form a database management system.
Users
A typical DBMS has users with different rights and
permissions who use it for different purposes. Some users
retrieve data and some back it up. The users of a DBMS can
be broadly categorized as follows −
Administrators − Database Administrators (DBA) maintain the DBMS and are responsible for
administrating the database. They decide how the database is used and by whom. They create
access profiles for users and apply limitations to maintain isolation and enforce security.
Administrators also look after DBMS resources like system license, required tools, and other
software and hardware related maintenance.
Designers (Application Programmer or Software Developer)− Designers are the group of
people who actually work on the designing part of the database. They keep a close watch on
what data should be kept and in what format. They identify and design the whole set of
entities, relations, constraints, and views.
End Users − End users are those who actually reap the benefits of having a DBMS. End
users can range from simple viewers who pay attention to the logs or market rates to
sophisticated users such as business analysts.
3-tier Architecture
A 3-tier architecture separates its tiers from each other based on
the complexity of the users and how they use the data present in
the database. It is the most widely used architecture to design a
DBMS.
Database (Data) Tier − At this tier, the database resides
along with its query processing languages. We also have
the relations that define the data and their constraints at
this level.
Application (Middle) Tier − At this tier reside the
application server and the programs that access the
database. For a user, this application tier presents an
abstracted view of the database. End-users are unaware of
any existence of the database beyond the application. At
the other end, the database tier is not aware of any other user beyond the application tier.
Hence, the application layer sits in the middle and acts as a mediator between the end-user
and the database.
User (Presentation) Tier − End-users operate on this tier and they know nothing about any
existence of the database beyond this layer. At this layer, multiple views of the database can
be provided by the application. All views are generated by applications that reside in the
application tier.
Multiple-tier database architecture is highly modifiable, as almost all its components are
independent and can be changed independently.
DATA MODELS
Data models define how the logical structure of a database is modelled.
Data Models are fundamental entities to introduce abstraction in a DBMS.
Data models define how data is connected to each other and how they are processed and
stored inside the system.
The earliest data models were flat data models, in which all the data is kept in
the same plane. These early models were not very systematic, so they were prone to
duplication and update anomalies.
While the Relational Model is the most widely used database model, there are other models
too:
Hierarchical Model
Network Model
Entity-relationship Model
Relational Model
Hierarchical Model
This database model organises data into a tree-like-structure, with a single root, to which
all the other data is linked. The hierarchy starts from the Root data, and expands like a
tree, adding child nodes to the parent nodes.
In this model, a child node will only have a single parent node.
This model efficiently describes many real-world relationships like index of a book,
recipes etc.
In the hierarchical model, data is organised into a tree-like structure with a one-to-many
relationship between two different types of data; for example, one department can have
many courses, many professors and, of course, many students.
Network Model
This is an extension of the Hierarchical model. In this model data is organised more like
a graph, and a child node is allowed to have more than one parent node.
Because more relationships are established in this model, the data is more closely related,
which makes accessing it easier and faster. This database model was used to map
many-to-many data relationships.
This was the most widely used database model before the Relational Model was introduced.
Entity-relationship Model
Entity-Relationship (ER) Model is based on the notion of real-world entities and relationships
among them. While formulating real-world scenario into the database model, the ER Model creates
entity set, relationship set, general attributes and constraints.
The ER Model is best used for the conceptual design of a database.
ER Model is based on −
Entities and their attributes.
Relationships among entities.
These concepts are explained below.
Entity − An entity in an ER
Model is a real-world entity having properties called attributes. Every attribute is defined
by its set of values called domain. For example, in a school database, a student is considered
as an entity. Student has various attributes like name, age, class, etc.
Relationship − The logical association among entities is called a relationship. Relationships
are mapped with entities in various ways. Mapping cardinalities define the number of
associations between two entities.
Mapping cardinalities −
one to one
one to many
many to one
many to many
Let's take an example, If we have to design a School
Database, then Student will be an entity with
attributes name, age, address etc. As Address is
generally complex, it can be another entity with
attributes street name, pincode, city etc, and there
will be a relationship between them.
Relational Model
In this model, data is organised in two-dimensional tables and the relationship is maintained
by storing a common field.
This model was introduced by E.F. Codd in 1970, and since then it has become the most widely
used database model.
The basic structure of data in the relational model is tables. All the information related to a
particular type is stored in rows of that table.
Hence, tables are also known as relations in relational model.
The main highlights of this model are :
- Data is stored in tables called relations.
- Relations can be normalized.
- In normalized relations, values saved are atomic values.
- Each row in a relation is unique.
- Each column in a relation contains values from the same domain.
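As a minimal sketch of "the relationship is maintained by storing a common field", the following uses Python's built-in sqlite3 module with an in-memory database. The Department and Student tables, their columns, and the sample rows are illustrative assumptions, not part of these notes.

```python
import sqlite3

# In-memory SQLite database; table and column names are illustrative only.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Department (dept_id INTEGER PRIMARY KEY, dept_name TEXT);
    CREATE TABLE Student (
        student_id INTEGER PRIMARY KEY,
        name       TEXT,
        dept_id    INTEGER REFERENCES Department(dept_id)  -- the common field
    );
    INSERT INTO Department VALUES (1, 'Computer Engineering'), (2, 'Electronics');
    INSERT INTO Student VALUES (101, 'Asha', 1), (102, 'Ravi', 1), (103, 'Meena', 2);
""")

# The relationship is recovered by matching the common field in both tables.
rows = conn.execute("""
    SELECT s.name, d.dept_name
    FROM Student AS s
    JOIN Department AS d ON s.dept_id = d.dept_id
    ORDER BY s.student_id
""").fetchall()
print(rows)
```

Each student row stores only the department number; the department name lives in one place and is looked up through the shared dept_id column.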
DATABASE SCHEMA
A database schema is the skeleton structure that represents the logical view of the entire
database.
It defines how the data is organized and how the relations among them are associated.
It formulates all the constraints that are to be applied on the data.
A database schema defines its entities and the relationship among them.
It contains a descriptive detail of the database, which can be depicted by means of schema
diagrams.
It’s the database designers who design the schema to help programmers understand the
database and make it useful.
DATABASE INSTANCE
Database schema is the skeleton of database. It is designed when the database doesn't exist
at all. Once the database is operational, it is very difficult to make any changes to it. A
database schema does not contain any data or information.
A database instance is a state of operational database with data at any given time. It contains
a snapshot of the database. Database instances tend to change with time.
A DBMS ensures that every instance (state) of the database is valid, by diligently following all
the validations, constraints, and conditions that the database designers have imposed.
DATA INDEPENDENCE
A database system normally contains a lot of data in
addition to users’ data.
For example, it stores data about data, known as
metadata, to locate and retrieve data easily. It is
rather difficult to modify or update a set of metadata
once it is stored in the database.
But as a DBMS expands, it needs to change over time
to satisfy the requirements of the users. If the entire
data is dependent, it would become a tedious and
highly complex job.
Metadata itself follows a layered architecture, so that when we change data at one layer, it
does not affect the data at another level. This data is independent but mapped to each other.
MODULE II
RELATIONAL DBMS
A Relational Database Management System (RDBMS) is a database management system
based on the relational model introduced by E.F. Codd.
In the relational model, data is stored in relations (tables) and is represented in the form of
tuples (rows).
An RDBMS is used to manage a relational database.
A relational database is an organized collection of tables related to each other, from
which data can be accessed easily.
BASIC CONCEPTS
Table:
A table is a collection of data elements organised in terms of rows and columns.
A table is also considered as a convenient representation of relations.
But a table can have duplicate rows of data, while a true relation cannot.
A table is the simplest form of data storage.
Below is an example of an Employee table.
ID Name Age Salary
1 Adam 34 13000
2 Alex 28 15000
3 Stuart 20 18000
4 Ross 42 19020
Tuple:
A single entry in a table is called a Tuple or Record or Row.
A tuple in a table represents a set of related data. For example, the above Employee table
has 4 tuples/records/rows.
Following is an example of single record or tuple.
1 Adam 34 13000
Attribute:
A table consists of several records(row), each record can be broken down into several
smaller parts of data known as Attributes.
The above Employee table consists of four attributes: ID, Name, Age and Salary.
Attribute Domain
When an attribute is defined in a relation(table), it is defined to hold only a certain type of
values, which is known as Attribute Domain.
Hence, the attribute Name will hold the name of employee for every tuple. If we save
employee's address there, it will be violation of the Relational database model.
Name
Adam
Alex
Stuart - 9/401, OC Street, Amsterdam
Ross
Relation Schema:
A relation schema describes the structure of the relation, with the name of the relation(name
of table), its attributes and their names and type.
Relation instance:
A finite set of tuples in the relational database system represents relation instance. Relation
instances do not have duplicate tuples.
Relation Key:
A relation key is an attribute which can uniquely identify a particular tuple(row) in a
relation(table).
Key Constraints
We store data in tables so that we can access it later whenever required. In every table, one
attribute or a set of attributes together is used to fetch data from the table. The Key Constraint
specifies that there should be such an attribute (column) in a relation (table), which can be used
to fetch data for any tuple (row).
The Key attribute should never be NULL or the same for two different rows of data.
For example, in the Employee table we can use the attribute ID to fetch data for each
employee. No value of ID is null and it is unique for every row, hence it can be our Key attribute.
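The key constraint on the Employee table can be tried out with a short sketch. It assumes Python's built-in sqlite3 module and an in-memory database; declaring ID as PRIMARY KEY is one common way to state the constraint, not the only one.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# ID is declared as the key attribute of the Employee table from the notes.
conn.execute(
    "CREATE TABLE Employee (ID INTEGER PRIMARY KEY, Name TEXT, Age INTEGER, Salary INTEGER)"
)
conn.executemany(
    "INSERT INTO Employee VALUES (?, ?, ?, ?)",
    [(1, "Adam", 34, 13000), (2, "Alex", 28, 15000)],
)

# A second row with ID = 1 violates the key constraint and is rejected.
try:
    conn.execute("INSERT INTO Employee VALUES (1, 'Stuart', 20, 18000)")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# The key can then be used to fetch exactly one tuple.
adam = conn.execute("SELECT Name, Salary FROM Employee WHERE ID = 1").fetchone()
print(duplicate_rejected, adam)
```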
Domain Constraint
Domain constraints refer to the rules defined for the values that can be stored for a certain attribute.
As explained above, we cannot store the address of an employee in the column for Name.
Similarly, a mobile number cannot exceed 10 digits.
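A domain rule such as the mobile-number limit can be expressed as a CHECK constraint. The sketch below assumes Python's sqlite3 module; the Contact table and the helper is_rejected are hypothetical names used only for illustration, and the digit test relies on SQLite's GLOB operator.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# The CHECK constraint states the domain rule: at most 10 characters, digits only.
conn.execute("""
    CREATE TABLE Contact (
        name   TEXT NOT NULL,
        mobile TEXT NOT NULL
               CHECK (length(mobile) <= 10 AND mobile NOT GLOB '*[^0-9]*')
    )
""")
conn.execute("INSERT INTO Contact VALUES ('Adam', '9876543210')")  # within the domain

def is_rejected(mobile):
    """Return True if the DBMS refuses to store this mobile value."""
    try:
        conn.execute("INSERT INTO Contact VALUES ('Test', ?)", (mobile,))
        return False
    except sqlite3.IntegrityError:
        return True

print(is_rejected("98765432101"))  # 11 digits
print(is_rejected("98765abcde"))   # contains letters
```

The point of the design is that the rule is stored with the table, so every application that inserts data is held to the same domain.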
Database Keys
Keys are very important part of Relational database model. They are used to establish and
identify relationships between tables and also to uniquely identify any record or row of data
inside a table.
A Key can be a single attribute or a group of attributes, where the combination may act as a
key.
Super Key
Super Key is defined as a set of attributes within a table that can uniquely identify each
record within a table. Super Key is a superset of Candidate key.
In the table defined above, super keys would include student_id, (student_id, name), phone,
etc.
The first one is simple: student_id is unique for every row of data, hence it can be
used to identify each row uniquely.
Next comes (student_id, name). The names of two students can be the same, but their student_id
cannot, hence this combination can also be a key.
Similarly, the phone number of every student will be unique, hence phone can also be a
key. So they are all super keys.
Candidate Key
Candidate keys are defined as the minimal set of fields which can uniquely identify each
record in a table.
It is an attribute or a set of attributes that can act as a Primary Key for a table to uniquely
identify each record in that table.
In our example, student_id and phone both are candidate keys for table Student.
A candidate key can never be NULL or empty, and its value should be unique.
There can be more than one candidate key for a table.
A candidate key can be a combination of more than one column (attribute).
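The difference between super keys and candidate keys can be sketched in plain Python. The Student rows below are hypothetical sample data, and the functions is_super_key and candidate_keys are illustrative helpers. Note that a check like this only shows which attribute sets are unique in the given rows; in a real design, keys are chosen from the meaning of the data, not from one sample.

```python
from itertools import combinations

# Hypothetical Student rows; the notes' own table is not reproduced here.
students = [
    {"student_id": 1, "name": "Akon", "phone": "9876723452", "age": 17},
    {"student_id": 2, "name": "Akon", "phone": "9991165674", "age": 17},
    {"student_id": 3, "name": "Bkon", "phone": "7898756543", "age": 18},
    {"student_id": 4, "name": "Ckon", "phone": "8987867898", "age": 19},
]

def is_super_key(rows, attrs):
    """True if no two rows agree on every attribute in attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows):
    """Minimal super keys: no proper subset is itself a super key."""
    columns = list(rows[0])
    keys = []
    # Smaller sets are tried first, so any set containing an earlier key is not minimal.
    for size in range(1, len(columns) + 1):
        for attrs in combinations(columns, size):
            minimal = not any(set(k) <= set(attrs) for k in keys)
            if minimal and is_super_key(rows, attrs):
                keys.append(attrs)
    return keys

print(is_super_key(students, ("student_id", "name")))
print(candidate_keys(students))
```

Here (student_id, name) is a super key but not a candidate key, because student_id alone already identifies every row.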
Primary Key
Primary key is a candidate key that is most
appropriate to become the main key for any
table.
It is a key that can uniquely identify each
record in a table.
For the table Student we can make the
student_id column as the primary key.
Composite Key
Key that consists of two or more
attributes that uniquely identify any
record in a table is called Composite key.
But the attributes which together form the
Composite key are not keys
independently or individually.
In the above picture we have a Score
table which stores the marks scored by a
student in a particular subject.
In this table student_id and subject_id
together will form the primary key, hence it is a composite key.
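A composite primary key for the Score table might be declared as in the sketch below. It assumes Python's sqlite3 module; the marks column and the sample rows are illustrative assumptions.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Score (
        student_id INTEGER NOT NULL,
        subject_id INTEGER NOT NULL,
        marks      INTEGER,
        PRIMARY KEY (student_id, subject_id)  -- the pair is the key
    )
""")
# student_id 1 and subject_id 10 each repeat, which is allowed individually.
conn.executemany(
    "INSERT INTO Score VALUES (?, ?, ?)",
    [(1, 10, 78), (1, 11, 85), (2, 10, 64)],
)

# Repeating the same (student_id, subject_id) pair is rejected.
try:
    conn.execute("INSERT INTO Score VALUES (1, 10, 90)")
    pair_rejected = False
except sqlite3.IntegrityError:
    pair_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM Score").fetchone()[0]
print(pair_rejected, row_count)
```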
Non-key Attributes
Non-key attributes are the attributes or fields of a table, other than candidate key
attributes/fields in a table.
Non-prime Attributes
Non-prime Attributes are attributes that are not part of any candidate key.
ER Model: Attributes
If a Student is an Entity, then student's roll no., student's name, student's age, student's
gender etc will be its attributes.
An attribute can be of many types, here are different types of attributes defined in ER
database model:
1. Simple attribute: The attributes with values that are atomic and cannot be broken down
further are simple attributes. For example, student's age.
2. Composite attribute: A composite attribute is made up of more than one simple attribute.
For example, student's address will contain, house no., street name, pincode etc.
3. Derived attribute: These are attributes which are not stored in the database, but are
derived using other attributes. For example, average age of students in a class.
4. Single-valued attribute: As the name suggests, these have a single value.
5. Multi-valued attribute: These can have multiple values.
ER Model: Keys
If the attribute roll no. can uniquely identify a student entity, amongst all the students, then the
attribute roll no. will be said to be a key.
Following are the types of Keys:
1. Super Key
2. Candidate Key
3. Primary Key
ER Model: Relationships
When an Entity is related to another Entity, they are said to have a relationship. For
example, A Class Entity is related to Student entity, because students study in classes, hence
this is a relationship.
Depending upon the number of entities involved, a degree is assigned to relationships.
For example, if 2 entities are involved, it is said to be Binary relationship, if 3 entities are
involved, it is said to be Ternary relationship, and so on.
Components of ER Diagram
Entities, Attributes, Relationships etc. form the components of an ER Diagram, and there are
defined symbols and shapes to represent each one of them.
Entity
Simple rectangular box represents an Entity.
An Entity can be any object, place, person, or class.
In ER Diagram, an entity is represented using
rectangles.
Consider an example of an Organisation-
Employee, Manager, Department, Product and
many more can be taken as entities in an
Organisation.
The yellow rhombus in between represents a
relationship.
Weak Entity
A weak Entity is represented using double rectangular boxes. It is generally connected to
another entity.
Weak entity is an entity that depends on another
entity.
Weak entity doesn't have any key attribute of its own.
Double rectangle is used to represent a weak entity.
ER Diagram: Relationship
A Relationship describes relation between entities.
Relationship is represented using diamonds or rhombus.
There are three types of relationship that exist between Entities.
1. Binary Relationship
2. Recursive Relationship
3. Ternary Relationship
The above example describes that one student can enroll only for one course and a course
will also have only one Student.
The above diagram represents that one student can enroll for more than one course, and a
course can have more than one student enrolled in it.
Generalization
Generalization is a bottom-up approach in which
two lower level entities combine to form a higher
level entity.
In generalization, the higher level entity can also
combine with other lower level entities to make
further higher level entity.
It's more like Superclass and Subclass system, but
the only difference is the approach, which is bottom-
up.
Hence, entities are combined to form a more
generalised entity, in other words, sub-classes are
combined to form a super-class.
For example, Saving and Current account types
entities can be generalised and an entity with name
Account can be created, which covers both.
Specialization
Specialization is opposite to Generalization.
It is a top-down approach in which one higher level
entity can be broken down into two lower level entities.
in specialization, a higher level entity may not have
Aggregation
Aggregation is a process when relation between two
entities is treated as a single entity.
In the diagram, the relationship between Center and
Course together, is acting as an Entity, which is in
relationship with another entity Visitor.
Now in the real world, if a Visitor or a Student visits a
Coaching Center, he/she will never enquire about the
center only or just about the course; rather he/she will
enquire about both.
Relational Algebra
Every database management system must define a query
language to allow users to access the data stored in the database.
In relational algebra, input is a relation(table from which data has to be accessed) and output
is also a relation(a temporary table holding the data asked for by the user).
It uses operators to perform queries.
An operator can be either unary or binary. They accept relations as their input and yield
relations as their output.
Relational algebra is performed recursively on a relation and intermediate results are also
considered relations.
The fundamental operations of relational algebra are as follows −
Select
Project
Union
Set difference
Cartesian product
Rename
Select Operation (σ)
Notation − σp(r)
Where:
σ - selection predicate
r - relation
p - propositional logic formula which may use connectors like and, or, and not. These
terms may use relational operators like =, ≠, ≥, <, >, ≤.
For example −
1) σ subject = "database" (Books)
Output − Selects tuples from Books where subject is 'database'.
2) σ subject = "database" and price = "450" (Books)
Output − Selects tuples from Books where subject is 'database' and price is 450.
3) σ subject = "database" and price = "450" or year > "2010" (Books)
Output − Selects tuples from Books where subject is 'database' and price is 450, or those books
published after 2010.
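The three selections above correspond to WHERE conditions in SQL. The following is a rough sketch using Python's sqlite3 module; the Books rows and the helper select are made up for illustration, and price and year are stored as numbers here rather than quoted strings.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Books (title TEXT, subject TEXT, price INTEGER, year INTEGER)")
conn.executemany(
    "INSERT INTO Books VALUES (?, ?, ?, ?)",
    [
        ("DB Basics", "database", 450, 2008),
        ("DB Design", "database", 600, 2012),
        ("Networks 101", "networking", 450, 2005),
        ("Modern OS", "os", 700, 2015),
    ],
)

def select(predicate):
    """Apply a selection predicate to Books and return the matching titles.

    The predicate is pasted into the SQL text, so this helper is only
    suitable for the fixed literals used in this illustration.
    """
    sql = "SELECT title FROM Books WHERE " + predicate + " ORDER BY title"
    return [row[0] for row in conn.execute(sql)]

by_subject = select("subject = 'database'")
by_subject_and_price = select("subject = 'database' AND price = 450")
# AND binds more tightly than OR, matching the reading given in example 3.
by_either = select("subject = 'database' AND price = 450 OR year > 2010")
print(by_subject, by_subject_and_price, by_either)
```

In every case the input is a relation and the output is again a relation (a set of rows), which is what allows operations to be chained.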
∏ subject, author (Books)
Output − Selects and projects columns named subject and author from the relation Books.
Notation − r ∪ s
Where r and s are either database relations or relation result set (temporary relation).
For a union operation to be valid, the following conditions must hold −
r, and s must have the same number of attributes.
Attribute domains must be compatible.
Duplicate tuples are automatically eliminated.
∏ author (Books) ∪ ∏ author (Articles)
Output − Projects the names of the authors who have either written a book or an article or both.
∏ author (Books) − ∏ author (Articles)
Output − Provides the names of authors who have written books but not articles.
σ author = 'tutorialspoint' (Books × Articles)
Output − Yields a relation which shows all the books and articles written by tutorialspoint.
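Union and set difference on the author column can be tried in SQL, where they are written UNION and EXCEPT. This sketch assumes Python's sqlite3 module; the Books and Articles rows are invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Books    (title TEXT, author TEXT);
    CREATE TABLE Articles (title TEXT, author TEXT);
    INSERT INTO Books    VALUES ('B1', 'Codd'), ('B2', 'Date'), ('B3', 'Codd');
    INSERT INTO Articles VALUES ('A1', 'Date'), ('A2', 'Chen');
""")

# Union: authors of a book, an article, or both; duplicates are eliminated.
union = [r[0] for r in conn.execute(
    "SELECT author FROM Books UNION SELECT author FROM Articles ORDER BY author"
)]

# Set difference: authors who have written books but not articles.
difference = [r[0] for r in conn.execute(
    "SELECT author FROM Books EXCEPT SELECT author FROM Articles ORDER BY author"
)]
print(union, difference)
```

Both inputs project a single column from the same domain, which satisfies the compatibility conditions listed for the union operation.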
Note:
Similarly we can generate relational database schema using the ER diagram. Following are some
key points to keep in mind while doing so:
1. Entity gets converted into Table, with all the attributes becoming fields(columns) in the
table.
2. Relationship between entities is also converted into table with primary keys of the related
entities also stored in it as foreign keys.
3. Primary Keys should be properly set.
4. For any relationship of Weak Entity, if primary key of any other entity is included in a table,
a foreign key constraint must be defined.
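The conversion points above can be sketched for a hypothetical ER diagram in which Student and Course are entities and Enrolls is the relationship between them. The names are assumptions for illustration; Python's sqlite3 module is used, and SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled
conn.executescript("""
    -- Each entity becomes a table; its attributes become columns.
    CREATE TABLE Student (student_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE Course  (course_id  INTEGER PRIMARY KEY, title TEXT);
    -- The relationship becomes a table holding the related primary keys
    -- as foreign keys.
    CREATE TABLE Enrolls (
        student_id INTEGER NOT NULL REFERENCES Student(student_id),
        course_id  INTEGER NOT NULL REFERENCES Course(course_id),
        PRIMARY KEY (student_id, course_id)
    );
    INSERT INTO Student VALUES (1, 'Asha');
    INSERT INTO Course  VALUES (10, 'DBMS');
    INSERT INTO Enrolls VALUES (1, 10);
""")

# Student 2 does not exist, so the foreign key constraint rejects this row.
try:
    conn.execute("INSERT INTO Enrolls VALUES (2, 10)")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
print(orphan_rejected)
```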
MODULE III
Introduction to SQL
SQL is a standard language for accessing and manipulating databases.
What is SQL?
SQL stands for Structured Query Language
SQL lets you access and manipulate databases
SQL became a standard of the American National Standards Institute (ANSI) in 1986, and of
the International Organization for Standardization (ISO) in 1987
RDBMS
RDBMS stands for Relational Database Management System.
RDBMS is the basis for SQL, and for all modern database systems such as MS SQL Server,
IBM DB2, Oracle, MySQL, and Microsoft Access.
The data in RDBMS is stored in database objects called tables. A table is a collection of
related data entries and it consists of columns and rows.
Look at the "Customers" table:
Example : SELECT * FROM Customers;
Every table is broken up into smaller entities called fields. The fields in the Customers table
consist of CustomerID, CustomerName, ContactName, Address, City, PostalCode and
Country.
A field is a column in a table that is designed to maintain specific information about every
record in the table.
A record, also called a row, is each individual entry that exists in a table. For example, there
are 91 records in the above Customers table. A record is a horizontal entity in a table.
A column is a vertical entity in a table that contains all information associated with a
specific field in a table.
SQL Syntax
Database Tables
A database most often contains one or more tables. Each table is identified by a name (e.g.
"Customers" or "Orders"). Tables contain records (rows) with data.
Here we will use the well-known Northwind sample database (included in MS Access and
MS SQL Server).
Below is a selection from the "Customers" table:
CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden
The table 3.1 above contains five records (one for each customer) and seven columns (CustomerID,
CustomerName, ContactName, Address, City, PostalCode, and Country). This table will be used in
all examples in this session.
SQL Statements
Most of the actions you need to perform on a database are done with SQL statements.
The following SQL statement selects all the records in the "Customers" table:
Example: SELECT * FROM Customers;
SQL keywords are NOT case sensitive: select is the same as SELECT
SELECT Syntax
SELECT column1, column2, ...
FROM table_name;
Here, column1, column2, ... are the field names of the table you want to select data from. If
you want to select all the fields available in the table, use the following syntax:
SELECT * FROM table_name;
SELECT * Example
The following SQL statement selects all the columns from the "Customers" table:
Example
SELECT * FROM Customers;
SELECT Example
The following SQL statement selects all (including duplicate) values from the "Country" column
in the "Customers" table:
Example
SELECT Country FROM Customers;
SELECT COUNT
The following SQL statement lists the number of different (distinct) customer countries:
Example
SELECT COUNT(DISTINCT Country) FROM Customers;
WHERE Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition;
Note: The WHERE clause is not only used in the SELECT statement; it is also used in UPDATE,
DELETE, and other statements.
AND Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition1 AND condition2 AND condition3 ...;
OR Syntax
SELECT column1, column2, ...
FROM table_name
WHERE condition1 OR condition2 OR condition3 ...;
NOT Syntax
SELECT column1, column2, ...
FROM table_name
WHERE NOT condition;
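AND Example
The following SQL statement selects all fields from "Customers" where country is "Germany" AND
city is "Berlin":
Example
SELECT * FROM Customers
WHERE Country='Germany' AND City='Berlin';
OR Example
The following SQL statement selects all fields from "Customers" where city is "Berlin" OR
"München":
Example
SELECT * FROM Customers
WHERE City='Berlin' OR City='München';
NOT Example
The following SQL statement selects all fields from "Customers" where country is NOT
"Germany":
Example
SELECT * FROM Customers
WHERE NOT Country='Germany';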
ORDER BY Syntax
SELECT column1, column2, ...
FROM table_name
ORDER BY column1, column2, ... ASC|DESC;
ORDER BY Example
The following SQL statement selects all customers from the "Customers" table, sorted by the
"Country" column:
Example
SELECT * FROM Customers
ORDER BY Country;
Example
SELECT * FROM Customers
ORDER BY Country DESC;
Example
SELECT * FROM Customers
ORDER BY Country, CustomerName;
Example
INSERT INTO Customers (CustomerName, ContactName, Address, City, PostalCode,
Country)
VALUES ('Cardinal', 'Tom B. Erichsen', 'Skagen 21', 'Stavanger', '4006', 'Norway');
The selection from the "Customers" table will now look like this:
CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
89 | White Clover Markets | Karl Jablonski | 305 - 14th Ave. S. Suite 3B | Seattle | 98128 | USA
90 | Wilman Kala | Matti Karttunen | Keskuskatu 45 | Helsinki | 21240 | Finland
91 | Wolski | Zbyszek | ul. Filtrowa 68 | Walla | 01-012 | Poland
92 | Cardinal | Tom B. Erichsen | Skagen 21 | Stavanger | 4006 | Norway
The CustomerID column is an auto-increment field and will be generated automatically when a new
record is inserted into the table.
Insert Data Only in Specified Columns
Example
INSERT INTO Customers (CustomerName, City, Country)
VALUES ('Cardinal', 'Stavanger', 'Norway');
The selection from the "Customers" table will now look like this:
CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
89 | White Clover Markets | Karl Jablonski | 305 - 14th Ave. S. Suite 3B | Seattle | 98128 | USA
90 | Wilman Kala | Matti Karttunen | Keskuskatu 45 | Helsinki | 21240 | Finland
91 | Wolski | Zbyszek | ul. Filtrowa 68 | Walla | 01-012 | Poland
92 | Cardinal | null | null | Stavanger | null | Norway
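The behaviour shown in the table (an automatically generated CustomerID and null in the unspecified columns) can be reproduced with a small sketch. It assumes Python's sqlite3 module, where AUTOINCREMENT is SQLite's spelling of an auto-increment field; other DBMSs use different keywords. Only customer 91 is preloaded here to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE Customers (
        CustomerID   INTEGER PRIMARY KEY AUTOINCREMENT,
        CustomerName TEXT,
        ContactName  TEXT,
        Address      TEXT,
        City         TEXT,
        PostalCode   TEXT,
        Country      TEXT
    )
""")
conn.execute(
    "INSERT INTO Customers VALUES "
    "(91, 'Wolski', 'Zbyszek', 'ul. Filtrowa 68', 'Walla', '01-012', 'Poland')"
)

# Only three columns are supplied; CustomerID is generated and the rest stay NULL.
conn.execute(
    "INSERT INTO Customers (CustomerName, City, Country) "
    "VALUES ('Cardinal', 'Stavanger', 'Norway')"
)
new_row = conn.execute(
    "SELECT * FROM Customers WHERE CustomerName = 'Cardinal'"
).fetchone()
print(new_row)  # NULL values appear as None in Python
```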
IS NULL Syntax
SELECT column_names
FROM table_name
WHERE column_name IS NULL;
Assume we have the following "Persons" table:
ID | LastName | FirstName | Address | City
1 | Doe | John | 542 W. 27th Street | New York
2 | Bloggs | Joe | | London
3 | Roe | Jane | | New York
4 | Smith | John | 110 Bishopsgate | London
Suppose that the "Address" column in the "Persons" table is optional. If a record is inserted with no
value for "Address", the "Address" column will be saved with a NULL value.
The following SQL statement uses the IS NULL operator to list all persons that have no
address:
SELECT LastName, FirstName, Address FROM Persons
WHERE Address IS NULL;
The result-set will look like this:
LastName FirstName Address
Bloggs Joe
Roe Jane
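The reason a dedicated IS NULL operator exists can be seen by comparing it with an ordinary equality test. This is a minimal sketch using Python's sqlite3 module and the Persons rows above, with the missing addresses stored as NULL (None in Python).

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Persons (ID INTEGER PRIMARY KEY, LastName TEXT, FirstName TEXT, "
    "Address TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Persons VALUES (?, ?, ?, ?, ?)",
    [
        (1, "Doe", "John", "542 W. 27th Street", "New York"),
        (2, "Bloggs", "Joe", None, "London"),
        (3, "Roe", "Jane", None, "New York"),
        (4, "Smith", "John", "110 Bishopsgate", "London"),
    ],
)

# IS NULL finds the rows whose Address was never filled in.
no_address = conn.execute(
    "SELECT LastName, FirstName FROM Persons WHERE Address IS NULL ORDER BY ID"
).fetchall()

# A comparison with = NULL is never true, so it matches nothing.
equals_null = conn.execute(
    "SELECT LastName, FirstName FROM Persons WHERE Address = NULL"
).fetchall()
print(no_address, equals_null)
```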
UPDATE Syntax
UPDATE table_name
SET column1 = value1, column2 = value2, ...
WHERE condition;
Note: Be careful when updating records in a table! Notice the WHERE clause in the UPDATE
statement. The WHERE clause specifies which record(s) should be updated. If you omit the
WHERE clause, all records in the table will be updated!
UPDATE Table
The following SQL statement updates the first customer (CustomerID = 1) with a new
contact person and a new city.
Example
UPDATE Customers
SET ContactName = 'Alfred Schmidt', City= 'Frankfurt'
WHERE CustomerID = 1;
The selection from the "Customers" table will now look like this:
CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Alfred Schmidt | Obere Str. 57 | Frankfurt | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
Example
UPDATE Customers
SET ContactName='Juan'
WHERE Country='Mexico';
The selection from the "Customers" table will now look like this:
CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Alfred Schmidt | Obere Str. 57 | Frankfurt | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Juan | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Juan | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden
Update Warning!
Be careful when updating records. If you omit the WHERE clause, ALL records will be updated!
Example
UPDATE Customers
SET ContactName='Juan';
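The effect of omitting the WHERE clause can be measured by counting the rows each UPDATE touches. The sketch below assumes Python's sqlite3 module and a cut-down Customers table with only three columns. It also shows one possible safeguard: an update that has not yet been committed can be rolled back.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, ContactName TEXT, Country TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?, ?)",
    [
        (1, "Maria Anders", "Germany"),
        (2, "Ana Trujillo", "Mexico"),
        (3, "Antonio Moreno", "Mexico"),
        (4, "Thomas Hardy", "UK"),
        (5, "Christina Berglund", "Sweden"),
    ],
)
conn.commit()

# With a WHERE clause only the matching rows change.
with_where = conn.execute(
    "UPDATE Customers SET ContactName = 'Juan' WHERE Country = 'Mexico'"
).rowcount
conn.commit()

# Without a WHERE clause every row is updated.
without_where = conn.execute("UPDATE Customers SET ContactName = 'Juan'").rowcount

# The accidental update has not been committed yet, so it can still be undone.
conn.rollback()
juan_count = conn.execute(
    "SELECT COUNT(*) FROM Customers WHERE ContactName = 'Juan'"
).fetchone()[0]
print(with_where, without_where, juan_count)
```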
DELETE Syntax
DELETE FROM table_name
WHERE condition;
Note: Be careful when deleting records in a table! Notice the WHERE clause in the DELETE
statement. The WHERE clause specifies which record(s) should be deleted. If you omit the
WHERE clause, all records in the table will be deleted!
Example
DELETE FROM Customers
WHERE CustomerName='Alfreds Futterkiste';
The "Customers" table will now look like this:
CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
4 | Around the Horn | Thomas Hardy | 120 Hanover Sq. | London | WA1 1DP | UK
5 | Berglunds snabbköp | Christina Berglund | Berguvsvägen 8 | Luleå | S-958 22 | Sweden
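As a rough check of the DELETE behaviour described above, the following sketch uses Python's sqlite3 module with an in-memory Customers table reduced to two columns (an assumption for brevity). It deletes one row with a WHERE clause and then shows what happens when the WHERE clause is omitted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        (1, "Alfreds Futterkiste"),
        (2, "Ana Trujillo Emparedados y helados"),
        (3, "Antonio Moreno Taquería"),
        (4, "Around the Horn"),
        (5, "Berglunds snabbköp"),
    ],
)

# DELETE with a WHERE clause removes only the matching row.
conn.execute("DELETE FROM Customers WHERE CustomerName = 'Alfreds Futterkiste'")
remaining_ids = [
    row[0]
    for row in conn.execute("SELECT CustomerID FROM Customers ORDER BY CustomerID")
]

# DELETE without a WHERE clause empties the table but keeps its structure.
conn.execute("DELETE FROM Customers")
rows_left = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]

print(remaining_ids, rows_left)
```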
MySQL Syntax:
SELECT column_name(s)
FROM table_name
WHERE condition
LIMIT number;
Oracle Syntax:
SELECT column_name(s)
FROM table_name
WHERE ROWNUM <= number;
Example
SELECT TOP 3 * FROM Customers;
The following SQL statement shows the equivalent example using the LIMIT clause:
Example
SELECT * FROM Customers
LIMIT 3;
The following SQL statement shows the equivalent example using ROWNUM:
Example
SELECT * FROM Customers
WHERE ROWNUM <= 3;
Example
SELECT TOP 50 PERCENT * FROM Customers;
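Which row-limiting clause is available depends on the database system. As an illustration only, the sketch below uses SQLite through Python's sqlite3 module; SQLite accepts LIMIT but not TOP or ROWNUM. The "50 percent" case is emulated by counting the rows first and rounding up, which is an assumption about how to mimic TOP 50 PERCENT, not a built-in SQLite feature.

```python
import math
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Customers (CustomerID INTEGER PRIMARY KEY, CustomerName TEXT)"
)
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        (1, "Alfreds Futterkiste"),
        (2, "Ana Trujillo Emparedados y helados"),
        (3, "Antonio Moreno Taquería"),
        (4, "Around the Horn"),
        (5, "Berglunds snabbköp"),
    ],
)

# First three rows; ORDER BY makes "first" well defined.
first_three = conn.execute(
    "SELECT * FROM Customers ORDER BY CustomerID LIMIT 3"
).fetchall()

# Emulate "50 percent": count the rows, then pass the rounded-up half to LIMIT.
total = conn.execute("SELECT COUNT(*) FROM Customers").fetchone()[0]
half_count = math.ceil(total * 50 / 100)
first_half = conn.execute(
    "SELECT * FROM Customers ORDER BY CustomerID LIMIT ?", (half_count,)
).fetchall()

print([row[0] for row in first_three], len(first_half))
```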
MIN() Syntax
SELECT MIN(column_name)
FROM table_name
WHERE condition;
MAX() Syntax
SELECT MAX(column_name)
FROM table_name
WHERE condition;
Demo Database
Below is a selection from the "Products" table in the Northwind sample database:
ProductID ProductName SupplierID CategoryID Unit Price
1 Chais 1 1 10 boxes x 20 bags 18
2 Chang 1 1 24 - 12 oz bottles 19
3 Aniseed Syrup 1 2 12 - 550 ml bottles 10
4 Chef Anton's Cajun Seasoning 2 2 48 - 6 oz jars 22
5 Chef Anton's Gumbo Mix 2 2 36 boxes 21.35
MIN() Example
The following SQL statement finds the price of the cheapest product:
Example
SELECT MIN(Price) AS SmallestPrice
FROM Products;
MAX() Example
The following SQL statement finds the price of the most expensive product:
Example
SELECT MAX(Price) AS LargestPrice
FROM Products;
COUNT() Syntax
SELECT COUNT(column_name)
FROM table_name
WHERE condition;
AVG() Syntax
SELECT AVG(column_name)
FROM table_name
WHERE condition;
SUM() Syntax
SELECT SUM(column_name)
FROM table_name
WHERE condition;
Demo Database
Below is a selection from the "Products" table in the Northwind sample database:
ProductID ProductName SupplierID CategoryID Unit Price
1 Chais 1 1 10 boxes x 20 bags 18
2 Chang 1 1 24 - 12 oz bottles 19
3 Aniseed Syrup 1 2 12 - 550 ml bottles 10
4 Chef Anton's Cajun Seasoning 2 2 48 - 6 oz jars 22
5 Chef Anton's Gumbo Mix 2 2 36 boxes 21.35
COUNT() Example
The following SQL statement finds the number of products:
Example
SELECT COUNT(ProductID)
FROM Products;
AVG() Example
The following SQL statement finds the average price of all products:
Example
SELECT AVG(Price)
FROM Products;
Demo Database
Below is a selection from the "OrderDetails" table in the Northwind sample database:
OrderDetailID OrderID ProductID Quantity
1 10248 11 12
2 10248 42 10
3 10248 72 5
4 10249 14 9
5 10249 51 40
SUM() Example
The following SQL statement finds the sum of the "Quantity" fields in the "OrderDetails" table:
Example
SELECT SUM(Quantity)
FROM OrderDetails;
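The five aggregate functions above can be tried together on the sample rows. This is a minimal sketch using Python's sqlite3 module; it loads only the five Products rows and five OrderDetails rows shown in these notes, so the results apply to that selection and not to the full Northwind database.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Products (ProductID INTEGER, ProductName TEXT, Price REAL)")
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [
        (1, "Chais", 18),
        (2, "Chang", 19),
        (3, "Aniseed Syrup", 10),
        (4, "Chef Anton's Cajun Seasoning", 22),
        (5, "Chef Anton's Gumbo Mix", 21.35),
    ],
)
conn.execute(
    "CREATE TABLE OrderDetails "
    "(OrderDetailID INTEGER, OrderID INTEGER, ProductID INTEGER, Quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO OrderDetails VALUES (?, ?, ?, ?)",
    [
        (1, 10248, 11, 12),
        (2, 10248, 42, 10),
        (3, 10248, 72, 5),
        (4, 10249, 14, 9),
        (5, 10249, 51, 40),
    ],
)

# MIN, MAX, COUNT and AVG over the Products rows in a single query.
smallest, largest, product_count, average = conn.execute(
    "SELECT MIN(Price), MAX(Price), COUNT(ProductID), AVG(Price) FROM Products"
).fetchone()

# SUM over the Quantity column of OrderDetails.
total_quantity = conn.execute("SELECT SUM(Quantity) FROM OrderDetails").fetchone()[0]

print(smallest, largest, product_count, round(average, 2), total_quantity)
```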
GROUP BY Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s);
Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;
The following SQL statement lists the number of customers in each country, sorted high to low:
Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
ORDER BY COUNT(CustomerID) DESC;
HAVING Syntax
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5;
The following SQL statement lists the number of customers in each country, sorted high to low
(only countries with more than 5 customers are included):
Example
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5
ORDER BY COUNT(CustomerID) DESC;
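To see how WHERE-style filtering on groups works, the sketch below runs GROUP BY and HAVING on a small in-memory table with Python's sqlite3 module. The seven customer rows are made up for illustration, and the HAVING threshold is lowered from 5 to 1 so that the tiny sample still produces output; a secondary sort on Country is added only to make the order of ties predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, Country TEXT)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        (1, "Germany"),
        (2, "Mexico"),
        (3, "Mexico"),
        (4, "UK"),
        (5, "Sweden"),
        (6, "UK"),
        (7, "UK"),
    ],
)

# Number of customers per country, sorted high to low.
per_country = conn.execute(
    "SELECT COUNT(CustomerID), Country FROM Customers "
    "GROUP BY Country "
    "ORDER BY COUNT(CustomerID) DESC, Country"
).fetchall()

# HAVING filters the groups after aggregation (WHERE filters rows before it).
large_groups = conn.execute(
    "SELECT COUNT(CustomerID), Country FROM Customers "
    "GROUP BY Country "
    "HAVING COUNT(CustomerID) > 1 "
    "ORDER BY COUNT(CustomerID) DESC"
).fetchall()

print(per_country)
print(large_groups)
```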
Syntax
CREATE DATABASE databasename;
Example
CREATE DATABASE testDB;
Syntax
DROP DATABASE databasename;
Note: Be careful before dropping a database. Deleting a database will result in the loss of all
information stored in the database!
Example
DROP DATABASE testDB;
Syntax
CREATE TABLE table_name (
column1 datatype,
column2 datatype,
column3 datatype,
....
);
The column parameters specify the names of the columns of the table.
The datatype parameter specifies the type of data the column can hold (e.g. varchar, integer,
date, etc.).
Example
CREATE TABLE Persons (
PersonID int,
LastName varchar(255),
FirstName varchar(255),
Address varchar(255),
City varchar(255)
);
The PersonID column is of type int and will hold an integer.
The LastName, FirstName, Address, and City columns are of type varchar and will hold
characters, and the maximum length for these fields is 255 characters.
The empty "Persons" table will now look like this:
PersonID LastName FirstName Address City
The empty "Persons" table can now be filled with data with the SQL INSERT INTO statement.
Syntax
DROP TABLE table_name;
Note: Be careful before dropping a table. Deleting a table will result in the loss of all
information stored in the table!
Example
DROP TABLE Shippers;
Syntax
TRUNCATE TABLE table_name;
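The difference between emptying a table and dropping it can be sketched with Python's sqlite3 module. Two assumptions apply: SQLite has no TRUNCATE TABLE statement, so DELETE without a WHERE clause stands in for it here, and the single inserted row is invented sample data.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Persons (
        PersonID int,
        LastName varchar(255),
        FirstName varchar(255),
        Address varchar(255),
        City varchar(255)
    )
    """
)

# Column names as recorded by the database (PRAGMA table_info is SQLite-specific).
columns = [row[1] for row in conn.execute("PRAGMA table_info(Persons)")]

# Invented sample row, only so that there is something to remove.
conn.execute("INSERT INTO Persons VALUES (1, 'Hansen', 'Ola', 'Timoteivn 10', 'Sandnes')")

# Emptying the table: the rows are gone, the table definition stays.
conn.execute("DELETE FROM Persons")
rows_after_empty = conn.execute("SELECT COUNT(*) FROM Persons").fetchone()[0]

# Dropping the table: the definition itself is removed.
conn.execute("DROP TABLE Persons")
tables_after_drop = conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'"
).fetchall()

print(columns, rows_after_empty, tables_after_drop)
```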
SQL JOIN
A JOIN clause is used to combine rows from two or more tables, based on a related column
between them.
Let's look at a selection from the "Orders" table:
OrderID CustomerID OrderDate
10308 2 1996-09-18
10309 37 1996-09-19
10310 77 1996-09-20
Example
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID;
and it will produce something like this:
OrderID CustomerName OrderDate
10308 Ana Trujillo Emparedados y helados 9/18/1996
10365 Antonio Moreno Taquería 11/27/1996
10383 Around the Horn 12/16/1996
10355 Around the Horn 11/15/1996
10278 Berglunds snabbköp 8/12/1996
Example
SELECT Orders.OrderID, Customers.CustomerName
FROM Orders
INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID;
Note: The INNER JOIN keyword selects all rows from both tables as long as there is a match
between the columns. If there are records in the "Orders" table that do not have matches in
"Customers", these orders will not be shown!
Demo Database
In this tutorial we will use the well-known Northwind sample database.
Below is a selection from the "Customers" table:
CustomerID | CustomerName | ContactName | Address | City | PostalCode | Country
1 | Alfreds Futterkiste | Maria Anders | Obere Str. 57 | Berlin | 12209 | Germany
2 | Ana Trujillo Emparedados y helados | Ana Trujillo | Avda. de la Constitución 2222 | México D.F. | 05021 | Mexico
3 | Antonio Moreno Taquería | Antonio Moreno | Mataderos 2312 | México D.F. | 05023 | Mexico
And a selection from the "Orders" table:
OrderID CustomerID EmployeeID OrderDate ShipperID
10308 2 7 1996-09-18 3
10309 37 3 1996-09-19 1
10310 77 8 1996-09-20 2
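The INNER JOIN behaviour can be reproduced with just the rows shown in the two selections above. The sketch below uses Python's sqlite3 module and loads only the CustomerID, CustomerName and OrderID columns (a simplification). Because customers 37 and 77 are not in the Customers selection, only one order is expected to survive the join.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (CustomerID INTEGER, CustomerName TEXT)")
conn.execute("CREATE TABLE Orders (OrderID INTEGER, CustomerID INTEGER)")
conn.executemany(
    "INSERT INTO Customers VALUES (?, ?)",
    [
        (1, "Alfreds Futterkiste"),
        (2, "Ana Trujillo Emparedados y helados"),
        (3, "Antonio Moreno Taquería"),
    ],
)
conn.executemany(
    "INSERT INTO Orders VALUES (?, ?)",
    [(10308, 2), (10309, 37), (10310, 77)],
)

# Only orders whose CustomerID has a match in Customers are returned.
joined = conn.execute(
    "SELECT Orders.OrderID, Customers.CustomerName "
    "FROM Orders "
    "INNER JOIN Customers ON Orders.CustomerID = Customers.CustomerID"
).fetchall()

print(joined)
```

Orders 10309 and 10310 are dropped from the result because their customers are missing from this reduced Customers table.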
SQL VIEWS
The following view selects every product in the "Products" table with a unit price higher than
the average unit price:
CREATE VIEW [Products Above Average Price] AS
SELECT ProductName, UnitPrice
FROM Products
WHERE UnitPrice > (SELECT AVG(UnitPrice) FROM Products);
We can query the view above as follows:
SELECT * FROM [Products Above Average Price];
Another view in the Northwind database calculates the total sale for each category in 1997.
Note that this view selects its data from another view called "Product Sales for 1997":
CREATE VIEW [Category Sales For 1997] AS
SELECT DISTINCT CategoryName, Sum(ProductSales) AS CategorySales
FROM [Product Sales for 1997]
GROUP BY CategoryName;
We can query the view above as follows:
SELECT * FROM [Category Sales For 1997];
We can also add a condition to the query. Let's see the total sale only for the category
"Beverages":
SELECT * FROM [Category Sales For 1997]
WHERE CategoryName = 'Beverages';
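A view stores a query, not a copy of the data, so its result changes when the underlying table changes. The sketch below illustrates this with Python's sqlite3 module, which accepts the bracketed view name for compatibility with SQL Server. The Products rows reuse the five sample products from earlier in these notes with their price stored as UnitPrice, and the sixth product is a hypothetical row added only to shift the average.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Products (ProductID INTEGER, ProductName TEXT, UnitPrice REAL)"
)
conn.executemany(
    "INSERT INTO Products VALUES (?, ?, ?)",
    [
        (1, "Chais", 18),
        (2, "Chang", 19),
        (3, "Aniseed Syrup", 10),
        (4, "Chef Anton's Cajun Seasoning", 22),
        (5, "Chef Anton's Gumbo Mix", 21.35),
    ],
)
conn.execute(
    """
    CREATE VIEW [Products Above Average Price] AS
    SELECT ProductName, UnitPrice
    FROM Products
    WHERE UnitPrice > (SELECT AVG(UnitPrice) FROM Products)
    """
)

def above_average(connection):
    """Return the product names currently listed by the view."""
    rows = connection.execute(
        "SELECT ProductName FROM [Products Above Average Price] ORDER BY ProductName"
    )
    return [row[0] for row in rows]

before = above_average(conn)

# Hypothetical expensive product: it raises the average, so the view's result changes.
conn.execute("INSERT INTO Products VALUES (6, 'Sample Product', 100)")
after = above_average(conn)

print(before)
print(after)
```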
DBMS- TRANSACTION
A transaction can be defined as a group of tasks. A single task is the minimum processing
B’s Account
Open_Account(B)
Old_Balance = B.balance
New_Balance = Old_Balance + 500
B.balance = New_Balance
Close_Account(B)
ACID Properties
A transaction is a very small unit of a program and it may contain several low-level tasks. A
transaction in a database system must maintain Atomicity, Consistency, Isolation, and Durability,
commonly known as the ACID properties, in order to ensure accuracy, completeness, and data
integrity.
Atomicity − This property states that a transaction must be treated as an atomic unit, that is,
either all of its operations are executed or none. There must be no state in a database where a
transaction is left partially completed. States should be defined either before the execution of
the transaction or after the execution/abortion/failure of the transaction.
Consistency − The database must remain in a consistent state after any transaction. No
transaction should have any adverse effect on the data residing in the database. If the
database was in a consistent state before the execution of a transaction, it must remain
consistent after the execution of the transaction as well.
Durability − The database should be durable enough to hold all its latest updates even if the
system fails or restarts. If a transaction updates a chunk of data in a database and commits,
then the database will hold the modified data. If a transaction commits but the system fails
before the data could be written on to the disk, then that data will be updated once the
system springs back into action.
Isolation − In a database system where more than one transaction is being executed
simultaneously and in parallel, the property of isolation states that each transaction will be
carried out and executed as if it were the only transaction in the system. No transaction will
affect the existence of any other transaction.
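Atomicity can be observed directly with a money transfer that fails halfway. The following is a minimal sketch using Python's sqlite3 module, not a description of how any particular DBMS implements it. The Accounts table, the starting balances, and the CHECK constraint that forbids negative balances are all assumptions chosen to force a failure after the first update.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Accounts (name TEXT PRIMARY KEY, balance INTEGER CHECK (balance >= 0))"
)
conn.executemany("INSERT INTO Accounts VALUES (?, ?)", [("A", 300), ("B", 100)])
conn.commit()

def transfer(connection, source, target, amount):
    """Move amount from source to target as one transaction; return True if committed."""
    try:
        with connection:  # commits on success, rolls back if an exception is raised
            connection.execute(
                "UPDATE Accounts SET balance = balance + ? WHERE name = ?",
                (amount, target),
            )
            connection.execute(
                "UPDATE Accounts SET balance = balance - ? WHERE name = ?",
                (amount, source),
            )
        return True
    except sqlite3.IntegrityError:
        return False

def balances(connection):
    """Return the current balances as a dictionary."""
    return dict(connection.execute("SELECT name, balance FROM Accounts"))

# A holds only 300, so the debit violates the CHECK constraint after B was credited.
first_ok = transfer(conn, "A", "B", 500)
after_failed = balances(conn)

# A transfer that fits within the balance commits both updates together.
second_ok = transfer(conn, "A", "B", 200)
after_success = balances(conn)

print(first_ok, after_failed)
print(second_ok, after_success)
```

After the failed transfer the credit to B is undone as well, so no partially completed state is visible.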
Serializability
When multiple transactions are being executed by the operating system in a multiprogramming
environment, there are possibilities that instructions of one transaction are interleaved with some
other transaction.
Schedule − A chronological execution sequence of a transaction is called a schedule. A schedule
can have many transactions in it, each comprising a number of instructions/tasks.
Serial Schedule − It is a schedule in which transactions are aligned in such a way that one
transaction is executed first. When the first transaction completes its cycle, then the next
transaction is executed. Transactions are ordered one after the other. This type of schedule is
called a serial schedule, as transactions are executed in a serial manner.
In a multi-transaction environment, serial schedules are considered as a benchmark.
The execution sequence of an instruction in a transaction cannot be changed, but two
transactions can have their instructions executed in a random fashion.
This execution does no harm if two transactions are mutually independent and working on
different segments of data; but in case these two transactions are working on the same data,
then the results may vary. This ever-varying result may bring the database to an inconsistent
state.
To resolve this problem, we allow parallel execution of a transaction schedule, if its
transactions are either serializable or have some equivalence relation among them.
Equivalence Schedules
An equivalence schedule can be of the following types −
Result Equivalence
If two schedules produce the same result after execution, they are said to be result equivalent. They
may yield the same result for some value and different results for another set of values. That's why
this equivalence is not generally considered significant.
View Equivalence
Two schedules are view equivalent if the transactions in both the schedules perform similar
actions in a similar manner.
For example −
If T reads the initial data in S1, then it also reads the initial data in S2.
If T reads the value written by J in S1, then it also reads the value written by J in S2.
If T performs the final write on the data value in S1, then it also performs the final write on
the data value in S2.
Conflict Equivalence
Two operations are said to be conflicting if they have the following properties −
They belong to separate transactions.
They access the same data item.
At least one of them is a "write" operation.
Two schedules having multiple transactions with conflicting operations are said to be conflict
equivalent if and only if −
Both the schedules contain the same set of Transactions.
The order of conflicting pairs of operations is maintained in both the schedules.
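Conflict equivalence can be tested mechanically. The sketch below is one possible formulation in Python; it takes as an assumption the usual textbook second condition, namely that every pair of conflicting operations appears in the same order in both schedules. The tuple encoding (transaction, "R" or "W", data item) and the three sample schedules are invented for illustration, and the check assumes that no operation tuple is repeated within a schedule.

```python
from itertools import combinations

def conflicts(op1, op2):
    """Two operations conflict if they come from different transactions,
    access the same data item, and at least one of them is a write."""
    txn1, kind1, item1 = op1
    txn2, kind2, item2 = op2
    return txn1 != txn2 and item1 == item2 and "W" in (kind1, kind2)

def conflict_pairs(schedule):
    """Return the set of (earlier, later) pairs of conflicting operations."""
    return {(a, b) for a, b in combinations(schedule, 2) if conflicts(a, b)}

def conflict_equivalent(s1, s2):
    """Same operations, and every conflicting pair ordered the same way."""
    return sorted(s1) == sorted(s2) and conflict_pairs(s1) == conflict_pairs(s2)

serial = [
    ("T1", "R", "X"), ("T1", "W", "X"), ("T1", "R", "Y"), ("T1", "W", "Y"),
    ("T2", "R", "X"), ("T2", "W", "X"),
]
interleaved = [
    ("T1", "R", "X"), ("T1", "W", "X"), ("T2", "R", "X"), ("T1", "R", "Y"),
    ("T2", "W", "X"), ("T1", "W", "Y"),
]
reordered = [
    ("T2", "R", "X"), ("T1", "R", "X"), ("T1", "W", "X"), ("T2", "W", "X"),
    ("T1", "R", "Y"), ("T1", "W", "Y"),
]

print(conflict_equivalent(interleaved, serial))
print(conflict_equivalent(reordered, serial))
```

The interleaved schedule only moves T1's operations on Y, which conflict with nothing in T2, so it stays equivalent to the serial one; the reordered schedule lets T2 read X before T1 writes it, which reverses a conflicting pair.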
States of Transactions
A transaction in a database can be in one of the following states −
Active − In this state, the transaction is being executed. This is the initial state of every
transaction.
Partially Committed − When a transaction executes its final operation, it is said to be in a
partially committed state.
Failed − A transaction is said to be in a failed state if any of the checks made by the
database recovery system fails. A failed transaction can no longer proceed further.
Aborted − If any of the checks fails and the transaction has reached a failed state, then the
recovery manager rolls back all its write operations on the database to bring the database
back to its original state where it was prior to the execution of the transaction. Transactions
in this state are called aborted. The database recovery module can select one of the two
operations after a transaction aborts −
Re-start the transaction
Kill the transaction
Committed − If a transaction executes all its operations successfully, it is said to be
committed. All its effects are now permanently established on the database system.
COMMIT command
COMMIT command is used to permanently save any transaction into the database.
When we use any DML command like INSERT, UPDATE or DELETE, the changes made
by these commands are not permanent. Until the current session is closed, the changes made
by these commands can be rolled back.
To avoid that, we use the COMMIT command to mark the changes as permanent.
Following is commit command's syntax,
COMMIT;
ROLLBACK command
This command restores the database to the last committed state. It is also used with the SAVEPOINT
command to jump to a savepoint in an ongoing transaction.
If we have used the UPDATE command to make some changes to the database, and realise
that those changes were not required, then we can use the ROLLBACK command to
roll back those changes, if they were not committed using the COMMIT command.
Following is rollback command's syntax,
ROLLBACK TO savepoint_name;
SAVEPOINT command
SAVEPOINT command is used to temporarily save a transaction so that you can rollback to
that point whenever required.
Following is savepoint command's syntax,
SAVEPOINT savepoint_name;
In short, using this command we can name the different states of our data in any table and
then rollback to that state using the ROLLBACK command whenever required.
COMMIT;
SAVEPOINT A;
SAVEPOINT B;
SAVEPOINT C;
NOTE: SELECT statement is used to show the data stored in the table.
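The interaction of SAVEPOINT, ROLLBACK TO and COMMIT can be tried out as follows. This is a sketch using Python's sqlite3 module; the table name class_list and the three inserted rows are hypothetical. The connection is opened with isolation_level=None so that the transaction commands are issued explicitly and are not managed by the module.

```python
import sqlite3

# isolation_level=None: the module does not open transactions on its own,
# so BEGIN, SAVEPOINT, ROLLBACK TO and COMMIT are all issued explicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE class_list (id INTEGER, name TEXT)")

conn.execute("BEGIN")
conn.execute("INSERT INTO class_list VALUES (1, 'first')")
conn.execute("SAVEPOINT A")
conn.execute("INSERT INTO class_list VALUES (2, 'second')")
conn.execute("SAVEPOINT B")
conn.execute("INSERT INTO class_list VALUES (3, 'third')")

# Return to the state named A: the rows inserted after SAVEPOINT A are undone.
conn.execute("ROLLBACK TO A")

# Make what is left of the transaction permanent.
conn.execute("COMMIT")

ids = [row[0] for row in conn.execute("SELECT id FROM class_list ORDER BY id")]
print(ids)
```

Only the row inserted before SAVEPOINT A remains after the commit.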
JDBC establishes a connection with a database, sends SQL statements, and processes the
results.
ODBC is used between applications, whereas JDBC is used by Java programmers to connect to
databases. With a small "bridge" program, you can use the JDBC interface to access
ODBC-accessible databases.
ODBC
ODBC (Open Database Connectivity) is a standard or open application programming
interface (API) for accessing a database.
By using ODBC statements in a program, you can access files in a number of different
databases, including Access, dBase, DB2, Excel, and Text.
It allows programs to use SQL requests that will access databases without having to know
the proprietary interfaces to the databases.
ODBC handles the SQL request and converts it into a request the individual database system
understands.
JDBC
JDBC (Java Database Connectivity) is a Java API for connecting programs written in Java
to the data in relational databases.
It consists of a set of classes and interfaces written in the Java programming language.
It provides a standard API for tool/database developers and makes it possible to write
database applications using a pure Java API.
The standard was defined by Sun Microsystems, allowing individual providers to implement and
extend the standard with their own JDBC drivers.
A JDBC program:
establishes a connection with a database
sends SQL statements
processes the results.
JDBC vs ODBC
ODBC is used between applications
JDBC is used by Java programmers to connect to databases
With a small "bridge" program, you can use the JDBC interface to access ODBC-accessible
databases.
JDBC allows SQL-based database access for EJB persistence and for direct manipulation
from CORBA, DJB or other server objects.
MODULE IV
DBMS – Normalization
Functional Dependency
Functional dependency (FD) is a constraint between two sets of attributes in a relation. Functional
dependency says that if two tuples have the same values for attributes A1, A2, ..., An, then those two
tuples must also have the same values for attributes B1, B2, ..., Bn.
Functional dependency is represented by an arrow sign (→), that is, X→Y, where X functionally
determines Y. The left-hand side attributes determine the values of attributes on the right-hand side.
Armstrong's Axioms
If F is a set of functional dependencies then the closure of F, denoted as F+, is the set of all
functional dependencies logically implied by F. Armstrong's Axioms are a set of rules, that when
applied repeatedly, generates a closure of functional dependencies.
Reflexive rule − If alpha is a set of attributes and beta is a subset of alpha, then
alpha → beta holds.
Augmentation rule − If a → b holds and y is an attribute set, then ay → by also holds. That is,
adding attributes to both sides of a dependency does not change the basic dependency.
Transitivity rule − Same as the transitive rule in algebra: if a → b holds and b → c holds, then
a → c also holds. a → b is read as "a functionally determines b".
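In practice the full closure F+ is rarely enumerated. A common shortcut, used in the sketch below, is to compute the closure of a set of attributes and test whether a dependency follows from F. The function names closure and implies and the sample dependencies A → B and B → C are illustrative assumptions.

```python
def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the dependencies.

    fds is a list of (lhs, rhs) pairs, each side given as a set of attribute names.
    """
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            # If the whole left-hand side is already determined, so is the right-hand side.
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def implies(fds, lhs, rhs):
    """True if lhs -> rhs follows from fds by Armstrong's axioms."""
    return set(rhs) <= closure(lhs, fds)

fds = [({"A"}, {"B"}), ({"B"}, {"C"})]

print(sorted(closure({"A"}, fds)))
print(implies(fds, {"A"}, {"C"}))            # transitivity
print(implies(fds, {"A", "D"}, {"B", "D"}))  # augmentation
print(implies(fds, {"A", "B"}, {"A"}))       # reflexivity
print(implies(fds, {"C"}, {"A"}))            # does not follow from fds
```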
Normalization
Database Normalization is a technique of organizing the data in the database. Normalization is
a systematic approach of decomposing tables to eliminate data redundancy(repetition) and
Each attribute must contain only a single value from its pre-defined domain.
attribute.
Non-prime attribute − An attribute, which is not a part of the prime-key, is said to be a
non-prime attribute.
If we follow second normal form, then every non-prime attribute should be fully
functionally dependent on prime key attribute.
That is, if X → A holds, then there should not be any proper subset Y of X for which Y → A
also holds true, i.e. there should not be any partial dependency.
We see here in the Student_Project relation that the prime key attributes are Stu_ID and
Proj_ID.
According to the rule, the non-key attributes, i.e. Stu_Name and Proj_Name, must be dependent
upon both and not on any one of the prime key attributes individually.
But we find that Stu_Name can be identified by Stu_ID and Proj_Name can be identified by
Proj_ID independently. This is called partial dependency, which is not allowed in Second
Normal Form.
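The search for partial dependencies can be automated. The sketch below is one possible approach in Python: it tries every proper subset of the key and reports the non-prime attributes that the subset already determines. The function names and the encoding of the dependencies as pairs of sets are assumptions; the attribute names are those of the Student_Project relation discussed above.

```python
from itertools import combinations

def closure(attributes, fds):
    """Return every attribute determined by `attributes` under the dependencies."""
    result = set(attributes)
    changed = True
    while changed:
        changed = False
        for lhs, rhs in fds:
            if lhs <= result and not rhs <= result:
                result |= rhs
                changed = True
    return result

def partial_dependencies(key, non_prime, fds):
    """List (subset of key, attribute) pairs that violate second normal form."""
    found = []
    key = sorted(key)
    for size in range(1, len(key)):  # proper, non-empty subsets only
        for subset in combinations(key, size):
            determined = closure(subset, fds)
            for attribute in sorted(non_prime):
                if attribute in determined:
                    found.append((subset, attribute))
    return found

fds = [({"Stu_ID"}, {"Stu_Name"}), ({"Proj_ID"}, {"Proj_Name"})]

violations = partial_dependencies(
    {"Stu_ID", "Proj_ID"}, {"Stu_Name", "Proj_Name"}, fds
)

# A relation whose key is the single attribute Stu_ID has no proper subsets to test.
student_only = partial_dependencies({"Stu_ID"}, {"Stu_Name"}, fds)

print(violations)
print(student_only)
```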
We find that in the above Student_detail relation, Stu_ID is the key and the only prime key
attribute. We find that City can be identified by Stu_ID as well as by Zip itself. Zip is not a
superkey, nor is City a prime attribute. Additionally, Stu_ID → Zip → City, so there exists a
transitive dependency.
To bring this relation into third normal form, we break the relation into two relations as
follows:
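One possible decomposition can be sketched with Python's sqlite3 module: a relation keyed by Stu_ID that keeps Zip, and a second relation keyed by Zip that holds City. The table names Student and ZipCodes, the reduced column list, and the three sample rows are assumptions made for illustration. The final query joins the two relations to confirm that no information is lost.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Original relation: City depends on Zip, and Zip depends on Stu_ID.
conn.execute(
    "CREATE TABLE Student_Detail (Stu_ID INTEGER PRIMARY KEY, Zip TEXT, City TEXT)"
)
conn.executemany(
    "INSERT INTO Student_Detail VALUES (?, ?, ?)",
    [(1, "110001", "Delhi"), (2, "110001", "Delhi"), (3, "400001", "Mumbai")],
)

# Decomposition: one relation keyed by Stu_ID, one keyed by Zip.
conn.execute("CREATE TABLE Student (Stu_ID INTEGER PRIMARY KEY, Zip TEXT)")
conn.execute("CREATE TABLE ZipCodes (Zip TEXT PRIMARY KEY, City TEXT)")
conn.execute("INSERT INTO Student SELECT Stu_ID, Zip FROM Student_Detail")
conn.execute("INSERT INTO ZipCodes SELECT DISTINCT Zip, City FROM Student_Detail")

original = conn.execute(
    "SELECT Stu_ID, Zip, City FROM Student_Detail ORDER BY Stu_ID"
).fetchall()

# Joining the two smaller relations on Zip rebuilds the original rows.
rejoined = conn.execute(
    "SELECT s.Stu_ID, s.Zip, z.City "
    "FROM Student s JOIN ZipCodes z ON s.Zip = z.Zip "
    "ORDER BY s.Stu_ID"
).fetchall()

# Each Zip/City pair is now stored once instead of once per student.
zip_rows = conn.execute("SELECT COUNT(*) FROM ZipCodes").fetchone()[0]

print(original == rejoined, zip_rows)
```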
Features of OODBMS
In OODBMS, every entity is considered an object and represented in a table. Similar objects are
classified into classes and subclasses, and the relationship between two objects is maintained using
the concept of inverse reference.
1. Complexity
OODBMS has the ability to represent the complex internal structure (of object) with multilevel
complexity.
2. Inheritance
Creating a new object from an existing object in such a way that new object inherits all
characteristics of an existing object.
3. Encapsulation
It is a data-hiding concept in OOPL that binds together the data and the functions that
manipulate it, and keeps them hidden from the outside world.
4. Persistency
OODBMS allows the creation of persistent objects (an object remains in memory even after
execution). This feature can automatically solve the problem of recovery and concurrency.
2. Query Processing
3. Query Optimization
Challenge: New indexes and query processing techniques increase the options for query
optimization. But, the challenge is that the optimizer must know to handle and use the query
processing functionality properly.
Solution: While constructing a query plan, an optimizer must be familiar with the newly added index
structures
centralized system is not very efficient. The need to improve the efficiency gave birth to the
concept of Parallel Databases.
A parallel database system improves the performance of data processing by using multiple resources,
such as multiple CPUs and disks, in parallel.
It also parallelizes many operations, such as data loading and query processing.
Improve performance:
The performance of the system can be improved by connecting multiple CPUs and disks in
parallel. Many small processors can also be connected in parallel.
Improve reliability:
The reliability of the system is improved through completeness, accuracy and availability of data.
Architecture
There are three main architectures that have been proposed for parallel database management
systems :
Shared Memory – Every CPU has access to both memory and disk through a fast
interconnect (e.g. a high-speed bus). This provides excellent load balance; however, scalability
and availability are limited.
Shared Disk – Each CPU has its own memory, but all CPUs share the disks. There is no
longer competition for shared memory, but there is still competition for access to the
shared disk. This provides better scale-up, and the load balancing is still acceptable.
Availability is better than with shared memory but still limited, as a disk failure would mean
failure of the entire system.
Shared Nothing – Each processor has exclusive access to its own main memory and disk unit, which
means all communication between CPUs goes through a network connection. Shared nothing
has high availability and reliability; if one node fails, the others are still able to run
independently. However, load balance and skew become major issues with this architecture.
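The skew problem in a shared-nothing system can be illustrated with a toy partitioning scheme. In the sketch below, each row is assigned to a node by taking its integer key modulo the number of nodes; this scheme, the node count of 4, and the two key sets are assumptions chosen to make the effect visible, not a description of any particular product.

```python
def partition(keys, node_count):
    """Count how many rows each node receives when rows are placed by key modulo node_count."""
    load = {node: 0 for node in range(node_count)}
    for key in keys:
        load[key % node_count] += 1
    return load

# Keys 0..999 spread evenly over the four nodes.
uniform = partition(range(1000), 4)

# Keys that are all multiples of 4 land on the same node: the load is skewed.
skewed = partition([key * 4 for key in range(1000)], 4)

print(uniform)
print(skewed)
```

In the skewed case one node holds every row while the other three sit idle, so adding nodes does not speed up the work.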