RDBMS Concepts
A Relational Database Management System (RDBMS) is a database management system based on the relational model introduced by E. F. Codd. In the relational model, data is represented as tuples (rows). An RDBMS is used to manage a relational database, which is an organized collection of tables from which data can be accessed easily. The relational database is the most commonly used kind of database. It consists of a number of tables, and each table has its own primary key.
What is a Table?
In a relational database, a table is a collection of data elements organised in rows and columns. A table is also a convenient representation of a relation, although a table can contain duplicate tuples while a true relation cannot. A table is the simplest form of data storage. Below is an example of an Employee table.
ID   Name     Age   Salary
     Adam     34    13000
     Alex     28    15000
     Stuart   20    18000
     Ross     42    19020
What is a Record?
A single entry in a table is called a record or row. A record represents a set of related data. For example, the Employee table above has 4 records. The following is an example of a single record.
     Adam     34    13000
What is a Field?
A table consists of several records (rows), and each record can be broken into several smaller entities known as fields. The Employee table above consists of four fields: ID, Name, Age and Salary.
What is a Column?
In a relational table, a column is a set of values of a particular type. The term attribute is also used for a column. For example, in the Employee table, Name is a column that holds the names of the employees.
Name
Adam
Alex
Stuart
Ross
Database Keys
Keys are a very important part of a relational database. They are used to establish and identify relationships between tables. They also ensure that each record within a table can be uniquely identified by a combination of one or more fields.
Super Key
A super key is a set of attributes within a table that uniquely identifies each record in that table. Every candidate key is a super key; a candidate key is a super key with no redundant attributes.
Candidate Key
Candidate keys are the sets of fields from which the primary key can be selected. A candidate key is an attribute, or set of attributes, that can act as the primary key of a table and uniquely identify each record in that table.
Primary Key
The primary key is the candidate key that is most appropriate to become the main key of the table. It uniquely identifies each record in the table.
Composite Key
A key that consists of two or more attributes that together uniquely identify an entity occurrence is called a composite key. No single attribute that makes up the composite key is a key on its own.
Non-key Attribute
Non-key attributes are attributes other than candidate key attributes in a table.
Non-prime Attribute
Non-prime attributes are attributes that are not part of any candidate key.
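The key definitions above can be checked mechanically against sample data. The following is a minimal sketch in Python, not part of the original text: the `enrollment` rows and the helper names `is_super_key` and `candidate_keys` are assumptions made for illustration. Uniqueness in one sample does not prove that a set of attributes is a key (a key is a rule about all valid data), so this only shows what the definitions mean.

```python
from itertools import combinations

# Hypothetical sample rows, used only to illustrate the definitions.
enrollment = [
    {"S_id": 401, "S_Name": "Adam", "Subject": "Biology"},
    {"S_id": 401, "S_Name": "Adam", "Subject": "Physics"},
    {"S_id": 402, "S_Name": "Alex", "Subject": "Maths"},
    {"S_id": 403, "S_Name": "Stuart", "Subject": "Maths"},
]

def is_super_key(rows, attrs):
    """Return True if no two rows share the same values for attrs."""
    seen = {tuple(row[a] for a in attrs) for row in rows}
    return len(seen) == len(rows)

def candidate_keys(rows):
    """Return the minimal super keys: those with no smaller key inside them."""
    columns = list(rows[0])
    found = []
    for size in range(1, len(columns) + 1):
        for attrs in combinations(columns, size):
            contains_smaller_key = any(set(k) <= set(attrs) for k in found)
            if is_super_key(rows, attrs) and not contains_smaller_key:
                found.append(attrs)
    return found

keys = candidate_keys(enrollment)
```

In this sample no single column is unique, so every candidate key is a composite key, and the full column set is a super key but not a candidate key.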
Normalization of Database
Normalization is a systematic approach to decomposing tables in order to eliminate data redundancy and undesirable characteristics such as insertion, update and deletion anomalies. It is a multi-step process that puts data into tabular form by removing duplicated data from the relation tables. Normalization is used mainly for two purposes:
Eliminating redundant (useless) data.
Ensuring data dependencies make sense, i.e. data is logically stored.
S_id   S_Name   S_Address   Subject
401    Adam     Noida       Bio
402    Alex     Panipat     Maths
403    Stuart   Jammu       Maths
404    Adam     Noida       Physics
Update anomaly : To update the address of a student who appears in two or more rows of the table, we have to update the S_Address column in all of those rows; otherwise the data becomes inconsistent.
Insertion anomaly : Suppose that for a new admission we have the student's id (S_id), name and address, but the student has not yet opted for any subject. We then have to insert NULL for the subject, leading to an insertion anomaly.
Deletion anomaly : If student 401 has only one subject and temporarily drops it, deleting that row deletes the entire student record along with it.
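The update anomaly can be reproduced in a few lines. This is a rough sketch using Python's built-in sqlite3 module with an in-memory database, which is an assumption of this example and not something the text prescribes. The column name Subject and the new address 'Delhi' are also made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (S_id INTEGER, S_Name TEXT, S_Address TEXT, Subject TEXT)"
)
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [
        (401, "Adam", "Noida", "Bio"),
        (402, "Alex", "Panipat", "Maths"),
        (403, "Stuart", "Jammu", "Maths"),
        (404, "Adam", "Noida", "Physics"),
    ],
)

# Change the address in only one of Adam's two rows (hypothetical new value).
conn.execute("UPDATE Student SET S_Address = 'Delhi' WHERE S_id = 401")

# The same student now has two different addresses: the data is inconsistent.
addresses = sorted(
    row[0]
    for row in conn.execute(
        "SELECT DISTINCT S_Address FROM Student WHERE S_Name = 'Adam'"
    )
)
```

After the partial update, `addresses` holds both the old and the new address, which is exactly the inconsistency that normalization is meant to prevent.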
Normalization Rule
Normalization rules are divided into the following normal forms.
1. First Normal Form
2. Second Normal Form
3. Third Normal Form
4. BCNF
401    Adam     Biology
401    Adam     Physics
402    Alex     Maths
403    Stuart   Maths
You can clearly see that the student name Adam is used twice in the table and that the subject Maths is also repeated. First Normal Form requires that each column hold a single atomic value and that repeating groups be removed, so the table is broken into two different tables.
New Student Table :
S_id   S_Name
401    Adam
402    Alex
403    Stuart
New Subject Table :
subject_id   student_id   subject
10           401          Biology
11           401          Physics
12           402          Math
12           403          Math
In the Subject table, the combination of subject_id and student_id is the primary key. Now both the Student table and the Subject table are normalized to First Normal Form.
customer_id   Customer_Name   Order_id   Order_name   sale_detail
101           Adam            10         order1       sale1
101           Adam            11         order2       sale2
102           Alex            12         order3       sale3
103           Stuart          13         order4       sale4
In the Customer table, the combination of Customer_id and Order_id is the primary key. This table is in First Normal Form but not in Second Normal Form, because some columns depend on only part of the primary key (partial dependencies). Customer_Name depends only on customer_id, Order_name depends only on Order_id, and there is no link between sale_detail and Customer_name. To reduce the Customer table to Second Normal Form, break it into the following three tables.
Customer_Detail Table :
customer_id   Customer_Name
101           Adam
102           Alex
103           Stuart
Order_Detail Table :
Order_id   Order_Name
10         Order1
11         Order2
12         Order3
13         Order4
Sale_Detail Table :
customer_id   Order_id   sale_detail
101           10         sale1
101           11         sale2
102           12         sale3
103           13         sale4
Now all three of these tables comply with Second Normal Form.
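A decomposition like this loses no information, because joining the three tables gives back the original rows. Below is a minimal sketch using Python's sqlite3 module. The in-memory database and the table names Order_Detail and Sale_Detail are assumptions made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript(
    """
    CREATE TABLE Customer_Detail (customer_id INTEGER PRIMARY KEY, Customer_Name TEXT);
    CREATE TABLE Order_Detail (Order_id INTEGER PRIMARY KEY, Order_Name TEXT);
    -- The composite primary key links one customer to one order.
    CREATE TABLE Sale_Detail (
        customer_id INTEGER,
        Order_id INTEGER,
        sale_detail TEXT,
        PRIMARY KEY (customer_id, Order_id)
    );
    INSERT INTO Customer_Detail VALUES (101, 'Adam'), (102, 'Alex'), (103, 'Stuart');
    INSERT INTO Order_Detail VALUES (10, 'Order1'), (11, 'Order2'), (12, 'Order3'), (13, 'Order4');
    INSERT INTO Sale_Detail VALUES (101, 10, 'sale1'), (101, 11, 'sale2'),
                                   (102, 12, 'sale3'), (103, 13, 'sale4');
    """
)

# Joining on the key columns rebuilds the wide Customer table.
rebuilt = conn.execute(
    """
    SELECT s.customer_id, c.Customer_Name, s.Order_id, o.Order_Name, s.sale_detail
    FROM Sale_Detail AS s
    JOIN Customer_Detail AS c ON c.customer_id = s.customer_id
    JOIN Order_Detail AS o ON o.Order_id = s.Order_id
    ORDER BY s.Order_id
    """
).fetchall()
```

Each customer name is now stored once, yet `rebuilt` still contains one row per sale with the name attached.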
In this table Student_id is the primary key, but street, city and state depend on Zip. A dependency of non-key columns on another non-key column, such as the dependency of street, city and state on Zip, is called a transitive dependency. Hence, to apply Third Normal Form (3NF), we need to move street, city and state to a new table with Zip as its primary key.
New Student_Detail Table :
Student_id   Student_name   DOB   Zip
Overview of Database
A database is a collection of related data organised so that the data can be easily accessed, managed and updated. Any piece of information can be data, for example the name of your school. A database is a place where related pieces of information are stored and on which various operations can be performed.
DBMS
A DBMS is software that allows the creation, definition and manipulation of a database. It is the tool used to perform any kind of operation on the data in a database. A DBMS also provides protection and security for the database, and it maintains data consistency when there are multiple users. Some examples of popular DBMSs are MySQL, Oracle, Sybase, Microsoft Access and IBM DB2.
Users : Users may be of various types, such as DB administrators, system developers and end users.
Database application : A database application may be personal, departmental, enterprise or internal.
DBMS : Software that allows users to define, create and manage database access, e.g. MySQL, Oracle etc.
Functions of DBMS
Provides data independence
Concurrency control
Provides recovery services
Provides utility services
Provides a clear and logical view of the process that manipulates data
Advantages of DBMS
Segregation of application programs.
Minimal data duplication.
Easy retrieval of data.
Reduced development time and maintenance needs.
Disadvantages of DBMS
Database Architecture
Database architecture is logically divided into two types.
1. Logical two-tier client/server architecture
2. Logical three-tier client/server architecture
Two-tier client/server architecture is used when the user interface programs and application programs run on the client side. An interface called ODBC (Open Database Connectivity) provides an API that allows client-side programs to call the DBMS. Most DBMS vendors provide ODBC drivers. A client program may connect to several DBMSs. Some variation is possible in this architecture: in some DBMSs more functionality is transferred to the client, including the data dictionary, optimization and so on. Such clients are called data servers.
Three-tier client/server database architecture is commonly used for web applications. An intermediate layer, called the application server or web server, stores the web connectivity software and the business logic (constraints) of the application, which is used to access the right amount of data from the database server. This layer acts as a medium for exchanging partially processed data between the database server and the client.
Database Model
A Database model defines the logical design of data. The model describes the relationships between different parts of the data. In history of database design, three models have been in use.
Hierarchical Model
In this model each entity has only one parent but can have several children. At the top of the hierarchy there is only one entity, which is called the root.
Network Model
In the network model, entities are organised in a graph, in which some entities can be accessed through several paths.
Relational Model
In this model, data is organised in two-dimensional tables called relations. The tables, or relations, are related to each other.
Codd's Rule
E. F. Codd was a computer scientist who invented the relational model for database management. Relational databases were created on the basis of this model. Codd proposed 13 rules, numbered zero to twelve and popularly known as Codd's 12 rules, to test a DBMS against his relational model. Codd's rules define the qualities a DBMS requires in order to be a Relational Database Management System (RDBMS). So far, hardly any commercial product follows all 13 rules; even Oracle is said to follow only eight and a half (8.5) of the 13. Codd's 12 rules are as follows.
Rule zero
This rule states that for a system to qualify as an RDBMS, it must be able to manage database entirely through the relational capabilities.
All information (including metadata) is to be represented as stored data in the cells of tables. The rows and columns have to be strictly unordered.
Introduction to SQL
Structured Query Language (SQL) is a language used for storing and managing data in an RDBMS. SQL was the first commercial language introduced for E. F. Codd's relational model. Today almost all RDBMSs (MySQL, Oracle, Informix, Sybase, MS Access) use SQL as the standard database language. SQL is used to perform all types of data operations in an RDBMS.
SQL Command
SQL defines the following data languages to manipulate the data of an RDBMS.
Command    Description
create     to create a table or database
alter      for alteration
truncate   to remove all records from a table
drop       to drop a table
rename     to rename a table
DML commands are not auto-committed. This means the changes are not permanent in the database and can be rolled back.
Command    Description
insert     to insert data into a table
update     to update a row of a table
delete     to delete a row
merge
Command     Description
commit      to permanently save
rollback    to undo change
savepoint   to save temporarily
Data Control Language provides commands to grant and take back authority.
Command    Description
grant      to give a user access privileges
revoke     to take back permissions from a user
select
create command
create is a DDL command used to create a table or a database.
Creating a Database
To create a database in an RDBMS, the create command is used. Following is the syntax,
create database database-name;
Creating a Table
The create command is also used to create a table. We can specify the names and datatypes of the various columns along with it. Following is the syntax,
create table table-name
(
 column-name1 datatype1,
 column-name2 datatype2,
 column-name3 datatype3,
 column-name4 datatype4
);
The create table command tells the database system to create a new table with the given table name and column information.
alter command
The alter command is used to alter table structures. It has various uses, such as:
to add a column to an existing table
to rename any existing column
to change the datatype of any column or to modify its size
to drop a column
Using the alter command we can add a column to an existing table. Following is the syntax,
alter table table-name add(column-name datatype);
Here is an example,
alter table Student add(address char);
The above command will add a new column address to the Student table.
The alter command can also change the datatype of a column or modify its size. Here is an example,
alter table Student modify(address varchar(30));
The above command will modify the address column of the Student table.
To Rename a column
Using the alter command you can rename an existing column. Following is the syntax,
alter table table-name rename column old-column-name to column-name;
Here is an example,
alter table Student rename column address to Location;
The above command will rename the address column to Location.
To Drop a Column
The alter command is also used to drop columns. Following is the syntax,
alter table table-name drop(column-name);
Here is an example,
alter table Student drop(address);
The above command will drop the address column from the Student table.
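The create and alter commands can be tried out without installing a database server. The sketch below uses Python's built-in sqlite3 module, which is an assumption of this example. SQLite writes the add-column form as ADD COLUMN rather than the add(...) form shown above, and the column types here are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

def column_names(table):
    """List the column names of a table using SQLite's table_info pragma."""
    return [row[1] for row in conn.execute(f"PRAGMA table_info({table})")]

# DDL: create the table, then alter its structure.
conn.execute("CREATE TABLE Student (s_id INTEGER, s_name TEXT, age INTEGER)")
before = column_names("Student")

conn.execute("ALTER TABLE Student ADD COLUMN address TEXT")
after = column_names("Student")
```

Comparing `before` and `after` shows that alter changed the table's structure, not its data.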
truncate command
The truncate command removes all records from a table, but it does not destroy the table's structure. When we apply the truncate command to a table, its primary key is initialized. Following is its syntax,
truncate table table-name;
Here is an example,
truncate table Student;
The above query will delete all the records of the Student table.
drop command
The drop query completely removes a table from the database. This command also destroys the table structure. Following is its syntax,
drop table table-name;
Here is an example,
drop table Student;
The above query will delete the Student table completely. The drop command can also be used on databases. For example, to drop a database,
drop database Test;
The above query will drop a database named Test from the system.
rename query
The rename command is used to rename a table. Following is its syntax,
rename old-table-name to table-name;
Here is an example,
rename Student to Student_record;
The above query will rename the Student table to Student_record.
DML command
Data Manipulation Language (DML) statements are used for managing data in a database. DML commands are not auto-committed. This means that changes made by a DML command are not permanent in the database and can be rolled back.
1) INSERT command
The insert command is used to insert data into a table. Following is its general syntax,
INSERT into table-name values(data1, data2, ...);
Let's see an example,
INSERT into Student values(101,'Adam',15);
The above command will insert a record into the Student table.
S_id   S_Name   age
101    Adam     15
S_id   S_Name   age
101    Adam     15
102    Alex
Suppose the age column of the Student table has a default value of 14.
S_id   S_Name   age
101    Adam     15
102    Alex
103    chris    14
2) UPDATE command
The update command is used to update a row of a table. Following is its general syntax,
UPDATE table-name set column-name = value where condition;
Let's see an example,
UPDATE Student set age=18 where s_id=102;
S_id   S_Name   age
101    Adam     15
102    Alex     18
103    chris    14
The above command will update two columns of a record.
S_id   S_Name   age
101    Adam     15
102    Alex     18
103    Abhi     17
3) Delete command
The delete command is used to delete data from a table. It can also be used with a condition to delete a particular row. Following is its general syntax,
DELETE from table-name;
S_id   S_Name   age
101    Adam     15
102    Alex     18
103    Abhi     17
DELETE from Student where s_id=103;
The above command will delete the record where s_id is 103 from the Student table.
S_id   S_Name   age
101    Adam     15
102    Alex     18
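The three DML commands can be run end to end as follows. This is a sketch using Python's sqlite3 module with an in-memory database (an assumption of this example); the default value of 14 for age follows the text, and the order of the statements is chosen for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE Student (s_id INTEGER, s_name TEXT, age INTEGER DEFAULT 14)"
)

# INSERT: a full row, a row with an explicit NULL, and a row relying on the default.
conn.execute("INSERT INTO Student VALUES (101, 'Adam', 15)")
conn.execute("INSERT INTO Student VALUES (102, 'Alex', NULL)")
conn.execute("INSERT INTO Student (s_id, s_name) VALUES (103, 'chris')")
default_age = conn.execute("SELECT age FROM Student WHERE s_id = 103").fetchone()[0]

# UPDATE: change one column of the row that matches the condition.
conn.execute("UPDATE Student SET age = 18 WHERE s_id = 102")

# DELETE: remove only the row that matches the condition.
conn.execute("DELETE FROM Student WHERE s_id = 103")

rows = conn.execute("SELECT s_id, s_name, age FROM Student ORDER BY s_id").fetchall()
```

Without the where condition, both the update and the delete would apply to every row of the table.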
TCL command
Transaction Control Language (TCL) commands are used to manage transactions in a database. They manage the changes made by DML statements and allow statements to be grouped together into logical transactions.
Commit command
The commit command is used to permanently save any transaction into the database. Following is the commit command's syntax,
commit;
Rollback command
This command restores the database to the last committed state. It is also used with the savepoint command to jump to a savepoint in a transaction.
Savepoint command
The savepoint command is used to temporarily save a transaction so that you can roll back to that point whenever necessary. Following is the savepoint command's syntax,
savepoint savepoint-name;
ID   NAME
     abhi
     adam
     alex
Let's run some SQL queries on the above table and see the results.
INSERT into class values(5,'Rahul');
commit;
UPDATE class set name='abhijit' where id='5';
savepoint A;
INSERT into class values(6,'Chris');
savepoint B;
INSERT into class values(7,'Bravo');
savepoint C;
SELECT * from class;
The resultant table will look like,
ID   NAME
     abhi
     adam
     alex
5    abhijit
6    chris
7    bravo
Now roll back to savepoint B,
rollback to B;
SELECT * from class;
The resultant table will look like,
ID   NAME
     abhi
     adam
     alex
5    abhijit
6    chris
Now roll back to savepoint A,
rollback to A;
SELECT * from class;
The resultant table will look like,
ID   NAME
     abhi
     adam
     alex
5    abhijit
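The savepoint walkthrough can be replayed with the sketch below. It uses Python's sqlite3 module, which is an assumption of this example. The connection is opened with isolation_level=None so that the script issues BEGIN and COMMIT itself, and the three pre-existing rows are left out because their ID values are not given.

```python
import sqlite3

# isolation_level=None: the module does not open transactions implicitly.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute("CREATE TABLE class (id INTEGER, name TEXT)")

def names():
    """Return the current names in id order."""
    return [row[0] for row in conn.execute("SELECT name FROM class ORDER BY id")]

conn.execute("BEGIN")
conn.execute("INSERT INTO class VALUES (5, 'Rahul')")
conn.execute("COMMIT")  # permanently saved

conn.execute("BEGIN")
conn.execute("UPDATE class SET name = 'abhijit' WHERE id = 5")
conn.execute("SAVEPOINT A")
conn.execute("INSERT INTO class VALUES (6, 'Chris')")
conn.execute("SAVEPOINT B")
conn.execute("INSERT INTO class VALUES (7, 'Bravo')")
conn.execute("SAVEPOINT C")
all_rows = names()

conn.execute("ROLLBACK TO B")  # undoes the insert of Bravo
after_b = names()

conn.execute("ROLLBACK TO A")  # also undoes the insert of Chris
after_a = names()
conn.execute("COMMIT")
```

Rolling back to a savepoint keeps everything done before that savepoint, which is why the update to 'abhijit' survives both rollbacks.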
DCL command
Data Control Language (DCL) is used to control privileges in a database. To perform any operation in the database, such as creating tables, sequences or views, we need privileges. Privileges are of two types,
System : creating a session, a table etc. are all types of system privilege.
Object : any command or query that works on tables comes under object privilege.
Grant : gives a user access privileges to the database.
Revoke : takes back permissions from a user.
WHERE clause
The where clause is used to specify a condition while retrieving data from a table. It is mostly used with Select, Update and Delete queries. Only the rows for which the condition specified by the where clause is true are returned.
s_id   s_Name   age   address
101    Adam     15    Noida
102    Alex     18    Delhi
103    Abhi     17    Rohtak
104    Ankit    22    Panipat
Now we will use a SELECT statement to display data from the table based on a condition, which we add to the SELECT query using the WHERE clause.
SELECT s_id, s_name, age, address from Student WHERE s_id=101;
s_id   s_Name   age   address
101    Adam     15    Noida
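The same query can be run against a small copy of the Student table. This is a minimal sketch using Python's sqlite3 module and an in-memory database, both assumptions of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (s_id INTEGER, s_name TEXT, age INTEGER, address TEXT)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?, ?)",
    [
        (101, "Adam", 15, "Noida"),
        (102, "Alex", 18, "Delhi"),
        (103, "Abhi", 17, "Rohtak"),
        (104, "Ankit", 22, "Panipat"),
    ],
)

# Only rows for which the WHERE condition is true are returned.
matching = conn.execute(
    "SELECT s_id, s_name, age, address FROM Student WHERE s_id = 101"
).fetchall()

# With no WHERE clause, every row is returned.
everything = conn.execute("SELECT * FROM Student").fetchall()
```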
SELECT Query
The select query is used to retrieve data from tables. It is the most used SQL query. We can retrieve complete tables, or part of a table by specifying conditions with the WHERE clause.
S_id   S_Name   age   address
101    Adam     15    Noida
102    Alex     18    Delhi
103    Abhi     17    Rohtak
104    Ankit    22    Panipat
The above query will fetch the s_id, s_name and age columns from the Student table.
S_id   S_Name   age
101    Adam     15
102    Alex     18
103    Abhi     17
104    Ankit    22
The above query will show all the records of the Student table, which means it will show the complete Student table as the result.
S_id   S_Name   age   address
101    Adam     15    Noida
102    Alex     18    Delhi
103    Abhi     17    Rohtak
104    Ankit    22    Panipat
S_id   S_Name   age   address
103    Abhi     17    Rohtak
eid   Name    age   salary
101   Adam    26    5000
102   Ricky   42    8000
103   Abhi    22    10000
104   Rohan   35    5000
from Employee;
The above command will display a new column in the result, showing 3000 added to the existing salaries of the employees.
eid   Name    salary+3000
101   Adam    8000
102   Ricky   11000
103   Abhi    13000
104   Rohan   8000
Like clause
The like clause is used as a condition in an SQL query. It compares data with a pattern using wildcard operators and is used to find similar data in a table.
Wildcard operators
There are two wildcard operators that are used in the like clause.
Percent sign % : represents zero, one or more than one character.
Underscore sign _ : represents exactly one character.
s_id   s_Name   age
101    Adam     15
102    Alex     18
103    Abhi     17
SELECT * from Student where s_name like 'A%';
The above query will return all records where s_name starts with the character 'A'.
s_id   s_Name   age
101    Adam     15
102    Alex     18
103    Abhi     17
Example
SELECT * from Student where s_name like '_d%';
The above query will return all records from the Student table where s_name has 'd' as its second character.
s_id   s_Name   age
101    Adam     15
Example
SELECT * from Student where s_name like '%x';
The above query will return all records from the Student table where s_name has 'x' as its last character.
s_id   s_Name   age
102    Alex     18
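The three patterns can be tested directly. The sketch below assumes Python's sqlite3 module and an in-memory copy of the Student table. Note that like is followed by the pattern itself, with no = sign.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Student (s_id INTEGER, s_name TEXT, age INTEGER)")
conn.executemany(
    "INSERT INTO Student VALUES (?, ?, ?)",
    [(101, "Adam", 15), (102, "Alex", 18), (103, "Abhi", 17)],
)

def names_like(pattern):
    """Return the names matching a LIKE pattern, in s_id order."""
    query = "SELECT s_name FROM Student WHERE s_name LIKE ? ORDER BY s_id"
    return [row[0] for row in conn.execute(query, (pattern,))]

starts_with_a = names_like("A%")   # % matches any run of characters
second_is_d = names_like("_d%")    # _ matches exactly one character
ends_with_x = names_like("%x")
```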
Order By Clause
The order by clause is used with the Select statement to arrange the retrieved data in sorted order. By default the order by clause sorts data in ascending order. To sort data in descending order, the DESC keyword is used with the order by clause.
Syntax of Order By
SELECT column-list|* from table-name order by column-name asc|desc;
eid   name    age   salary
401   Anu     22    9000
402   Shane   29    8000
403   Rohan   34    6000
404   Scott   44    10000
405   Tiger   35    8000
SELECT * from Emp order by salary;
The above query will return the result in ascending order of salary.
eid   name    age   salary
403   Rohan   34    6000
402   Shane   29    8000
405   Tiger   35    8000
401   Anu     22    9000
404   Scott   44    10000
eid   name    age   salary
404   Scott   44    10000
401   Anu     22    9000
405   Tiger   35    8000
402   Shane   29    8000
403   Rohan   34    6000
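Both sort orders can be reproduced with the sketch below, which assumes Python's sqlite3 module. Two employees share the salary 8000, and SQL does not define the order of tied rows, so this example adds eid as a second sort column; that tie-breaker is an addition made here to keep the output predictable.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eid INTEGER, name TEXT, age INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?, ?)",
    [
        (401, "Anu", 22, 9000),
        (402, "Shane", 29, 8000),
        (403, "Rohan", 34, 6000),
        (404, "Scott", 44, 10000),
        (405, "Tiger", 35, 8000),
    ],
)

# Ascending is the default; eid breaks the tie between equal salaries.
ascending = [r[0] for r in conn.execute("SELECT eid FROM Emp ORDER BY salary, eid")]

# DESC must be written after each column it applies to.
descending = [
    r[0] for r in conn.execute("SELECT eid FROM Emp ORDER BY salary DESC, eid DESC")
]
```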
Group By Clause
The group by clause is used to group the results of a SELECT query based on one or more columns. It is also used with SQL functions to group the result from one or more tables. The syntax for using group by in a statement is,
SELECT column_name, function(column_name) FROM table_name WHERE condition GROUP BY column_name
eid   name    age   salary
401   Anu     22    9000
402   Shane   29    8000
403   Rohan   34    6000
404   Scott   44    9000
405   Tiger   35    8000
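Here we want to find the name and age of employees grouped by their salaries. The SQL query for this requirement will be,
SELECT name, age from Emp group by salary;
Note that many database systems reject a query that selects columns which are neither listed in the group by clause nor wrapped in an aggregate function; systems that accept it return one arbitrary row per group. Result will be,
name    age
Rohan   34
shane   29
anu     22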
eid   name    age   salary
401   Anu     22    9000
402   Shane   29    8000
403   Rohan   34    6000
404   Scott   44    9000
405   Tiger   35    8000
The SQL query will be,
SELECT name, salary from Emp where age > 25 group by salary;
Result will be,
name    salary
Rohan   6000
Shane   8000
Scott   9000
Remember that the group by clause comes after the where clause and before the order by clause.
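Grouping is easiest to see with an aggregate function, as in the general syntax above. The following sketch assumes Python's sqlite3 module; the use of COUNT(*) is an illustrative choice and does not come from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eid INTEGER, name TEXT, age INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?, ?)",
    [
        (401, "Anu", 22, 9000),
        (402, "Shane", 29, 8000),
        (403, "Rohan", 34, 6000),
        (404, "Scott", 44, 9000),
        (405, "Tiger", 35, 8000),
    ],
)

# One output row per distinct salary, with the number of employees in each group.
per_salary = conn.execute(
    "SELECT salary, COUNT(*) FROM Emp GROUP BY salary ORDER BY salary"
).fetchall()

# WHERE filters rows before grouping; the clause order is WHERE, GROUP BY, ORDER BY.
over_25 = conn.execute(
    "SELECT salary, COUNT(*) FROM Emp WHERE age > 25 GROUP BY salary ORDER BY salary"
).fetchall()
```

Every selected column here is either the grouping column or an aggregate, so the query is accepted by systems that enforce the standard grouping rule.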
HAVING Clause
The having clause is used with SQL queries to give a more precise condition for a statement. It specifies a condition on group-based SQL functions, in the same way that the WHERE clause specifies a condition on rows. The syntax for having is,
SELECT column_name, function(column_name) FROM table_name WHERE column_name condition GROUP BY column_name HAVING function(column_name) condition
oid   order_name   previous_balance   customer
11    ord1         2000               Alex
12    ord2         1000               Adam
13    ord3         2000               Abhi
14    ord4         1000               Adam
15    ord5         2000               Alex
Suppose we want to find the customer whose previous_balance sum is more than 3000. We will use the following SQL query,
SELECT * from sale group by customer having sum(previous_balance) > 3000;
Result will be,
oid   order_name   previous_balance   customer
11    ord1         2000               Alex
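The having condition can be checked with the sketch below, which assumes Python's sqlite3 module. It selects the customer and the sum explicitly instead of using SELECT *, so that every selected column is either grouped or aggregated; that change is a choice made for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sale (oid INTEGER, order_name TEXT, previous_balance INTEGER, customer TEXT)"
)
conn.executemany(
    "INSERT INTO sale VALUES (?, ?, ?, ?)",
    [
        (11, "ord1", 2000, "Alex"),
        (12, "ord2", 1000, "Adam"),
        (13, "ord3", 2000, "Abhi"),
        (14, "ord4", 1000, "Adam"),
        (15, "ord5", 2000, "Alex"),
    ],
)

# HAVING filters whole groups after SUM has been computed for each customer.
big_balances = conn.execute(
    """
    SELECT customer, SUM(previous_balance)
    FROM sale
    GROUP BY customer
    HAVING SUM(previous_balance) > 3000
    """
).fetchall()
```

Alex is the only customer returned because the two Alex rows add up to 4000, while Adam and Abhi each total 2000.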
Distinct keyword
The distinct keyword is used with Select statement to retrieve unique values from the table. Distinct removes all the duplicate records while retrieving from database.
Example
Consider the following Emp table.
eid   name    age   salary
401   Anu     22    5000
402   Shane   29    8000
403   Rohan   34    10000
404   Scott   44    10000
405   Tiger   35    8000
The above query will return only the unique salaries from the Emp table.
salary
5000
8000
10000
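A query of this kind can be sketched as follows, assuming Python's sqlite3 module. The exact statement used here, including the ORDER BY that fixes the output order, is an assumption of this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eid INTEGER, name TEXT, age INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?, ?)",
    [
        (401, "Anu", 22, 5000),
        (402, "Shane", 29, 8000),
        (403, "Rohan", 34, 10000),
        (404, "Scott", 44, 10000),
        (405, "Tiger", 35, 8000),
    ],
)

# DISTINCT collapses duplicate values in the selected column.
unique_salaries = [
    row[0] for row in conn.execute("SELECT DISTINCT salary FROM Emp ORDER BY salary")
]
all_salaries = [row[0] for row in conn.execute("SELECT salary FROM Emp")]
```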
AND operator
The AND operator is used to combine multiple conditions in a Where clause.
Example of AND
Consider the following Emp table
eid   name    age   salary
401   Anu     22    5000
402   Shane   29    8000
403   Rohan   34    12000
404   Scott   44    10000
405   Tiger   35    9000
SELECT * from Emp WHERE salary < 10000 AND age > 25;
The above query will return records where salary is less than 10000 and age is greater than 25.
eid   name    age   salary
402   Shane   29    8000
405   Tiger   35    9000
OR operator
The OR operator is also used to combine multiple conditions in a Where clause. The difference between AND and OR is their behaviour. When we use AND to combine two or more conditions, only records satisfying all the conditions are in the result. With OR, a record needs to satisfy at least one of the specified conditions to be in the result.
Example of OR
Consider the following Emp table
eid   name    age   salary
401   Anu     22    5000
402   Shane   29    8000
403   Rohan   34    12000
404   Scott   44    10000
405   Tiger   35    9000
The above query will return records where either salary is greater than 10000 or age is greater than 25.
eid   name    age   salary
402   Shane   29    8000
403   Rohan   34    12000
404   Scott   44    10000
405   Tiger   35    9000
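The difference between AND and OR can be checked on the Emp table with the sketch below. It assumes Python's sqlite3 module, and the OR query is written out from the description of its result.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Emp (eid INTEGER, name TEXT, age INTEGER, salary INTEGER)")
conn.executemany(
    "INSERT INTO Emp VALUES (?, ?, ?, ?)",
    [
        (401, "Anu", 22, 5000),
        (402, "Shane", 29, 8000),
        (403, "Rohan", 34, 12000),
        (404, "Scott", 44, 10000),
        (405, "Tiger", 35, 9000),
    ],
)

def eids(condition):
    """Return the eids of rows that satisfy a fixed, trusted WHERE condition."""
    query = f"SELECT eid FROM Emp WHERE {condition} ORDER BY eid"
    return [row[0] for row in conn.execute(query)]

both_true = eids("salary < 10000 AND age > 25")   # every condition must hold
either_true = eids("salary > 10000 OR age > 25")  # one condition is enough
```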
SQL Constraints
SQL constraints are rules used to limit the type of data that can go into a table, in order to maintain the accuracy and integrity of the data inside the table. Constraints can be divided into the following two types,
Column level constraints : limit only column data
Table level constraints : limit whole table data
Constraints are used to make sure that the integrity of data is maintained in the database. Following are the most used constraints that can be applied to a table.
DEFAULT
The above query will declare that the s_id field of Student table will not take NULL value.
UNIQUE Constraint
UNIQUE constraint ensures that a field or column will only have unique values. A UNIQUE constraint field will not have duplicate data. UNIQUE constraint can be applied at column level or table level.
The above query will declare that the s_id field of the Student table will only have unique values and won't take a NULL value.
The above query specifies that the s_id field of the Student table will only have unique values.
The primary key constraint uniquely identifies each record in a table. A primary key must contain unique values and must not contain NULL values. The primary key is usually used to index the data inside the table.
Customer_Detail Table :
c_id   Customer_Name   address
101    Adam            Noida
102    Alex            Delhi
103    Stuart          Rohtak
Order_Detail Table :
Order_id   Order_Name   c_id
10         Order1       101
11         Order2       103
12         Order3       102
In the Customer_Detail table, c_id is the primary key, and it is set as a foreign key in the Order_Detail table. A value entered in the c_id column of Order_Detail must be present in the Customer_Detail table, where c_id is the primary key. This prevents invalid data from being inserted into the c_id column of the Order_Detail table.
In this query, c_id in the Order_Detail table is made a foreign key, which references the c_id column of Customer_Detail.
On Delete Cascade : This removes the record from the child table if the referenced foreign key value is deleted from the main table.
On Delete Null : This sets the foreign key column of the child record to NULL if the referenced value is deleted from the main table.
If we don't use either of the above, we cannot delete data from the main table for which data in the child table exists. We will get an error if we try to do so.
ERROR : Record in child table exist
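The foreign key behaviour described above can be sketched with Python's sqlite3 module, which is an assumption of this example. SQLite only enforces foreign keys after PRAGMA foreign_keys = ON has been issued, and it writes the second option as ON DELETE SET NULL. The order 13 with c_id 999 is a made-up invalid row.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite-specific: enable enforcement
conn.execute(
    "CREATE TABLE Customer_Detail (c_id INTEGER PRIMARY KEY, Customer_Name TEXT, address TEXT)"
)
conn.execute(
    """
    CREATE TABLE Order_Detail (
        order_id INTEGER PRIMARY KEY,
        order_name TEXT,
        c_id INTEGER REFERENCES Customer_Detail(c_id) ON DELETE CASCADE
    )
    """
)
conn.executemany(
    "INSERT INTO Customer_Detail VALUES (?, ?, ?)",
    [(101, "Adam", "Noida"), (102, "Alex", "Delhi"), (103, "Stuart", "Rohtak")],
)
conn.executemany(
    "INSERT INTO Order_Detail VALUES (?, ?, ?)",
    [(10, "Order1", 101), (11, "Order2", 103), (12, "Order3", 102)],
)

# A c_id that does not exist in Customer_Detail is refused.
try:
    conn.execute("INSERT INTO Order_Detail VALUES (13, 'Order4', 999)")
    rejected = False
except sqlite3.IntegrityError:
    rejected = True

# Deleting customer 101 cascades to that customer's orders.
conn.execute("DELETE FROM Customer_Detail WHERE c_id = 101")
remaining_orders = [
    row[0] for row in conn.execute("SELECT order_id FROM Order_Detail ORDER BY order_id")
]
```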
CHECK Constraint
The CHECK constraint is used to restrict the values of a column to a given range. It checks each value before it is stored in the database, like a condition that must be satisfied before data is saved into the column.
The above query will restrict the s_id value to be greater than zero.
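The NOT NULL, UNIQUE and CHECK constraints can be combined on one column and exercised as below. This is a sketch assuming Python's sqlite3 module; the table definition and the helper `try_insert` are illustrative and are not the queries referred to in the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE Student (
        s_id INTEGER NOT NULL UNIQUE CHECK (s_id > 0),
        name TEXT,
        age INTEGER
    )
    """
)

def try_insert(values):
    """Return True if the row is accepted, False if a constraint rejects it."""
    try:
        conn.execute("INSERT INTO Student VALUES (?, ?, ?)", values)
        return True
    except sqlite3.IntegrityError:
        return False

results = [
    try_insert((101, "Adam", 15)),    # valid row
    try_insert((101, "Alex", 18)),    # duplicate s_id violates UNIQUE
    try_insert((0, "Abhi", 17)),      # s_id of 0 violates CHECK (s_id > 0)
    try_insert((None, "Ankit", 22)),  # missing s_id violates NOT NULL
]
```

Only the first row is stored; each of the other three is stopped by a different constraint before it reaches the table.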