Lesson 1 and 2 For Student
A database is a collection of related files that are usually integrated, linked or cross-referenced to
one another. The advantage of a database is that data and records contained in different files can
be easily organized and retrieved using specialized database management software called a
database management system (DBMS) or database manager.
As the name suggests, the database management system consists of two parts. They are:
1. Database and
2. Management System
What is a Database?
Data: Facts, figures, statistics etc. having no particular meaning (e.g. 1, ABC, 19 etc).
Record: Collection of related data items, e.g. in the above example the three data items had no
meaning. But if we organize them in the following way, then they collectively represent
meaningful information.
The columns of this relation are called Fields, Attributes or Domains. The rows are
called Tuples or Records.
T2
Roll Address
1 KOL
2 DEL
3 MUM
T3
Roll Year
1 I
2 II
3 I
T4
Year Hostel
I H1
II H2
We now have a collection of 4 tables. They can be called a “related collection” because we can
clearly find out that there are some common attributes existing in a selected pair of tables.
Because of these common attributes we may combine the data of two or more tables together to
find out the complete details of a student. Questions like “Which hostel does the youngest
student live in?” can be answered now, although the Age and Hostel attributes are in different tables.
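The idea of combining tables through their common attributes can be sketched in a few lines of Python, using the sqlite3 module from the standard library. This is only an illustration: the data is that of tables T2, T3 and T4 above, while the column types and the choice of SQLite are assumptions made for the example.

```python
import sqlite3

# In-memory database holding tables T2, T3 and T4 from the text.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE T2 (Roll INTEGER, Address TEXT);
    CREATE TABLE T3 (Roll INTEGER, Year TEXT);
    CREATE TABLE T4 (Year TEXT, Hostel TEXT);
    INSERT INTO T2 VALUES (1, 'KOL'), (2, 'DEL'), (3, 'MUM');
    INSERT INTO T3 VALUES (1, 'I'), (2, 'II'), (3, 'I');
    INSERT INTO T4 VALUES ('I', 'H1'), ('II', 'H2');
""")

# Roll is common to T2 and T3; Year is common to T3 and T4.
student_details = conn.execute("""
    SELECT T2.Roll, T2.Address, T3.Year, T4.Hostel
    FROM T2
    JOIN T3 ON T3.Roll = T2.Roll
    JOIN T4 ON T4.Year = T3.Year
    ORDER BY T2.Roll
""").fetchall()

for row in student_details:
    print(row)
```

Each printed row gives the roll, address, year and hostel of one student, even though no single table holds all four attributes.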
In a database, data is organized strictly in row and column format. The rows are
called Tuples or Records. The data items within one row may belong to different data types. On
the other hand, the columns are often called Domain or Attribute. All the data items within a
single attribute are of the same data type.
A management system is a set of rules and procedures which help us to create, organize and
manipulate the database. It also helps us to add, modify and delete data items in the database. The
management system can be either manual or computerized.
A database management system is a set of software programs that allows users to create, edit and
update data in database files, and store and retrieve data from those database files. Data in a
database can be added, deleted, changed, sorted or searched all using a DBMS. If you were an
employee in a large organization, the information about you would likely be stored in different
files that are linked together. One file about you would pertain to your skills and abilities,
another file to your income tax status, another to your home and office address and telephone
number, and another to your annual performance ratings. By cross-referencing these files,
someone could change a person's address in one file and it would automatically be reflected in all
the other files. DBMSs are commonly used to manage:
Improved availability: One of the principal advantages of a DBMS is that the same information
can be made available to different users.
Minimized redundancy: The data in a DBMS is more concise because, as a general rule, the
information in it appears just once. This reduces data redundancy, or in other words, the need to
repeat the same data over and over again. Minimizing redundancy can therefore significantly
reduce the cost of storing information on hard drives and other storage devices. In contrast, data
fields are commonly repeated in multiple files when a file management system is used.
Accuracy: Accurate, consistent, and up-to-date data is a sign of data integrity. DBMSs foster
data integrity because updates and changes to the data only have to be made in one place. The
chances of making a mistake are higher if you are required to change the same data in several
different places than if you only have to make the change in one place.
Program and file consistency: Using a database management system, file formats and system
programs are standardized. This makes the data files easier to maintain because the same rules
and guidelines apply across all types of data. The level of consistency across files and programs
also makes it easier to manage data when multiple programmers are involved.
User-friendly: Data is easier to access and manipulate with a DBMS than without it. In most
cases, DBMSs also reduce the reliance of individual users on computer specialists to meet their
data needs.
Improved security: As stated earlier, DBMSs allow multiple users to access the same data
resources. This capability is generally viewed as a benefit, but there are potential risks for the
organization. Some sources of information should be protected or secured and only viewed by
select individuals. Through the use of passwords, database management systems can be used to
restrict data access to only those who should see it.
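The points on minimized redundancy and accuracy can be illustrated with a small sketch. The tables, column names and values below are hypothetical and chosen only for the example: an address is stored in exactly one table, two different reports read it from there, and a single update is therefore seen by both.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical tables: the address is stored once, in one place only.
    CREATE TABLE address (roll INTEGER PRIMARY KEY, city TEXT);
    CREATE TABLE hostel_allotment (roll INTEGER, hostel TEXT);
    CREATE TABLE fee_record (roll INTEGER, amount INTEGER);
    INSERT INTO address VALUES (1, 'KOL');
    INSERT INTO hostel_allotment VALUES (1, 'H1');
    INSERT INTO fee_record VALUES (1, 500);
""")

def hostel_report():
    """Hostel list with each student's city, read from the address table."""
    return conn.execute("""
        SELECT h.roll, a.city, h.hostel
        FROM hostel_allotment h JOIN address a ON a.roll = h.roll
    """).fetchall()

def fee_report():
    """Fee list with each student's city, read from the same address table."""
    return conn.execute("""
        SELECT f.roll, a.city, f.amount
        FROM fee_record f JOIN address a ON a.roll = f.roll
    """).fetchall()

# One change, made in one place.
conn.execute("UPDATE address SET city = 'DEL' WHERE roll = 1")

print(hostel_report())
print(fee_report())
```

Had the city been copied into both the hostel and the fee tables, the same change would have needed two updates, with the risk of forgetting one of them.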
However, there are a few disadvantages of using a DBMS. They are as follows:
1. As a DBMS needs computers, we have to invest a good amount in acquiring the hardware,
software, installation facilities and training of users.
2. We have to keep regular backups because a failure can occur at any time. Taking a backup can be a
lengthy process, and the computer system may not be able to perform other jobs during that time.
3. While the data security system is a benefit of using a DBMS, it must be very robust. If someone can
bypass the security system then the database would become open to any kind of mishandling.
There are basically two major downsides to using DBMSs. One of these is cost, and the other the
threat to data security.
Cost: Implementing a DBMS system can be expensive and time-consuming, especially in large
organizations. Training requirements alone can be quite costly.
Security: Even with safeguards in place, it may be possible for some unauthorized users to
access the database. In general, database access is an all or nothing proposition. Once an
unauthorized user gets into the database, they have access to all the files, not just a few.
Depending on the nature of the data involved, these breaches in security can also pose a threat to
individual privacy. Steps should also be taken to regularly make backup copies of the database
files and store them because of the possibility of fires and earthquakes that might destroy the
system.
DBMS Fundamentals
Computerized file management systems (sometimes called file managers) are not considered true
database management systems because files cannot be easily linked to each other. However, they
can serve useful data management functions by providing a system for storing information in
files. For example, a file management system might be used to store a mailing list or a personal
address book. When files need to be linked, a relational database should be created using
database application software such as Oracle, Microsoft Access, IBM DB2, or FileMaker Pro.
Database Administrator
The Database Administrator, better known as the DBA, is the person (or group of persons)
responsible for the well-being of the database management system. S/he has the following
functions and responsibilities regarding database management:
1. Definition of the schema, the architecture of the three levels of the data abstraction, data
independence.
2. Modification of the defined schema as and when required.
3. Definition of the storage structure and the access method of the stored data, i.e. sequential,
indexed or direct.
4. Creating new user-ids, passwords etc., and also creating the access permissions that each user can
or cannot enjoy. The DBA is responsible for creating user roles, which are collections of permissions
(like read, write etc.) granted to or withheld from a class of users. S/he can also grant additional
permissions to and/or revoke existing permissions from a user if need be.
5. Defining the integrity constraints for the database to ensure that the data entered conform to
some rules, thereby increasing the reliability of data.
6. Creating a security mechanism to prevent unauthorized access and accidental or intentional mishandling
of data that could pose a security threat.
7. Creating a backup and recovery policy. This is essential because in case of a failure the database
must be able to return to its complete functionality with no loss of data, as if the failure had
never occurred. It is essential to keep regular backups of the data so that if the system fails then all
data up to the point of failure will be available from stable storage. Only the data
gathered during the failure would have to be fed to the database again to restore it to a healthy status.
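Item 5 in the list above, the definition of integrity constraints, can be sketched as follows. The student table, the names and the permitted age range are assumptions made purely for illustration. The constraints are declared once, when the table is defined, and the database then rejects any row that breaks them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE student (
        roll INTEGER PRIMARY KEY,                    -- no two rows may share a roll
        name TEXT NOT NULL,                          -- a name must always be given
        age  INTEGER CHECK (age BETWEEN 5 AND 100)   -- assumed sensible age range
    )
""")
conn.execute("INSERT INTO student VALUES (1, 'Asha', 17)")

# Three hypothetical rows, each violating a different rule.
bad_rows = [
    (2, None, 18),         # missing name
    (3, 'Ravi', -4),       # age outside the permitted range
    (1, 'Duplicate', 20),  # roll 1 already exists
]

rejected = []
for row in bad_rows:
    try:
        conn.execute("INSERT INTO student VALUES (?, ?, ?)", row)
    except sqlite3.IntegrityError as exc:
        rejected.append(row)
        print("rejected", row, "->", exc)

accepted = conn.execute("SELECT roll FROM student").fetchall()
print(accepted)
```

Only the first, valid row remains in the table; the rules are enforced by the database itself, not by each program that writes to it.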
We know that the same thing, if viewed from different angles, produces different sights.
Likewise, the database that we have created already can have different aspects to reveal if seen
from different levels of abstraction.
The term Abstraction is very important here. Generally it means the amount of detail you want
to hide.
The word schema means arrangement – how we want to arrange things that we have to store.
The diagram above shows the three different schemas used in DBMS, seen from different levels
of abstraction.
The lowest level, called the Internal or Physical schema, deals with the description of how raw
data items (like 1, ABC, KOL, H2 etc.) are stored in the physical storage (Hard Disc, CD, Tape
Drive etc.). It also describes the data type of these data items, the size of the items in the storage
media, the location (physical address) of the items in the storage device and so on. This schema
is useful for database application developers and database administrators.
The middle level is known as the Conceptual or Logical Schema, and deals with the structure
of the entire database. Please note that at this level we are no longer interested in the raw data items;
we are interested in the structure of the database. This means we want to know the
information about the attributes of each table, the common attributes in different tables that help
them to be combined, what kind of data can be input into these attributes, and so on. Conceptual
or Logical schema is very useful for database administrators whose responsibility is to maintain
the entire database.
The highest level of abstraction is the External or View Schema. This is targeted for the end
users. An end user does not need to know everything about the structure of the entire
database, only the details he/she needs to work with. We may not want the end
user to become confused by an overwhelming amount of detail when looking at
the entire database, or we may disallow this for the purpose of security, where sensitive
information must remain hidden from unwanted persons. The database administrator may want
to create custom made tables, keeping in mind the specific kind of need for each user. These
tables are also known as virtual tables, because they have no separate physical existence. They
are created dynamically for the users at runtime. For example, in the sample database we
created earlier, suppose we have a special officer whose responsibility is to keep in touch with the parents
of any underage student living in the hostels. That officer does not need to know every detail,
only the Roll, Name, Address and Age. The database administrator may create a virtual table
with only these four attributes, only for the use of this officer.
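A virtual table of this kind corresponds to what SQL calls a view. The sketch below is a hedged illustration: the student rows, the names and the age limit of 18 are all hypothetical, and only the four attributes Roll, Name, Address and Age are taken from the text.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Hypothetical base table holding every detail of a student.
    CREATE TABLE student (
        roll INTEGER PRIMARY KEY, name TEXT, address TEXT,
        age INTEGER, year TEXT, hostel TEXT
    );
    INSERT INTO student VALUES
        (1, 'Asha',  'KOL', 17, 'I',  'H1'),
        (2, 'Ravi',  'DEL', 19, 'II', 'H2'),
        (3, 'Meena', 'MUM', 16, 'I',  'H1');

    -- The view stores no data of its own; it is evaluated when queried.
    -- The age limit of 18 is an assumption for this example.
    CREATE VIEW underage_contact AS
        SELECT roll, name, address, age FROM student WHERE age < 18;
""")

cursor = conn.execute("SELECT * FROM underage_contact ORDER BY roll")
view_columns = [column[0] for column in cursor.description]
view_rows = cursor.fetchall()

print(view_columns)
print(view_rows)
```

The officer sees only four columns and only the underage students; the Year and Hostel columns of the base table stay hidden.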
Data Independence
This brings us to our next topic: data independence. It is the property of the database which tries
to ensure that if we make a change at any level of the schema, the schema
immediately above it requires minimal or no change.
What does this mean? We know that in a building, each floor stands on the floor below it. If we
change the design of any one floor, e.g. extending the width of a room by demolishing the
western wall of that room, it is likely that the design in the above floors will have to be changed
also. As a result, one change needed in one particular floor would mean continuing to change the
design of each floor until we reach the top floor, with an increase in the time, cost and labour.
Would not life be easier if the change could be contained in one floor only? Data independence is
the answer to this. It removes the additional work needed to carry a
single change into all the levels above.
Data independence can be classified into the following two types:
1. Physical Data Independence: This means that for any change made in the physical schema, the
need to change the logical schema is minimal. This is practically easier to achieve. Let us explain
with an example.
Say, you have bought an Audio CD of a recently released film and one of your friends has
bought an Audio Cassette of the same film. If we consider the physical schema, they are entirely
different. The first is a digital recording on an optical medium, where random access is possible. The
second is a magnetic recording on a magnetic medium, with strictly sequential access. However, how
this change is reflected in the logical schema is very interesting. For music tracks, the logical
schema for both the CD and the Cassette is the title card imprinted on their back. We have
information like Track no, Name of the Song, Name of the Artist and Duration of the Track,
things which are identical for both the CD and the Cassette. We can clearly say that we have
achieved the physical data independence here.
2. Logical Data Independence: This means that for any change made in the logical schema, the
need to change the external schema is minimal. As we shall see, this is a little difficult to
achieve. Let us explain with an example.
Suppose the CD you have bought contains 6 songs, and some of your friends are interested in
copying some of those songs (which they like in the film) into their favorite collection. One
friend wants the songs 1, 2, 4, 5, 6, another wants 1, 3, 4, 5 and another wants 1, 2, 3, 6. Each of
these collections can be compared to a view schema for that friend. Now suppose that, by some mistake, a
scratch has appeared on the CD and you cannot extract song 3. You will have to
ask the friends who have song 3 in their proposed collection to alter their view by deleting song 3
from it. A change in the logical schema has forced a change in some of the external schemas, which is
why logical data independence is harder to achieve.
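Both kinds of data independence can be tried out in a small sketch. The song titles, artist names, the index and the added column are assumptions made for illustration; the three friends' song selections are those from the example above. An index stands in for a change of physical schema, a new column for a harmless change of logical schema, and the loss of song 3 for a change that the views cannot hide.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE song (track_no INTEGER PRIMARY KEY, title TEXT, artist TEXT);
    INSERT INTO song VALUES
        (1, 'Song 1', 'Artist X'), (2, 'Song 2', 'Artist Y'), (3, 'Song 3', 'Artist X'),
        (4, 'Song 4', 'Artist Y'), (5, 'Song 5', 'Artist X'), (6, 'Song 6', 'Artist Y');
    -- One view per friend, using the song numbers from the text.
    CREATE VIEW friend_a AS SELECT track_no, title FROM song WHERE track_no IN (1, 2, 4, 5, 6);
    CREATE VIEW friend_b AS SELECT track_no, title FROM song WHERE track_no IN (1, 3, 4, 5);
    CREATE VIEW friend_c AS SELECT track_no, title FROM song WHERE track_no IN (1, 2, 3, 6);
""")
VIEWS = ("friend_a", "friend_b", "friend_c")

def snapshot():
    """Return the track numbers currently visible through each friend's view."""
    return {
        view: [row[0] for row in conn.execute(f"SELECT track_no FROM {view} ORDER BY track_no")]
        for view in VIEWS
    }

original = snapshot()

# Physical change: add an index. How the data is reached changes, the views do not.
conn.execute("CREATE INDEX idx_song_artist ON song (artist)")
after_index = snapshot()

# Logical change the views can absorb: a new column in the base table.
conn.execute("ALTER TABLE song ADD COLUMN duration_sec INTEGER")
after_new_column = snapshot()

# Logical change the views cannot hide: song 3 is lost.
conn.execute("DELETE FROM song WHERE track_no = 3")
after_delete = snapshot()
affected = [view for view in VIEWS if after_delete[view] != original[view]]

print(after_index == original, after_new_column == original, affected)
```

The first two changes leave every view untouched, while the third alters exactly the views of the two friends who had asked for song 3.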
When a company asks you to make them a working, functional DBMS which they can work
with, there are certain steps to follow. Let us summarize them here:
1. Gathering information: This could be a written document that describes the system in question
with reasonable amount of details.
2. Producing ERD: ERD or Entity Relationship Diagram is a diagrammatic representation of the
description we have gathered about the system.
3. Designing the database: Out of the ERD we have created, it is very easy to determine the tables,
the attributes which the tables must contain and the relationship among these tables.
4. Normalization: This is a process of removing different kinds of impurities from the tables we
have just created in the above step.
Cardinality of Relationship
While creating a relationship between two entities, we often have to deal with the question of
cardinality. This simply means how many entities of the first set are related to how many
entities of the second set. Cardinality can be of the following four types.
One-to-One
Only one entity of the first set is related to only one entity of the second set. E.g. A teacher
teaches a student. Only one teacher is teaching only one student. This can be expressed in the
following diagram as:
One-to-Many
Only one entity of the first set is related to multiple entities of the second set. E.g. A teacher
teaches students. Only one teacher is teaching many students. This can be expressed in the
following diagram as:
Many-to-One
Multiple entities of the first set are related to only one entity of the second set. E.g. Teachers
teach a student. Many teachers are teaching only one student. This can be expressed in the
following diagram as:
Many-to-Many
Multiple entities of the first set are related to multiple entities of the second set. E.g. Teachers
teach students. In any school or college many teachers are teaching many students. This can be
considered as a two way one-to-many relationship. This can be expressed in the following
diagram as:
In this discussion we have not included the attributes, but you can understand that they can be
used without any problem if we want to.
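The four cardinalities can also be told apart mechanically. The function below is a minimal sketch, not a standard routine: it takes a relationship as a list of (first entity, second entity) pairs and reports which of the four types the listed pairs exhibit. The teacher and student labels are hypothetical. Note that it classifies the data it is given, whereas in a real design the cardinality is a rule fixed in advance.

```python
from collections import Counter

def cardinality(pairs):
    """Classify a relationship given as (first_entity, second_entity) pairs."""
    pairs = set(pairs)  # ignore repeated pairs
    partners_of_first = Counter(first for first, _ in pairs)
    partners_of_second = Counter(second for _, second in pairs)

    # Does some entity of the first set relate to several of the second set?
    first_has_many = any(count > 1 for count in partners_of_first.values())
    # Does some entity of the second set relate to several of the first set?
    second_has_many = any(count > 1 for count in partners_of_second.values())

    if first_has_many and second_has_many:
        return "many-to-many"
    if first_has_many:
        return "one-to-many"
    if second_has_many:
        return "many-to-one"
    return "one-to-one"

print(cardinality([("T1", "S1")]))
print(cardinality([("T1", "S1"), ("T1", "S2")]))
print(cardinality([("T1", "S1"), ("T2", "S1")]))
print(cardinality([("T1", "S1"), ("T1", "S2"), ("T2", "S1")]))
```

The four calls correspond, in order, to the four teacher and student examples given above.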
A key is an attribute of a table which helps to identify a row. There can be many different types
of keys which are explained here.
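Super Key or Candidate Key: It is an attribute (or set of attributes) of a table that can uniquely identify a row
in the table. Strictly speaking, a super key is any such set of attributes, while a candidate key is a
minimal super key, i.e. one from which no attribute can be removed without losing uniqueness.
Generally they contain unique values and can never contain NULL values. There can
be more than one super key or candidate key in a table e.g. within a STUDENT table Roll and
Mobile No. can both serve to uniquely identify a student.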
Primary Key: It is the one candidate key that is chosen to be the identifying key for the
entire table. E.g. although there are two candidate keys in the STUDENT table, the college
would obviously use Roll as the primary key of the table.
Alternate Key: This is any candidate key which is not chosen as the primary key of the table.
Alternate keys are named so because, although not the primary key, they can still identify a row.
Composite Key: Sometimes one key is not enough to uniquely identify a row. E.g. in a single
class Roll is enough to find a student, but in the entire school, merely searching by the Roll is not
enough, because there could be 10 classes in the school and each one of them may contain a
certain roll no 5. To uniquely identify the student we have to say something like “class VII, roll
no 5”. So, two or more attributes are combined to create a unique combination of
values, such as Class + Roll.
Foreign Key: Sometimes we may have to work with a table that does not have a primary
key of its own. To identify its rows, we have to use the primary key of a related table. Such
a copy of another related table’s primary key is called a foreign key. More generally, a foreign key is
any attribute whose values must match the primary key of a related table.
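The different kinds of keys can be declared directly when a table is created. The following is a sketch under stated assumptions: the table layout, the mobile numbers and the hostel_allotment table are hypothetical, and SQLite is used only because it ships with Python. It shows a composite primary key (Class + Roll), an alternate key (Mobile No.) and a foreign key.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite checks foreign keys only when asked to
conn.executescript("""
    CREATE TABLE student (
        class  TEXT    NOT NULL,
        roll   INTEGER NOT NULL,
        mobile TEXT    NOT NULL UNIQUE,   -- alternate key
        PRIMARY KEY (class, roll)         -- composite primary key
    );
    CREATE TABLE hostel_allotment (
        class  TEXT,
        roll   INTEGER,
        hostel TEXT,
        FOREIGN KEY (class, roll) REFERENCES student (class, roll)
    );
    -- Roll 5 may appear in two different classes.
    INSERT INTO student VALUES ('VII', 5, '9000000001');
    INSERT INTO student VALUES ('VIII', 5, '9000000002');
    INSERT INTO hostel_allotment VALUES ('VII', 5, 'H1');
""")

def is_rejected(sql):
    """Return True if the database refuses the statement on integrity grounds."""
    try:
        conn.execute(sql)
    except sqlite3.IntegrityError:
        return True
    return False

# Same class and roll as an existing student: breaks the primary key.
duplicate_key_rejected = is_rejected("INSERT INTO student VALUES ('VII', 5, '9000000003')")
# Mobile number already in use: breaks the alternate key.
duplicate_mobile_rejected = is_rejected("INSERT INTO student VALUES ('IX', 1, '9000000001')")
# No student 'X', 9 exists: breaks the foreign key.
dangling_reference_rejected = is_rejected("INSERT INTO hostel_allotment VALUES ('X', 9, 'H2')")

print(duplicate_key_rejected, duplicate_mobile_rejected, dangling_reference_rejected)
```

All three faulty insertions are refused, while the two students who share roll no 5 in different classes are accepted.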
Relational DBMS
This is our subject of study. A DBMS is relational if the data is organized into relations, that is,
tables. In RDBMS, all data are stored in the well-known row-column format.
Hierarchical DBMS
In HDBMS, data is organized in a tree-like manner. There is a parent-child relationship among
data items and the data model is very suitable for representing one-to-many relationship. To
access the data items, some kind of tree-traversal techniques are used, such as preorder traversal.
Because HDBMS is built on the one-to-many model, it takes a little effort to
organize a hierarchical database into row-column format. For example, consider the following
hierarchical database that shows four employees (E01, E02, E03, and E04) belonging to the same
department D1.
There are two ways to represent the above one-to-many information into a relation that is built in
one-to-one relationship. The first is called Replication, where the department id is replicated a
number of times in the table like the following.
Dept-Id Employee-Code
D1 E01
D1 E02
D1 E03
D1 E04
Replication makes the same data item redundant and is an inefficient way to store data. A better
way is to use a technique called the Virtual Record. While using this, the repeating data item is
not used in the table. It is kept at a separate place. The table, instead of containing the repeating
information, contains a pointer to that place where the data item is stored.
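The department example can be sketched in Python to show both the preorder traversal mentioned earlier and the replication technique. The Node class and the function names are hypothetical helpers written for this illustration; only the labels D1 and E01 to E04 come from the text.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One data item in the hierarchy, with its child items."""
    label: str
    children: list = field(default_factory=list)

def preorder(node):
    """Visit the parent first, then each child subtree from left to right."""
    yield node.label
    for child in node.children:
        yield from preorder(child)

# Department D1 is the parent of four employees.
department = Node("D1", [Node("E01"), Node("E02"), Node("E03"), Node("E04")])

visit_order = list(preorder(department))

# Replication: the parent's id is repeated once for every child row.
replicated_rows = [(department.label, employee.label) for employee in department.children]

print(visit_order)
for row in replicated_rows:
    print(row)
```

The replicated rows match the table shown above, with D1 stored four times. That repetition is the redundancy which the virtual record technique is meant to avoid.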
Network DBMS
The NDBMS is built primarily on a one-to-many relationship, but where a parent-child
representation among the data items cannot be ensured. This may happen in any real world
situation where any entity can be linked to any entity. The NDBMS was proposed by a group of
theorists known as the Database Task Group (DBTG). What they said looks like this:
In NDBMS, all entities are called Records and all relationships are called Sets. The record from
where the relationship starts is called the Owner Record and where it ends is called the Member
Record. The relationship or set is strictly one-to-many.
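The terms owner, member and set can be made concrete with a rough sketch. The SetType class and the record names (D1, E01, P1 and so on) are hypothetical and are not part of any DBTG specification. The sketch enforces the one-to-many rule within a set, and shows that the same record may still be a member of two different sets, which is what distinguishes a network from a tree.

```python
class SetType:
    """A DBTG-style set: each member record has at most one owner record."""

    def __init__(self, name):
        self.name = name
        self.owner_of = {}  # member record -> owner record

    def connect(self, owner, member):
        """Link a member to an owner, refusing a second owner in the same set."""
        current = self.owner_of.get(member)
        if current is not None and current != owner:
            raise ValueError(f"{member} already has owner {current} in set {self.name}")
        self.owner_of[member] = owner

    def members_of(self, owner):
        """Return all member records linked to the given owner."""
        return sorted(m for m, o in self.owner_of.items() if o == owner)


works_in = SetType("works_in")        # department owns employees
assigned_to = SetType("assigned_to")  # project owns employees

works_in.connect("D1", "E01")
works_in.connect("D1", "E02")
assigned_to.connect("P1", "E01")      # E01 is a member of a second set

print(works_in.members_of("D1"))
print(works_in.owner_of["E01"], assigned_to.owner_of["E01"])
```

Employee E01 has two owners, a department and a project, but through two different sets; within any single set the relationship remains strictly one-to-many.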