Databases
Database terms:
Entity: a physical object, such as a person or patient, or an event about which
information or data is being collected. It can also be an abstract object, such
as a patient record.
Attribute: individual data item within an entity; e.g. date of birth, surname.
Relationship: a link between two different entities or relations (tables). E.g.
in "A student stays at St. Augustine’s High School", the entities are
student and St. Augustine’s High School, and the relationship is
"stays at".
Data Dictionary:
It is a table holding information about a database
A file (table) with description of the structure of data held in a
database
Used by managers when they modify the database.
Not visible to (used by) general users.
It maps logical database to physical storage
Allows existence checks on data to be carried out.
Stores details of data used, including the following:
Name of data item (fields or variables)
Data type
Length
Validation criteria
Amount of storage required for each item
Who owns the data
Who accesses the data
Programs which use the data
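In practice, most relational DBMSs keep the data dictionary in system tables that can be queried like ordinary tables. A minimal sketch, assuming SQLite through Python's built-in `sqlite3` module and a hypothetical `student` table, reads back each field's name, data type and whether a value is required:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical table whose description the DBMS records in its catalogue.
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        surname    VARCHAR(30) NOT NULL,
        dob        DATE
    )
""")

# sqlite_master is SQLite's catalogue: one row per table, index, view, etc.
tables = [name for (name,) in conn.execute(
    "SELECT name FROM sqlite_master WHERE type = 'table'")]

# PRAGMA table_info returns one row per field:
# (cid, name, declared type, notnull flag, default value, primary-key flag)
fields = {
    name: {"type": ctype, "required": bool(notnull), "primary_key": bool(pk)}
    for _cid, name, ctype, notnull, _default, pk in conn.execute(
        "PRAGMA table_info(student)")
}

print(tables)
for name, info in fields.items():
    print(name, info)
```

SQLite does not enforce declared lengths such as `VARCHAR(30)` and has no notion of data owners or user accounts, so a full data dictionary as described above holds more than this catalogue.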
Page 1 of 36
Flat file: Data stored in a single file (table), allowing simple structuring, e.g. a
spreadsheet database file of student records. Data is stored in rows
representing records, while columns represent fields. Thus data is stored in
a two-dimensional format.
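As a minimal sketch of a flat file, assuming a hypothetical CSV file of student records (held in memory with `io.StringIO` so the example runs without a file on disk), each row becomes one record and each column one field:

```python
import csv
import io

# Hypothetical flat file: one table, rows = records, columns = fields.
flat_file = io.StringIO(
    "student_id,surname,form\n"
    "1,Moyo,4A\n"
    "2,Banda,4B\n"
    "3,Ncube,4A\n"
)

records = list(csv.DictReader(flat_file))  # each row becomes a dict keyed by field name
fields = list(records[0].keys())

# Searching a flat file means scanning every row.
form_4a = [r["surname"] for r in records if r["form"] == "4A"]
print(fields)   # ['student_id', 'surname', 'form']
print(form_4a)  # ['Moyo', 'Ncube']
```

Because everything sits in one table, any detail shared by several students (such as a form teacher) would have to be repeated on each row.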
Types of databases (Database Models)
These include relational, hierarchical, network and object-oriented
databases.
1. Relational databases:
- These are databases that organise data in tables, where data in one table
is related to data in another table, allowing users to link the tables.
- A table is a collection of records stored using a row and column structure.
- Each table is called a relation, i.e. a relation is a table with columns and
rows.
- The relational model applies only to the logical structure of the database,
not the physical structure.
Disadvantages
o Substantial hardware and system software overhead
o May promote “islands of information” problems
o It may be difficult to identify the relationships.
Database Keys:
A simple key contains a single attribute.
A composite key is a key that contains more than one attribute.
A candidate key is an attribute (or set of attributes) that uniquely
identifies a row. A candidate key must possess the following properties:
o Unique identification - For every row the value of the key
must uniquely identify that row.
o Non-redundancy - No attribute in the key can be discarded
without destroying the property of unique identification.
Super key: An attribute or a set of attributes that uniquely identifies a
tuple within a relation. A super key is any set of attributes that
uniquely identifies a row. A super key differs from a candidate key in
that it does not require the non-redundancy property.
Primary key: It is a candidate key that is used to identify a unique
(one) record from a relation. A primary key is the candidate key
which is selected as the principal unique identifier. Every relation must
contain a primary key. The primary key is usually the key selected to
identify a row when the database is physically implemented. For
example, a part number is selected instead of a part description.
Foreign key: A primary key in one file (table) that is also found in another
file. A foreign key is a set of fields or attributes in one relation that is used
to “refer” to a tuple in another relation. Thus it is a field in one table that
is also used as the primary key in another table.
Secondary Key: A field used to identify more than one record at a time,
e.g. a surname.
*NB: Attribute: A characteristic of a record, e.g. its surname, date of birth.
Entity: any object or event about which data can be collected, e.g. a
patient, student, football match, etc.
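The key types above can be seen at work in a minimal sketch, assuming SQLite through Python's `sqlite3` module; the `class`, `student` and `result` tables and their data are hypothetical. The DBMS rejects a repeated primary key, a foreign key that points nowhere, and a repeated composite key, while a secondary key such as surname may match several records:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite only enforces foreign keys when this is on

conn.executescript("""
    CREATE TABLE class (
        class_id TEXT PRIMARY KEY,                   -- simple primary key
        teacher  TEXT
    );
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,              -- primary key
        surname    TEXT NOT NULL,                    -- secondary key (not unique)
        class_id   TEXT REFERENCES class(class_id)   -- foreign key
    );
    CREATE TABLE result (
        student_id INTEGER REFERENCES student(student_id),
        subject    TEXT,
        mark       INTEGER,
        PRIMARY KEY (student_id, subject)            -- composite key
    );
    CREATE INDEX idx_surname ON student(surname);    -- speeds up secondary-key searches
    INSERT INTO class VALUES ('4A', 'Mr Dube');
    INSERT INTO student VALUES (1, 'Moyo', '4A'), (2, 'Moyo', '4A');
    INSERT INTO result VALUES (1, 'Maths', 70), (1, 'English', 60);
""")

def rejected(sql):
    """Return True if the DBMS refuses the statement because of a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True

dup_primary = rejected("INSERT INTO student VALUES (1, 'Banda', '4A')")  # repeated primary key
bad_foreign = rejected("INSERT INTO student VALUES (3, 'Banda', '9Z')")  # no class '9Z'
dup_composite = rejected("INSERT INTO result VALUES (1, 'Maths', 55)")   # same (student, subject)

# A secondary key such as surname may match more than one record.
moyos = conn.execute(
    "SELECT student_id FROM student WHERE surname = 'Moyo' ORDER BY student_id").fetchall()
print(dup_primary, bad_foreign, dup_composite, moyos)  # True True True [(1,), (2,)]
```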
2. Network database:
These databases use links to express relationships
between different data items.
They are based on the principle of linked lists.
Data is maintained by a single input.
There is little duplication of data.
There is no duplication of inputs.
Linkages are more flexible.
Many-to-many relationships between records are limited
Handles more relationship types
Promotes database integrity
Ensures data independence
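The linked-list principle can be illustrated with a toy in-memory sketch. This is not how any particular network DBMS stores its records; the `Record` class, the set-type names and the supplier/warehouse data are all hypothetical. Each record keeps one "next" pointer per set type, so the same record can belong to two owners without being duplicated:

```python
class Record:
    """A record in a toy network database (illustrative, not a real DBMS)."""

    def __init__(self, name):
        self.name = name
        self.first = {}  # set type -> first member (used when this record is an owner)
        self.next = {}   # set type -> next member in that set (one pointer per set type)

def connect(owner, set_type, member):
    """Link 'member' into the owner's chain for one set type (inserted at the head)."""
    member.next[set_type] = owner.first.get(set_type)
    owner.first[set_type] = member

def members(owner, set_type):
    """Follow the pointers of one set, starting from its owner record."""
    names, current = [], owner.first.get(set_type)
    while current is not None:
        names.append(current.name)
        current = current.next.get(set_type)
    return names

# Hypothetical data: a part record can belong to two owners at once.
supplier = Record("Supplier S1")
warehouse = Record("Warehouse W1")
bolt, nut = Record("bolt"), Record("nut")
connect(supplier, "supplies", bolt)
connect(supplier, "supplies", nut)
connect(warehouse, "stores", bolt)  # the same bolt record, not a copy

print(members(supplier, "supplies"))  # ['nut', 'bolt']
print(members(warehouse, "stores"))   # ['bolt']
```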
3. Hierarchical database:
A database structure in which data is held in a tree structure, indicating
different levels of files within the system.
Records are subordinate to other records in the tree structure of the database.
Records at a lower level hold more detail than their father records.
It promotes a grandfather, father and son relationship between records, as
illustrated below.
Each father file has one or more son files.
Each son file has only one father file.
There are no cross linkages of file records.
Database security and integrity are ensured.
Complex implementation as it is difficult to access all the files at one
time.
A lot of duplication exists in this type of database structure
Difficult to manage
Implementation limitations (no M:N relationship)
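The tree rules above (each son has exactly one father, no cross links) can be sketched in a few lines. The `Node` class and the school data are hypothetical and only illustrate the structure, not a real hierarchical DBMS. Note how a teacher who teaches two forms has to be stored twice, which is the duplication and missing M:N support listed above:

```python
class Node:
    """A record in a toy hierarchical database: one father, any number of sons."""

    def __init__(self, name, father=None):
        self.name = name
        self.father = father
        self.sons = []
        if father is not None:
            father.sons.append(self)

    def path(self):
        """Return the route from the root down to this record."""
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.father
        return names[::-1]

school = Node("School")                   # grandfather
form4 = Node("Form 4", father=school)     # father
form3 = Node("Form 3", father=school)
dube_f4 = Node("Mrs Dube", father=form4)  # son
Node("Mr Phiri", father=form4)
# Mrs Dube also teaches Form 3, but a record can have only one father,
# so the same teacher must be stored a second time.
dube_f3 = Node("Mrs Dube", father=form3)

print(dube_f3.path())                # ['School', 'Form 3', 'Mrs Dube']
print([s.name for s in form4.sons])  # ['Mrs Dube', 'Mr Phiri']
```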
4. Object-oriented databases:
Benefits
OODBMS can be faster than relational DBMS because data is not stored in
relational rows and columns but as objects. Objects can have many-to-many
relationships and are accessed through pointers, which is faster.
An OODBMS can be reprogrammed without affecting the entire
system.
It can handle complex data models and is therefore superior to
RDBMS in this respect.
Disadvantages
Pointer-based techniques will tend to be slower and more difficult to
formulate than relational.
Object databases lack a formal mathematical foundation, unlike the
relational model, and this in turn leads to weaknesses in their query
support.
Components of DBMS Environment
Hardware
Can range from a personal computer to a network of computers where
the database is run.
Software
DBMS software, operating system, network software (if necessary) and
also the application programs.
Data
Includes data used by the organization and a description of this data
called the schema. The data in the database is persistent, integrated,
structured, and shared.
Procedures
Instructions and rules that govern the design and use of the database and
DBMS. Procedures may describe how to log on to the DBMS, start and stop
the DBMS, identify a failed component, recover the database, change the
structure of a table and improve performance.
People:
Users or people who operate the database, including those who
manage and design the database
Recovery and archiving system: these allow data to be copied onto
backups in case of disaster.
Report writers: these are programs used to design output reports
without writing an algorithm in any programming language.
Teleprocessing monitors: Software that manages communication
between the database and remote terminals.
Objectives of DBMS
The main objectives of database management system are data availability,
data integrity, data security, and data independence.
Data Availability
Data availability means that data is made available to a wide
variety of users in a meaningful format at reasonable cost, so that users
can easily access it.
Data Integrity
Data integrity refers to the correctness of the data in the database. In other
words, the data available in the database is reliable and consistent data.
Data Security
Data security means that only authorized users can access the
data. Data security can be enforced by passwords. If two separate users are
accessing the same data at the same time, the DBMS must not allow them
to make conflicting changes.
Communicating With the Database
Some databases have their own computer languages. For all the data in
databases, data descriptions must be provided. A Data Description
(Definition) Language (DDL) is provided, as well as a Data
Manipulation Language (DML).
*NB:
DDL: the language used to define data about data (data used to describe
data, i.e. metadata).
o It specifies the data in the database.
o It defines the structure of the tables.
o It is used to define the data tables.
o It specifies the data types of data held.
o It specifies constraints on the data.
o It contains validation rules for data.
DML: Language used by users to access (retrieve) and change data in
databases. It allows users to:
o Store data in tables
o Insert new records
o Update the database
o Delete records
o Modify/edit records
o Search and retrieve data
A combination of the DDL and the DML is called a Data Sub-Language (DSL)
or a Query Language. The most common DSL is the Structured Query
Language (SQL).
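The split between the two sub-languages can be shown with a minimal sketch, assuming SQLite through Python's `sqlite3` module and a hypothetical `student` table. The DDL statement defines the structure, data types and a validation rule (a `CHECK` constraint); the DML statements then insert, update, delete and retrieve records, and the DBMS applies the DDL rule to every DML change:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# DDL: define structure, data types and a validation rule.
conn.execute("""
    CREATE TABLE student (
        student_id INTEGER PRIMARY KEY,
        surname    TEXT NOT NULL,
        subjects   INTEGER CHECK (subjects BETWEEN 1 AND 10)
    )
""")

# DML: insert, update, delete and retrieve records.
conn.executemany("INSERT INTO student VALUES (?, ?, ?)",
                 [(1, "Moyo", 5), (2, "Banda", 6), (3, "Ncube", 4)])
conn.execute("UPDATE student SET subjects = 7 WHERE student_id = 2")
conn.execute("DELETE FROM student WHERE student_id = 3")
rows = conn.execute("SELECT surname, subjects FROM student ORDER BY student_id").fetchall()

# The DDL validation rule rejects out-of-range data entered through DML.
try:
    conn.execute("INSERT INTO student VALUES (4, 'Dube', 12)")
    rule_enforced = False
except sqlite3.IntegrityError:
    rule_enforced = True

print(rows, rule_enforced)  # [('Moyo', 5), ('Banda', 7)] True
```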
Each database must have a user interface, which may be menu-driven,
form-based, command-driven or graphical.
Below is an illustration of a menu-driven interface:
1. PRINT RECORD
2. DISPLAY RECORD
3. DELETE RECORD
4. EDIT RECORD
5. MY OPTION IS: __
The user has to enter 1, 2, 3 or 4 and then press enter on the keyboard.
Advantages:
It is fast in carrying out tasks.
The user does not need to remember the commands by heart.
It is very easy to learn and to use.
Disadvantages:
The user is restricted to those few options available and thus is not flexible
to use.
Form-Based Interfaces
- A forms-based interface displays a form to each user.
- The form has spaces for input/insertion of data
- Insertion fields are provided together with validation checks on data
entered.
- It mirrors a hardcopy form.
- Data is entered in strict order.
- The form has explanatory notes/comments on the screen.
- It also uses drop-down lists, tick boxes, etc.
- Each record in the database may have its own form displayed on the
screen.
- Users can fill out all of the form entries to insert new data, or fill
out only certain entries, in which case the DBMS will retrieve
matching data for the remaining entries.
- Forms are usually designed and programmed for naive
(inexperienced) users as interfaces to canned (pre-recorded)
transactions.
- Many DBMSs have forms specification languages, special languages
that help programmers specify such forms.
- Some systems have utilities that define a form by letting the end user
interactively construct a sample form on the screen.
- Ensures that no data is missed/left un-entered.
- It is very easy to insert validation checks/routines. (read Heathcote
for more on Form-Based interfaces and other forms of user interfaces)
Applications: ordering goods online, applying for membership online,
applying for an e-mail address online, completing postal order forms, etc.
Forms ensure that only the relevant information is captured/entered.
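The validation checks that a form runs before saving a record can be sketched as a single function. This is a minimal sketch assuming a hypothetical online membership form; the field names, the rough e-mail pattern and the 16 to 99 age range are illustrative assumptions, not rules from any real system:

```python
import re

def validate_form(form: dict) -> list[str]:
    """Return a list of error messages; an empty list means the form can be saved."""
    errors = []
    # Presence check: no required entry may be left blank.
    for field in ("surname", "email", "age"):
        if not str(form.get(field, "")).strip():
            errors.append(f"{field} is required")
    # Format check: a very rough e-mail pattern (illustrative only).
    email = form.get("email", "")
    if email and not re.fullmatch(r"[^@\s]+@[^@\s]+\.[^@\s]+", email):
        errors.append("email is not in a valid format")
    # Range check on age (hypothetical limits).
    age = str(form.get("age", "")).strip()
    if age and (not age.isdigit() or not 16 <= int(age) <= 99):
        errors.append("age must be a whole number from 16 to 99")
    return errors

ok = validate_form({"surname": "Moyo", "email": "t.moyo@example.com", "age": "17"})
bad = validate_form({"surname": "", "email": "t.moyo", "age": "12"})
print(ok)   # []
print(bad)
```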
Command-Driven Interfaces
Advantages
- It saves disk storage space since there are no icons and fewer graphics
involved.
- It is very fast in executing commands once the user has
mastered them.
- It saves time if the user knows the commands by heart.
Disadvantages
- It takes too long for the user to master all the commands by heart.
- It is less user friendly.
- More suited to experienced users like programmers.
- Commands for different software packages are rarely the same and
this will lead to mix-up of commands by the user.
Graphical User Interfaces
A graphical interface (GUI) typically displays a schema to the user in
diagrammatic form, which can be implemented using Windows, Icons, Menus
and Pointers (WIMP). It is suitable for inexperienced users. The user can
then specify a query by manipulating the diagram. In many cases, GUIs
utilize both menus and forms. Most GUIs use a pointing device, such as a
mouse, to pick certain parts of the displayed record.
Advantages of GUI
- It is faster to give commands by just clicking.
- It is easier for a novice (beginner) to use the system right away. It is
user friendly (this is an interface that is easy to learn, understand and to
use).
- There is no need for users to remember commands of the language.
- It avoids typing errors since no typing is involved.
- It is easier and faster for the user to switch between programs and files.
Disadvantages of GUI
- The icons occupy a lot of disk storage space that might be used for
storage of data.
- Occupy more main memory than command driven interfaces.
- Run slowly in complex graphics and when many windows are open.
- Can be irritating to use for simple tasks because more operations are
needed.
Views
A view is a relation that does not necessarily exist in the database but
is produced upon request, at the time of the request.
Contents of a view are defined as a query on one or more base
relations.
Views are dynamic, meaning that changes made to base relations that
affect view attributes are immediately reflected in the view.
Provides powerful and flexible security mechanism by hiding parts of
database from certain users.
Permits users to access data in a customized way, so that the same data
can be seen by different users in different ways at the same time.
Can simplify complex operations on base relations.
A user’s view is immune to changes made in other views.
Users should not need to know physical database storage details.
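A minimal sketch of a view, assuming SQLite through Python's `sqlite3` module and a hypothetical `staff` table. The view `staff_directory` stores a query rather than a copy of the data: it hides the salary column, and a row added to the base relation appears in the view immediately. SQLite has no `GRANT`/`REVOKE`, so in this sketch the hiding only works if users are given access to the view and not the base table; server DBMSs enforce that with access rights.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE staff (staff_id INTEGER PRIMARY KEY, name TEXT, dept TEXT, salary REAL)")
conn.executemany("INSERT INTO staff VALUES (?, ?, ?, ?)",
                 [(1, "Moyo", "Sales", 900.0), (2, "Banda", "IT", 1200.0)])

# A view is a stored query, not a stored copy; it leaves out the salary column.
conn.execute("CREATE VIEW staff_directory AS SELECT name, dept FROM staff")

before = conn.execute("SELECT * FROM staff_directory ORDER BY name").fetchall()

# Views are dynamic: a change to the base relation shows up immediately.
conn.execute("INSERT INTO staff VALUES (3, 'Ncube', 'IT', 1100.0)")
after = conn.execute("SELECT * FROM staff_directory ORDER BY name").fetchall()

print(before)  # [('Banda', 'IT'), ('Moyo', 'Sales')]
print(after)   # [('Banda', 'IT'), ('Moyo', 'Sales'), ('Ncube', 'IT')]
```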
Describes the data as seen by the application that is making use of
the DBMS
Involves identification of entity types, unique identifiers,
Logical level describes what data are stored in the database
Describes what relationships exist among those data.
Logical level describes the entire database in terms of a small
number of simple structures.
Database administrator uses the logical level of abstraction.
Data Independence
Data independence means that programs are isolated from changes in the
way the data are structured and stored. Data independence is the immunity
of application programs to changes in storage structures and access
techniques. For example, if we add a new attribute or change an index structure
in a traditional file processing system, the applications are affected. In
a DBMS environment, however, these changes are recorded in the catalogue, so
the applications are not affected. Data independence renders
application programs immune to changes in the logical and physical
organization of data in the system.
Logical organization refers to the schema; for example, adding a column or
tuples does not stop existing queries from working.
Physical organization refers to indices, file organizations, etc.
- Conceptual schema changes (e.g. addition/removal of entities),
should not require changes to external schema or rewrites of
application programs
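Data independence can be demonstrated with a minimal sketch, assuming SQLite through Python's `sqlite3` module. The `report` function stands in for a hypothetical application program. It keeps returning the same result after a logical change (a new column) and a physical change (a new index), because it names only the fields it needs:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE student (student_id INTEGER PRIMARY KEY, surname TEXT)")
conn.executemany("INSERT INTO student VALUES (?, ?)", [(1, "Moyo"), (2, "Banda")])

def report(db):
    """A hypothetical application program: it names only the fields it needs."""
    return db.execute("SELECT surname FROM student ORDER BY student_id").fetchall()

first = report(conn)

# Logical change: add a new attribute to the schema.
conn.execute("ALTER TABLE student ADD COLUMN email TEXT")
# Physical change: add an index (a different access path).
conn.execute("CREATE INDEX idx_student_surname ON student(surname)")

second = report(conn)
print(first == second, second)  # True [('Moyo',), ('Banda',)]
```

A program written with `SELECT *` would see the extra column, so naming fields explicitly is part of what keeps the application isolated here.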
Define, implement and control database storage.
Ensure that policies and procedures are established.
Guarantee effective production, control and use of data.
Define the strategy of backup storage and recovering from system
breakdown.
Supervise amendments to the database.
Ensure that the data is secure from unauthorised access.
To control the database environment
To standardize the use of database and associated software
To support the development and maintenance of database application
projects
To ensure all documentation related to standards and implementation
is up-to-date
Distributed Database
A logically interrelated collection of shared data (and a description of this
data), physically distributed over a computer network.
DDBMS - characteristics
• Collection of logically-related shared data.
• Data split into fragments.
• Fragments may be replicated.
• Fragments/replicas allocated to sites.
• Sites linked by a communications network.
• Data at each site is under control of a DBMS.
• DBMSs handle local applications autonomously.
• Each DBMS participates in at least one global application.
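Fragmentation, replication and a global application can be sketched with plain Python data structures. This is a toy illustration under stated assumptions: the `customers` relation, the `region` attribute used to choose each row's site, and the site names are all hypothetical, and a real DDBMS would hold each fragment in a separate DBMS connected over a network:

```python
# Hypothetical customer relation; 'region' decides which site holds each row.
customers = [
    {"id": 1, "name": "Moyo", "region": "North"},
    {"id": 2, "name": "Banda", "region": "South"},
    {"id": 3, "name": "Ncube", "region": "South"},
]

def fragment(rows, key):
    """Horizontal fragmentation: split rows into fragments, one per site."""
    sites = {}
    for row in rows:
        sites.setdefault(row[key], []).append(row)
    return sites

def global_query(sites):
    """A global application rebuilds the whole relation from all fragments."""
    return sorted((row for frag in sites.values() for row in frag), key=lambda r: r["id"])

sites = fragment(customers, "region")
# Replication: the North site also keeps a copy of the South fragment.
replica_at_north = [dict(row) for row in sites["South"]]

print({site: [r["name"] for r in rows] for site, rows in sites.items()})
# {'North': ['Moyo'], 'South': ['Banda', 'Ncube']}
print([r["name"] for r in global_query(sites)])  # ['Moyo', 'Banda', 'Ncube']
```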
Advantages of DDBMSs
• Reflects Organizational Structure
• Improved Sharing and Local Autonomy
• Improved Availability: A failure does not make the entire system
inoperable
• Improved Reliability: Data may be replicated
• Improved Performance: Data are local to the site of “greatest demand”
• Modular Growth: easy to add new module
Disadvantages of DDBMSs
• More complex
• Cost: Especially in system management
• Security: network must be made secure
• Integrity Control More Difficult
• Database Design More Complex: due to fragmentation, allocation of
fragments to a specific site.
(read Heathcote for more on record locking, Open Systems and ODBC, Client –
Server databases, etc)
Advantages of Databases
Reduces data duplication: Avoids repetition of the same records being
stored more than once in the database. This is because records are
linked to each other, allowing data stored in all tables to be used
through accessing one table.
- Duplication of data means same data being stored more than once.
- This can also be termed data redundancy. Data redundancy is a
problem in the file-based approach because of its decentralized nature.
The main drawbacks of duplication of data are:
Duplication of data leads to wastage of storage space, which has a
direct impact on cost: the cost will increase.
Duplication of data can lead to loss of data integrity; the data
are no longer consistent. Assume that an employee's details are
stored both in the department and in the main office. Now the
employee changes his contact address. The changed address is
stored in the department alone and not in the main office. If
some important information has to be sent to his contact
address from the main office then that information will be lost.
Validation checks are made on data during entry thereby reducing data
entry errors.
Searching and retrieval of data is very fast.
Ensures data independence: A change in the program structure or
view does not affect data stored in tables. Data independence means
independence between application program and the data. The
advantage is that when the data representation changes, it is not
necessary to change the application program.
NB: Data Dependence: This means the application program depends
on the data. If some modifications have to be made in the data, then the
application program has to be rewritten.
Improves security of data: Access to some data can be controlled
because each user has their own view of the data. The DBMS can use access
rights (levels) for each user, preventing users from seeing data not
permitted at their level. Regular backups of the data files can be made
automatically by the DBMS to alternative devices. Usernames and passwords
can be used to protect data from unauthorised access, along with record
locking during updating and encryption of the database.
Less likelihood of data getting lost.
Record structure can be easily modified if the need arises.
Files can be linked together making file updating easier and faster.
Reduces data redundancy. Redundancy means duplication of data. Data
redundancy will occupy more space hence it is not desirable as it will be
more expensive to the organisation.
Data can be secured from unauthorised access by use of passwords.
Users can share data if the database is networked. Duplication of
records is eliminated.
Ad hoc reports can be created easily.
Improves data integrity: Data integrity refers to the correctness of data
stored in databases. Data accessed will be the same for all users, removing
contradictions caused by duplicate records with different data
values, because most of the information is stored only once.
Integrity is also enhanced as data is protected from wrong/inappropriate
processing, which leads users to trust the correctness of the data.
Sorting of records in any order is very fast
Removes data inconsistency: inconsistency means different copies of
the same record will have data with different values.
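The employee-address example under "Reduces data duplication" can be replayed as a minimal sketch, assuming SQLite through Python's `sqlite3` module; the table names, employee number and addresses are hypothetical. With two copies, updating only one leaves the data inconsistent; with a single stored copy, every user reads the new value:

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# File-based style: the same employee details are held in two separate files.
conn.execute("CREATE TABLE dept_file (emp_id INTEGER, address TEXT)")
conn.execute("CREATE TABLE office_file (emp_id INTEGER, address TEXT)")
conn.execute("INSERT INTO dept_file VALUES (7, '12 Old Road')")
conn.execute("INSERT INTO office_file VALUES (7, '12 Old Road')")

# The address change is recorded in the department only.
conn.execute("UPDATE dept_file SET address = '3 New Street' WHERE emp_id = 7")
copies = {
    table: conn.execute(f"SELECT address FROM {table} WHERE emp_id = 7").fetchone()[0]
    for table in ("dept_file", "office_file")  # fixed names, safe to format in
}
print(copies)  # {'dept_file': '3 New Street', 'office_file': '12 Old Road'}

# Database style: the address is stored once and every user reads the same row.
conn.execute("CREATE TABLE employee (emp_id INTEGER PRIMARY KEY, address TEXT)")
conn.execute("INSERT INTO employee VALUES (7, '12 Old Road')")
conn.execute("UPDATE employee SET address = '3 New Street' WHERE emp_id = 7")
shared = conn.execute("SELECT address FROM employee WHERE emp_id = 7").fetchone()[0]
print(shared)  # 3 New Street
```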
Disadvantages of databases
If the computer breaks down, you may not be able to access the data.
It is costly to initially setup the database.
Computer data can be easily copied illegally and therefore should be
password protected.
Takes time and costs to train users of the systems.
Expensive to employ a database administrator who will manage the
database
Databases | Traditional file systems
… files is faster | … files is slower
Security of records is enhanced | Less security of data from unauthorised access
Promotes program-data independence | There is program-data dependence
Centralised management of data, which is more efficient | No central data management; difficult to manage and less secure
However, the introduction of computer systems means that staff would
need new skills, and it can lead to unemployment, more people working from
home, de-skilling and some health problems. Can
you identify some of the health problems and how they can be solved or
minimised?
commands resemble English language sentences in their construction and
use and are therefore easy to learn and understand. SQL is referred to as a
nonprocedural database language. Here, nonprocedural means that, to
retrieve data from the database, it is enough to tell SQL what data to
retrieve rather than how to retrieve it.
The user specifies a certain condition. The program goes through all the
records in the database file and selects those records that satisfy the
condition. The result of the query is then stored in the form of a table.
In Microsoft Access, users just type the data to be searched like in the
table below:
To search for students who paid $24 and whose number of subjects is 5, the
user enters the following in the query design view:
The above can be written in SQL to produce the same result, using the
general form:
SELECT [ALL | DISTINCT] expr1 [AS col1], expr2 [AS col2]
FROM tablename
WHERE condition;
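Filled in for the query above, the general form becomes a `SELECT` whose `WHERE` clause holds both criteria. This is a minimal sketch run on SQLite through Python's `sqlite3` module; the `students` table, its `fee_paid` and `subjects` fields and the sample rows are hypothetical, since the original Access table is not shown here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE students (surname TEXT, fee_paid REAL, subjects INTEGER)")
conn.executemany("INSERT INTO students VALUES (?, ?, ?)",
                 [("Moyo", 24, 5), ("Banda", 24, 6), ("Ncube", 30, 5), ("Dube", 24, 5)])

# The criteria typed into the query design view become the WHERE clause.
query = """
    SELECT DISTINCT surname AS student
    FROM students
    WHERE fee_paid = 24 AND subjects = 5
    ORDER BY surname
"""
matches = conn.execute(query).fetchall()
print(matches)  # [('Dube',), ('Moyo',)]
```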
NB: SQL is not only used for searching records from databases. It has
commands to delete, insert, print, update, modify, sort data stored in
databases.
The three main divisions in SQL are DDL, DML and DCL. The data definition
language (DDL) commands of SQL are used to define a database, which
includes creation of tables, indexes and views. The data manipulation
language (DML) commands are used to load, update and query the database,
e.g. with INSERT, UPDATE and SELECT. SQL keywords are conventionally written
in uppercase, although SQL does not require it. Data control language (DCL)
is used to establish user access to the database. (Read Heathcote for more
SQL, page 308)
DATABASE NORMALISATION
Normalization stages
notes. We now have the relations in 2NF, which will appear as
follows:
DATABASE RELATIONSHIPS
Attributes
This is a property or characteristic of an entity. Entities are described in a
database by a set of attributes.
The following are examples of attributes:
– Brand, cost, and weight are the attributes of CELLPHONE.
– Student number, name, and grade are the attributes of STUDENT
Entity
An entity is something of interest to an organisation about which data is to
be held. It could be a person, place, object, event or concept about which
data is to be maintained.
Entity Type
An entity type or entity set is a collection of similar entities. Some examples
of entity types are:
– All students at NUST, say STUDENT.
– All courses at NUST, say COURSE.
– All departments at NUST, say DEPARTMENT.
An entity may belong to more than one entity type. For example, a staff
member working in a particular department may pursue higher education
part-time. Hence the same person is a LECTURER in one instance and a
STUDENT in another.
Relationship
This is a link or association between entities. Relationship type is a
meaningful association between entity types.
The examples of relationship types are:
– Teaches is the relationship type between LECTURER and STUDENT.
– Buying is the relationship between VENDOR and CUSTOMER.
– Treatment is the relationship between DOCTOR and PATIENT
- Relationship name is an active or a passive verb.
Types of Relationship
• One-to-one
E.g. Products in a supermarket each have a unique barcode number.
• One-to-many
E.g. A video club member may hire out a number of videos.
• Many-to-one
E.g. Many videos can be hired by one member.
• Many-to-many
E.g. Teachers and pupils in a school: each teacher teaches many pupils
and each pupil has many teachers.
A teacher may order many books, and each book could be ordered by
many teachers.
Thus in general, the relationship types are as follows:
Entity-Relationship Diagram
An entity-relationship diagram is a diagrammatic way of representing the
relationships between the entities in a database.
Employee --drives-- Company car (one-to-one)
Ward --holds-- Patient (one-to-many)
Album --features-- Singers (many-to-many)
Example
• A hospital is organised into a number of wards.
• Each ward has a ward number and a name recorded, along with the
number of beds in that ward.
• Each ward is staffed by nurses.
• Nurses have their staff number and name recorded, and are
assigned to a single ward.
• Each patient in the hospital has a patient identification number,
and their name, address and date of birth are recorded.
• Each patient is under the care of a single consultant and is assigned
to a single ward.
• Each consultant is responsible for a number of patients. Consultants
have their staff number, name and specialism recorded.
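One way to turn the hospital description into tables is sketched below, assuming SQLite through Python's `sqlite3` module. The table and column names and the sample rows are hypothetical choices; each "one" side's primary key is placed as a foreign key in the "many" side (WARD to NURSE, WARD to PATIENT, CONSULTANT to PATIENT), and the DBMS then refuses a nurse assigned to a ward that does not exist:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE ward (
        ward_no   INTEGER PRIMARY KEY,
        ward_name TEXT NOT NULL,
        num_beds  INTEGER
    );
    CREATE TABLE consultant (
        staff_no   INTEGER PRIMARY KEY,
        name       TEXT NOT NULL,
        specialism TEXT
    );
    -- WARD 1 : M NURSE (each nurse is assigned to a single ward)
    CREATE TABLE nurse (
        staff_no INTEGER PRIMARY KEY,
        name     TEXT NOT NULL,
        ward_no  INTEGER NOT NULL REFERENCES ward(ward_no)
    );
    -- WARD 1 : M PATIENT and CONSULTANT 1 : M PATIENT
    CREATE TABLE patient (
        patient_id    INTEGER PRIMARY KEY,
        name          TEXT NOT NULL,
        address       TEXT,
        date_of_birth DATE,
        ward_no       INTEGER NOT NULL REFERENCES ward(ward_no),
        consultant_no INTEGER NOT NULL REFERENCES consultant(staff_no)
    );
    INSERT INTO ward VALUES (1, 'Maternity', 20);
    INSERT INTO consultant VALUES (501, 'Dr Moyo', 'Obstetrics');
    INSERT INTO patient VALUES (9001, 'Banda', '3 New Street', '1990-04-01', 1, 501);
""")

# Follow the foreign keys to list each patient with their ward and consultant.
rows = conn.execute("""
    SELECT p.name, w.ward_name, c.name
    FROM patient p
    JOIN ward w       ON w.ward_no = p.ward_no
    JOIN consultant c ON c.staff_no = p.consultant_no
""").fetchall()

try:
    conn.execute("INSERT INTO nurse VALUES (1, 'Ncube', 99)")  # ward 99 does not exist
    nurse_rejected = False
except sqlite3.IntegrityError:
    nurse_rejected = True

print(rows, nurse_rejected)  # [('Banda', 'Maternity', 'Dr Moyo')] True
```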
Many-to-many relationships are not encouraged in E-R diagrams since
they violate third normal form (3NF). To remove an M:N relationship, a link
entity is used to link the two entities, as illustrated below:
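Applied to the Album/Singers example earlier, the link-entity approach can be sketched in SQLite through Python's `sqlite3` module. The link table name `feature`, the column names and the rows are hypothetical. ALBUM M:N SINGER becomes two one-to-many relationships through `feature`, whose composite primary key stops the same pairing being stored twice:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")
conn.executescript("""
    CREATE TABLE album  (album_id  INTEGER PRIMARY KEY, title TEXT);
    CREATE TABLE singer (singer_id INTEGER PRIMARY KEY, name  TEXT);
    -- Link entity: ALBUM 1 : M FEATURE M : 1 SINGER replaces ALBUM M : N SINGER.
    CREATE TABLE feature (
        album_id  INTEGER REFERENCES album(album_id),
        singer_id INTEGER REFERENCES singer(singer_id),
        PRIMARY KEY (album_id, singer_id)
    );
    INSERT INTO album  VALUES (1, 'Duets'), (2, 'Live');
    INSERT INTO singer VALUES (10, 'Moyo'), (20, 'Banda');
    INSERT INTO feature VALUES (1, 10), (1, 20), (2, 10);
""")

# One album features many singers...
singers_on_duets = conn.execute("""
    SELECT s.name FROM feature f JOIN singer s ON s.singer_id = f.singer_id
    WHERE f.album_id = 1 ORDER BY s.name
""").fetchall()
# ...and one singer appears on many albums.
albums_with_moyo = conn.execute("""
    SELECT a.title FROM feature f JOIN album a ON a.album_id = f.album_id
    WHERE f.singer_id = 10 ORDER BY a.title
""").fetchall()
print(singers_on_duets, albums_with_moyo)  # [('Banda',), ('Moyo',)] [('Duets',), ('Live',)]
```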
Data security
Refers to methods of keeping data safe from various hazards and from
unauthorized access and this includes:
- Natural hazards like fire, floods, etc
- Deliberate destruction/corruption by former employees or by
terrorists.
- Illegal access to data by hackers, who may steal, amend or destroy
the data
- Accidental loss of data due to hardware failure, software failure, etc.
(Refer to Heathcote pages 105-109 for more on measures for ensuring
data security. Pupils must be able to describe the following in detail:
Keeping data secure from fraudulent or malicious damage;
Password protection
User IDs and passwords
Encryption
Access rights and user permissions
Different user views
Biometric measures
Periodic backups
Antiviruses and protection measures
Audit trails
System restore and Rollback facilities
Record locking
Pupils should describe/explain concepts above.)
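The rollback facility in the list above can be illustrated with a minimal sketch, assuming SQLite through Python's `sqlite3` module, where using the connection as a context manager commits on success and rolls back on an exception. The `account` table and `transfer` function are hypothetical. A transfer that would break a validation rule is undone completely, including the step that had already succeeded:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE account (acc_no INTEGER PRIMARY KEY, balance REAL CHECK (balance >= 0))")
conn.executemany("INSERT INTO account VALUES (?, ?)", [(1, 100.0), (2, 50.0)])
conn.commit()

def transfer(amount):
    """Move money from account 1 to account 2 as one transaction."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("UPDATE account SET balance = balance + ? WHERE acc_no = 2", (amount,))
            conn.execute("UPDATE account SET balance = balance - ? WHERE acc_no = 1", (amount,))
        return True
    except sqlite3.IntegrityError:
        return False

ok = transfer(30)       # succeeds
failed = transfer(500)  # would make account 1 negative, so both updates are rolled back
balances = conn.execute("SELECT balance FROM account ORDER BY acc_no").fetchall()
print(ok, failed, balances)  # True False [(70.0,), (80.0,)]
```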
REVIEW QUESTIONS:
1. A garden design company keeps records of its customers. Each customer
has had a design produced for them which will be one of a library of design
types stored by the company. Each design type uses plants. Each customer
is sent an account based on the number of plants in the design.
(a) Draw an E-R (entity-relationship) diagram in third normal form, based
on this information. [10]
(b) Each delivery of plants to the garden design company is identified by a
batch number. Explain how customers who received eucalyptus trees from
batch 12 can be contacted. [4]
4. A landscape garden company services a number of gardens. Each
GARDEN is owned by an OWNER. Each owner may have more than one
garden. Each garden has a number of PLANTS in it and each plant may be in
a number of gardens.
Draw an entity relationship (E-R) diagram to represent this data model in
third normal form and label the relationships. [10]
8. Each LEAGUE has a number of TEAMs but each TEAM is only in one
LEAGUE. Each TEAM plays at a number of GROUNDs during the season
and each GROUND will host a number of TEAMs during the season.
(i) State the relationship between LEAGUE and TEAM.
Draw the entity-relationship (E-R) diagram to show this
relationship. [2]
(ii) State the relationship between TEAM and GROUND.
Draw the E-R diagram to show this relationship. [2]
(iii) Explain how the relationship between TEAM and GROUND can
be designed in third normal form. [4]