Cit 208 Information Systems
GUIDE
CIT 208
INFORMATION SYSTEMS
Lagos Office
14/16 Ahmadu Bello Way
Victoria Island, Lagos
e-mail: [email protected]
URL: www.nou.edu.ng
ISBN: 978-058-392-0
CIT 208 COURSE GUIDE
CONTENTS
Introduction
What You Will Learn in This Course
Course Aims
Course Objectives
Working through This Course
Course Materials
Study Units
Textbooks and References
Assignment File
Presentation Schedule
Assessment
Tutor-Marked Assignments (TMAs)
Final Examinations and Grading
Course Marking Scheme
Course Overview
How to Get the Best from This Course
Facilitators/Tutors and Tutorials
Summary
INTRODUCTION
This course is divided into three modules. The first module deals with
the basic introduction to the concept of Information Systems, SQL and
Database Programming with JDBC.
The third module deals with Web services, XML and database recovery.
This Course Guide gives you a brief overview of the course content,
course duration, and course materials.
The main purpose of this course is to provide the necessary tools for
designing and managing Information Systems. It sets out the
steps and tools that will enable you to make proper and accurate
decisions on database design and operation whenever the need arises.
We intend to achieve this through the following.
COURSE AIMS
COURSE OBJECTIVES
A number of objectives have been set out to ensure that the course
achieves its aims. Apart from the course objectives, every unit of this
course has set objectives. In the course of the study, you will need to
confirm, at the end of each unit, if you have met the objectives set at the
beginning of each unit. By the end of this course you should be able to:
COURSE MATERIALS
These include:
1. Course Guide
2. Study Units
3. Recommended Texts
4. A file for your assignments and for records to monitor your
progress.
STUDY UNITS
Module 1
Module 2
Module 3
Codd, E. F. (1970). "A Relational Model of Data for Large Shared Data
Banks." Communications of the ACM, June.
ASSIGNMENT FILE
These are of two types: the Self-Assessment Exercises and the Tutor-Marked
Assignments. The Self-Assessment Exercises will enable you to
monitor your performance by yourself, while the Tutor-Marked
Assignment is a supervised assignment. The assignments take a certain
percentage of your total score in this course. The Tutor-Marked
Assignments will be assessed by your tutor within a specified period.
The examination at the end of this course will aim at determining the
level of mastery of the subject matter. This course includes twelve
Tutor-Marked Assignments and each must be done and submitted
accordingly. Your best scores, however, will be recorded for you. Be
sure to send these assignments to your tutor before the deadline to avoid
loss of marks.
PRESENTATION SCHEDULE
ASSESSMENT
There are two aspects to the assessment of the course: first, the
tutor-marked assignments; second, a written examination.
At the end of the course, you will need to sit for a final three-hour
examination. This will also count for 70% of your total course mark.
Assignment questions for the units in this course are contained in the
Assignment File. You should be able to complete your assignments
from the information and materials contained in your set textbooks,
reading and study units. However, you may wish to use other references
to broaden your viewpoint and provide a deeper understanding of the
subject.
When you have completed each assignment, send it together with the
form to your tutor. Make sure that each assignment reaches your tutor
on or before the deadline given. If, however, you cannot complete your
work on time, contact your tutor before the assignment is due to
discuss the possibility of an extension.
The final examination for the course will carry 70% of the
total marks available for this course. The examination will cover every
aspect of the course, so you are advised to revise all your corrected
assignments before the examination.
This course endows you with the status of a teacher and that of a learner.
This means that you teach yourself and that you learn as your learning
capabilities allow. It also means that you are in a better position
to determine and to ascertain the what, the how, and the when of your
learning. No teacher imposes any method of learning on you.
The course units are similarly designed with the introduction following
the table of contents, then a set of objectives and then the dialogue and
so on.
The objectives guide you as you go through the units to ascertain your
knowledge of the required terms and expressions.
This table shows how the actual course marking is broken down.
Assessment          Marks
Assignments 1-4     Four assignments; the best three marks of the four count for 30% of course marks
Final Examination   70% of overall course marks
Total               100% of course marks
COURSE OVERVIEW
In distance learning, the study units replace the university lecturer. This
is one of the great advantages of distance learning; you can read and
work through specially designed study materials at your own pace, and
at a time and place that suit you best. Think of it as reading the lecture
instead of listening to a lecturer. In the same way that a lecturer might
set you some reading to do, the study units tell you when to read your
set books or other material. Just as a lecturer might give you an in-class
exercise, your study units provide exercises for you to do at appropriate
points.
Each of the study units follows a common format. The first item is an
introduction to the subject matter of the unit and how a particular unit is
integrated with the other units and the course as a whole. Next is a set
of learning objectives. These objectives enable you to know what you
should be able to do by the time you have completed the unit. You
should use these objectives to guide your study. When you have
finished the unit you must go back and check whether you have
achieved the objectives. If you make a habit of doing this you will
significantly improve your chances of passing the course.
Remember that your tutor’s job is to assist you. When you need help,
do not hesitate to call and ask your tutor to provide it.
7. Review the objectives for each study unit to confirm that you
have achieved them. If you feel unsure about any of the
objectives, review the study material or consult your tutor.
8. When you are confident that you have achieved a unit’s
objectives, you can then start on the next unit. Proceed unit by
unit through the course and try to pace your study so that you
keep yourself on schedule.
9. When you have submitted an assignment to your tutor for
marking, do not wait for its return before starting on the next unit.
Keep to your schedule. When the assignment is returned, pay
particular attention to your tutor’s comments, both on the tutor-
marked assignment form and also written on the assignment.
Consult your tutor as soon as possible if you have any questions
or problems.
10. After completing the last unit, review the course and prepare
yourself for the final examination. Check that you have achieved
the unit objectives (listed at the beginning of each unit) and the
course objectives (listed in this Course Guide).
Your tutor will mark and comment on your assignments, keep a close
watch on your progress and on any difficulties you might encounter and
provide assistance to you during the course. You must mail or submit
your tutor-marked assignments to your tutor well before the due date (at
least two working days are required). They will be marked by your tutor
and returned to you as soon as possible.
• you do not understand any part of the study units or the assigned
readings,
• you have difficulty with the self-tests or exercises,
• you have a question or problem with an assignment, with your
tutor’s comments on an assignment or with the grading of an
assignment.
You should try your best to attend the tutorials. This is the only chance
to have face to face contact with your tutor and to ask questions which
are answered instantly. You can raise any problem encountered in the
course of your study. To gain the maximum benefit from course
tutorials, prepare a question list before attending them. You will learn a
lot by participating in discussions actively.
SUMMARY
We hope that by the end of this course you would have acquired the
required knowledge to view Information Systems in a new way.
I wish you success with the course and hope that you will find it both
interesting and useful.
MAIN
COURSE
CONTENTS
Module 1
Module 2
MODULE 1
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Definition
3.2 Overview
3.3 History
3.4 Types of Information Systems
3.4.1 Transaction Processing Systems
3.4.2 Management Information and Reporting Systems
(MIS)
3.4.3 Decision Support Systems
3.4.4 Expert Systems
3.5 Information Systems Department
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
Having read through the course guide, you will have a general
understanding of what this unit is about and how it fits into the course as
a whole. This unit describes the general concept of Information Systems
(IS), its types and its application areas.
2.0 OBJECTIVES
At the end of this unit, you should be able to:
• explain the term information system
• identify the various types of IS
• relate the history of IS
• describe the IS department.
3.2 Overview
Structure:
From prior studies and experiences with information systems there are at
least four classes of information systems:
These systems are designed to help mid-level and senior managers make
those difficult decisions about which not every relevant parameter is
known. These decisions, referred to as semi-structured decisions, are
characteristic of the types of decisions made at the higher levels of
management. A decision on whether or not to introduce a particular
(brand new) product into an organisation’s product line is an example of
a semi-structured decision. Another example is the decision on whether
or not to open a branch in a foreign country. Some of the parameters that
go into the making of these decisions are known. The value of a
Decision Support System (DSS) is in its ability to permit “what-if”
analyses (e.g., What if interest rates rose by 2 per cent? What if our main
competitor lowered its price by 5 per cent? What if import tariffs are
imposed/increased in the foreign country in which we do, or plan to do,
business?). That is, a DSS helps the user (decision maker) to model and
analyse different scenarios in order to arrive at a final, reasonable
decision, based on the analysis. There are decision support systems that
help groups (as opposed to individuals) to make consensus-based
decisions. These are known as Group Decision Support Systems
(GDSS).
reasonable decision, rather than to actually make the decision for the
user.
4.0 CONCLUSION
5.0 SUMMARY
SELF-ASSESSMENT EXERCISE
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Introduction to Database and Structured Query Language
(SQL)
3.2 History of SQL
3.3 Basic Categories of SQL Statements
3.4 Viewing the Structure of a Table
3.5 Writing Basic SQL Select Statement
3.6 Summary of Functions of SQL
3.7 Using SQL in Your Web Site
3.8 Relational Database Management System
3.9 Introduction to SQL Syntax
3.9.1 Database Tables
3.9.2 SQL Statements
3.9.3 SQL, DML and DDL
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
2.0 OBJECTIVES
• efficient
• easy to learn and use
• functionally complete (with SQL, you can define, retrieve, and
manipulate data in the tables).
Note: Most of the SQL database programmes also have their own
proprietary extensions in addition to the SQL standard!
SQL was developed by IBM Research in the mid-1970s and standardised
by ANSI and later by ISO. Most database management systems
implement a majority of one of these standards and add their proprietary
extensions. SQL allows the retrieval, insertion, updating, and deletion of
data. A database management system also includes management and
administrative functions. Most, if not all, implementations also
include a command-line interface that allows for the entry
and execution of the language commands, as opposed to only providing
an application programming interface (API) intended for access from a
graphical user interface (GUI).
• DML retrieves data from the database, enters new rows, changes
existing rows, and removes unwanted rows from tables in the
database. The basic Data Manipulation Language (DML) statements
are the following:
- select statement
- insert statement
- update statement
- delete statement
- merge statement
• DDL sets up, changes and removes data structures such as tables.
The basic Data Definition Language (DDL) statements are the following:
- create statement
- alter statement
- drop statement
- rename statement
- truncate statement
- comment statement
• DCL gives or removes access rights to both a database and the
structures within it. The basic Data Control Language (DCL) statements are:
- grant statement
- revoke statement
• Transaction Control manages the changes made by the DML
statements. Changes to the data can be grouped together into
logical transactions. The basic Transaction Control statements
are:
- commit
- rollback
- savepoint
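As a hedged illustration of the four categories above, the sketch below runs one or two statements from each group. The table course_demo, its columns, and the user report_user are hypothetical names invented for this example; they are not part of the course database. The exact form of GRANT and the handling of transactions differ between database systems, so treat this as a sketch rather than a definitive script.

```sql
-- DDL: define a new (hypothetical) table.
CREATE TABLE course_demo (
    course_code VARCHAR(10) NOT NULL,
    title       VARCHAR(100)
);

-- DML: add a row, change it, then read it back.
INSERT INTO course_demo (course_code, title)
VALUES ('CIT208', 'Information Systems');

UPDATE course_demo
SET    title = 'Information Systems (revised)'
WHERE  course_code = 'CIT208';

SELECT course_code, title
FROM   course_demo;

-- Transaction control: make the DML changes permanent.
COMMIT;

-- DCL: allow another (hypothetical) user to read the table.
GRANT SELECT ON course_demo TO report_user;

-- DDL again: remove the table when it is no longer needed.
DROP TABLE course_demo;
```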
Using the following simple rules and guidelines, you can construct valid
statements that are both easy to read and easy to edit.
The structure of any database table can be viewed by using the DESCRIBE
command. The general syntax of the DESCRIBE
command is given below:
DESCRIBE table;
For the purpose of this course, two tables called Departments and
Employees in the Oracle database will be used. Thus, we need to see the
structure of these tables so that we can familiarise ourselves
with the columns used in them. To do this, we write the query:
DESCRIBE departments;
From the table above, we can infer that departments table has 4 columns
and that 2 of these columns are not allowed to be null.
DESCRIBE employees;
DEPARTMENT_ID NUMBER(4)
From the table above, we can infer that employees table has 11 columns
and that 5 of these columns are not allowed to be null.
To extract data from the database, you need to use the SQL SELECT
statement. You may need to restrict the columns that are displayed.
Using a SELECT statement, you can do the following:
To build a web site that shows some data from a database, you will need
the following:
RDBMS is the basis for SQL, and for all modern database systems like
MS SQL Server, IBM DB2, Oracle, MySQL, and Microsoft Access.
The data in RDBMS is stored in database objects called tables. A table
is a collection of related data entries and it consists of columns and
rows. Relational database will further be described in Module 2.
The table above contains three records (one for each person) and five
columns (P_Id, LastName, FirstName, Address, and City).
Most of the actions you need to perform on a database are done with
SQL statements.
The following SQL statement will select all the records in the “Persons”
table:
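A statement of this kind would typically be written as shown below. The table name Persons comes from the text; the trailing semicolon is an assumption, since some database systems require it and others do not.

```sql
-- Select every column of every record in the Persons table.
SELECT * FROM Persons;
```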
SQL can be divided into two parts: The Data Manipulation Language
(DML) and the Data Definition Language (DDL).
The query and update commands form the DML part of SQL:
4.0 CONCLUSION
5.0 SUMMARY
SELF-ASSESSMENT EXERCISE
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 SQL Create Table Statement
3.1.1 SQL CREATE TABLE Syntax
3.1.2 CREATE TABLE Example
3.2 SQL SELECT Statement
3.2.1 SQL SELECT Syntax
3.2.2 An SQL SELECT Example
3.2.3 Navigation in a Result-set
3.3 The SQL SELECT DISTINCT Statement
3.3.1 SQL SELECT DISTINCT Syntax
3.3.2 SELECT DISTINCT Example
3.4 SQL WHERE Clause
3.4.1 SQL WHERE Syntax
3.4.2 WHERE Clause Example
3.4.3 Quotes around Text Fields
3.4.4 Operators Allowed in the WHERE Clause
3.5 SQL AND & OR Operators
3.5.1 AND Operator Example
3.5.2 OR Operator Example
3.6 Combining AND & OR
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
This unit introduces you to writing basic SQL programmes, such as
creating tables and selecting data from a table, and familiarises you
with basic SQL operators.
2.0 OBJECTIVES
The data type specifies what type of data the column can hold. For a
complete reference of all the data types available in MS Access,
MySQL, and SQL Server visit www.datatyperef.com
The P_Id column is of type int and will hold a number. The LastName,
FirstName, Address, and City columns are of type varchar with a
maximum length of 255 characters.
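A CREATE TABLE statement matching this description could look like the sketch below. The column names, the int and varchar types, and the length of 255 are taken from the paragraph above; the layout and the trailing semicolon are assumptions, and some systems (Oracle, for example) conventionally use VARCHAR2 instead of varchar.

```sql
-- Create an empty Persons table with five columns.
CREATE TABLE Persons (
    P_Id      int,
    LastName  varchar(255),
    FirstName varchar(255),
    Address   varchar(255),
    City      varchar(255)
);
```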
The empty table can be filled with data with the INSERT INTO
statement.
SELECT column_name(s)
FROM table_name
and
SELECT * FROM table_name
LastName FirstName
Akinbode Ola
Okafor Chris
Amodu Ali
SELECT * Example
Now we want to select all the columns from the “Persons” table.
In a table, some of the columns may contain duplicate values. This is not
a problem; however, sometimes you will want to list only the different
(distinct) values in a table.
Now we want to select only the distinct values from the column named
“City” from the table above.
City
Lagos
Kaduna
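A query that would produce a result of this shape is sketched below, assuming the Persons table and its City column as described earlier in this unit.

```sql
-- List each city once, however many persons live there.
SELECT DISTINCT City
FROM   Persons;
```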
The WHERE clause is used to extract only those records that fulfill a
specified criterion.
SELECT column_name(s)
FROM table_name
WHERE column_name operator value
Now we want to select only the persons living in the city “Sandnes”
from the table above.
SQL uses single quotes around text values (most database systems will
also accept double quotes).
This is correct:
SELECT * FROM Persons WHERE FirstName='Chris'
This is wrong:
SELECT * FROM Persons WHERE FirstName=Chris
This is correct:
SELECT * FROM Persons WHERE Year=1965
This is wrong:
SELECT * FROM Persons WHERE Year='1965'
Operator   Description
=          Equal
<>         Not equal
>          Greater than
<          Less than
>=         Greater than or equal
<=         Less than or equal
BETWEEN    Between an inclusive range
LIKE       Search for a pattern
IN         Match any one of a list of exact values for a column
Note: In some versions of SQL the <> operator may be written as !=
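The operators in the table can be tried out with queries such as the following. This is a sketch that assumes the Persons table described earlier in this unit; the particular values (1, 2, 'A%', 'Lagos', 'Kaduna') are chosen only for illustration.

```sql
-- Comparison: persons whose P_Id is greater than 1.
SELECT * FROM Persons WHERE P_Id > 1;

-- Not equal: persons who do not live in Lagos.
SELECT * FROM Persons WHERE City <> 'Lagos';

-- BETWEEN: both end points (1 and 2) are included.
SELECT * FROM Persons WHERE P_Id BETWEEN 1 AND 2;

-- LIKE: % matches any sequence of characters, so this finds
-- last names that start with the letter A.
SELECT * FROM Persons WHERE LastName LIKE 'A%';

-- IN: the City must equal one of the listed values.
SELECT * FROM Persons WHERE City IN ('Lagos', 'Kaduna');
```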
The AND & OR operators are used to filter records based on more than
one condition.
The AND operator displays a record if both the first condition and the
second condition are true, while the OR operator displays a record if either
the first condition or the second condition is true.
Now we want to select only the persons with the first name equal to
“Tove” AND the last name equal to “Svendson”:
Now we want to select only the persons with the first name equal to
“Tove” OR the first name equal to “Ola”:
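The two selections just described might be written as follows. This is a sketch that assumes the Persons table with its FirstName and LastName columns; the names are the ones quoted in the text.

```sql
-- AND: both conditions must hold for a row to be returned.
SELECT *
FROM   Persons
WHERE  FirstName = 'Tove' AND LastName = 'Svendson';

-- OR: a row is returned if at least one condition holds.
SELECT *
FROM   Persons
WHERE  FirstName = 'Tove' OR FirstName = 'Ola';
```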
You can also combine AND and OR (use parentheses to form complex
expressions).
Now we want to select only the persons with the last name equal to
“Svendson” AND the first name equal to “Tove” OR to “Ola”:
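A sketch of such a combined condition is given below, again assuming the Persons table. The parentheses matter: AND is evaluated before OR, so without them the condition would be read as (LastName = 'Svendson' AND FirstName = 'Tove') OR FirstName = 'Ola', which returns every person called Ola regardless of last name.

```sql
-- Last name must be Svendson, and first name must be Tove or Ola.
SELECT *
FROM   Persons
WHERE  LastName = 'Svendson'
AND    (FirstName = 'Tove' OR FirstName = 'Ola');
```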
SELF-ASSESSMENT EXERCISE
4.0 CONCLUSION
5.0 SUMMARY
SELF-ASSESSMENT EXERCISE
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 The ORDER BY Keyword
3.1.1 SQL ORDER BY Syntax
3.1.2 ORDER BY Example
3.2 SQL INSERT INTO Statement
3.2.1 SQL INSERT INTO Syntax
3.2.2 SQL INSERT INTO Example
3.3 SQL UPDATE Statement
3.3.1 SQL UPDATE Syntax
3.3.2 SQL UPDATE Example
3.4 SQL DELETE Statement
3.4.1 SQL DELETE Syntax
3.4.2 SQL DELETE Example
3.4.3 Delete All Rows
3.5 JOINING Tables
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
This unit introduces you to writing basic SQL programmes, such as
creating tables and selecting data from a table, and familiarises you
with basic SQL operators.
2.0 OBJECTIVES
SELECT expr
FROM table
[WHERE condition(s)]
[ORDER BY {column, expr} [ASC|DESC]];
In the syntax:
ORDER BY  specifies the order in which the retrieved rows are displayed
ASC       orders the rows in ascending order (this is the default order)
DESC      orders the rows in descending order
• numeric values are displayed with the lowest value first e.g 1-999
• date values are displayed with the earliest value first e.g 01-JAN-
92 before 01-JAN-95
• character values are displayed in alphabetical order
• null values are displayed last for ascending sequences and first
for descending sequences.
ORDER BY last_name:
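The ordering rules above can be seen with queries like the following. This is a sketch that assumes the employees table used in this course, with its last_name, hire_date and salary columns; the choice of columns is illustrative only.

```sql
-- Ascending order is the default, so ASC may be omitted:
-- rows are listed alphabetically by last name.
SELECT last_name, hire_date, salary
FROM   employees
ORDER BY last_name;

-- Highest salary first; employees with the same salary are
-- then listed alphabetically by last name.
SELECT last_name, salary
FROM   employees
ORDER BY salary DESC, last_name ASC;
```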
The INSERT INTO statement is used to insert a new row (record) into a
table.
E.G
Example
INSERT INTO employees (employee_id, first_name, last_name, email,
phone_number, hire_date, job_id,salary, commission_pct, manager_id,
department_id )
UPDATE table_name
SET column1=value, column2=value2,...
WHERE some_column=some_value
Note: Notice the WHERE clause in the UPDATE syntax. The WHERE
clause specifies which record or records should be updated. If you
omit the WHERE clause, all records will be updated!
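To make the role of the WHERE clause concrete, the sketch below first previews the rows that a condition selects and then updates only those rows. It assumes the employees table used in this course; the department number 60 and the 10 per cent increase are hypothetical values chosen for illustration.

```sql
-- Preview: which rows will the UPDATE touch?
SELECT employee_id, last_name, salary
FROM   employees
WHERE  department_id = 60;

-- Update only those rows; every other row is left unchanged.
UPDATE employees
SET    salary = salary * 1.10
WHERE  department_id = 60;
```

Running the SELECT with the same WHERE clause first is a simple habit that guards against updating more rows than intended.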
DELETE FROM employees;
DELETE FROM employees
WHERE department_id = 60;
It is possible to delete all rows in a table without deleting the table. This
means that the table structure, attributes, and indexes will be intact:
Note: Be very careful when deleting records. Once the deletion has been
committed, you cannot undo this statement!
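One cautious way of working is sketched below. It assumes the employees table and a database system that holds changes in a transaction until they are committed (Oracle behaves this way by default); tools that commit automatically after every statement will not allow the ROLLBACK shown here to take effect. The department number 60 is only an example value.

```sql
-- Count the rows that match before deleting anything.
SELECT COUNT(*)
FROM   employees
WHERE  department_id = 60;

-- Delete the matching rows; the table itself remains.
DELETE FROM employees
WHERE  department_id = 60;

-- If the result is not what was intended, discard the change
-- while it is still uncommitted.
ROLLBACK;

-- If the result is correct, issue COMMIT instead of ROLLBACK
-- to make the deletion permanent.
```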
The SELECT statement can be used to join two tables together. It can be
used to extract part of Table A and part of Table B to form Table C. For
example, assume that student and studentclass are two different tables. Let
us look at this instruction:
This statement shows that SID and name are columns or fields from the student
table and that classname and SID are columns from the studentclass table.
The fields in the new table formed by this instruction are:
SID name classname
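A join of the kind described above might be written in either of the two ways sketched below. The table names student and studentclass and the columns SID, name and classname come from the text; the assumption that SID is the column linking the two tables, and the short aliases s and c, are introduced here for illustration.

```sql
-- Join written with a WHERE condition linking the two tables.
SELECT student.SID, student.name, studentclass.classname
FROM   student, studentclass
WHERE  student.SID = studentclass.SID;

-- The same join written with the JOIN ... ON syntax.
SELECT s.SID, s.name, c.classname
FROM   student s
JOIN   studentclass c ON s.SID = c.SID;
```

Both forms return one row for each matching pair of student and studentclass rows, with the three columns SID, name and classname.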
SELF-ASSESSMENT EXERCISE
i. Write the SQL statement to delete two rows from student, Name
and grade = 56?
ii. What is the syntax to arrange the element of table in Ascending
and Descending order?
4.0 CONCLUSION
5.0 SUMMARY
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Arithmetic Operations
3.1.1 Using Arithmetic Operators
3.1.2 Operator Precedence
3.1.3 Defining a Null Value
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
2.0 OBJECTIVES
Operator Description
+ Add
- Subtract
* Multiply
/ Divide
This gives
LastName   Salary   12*Salary+100
Akinbode   4800     57700
Okafor     17000    204100
Amodu      12000    144100
Buba       9000     108100
Ngozi      7700     92500
Sowale     24000    288100
LastName   Salary   12*(Salary+100)
Akinbode   4800     58800
Okafor     17000    205200
Amodu      12000    145200
Buba       9000     109200
Ngozi      7700     93600
Sowale     24000    289200
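The two result tables above differ only in operator precedence, and queries of the following form would produce them. The column names LastName and Salary are taken from the table headings; the table name Staff is a hypothetical stand-in, because the name of the table being queried is not given here.

```sql
-- Multiplication is evaluated before addition:
-- 12 * 4800 + 100 = 57600 + 100 = 57700 for Akinbode.
SELECT LastName, Salary, 12 * Salary + 100
FROM   Staff;

-- Parentheses force the addition to happen first:
-- 12 * (4800 + 100) = 12 * 4900 = 58800 for Akinbode.
SELECT LastName, Salary, 12 * (Salary + 100)
FROM   Staff;
```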
LAST_NAME 12*SALARY*COMMISSION_PCT
Akinbode
Okafor
Amodu
Buba
Ngozi
Sowale
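The empty second column above is what an arithmetic expression yields when one of its operands is null: any calculation involving a null value evaluates to null. The sketch below shows the expression from the heading and one common way of substituting a value for null. It assumes the employees table with its last_name, salary and commission_pct columns; the alias annual_commission and the decision to treat a missing commission as 0 are assumptions made for illustration.

```sql
-- A null commission_pct makes the whole expression null.
SELECT last_name,
       12 * salary * commission_pct AS annual_commission
FROM   employees;

-- COALESCE returns its first non-null argument, so a missing
-- commission is treated as 0 and the result is 0, not null.
SELECT last_name,
       12 * salary * COALESCE(commission_pct, 0) AS annual_commission
FROM   employees;
```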
4.0 CONCLUSION
In this unit, you have learnt how to write basic SQL statements using
operators in SQL and how to use the SQL ORDER BY clause to arrange a
group of data. The SQL INSERT statement was also explained,
including how to update and delete rows in a table.
5.0 SUMMARY
What you have learned in this unit concerns:
• writing basic SQL statements
• ordering a group of data using the ORDER BY statement
• updating, inserting and deleting rows in a table using SQL
statements
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Introduction to Database
3.1.1 Definition of Database
3.1.2 Classification of Database
3.1.3 Database Management Systems
3.1.4 Relational Database Model
3.2 Database Objects and Constraints
3.2.1 Definition of SQL
3.2.2 SQL Statements
3.2.3 Database Objects
3.2.4 Constraints
3.3 Database Programming
3.3.1 Database Programming in Java Using JDBC
3.3.2 Accessing the Database Using JDBC Step by Step
3.3.3 Using JDBC in the Real World
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
2.0 OBJECTIVES
A relational database:
3.2.4 Constraints
• NOT NULL: This specifies that the column cannot contain a null
value
• UNIQUE: This specifies a column or combination of
columns whose values must be unique for all rows in the table.
• PRIMARY KEY: This uniquely identifies each row of the table
• FOREIGN KEY: This establishes and enforces a foreign key
relationship between the column and a column of the referenced
table.
• CHECK: This specifies a condition that must be true.
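The five constraint types listed above can be declared when a table is created, as in the sketch below. The tables dept_demo and emp_demo, their columns, and the constraint name fk_emp_dept are hypothetical and exist only to show each constraint in use; data type names vary slightly between database systems.

```sql
CREATE TABLE dept_demo (
    dept_id   INT         PRIMARY KEY,       -- identifies each row
    dept_name VARCHAR(30) NOT NULL UNIQUE    -- required, no duplicates
);

CREATE TABLE emp_demo (
    emp_id  INT          PRIMARY KEY,
    email   VARCHAR(50)  NOT NULL UNIQUE,
    salary  DECIMAL(8,2) CHECK (salary > 0), -- condition must be true
    dept_id INT,
    -- Each dept_id value must match a row in dept_demo.
    CONSTRAINT fk_emp_dept FOREIGN KEY (dept_id)
        REFERENCES dept_demo (dept_id)
);
```

With these definitions, an attempt to insert an employee whose dept_id does not exist in dept_demo, or whose salary is zero or negative, would be rejected by the database.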
JDBC has been with the Java Standard Edition (JSE) from version 1.1.
At the time of writing, the latest version is 4, which ships with JSE 6. Regardless of
the version, JDBC supports four types of implementations or drivers.
They are:
A Type II Driver uses the native API of the target database server to
communicate with the server. Hence it is known as a Native API
Driver as well as a Partly Java Partly Native Driver. This Driver doesn't
contain pure Java code as it uses the client-side API provided by the
target database server. To call the client-side API of the database, it uses
JNI. However, since it does not have the overhead of calling ODBC, a
Type II Driver is faster than a Type I. Also, by using a Type II Driver,
one can access functionalities that are specific to the database server
which is being used.
A Type III Driver is also known as Network Protocol Driver. Type III
Drivers target the middleware. The middleware then communicates with
the database server. In essence, Type III Drivers are like Type I with the
exception that Type III Drivers are completely written in Java and use
the network protocol of the middleware instead of ODBC API. Type III
Drivers are more secure since middleware is in the picture. In a nutshell,
in Type III Drivers the conversion logic is at the middleware level and
not at the client-side.
The choice of which driver to use depends on the type of application that
is being developed. For example, if the application is web-based, the
best option is Type IV as it releases the application server from being a
The best part of using JDBC for database programming is that if one has
the required type of driver, regardless of the database server, the steps to
connect and query the database remain more or less the same. The steps
to access database for a typical relational database server include:
All of these steps are the same for any database, be it Oracle or MySQL.
The only change comes in the query to be passed in step four. Here are
the details.
The driver, regardless of type, can be loaded in one of two ways: using
the Class loader, or explicitly creating the instance. The difference
between them, apart from how the driver is instantiated, is whether the
Driver has to be registered explicitly or not.
Class.forName("sun.jdbc.odbc.JdbcOdbcDriver").newInstance();
Explicitly creating the instance
The second way to load a driver is to instantiate it explicitly using
the new operator. This is similar to creating a new instance
of any class. However, when the driver is being explicitly
instantiated, one will have to register the driver with the runtime
environment using the registerDriver() method of the DriverManager
class. For example, to load the Type I Driver, the statements would
be:
Driver driver = new sun.jdbc.odbc.JdbcOdbcDriver();
DriverManager.registerDriver(driver);
Or the statements can be merged as:
DriverManager.registerDriver(new sun.jdbc.odbc.JdbcOdbcDriver());
Once the driver is loaded and registered, the next step is to get a
connection.
Creating a Connection
Once the driver has been loaded and registered, the next step is creating
a connection with the database server. The connection is created when
one creates an instance of Connection. To get an instance of Connection,
the getConnection() method of the DriverManager class has to be called.
In reality, Connection is an interface and when getConnection() is
called, the DriverManager provides an instance of the proper
implementing class to a reference variable of Connection. There are
three forms of the getConnection() method which are:
• where odbc is the subprotocol and test is the DSN which points to
the database to connect to. The next step is to create a statement
object.
Statement is the simplest type that represents a simple query. Its object
can be instantiated using any of the following forms of the
createStatement() method of the Connection interface:
• createStatement() -
• Returns a Statement object with default concurrency conditions.
• createStatement(int resultSetType, int resultSetConcurrency)-
ResultSet.TYPE_SCROLL_SENSITIVE,
ResultSet.CONCUR_UPDATABLE);
PreparedStatement conserves resources. Whenever a query is sent to
the database server, it goes through four steps: parsing the query,
compiling the query, linking and executing the query. When a statement
object is used to execute a query all four steps are repeated again and
• prepareStatement(String query)-
This form is similar to the first form with the added options of
specifying whether ResultSets are scrollable and updatable or not. The
values for the two parameters are the same as those described in the
Statement section.
• prepareCall(String query)-
This returns a CallableStatement object that can be used to
execute a procedure or function, which is passed as the query.
The query is of the form "{call sum(?,?)}" where sum is the
function/procedure to be called.
• prepareCall(String sql, int resultSetType, int
resultSetConcurrency)-
To get a ResultSet which is both updatable and scrollable, this form can
be used. The resultSetType and resultSetConcurrency are the same as those
used with prepareStatement().
The rows retrieved by the execution of a SQL query are given back by
JDBC in the form of a ResultSet object. A ResultSet contains all the
rows retrieved by a query. To retrieve a ResultSet object, one can call
the executeQuery() method of the Statement object. If the Statement
object is of the type PreparedStatement, then executeQuery() without
any argument needs to be called. If it is of the type Statement, then a
SQL query will have to be passed to the method. For example, to
retrieve a ResultSet from a Statement for the query "Select * from user",
the code would be
For example, if the user table has a column named "name," then the
statements to retrieve the values for the "name" column would be
while (result.next()) {
    System.out.println(result.getString("name"));
}
So, here is the GenericDAO class. It accepts the driver class and URL to
connect to as constructor arguments along with the user name and
password.
package jdbctest;

import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.SQLException;
import java.sql.Statement;

public class GenericDAO
{
    Connection connection;
    Statement statement;

    public GenericDAO()
    {
        connection = null;
        statement = null;
    }

    // Loads the driver class, opens a connection and creates a Statement.
    public GenericDAO(String driverClass, String connectionURL,
                      String user, String password)
    {
        try
        {
            Class.forName(driverClass).newInstance();
            connection = DriverManager.getConnection(connectionURL, user, password);
            statement = connection.createStatement();
        }
        catch (InstantiationException e)
        {
            e.printStackTrace();
        }
        catch (SQLException e)
        {
            e.printStackTrace();
        }
        catch (IllegalAccessException e)
        {
            e.printStackTrace();
        }
        catch (ClassNotFoundException e)
        {
            e.printStackTrace();
        }
    }

    public void setStatement(Statement statement)
    {
        this.statement = statement;
    }

    public Statement getStatement()
    {
        return statement;
    }
}
Next is the DataOp class. It has one method that operates on the user
table. This class is not generic.
package jdbctest;

import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;
import java.util.ArrayList;
import java.util.List;

public class DataOp
{
    Statement statement;

    public DataOp(Statement statement)
    {
        this.statement = statement;
    }

    // Returns the first column of every row whose user_id matches,
    // or null if the query fails.
    public List<String> getUserList(String user)
    {
        List<String> list = new ArrayList<String>();
        try
        {
            // Note: the value is concatenated into the SQL text, so it
            // must come from a trusted source.
            ResultSet result = statement.executeQuery(
                "Select * from user where user_id='" + user + "'");
            while (result.next())
            {
                list.add(result.getString(1));
            }
        }
        catch (SQLException e)
        {
            e.printStackTrace();
            list = null;
        }
        return list;
    }
}
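The DataOp class above builds its query by string concatenation. As a minimal sketch of the alternative, the variant below uses a PreparedStatement, so that the statement is prepared once and the user id is bound as a parameter. The class name PreparedDataOp and the constant USER_QUERY are hypothetical and are not part of the original application; the sketch also assumes that the caller supplies an open Connection.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.util.ArrayList;
import java.util.List;

// Hypothetical variant of DataOp that binds the user id as a parameter.
class PreparedDataOp {
    // The ? placeholder is filled in by the driver, not by concatenation.
    static final String USER_QUERY = "SELECT * FROM user WHERE user_id = ?";

    private final Connection connection;

    PreparedDataOp(Connection connection) {
        this.connection = connection;
    }

    // Returns the first column of every matching row.
    List<String> getUserList(String user) throws SQLException {
        List<String> list = new ArrayList<>();
        try (PreparedStatement ps = connection.prepareStatement(USER_QUERY)) {
            ps.setString(1, user);                        // parameter indexes start at 1
            try (ResultSet result = ps.executeQuery()) {  // no SQL argument is passed here
                while (result.next()) {
                    list.add(result.getString(1));
                }
            }
        }
        return list;
    }
}
```

Because the value is bound rather than pasted into the SQL text, a user id containing a quote character cannot change the meaning of the query.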
Last is the class that tests the GenericDAO and DataOp classes. Here we
are passing the driver name corresponding to Type IV of MySQL JDBC
driver and the corresponding URL.
package jdbctest;
}
}
That completes a basic application.
4.0 CONCLUSION
5.0 SUMMARY
MODULE 2
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Conceptual Models
3.1.1 Mapping EER to Relational Data Model
3.2 Schema Design
3.2.1 Database Schema Design
3.2.2 Consideration for Schema Design
3.2.3 Schema Building Blocks
3.3 Database Relationships
3.3.1 Relationship and Relationship Type
3.3.2 Enhanced ER Data Model
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
2.0 OBJECTIVES
Conceptual model is a term that has been used for a long time in
database design. It has long been the practice in IT to describe a large
system in terms of a set of interacting modules. If you can describe what
each module does and how the modules interact with each other, you
have a high-level description of the system. Furthermore, if you describe
each module in terms of sub-modules that interact with each other,
you have a more detailed description of the system. Thus arises the
concept of being able to zoom into parts of the system in more
and more detail, and to zoom out to see a wider and wider part
of the system.
External and conceptual schemas are designed in the EER data model.
Since there is no commercially available DBMS based on the EER data
model, and since most modern DBMSs are based on the relational data
model, the conceptual schema has to be mapped into the relational data
model. The conversion is done using a mapping algorithm. The mapping
algorithm we are going to consider contains seven steps.
S = {
• Department ({DeptId, DeptName}, {DeptId}),
• Student ({StudId, StudName}, {StudId}),
• Course ({DeptId, CourNo, CourName}, {DeptId + CourNo}),
• Exam ({DeptId, CourNo, StudId, Grade}, {DeptId + CourNo + StudId}),
• Lecturer ({DeptId, LectNo, LectName, HireDate}, {DeptId + LectNo})
}
[Figure: ER diagram of the Department entity type (attributes DeptId,
DeptName) related to Lecturer through the relationship Possesses]
For example, consider the use case text: “The item is given an
identification number.” We have two nouns: item and identification
number. “Item”—has the verb “is given” and is a candidate for a
database table. “Identification number”—no related verb, it will
probably be a column in the “Item” table.
The schema design for a database affects its usability and performance
in many ways, so it is important to make the initial investment in time
and research to design a database that meets the needs of its users. This
section is not intended to provide a detailed guide to database design,
but only to present some ideas to consider in designing a database.
Identify the main processes of the business; for example, taking orders
for the product, filling out insurance claims, or tracking promotions.
These processes are different for every business, but they must be
clearly identified and defined in order to create a useful database. The
people who know the processes are the people who work in the business,
and interviews are essential to determine these processes.
The database should reflect the business, both in what it measures and
tracks and in the terminology used to describe the facts and dimensions
of the business. Interviews with managers and users will reveal what
they want to know, how they measure the business, what criteria they
use to make decisions, and what words they use to describe these things.
This information helps determine the contents of the fact and dimension
tables.
The data to populate the tables in the database must be complete enough
to be useful and must be valid, consistent data. An analysis of the
proposed input data and its sources will reveal whether the available
data can support the proposed schema.
Facts are usually numerical and continuous values; for example, revenue
or inventory. Facts that are additive can be summed to produce valid
measures in reports. For example, sales for each month are additive and
can be summed to produce year-to-date totals. Month-end inventory
balances, however, are not additive in the sense that a yearly total of
month-end inventory balances is of dubious value, but a monthly
average might be meaningful.
Facts that are measured with different dimensions or use different timing
should be stored in separate tables. For example, a single database can
be used for orders, shipments, and manufacturing. Although the facts
measured in each area of the business are different, they share some but
not all of the same dimensions.
A schema can be a star schema with one fact table and one dimension
table.
A schema can be a star schema with one fact table and several
dimension tables.
A schema can be a multiple star schema, with a family of fact tables that
share some, but not necessarily all, dimension tables.
A schema can be a star schema with a fact table that contains multiple
foreign keys that reference a single dimension table.
This example illustrates how the schema design affects both usability
and usefulness of the database.
The salad dressing database has one fact table, sales, and three
dimension tables: Product, Week, and Market, as illustrated in the
following figure.
Each record in the Sales fact table contains a field for each of the three
dimensions: Product, Period, and Market. The columns in the Sales table
containing these fields are the foreign keys whose concatenated values
give each row in the Sales table a unique identifier. Sales also contains
seven additional fields that contain values for measures of interest to
market analysts.
Recursive Relationships
4.0 CONCLUSION
In this unit you have been given an insight into high-level modelling of
database structures and the processes involved in designing a schema
that models an arbitrary problem domain as precisely as possible. You
were also introduced to the various database relationship types and to
modelling with the enhanced ER data model.
5.0 SUMMARY
• A conceptual model represents 'concepts' (entities) and
relationships between them.
• A relationship is an association between two or more entities. A
relationship can be represented by combining representations of
associated entities and properties of their association.
• Each relationship type can be mapped as a separate relation
schema, but it is considered to be a good practice to map it as:
- A separate relation schema in the case of M: N (and 1:1)
cardinality ratios, and
- By primary key propagation in the other cases
• The relation schema that gets the propagated primary key as the
foreign key, simultaneously represents an entity type and a
relationship type
• There are three possible ways to map an IS-A hierarchy:
- Each class as a separate relation schema, with subclass
relation schemas inheriting super class primary key,
- Only subclasses as separate relation schemas that inherit
all the super class attributes, and
- The super class and all the subclasses map into one
relation schema
• Each set of multivalued attributes is mapped into a separate relation
schema
• EER data model is introduced to provide more semantic power to
UoD modelling
• EER introduces a number of new modelling constructs and a
diagrammatic technique
• Set of similar UoD entities is represented by an entity type
• Set of associations between two or more entities is represented by
a relationship type
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Functional Dependencies
3.2 Classification of Functional Dependencies
3.2.1 Fully Functional Dependency
3.2.2 Partial Functional Dependency
3.2.3 Transitive Functional Dependency
3.3 Properties of Functional Dependencies
3.4 Closure of a Set of Functional Dependencies
3.4.1 Algorithm to Determine Closure
3.5 Keys
3.5.1 Super Key
3.5.2 Primary Key
3.5.3 Candidate Key
3.5.4 Secondary Key
3.5.5 Alternative Key
3.5.6 Keys Example
3.6 Database Normalisation
3.6.1 Example
3.7 Decomposition
3.7.1 Lossless-Join Decomposition
3.7.2 Decomposition into BCNF
3.7.3 Decomposition into 3NF
3.8 Minimal Cover for a Set of Functional Dependencies
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
2.0 OBJECTIVES
For example, suppose one is designing a system to track vehicles and the
capacity of their engines. Each vehicle has a unique vehicle
identification number (VIN). One could write VIN→ Enginecapacity
because it would be inappropriate for a vehicle’s engine to have more
than one capacity. However, Enginecapacity→ VIN is incorrect because
there could be many vehicles with the same engine capacity.
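The vehicle example can be checked against sample data. The sketch below tests whether a functional dependency holds in a small table; the class name FdCheck, the method holds and the three sample rows are assumptions made for illustration only. Note that sample data can only show that a dependency is violated; it cannot prove that the dependency holds for every possible instance.

```java
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class FdCheck {
    // Hypothetical sample rows: {VIN, EngineCapacity}.
    static final List<String[]> VEHICLES = Arrays.asList(
        new String[] {"VIN001", "1600"},
        new String[] {"VIN002", "2000"},
        new String[] {"VIN003", "1600"});

    // True if equal values in column lhs always go with equal values in column rhs.
    static boolean holds(List<String[]> rows, int lhs, int rhs) {
        Map<String, String> seen = new HashMap<>();
        for (String[] row : rows) {
            String previous = seen.putIfAbsent(row[lhs], row[rhs]);
            if (previous != null && !previous.equals(row[rhs])) {
                return false; // same determinant, different dependent value
            }
        }
        return true;
    }

    public static void main(String[] args) {
        System.out.println("VIN -> EngineCapacity holds: " + holds(VEHICLES, 0, 1));
        System.out.println("EngineCapacity -> VIN holds: " + holds(VEHICLES, 1, 0));
    }
}
```

In the sample rows two vehicles share the capacity 1600, which is exactly the situation that makes Enginecapacity→ VIN fail.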
1. Let CA = C.
2. Let the next dependency be A→B. If A is in CA and B is not,
then CA = CA + B.
3. Continue step 2 until no new attributes can be added to CA.
• We first initialise A+ = A.
• Because A→ B, C, add BC to A+.
Therefore A+ = A, B, C, D, F
3.5 Keys
There are basically five kinds of keys which are sets of attributes of a
relation functionally depending on one or more attributes of the relation:
A primary key can be said to be a super key that also serves as part of
the determinants in the functional dependencies.
A secondary key is a key that is part of the candidate key and not part of
the primary key.
An alternative key is a key that is part of the super keys and not part of
the candidate keys.
Find
• all the super key(s)
• primary key(s)
• candidate key(s)
• secondary key(s)
• alternative key(s)
Solution:
The next thing to do now is to determine the closure for each attribute:
• A+ = A
• B+ = BC, then BCD (B→C, B→D)
• C+ = C
• D+ = D
• AB+ = ABC, then ABCD (AB→C, AB→D)
• AC+ = AC
• AD+ = AD
• BC+ = BCD (BC→D)
• BD+ = BCD (BD→C)
• CD+ = CD
• ABC+ = ABCD (ABC→D)
• ABD+ = ABCD (ABD→C)
• ACD+ = ACD
• BCD+ = BCD
• ABCD+ = ABCD
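The closures listed above can be reproduced mechanically. The following sketch assumes a relation with attributes ABCD and the two dependencies B→C and B→D, which is what the listed closures imply; the class name Closure and its helper methods are hypothetical. It repeats the closure step until nothing changes, then reports every attribute set whose closure is the whole relation, that is, every super key.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

class Closure {
    // Assumed dependencies, written as {left side, right side}.
    static final String[][] FDS = { {"B", "C"}, {"B", "D"} };
    static final String ALL = "ABCD";

    static Set<Character> toSet(String attrs) {
        Set<Character> set = new TreeSet<>();
        for (char c : attrs.toCharArray()) {
            set.add(c);
        }
        return set;
    }

    // Repeatedly applies every dependency whose left side is already covered.
    static Set<Character> closure(String attrs) {
        Set<Character> result = toSet(attrs);
        boolean changed = true;
        while (changed) {
            changed = false;
            for (String[] fd : FDS) {
                if (result.containsAll(toSet(fd[0])) && result.addAll(toSet(fd[1]))) {
                    changed = true;
                }
            }
        }
        return result;
    }

    // Enumerates all non-empty attribute subsets and keeps the super keys.
    static List<String> superKeys() {
        List<String> keys = new ArrayList<>();
        for (int mask = 1; mask < (1 << ALL.length()); mask++) {
            StringBuilder subset = new StringBuilder();
            for (int i = 0; i < ALL.length(); i++) {
                if ((mask & (1 << i)) != 0) {
                    subset.append(ALL.charAt(i));
                }
            }
            if (closure(subset.toString()).equals(toSet(ALL))) {
                keys.add(subset.toString());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        System.out.println("B+  = " + closure("B"));
        System.out.println("AB+ = " + closure("AB"));
        System.out.println("Super keys: " + superKeys());
    }
}
```

Under these assumptions the smallest super key found is AB, since every other super key contains it.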
Three normal forms were initially proposed, called first normal form
(1NF), second normal form (2NF) and third normal form (3NF).
Subsequently, R. Boyce and E. F. Codd introduced a stronger definition of
3NF called Boyce-Codd normal form (BCNF). With the exception of 1NF, all
these normal forms are based on functional dependencies among the
attributes of a table. Higher normal forms that go beyond BCNF, such as
4NF and 5NF, were introduced later. However, these later normal forms
deal with situations that are very rare.
3.6.1 Example
Solution:
3.7 Decomposition
We begin with the relation example from above. This relation has
attributes ABCD and two FDs: B→C and B→D. Assume that B is not a
key and that D is not part of any key; the second FD then causes a
violation of 3NF.
Our decision to decompose ABCD into ABC and BD, rather than say
AB and BC was just a good guess. It was guided by the observation that
the dependency B→ D caused the violation of 3NF; the most natural
way to deal with this violation is to remove the attribute D from the
schema. To compensate for removing D from the main schema, we can
add a relation BD because each B value is associated with at most one D
value according to the FD: B→ D
R-A denotes the set of attributes other than A in R, and XA denotes the
union of attributes in X and A. Since X→ A violates BCNF, it is not a
Note that the order in which we consider FDs while applying these steps
could produce different minimal covers; there could be several minimal
covers for a given set of FDs.
SELF-ASSESSMENT EXERCISE
4.0 CONCLUSION
5.0 SUMMARY
i. Super keys
ii. Primary keys
iii. Candidate keys
iv. Alternative keys
v. Secondary keys
REGULAR EXPRESSIONS
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 General Introduction
3.2 Regular Expressions
3.3 Elements /Metacharacters of Regular Expressions
3.3.1 Classes
3.3.2 Range Operator
3.3.3 Class Repetition Operators
3.3.4 Backslash Operator
3.3.5 Repetition Operator’s Specific Characteristics
3.3.6 Class Denying
3.3.7 The Period
3.3.8 Alternacy Operator
3.3.9 Anchors
3.3.10 Groups
3.3.11 Question Mark
3.4 Regular Expression Engines
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
2.0 OBJECTIVES
Their name comes from the mathematical theory on which they are
based. In writing, the term is often abbreviated to regex or regexp. In
this unit, regex is used because the plural “regexes” is easy to
pronounce.
Think about having a web page with a form with the following fields:
• Name
• Surname
• E-mail
• Phone number
Once the form has been filled in and the data sent to the script, it is
very important to check that the data is correct.
• name: [a-zA-Z]*
• surname: [a-zA-Z’ ]+
• email: [a-zA-Z0-9_\.]+@[a-zA-Z0-9-]+\.[a-zA-Z]{0,4}
• phone number: [0-9]+\-[0-9]+
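As a rough illustration, the four patterns above can be applied with the java.util.regex package. The class name FormCheck, the method valid and the sample inputs are assumptions made for this sketch; the patterns are copied as printed, except that a plain apostrophe is used in the surname class. The matches() call requires the whole input to match, which is what form validation needs.

```java
import java.util.regex.Pattern;

class FormCheck {
    static final Pattern NAME = Pattern.compile("[a-zA-Z]*");
    static final Pattern SURNAME = Pattern.compile("[a-zA-Z' ]+");
    static final Pattern EMAIL =
        Pattern.compile("[a-zA-Z0-9_\\.]+@[a-zA-Z0-9-]+\\.[a-zA-Z]{0,4}");
    static final Pattern PHONE = Pattern.compile("[0-9]+\\-[0-9]+");

    // True only if the entire input matches the pattern.
    static boolean valid(Pattern pattern, String input) {
        return pattern.matcher(input).matches();
    }

    public static void main(String[] args) {
        System.out.println(valid(NAME, "Mary"));              // letters only
        System.out.println(valid(SURNAME, "O'Neil"));         // apostrophe allowed
        System.out.println(valid(EMAIL, "ada@example.com"));
        System.out.println(valid(PHONE, "0803-1234567"));
        System.out.println(valid(PHONE, "08031234567"));      // no hyphen
    }
}
```

One consequence worth noticing is that the name pattern uses the star, so it also accepts an empty string; using the plus instead would make the field mandatory.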
3.3.1 Classes
The first one we are going to analyse is the star *. It matches zero or
more consecutive repetitions of the class that precedes it and selects
the whole run. For example, the following regular
expression [a-z]* selects in a string all the consecutive occurrences of
Very similar to the star is the plus + operator. It works in the same
way, but it requires the class to be repeated one or more times inside
the string, so an empty match is rejected.
Now you can understand the regex that we used to verify the email:
• [a-zA-Z0-9_\.]+@[a-zA-Z0-9-]+\.[a-zA-Z]{0,4}
• <.+>
If this behaviour does not satisfy our needs, we can use one of the
following methods:
• <.+?>
• <[^<>]+>
The first one makes the repetition operator lazy, so that it stops at
the first closing character it finds.
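The difference between the three patterns is easiest to see side by side. The sketch below runs them over one sample string; the class name GreedyDemo, the method findAll and the sample text are assumptions chosen for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class GreedyDemo {
    static final String TEXT = "<b>bold</b> text";

    // Collects every non-overlapping match of the regex in the text.
    static List<String> findAll(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        System.out.println(findAll("<.+>", TEXT));      // greedy: runs to the last >
        System.out.println(findAll("<.+?>", TEXT));     // lazy: stops at the first >
        System.out.println(findAll("<[^<>]+>", TEXT));  // negated class: cannot cross a bracket
    }
}
```

With this input the greedy pattern returns the single match <b>bold</b>, while the other two return the tags <b> and </b> separately.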
• [^\.]+
The former regex matches all the four-character sequences that start
with c, followed by any character, then by s, and then by any
character. It matches combinations such as:
• case
• cosa
• cose
• c%s9
• c£sl
This operator takes the form of a pipe |, which has the same function as
a logical OR. For example, the regex george|stuart matches, inside a
string, either the word george or the word stuart.
3.3.9 Anchors
If one day one of the friends were banned from the expenses, his data
would no longer be useful and it might be necessary to remove it. If
there were thousands of records, a regex would be the fastest solution.
If the data of the banned friend is in the third column, the fastest
way to remove it is to delete every occurrence matched by the
following regex:
• ,[0-9]*€$
The $ character does not match any character but a position: the end
of a line. Therefore the former regex finds every sequence of
consecutive characters that starts with a comma, followed by zero or
more digits, followed by the € sign, followed by the end of a line.
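A short sketch of this removal follows. The class name DropLastColumn and the two sample lines are hypothetical, and the euro sign is written as the escape \u20AC so that the source file does not depend on a particular encoding. In Java the MULTILINE flag has to be set for $ to match at the end of every line rather than only at the end of the whole input; that flag is an addition of this sketch, not part of the regex shown above.

```java
import java.util.regex.Pattern;

class DropLastColumn {
    // Comma, zero or more digits, euro sign, end of line.
    static final Pattern LAST_COLUMN =
        Pattern.compile(",[0-9]*\u20AC$", Pattern.MULTILINE);

    // Deletes the final comma-separated amount from every line.
    static String dropLastColumn(String data) {
        return LAST_COLUMN.matcher(data).replaceAll("");
    }

    public static void main(String[] args) {
        String data = "12\u20AC,7\u20AC,30\u20AC\n5\u20AC,9\u20AC,14\u20AC";
        System.out.println(dropLastColumn(data));
    }
}
```

Only the last amount on each line is removed, because the earlier amounts are followed by a comma and not by the end of a line.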
3.3.10 Groups
• ([0-9]{5}[a-zA-Z])+$
In groups, the question mark can be used to avoid memorising the match.
We have already seen that the question mark can be used to restrict
repetitions. Now we will see that this simple character has several
other functions.
The first function makes a group optional, as you can see in the
following example:
• michael (owen)?
In the former regex the group (owen) is made optional, and therefore it
selects both the single word michael and the pair of words
michael owen.
The second function is to act as an anchor. The question mark can also
be used in a group as a lookahead, which identifies a position inside
the text without selecting anything. Example:
• michael(?=owen)
The former regex selects the word michael in a text only if it is
followed by the group (owen), which is not itself selected.
You can also use the question mark to identify the absence of a group
at a position. For example, the following regex selects the word
michael only if it is not followed by the group (owen):
• michael(?!owen)
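The three uses of the question mark can be compared with a small program. The class name LookaheadDemo, the method find and the sample strings are assumptions made for this sketch. Each match is reported together with its start position, which makes it visible that the lookahead forms select only the word michael.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class LookaheadDemo {
    // Returns every match as "start:text".
    static List<String> find(String regex, String text) {
        List<String> found = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        while (matcher.find()) {
            found.add(matcher.start() + ":" + matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // Optional group: matches with or without owen.
        System.out.println(find("michael (owen)?", "michael owen, michael jordan"));
        // Positive lookahead: michael only when owen follows.
        System.out.println(find("michael(?=owen)", "michaelowen michaeljordan"));
        // Negative lookahead: michael only when owen does not follow.
        System.out.println(find("michael(?!owen)", "michaelowen michaeljordan"));
    }
}
```

Note that the first pattern contains a literal space before the group, so the shorter match it returns is the word michael followed by that space.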
languages like PHP), the .NET regular expression library, and the
regular expression package included with version 1.4 and later of the
Java JDK. There are certain important differences in regex flavor.
4.0 CONCLUSION
In this unit, you learnt about regular expressions, which are search
patterns that match characters in a string, and about their
metacharacters and elements. We also considered regular expression
engines. You can now also transform characters or strings into their
regular expression equivalents.
5.0 SUMMARY
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Introduction to Query Language
3.2 Relational Algebra
3.3 Operations in Relational Algebra
3.3.1 Selection Operator
3.3.2 The Projection Operator
3.3.3 The Union Operator
3.3.4 The Set Difference Operator
3.3.5 The Cartesian product Operator
3.4 Additional operations in Relational Algebra
3.5 Relational Algebra Expressions
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
2.0 OBJECTIVES
Type: unary
Symbol: Greek letter sigma, σ
General form: σ(predicate)(relation instance)
Schema of result relation: same as operand relation
Size of result relation (tuples): ≤ |operand relation |
Examples:
• σ(major = “CS”)(students)
• σ(major = “CS” and hair-color = “brown”)(students)
• σ(hours-attempted > hours-earned)(students)
The select operation selects tuples from a relation instance which satisfy
a specified predicate.
Type: unary
Symbol: Greek letter pi, π
General form: π(attribute-list)(relation instance)
Schema of result relation: specified by <attribute-list>
Size of result relation (tuples): ≤| operand relation|
Examples:
Type: binary
Symbol: union symbol, ∪
General form: r ∪s, where r and s are union compatible
Schema of result relation: schema of operand relations
Size of result relation (tuples): ≤ |r| + |s|
Examples:
Type: binary
Symbol: −
General form: r − s, where r and s are union compatible
Schema of result relation: schema of operand relation
Size of result relation (tuples): ≤ |relation r|
Examples: r − s
Type: binary
Symbol: ×
General form: r × s (no restrictions on r and s)
Schema of result relation: schema r × schema s with renaming
Size of result relation (tuples): = |relation r| × |relation s|
Examples: r×s
The Cartesian product operation allows for the combining of any two
relations into a single relation. Recall that a relation is by definition a
subset of a Cartesian product of a set of domains, so this gives you some
idea of the behavior of the Cartesian product operation.
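To make the behaviour concrete, the sketch below forms the Cartesian product of two small in-memory relations. The class name CartesianProduct and the sample relations R and S are hypothetical; tuples are represented simply as string arrays. Every tuple of r is paired with every tuple of s, so the result has |r| × |s| tuples and as many attributes as r and s together.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CartesianProduct {
    // Hypothetical relation r(name, major) with three tuples.
    static final List<String[]> R = Arrays.asList(
        new String[] {"Ada", "CS"},
        new String[] {"Bola", "Physics"},
        new String[] {"Chidi", "CS"});

    // Hypothetical relation s(term) with two tuples.
    static final List<String[]> S = Arrays.asList(
        new String[] {"Fall 2006"},
        new String[] {"Spring 2007"});

    // Concatenates every tuple of r with every tuple of s.
    static List<String[]> product(List<String[]> r, List<String[]> s) {
        List<String[]> result = new ArrayList<>();
        for (String[] left : r) {
            for (String[] right : s) {
                String[] tuple = new String[left.length + right.length];
                System.arraycopy(left, 0, tuple, 0, left.length);
                System.arraycopy(right, 0, tuple, left.length, right.length);
                result.add(tuple);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        for (String[] tuple : product(R, S)) {
            System.out.println(Arrays.toString(tuple));
        }
    }
}
```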
OUTER JOIN
Introduced to insert those tuples that don't match, or contain null values
for join attributes into join relation
• Notation:
Example
Example Query 1:
Find the names of all the students who are Computer Science majors.
Approach:
Example Query 2:
Find the student-num (s#) and name of all the students who have
completed more than 90 hours.
Approach:
• First select all of the students who have completed more than 90
hours.
• r = σ(hours_completed > 90)(S)
• Next project the student-num and name attributes from the
previous result.
• result = π(s#, name)(r)
Example Query 3:
Find the names of all those students who are less than 20 years old who
have completed more than 80 hours.
Approach:
• First select all of the students who have completed more than 80
hours and are less than 20 years old.
• r = σ((hours_completed > 80) AND (age < 20))(S)
• Next project the name attribute from the previous result.
• result = π(name)(r)
• Complete Query Expression:
• result = π(name)(σ((hours_completed > 80) AND (age < 20))(S))
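The complete expression of Example Query 3 can be mirrored on an in-memory relation. In the sketch below the class Query3, the nested Student class and the four sample tuples are assumptions made for illustration; filter plays the role of the selection σ and map the role of the projection π. The distinct step is included because a relation is a set, so a projection does not keep duplicates.

```java
import java.util.Arrays;
import java.util.List;
import java.util.stream.Collectors;

class Query3 {
    // Hypothetical tuple of the relation S, reduced to three attributes.
    static final class Student {
        final String name;
        final int age;
        final int hoursCompleted;

        Student(String name, int age, int hoursCompleted) {
            this.name = name;
            this.age = age;
            this.hoursCompleted = hoursCompleted;
        }
    }

    static final List<Student> S = Arrays.asList(
        new Student("Ada", 19, 85),
        new Student("Bola", 22, 95),
        new Student("Chidi", 18, 60),
        new Student("Dayo", 19, 90));

    // result = pi(name)(sigma((hours_completed > 80) AND (age < 20))(S))
    static List<String> youngAdvanced(List<Student> students) {
        return students.stream()
            .filter(t -> t.hoursCompleted > 80 && t.age < 20) // selection
            .map(t -> t.name)                                 // projection
            .distinct()                                       // set semantics
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        System.out.println(youngAdvanced(S));
    }
}
```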
Example Query 4:
Find the names of all the courses that are offered by either Computer
Science or Physics.
Approach:
Example Query 5:
Find the names of all the students who took a course in the Fall 2006
term that was taught by a professor who had more than 20 years of
teaching experience.
Approach:
Example Query 6:
Find the names of all the professors who are either in the Computer
Science department or have more than 20 years of teaching experience.
4.0 CONCLUSION
SELF-ASSESSMENT EXERCISE
Using the sample database given under 3.5, write query expressions to:
i. Find the name of the professor who taught a course in the Fall
2006 term
ii. Find the student numbers for those students who were enrolled
only in the spring 2007 term.
5.0 SUMMARY
MODULE 3
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 What is Web Service?
3.1.1 Web Service Security
3.1.2 Web Service Reliability
3.1.3 Web Services Transaction
3.2 Applications of Web Services
3.2.1 Remote Procedure Calls (RPC)
3.2.2 Service-Oriented Architecture
3.2.3 Representational State Transfer
3.3 Web Services Framework
3.4 Web Services Architecture
3.4.1 Purpose of Web Services Architecture
3.4.2 Agent and Services
3.4.3 Requesters and Providers
3.4.4 Service Description
3.4.5 Semantics
3.4.6 Overview of Engaging a Web Service
3.5 Concepts and Relationships
3.5.1 Introduction
3.5.2 How to Read This Section
3.5.3 Concepts
3.5.4 Relationships
3.5.5 Concept Maps
3.5.6 Model
3.5.7 Conformance
3.5.8 The Architectural Models
3.5.9 Message-Oriented Model
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
The current set of web service specifications defines protocols for web
service interoperability.
2.0 OBJECTIVES
Web Service Security defines how to use XML Encryption and XML
Signature in SOAP to secure message exchanges, as an alternative or
extension to using HTTPS to secure the channel.
Web services are a set of tools that can be used in a number of ways.
The three most common styles of use are remote procedure calls (RPC),
service-oriented architecture (SOA) and representational state
transfer (REST).
The first web services tools were focused on RPC, and as a result this
style is widely deployed and supported. However, it is sometimes
criticised for not being loosely coupled, because it was often
implemented by mapping services directly to language-specific
functions or method calls.
SOA web services are supported by most major software vendors and
industry analysts. Unlike RPC Web services, loose coupling is more
likely, because the focus is on the "contract" that WSDL provides, rather
than the underlying implementation details.
Name | Platform | Messaging Model (Destination) | Specifications | Protocols
ActionWebService | Ruby (on Rails) | Client/Server | ? | SOAP, XML-RPC, WSDL
Apache Axis | Java/C++ | Client/Server | WS-ReliableMessaging, WS-Coordination, WS-Security, WS-AtomicTransaction, WS-Addressing | SOAP, WSDL
Apache Axis2 | Java/C | Client/Server/Asyn Support | WS-ReliableMessaging, WS-Security, WS-AtomicTransaction, WS-Addressing, MTOM, WS-Policy, WS-MetadataExchange | SOAP, MTOM, WSDL 2.0, WSDL
Apache CXF | Java | Client/Server/Asyn Support | WS-Addressing, WS-Policy, WS-ReliableMessaging, WS-Security, MTOM | SOAP1.1, SOAP1.2, MTOM, WSDL 2.0, WSDL
AlchemySOAP | C++ | Client/Server | WS-Addressing | SOAP
csoap | C | Client/Server | ? | SOAP
Halcyon | Ruby | Client/Server | N/A | JSON
Hessian | Java, Ruby, Python, Erlang, PHP, others | Client/Server | Hessian 1.0.1 | Hessian
JSON-RPC-Java | Java | Server | ??? | JSON-RPC
JSON-RPC-Lua | Lua | Server | ??? | JSON-RPC
Java Web Services Development Pack / GlassFish | Java | Client/Server | WS-Addressing, WS-Security, ??? | SOAP, WSDL, ???
NuSOAP | PHP | Client/Server | Object Oriented, Creates Users Help document, | SOAP, WSDL
SOAP Lite | Perl | Client/Server | ??? | SOAP, WSDL, ???
Web Services Interoperability Technology | Java | Client/Server | WS-Addressing, WS-ReliableMessaging, WS-Coordination, WS-AtomicTransaction, WS-Security, WS-SecurityPolicy, WS-Trust, WS-SecureConversation, WS-Policy, WS-MetadataExchange | SOAP, WSDL, MTOM
Web Services Invocation Framework | Java | Client | ??? | SOAP, WSDL
Windows Communication Foundation | .Net | Client/Server ? | WS-Addressing, WS-ReliableMessaging, WS-Security | SOAP, WSDL
XFire (became Apache CXF) | Java | Client/Server | WS-Addressing, WS-Security | SOAP, WSDL
XML Interface for Network Services | Java | Server ? | ?? | SOAP, XML-RPC
gSOAP | C/C++ | Client/Server | WS-Addressing, WS-Discovery, WS-Enumeration, WS-Security | SOAP, XML-RPC, WSDL
Zolera SOAP Infrastructure (ZSI) | Python | Client/Server | ??? | SOAP, WSDL
WSO2 Web Services Framework for C (WSO2 WSF/C) | C (build on Axis2/c) | Client/Server, Publish/Subscribe | WS-Addressing, WS-Policy, WS-Security, WS-SecurityPolicy, WS-ReliableMessaging, WS-Eventing | SOAP, WSDL, TLS
WSO2 WSF/PHP | PHP | Client/Server | WS-Addressing, WS-Policy, WS-Security, WS-SecurityPolicy, WS-ReliableMessaging, WS-SecureConversation, MTOM | SOAP, WSDL, WSDL 2.0
WSO2 | Ruby on | Client/Server | WS-Addressing, WS- | SOAP, WSDL
The architecture does not attempt to specify how web services are
implemented, and imposes no restriction on how web services might be
combined. The WSA describes both the minimal characteristics that are
common to all web services, and a number of characteristics that are
needed by many, but not all, web services.
(In most cases, the requester agent is the one to initiate this message
exchange, though not always. Nonetheless, for consistency we still use
the term “requester agent” for the agent that interacts with the provider
agent, even in cases when the provider agent actually initiates the
exchange.)
3.4.5 Semantics
There are many ways that a requester entity might engage and use a web
service. In general, the following broad steps are required, as illustrated
in Figure 1: (1) the requester and provider entities become known to
each other (or at least one becomes known to the other); (2) the
requester and provider entities somehow agree on the service description
and semantics that will govern the interaction between the requester and
provider agents; (3) the service description and semantics are realised by
the requester and provider agents; and (4) the requester and provider
agents exchange messages, thus performing some task on behalf of the
requester and provider entities. (I.e., the exchange of messages with the
provider agent represents the concrete manifestation of interacting with
the provider entity’s web service). These steps are explained in more
detail in 3.4 Web Service Discovery. Some of these steps may be
automated, others may be performed manually.
3.5.1 Introduction
3.5.3 Concepts
3.5.4 Relationships
• An agent is
A computational resource
• A message has
A message sender
The merit of a concept map is that it allows rapid navigation of the key
concepts and illustrates how they relate to each other. It should be
stressed, however, that these diagrams are primarily navigational aids;
the written text is the definitive source.
3.5.6 Model
3.5.7 Conformance
The essence of the message model revolves around a few key concepts
illustrated above: the agent that sends and receives messages, the
structure of the message in terms of message headers and bodies and the
mechanisms used to deliver messages. Of course, there are additional
details to consider: the role of policies and how they govern the message
level model. The abridged diagram shows the key concepts; the detailed
diagram expands on this to include many more concepts and
relationships.
Policies are about resources. They are applied to agents that may attempt
to access those resources, and are put in place, or established, by people
who have responsibility for the resource.
SELF-ASSESSMENT EXERCISE
4.0 CONCLUSION
5.0 SUMMARY
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 General Introduction
3.2 Origin and Goals
3.3 Terminology
3.4 Why Do We Need XML?
3.5 Rendering HTML
3.6 Processing HTML
3.7 Tags, Elements, and Attributes
3.8 How XML is Changing the Web
3.9 XML Document Rules
3.9.1 Overview
3.9.2 The Root Element
3.9.3 Elements Cannot Overlap
3.9.4 End Tag is Required
3.9.5 Elements are Case-Sensitive
3.9.6 Attributes Must Have Quoted Values
3.9.7 XML Declarations
3.9.8 Other Things in XML Documents
3.9.9 Namespaces
3.10 Defining Document Content
3.10.1 Overview
3.10.2 Document Type Definitions
3.10.3 Symbols in DTDs
3.10.4 A Word about Flexibility
3.10.5 Defining Attributes
3.10.6 XML Schemas
3.10.7 A Sample XML Schema
3.10.8 Defining Elements in Schema
3.10.9 Defining Element Content in Schemas
3.11 XML Programming Interfaces
3.11.1 Overview
3.11.2 The Document Object Model
3.11.3 DOM Issues
3.11.4 The Simple API for XML
3.11.5 SAX Issues
3.11.6 JDOM
3.11.7 The Java API for XML Parsing
3.11.8 Which Interface is Right for You?
3.12 Determining the Right Interface
3.12.1 Overview
1.0 INTRODUCTION
2.0 OBJECTIVES
3.3 Terminology
• Error
• Fatal Error
• At User Option
• Validity Constraint
• Well-Formedness Constraint
• Match
• For Compatibility
• For Interoperability
HTML is the most successful markup language of all time. You can
view the simplest HTML tags on virtually any device, from palmtops to
mainframes, and you can even convert HTML markup into voice and
other formats with the right tools. Given the success of HTML, why did
the W3C create XML? To answer that question, take a look at this
document:
14 Ken Street
<br>
Enugu</p>
The trouble with HTML is that it was designed with humans in mind.
Even without viewing the above HTML document in a browser, you and
I can figure out that it is someone’s postal address. (Specifically, it is a
postal address for someone in Nigeria; even if you are not familiar with
Nigerian addresses, you could probably guess what this represents.)
• If you find a paragraph with two <br> tags, the postal code is the
second word after the first comma in the second break tag.
Although this algorithm works with this example, there are any
number of perfectly valid addresses worldwide for which it
simply would not work. Even if you could write an algorithm that
found the postal code for any address written in HTML, there are
any number of paragraphs with two break tags that do not contain
addresses at all. Writing an algorithm that looks at any HTML
paragraph and finds any postal codes inside it would be
extremely difficult, if not impossible.
Now let us look at a sample XML document. With XML, you can assign
some meaning to the tags in the document. More importantly, it is easy
for a machine to process the information as well. You can extract the
postal code from this document by simply locating the content
surrounded by the <postal-code> and </postal-code> tags, technically
known as the <postal-code> element.
    <address>
      <name>
        <title>Mrs.</title>
        <first-name>
          Mary
        </first-name>
        <last-name>
          McGoon
        </last-name>
      </name>
      <street>
        1401 Main Street
      </street>
      <city>Anytown</city>
      <state>NC</state>
      <postal-code>
        34829
      </postal-code>
    </address>

    <address>
      <name>
        <title>Mrs.</title>
        <first-name>
          Mary
        </first-name>
        <last-name>
          McGoon
        </last-name>
      </name>
      <street>
        1401 Main Street
      </street>
      <city state="NC">Anytown</city>
      <postal-code>
        34829
      </postal-code>
    </address>
- A tag is the text between the left angle bracket (<) and the right
angle bracket (>). There are starting tags (such as <name>) and
ending tags (such as </name>).
- An element is the starting tag, the ending tag, and everything in
between. In the sample above, the <name> element contains three
child elements: <title>, <first-name>, and <last-name>.
- An attribute is a name-value pair inside the starting tag of an
element. In this example, state is an attribute of the <city>
element; in earlier examples, <state> was an element (see the
first sample XML document).
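To see how little work a programme has to do once the tags carry meaning, here is a minimal sketch using Python's standard xml.etree.ElementTree module. It is only an illustration: the document is a compact copy of the address sample with state written as an attribute, and the variable names are chosen for this example.

```python
import xml.etree.ElementTree as ET

# Compact copy of the address sample, with state as an attribute of <city>.
ADDRESS_XML = """
<address>
  <name>
    <title>Mrs.</title>
    <first-name>Mary</first-name>
    <last-name>McGoon</last-name>
  </name>
  <street>1401 Main Street</street>
  <city state="NC">Anytown</city>
  <postal-code>34829</postal-code>
</address>
"""

root = ET.fromstring(ADDRESS_XML)

# An element is the start tag, the end tag, and everything in between;
# here we read the text content of the <postal-code> element.
postal_code = root.findtext("postal-code").strip()

# The child elements of <name>, in document order.
name_children = [child.tag for child in root.find("name")]

# An attribute is a name-value pair inside the start tag of <city>.
state = root.find("city").get("state")

print(postal_code, name_children, state)
```

No guessing about commas or break tags is needed: the programme asks for the element or attribute by name.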
Now that you have seen how developers can use XML to create
documents with self-describing data, let us look at how people are using
those documents to improve the web. Here are a few key areas:
pages for someone named "Chip," you might also find pages on
chocolate chips, computer chips, wood chips, and lots of other
useless matches. Searching XML documents for <first-name>
elements that contained the text Chip would give you a much
better set of results.
3.9.1 Overview
If you have looked at HTML documents, you are familiar with the basic
concepts of using tags to mark up the text of a document. This section
discusses the differences between HTML documents and XML
documents. It goes over the basic rules of XML documents, and
discusses the terminology used to describe them.
Notice that the document has a comment that is outside the root element;
that's perfectly legal.
    <?xml version="1.0"?>
    <!-- A well-formed document -->
    <greeting>
      Hello, World!
    </greeting>

A document with more than one root element is not legal, and an XML
parser will reject it:

    <?xml version="1.0"?>
    <!-- An invalid document -->
    <greeting>
      Hello, World!
    </greeting>
    <greeting>
      Hola, el Mundo!
    </greeting>
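You can check the single-root rule with any XML parser. The sketch below uses Python's standard xml.etree.ElementTree module; the helper name is_well_formed is made up for this illustration, and it reports only whether this particular parser accepts the text.

```python
import xml.etree.ElementTree as ET

WELL_FORMED = """<?xml version="1.0"?>
<!-- A well-formed document -->
<greeting>
  Hello, World!
</greeting>"""

TWO_ROOTS = """<?xml version="1.0"?>
<!-- An invalid document -->
<greeting>
  Hello, World!
</greeting>
<greeting>
  Hola, el Mundo!
</greeting>"""


def is_well_formed(text):
    """Return True if the parser accepts the text, False on a parse error."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


print(is_well_formed(WELL_FORMED))
print(is_well_formed(TWO_ROOTS))
```

The comment before the root element causes no trouble; the second <greeting> element does.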
XML elements cannot overlap. Here is some markup that is not legal:
If you begin a <i> element inside a <b> element, you have to end it
there as well. If you want the text XML to appear in italics, you need to
add a second <i> element to correct the markup:
An XML parser will accept only this markup; the HTML parsers in most
web browsers will accept both.
You cannot leave out any end tags. In the first example below, the
markup is not legal because there are no end paragraph (</p>) tags.
While this is acceptable in HTML (and, in some cases, SGML), an XML
parser will reject it.
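The overlap and end-tag rules can be tried out in the same way. In this sketch the four markup strings are hypothetical examples written for this illustration (they are not the listings from the original text), and the helper name parses is likewise an assumption.

```python
import xml.etree.ElementTree as ET


def parses(text):
    """Return True if the XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# <i> starts inside <b> but ends outside it: the elements overlap.
OVERLAPPING = "<p><b>I <i>really love</b> XML.</i></p>"

# The same text with properly nested elements and a second <i> element.
NESTED = "<p><b>I <i>really love</i></b> <i>XML.</i></p>"

# Paragraphs without end tags, as HTML authors often write them.
MISSING_END_TAGS = "<doc><p>First paragraph<p>Second paragraph</doc>"

# Every start tag has its matching end tag.
WITH_END_TAGS = "<doc><p>First paragraph</p><p>Second paragraph</p></doc>"

for sample in (OVERLAPPING, NESTED, MISSING_END_TAGS, WITH_END_TAGS):
    print(parses(sample), sample)
```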
Compare the two examples below. The markup at the top is legal in
HTML, but not in XML. To do the equivalent in XML, you have to give
the attribute a value, and you have to enclose it in quotes.
You can use either single or double quotes, just as long as you are
consistent.
If the value of the attribute contains a single or double quote, you can
use the other kind of quote to surround the value (as in name="Doug's
car"), or use the entities &quot; for a double quote and &apos; for a
single quote. An entity is a symbol, such as &quot;, that the XML parser
replaces with other text, such as ".
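The quoting rules are easy to confirm with a parser. The following sketch, again using Python's xml.etree.ElementTree, parses three hypothetical <car> elements and two hypothetical <ol> start tags; the element names and the helper accepts are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# The value contains a single quote, so double quotes surround it.
mixed_quotes = ET.fromstring("""<car name="Doug's car"/>""")

# The same value inside single quotes, using the &apos; entity.
apos_entity = ET.fromstring("""<car name='Doug&apos;s car'/>""")

# A value containing double quotes, written with the &quot; entity.
quot_entity = ET.fromstring("""<car name="the &quot;fast&quot; car"/>""")


def accepts(text):
    """Return True if the XML parser accepts the text."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False


# HTML-style attributes: one with no value, one with an unquoted value.
NO_VALUE = "<ol compact/>"
UNQUOTED = "<ol compact=yes/>"
QUOTED = '<ol compact="yes"/>'

print(mixed_quotes.get("name"), apos_entity.get("name"), quot_entity.get("name"))
print(accepts(NO_VALUE), accepts(UNQUOTED), accepts(QUOTED))
```

The parser replaces each entity with the character it stands for before your code ever sees the attribute value.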
Most XML documents start with an XML declaration that provides basic
information about the document to the parser. An XML declaration is
recommended, but not required. If there is one, it must be the first thing
in the document.
Finally, standalone, which can be either yes or no, defines whether this
document can be processed without reading any other files. For
example, if the XML document does not reference any other files, you
would specify standalone="yes". If the XML document references other
files that describe what the document can contain (more about those files
in a minute), you could specify standalone="no". Because
standalone="no" is the default, you rarely see standalone in XML
declarations.
3.9.8 Other Things in XML Documents
There are a few other things you might find in an XML document:
Comments: Comments can appear anywhere in the document; they can
even appear before or after the root element. A comment begins with
<!-- and ends with -->. A comment cannot contain a double hyphen (--)
except at the end; with that exception, a comment can contain anything.
Most importantly, any markup inside a comment is ignored; if you want
to remove a large section of an XML document, simply wrap that
section in a comment. (To restore the commented-out section, simply
remove the comment tags.) Here is some markup that contains a
comment:
    <!-- Here's a PI for Cocoon: -->
    <?cocoon-process type="sql"?>
Processing instructions: A processing instruction is markup intended
for a particular piece of code. In the example above, there is a
processing instruction (sometimes called a PI) for Cocoon, an XML
processing framework from the Apache Software Foundation. When
Cocoon is processing an XML document, it looks for processing
instructions that begin with cocoon-process, then processes the XML
document accordingly. In this example, the type="sql" attribute tells
Cocoon that the XML document contains a SQL statement.
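A parser that keeps comments and processing instructions lets you see how they differ from elements. The sketch below uses Python's xml.dom.minidom; the <page>, <query>, and <old> elements are hypothetical and exist only to give the comment and the PI a document to live in. It does not involve Cocoon itself.

```python
from xml.dom import minidom, Node

DOCUMENT = """<?xml version="1.0"?>
<!-- Here's a PI for Cocoon: -->
<?cocoon-process type="sql"?>
<page>
  <!-- <old>markup inside a comment is ignored</old> -->
  <query>SELECT 1</query>
</page>"""

doc = minidom.parseString(DOCUMENT)

# Top-level nodes in document order: comment, processing instruction, root.
top_level = [node.nodeType for node in doc.childNodes]

# A PI has a target (who it is for) and data (what to tell that programme).
pi = doc.childNodes[1]
pi_target, pi_data = pi.target, pi.data

# Markup wrapped in a comment produces no element nodes.
old_elements = doc.getElementsByTagName("old")

print(top_level, pi_target, pi_data, len(old_elements))
```

Note that type="sql" looks like an attribute, but to the parser it is simply the data string of the processing instruction; interpreting it is left to the programme the PI is addressed to.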
3.9.9 Namespaces
XML’s power comes from its flexibility, the fact that you and I and
millions of other people can define our own tags to describe our data.
Remember the sample XML document for a person’s name and address?
That document includes the <title> element for a person’s courtesy title,
a perfectly reasonable choice for an element name. If you run an online
bookstore, you might create a <title> element for the title of a book. If
you run an online mortgage company, you might create a <title>
element for the title to a piece of property. All of those are reasonable
choices, but all of them create elements with the same name. How do
you tell if a given <title> element refers to a person, a book, or a piece of
property? With namespaces.
    <?xml version="1.0"?>
    <customer_summary
      xmlns:addr="http://www.xyz.com/addresses/"
      xmlns:books="http://www.zyx.com/books/"
      xmlns:mortgage="http://www.yyz.com/title/">
    ... <addr:name><title>Mrs.</title> ... </addr:name> ...
    ... <books:title>Lord of the Rings</books:title> ...
    ... <mortgage:title>NC2948-388-1983</mortgage:title> ...
In this example, the three namespace prefixes are addr, books, and
mortgage.
Notice that a prefix applies only to the names that carry it. The first
<title> element is a child of <addr:name>, but because it is written
without a prefix it does not belong to the addr namespace; an unprefixed
element belongs to the default namespace if one has been declared with
xmlns="...", and to no namespace otherwise. To place it in the addr
namespace, write <addr:title>. One final point: The string in a namespace
definition is just an identifier. Yes, these strings look like URLs, but
the parser never retrieves them. You could define xmlns:addr="mike" and
that would work just as well. The only thing that is important about the
namespace string is that it is unique; that is why most namespace
definitions look like URLs. The XML parser does not go to
http://www.zyx.com/books/ to search for a DTD or schema; it simply uses
that text as a string. It is confusing, but that is how namespaces work.
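A namespace-aware parser makes the role of the namespace string visible. The sketch below uses Python's xml.etree.ElementTree, which reports each element name as the namespace string in braces followed by the local name. The document is a completed version of the fragment above (the elided parts are simply left out, and a closing tag is added so that it parses); the prefix map NS is an assumption of this example.

```python
import xml.etree.ElementTree as ET

SUMMARY = """<?xml version="1.0"?>
<customer_summary
    xmlns:addr="http://www.xyz.com/addresses/"
    xmlns:books="http://www.zyx.com/books/"
    xmlns:mortgage="http://www.yyz.com/title/">
  <addr:name><title>Mrs.</title></addr:name>
  <books:title>Lord of the Rings</books:title>
  <mortgage:title>NC2948-388-1983</mortgage:title>
</customer_summary>"""

root = ET.fromstring(SUMMARY)

# Every element whose name ends in "title", as the parser reports it.
tags = [el.tag for el in root.iter() if el.tag.endswith("title")]

# Prefixes used in queries are local to this programme; only the
# namespace strings have to match the ones in the document.
NS = {
    "books": "http://www.zyx.com/books/",
    "mortgage": "http://www.yyz.com/title/",
}
book_title = root.findtext("books:title", namespaces=NS)
deed_title = root.findtext("mortgage:title", namespaces=NS)

print(tags)
print(book_title, deed_title)
```

With this parser the unprefixed <title> comes back with no namespace at all, because a prefix declared on an ancestor applies only to names that actually use that prefix. The two prefixed <title> elements are told apart purely by their namespace strings, and nothing is fetched from those addresses.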
3.10.1 Overview
So far, in this unit you have learned about the basic rules of XML
documents; that is all well and good, but you need to define the elements
you are going to use to represent data. You will learn two ways of doing
that in this section.
The other method is to use an XML Schema. A schema can define all of
the document structures that you can put in a DTD, and it can also
define data types and more complicated rules than a DTD can. The W3C
developed the XML Schema specification a couple of years after the
original XML spec.
This DTD defines all of the elements used in the sample document. It
defines three basic things:
There are a few symbols used in DTDs to indicate how often (or
whether) something may appear in an XML document. Here are some
examples, along with their meanings:
Before going on, a quick note about designing XML document types for
flexibility. Consider the sample name and address document type; I
clearly wrote it with U.S. postal addresses in mind. If you want a DTD
or schema that defines rules for other types of addresses, you would
have to add a lot more complexity to it. Requiring a <state> element
might make sense in Australia, but it would not in the UK. A Canadian
address might be handled by the sample DTD, but adding a <province>
element is a better idea. Finally, be aware that in many parts of the
world, concepts like title, first name, and last name do not make sense.
The bottom line: If you are going to define the structure of an XML
document, you should put as much forethought into your DTD or
schema as you would if you were designing a database schema or a data
structure in an application. The more future requirements you can
foresee, the easier and cheaper it will be for you to implement them
later.
This introductory unit does not go into great detail about how DTDs
work, but there is one more basic topic to cover here: defining attributes.
You can define attributes for the elements that will appear in your XML
document. Using a DTD, you can also:
Suppose that you want to change the DTD to make state an attribute of
the <city> element. Here is how to do that:
This defines the <city> element as before, but the revised example also
uses an ATTLIST declaration to list the attributes of the element. The
name city inside the attribute list tells the parser that these attributes are
defined for the <city> element. The name state is the name of the
attribute, and the keywords CDATA and #REQUIRED tell the parser
that the state attribute contains text and is required (if it's optional,
CDATA #IMPLIED will do the trick).
Finally, DTDs allow you to define default values for attributes and
enumerate all of the valid values for an attribute:
The example here indicates that it only supports addresses from the
states of Arizona (AZ), California (CA), Nevada (NV), Oregon (OR),
Utah (UT), and Washington (WA), and that the default state is
California. Thus, you can do a very limited form of data validation.
While this is a useful function, it is a small subset of what you can do
with XML schemas.
With XML schemas, you have more power to define what valid XML
documents look like. They have several advantages over DTDs:
Here is an XML schema that matches the original name and address
DTD. It adds two constraints: The value of the <state> element must be
exactly two characters long and the value of the <postal-code> element
must match the regular expression [0-9]{5}(-[0-9]{4})?. Although the
schema is much longer than the DTD, it expresses more clearly what a
valid document looks like. Here's the schema:
      </xsd:element>
      <xsd:element name="title" type="xsd:string"/>
      <xsd:element name="first-Name" type="xsd:string"/>
      <xsd:element name="last-Name" type="xsd:string"/>
      <xsd:element name="street" type="xsd:string"/>
      <xsd:element name="city" type="xsd:string"/>
      <xsd:element name="state">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:length value="2"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:element>
      <xsd:element name="postal-code">
        <xsd:simpleType>
          <xsd:restriction base="xsd:string">
            <xsd:pattern value="[0-9]{5}(-[0-9]{4})?"/>
          </xsd:restriction>
        </xsd:simpleType>
      </xsd:element>
    </xsd:schema>

    <xsd:element name="address">
      <xsd:complexType>
        <xsd:sequence>
          <xsd:element ref="name"/>
          <xsd:element ref="street"/>
          <xsd:element ref="city"/>
          <xsd:element ref="state"/>
          <xsd:element ref="postal-code"/>
        </xsd:sequence>
      </xsd:complexType>
    </xsd:element>

As in the DTD version, the XML schema example defines that an
<address> contains a <name>, a <street>, a <city>, a <state>, and
a <postal-code> element, in that order. Notice that the schema
Most of the elements contain text; defining them is simple. You merely
declare the new element, and give it a datatype of xsd:string:
The sample schema defines constraints for the content of two elements:
The content of a <state> element must be two characters long, and the
content of a <postal-code> element must match the regular expression
[0-9]{5}(-[0-9]{4})?. Here's how to do that:
    <xsd:element name="state">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:length value="2"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:element>
    <xsd:element name="postal-code">
      <xsd:simpleType>
        <xsd:restriction base="xsd:string">
          <xsd:pattern value="[0-9]{5}(-[0-9]{4})?"/>
        </xsd:restriction>
      </xsd:simpleType>
    </xsd:element>
For the <state> and <postal-code> elements, the schema defines new
data types with restrictions. The first case uses the <xsd:length>
element, and the second uses the <xsd:pattern> element to define a
regular expression that this element must match.
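To get a feel for what these two restrictions accept, you can mimic them outside a schema processor. The sketch below is not schema validation; it only reproduces the two checks with Python's standard re module, and the function names are made up for this illustration. An XML Schema pattern has to match the whole value, so fullmatch() is the closest analogue.

```python
import re

# The same regular expression as in the <xsd:pattern> element.
POSTAL_CODE = re.compile(r"[0-9]{5}(-[0-9]{4})?")


def valid_state(value):
    """Mimic <xsd:length value="2"/>: exactly two characters."""
    return len(value) == 2


def valid_postal_code(value):
    """Mimic the <xsd:pattern> restriction: the whole value must match."""
    return POSTAL_CODE.fullmatch(value) is not None


for candidate in ("34829", "34829-1234", "3482", "34829-12"):
    print(candidate, valid_postal_code(candidate))
print(valid_state("NC"), valid_state("N.C."))
```

Five digits, optionally followed by a hyphen and four more digits, pass; anything shorter or with a partial extension fails.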
This summary only scratches the surface of what XML schemas can do;
there are entire books written on the subject. For the purpose of this
introduction, suffice to say that XML schemas are a very powerful and
flexible way to describe what a valid XML document looks like.
3.11.1 Overview
The Document Object Model, commonly called the DOM, defines a set
of interfaces to the parsed version of an XML document. The parser
reads in the entire document and builds an in-memory tree, so your code
can then use the DOM interfaces to manipulate the tree. You can move
through the tree to see what the original document contained; you can
delete sections of the tree; and you can rearrange the tree, add new
branches, and so on. The DOM was created by the W3C, and is an Official
Recommendation of the consortium.
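The operations just listed can be tried with any DOM implementation. Here is a minimal sketch using xml.dom.minidom from Python's standard library, which implements a subset of the W3C DOM interfaces; the small address document is a cut-down version of the earlier sample, chosen for this illustration.

```python
from xml.dom import minidom

# The parser reads the whole document and builds an in-memory tree.
doc = minidom.parseString(
    "<address><street>1401 Main Street</street>"
    "<city>Anytown</city><state>NC</state></address>"
)
root = doc.documentElement

# Move through the tree to see what the document contained.
before = [child.tagName for child in root.childNodes]

# Delete a section of the tree.
state = root.getElementsByTagName("state")[0]
root.removeChild(state)

# Add a new branch.
postal = doc.createElement("postal-code")
postal.appendChild(doc.createTextNode("34829"))
root.appendChild(postal)

# Rearrange the tree: move <city> in front of <street>.
city = root.getElementsByTagName("city")[0]
street = root.getElementsByTagName("street")[0]
root.insertBefore(city, street)

after = [child.tagName for child in root.childNodes]
print(before, after)
print(root.toxml())
```

Everything happens on the in-memory tree; the original text is untouched until you serialise the tree again.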
The DOM provides a rich set of functions that you can use to interpret
and manipulate an XML document, but those functions come at a price.
As the original DOM for XML documents was being developed, a
number of people on the XML-DEV mailing list voiced concerns about
it:
• A SAX parser sends events to your code. The parser tells you
when it finds the start of an element, the end of an element, text,
the start or end of the document, and so on. You decide which
events are important to you, and you decide what kind of data
structures you want to create to hold the data from those events.
If you do not explicitly save the data from an event, it is
discarded.
• A SAX parser does not create any objects at all; it simply delivers
events to your application. If you want to create objects based on
those events, it is up to you.
• A SAX parser starts delivering events to you as soon as the parse
begins. Your code will get an event when the parser finds the
start of the document, when it finds the start of an element, when
it finds text, and so on. Your application starts generating results
right away; you do not have to wait until the entire document has
been parsed.
Even better, if you are only looking for certain things in the document,
your code can throw an exception once it has found what it is looking for.
The exception stops the SAX parser, and your code can do whatever it
needs to do with the data it has found.
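The event model and the early-exit trick can be sketched with Python's standard xml.sax module. The handler class, the FoundIt exception, and the <note> element are all hypothetical names invented for this example; only the ContentHandler callbacks (startElement, endElement, characters) come from the SAX interface itself.

```python
import xml.sax


class FoundIt(Exception):
    """Raised by the handler to stop parsing once the value is known."""


class PostalCodeFinder(xml.sax.ContentHandler):
    def __init__(self):
        super().__init__()
        self.events = []            # the events this handler chose to record
        self.in_postal_code = False
        self.postal_code = ""

    def startElement(self, name, attrs):
        self.events.append(("start", name))
        if name == "postal-code":
            self.in_postal_code = True

    def characters(self, content):
        # Text is kept only if we save it; everything else is discarded.
        if self.in_postal_code:
            self.postal_code += content

    def endElement(self, name):
        self.events.append(("end", name))
        if name == "postal-code":
            raise FoundIt()         # nothing more to look for


DOCUMENT = (
    b"<address><city>Anytown</city>"
    b"<postal-code>34829</postal-code>"
    b"<note>never reached</note></address>"
)

finder = PostalCodeFinder()
try:
    xml.sax.parseString(DOCUMENT, finder)
except FoundIt:
    pass

print(finder.postal_code)
print(finder.events)
```

The handler never receives an event for <note>, because the exception ends the parse as soon as the postal code is complete.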
Having said all of these things, both SAX and DOM have their place.
The remainder of this section discusses why you might want to use one
interface or the other.
To be fair, SAX parsers also have issues that can cause concern:
• SAX events are stateless. When the SAX parser finds text in an
XML document, it sends an event to your code. That event
simply gives you the text that was found; it does not tell you what
element contains that text. If you want to know that, you have to
write the state management code yourself.
• SAX events are not permanent. If your application needs a data
structure that models the XML document, you have to write that
code yourself. If you need to access data from a SAX event, and
you did not store that data in your code, you have to parse the
document again.
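The state management mentioned in the first point usually amounts to keeping a stack of the elements that are currently open. The following sketch, using Python's xml.sax, shows one way to do it; the class name, the <order> document, and the path-style keys are assumptions made for this illustration.

```python
import xml.sax


class PathTrackingHandler(xml.sax.ContentHandler):
    """Keeps a stack of open elements so text can be tied to its context."""

    def __init__(self):
        super().__init__()
        self.stack = []
        self.text_by_path = {}

    def startElement(self, name, attrs):
        self.stack.append(name)

    def endElement(self, name):
        self.stack.pop()

    def characters(self, content):
        # The event carries only the text; the stack supplies the context.
        path = "/".join(self.stack)
        self.text_by_path[path] = self.text_by_path.get(path, "") + content


handler = PathTrackingHandler()
xml.sax.parseString(
    b"<order><name><title>Mrs.</title></name>"
    b"<book><title>Lord of the Rings</title></book></order>",
    handler,
)

print(handler.text_by_path)
```

Both pieces of text arrive through the same characters() callback; only the stack tells the courtesy title from the book title. The dictionary also serves as the permanent copy of the data, since the parser itself keeps nothing.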
3.11.6 JDOM
Frustrated by the difficulty in doing certain tasks with the DOM and
SAX models, Jason Hunter and Brett McLaughlin created the JDOM
package. JDOM is a Java technology-based, open source project that
attempts to follow the 80/20 rule: Deliver what 80% of users need with
20% of the functions in DOM and SAX. JDOM works with SAX and
DOM parsers, so it is implemented as a relatively small set of Java
classes.
The main feature of JDOM is that it greatly reduces the amount of code
you have to write. Although this introductory unit does not discuss
programming topics in depth, JDOM applications are typically one-third
as long as DOM applications, and about half as long as SAX
applications. (DOM purists, of course, suggest that learning and using
the DOM is good discipline that will pay off in the long run.) JDOM
does not do everything, but for most of the parsing you want to do, it is
probably just the thing.
Although DOM, SAX, and JDOM provide standard interfaces for most
common tasks, there are still several things they do not address. For
example, the process of creating a DOMParser object in a Java
programme differs from one DOM parser to the next. To fix this
problem, Sun has released JAXP, the Java API for XML Parsing. This
API provides common interfaces for processing XML documents using
DOM, SAX, and XSLT. JAXP provides interfaces such as the
DocumentBuilderFactory and the DocumentBuilder that provide a
standard interface to different parsers. There are also methods that allow
you to control whether the underlying parser is namespace-aware and
whether it uses a DTD or schema to validate the XML document.
3.12.1 Overview
3.12.5 DOM
• The Core DOM defines the DOM itself, the tree structure, and
the kinds of nodes and exceptions your code will find as it moves
through the tree. The complete spec is at w3.org/TR/DOM-Level-
2-Core/.
• Events define the events that can happen to the tree, and how
those events are processed. This specification is an attempt to
reconcile the differences in the object models supported by
Netscape and Internet Explorer since Version 4 of those
browsers. This spec is at w3.org/TR/DOM-Level-2-Events/ .
• Style defines how XSLT style sheets and CSS style sheets can be
accessed by a programme. This spec is at w3.org/TR/DOM-Level-
2-Style/
• Traversals and Ranges define interfaces that allow programmes
to traverse the tree or define a range of nodes in the tree. You can
find the complete spec at w3.org/TR/DOM-Level-2-Traversal-
Range/.
• Views define an AbstractView interface for the document itself.
See w3.org/TR/DOM-Level-2-Views/ for more information.
The Simple API for XML defines the events and interfaces used to
interact with a SAX-compliant XML parser. You can find the complete
SAX specification at www.saxproject.org.
The JDOM project was created by Jason Hunter and Brett McLaughlin
and lives at jdom.org/. At the JDOM site, you can find code, sample
programmes, and other tools to help you get started. (For
developerWorks articles on JDOM, see Resources on page 32).
One significant point about SAX and JDOM is that both of them came
from the XML developer community, not a standards body. Their wide
acceptance is a tribute to the active participation of XML developers
worldwide.
There are two standards for linking and referencing in the XML world:
XLink and XPointer:
3.12.8 Security
There are two significant standards that address the security of XML
documents. One is the XML Digital Signature standard
(w3.org/TR/xmldsig-core/), which defines an XML document structure
for digital signatures. You can create an XML digital signature for any
kind of data, whether it is an XML document, an HTML file, plain text,
binary data, and so on. You can use the digital signature to verify that a
particular file was not modified after it was signed. If the data you are
signing is an XML document, you can embed the XML document in the
signature file itself, which makes processing the data and the signature
very simple.
query the registry to find what you want. The source of all things
UDDI is uddi.org.
Finally, for a good source of XML standards, visit the XML Repository
at xml.org/xml/registry.jsp. This site features hundreds of standards for a
wide variety of industries.
a. Real-world examples
At this point, I hope you are convinced that XML has tremendous
potential to revolutionise the way eBusiness works. While
potential is great, what really counts is actual results in the
marketplace. This section describes three case studies in which
organisations have used XML to streamline their business
processes and improve their results.
All of the case studies discussed here come from IBM's jStart program.
The jStart team exists to help customers use new technologies to solve
problems. When a customer agrees to a jStart engagement, the customer
receives IBM consulting and development services at a discount, with
the understanding that the resulting project will be used as a case study.
If you would like to see more case studies, including case studies
involving web services and other new technologies, visit the jStart Web
page at ibm.com/software/jstart.
Be aware that the jStart team is no longer doing engagements for XML
projects; the team's current focus is Web services engagements. Web
services use XML in a specialized way, typically through the SOAP,
WSDL, and UDDI standards mentioned earlier in Web services.
b. A messaging-based system
Using Java technology and XML has been very successful for First
Union. According to Bill Barnett, Manager of the Distributed Object
Integration Team at First Union, "The combination of Java and XML
really delivered for us. Without a platform-independent environment
like Java and the message protocol independence we received from the
use of XML, we would not have the confidence that our distributed
infrastructure could evolve to meet the demand from our ever-growing
customer base."
4.0 CONCLUSION
At this point, I hope you are convinced that XML is the best way to
move and manipulate structured data. If you are not using XML already,
how do you get started? Here are some suggestions:
5.0 SUMMARY
The dW XML zone is your one-stop shop for XML resources. See
www-106.ibm.com/developerworks/xml for everything you always
wanted to know about XML. XML tools: developerWorks has "Fill
your XML toolbox" articles that describe XML programming tools for a
variety of languages:
IBM's jStart team: The jStart team works at very low cost to help
customers build solutions using new technology (XML Web
services, for example). In return, those customers agree to let
IBM publicize their projects as a case study.
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 General Introduction
3.2 XML and Relational - Opposites Attract
3.3 XML and Relational: Four Approaches
3.4 SQL/XML
3.5 XML Publishing Functions
3.6 The XML Datatype
3.7 SQL/XML Mapping Rules
3.8 XQuery and Native XML Programming
3.9 Native XML Programming
3.9.1 XML is not Objects!
3.9.2 XML is not just text!
3.9.3 What should a Native XML Programming
Language do?
3.10 XQuery and SQL/XML Views
3.11 Spanning Sources: XQuery, Web Messages, and
Databases
3.12 XQuery for Java (JSR 225)
3.13 SQL/XML and XQuery: Do we need both?
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
2.0 OBJECTIVES
• define XML
• write and use XML Queries to solve real life problems.
Note
XQuery is a completely new query language that uses XML as the basis
for its data model and type system. It is being developed in the XML
Query Working Group [XQWG], which is a part of the World Wide
Web Consortium. In this paper, we characterise XQuery as a "Native
XML Programming Language". XQuery is based on XML in the same
way that SQL is based on the relational model or object-oriented
languages are based on the object-oriented model - XML is central to its
type system, in which elements and attributes are just as fundamental as
integers and strings. Although XQuery per se has no concept of
relational data, several products and many projects provide ways to
query relational data using an XML view of the database, and the need
to make this possible has influenced the design of XQuery throughout
its development. XQuery allows you to work in the XML world no
matter what type of data you are working with - relational, XML or
object data.
XQuery is ideal for native XML programming. When used with XML
views of relational data, it is also ideal for queries that must
represent results as XML, to query XML stored inside or outside the
database, or to span relational and XML sources.
For queries based only on relational data, SQL/XML and XQuery have
substantially similar functionality. However, the way in which a given
task is done is quite different, since SQL/XML operates on the
borderline between SQL and XML, and XQuery lives in a purely XML
world. Even when the data is all relational, the two languages appeal to
very different audiences - SQL/XML is very much an extension of SQL,
designed for SQL programmers, and XQuery takes a purely XML view
of the world. For queries that span relational and XML sources, XQuery
has important advantages.
XML and relational databases are tightly wed in most web applications,
but a look at the two models shows that it is an unlikely marriage -
though a necessary one. The relational model is based on two-dimensional
tables, which have neither hierarchy nor significant order.
XML is based on trees in which order is significant. In the relational
model, neither hierarchy nor sequence may be used to model
information; in XML, hierarchy and sequence are the main ways to
represent information. Although this is one of the more fundamental
differences between the two models, it is by no means the only one.
But most of the data for these web pages comes from relational
databases, and needs to be converted to appropriate XML hierarchies.
Note that in the original SQL tables, each customer is represented only
once. This is also true of the XML. The SQL result set, however,
contains multiple rows for a given customer if that customer is
associated with more than one project, and these rows contain duplicate
information. Translating this result set into the desired XML is tedious
for the programmer. And just as a single relational database may be used
with an infinite number of queries, it may also be used to create an
infinite number of XML documents with different structures. Today,
many programmers spend a great deal of time doing this kind of
translation.
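The translation work described above can be sketched in a few lines. The example below uses Python's sqlite3 and xml.etree.ElementTree modules as stand-ins for a real database and XML toolkit; the table names, column names, and sample rows are invented for this illustration and are not the tables used elsewhere in this unit.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE projects (proj_id INTEGER PRIMARY KEY, cust_id INTEGER, name TEXT);
    INSERT INTO customers VALUES (1, 'Woodworks'), (2, 'Software Solutions');
    INSERT INTO projects VALUES (10, 1, 'Medusa'), (11, 1, 'Pegasus'), (12, 2, 'Typhon');
""")

# The join repeats the customer's name once for every project.
rows = conn.execute("""
    SELECT c.cust_id, c.name, p.name
    FROM customers c JOIN projects p ON p.cust_id = c.cust_id
    ORDER BY c.cust_id, p.proj_id
""").fetchall()

# Fold the flat rows back into one <customer> element per customer.
root = ET.Element("customers")
by_id = {}
for cust_id, cust_name, proj_name in rows:
    customer = by_id.get(cust_id)
    if customer is None:
        customer = ET.SubElement(root, "customer", id=str(cust_id))
        ET.SubElement(customer, "name").text = cust_name
        by_id[cust_id] = customer
    ET.SubElement(customer, "project").text = proj_name

xml_text = ET.tostring(root, encoding="unicode")
print(rows)
print(xml_text)
```

The bookkeeping in the loop (remembering which customers have already been seen) is exactly the kind of code that has to be rewritten for every new XML structure.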
XML applications that use relational data can choose from four
approaches, each with distinct advantages and disadvantages. The first
three of these are compared in some detail, with code samples, in
[SQL/XML-JDBC].
The programmer can use JDBC or ODBC together with SAX or DOM
and perhaps XSLT to transform the results of SQL queries to XML. For
instance, the programme might first query for customers, then perform
an additional query to find the projects associated with each customer.
This is inefficient because of the number of queries required.
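The cost of this first approach is easy to make visible. The sketch below follows the query-per-customer pattern, with Python's sqlite3 standing in for JDBC or ODBC and ElementTree standing in for DOM; the tables, the sample rows, and the run() helper that logs each statement are assumptions made for this illustration.

```python
import sqlite3
import xml.etree.ElementTree as ET

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (cust_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE projects (proj_id INTEGER PRIMARY KEY, cust_id INTEGER, name TEXT);
    INSERT INTO customers VALUES (1, 'Woodworks'), (2, 'Software Solutions');
    INSERT INTO projects VALUES (10, 1, 'Medusa'), (11, 1, 'Pegasus'), (12, 2, 'Typhon');
""")

query_log = []  # every SQL statement sent to the database


def run(sql, params=()):
    """Execute a query and record that it was issued."""
    query_log.append(sql)
    return conn.execute(sql, params).fetchall()


root = ET.Element("customers")
for cust_id, cust_name in run("SELECT cust_id, name FROM customers ORDER BY cust_id"):
    customer = ET.SubElement(root, "customer")
    ET.SubElement(customer, "name").text = cust_name
    # One additional query for each customer found by the first query.
    project_rows = run(
        "SELECT name FROM projects WHERE cust_id = ? ORDER BY proj_id",
        (cust_id,),
    )
    for (proj_name,) in project_rows:
        ET.SubElement(customer, "project").text = proj_name

project_names = [p.text for p in root.iter("project")]
print(len(query_log), project_names)
```

Two customers already cost three queries; a thousand customers would cost a thousand and one.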
The programmer can use the XML extensions provided by the major
database vendors. These are based on several different approaches.
Some of these are simpler to use or easier to maintain than others, but
they all make the task easier. However, since these extensions are all
proprietary, they are not an option when a database-independent solution
is needed.
The programmer can use SQL/XML, which is part of SQL 2003. For a
SQL programmer, this approach requires little new learning - a small set
of XML publishing functions has been added to SQL to allow queries
to create any desired XML structure. This approach will be explored
with examples in the next section. SQL/XML is being supported by
Oracle and IBM, but not by Microsoft. Database-independent
implementations of SQL/XML are also available, and can be used with
any major relational database. SQL/XML can be used with traditional
database APIs such as JDBC.
The programmer can use XQuery, a native XML query language. Since
XQuery is a new language, it requires more learning for SQL
programmers, but it is likely to be more natural for XML programmers.
Unlike SQL/XML, XQuery is optimal for processing XML, and it is
also particularly good for applications that must process XML together
with relational data, with full support for XML. Most of the major
database vendors intend to support XQuery. The first standardized API
for XQuery, XQuery for Java (JSR 225), is now being developed under
3.4 SQL/XML
The XML Publishing Functions are the part that is used directly in a
SQL query. The XML Datatype governs the result of a query, and the
Mapping Rules determine how SQL data or metadata is represented as
XML.
The XML Publishing Functions allow SQL to create any desired XML
structure. They are part of SQL 2003, and can be used in normal SQL
expressions. Here are the XML publishing functions of SQL 2003:
Let us compare a traditional SQL query with one that uses an XML
publishing function. Here is a traditional SQL query that shows
customers and their associated projects:
Now let us wrap the result in XML elements using xmlelement(), one of
the publishing functions:
This output contains two rows, with one element in each row.
Subqueries in SQL/XML are allowed to return only one row; therefore,
to return more than one row of values in a SQL/XML subquery, they
must be combined to form a single value. xmlagg() is an XML
publishing function that produces a forest of elements by collecting the
XML values that are returned from multiple rows and concatenating the
values to make one value. Here is a query that uses the above subquery
to create the XML output from the previous section:
The above query illustrates a very common pattern used to create XML
hierarchies using SQL/XML.
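The shape this pattern produces can be mimicked outside the database. The following is a minimal Python sketch, not SQL/XML itself: it assumes a hypothetical `customers` and `projects` schema in an in-memory SQLite database (SQLite has no XML publishing functions), and the helpers `xml_element` and `xml_agg` are illustrative stand-ins for xmlelement() and xmlagg(), not real library calls.

```python
import sqlite3
import xml.etree.ElementTree as ET


def xml_element(name, *children, text=None):
    """Illustrative stand-in for xmlelement(): build one element."""
    elem = ET.Element(name)
    if text is not None:
        elem.text = str(text)
    for child in children:
        elem.append(child)
    return elem


def xml_agg(rows, make_element):
    """Illustrative stand-in for xmlagg(): collect the elements built from
    many rows into one list (a "forest") that fits inside a single parent."""
    return [make_element(row) for row in rows]


def customers_as_xml(conn):
    """Return one serialized <customer> element per customer row."""
    result = []
    for cust_id, cust_name in conn.execute(
        "SELECT CustId, Name FROM customers ORDER BY CustId"
    ):
        # The inner query plays the role of the SQL/XML subquery.
        project_rows = conn.execute(
            "SELECT ProjName FROM projects WHERE CustId = ? ORDER BY ProjName",
            (cust_id,),
        ).fetchall()
        projects = xml_agg(
            project_rows, lambda row: xml_element("project", text=row[0])
        )
        customer = xml_element(
            "customer", xml_element("name", text=cust_name), *projects
        )
        result.append(ET.tostring(customer, encoding="unicode"))
    return result


def build_sample_db():
    """Create a tiny in-memory database with made-up sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE customers (CustId INTEGER PRIMARY KEY, Name TEXT);
        CREATE TABLE projects (
            ProjId INTEGER PRIMARY KEY, CustId INTEGER, ProjName TEXT
        );
        INSERT INTO customers VALUES (1, 'Acme'), (2, 'Globex');
        INSERT INTO projects VALUES
            (10, 1, 'Billing'), (11, 1, 'Payroll'), (12, 2, 'Inventory');
        """
    )
    return conn


if __name__ == "__main__":
    for row in customers_as_xml(build_sample_db()):
        print(row)
```

Each customer row yields a single XML value, and all of that customer's project rows are folded into it. This is the job xmlagg() performs inside the database, without a second round trip from the application.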
The XML Datatype is a datatype in the same way that integer, date, or
CLOB are datatypes in SQL. Since SQL/XML allows a query to create
XML instances, there must be a datatype that corresponds to these
instances. It is anticipated that the XML Datatype will be supported in
JDBC 4.0. It is too early to say exactly how it will be used in that
specification, but it is likely that it will retrieve XML values much like
other values, and that XML values can be retrieved as text, DOM, or
SAX events. This is the approach currently taken by DataDirect Connect
for SQL/XML. To illustrate this, let us use a SQL/XML query to create
a table with two columns, an integer containing the CustId and an XML
column containing the XML output from the previous query. Here is the
query:
The XML Type also plays a second important role: relational databases
now routinely store XML in individual columns, and the XML Type
provides a standard type for such columns, which is useful both in SQL
and in JDBC.
The XML publishing functions use SQL values to create XML values,
and these XML values have W3C XML Schema types. When we
discussed the XML publishing functions, we did not address specifically
how the XML representation is determined. The mapping rules of
SQL/XML describe in excruciating detail how SQL values can be
mapped to and from XML values, and how SQL metadata can be
mapped to and from W3C XML Schemas.
To give a flavor for the level of detail in which this is specified, here are
the equivalent headings from the SQL/XML specification’s table of
contents:
These mappings are also defined on the metadata level. For instance,
SQL/XML defines how the datatypes of SQL are represented in the
equivalent XML Schema. Each SQL type is derived from an equivalent
built-in W3C XML Schema type. Where needed, facets are used to
represent constraints added to those of the base type:
XML is the basis of XQuery's type system and data model. The
fundamental types of XQuery include the kinds of nodes found in XML
documents: document nodes, elements, attributes, processing
instructions, comments, and text nodes. XQuery also supports the built-
in datatypes of W3C XML Schema for representing integers, strings,
dates, and other datatypes - these built-in datatypes are predefined in
XQuery, and are available with or without a schema.
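The node kinds listed above can be seen in any parsed XML document. The sketch below uses Python's standard `xml.dom.minidom` parser purely as an illustration; the DOM is not the XQuery data model, and the sample document and the `node_kinds` helper are assumptions made for this example.

```python
from xml.dom import Node, minidom

# Map DOM node-type constants to the node kinds named in the text.
KIND_NAMES = {
    Node.DOCUMENT_NODE: "document",
    Node.ELEMENT_NODE: "element",
    Node.ATTRIBUTE_NODE: "attribute",
    Node.PROCESSING_INSTRUCTION_NODE: "processing-instruction",
    Node.COMMENT_NODE: "comment",
    Node.TEXT_NODE: "text",
}


def node_kinds(node):
    """Yield the kind of every node in document order, attributes included."""
    yield KIND_NAMES.get(node.nodeType, "other")
    if node.nodeType == Node.ELEMENT_NODE:
        # Attributes are not child nodes in the DOM, so visit them explicitly.
        for i in range(node.attributes.length):
            yield KIND_NAMES.get(node.attributes.item(i).nodeType, "other")
    for child in node.childNodes:
        yield from node_kinds(child)


# Made-up sample containing one node of each kind.
SAMPLE = (
    '<?xml version="1.0"?><?audit on?>'
    '<customer id="1"><!--vip-->Acme</customer>'
)

if __name__ == "__main__":
    print(list(node_kinds(minidom.parseString(SAMPLE))))
```

A query language whose type system is built on these node kinds can address each of them directly, instead of receiving the document as one long string.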
Most other languages used to process XML, including Java, C#, Perl,
and Python, are not. SQL/XML is fundamentally an extension to a
relational query language, providing a bridge to XML.
This solution would have been much messier if Adam had not used the
path expressions of XPath, a simple Native XML language. In XQuery,
path expressions are part of the language, and numeric conversions are
automatically done for untyped data. If the data is validated against a
schema, the types assigned by the schema are used. This makes it
possible to solve the same problem much more simply:
In this unit, we assume that XML will remain as is, and that for
general processing, the best approach is to use an XML parser to build a
data model instance from the XML documents, and query the data
model instance. Not everybody believes this is the best approach. Tim
Bray, one of the editors of the original XML specification, objects to the
Native XML Programming solution because he objects to the notion of
an XML data model: [Bray]
The notion that there is an "XML data model" is silly and unsupported
by real-world evidence. The definition of XML is syntactic: the
"Infoset" is an afterthought and in any case is far indeed from being a
data model specification that a programmer could work with. Empirical
evidence:
XQuery, XPath, and XSLT now use one common data model, which can
represent both XML and the XML Schema datatypes. Although it would
have been convenient if XML had defined a data model, there is no
requirement that the data model used by a Native XML Programming
Language be the same as any particular data model used in a Java API.
As long as the data model supports the structure of XML directly,
Tim also suggests that XML is "syntactic", as though this implies that
there is no data model. This implies that syntax and structure are
opposites, which is rather surprising, since the purpose of syntax is to
describe the structure of a language. In the XML Recommendation, the
structure that corresponds to a data model is called the logical structure:
When XQuery uses the syntax of XML, a curly brace escapes to the
syntax of XQuery, allowing dynamic expressions to be inserted. Here is
an example that creates a customer with a new unique identifier:
combines customers and projects to show the name of a customer and all
projects associated with that customer:
Some people seem to believe that the purpose of XQuery is largely the
same as that of SQL/XML - to allow XML structures to be created from
relational data. Although XQuery is useful for this task, it has relatively
few advantages over SQL/XML when this is all that is required. The
reason for this is simple: SQL is a language designed for handling SQL
data sources, and it does that very well. Adding XML publishing
functions to SQL is a simple way to let it create XML. However, it is
interesting to note that the SQL/XML views of relational tables have a
very constrained structure, and XQuery performed on such views is
generally quite similar to the equivalent SQL/XML.
According to the SOAP Primer, the proper response is to point out that
there are three departure airports for New York, so that the user can
be prompted to pick one. Here is the desired output:
We will assume that when there is only one airport for a city, the output
should simply list that city, and that an error should be raised if there is
no airport for a given city. The following XQuery handles all three of
these cases:
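The three cases can also be stated as ordinary program logic. The following Python sketch is not the XQuery referred to above. It only restates the decision rule, and the lookup table, the airport codes and the function name `resolve_departure` are hypothetical values chosen for illustration.

```python
# Hypothetical lookup table; city names and airport codes are illustrative only.
AIRPORTS_BY_CITY = {
    "New York": ["JFK", "LGA", "EWR"],
    "Los Angeles": ["LAX"],
}


def resolve_departure(city, airports_by_city=AIRPORTS_BY_CITY):
    """Apply the three cases described in the text to one departure city."""
    airports = airports_by_city.get(city, [])
    if not airports:
        # No airport at all for the city: raise an error.
        raise LookupError(f"No airport found for city: {city}")
    if len(airports) == 1:
        # Exactly one airport: simply list the city.
        return {"city": city}
    # Several airports: return the choices so the user can pick one.
    return {"city": city, "choices": list(airports)}


if __name__ == "__main__":
    print(resolve_departure("New York"))
    print(resolve_departure("Los Angeles"))
```

In XQuery the same rule is written directly against the XML data, so no separate lookup structure has to be built in a host language.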
Note that this code operates at a level very close to the application
domain, rather than navigating XML documents and converting from
XML to appropriate types in the host language. XML data sources and
relational data sources are treated in the same way - to the query, they
both look like XML documents.
SQL programmers are used to using APIs such as ODBC or JDBC to set
up the environment, execute queries, and do processing in the business
domain using the data returned by a query. Similar APIs are expected to
emerge for XQuery. The first standard API for this purpose is now being
developed under the Java Community Process. It is known as XQuery for
Java (XQJ), or JSR 225.
Although SQL/XML and XQuery are both XML query standards, they
are based on quite different models, and fit best in different
architectures. SQL/XML fits cleanly into the relational model as a
reasonably small extension to traditional SQL. This means that it works
well in traditional SQL environments, providing full access to the
existing SQL language, including features like updates and full-text
queries that are not going to be part of XQuery
known. Also, it has existing APIs, including ODBC and JDBC. In short,
SQL/XML provides the functionality needed for creating XML from
relational data while still fitting cleanly into the existing SQL
environment. SQL/XML implementations will be available from Oracle
and IBM, but not Microsoft, and a cross-database implementation is
available from DataDirect Technologies. Oracle's implementation also
provides functionality for querying and processing XML as well as
SQL, and there is some interest in adding extensions along these lines to
SQL/XML. Some members of the SQL/XML task force would also like
to see parts of XQuery added to SQL/XML.
XQuery fits more cleanly into the XML environment, providing Native
XML Programming for both XML sources and non-XML sources accessed
via an XML view. It is well designed for combining data from multiple
sources, and is very efficient for a variety of XML programming tasks.
However, XQuery is a brand new language: at the time of writing,
XQuery 1.0 is merely a Working Draft, not likely to emerge until the
second half of 2004. There is a great deal of enthusiasm surrounding
XQuery, most major database vendors have announced support for it, and
there is a great deal of research on optimizing it. Even so, the
industry has little experience optimizing XQuery, and it lacks some
features, including updates and full-text queries, which are very
important for some kinds of tasks. Also, the API for XQuery, XQuery
for Java (JSR 225), is only now being developed.
4.0 CONCLUSION
In this unit you have learned about SQL/XML and XQuery. Both will play
an important role in XML queries, and XQuery is likely to become very
important for general-purpose XML processing.
5.0 SUMMARY
XQuery is best for XML programmers who are working only with
XML, or need to work with XML and relational data together. In the
short term, implementers and users of XQuery should be aware that it is
both new and revolutionary - it shows great promise, but we have less
industry experience with XQuery than with SQL/XML.
CONTENTS
1.0 Introduction
2.0 Objectives
3.0 Main Content
3.1 Data Integrity and Reliability
3.2 Database Recovery
3.3 Database Recovery Log
3.3.1 Definition of Data Recovery
3.3.2 Several Techniques for Damaged Media
3.4 Classification Criteria for Heterogeneous Database
3.4.1 Database Sharing in a Heterogeneous Database System
4.0 Conclusion
5.0 Summary
6.0 Tutor-Marked Assignment
7.0 References/Further Reading
1.0 INTRODUCTION
2.0 OBJECTIVES
Computer systems can fail, hardware can break down, and programs have
bugs. Human procedures contain errors and people make mistakes. All
these failures occur in database applications. It is therefore important
to recover the database as soon as possible and without any damage. This
can be achieved by going back to a known point and reprocessing the
workload from there. The simplest way to do this is to make a copy of
the database periodically and keep a record of all transactions that
have been processed since. Database recovery can be done in two ways.
Rollforward: The database is restored using the saved data, and all valid
transactions since the save are reapplied.
all committed transactions, which may not have been physically written
to disk, are redone. These actions ensure the integrity of the database.
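Redoing committed work from a saved copy and a transaction record can be sketched in a few lines. This Python example is a simplified illustration only: the log format (a list of tuples), the account names and the function name `roll_forward` are assumptions made for the sketch, not the format used by any particular database system.

```python
def roll_forward(saved_copy, log):
    """Restore from the saved copy, then reapply every committed transaction.

    `log` is a list of (txn_id, account, amount, committed) tuples. This is
    a simplified, hypothetical log format used only for this sketch.
    """
    database = dict(saved_copy)  # start from the last periodic save
    for txn_id, account, amount, committed in log:
        if committed:
            # Only valid (committed) work recorded since the save is redone.
            database[account] = database.get(account, 0) + amount
    return database


if __name__ == "__main__":
    saved_copy = {"A": 100, "B": 50}
    log = [
        (1, "A", -30, True),    # committed: reapplied
        (2, "B", 30, True),     # committed: reapplied
        (3, "A", -500, False),  # never committed: ignored
    ]
    print(roll_forward(saved_copy, log))
```

The saved copy itself is never modified, so the recovery can be repeated from the same known point if it is interrupted.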
Data recovery means restoring data from disks, tapes, CDs and digital
photo memory cards that have been damaged by accidents, disasters, power
surges or malfunctioning electronics. Laptop hard disks are especially
vulnerable if users are constantly on the move.
SELF-ASSESSMENT EXERCISE
4.0 CONCLUSION
5.0 SUMMARY