CS Unit 5
5.1.3 Hypertext
5.2.3 HTTP
5.2.4 HTML
5.5.2 Explanation
5.6.2 Applications
5.6.3 Forms
5.7.6 Architecture
5.8.3 Architecture
5.10.3 Architecture
5.11 EIS/DSS
5.11.1 EIS (Environmental Impact Statement)
The WWW is a new way of viewing information -- and a rather different one. If, for
example, you are viewing this paper as a WWW document, you will view it with a browser, in
which case you can immediately access hypertext links. If you are reading this on paper, you will
see the links indicated in parentheses and in a different font. Keep in mind that the WWW is
constantly evolving. We have tried to pick stable links, but sites reorganize and sometimes they
even move. By the time you read the printed version of this paper, some WWW links may have
changed.
The WWW project has the potential to do for the Internet what Graphical User Interfaces
(GUIs) have done for personal computers -- make the Net useful to end users. The Internet
contains vast resources in many fields of study (not just in computer and technical
information). In the past, finding and using these resources has been difficult.
The Web provides consistency: servers provide information in a consistent way and clients
display it in a consistent way. To add a further thread of consistency, many users view the Web
through graphical browsers that look and behave like the other windowed applications
(Microsoft Windows, the Macintosh, or the X Window System) they already use.
The Web project was started by Tim Berners-Lee at the European Particle Physics Laboratory
(CERN) in Geneva, Switzerland. Tim wanted to find a way for scientists doing projects at CERN
to collaborate with each other on-line. He thought of hypertext as one possible method for this
collaboration.
Tim started the WWW project at CERN in March 1989. In January 1992, the first
versions of WWW software, communicating over the Hypertext Transfer Protocol (HTTP),
appeared on the Internet.
By October 1993, 500 known HTTP servers were active.
When Robelle joined the Internet in June 1994, we were about the 80,000th registered
HTTP server.
By the end of 1994, it was estimated that there were over 500,000 HTTP servers.
Attempts to keep track of the number of HTTP servers on the Internet have not been
successful. Programs that try to automatically count HTTP servers never stop -- new
servers are being added constantly.
5.1.3 Hypertext
Hypertext provides the links between different documents and different document types.
If you have used the Microsoft Windows WinHelp system or the Macintosh HyperCard application,
you likely know how to use hypertext.
In a hypertext document, links from one place in the document to another are included
with the text. By selecting a link, you are able to jump immediately to another part of the
document or even to a different document. In the WWW, links can go not only from one
document to another, but from one computer to another.
Client/server describes the relationship between two computer programs in which one
program, the client, makes a service request from another program, the server, which fulfills the
request. Although the client/server idea can be used by programs within a single computer, it is a
more important idea in a network. In a network, the client/server model provides a convenient
way to interconnect programs that are distributed efficiently across different locations. Computer
transactions using the client/server model are very common.
The client/server model has become one of the central ideas of network computing. Most
business applications being written today use the client/server model, as does the Internet's core
protocol suite, TCP/IP. In marketing, the term has been used to distinguish distributed computing by
smaller dispersed computers from the "monolithic" centralized computing of mainframe
computers. But this distinction has largely disappeared as mainframes and their applications have
also turned to the client/server model and become part of network computing.
In the usual client/server model, one server, sometimes called a daemon, is activated and awaits
client requests. Typically, multiple client programs share the services of a common server
program. Both client programs and server programs are often part of a larger program or
application.
Similarly, your computer with TCP/IP installed allows you to make client requests for
files from File Transfer Protocol (FTP) servers in other computers on the Internet.
Other program relationship models include master/slave, with one program being in
charge of all other programs, and peer-to-peer, with either of two programs able to initiate a
transaction.
The Web Client-Server installation topology enables PR-Tracker clients to connect to the
PR-Tracker server over the Internet or an intranet. It is the recommended installation topology
when PR-Tracker users are working remotely or are in a network domain that is not the same
domain as the PR-Tracker Server. It may also be used when the server's firewall software blocks
network communication.
Clients connect to the PR-Tracker Web service by specifying the address of the
prtracker.asmx file in the PR-Tracker client software.
In a typical Web Client-Server installation topology, the PR-Tracker Server hosts the
PR-Tracker Web Service. To use this configuration option, a virtual directory
must be created in IIS to host the PR-Tracker Web Service. The PR-Tracker Server configuration
wizard can do this for you. By default, PR-Tracker configures the virtual directory so that it can
be accessed anonymously. If you want additional security on this virtual directory, you must add
it manually.
When you use a WWW client, it communicates with a WWW server using the Hypertext
Transfer Protocol. When you select a WWW link, the following things happen:
The client looks up the hostname and makes a connection with the WWW server.
The client sends an HTTP request to the server.
The HTTP software on the server responds to the client's request.
The client and the server close the connection.
WWW clients use the same technique for other protocols. For example, if you request a directory
at an FTP site, the WWW client makes an FTP connection, logs on as an anonymous user, switches
to the directory, requests the directory contents, and then logs off the FTP server.
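The HTTP exchange just described can be reproduced in a few lines of Python. This is only a
rough sketch; the host name and path are placeholders.

import socket

# Step 1: look up the host and connect; step 2: send the request;
# step 3: read the server's response; step 4: both sides close.
sock = socket.create_connection(("www.example.com", 80))
sock.sendall(b"GET /index.html HTTP/1.0\r\nHost: www.example.com\r\n\r\n")
response = b""
while True:
    chunk = sock.recv(4096)
    if not chunk:
        break
    response += chunk
sock.close()
print(response.decode("latin-1").split("\r\n\r\n", 1)[0])  # show the headers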
When you write documents for WWW, you use the Hypertext Markup Language (HTML). In a
markup language, you mix your text with the marks that indicate how formatting is to take place.
Most WWW browsers have an option to "View Source" that will show you the HTML for the
current document that you are viewing. Each WWW browser renders HTML in its own way.
If you want to see how your browser handles standard and non-standard HTML, try the
WWW Test Pattern. The test pattern will show differences between your browser, standard
HTML, and other browsers.
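To make the mixing of text and marks concrete, the short Python sketch below writes out a
minimal HTML document (the file name and content are invented); open the file in a browser
and compare it with what View Source shows.

# Write a minimal HTML document to disk; every tag here is standard HTML.
html = """<html>
<head><title>A Minimal Document</title></head>
<body>
<h1>Hello, Web</h1>
<p>Plain text mixed with markup, including a
<a href="http://www.example.com/">hypertext link</a>.</p>
</body>
</html>
"""
with open("minimal.html", "w") as f:
    f.write(html)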
Creating HTML
Creating HTML is awkward, but not that difficult. The most common method of creating
HTML is to write the raw markup language using a standard text editor. Bob Green, founder
of Robelle, finds HTML Writer to be useful for learning HTML. Instead of hiding
the HTML tags, HTML Writer provides menus with all of the HTML elements and inserts these
into a text window. To see how your documents look, you must use a separate Web browser.
Microsoft has produced an add-on to Microsoft Word, the Internet Assistant, that produces
HTML; it is available from Microsoft at no charge. You will need to know the basic concepts of
Microsoft Word to take advantage of the Internet Assistant. Since we are not experienced
Microsoft Word users, we found that the Internet Assistant didn't help us much.
The three-tier design has many advantages over traditional two-tier or single-tier designs, the
chief ones being:
The added modularity makes it easier to modify or replace one tier without
affecting the other tiers.
Separating the application functions from the database functions makes it easier to
implement load balancing.
A client, i.e. the computer which requests the resources, equipped with a user interface
(usually a web browser) for presentation purposes
The application server (also called middleware), whose task it is to provide the requested
resources, but by calling on another server
The data server, which provides the application server with the data it requires.
The data may reside in flat files, on a mainframe, or in a relational DBMS. For large (e.g.,
1,000-user) applications that also demand robust security, a TP monitor is one of the most
effective solutions.
Both application servers and TP monitors aim at pulling the main body of application logic off
the desktop and running it on a shared host.
An HTTP server is often used as a gateway to a legacy information system; for example, an
existing body of documents or an existing database application. The Common Gateway Interface
is an agreement between HTTP server implementers about how to integrate such gateway scripts
and programs.
How is a form's data passed to a program that hangs off an HTTP server? It gets passed
using an end-to-end client/server protocol that includes both HTTP and CGI. The best way to
explain the dynamics of the protocol is to walk you through a POST method invocation.
A CGI scenario shows how the client and server programs play together to process a form's
request. Here's the step-by-step explanation of this interaction:
1. The user submits the form. This causes the Web browser to collect the data within the form,
and then assemble it into one long string of name/value pairs, each separated by an ampersand
(&). The browser translates spaces within the data into plus (+) symbols. No, it's not very pretty.
2. The browser invokes a POST method. This is an ordinary HTTP request that specifies a POST
method, the URL of the target program in the "cgi-bin" directory, and the typical HTTP headers.
The message body (HTTP calls it the "entity") contains the form's data. This is the string:
name=value&name=value&...
3. The HTTP server receives the method invocation via a socket connection.
4. The server parses the message and discovers that it's a POST for the "cgi-bin" program.
So it starts a CGI interaction.
5. The server sets up the environment variables. The CGI protocol uses environment variables
as a shared bulletin board for communicating information between the HTTP server and the CGI
program. The server typically provides the following environmental information: server_name,
request_method, path_info, script_name, content_type, and content_length.
6. The HTTP server executes an instance of the CGI program specified in the URL; it's
typically in the "cgi-bin" directory.
7. The CGI program receives the message body via the standard input pipe (stdin).
Remember, the message body contains the famous string of name=value items separated
by ampersands (&). The content_length environment variable tells the program how much data is
in the string.
8. The CGI program parses the string contents to retrieve the form data. It uses the
content_length environment variable to determine how many characters to read in from the
standard input pipe. Cheer up, we're half way there.
9. The CGI program returns the results via the standard output pipe (stdout).
The program pipes back the results to the HTTP server via its standard output. The HTTP
server receives the results on its standard input. This concludes the CGI interaction.
10. The HTTP server returns the results to the Web browser.
The HTTP server can either append some response headers to the information it receives
from the CGI program, or send it "as is" if it's an nph (no-parse-headers) program.
As you can see, a CGI program is executed in real time; it gets the information and then
builds a dynamic Web page to satisfy a client's request. CGI makes the Web more dynamic. In
contrast, a plain HTML document is static, which means the text file does not change. CGI may
be clumsy, but it does allow us to interface Web clients to general-purpose back-end services,
such as Amazon.com, as well as to Internet search utilities such as Yahoo! and Excite. You can
even stretch CGI to its limits to create general-purpose client/server programs like the Federal
Express package-tracking Web page. However, Federal Express uses CGI to connect to a TP
monitor in the backend.
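Steps 3 through 10 on the server side boil down to very little code. A minimal sketch in Python
(the form field name is invented, and a real script would need error handling):

#!/usr/bin/env python3
# Minimal CGI sketch for a POST: read the entity from stdin, reply on stdout.
import os
import sys
from urllib.parse import parse_qs

length = int(os.environ.get("CONTENT_LENGTH") or 0)  # step 7: how much to read
body = sys.stdin.read(length)                        # the name=value&... string
fields = parse_qs(body)                              # undo the &, =, and + encoding

name = fields.get("name", ["stranger"])[0]           # hypothetical form field
print("Content-Type: text/html")                     # step 9: results on stdout
print()
print("<html><body><p>Hello, %s.</p></body></html>" % name)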
Server-side scripting is used to customize the server response based on the user's
requirements, access rights, or queries into data stores. From a security point of view, the source
code of server-side scripts is never visible to the browser, as these scripts are executed on the
server and emit HTML corresponding to the user's input to the page.
When the server serves data in a commonly used manner, for example according to the
HTTP or FTP protocols, users may have their choice of a number of client programs (most
modern web browsers can request and receive data using both of those protocols). In the case of
more specialized applications, programmers may write their own server, client, and
communications protocol that can only be used with one another.
Programs that run on a user's local computer without ever sending or receiving data over a
network are not considered clients, and so the operations of such programs would not be
considered client-side operations.
5.5.1 History
Server-side scripting was invented in early 1995 by Fred DuFresne while developing the first
web site for Boston, MA television station WCVB. The technology is described in US patent
5835712. The patent was issued in 1998 and is now owned by Open Invention Network (OIN).
In 2010 OIN named Fred DuFresne a "Distinguished Inventor" for his work on server-side
scripting.
In the earlier days of the web, server-side scripting was almost exclusively performed by
using a combination of C programs, Perl scripts and shell scripts using the Common Gateway
Interface (CGI). Those scripts were executed by the operating system, and the
results simply served back by the web server. These and other on-line scripting languages such
as ASP and PHP can often be executed directly by the web server itself or by extension modules
(e.g. mod_perl or mod_php) to the web server.
WebDNA includes its own embedded database system. Either form of scripting (i.e., CGI
or direct execution) can be used to build up complex multi-page sites, but direct execution
usually results in lower overhead due to the lack of calls to external interpreters.
Dynamic websites are also sometimes powered by custom web application servers, for
example the Python "Base HTTP Server" library, although some may not consider this to be
server-side scripting. When working with dynamic Web-based scripting technologies, like
classic ASP or PHP, developers must have a keen understanding of the logical, temporal, and
physical separation between the client and the server.
5.6.1 Introduction
A CGI program is any program designed to accept and return data that conforms to the
CGI specification. The program could be written in any programming language, including C,
Perl, Java, or Visual Basic.
CGI programs are the most common way for Web servers to interact dynamically with
users. Many HTML pages that contain forms, for example, use a CGI program to process the
form's data once it's submitted. Another increasingly common way to provide dynamic feedback
for Web users is to include scripts or programs that run on the user's machine rather than the
Web server. These programs can be Java applets, Java scripts, or ActiveX controls. These
technologies are known collectively as client-side solutions, while the use of CGI is a
server-side solution because the processing occurs on the Web server.
One problem with CGI is that each time a CGI script is executed, a new process is started. For
busy Web sites, this can slow down the server noticeably. A more efficient solution, but one that
is also more difficult to implement, is to use the server's API, such as ISAPI or NSAPI.
Another increasingly popular solution is to use Java servlets.
DIAGRAM OF CGI
CGI turns the Web from a simple collection of static hypermedia documents into a whole
new interactive medium, in which users can ask questions and run applications. Let's take a look
at some of the possible applications that can be designed using CGI.
5.6.3 Forms
One of the most prominent uses of CGI is in processing forms. Forms are a subset of
HTML that allows the user to supply information. The forms interface makes Web browsing an
interactive process for the user and the provider.
Generally, forms are used for two main purposes. At their simplest, forms can be used to collect
information from the user. But they can also be used in a more complex manner to provide back-
and-forth interaction. For example, the user can be presented with a form listing the various
documents available on the server, as well as an option to search for particular information.
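On the browser side, the submitted fields travel as one URL-encoded string, as described in the
CGI walkthrough above. The encoding is easy to reproduce with Python's standard library (the
field names here are invented):

# How a browser encodes a form's fields before sending them to the server.
from urllib.parse import urlencode

form_data = {"customer": "Jane Doe", "item": "oil paint", "qty": "2"}
print(urlencode(form_data))
# prints: customer=Jane+Doe&item=oil+paint&qty=2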
5.6.4 Gateways
Web gateways are programs or scripts used to access information that is not directly
readable by the client.
CGI provides a solution to the problem in the form of a gateway. You can use a language
such as oraperl (see Chapter 9, Gateways, Databases, and Search/Index Utilities, for more
information) or a DBI extension to Perl to form SQL queries to read the information contained
within the database. Once you have the information, you can format and send it to the client. In
this case, the CGI program serves as a gateway to the Oracle database, as shown in Figure 1.3
Similarly, you can write gateway programs to any other Internet information service,
including Archie, WAIS, and NNTP (Usenet News). In addition, you can amplify the power of
gateways by using the forms interface; a sketch of a simple database gateway follows.
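A minimal sketch of such a gateway in Python, assuming the python-oracledb driver; the table,
column, and connection details are all invented:

# Sketch of a CGI-to-Oracle gateway: form a SQL query, format rows as HTML.
import oracledb  # assumption: the python-oracledb driver is installed

def lookup(painting_name):
    conn = oracledb.connect(user="gallery", password="secret",
                            dsn="dbhost/orclpdb")  # invented credentials
    cur = conn.cursor()
    cur.execute("SELECT title, artist FROM paintings WHERE title = :t",
                t=painting_name)  # bind variables, never paste user input
    rows = cur.fetchall()
    conn.close()
    # Format the information and send it to the client as HTML list items.
    return "".join("<li>%s, by %s</li>" % row for row in rows)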
Virtual, or dynamic, document creation is at the heart of CGI. Virtual documents are created
on the fly in response to a user's information request. You can create virtual HTML, plain text,
image, and even audio documents. A simple example of a virtual document could be something
as trivial as this:
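A minimal Python stand-in, assuming a Unix host where os.getloadavg is available:

#!/usr/bin/env python3
# Trivial virtual document: greet the remote user, report the load average.
import os

host = os.environ.get("REMOTE_HOST") or os.environ.get("REMOTE_ADDR", "unknown")
load = os.getloadavg()[0]  # one-minute load average (Unix only)

print("Content-Type: text/html")
print()
print("<html><body>")
print("<p>Hello, visitor from %s.</p>" % host)
print("<p>The load average on this machine is %.2f.</p>" % load)
print("</body></html>")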
In this example, there are two pieces of dynamic information: the alphanumeric address (IP
name) of the remote user and the load average on the serving machine. This is a very simple
example, indeed!
On the other hand, very complex virtual documents can be created by writing programs
that use a combination of graphics libraries, gateways, and forms. As a more sophisticated
example, say you are the manager of an art gallery that specializes in selling replicas of ancient
Renaissance paintings and you are interested in presenting images of these masterpieces on the
Web. You start out by creating a form that asks for user information for the purpose of
promotional mailings, presents a search field for the user to enter the name of a painting, as well
as a selection list containing popular paintings.
Once the user submits the form to the server, a program can email the user information to
a certain address, or store it in a file. And depending on the user's selection, either a message
stating that the painting does not exist or an image of the painting can be displayed along with
some historical information located elsewhere on the Internet.
This is an example of a more complex CGI program using many aspects of CGI programming.
Several such examples will be presented in this book.
SQL Server is a relational database management system from Microsoft that's designed for the
enterprise environment. SQL Server runs on T-SQL (Transact-SQL), a set of programming
extensions from Sybase and Microsoft that add several features to standard SQL, including
transaction control, exception and error handling, row processing, and declared variables.
Code-named Yukon in development, SQL Server 2005 was released in November 2005. The
2005 product is said to provide enhanced flexibility, scalability, reliability, and security to
database applications, and to make them easier to create and deploy, thus reducing the
complexity and tedium involved in database management. SQL Server 2005 also includes more
administrative support.
The original SQL Server code was developed by Sybase; in the late 1980s, Microsoft,
Sybase and Ashton-Tate collaborated to produce the first version of the product, SQL Server 4.2
for OS/2. Subsequently, both Sybase and Microsoft offered SQL Server products. Sybase has
since renamed their product Adaptive Server Enterprise.
5.7.1 History
Prior to version 7.0 the code base for MS SQL Server was sold by Sybase to Microsoft, and was
Microsoft's entry to the enterprise-level database market, competing against
Oracle, IBM, and, later, Sybase itself.
SQL Server 2005 (codename Yukon) was released in October 2005. It included native
support for managing XML data, in addition to relational data. For this purpose, it defined an
xml data type that could be used either as a data type in database columns or as literals in
queries. XML columns can be associated with XSD schemas;
XML data being stored is verified against the schema. XML is converted to an internal
binary data type before being stored in the database. Specialized indexing methods
were made available for XML data.
SQL Server 2005 also allows a database server to be exposed over web services using
Tabular Data Stream (TDS) packets encapsulated within SOAP (protocol) requests.
When the data is accessed over web services, results are returned as XML.
Common Language Runtime (CLR) integration was introduced with this version,
enabling one to write SQL Server code as managed code executed by the CLR.
For relational data, T-SQL has been augmented with error handling features (try/catch)
and support for recursive queries with CTEs (Common Table Expressions).
SQL Server 2005 has also been enhanced with new indexing algorithms, syntax and
better error recovery systems. Data pages are check summed for better error resiliency, and
optimistic concurrency support has been added for better performance. Permissions and access
control have been made more granular and the query processor handles concurrent execution of
queries in a more efficient way. Partitions on tables and indexes are supported natively, so
scaling out a database onto a cluster is easier.
SQL CLR was introduced with SQL Server 2005 to let it integrate with the .NET
Framework.
SQL Server 2005 introduced "MARS" (Multiple Active Result Sets), a method of
allowing a single database connection to carry more than one active result set at a time.
SQL Server 2005 introduced DMVs (Dynamic Management Views), which are
specialized views and functions that return server state information that can be used to
monitor the health of a server instance, diagnose problems, and tune performance.
Service Pack 1 (SP1) of SQL Server 2005 introduced Database Mirroring, a high
availability option that provides redundancy and failover capabilities at the database
level.
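As a small illustration of the T-SQL additions mentioned above, the recursive CTE below counts
from 1 to 5. It is shown being submitted from Python via pyodbc; the DSN and credentials are
invented.

# A recursive CTE (SQL Server 2005+), submitted through pyodbc.
import pyodbc

conn = pyodbc.connect("DSN=sqlserver2005;UID=sa;PWD=secret")  # invented DSN
sql = """
WITH nums(n) AS (
    SELECT 1
    UNION ALL
    SELECT n + 1 FROM nums WHERE n < 5
)
SELECT n FROM nums;
"""
for (n,) in conn.cursor().execute(sql):
    print(n)  # prints 1 through 5
conn.close()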
SQL Server 2008 (codename Katmai) was released on August 6, 2008 and aims to make
data management self-tuning, self-organizing, and self-maintaining with the development of
SQL Server Always On technologies, to provide near-zero downtime. SQL Server 2008 also
includes support for structured and semi-structured data, including digital media formats for
pictures, audio, video and other multimedia data. In current versions, such multimedia data can
be stored as BLOBs (binary large objects), but they are generic bitstreams. Intrinsic awareness
of multimedia data will allow specialized functions to be performed on them. According to Paul
Flessner, senior Vice President, Server Applications, Microsoft Corp., SQL Server 2008 can be
a data storage backend for different varieties of data: XML, email, time/calendar, file, document,
spatial, etc., as well as perform search, query, analysis, sharing, and synchronization across all
data types.
Other new data types include specialized date and time types and a Spatial data type for
location-dependent data. Better support for unstructured and semi-structured data is provided
using the new FILESTREAM data type, which can be used to reference any file stored on the file
system.
Structured data and metadata about the file is stored in SQL Server database, whereas the
unstructured component is stored in the file system. Such files can be accessed both via Win32
file-handling APIs as well as via SQL Server using T-SQL; doing the latter accesses the file data
as a BLOB.
Backing up and restoring the database backs up or restores the referenced files as well. SQL
Server 2008 also natively supports hierarchical data, and includes constructs to directly deal with
them, without using recursive queries.
The Full-text search functionality has been integrated with the database engine.
According to a Microsoft technical article, this simplifies management and improves
performance.
SQL Server 2008 R2 (formerly codenamed SQL Server "Kilimanjaro") was announced at
TechEd 2009, and was released to manufacturing on April 21, 2010. SQL Server 2008 R2 adds
a Data-Tier Application (DAC) that enables packaging of tiered databases as part of an
application, and a SQL Server Utility named UC (Utility Control Point), part of AMSM
(Application and Multi-Server Management), that is used to manage multiple SQL Servers.
At the 2011 Professional Association for SQL Server (PASS) summit on October 11,
Microsoft announced that the next major version of SQL Server, codenamed Denali, would be
SQL Server 2012. It was released to manufacturing on March 6, 2012.
It was announced to be the last version to natively support OLE DB, with ODBC preferred
instead for native connectivity; this announcement caused some controversy.
SQL Server 2012's new features and enhancements include AlwaysOn SQL Server
Failover Cluster Instances and Availability Groups, which provide a set of options to improve
database availability.
The Mimer SQL DBMS is based on client/server architecture. The Database Server
executes in one single, multi-threaded process with multiple Request and Background threads.
On some platforms Communication threads are used. The Mimer SQL architecture is truly
multi-threaded, with requests being dynamically allocated to the different Request threads. As
threads scale very well over multiple CPUs, Mimer SQL is particularly well suited for symmetric
multiprocessor (SMP) environments. By the use of threads within the Database Server, optimal
efficiency is achieved when context-switching in the Database Server.
This architecture also ensures that the application can only view data that has been formally
passed to the client side, which is extremely important from a data security point of view.
The Communication threads are used to handle parts of the communication between the
applications and the database server. On some platforms other mechanisms are used to handle
the communication between the applications and the database server. Whatever the mechanism,
all communication with the database server is multi-threaded, allowing large numbers of
simultaneous user requests.
Both local and remote applications are handled directly by the Database Server. This means that
in Client/Server environments, where Mimer SQL executes in a distributed environment with
the client and server on different machines, all remote clients connect directly to the Database
Server, thereby avoiding any additional overhead of network service processes being started,
either on the client or on the server machine.
The Request threads perform the SQL operations requested by the applications. When the
Database Server is requested to perform a SQL operation, it allocates one of its Request threads
to perform the task.
When a SQL query or a stored routine is executed by a Request thread, the compiled version of
the query or the routine is stored within the Database Server. In this way the same, compiled
version of the query or routine can be used again by other applications. This leads to improved
performance, since a SQL query or a stored routine needs to be compiled only once by the
Database Server.
The Background threads perform database services including all database updates, online
backup and database shadowing. These services are performed asynchronously in the
background to the application processes, which means that the application process does not have
to wait for the physical completion of a transaction or a shadow update, but can continue as soon
as the transaction has been prepared and secured to disk.
I/O operations are performed in parallel directly by the request and background threads using
asynchronous I/O, so any need for separate I/O threads is avoided.
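The division of labour described above, a pool of request threads sharing a cache of compiled
statements, can be caricatured in a few lines of Python; the "compilation" here is only a
stand-in.

# Caricature of the request-thread model: worker threads share a cache of
# "compiled" statements so each statement is compiled only once.
import threading
from concurrent.futures import ThreadPoolExecutor

cache_lock = threading.Lock()
compiled_cache = {}

def execute(sql):
    with cache_lock:  # a real server would use finer-grained locking
        plan = compiled_cache.get(sql)
        if plan is None:
            plan = ("COMPILED", sql)  # stand-in for real SQL compilation
            compiled_cache[sql] = plan
    return plan

with ThreadPoolExecutor(max_workers=8) as request_threads:
    stmts = ["SELECT 1", "SELECT 1", "SELECT 2"]
    print(list(request_threads.map(execute, stmts)))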
In a large modern enterprise, it is almost inevitable that different portions of the organization will
use different database management systems to store and search their critical data. Competition,
evolving technology, mergers, acquisitions, geographic distribution, and the inevitable
decentralization of growth all contribute to this diversity. Yet it is only by combining the
information from these systems that the enterprise can realize the full value of the data they
contain.
For example, in the finance industry, mergers are an almost commonplace occurrence. The
newly created entity inherits the data stores of the original institutions. Many of those stores will
be relational database management systems, but often from different manufacturers.
The Garlic project demonstrated the feasibility of extending this idea to build a federated
database system that effectively exploits the query capabilities of diverse, possibly non-relational
data sources. In both of these systems, as in today's DB2, a middleware query processor develops
optimized execution plans and compensates for any functionality that the data sources may lack.
In this article, we describe the key characteristics of IBM's federated technology: transparency,
heterogeneity, a high degree of function, autonomy for the underlying federated sources,
extensibility, openness, and optimized performance. We then "roll back the covers" to show how
IBM's database federation capabilities work. We illustrate how the federated capabilities can be
used in a variety of scenarios, and conclude with some directions for the future.
TRANSPARENCY
If a federated system is transparent, it masks from the user the differences, idiosyncrasies, and
implementations of the underlying data sources. Ideally, it makes the set of federated sources
look to the user like a single system. The user should not need to be aware of where the data is
stored (location transparency), what language or programming interface is supported by the data
source (invocation transparency), if SQL is used, what dialect of SQL the source supports
(dialect transparency), how the data is physically stored, or whether it is partitioned and/or
replicated (physical data independence, fragmentation and replication transparency), or what
networking protocols are used (network transparency). The user should see a single uniform
interface, complete with a single set of error codes (error code transparency). IBM provides all
these features, allowing applications to be written as if all the data were in a single database,
although, in fact, the data may be stored in a heterogeneous collection of data sources.
Heterogeneity is the degree of differentiation in the various data sources. Sources can differ in
many ways. They may run on different hardware, use different network protocols, and have
different software to manage their data stores. They may have different query languages,
different query capabilities, and even different data models. They may handle errors differently,
or provide different transaction semantics. They may be as much alike as two Oracle instances,
one running Oracle 8i, and the other Oracle 9i, with the same or different schemas. Or they may
be as diverse as a high-powered relational database, a simple, structured flat file, a web site that
takes queries in the form of URLs and spits back semi-structured XML according to some DTD,
a Web service, and an application that responds to a particular set of function calls. IBM's
federated database can accommodate all of these differences, encompassing systems such as
these in a seamless, transparent federation.
IBM's federated capability provides users with the best of both worlds: all the function of its rich,
standard-compliant DB2 SQL capability against all the data in the federation, as well as all the
function of the underlying data sources. DB2's SQL includes support for many complex query
features, including inner and outer joins, nested sub queries and table expressions, recursion,
user-defined functions, aggregation, statistical analyses, automatic summary tables, and others
too numerous to mention. Many data sources may not provide all of these features. However,
users still get the full power of DB2 SQL on these sources' data, because of function
compensation. Function compensation means that if a data source cannot do a particular query
function, the federated database retrieves the necessary data and applies the function itself. For
example, a file system typically cannot do arbitrary sorts. However, users can still request that
data from that source (i.e., some subset of a file) be retrieved in some order, or ask that duplicates
be eliminated from that data. The federated database will simply retrieve the relevant data, and
do the sort itself.
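In outline, compensation means: get the raw rows from the unsophisticated source, then apply
the missing operation in the federated engine. A toy Python illustration (the file name is
invented):

# Toy function compensation: the "source" is a flat file that cannot sort,
# so the federated engine applies ORDER BY and DISTINCT itself.
def scan_flat_file(path):
    with open(path) as f:
        for line in f:
            yield line.strip()  # the source can only hand rows back

def query_with_order_by(path):
    rows = scan_flat_file(path)  # retrieve the relevant data...
    return sorted(set(rows))     # ...then sort and de-duplicate locally

# print(query_with_order_by("customers.txt"))  # hypothetical data file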
All systems need to evolve over time. In a federated system, new sources may be needed to meet
the changing needs of the users' business. IBM makes it easy to add new sources. The federated
database engine accesses sources via a software component known as a wrapper. Accessing a new
type of data source is done by acquiring or creating a wrapper for that source. The wrapper
architecture enables the creation of new wrappers. Once a wrapper exists, simple data definition
(DDL) statements allow sources to be dynamically added to the federation without stopping
ongoing queries or transactions.
Any data source can be wrapped. IBM supports the ANSI SQL/MED standard (MED stands for
Management of External Data). This standard documents the protocols used by a federated
server to communicate with external data sources. Any wrapper written to the SQL/MED
interface can be used with IBM's federated database. Thus wrappers can be written by third
parties as well as IBM, and used in conjunction with IBM's federated database.
Typically a data source has existing applications and users. It is important, therefore, that the
operation of the source is not affected when it is brought into a federation. IBM's federated
database does not disturb the local operation of an existing data source. Existing applications
will run unchanged, and existing users can continue to work as before.
Unlike other products, our wrapper architecture does not require any software to be installed on
the machine that hosts the data source.
Optimized Performance
The optimizer is the component of a relational database management system that determines the
best way to execute each query. Relational queries are non-procedural and there are typically
several different implementations of each relational operator and many possible orderings of
operators to choose from in executing a query. While some optimizers use heuristic rules to
choose an execution strategy, IBM's federated database considers the various possible strategies,
modeling the likely cost of each, and choosing the one with the least cost. (Typically, cost is
measured in terms of system resources consumed).
In a federated system, the optimizer must decide whether the different operations involved in a
query should be done by the federated server or by the source where the data is stored. It must
also determine the order of the operations, and what implementations to use to do local portions
of the query. To make these decisions, the optimizer must have some way of knowing what each
data source can do, and how much it costs. For example, if the data source is a file, it would not
make sense to assume it was smart, and ask it to perform a sort or to apply some function. On the
other hand, if the source is a relational database system capable of applying predicates and doing
joins, it might be a good idea to take advantage of its power if it will reduce the amount of data
that needs to be brought back to the federated engine. This will typically depend on the details of
the query. In the federated database architecture, applications can use any supported interface
(including ODBC, JDBC, or a Web service client) to interact with the federated server.
The federated server communicates with the data sources by means of software modules called
wrappers.
The AS TEMPLATE clause tells the federated database that there is no local implementation of
the function. Next, a CREATE FUNCTION MAPPING statement tells the federated database
what server can evaluate the function. Several function mappings may be created for the same
function. A statement of the following general form accomplishes the mapping:
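A sketch with invented function and server names, submitted from Python through the ibm_db
driver (the connection string is hypothetical):

# Hypothetical function template and mapping, submitted via ibm_db (DB2).
import ibm_db

conn = ibm_db.connect("DATABASE=feddb;HOSTNAME=dbhost;PORT=50000;"
                      "UID=admin;PWD=secret", "", "")  # invented details

# Declare the function with no local implementation (AS TEMPLATE)...
ibm_db.exec_immediate(conn, """
    CREATE FUNCTION math.bonus(DOUBLE) RETURNS DOUBLE
    AS TEMPLATE DETERMINISTIC NO EXTERNAL ACTION
""")
# ...then tell the federated database which server can evaluate it.
ibm_db.exec_immediate(conn, """
    CREATE FUNCTION MAPPING bonus_map FOR math.bonus(DOUBLE)
    SERVER oracle_server
""")
ibm_db.close(conn)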
The above DDL statements produce metadata describing the information about nicknames and
the signatures of mapped functions. This metadata is used by the federated query processing
engine and is stored in the global catalogues of the federated database.
After the federated system is configured, an application can submit a query written in SQL to a
federated server. The federated server optimizes the query, developing an execution plan in
which the query has been decomposed into fragments that can be executed at individual data
sources. As mentioned above, many decompositions of the query are possible, and the optimizer
chooses among alternatives on the basis of minimum estimated total resource consumption. Once
a plan has been selected, the federated database drives the execution, invoking the wrappers to
execute the fragments assigned to them.
The optimizer works differently with relational and non-relational wrappers. The optimizer
models relational sources in detail, using information provided by the wrapper to generate plans
that represent what it expects the source to do.
However, because non-relational sources do not have a common set of operations or common
data model, a more flexible arrangement is required with these sources.
The IBM federated database submits candidate query fragments called "requests" to a
wrapper if the query fragments apply to a single source.
When a non-relational wrapper receives a request, it determines what portion, if any, of
the corresponding query fragment can be performed by the data source.
The wrapper returns a reply that describes the accepted portion of the fragment.
The reply also includes an estimate of the number of rows that will be produced, an
estimate of the total execution time, and a wrapper plan: an encapsulated representation
of everything the wrapper will need to know to execute the accepted portion of the
fragment.
The federated database optimizer incorporates the reply into a global plan, introducing
additional operators as necessary to compensate for portions of fragments that were not
accepted by a wrapper.
The cost and cardinality information from the replies is used to estimate the total cost of
the plan, and the plan with minimum total cost is selected from among all the candidates.
When a plan is selected, it need not be executed immediately; it can be stored in the
database catalogues and subsequently used one or more times to execute the query.
Even if a plan is used immediately, it need not be executed in the same process in which
it was created, as illustrated in Figure 3.
A data warehouse is a relational database that is designed for query and analysis rather than for
transaction processing. It usually contains historical data derived from transaction data, but it can
include data from other sources. It separates analysis workload from transaction workload and
enables an organization to consolidate data from several sources.
Subject Oriented
Integrated
Nonvolatile
Time Variant
Subject Oriented
Data warehouses are designed to help you analyze data. For example, to learn more about your
company's sales data, you can build a warehouse that concentrates on sales. Using this
warehouse, you can answer questions like "Who was our best customer for this item last year?"
This ability to define a data warehouse by subject matter, sales in this case, makes the data
warehouse subject oriented.
Integrated
Integration is closely related to subject orientation. Data warehouses must put data from
disparate sources into a consistent format. They must resolve such problems as naming conflicts
and inconsistencies among units of measure. When they achieve this, they are said to be
integrated.
Nonvolatile
Nonvolatile means that, once entered into the warehouse, data should not change. This is logical
because the purpose of a warehouse is to enable you to analyze what has occurred.
Time Variant
In order to discover trends in business, analysts need large amounts of data. This is very much in
contrast to online transaction processing (OLTP) systems, where performance requirements
demand that historical data be moved to an archive. A data warehouse's focus on change over
time is what is meant by the term time variant.
One major difference between the types of system is that data warehouses are not usually in
third normal form (3NF), a type of data normalization common in OLTP environments.
Data warehouses and OLTP systems have very different requirements, for example in workload,
typical operations, schema design, and the amount of historical data retained.
Implementing a distributed data warehouse has been shown to provide higher availability and
lower overall cost.
An enterprise can create several data marts that store only high level summaries of data derived
from the warehouse. With IBM's federated technology, data marts and warehouse can be on
separate systems, yet users of the data mart can still drill down with ease from their local level of
summarization into the warehouse. Federated technology shields the users, who have no need to
know that the data warehouse is distributed, by providing a virtual data warehouse.
Data warehouses and their architectures vary depending upon the specifics of an organization's
situation. Three common architectures are: basic, basic with a staging area, and basic with a
staging area and data marts.
Figure 1-1 shows a simple architecture for a data warehouse. End users directly access data
derived from several source systems through the data warehouse.
In Figure 1-2, the metadata and raw data of a traditional OLTP system is present, as is an
additional type of data, summary data. Summaries are very valuable in data warehouses because
they pre-compute long operations in advance. For example, a typical data warehouse query is to
retrieve something like August sales. A summary in Oracle is called a materialized view.
In Figure 1-2, you need to clean and process your operational data before putting it into the
warehouse. You can do this programmatically, although most data warehouses use a staging
area instead. A staging area simplifies building summaries and general warehouse management.
Figure 1-3 illustrates this typical architecture.
Although the architecture in Figure 1-3 is quite common, you may want to customize your
warehouse's architecture for different groups within your organization. You can do this by
adding data marts, which are systems designed for a particular line of business
Parallel Data Warehouse (PDW): a massively parallel processing (MPP) SQL Server appliance
optimized for large-scale data warehousing, such as hundreds of terabytes.
5.10.3 Architecture
Protocol layer
Protocol layer implements the external interface to SQL Server. All operations that can be
invoked on SQL Server are communicated to it via a Microsoft-defined format, called Tabular
Data Stream (TDS). TDS is an application layer protocol, used to transfer data between a
database server and a client. TDS was initially designed and developed by Sybase Inc. for their
Sybase SQL Server relational database engine in 1984, and was later adopted by Microsoft for
Microsoft SQL Server. TDS packets can be encased in other physical-transport-dependent
protocols, including TCP/IP, named pipes, and shared memory; consequently, access to SQL
Server is available over these protocols. In addition, the SQL Server API is also exposed over
web services.
Data storage
The main unit of data storage is a database, which is a collection of tables with typed columns.
SQL Server supports different data types, including primary types such as Integer, Float,
Decimal, Char, Varchar, Text, and binary.
Microsoft SQL Server also allows user-defined composite types (UDTs) to be defined and used.
It also makes server statistics available as virtual tables and views (called Dynamic Management
Views or DMVs). In addition to tables, a database can also contain other objects including views,
stored procedures, indexes and constraints, along with a transaction log.
Buffer management
SQL Server buffers pages in RAM to minimize disc I/O. Any 8 KB page can be buffered in-
memory, and the set of all pages currently buffered is called the buffer cache. The amount of
memory available to SQL Server decides how many pages will be cached in memory.
The buffer cache is managed by the Buffer Manager. Either reading from or writing to any page
copies it to the buffer cache. Subsequent reads or writes are redirected to the in-memory copy,
rather than the on-disc version. The page is updated on the disc by the Buffer Manager only if
the in-memory cache has not been referenced for some time. While writing pages back to disc,
asynchronous I/O is used whereby the I/O operation is done in a background thread so that other
operations do not have to wait for the I/O operation to complete. Each page is written along with
its checksum when it is written. When reading the page back, its checksum is computed again
and matched with the stored version to ensure the page has not been damaged or tampered with
in the meantime.
SQL Server ensures that any change to the data is ACID-compliant, i.e. it uses transactions to
ensure that the database will always revert to a known consistent state on failure. Each
transaction may consist of multiple SQL statements, all of which will make a permanent change
to the database only if the whole transaction is committed.
Any changes made to any page will update the in-memory cache of the page; simultaneously, all
the operations performed will be written to a log, along with the transaction ID of which the
operation was a part. Each log entry is identified by an increasing Log Sequence Number
(LSN) which is used to ensure that all changes are written to the data files. Also during a log
restore it is used to check that no logs are duplicated or skipped. SQL Server requires that the log
is written onto the disc before the data page is written back. It must also ensure that all
operations in a transaction are written to the log before any COMMIT operation is reported as
completed.
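The write-ahead rule just described (log record to disk first, data page second, COMMIT
acknowledged only once its log records are durable) can be sketched in Python. The file names
and record formats here are invented; a real server uses its own binary formats.

# Sketch of write-ahead logging: flush the log before the page, and report
# COMMIT only after the commit record is durable on disk.
import os

lsn = 0  # Log Sequence Number, increasing with every record

def append_log(log, record):
    global lsn
    lsn += 1
    log.write(b"%d %s\n" % (lsn, record))
    log.flush()
    os.fsync(log.fileno())  # the log record must reach disk first
    return lsn

with open("db.log", "ab") as log, open("db.page", "wb") as page:
    append_log(log, b"txn=42 page=7 old=A new=B")  # 1. log the change
    page.write(b"B")                               # 2. only then the data page
    page.flush()
    os.fsync(page.fileno())
    append_log(log, b"txn=42 COMMIT")              # 3. now COMMIT is durable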
SQL Server allows multiple clients to use the same database concurrently. As such, it needs to
control concurrent access to shared data, to ensure data integrity - when multiple clients update
the same data, or clients attempt to read data that is in the process of being changed by another
client. SQL Server provides two modes of concurrency control: pessimistic concurrency and
optimistic concurrency.
When pessimistic concurrency control is being used, SQL Server controls concurrent access by
using locks. Locks can be either shared or exclusive. Exclusive lock grants the user exclusive
access to the data - no other user can access the data as long as the lock is held. Shared locks are
used when some data is being read - multiple users can read from data locked with a shared lock,
but not acquire an exclusive lock. The latter would have to wait for all shared locks to be
released. Locks can be applied on different levels of granularity - on entire tables, pages, or even
on a per-row basis on tables. For indexes, it can either be on the entire index or on index leaves.
The Lock Manager maintains an in-memory table that manages the database objects and locks, if
any, on them along with other metadata about the lock. Access to any shared object is mediated
by the lock manager, which either grants access to the resource or blocks it.
Data retrieval
The main mode of retrieving data from an SQL Server database is querying for it. The query is
expressed using a variant of SQL called T-SQL, a dialect Microsoft SQL Server shares with
Sybase SQL Server due to its legacy. The query declaratively specifies what is to be retrieved.
It is processed by the query processor, which figures out the sequence of steps that will be
necessary to retrieve the requested data. The sequence of actions necessary to execute a query is
called a query plan. There might be multiple ways to process the same query.
5.11 EIS/DSS
Purpose
Contrary to a widespread misconception, NEPA does not prohibit the federal government
or its licensees/permittees from harming the environment, but merely requires that the
prospective impacts be understood and disclosed in advance. The intent of NEPA is to
help key decision makers and stakeholders balance the need to implement an action with
its impacts on the surrounding human and natural environment.
A properly designed DSS is an interactive software-based system intended to help
decision makers compile useful information from a combination of raw data, documents,
personal knowledge, and business models to identify and solve problems and make decisions.
The raw material that a decision support application gathers and presents is described below.
Data
Data are any facts, numbers, or text that can be processed by a computer. Today, organizations
are accumulating vast and growing amounts of data in different formats and different databases.
This includes:
operational or transactional data, such as sales, cost, inventory, payroll, and accounting
nonoperational data, such as industry sales, forecast data, and macroeconomic data
meta data - data about the data itself, such as logical database design or data dictionary
definitions
Information
The patterns, associations, or relationships among all this data can provide information. For
example, analysis of retail point of sale transaction data can yield information on which products
are selling and when.
Knowledge
Information can be converted into knowledge about historical patterns and future trends. For
example, summary information on retail supermarket sales can be analyzed in light of
promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or
retailer could determine which items are most susceptible to promotional efforts.
Data mining is primarily used today by companies with a strong consumer focus - retail,
financial, communication, and marketing organizations. It enables these companies to determine
relationships among "internal" factors such as price, product positioning, or staff skills, and
"external" factors such as economic indicators, competition, and customer demographics. And, it
enables them to determine the impact on sales, customer satisfaction, and corporate profits.
Finally, it enables them to "drill down" into summary information to view detail transactional
data.
With data mining, a retailer could use point-of-sale records of customer purchases to send
targeted promotions based on an individual's purchase history. By mining demographic data
from comment or warranty cards, the retailer could develop products and promotions to appeal to
specific customer segments.
While large-scale information technology has been evolving separate transaction and analytical
systems, data mining provides the link between the two. Data mining software analyzes
relationships and patterns in stored transaction data based on open-ended user queries. Several
types of analytical software are available: statistical, machine learning, and neural networks.
Generally, any of four types of relationships are sought:
Classes: Stored data is used to locate data in predetermined groups. For example, a
restaurant chain could mine customer purchase data to determine when customers visit
and what they typically order. This information could be used to increase traffic by
having daily specials.
Clusters: Data items are grouped according to logical relationships or consumer
preferences. For example, data can be mined to identify market segments or consumer
affinities.
Associations: Data can be mined to identify associations, such as products that are
commonly purchased together.
Sequential patterns: Data is mined to anticipate behavior patterns and trends. For
example, an outdoor equipment retailer could predict the likelihood of a backpack being
purchased based on a consumer's purchase of sleeping bags and hiking shoes.
Different levels of analysis are available for identifying these relationships:
Artificial neural networks: Non-linear predictive models that learn through training and
resemble biological neural networks in structure.
Nearest neighbour method: A technique that classifies each record in a dataset based on
a combination of the classes of the k record(s) most similar to it in a historical dataset
(where k ≥ 1). Sometimes called the k-nearest neighbour technique; a small sketch
follows this list.
Rule induction: The extraction of useful if-then rules from data based on statistical
significance.
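A minimal nearest-neighbour sketch in Python: classify a new record by majority vote among
the k most similar historical records. The two-field records (age, income) and their labels are
invented.

# Minimal k-nearest-neighbour classifier over invented (age, income) records.
import math
from collections import Counter

history = [((25, 40000), "buys"), ((45, 90000), "declines"),
           ((30, 52000), "buys"), ((50, 98000), "declines")]

def classify(record, k=3):
    # Sort historical records by Euclidean distance to the new record.
    nearest = sorted(history, key=lambda h: math.dist(h[0], record))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]  # majority vote among the k nearest

print(classify((28, 48000)))  # prints "buys"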
Groupware packages offered by different software vendors will include different features
and functions, but most typically share a core set of communication, scheduling, and
document-sharing components.
Groupware functionality may also include the ability to control who sees a given piece of
information. Access can be limited to specifically named individuals or to members of a group,
such as all managers, members of the accounting department, or those employees working on a
sensitive or confidential project. For example, job descriptions may be accessible to all users, but
more sensitive personnel information may be restricted to a smaller group.
Groupware software can be divided into two categories: server and client. Depending on
the size of an organization and the number of users, the software is installed on one or more
computers (called "servers") in a network. Server software can also be installed on computers
located in other locations, from a nearby city to another state or country. The server houses the
actual applications and the associated information entered into them by the various users. If more
than one server is used, the servers will "speak" to one another in a process called "replication."
In this way, information held in the same database, but in a different location or on a different
server, is exchanged between the servers. Once this is accomplished, the servers are said to be
"synchronized."
Each person using the groupware has the client software installed on his or her desktop or
laptop computer. The client software enables the user to access and interact with the applications
stored on the servers. Some users may be "remote;" that is, they are not in the office on a full-
time basis but rather use a modem or other type of connection to access and use the groupware.
PART – B
1. Briefly explain about Client server and internet? (April-2009)
2. Discuss about Web client server.
3. Briefly explain about 3 tier client server web style?
4. Briefly explain about CGI and State?
5. Discuss SQL database servers. (April-2010)
6. Discuss merits and demerits of middleware and federated databases.
7. Briefly explain about Data warehouses?
8. Explain EIS/DSS to data mining? (April-2010)
9. Briefly explain about GroupWare Server?
10. Explain Components of GroupWare?
11. Describe about 3-tier client/server web style. (April-2009)
12. Explain CGI scenario based on the web client/server in the interactive era.
13. Brief description about SQL Database Server Architectures with ISO Standards.
14. Give brief explanation for CGI and STATE.
15. How to structure the flow of text in an HTML document.