
UNIT-V

CLIENT SERVER AND INTERNET

Know more @ www.vidyarthiplus.com


CONTENTS

5.1 Introduction to the Internet


5.1.1 WWW

5.1.2 History of Web

5.1.3 Hypertext

5.2 Web Client/Server


5.2.1 Web Client/Server Topology

5.2.2 Languages of Web

5.2.3 HTTP

5.2.4 HTML

5.3 3-Tier Client/Server Web Style


5.3.1 Introduction

5.3.2 3-Tier TP monitors

5.3.3 3-Tier Applications

5.4 CGI (Common Gateway Interface)

5.5 Server Side of Web


5.5.1 History

5.5.2 Explanation

5.6 CGI and State


5.6.1 Introduction of CGI

5.6.2 Applications

5.6.3 Forms



5.6.4 Gateways

5.6.5 Virtual Documents

5.7 SQL Database Server


5.7.1 History

5.7.2 SQL Server 2005

5.7.3 SQL Server 2008

5.7.4 SQL Server 2008 R2

5.7.5 SQL Server 2012

5.7.6 Architecture

5.8 Middleware and Federated Database


5.8.1 Technology

5.8.2 Characteristics of Federated Solution

5.8.3 Architecture

5.9 Query Processing

5.10 Data Warehouse Concepts


5.10.1 Characteristics of data warehouses

5.10.2 Distributed data warehouses

5.10.3 Architecture

5.10.4 Parallel data warehouse (PDW)

5.11 EIS/DSS
5.11.1 EIS (executive information system)

5.11.2 DSS (decision support system)

5.12 Data mining



5.12.1 Overview

5.12.2 Data, information & knowledge

5.12.3 What can data mining do?

5.12.4 How does data mining work?

5.13 Groupware server


5.13.1 Definition

5.13.2 Features of groupware server

5.13.3 How groupware works?

5.13.4 Groupware in action

5.14 Question Bank



TECHNICAL TERMS

1. World Wide Web

The WWW project has the potential to do for the Internet what Graphical User
Interfaces (GUIs) have done for personal computers -- make the Net useful to end users.

2. Hypertext
Hypertext provides the links between different documents and different document types.
If you have used the Microsoft Windows WinHelp system or the Macintosh HyperCard
application, you likely know how to use hypertext. In a hypertext document, links from
one place in the document to another are included with the text.

3. Uniform Resource Locators (URLs)


URLs provide the hypertext links between one document and another. These links can
access a variety of protocols (e.g., ftp, gopher, or http) on different machines (or your
own machine).
4. Common Gateway Interfaces (CGI)

Servers use the CGI interface to execute local programs. CGIs provide a gateway
between the HTTP server software and the host machine.

5. Hypertext Markup Language (HTML)



In a markup language, the text is mixed with marks that indicate how formatting is
to take place. Each browser renders these marks in its own way; for example, Lynx and
Mosaic do not insert a blank line before unnumbered lists, but Netscape does.

6. Forms

One of the most prominent uses of CGI is in processing forms. Forms are a subset of
HTML that allows the user to supply information. The forms interface makes Web
browsing an interactive process for both the user and the provider.

7. Gateways

Web gateways are programs or scripts used to access information that is not directly
readable by the client. CGI provides a solution to the problem in the form of a gateway.

8. Communication threads are used to handle parts of the communication between the
applications and the database server.

9. Request threads perform the SQL operations requested by the applications. When the
Database Server is requested to perform a SQL operation it allocates one of its Request threads
to perform the task.



CLIENT/SERVER AND INTERNET

5.1 Introduction to the Internet

The WWW is a new way of viewing information -- and a rather different one. If, for
example, you are viewing this paper as a WWW document, you will view it with a browser, in
which case you can immediately access hypertext links. If you are reading this on paper, you will
see the links indicated in parentheses and in a different font. Keep in mind that the WWW is
constantly evolving. We have tried to pick stable links, but sites reorganize and sometimes they
even move. By the time you read the printed version of this paper, some WWW links may have
changed.

5.1.1 World Wide Web

 The WWW project has the potential to do for the Internet what Graphical User Interfaces
(GUIs) have done for personal computers -- make the Net useful to end users. The Internet
contains vast resources in many fields of study (not just in computer and technical
information). In the past, finding and using these resources has been difficult.
 The Web provides consistency: Servers provide information in a consistent way and clients
show information in a consistent way. To add a further thread of consistency, many users
view the Web through graphical browsers which are like other windows (Microsoft
Windows, Macintosh windows, or X-Windows) applications that they use.



 A principal feature of the Web is its links between one document and another. These links,
described in the section on hypertext, allow you to move from one document to another.
Hypertext links can point to any server connected to the Internet and to any type of file.
These links are what transform the Internet into a web.

5.1.2 History of the Web

The Web project was started by Tim Berners-Lee at the European Particle Physics Laboratory
(CERN) in Geneva, Switzerland. Tim wanted to find a way for scientists doing projects at CERN
to collaborate with each other on-line. He thought of hypertext as one possible method for this
collaboration.
 Tim started the WWW project at CERN in March 1989. In January 1992, the first
versions of WWW software, known as Hypertext Transfer Protocol (HTTP), appeared on
the Internet.
 By October 1993, 500 known HTTP servers were active.
 When Robelle joined the Internet in June 1994, we were about the 80,000th registered
HTTP server.
 By the end of 1994, it was estimated that there were over 500,000 HTTP servers.
Attempts to keep track of the number of HTTP servers on the Internet have not been
successful. Programs that try to automatically count HTTP servers never stop -- new
servers are being added constantly.

5.1.3 Hypertext
Hypertext provides the links between different documents and different document types.
If you have used the Microsoft Windows WinHelp system or the Macintosh HyperCard application,
you likely know how to use hypertext.
In a hypertext document, links from one place in the document to another are included
with the text. By selecting a link, you are able to jump immediately to another part of the
document or even to a different document. In the WWW, links can go not only from one
document to another, but from one computer to another.



5.2 Web Client/Server

Client/server describes the relationship between two computer programs in which one
program, the client, makes a service request from another program, the server, which fulfills the
request. Although the client/server idea can be used by programs within a single computer, it is a
more important idea in a network. In a network, the client/server model provides a convenient
way to interconnect programs that are distributed efficiently across different locations. Computer
transactions using the client/server model are very common.

For example, to check your bank account from your computer,

 A client program in your computer forwards your request to a server program at


the bank.
 That program may in turn forward the request to its own client program that sends
a request to a database server at another bank computer to retrieve your account
balance.
 The balance is returned to the bank data client, which in turn serves it back
to the client in your personal computer, which displays the information for you.
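The bank-balance request chain above can be sketched as three cooperating functions. Everything here (the account number, balance, and function names) is invented for the illustration, not a real banking API; in practice each tier would be a separate process communicating over the network.

```python
# A toy model of the client -> bank server -> database server chain.
ACCOUNTS = {"12345": 250.75}  # stand-in for the bank's database

def database_server(account_id):
    """Tier 3: the database server at the other bank computer
    retrieves the stored balance."""
    return ACCOUNTS[account_id]

def bank_server(request):
    """Tier 2: the bank's server program acts, in turn, as a client
    of the database server, forwarding the request and relaying
    the reply back."""
    balance = database_server(request["account_id"])
    return {"account_id": request["account_id"], "balance": balance}

def personal_computer_client(account_id):
    """Tier 1: the client program on your PC forwards your request
    and displays the information served back to it."""
    response = bank_server({"account_id": account_id})
    return f"Balance for {response['account_id']}: {response['balance']:.2f}"

print(personal_computer_client("12345"))
```

Note how the bank server is a server to your PC but a client of the database server: in the client/server model, "client" and "server" describe roles in a request, not fixed machines.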

The client/server model has become one of the central ideas of network computing. Most
business applications being written today use the client/server model. So does the Internet's main
program, TCP/IP. In marketing, the term has been used to distinguish distributed computing by
smaller dispersed computers from the "monolithic" centralized computing of mainframe
computers. But this distinction has largely disappeared as mainframes and their applications have
also turned to the client/server model and become part of network computing.

In the usual client/server model, one server, sometimes called a daemon, is activated and awaits
client requests. Typically, multiple client programs share the services of a common server
program. Both client programs and server programs are often part of a larger program or
application.



Relative to the Internet, your Web browser is a client program that requests services (the
sending of Web pages or files) from a Web server (which technically is called a Hypertext
Transport Protocol or HTTP server) in another computer somewhere on the Internet.

Similarly, your computer with TCP/IP installed allows you to make client requests for
files from File Transfer Protocol (FTP) servers in other computers on the Internet.

Other program relationship models include master/slave, with one program being in
charge of all other programs, and peer-to-peer, with either of two programs able to initiate a
transaction.

5.2.1 Web Client-Server Topology

The Web Client-Server installation topology enables PR-Tracker clients to connect to the
PR-Tracker server over the Internet or an intranet. It is the recommended installation topology
when PR-Tracker users are working remotely or are in a network domain that is not the same
domain as the PR-Tracker Server. It may also be used when the server's firewall software blocks
network communication.

 Clients connect to the PR-Tracker Web service by specifying the address of the
prtracker.asmx file in PR-Tracker.



Figure 5.1 Web client-server installation topology

The diagram above shows a Web Client-Server installation topology where the PR-Tracker
Server hosts the PR-Tracker Web Service. To use this configuration option, a virtual directory
must be created in IIS to host the PR-Tracker Web Service. The PR-Tracker Server configuration
wizard can do this for you. By default, PR-Tracker configures the virtual directory so that it can
be accessed anonymously. If you want additional security on this virtual directory, you must add
it manually.

An alternate Web Client-Server installation topology is depicted below.


In this topology the PR-Tracker Web Service runs on a PR-Tracker Client instead of the
server. This topology is preferred when you don't want to store the PR-Tracker database on a
corporate web server for security or performance reasons.
To implement this installation topology, you will need to create a virtual directory to host
the PR-Tracker Web Service manually. You will also need to start PR-Tracker on the Web server
at least once and connect to the PR-Tracker Server in client-server mode. This step enables the
information the PR-Tracker Web Service needs to connect to the PR-Tracker Server Service to
be loaded into the Settings.xml file.

5.2.2 The Language of the Web



In order to use the WWW, you must know something about the language used to communicate
in the Web. There are three main components to this language:

 Uniform Resource Locators (URLs)


o URLs provide the hypertext links between one document and another. These links
can access a variety of protocols (e.g., ftp, gopher, or http) on different machines
(or your own machine).

 Hypertext Markup Language (HTML)


o WWW documents contain a mixture of directives (markup), and text or graphics.
The markup directives do such things as make a word appear in bold type. This is
similar to the way UNIX users write nroff or troff documents, and MPE users
write with Galley, TDP, or Prose. For PC users, this is completely different from
WYSIWYG editing. However, a number of tools are now available on the market
that hide the actual HTML.
 Common Gateway Interfaces (CGI)
o Servers use the CGI interface to execute local programs. CGIs provide a gateway
between the HTTP server software and the host machine.
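The URL component of this language can be explored with Python's standard urllib.parse module. The URL below is made up for the example; the parts it splits into are exactly the pieces a browser needs to follow a hypertext link:

```python
# Decompose a URL into the pieces a Web client needs: which protocol
# to speak, which machine to contact, and which document to request.
from urllib.parse import urlparse

parts = urlparse("http://www.example.com:8080/docs/index.html")

print(parts.scheme)    # the protocol, e.g. http, ftp, or gopher
print(parts.hostname)  # the machine to contact
print(parts.port)      # an optional port on that machine
print(parts.path)      # the document to fetch from that machine
```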

5.2.3 Hypertext Transfer Protocol

When you use a WWW client, it communicates with a WWW server using the Hypertext
Transfer Protocol. When you select a WWW link, the following things happen:

 The client looks up the hostname and makes a connection with the WWW server.
 The HTTP software on the server responds to the client's request.
 The client and the server close the connection.



Compare this with traditional terminal/host computing. Users usually logon (connect) to the
server and remain connected until they logoff (disconnect). An HTTP connection, on the other
hand, is made only for as long as it takes for the server to respond to a request. Once the request
is completed, the client and the server are no longer in communication.

WWW clients use the same technique for other protocols. For example, if you request a directory
at an FTP site, the WWW client makes an FTP connection, logs on as an anonymous user, switches to the
directory, requests the directory contents, and then logs off the FTP server.
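The short-lived connect/request/close cycle described above can be sketched at the socket level. This is a minimal illustration under stated assumptions (an HTTP/1.0-style request, ASCII-safe host and path), not a full HTTP client; run it against any reachable server, e.g. a local `python -m http.server`.

```python
# A bare-bones HTTP exchange: connect, send one request, read one
# response, and the connection is closed -- unlike a terminal/host
# session, no logon state persists between requests.
import socket

def build_request(host, path):
    """Assemble the plain-text request the client sends down the socket."""
    return (f"GET {path} HTTP/1.0\r\n"
            f"Host: {host}\r\n"
            "Connection: close\r\n\r\n")

def http_get(host, path, port=80):
    with socket.create_connection((host, port)) as sock:        # 1. connect
        sock.sendall(build_request(host, path).encode("ascii"))  # 2. request
        chunks = []
        while data := sock.recv(4096):                           # 3. server responds
            chunks.append(data)
    return b"".join(chunks)                                      # 4. closed again
```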

5.2.4 Hypertext Mark up Language (HTML)

When you write documents for WWW, you use the Hypertext Markup Language (HTML). In a
markup language, you mix your text with the marks that indicate how formatting is to take place.
Most WWW browsers have an option to "View Source" that will show you the HTML for the
current document that you are viewing. Each WWW browser renders HTML in its own way.

 Character-mode browsers use terminal highlights (e.g., inverse video, dim, or


underline) to show links, bold, italics, and so on.
 Graphical browsers use different typefaces, colors, and bold and italic formats to
display different HTML marks. Writers have to remember that each browser in
effect has its own HTML style sheet. For example, Lynx and Mosaic do not insert
a blank line before unnumbered lists, but Netscape does.

If you want to see how your browser handles standard and non-standard HTML, try the
WWW Test Pattern. The test pattern will show differences between your browser, standard
HTML, and other browsers.

Creating HTML

Creating HTML is awkward, but not that difficult. The most common method of creating
HTML is to write the raw mark-up language using a standard text editor. If you are creating



HTML yourself, we have found the chapter on authoring for the Web in the O'Reilly book
"Managing Internet Information Services" to be an excellent resource.

Bob Green, founder of Robelle, finds HTML Writer to be useful for learning HTML. Instead of hiding
the HTML tags, HTML Writer provides menus with all of the HTML elements and inserts these
into a text window. To see how your documents look, you must use a separate Web browser.

Microsoft has produced an add-on to Microsoft Word, the Internet Assistant, that produces HTML;
it is available from Microsoft at no charge. You will need to know the basic concepts of Microsoft
Word to take advantage of the Internet Assistant. Since we are not experienced Microsoft Word
users, we found that the Internet Assistant didn't help us much.

5.3 3-TIER CLIENT SERVER WEB STYLE

 A special type of client/server architecture consisting of three well-defined and separate


processes, each running on a different platform:
 The user interface, which runs on the user's computer (the client).
 The functional modules that actually process data. This middle tier runs on a server and is
often called the application server.
 A database management system (DBMS) that stores the data required by the middle tier.
This tier runs on a second server called the database server.

The three-tier design has many advantages over traditional two-tier or single-tier designs, the
chief ones being:

 The added modularity makes it easier to modify or replace one tier without
affecting the other tiers.
 Separating the application functions from the database functions makes it easier to
implement load balancing.

5.3.1 Introduction to 3-Tier Architecture



In 3-tier architecture, there is an intermediary level, meaning the architecture is generally split up
between:

 A client, i.e. the computer, which requests the resources, equipped with a user interface
(usually a web browser) for presentation purposes
 The application server (also called middleware), whose task it is to provide the requested
resources, but by calling on another server
 The data server, which provides the application server with the data it requires.

Three Tier Architecture

Figure 5.2 Three-Tier Architecture

 To overcome the limitations of Two-Tier Architecture


 Middle tier between UI and DB
 Ways of incorporating Middle-Tier
 Transaction processing Monitors

5.3.2 3-Tier TP Monitors

 Online access through



 Time sharing or Transaction Processing
 Client connects to TP instead of DB
 Monitor accepts transaction, queues it and takes responsibility until it is completed
 Asynchrony is achieved

Key services provided by the monitor

 ability to update multiple different DBMS in a single transaction

 connectivity to a variety of data sources, including

 flat files

 non-relational DBMS

 mainframe

 more scalable than a 2-tier approach

 ability to attach priorities to transactions

 robust security

 For large (e.g., 1,000 user) applications, a TP monitor is one of the most effective
solutions.

 The three-tier design has many advantages over traditional two-tier or single-tier designs,
the chief ones being:

 Separating the application functions from the database functions makes it easier to
implement load balancing.

5.3.3 3 Tier Applications



 Most of the application's business logic is moved to a shared host server

 PC is used only for presentation services

 Approach is similar to X Architecture

 Both aim at pulling the main body of application logic off the desktop and running it on a
shared host.

5.4 CGI: Common Gateway Interface

An HTTP server is often used as a gateway to a legacy information system; for example, an
existing body of documents or an existing database application. The Common Gateway Interface
is an agreement between HTTP server implementers about how to integrate such gateway scripts
and programs.



It is typically used in conjunction with HTML forms to build database applications.

How is a form’s data passed to a program that hangs off an HTTP server? It gets passed
using an end-to-end client/server protocol that includes both HTTP and CGI. The best way to
explain the dynamics of the protocol is to walk you through a POST method invocation.

How do the client and server programs play together to process a form’s request? Here’s the step-by-
step explanation of this interaction:

1. User clicks on the form’s “submit” button.

This causes the Web browser to collect the data within the form, and then assemble it into
one long string of name/value pairs each separated by an ampersand (&). The browser translates
spaces within the data into plus (+) symbols. No, it’s not very pretty.

2. The Web Browser invokes a POST HTTP method.

This is an ordinary HTTP request that specifies a POST method, the URL of the target
program in the “cgi-bin” directory, and the typical HTTP headers. The message body (HTTP
calls it the “entity”) contains the form’s data. This is the string: name=value&name=value&...

3. The HTTP server receives the method invocation via a socket connection.

The server parses the message and discovers that it’s a POST for the “cgi-bin” program.
So it starts a CGI interaction.

4. The HTTP server sets up the environment variables.

The CGI protocol uses environment variables as a shared bulletin board for
communicating information between the HTTP server and the CGI program. The server typically
provides the following environmental information: server_name, request_method, path_info,
script_name, content_type, and content_length.

5. The HTTP server starts a CGI program.

The HTTP server executes an instance of the CGI program specified in the URL; it’s
typically in the “cgi-bin” directory.

6. The CGI program reads the environment variables.



In this case, the program discovers by reading the environment variables that it is
responding to a POST.

7. The CGI program receives the message body via the standard input pipe (stdin).

Remember, the message body contains the famous string of name=value items separated
by ampersands (&). The content length environment variable tells the program how much data is
in the string. The CGI program parses the string contents to retrieve the form data. It uses the
content length environment variable to determine how many characters to read in from the
standard input pipe. Cheer up, we’re half way there.

8. The CGI program does some work.

Typically, a CGI program interacts with some back-end resource, like a DBMS or
transaction program, to service the client’s request. It then formats its results as HTML (or
another acceptable MIME type). A CGI program may also choose to provide all the information
that goes into the HTTP response headers; the HTTP server will then send the reply “as is” to the
client. Why would you do this? Because it removes the extra overhead of having the HTTP server
parse the output to create the response headers. Programs whose names begin with “nph-” indicate
that they do not require HTTP server assistance; CGI calls them nonparsed header programs (nph).

9. The CGI program returns the results via the standard output pipe (stdout).

The program pipes back the results to the HTTP server via its standard output. The HTTP
server receives the results on its standard input. This concludes the CGI interaction.

10. The HTTP server returns the results to the Web browser.

The HTTP server can either append some response headers to the information it receives
from the CGI program, or it sends it “as is” if it’s an nph program.
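The ten steps above can be condensed into a runnable sketch. Here `handle_form` plays the role of a hypothetical CGI program and `simulate_cgi_request` stands in for the browser and HTTP server together; the field name and the greeting it emits are invented for the example.

```python
# Simulate the POST-method CGI round trip: encode the form, set the
# environment variables, hand the body to the "CGI program" on stdin,
# and collect its response (headers, blank line, then the document).
import io
from urllib.parse import parse_qs, urlencode

def handle_form(environ, stdin):
    """The 'CGI program': read CONTENT_LENGTH from the environment
    (step 6), consume that many characters from stdin (step 7), parse
    the name=value pairs, and emit a response (steps 8-9)."""
    length = int(environ["CONTENT_LENGTH"])
    body = stdin.read(length)
    fields = parse_qs(body)            # undoes the &-joining and +-encoding
    name = fields["name"][0]
    return ("Content-Type: text/html\r\n\r\n"
            f"<html><body>Hello, {name}!</body></html>")

def simulate_cgi_request(form_data):
    """The browser and HTTP server together: encode the form as
    name=value pairs with spaces turned into '+' (step 1), set up the
    environment variables (step 4), and start the program (step 5)."""
    body = urlencode(form_data)
    environ = {
        "REQUEST_METHOD": "POST",
        "CONTENT_TYPE": "application/x-www-form-urlencoded",
        "CONTENT_LENGTH": str(len(body)),
    }
    return handle_form(environ, io.StringIO(body))

print(simulate_cgi_request({"name": "Ada Lovelace"}))
```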

As you can see, a CGI program is executed in real time; it gets the information and then
builds a dynamic Web page to satisfy a client’s request. CGI makes the Web more dynamic. In
contrast, a plain HTML document is static, which means the text file does not change. CGI may
be clumsy, but it does allow us to interface Web clients to general-purpose back-end services
(such as Amazon.com) as well as to Internet search utilities such as Yahoo! and Excite. You can
even stretch CGI to its limits to create general-purpose client/server programs like the Federal
Express package-tracking Web page. However, Federal Express uses CGI to connect to a TP
Monitor in the back end.



5.5 THE SERVER SIDE OF THE WEB

Server-side scripting is a web server technology in which a user's (client's) request is


handled by a script running on the web server to generate dynamic web pages. It is usually used
to provide interactive web sites that interface to databases or other data stores. This is different
from client-side scripting where scripts, usually JavaScript, are run in the web browser.

Server-side scripting is used to customize the server response based on the user's
requirements, access rights, or queries into data stores. From a security point of view, the source
code of server-side scripts is never visible to the browser, as these scripts are executed on the
server and emit HTML corresponding to the user's input to the page.

When the server serves data in a commonly used manner, for example according to the
HTTP or FTP protocols, users may have their choice of a number of client programs (most
modern web browsers can request and receive data using both of those protocols). In the case of
more specialized applications, programmers may write their own server, client, and
communications protocol that can only be used with one another.

Programs that run on a user's local computer without ever sending or receiving data over a
network are not considered clients, and so the operations of such programs would not be
considered client-side operations.

5.5.1 History

Server-side scripting was invented in early 1995 by Fred DuFresne while developing the first
web site for Boston, MA television station WCVB. The technology is described in US patent
5835712. The patent was issued in 1998 and is now owned by Open Invention Network (OIN).
In 2010 OIN named Fred DuFresne a "Distinguished Inventor" for his work on server-side
scripting.



5.5.2 Explanation

In the earlier days of the web, server-side scripting was almost exclusively performed by
using a combination of C programs, Perl scripts and shell scripts using the Common Gateway
Interface (CGI). Those scripts were executed by the operating system, and the results were
simply served back by the web server. Later on-line scripting languages such
as ASP and PHP can often be executed directly by the web server itself or by extension modules
(e.g. mod_perl or mod_php) to the web server.

WebDNA includes its own embedded database system. Either form of scripting (i.e., CGI
or direct execution) can be used to build up complex multi-page sites, but direct execution
usually results in lower overhead due to the lack of calls to external interpreters.

Dynamic websites are also sometimes powered by custom web application servers, for
example the Python "Base HTTP Server" library, although some may not consider this to be
server-side scripting. When working with dynamic Web-based scripting technologies, like
classic ASP or PHP, developers must have a keen understanding of the logical, temporal, and
physical separation between the client and the server.
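As a concrete illustration, the Python library mentioned above survives today as the standard http.server module. The sketch below generates a page per request on the server side; the client never sees this source, only the HTML it emits. The greeting and the serving port are invented for the example, and the serve_forever call is left commented out so the sketch can be read without binding a socket.

```python
# Server-side scripting in miniature: the page does not exist as a
# file -- it is built fresh on the server each time a request arrives.
from datetime import datetime
from http.server import BaseHTTPRequestHandler, HTTPServer

def render_page(client_ip):
    """Build the response on the server at request time, customized
    to the requesting client."""
    return (f"<html><body><p>Hello {client_ip}, "
            f"it is now {datetime.now():%H:%M:%S}.</p></body></html>")

class DynamicHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        body = render_page(self.client_address[0])
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.end_headers()
        self.wfile.write(body.encode("utf-8"))

# To serve for real: HTTPServer(("", 8000), DynamicHandler).serve_forever()
print(render_page("127.0.0.1"))
```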

5.6 CGI AND STATE

5.6.1 Introduction

A CGI program is any program designed to accept and return data that conforms to the
CGI specification. The program could be written in any programming language, including C,
Perl, Java, or Visual Basic.

CGI programs are the most common way for Web servers to interact dynamically with
users. Many HTML pages that contain forms, for example, use a CGI program to process the
form's data once it's submitted. Another increasingly common way to provide dynamic feedback
for Web users is to include scripts or programs that run on the user's machine rather than the
Web server. These programs can be Java applets, JavaScript scripts, or ActiveX controls. These



technologies are known collectively as client-side solutions, while the use of CGI is a server-side
solution because the processing occurs on the Web server.

One problem with CGI is that each time a CGI script is executed, a new process is started. For
busy Web sites, this can slow down the server noticeably. A more efficient solution, but one that
is also more difficult to implement, is to use the server's API, such as ISAPI or NSAPI.
Another increasingly popular solution is to use Java servlets.

DIAGRAM OF CGI

Figure 5.4 CGI

5.6.2 CGI Applications

CGI turns the Web from a simple collection of static hypermedia documents into a whole
new interactive medium, in which users can ask questions and run applications. Let's take a look
at some of the possible applications that can be designed using CGI.

5.6.3 Forms

One of the most prominent uses of CGI is in processing forms. Forms are a subset of
HTML that allows the user to supply information. The forms interface makes Web browsing an
interactive process for both the user and the provider. Figure 5.5 shows a simple form.



As can be seen from the figure, a number of graphical widgets are available for form
creation, such as radio buttons, text fields, checkboxes, and selection lists. When the form is
completed by the user, the Submit Order! button is used to send the information to the server,
which executes the program associated with the particular form to "decode" the data.

Figure 5.5 Simple form illustrating different widgets

Generally, forms are used for two main purposes. At their simplest, forms can be used to collect
information from the user. But they can also be used in a more complex manner to provide back-
and-forth interaction. For example, the user can be presented with a form listing the various
documents available on the server, as well as an option to search for particular information



within these documents. A CGI program can process this information and return document(s)
that match the user's selection criteria.
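That search interaction can be sketched in a few lines. The document list and the field name "query" are invented for the example; a real CGI program would read the form body from standard input as described in section 5.4.

```python
# A CGI-style search: take the user's query from the submitted form
# body and return the documents matching the selection criterion.
from urllib.parse import parse_qs

DOCUMENTS = ["Annual Report 1994", "HTTP Overview", "HTML Primer"]

def matching_documents(form_body):
    """Parse the submitted form and filter the server's document
    list by a case-insensitive substring match."""
    fields = parse_qs(form_body)
    query = fields.get("query", [""])[0].lower()
    return [doc for doc in DOCUMENTS if query in doc.lower()]

print(matching_documents("query=HT"))
```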

5.6.4 Gateways

Web gateways are programs or scripts used to access information that is not directly
readable by the client.

CGI provides a solution to the problem in the form of a gateway. You can use a language
such as oraperl (see Chapter 9, Gateways, Databases, and Search/Index Utilities, for more
information) or a DBI extension to Perl to form SQL queries to read the information contained
within the database. Once you have the information, you can format and send it to the client. In
this case, the CGI program serves as a gateway to the Oracle database, as shown in Figure 5.6.

Figure 5.6 A gateway to a database
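The gateway pattern of Figure 5.6 can be sketched with Python's built-in sqlite3 standing in for the Oracle database and the oraperl/DBI code the text describes. The paintings table and its columns are invented for the example.

```python
# A gateway: form an SQL query against the database, then format the
# rows as HTML so the Web client can read information it could not
# access directly.
import sqlite3

def gateway_query(conn, min_price):
    """Run the SQL query and translate the result set into HTML."""
    rows = conn.execute(
        "SELECT title, price FROM paintings WHERE price >= ?", (min_price,))
    items = "".join(f"<li>{title}: {price}</li>" for title, price in rows)
    return f"<ul>{items}</ul>"

# Toy in-memory database standing in for the back-end DBMS.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE paintings (title TEXT, price REAL)")
conn.executemany("INSERT INTO paintings VALUES (?, ?)",
                 [("Mona Lisa", 120.0), ("The Tempest", 45.0)])

print(gateway_query(conn, 100.0))
```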

Similarly, you can write gateway programs to any other Internet information service,
including Archie, WAIS, and NNTP (Usenet News). In addition, you can amplify
the power of gateways by using the forms



interface to request a query or search string from the user to retrieve and display dynamic, or
virtual, information. We will discuss these special documents next.

5.6.5 Virtual Documents

Virtual, or dynamic, document creation is at the heart of CGI. Virtual documents are created
on the fly in response to a user's information request. You can create virtual HTML, plain text,
image, and even audio documents. A simple example of a virtual document could be something
as trivial as this:

 Welcome to Shishir's WWW Server!


 You are visiting from diamond.com. The load average on this machine is 1.25.
 Happy navigating!

In this example, there are two pieces of dynamic information: the alphanumeric address (IP
name) of the remote user and the load average on the serving machine. This is a very simple
example, indeed!
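That virtual document can be produced by a short program. The visitor's address comes from the REMOTE_ADDR environment variable CGI provides, and the load average is read with os.getloadavg where available; both fall back to fixed example values so the sketch runs on any platform.

```python
# Assemble the "virtual document" at request time from two dynamic
# values: the remote visitor's address and this machine's load average.
import os

def load_average():
    try:
        return os.getloadavg()[0]   # Unix-only; hence the fallback below
    except (AttributeError, OSError):
        return 1.25

def virtual_document(remote_host):
    return ("Welcome to Shishir's WWW Server!\n"
            f"You are visiting from {remote_host}. "
            f"The load average on this machine is {load_average():.2f}.\n"
            "Happy navigating!")

print(virtual_document(os.environ.get("REMOTE_ADDR", "diamond.com")))
```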

On the other hand, very complex virtual documents can be created by writing programs
that use a combination of graphics libraries, gateways, and forms. As a more sophisticated
example, say you are the manager of an art gallery that specializes in selling replicas of ancient
Renaissance paintings and you are interested in presenting images of these masterpieces on the
Web. You start out by creating a form that asks for user information for the purpose of
promotional mailings, presents a search field for the user to enter the name of a painting, as well
as a selection list containing popular paintings.

Once the user submits the form to the server, a program can email the user information to
a certain address, or store it in a file. And depending on the user's selection, either a message
stating that the painting does not exist or an image of the painting can be displayed along with
some historical information located elsewhere on the Internet.



Along with the picture and history, another form with several image processing options to
modify the brightness, contrast, and/or size of the picture can be displayed. You can write
another CGI program to modify the image properties on the fly using certain graphics libraries,
such as gd, sending the resultant picture to the client.

This is an example of a more complex CGI program using many aspects of CGI programming.
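As a rough sketch of the form-handling half of this gallery example, the snippet below decodes a submitted query string and chooses between the "painting does not exist" message and an image page. The catalogue contents and the field names (`email`, `painting`) are invented for illustration:

```python
from urllib.parse import parse_qs

# Hypothetical catalogue mapping painting names to image files and notes.
CATALOGUE = {
    "Mona Lisa": ("monalisa.gif", "Painted by Leonardo da Vinci, c. 1503."),
    "The Last Supper": ("lastsupper.gif", "A late-15th-century mural."),
}

def handle_submission(query_string):
    """Decode the submitted form and decide what the reply page shows."""
    fields = parse_qs(query_string)
    email = fields.get("email", [""])[0]      # would be saved for mailings
    painting = fields.get("painting", [""])[0]
    if painting not in CATALOGUE:
        return f"Sorry, no painting named {painting!r} exists in the gallery."
    image, history = CATALOGUE[painting]
    # The real program would also append the image-processing form here.
    return f"<img src={image!r}><p>{history}</p>"
```

For example, `handle_submission("email=a%40b.com&painting=Mona+Lisa")` returns the image page for the Mona Lisa.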

5.7 SQL Database Server

SQL Server is a relational database management system from Microsoft that is designed for the
enterprise environment. SQL Server runs Transact-SQL (T-SQL), a set of programming extensions
from Sybase and Microsoft that add several features to standard SQL, including transaction
control, exception and error handling, row processing, and declared variables.

Code-named Yukon in development, SQL Server 2005 was released in November 2005. The
2005 product is said to provide enhanced flexibility, scalability, reliability, and security to
database applications, and to make them easier to create and deploy, thus reducing the
complexity and tedium involved in database management. SQL Server 2005 also includes more
administrative support.

The original SQL Server code was developed by Sybase; in the late 1980s, Microsoft,
Sybase and Ashton-Tate collaborated to produce the first version of the product, SQL Server 4.2
for OS/2. Subsequently, both Sybase and Microsoft offered SQL Server products. Sybase has
since renamed its product Adaptive Server Enterprise.

5.7.1 History

 Prior to version 7.0 the code base for MS SQL Server was sold by Sybase to Microsoft, and was
Microsoft's entry to the enterprise-level database market, competing against
Oracle, IBM and, later, Sybase itself.



 Microsoft, Sybase and Ashton-Tate originally teamed up to create and market the first version,
named SQL Server 1.0 for OS/2 (about 1989), which was essentially the same as
Sybase SQL Server 3.0 on Unix, VMS, etc.
 Microsoft SQL Server 4.2 was shipped around 1992 (available bundled with IBM
OS/2 version 1.3). Later, Microsoft SQL Server 4.21 for Windows NT was released at
the same time as Windows NT 3.1. Microsoft SQL Server v6.0 was the first
version designed for NT, and did not include any direction from Sybase.
 About the time Windows NT was released, Sybase and Microsoft parted ways and
each pursued its own design and marketing schemes. Microsoft negotiated
exclusive rights to all versions of SQL Server written for Microsoft operating
systems.
 Later, Sybase changed the name of its product to Adaptive Server Enterprise to
avoid confusion with Microsoft SQL Server. Until 1994, Microsoft's SQL Server
carried three Sybase copyright notices as an indication of its origin.
 SQL Server 7.0 and SQL Server 2000 included modifications and extensions to
the Sybase code base, adding support for the IA-64 architecture. By SQL Server
2005 the legacy Sybase code had been completely rewritten.
 In the ten years since release of Microsoft's previous SQL Server product (SQL
Server 2000), advancements have been made in performance, the client IDE tools,
and several complementary systems that are packaged with SQL Server 2005.

5.7.2 SQL Server 2005

SQL Server 2005 (codename Yukon) was released in October 2005. It included native
support for managing XML data, in addition to relational data. For this purpose, it defined an
xml data type that could be used either as a data type in database columns or as literals in
queries. XML columns can be associated with XSD schemas:

 XML data being stored is verified against the schema. XML is converted to an internal
binary data type before being stored in the database. Specialized indexing methods
were made available for XML data.



 XML data is queried using XQuery; SQL Server 2005 added some extensions to the
T-SQL language to allow embedding XQuery queries in T-SQL.
 In addition, it also defines a new extension to XQuery, called XML DML that allows
query-based modifications to XML data.

SQL Server 2005 also allows a database server to be exposed over web services using

 Tabular Data Stream (TDS) packets encapsulated within SOAP requests.
When the data is accessed over web services, results are returned as XML.
 Common Language Runtime (CLR) integration was introduced with this version,
enabling one to write SQL code as Managed Code by the CLR.
 For relational data, T-SQL has been augmented with error handling features (try/catch)
and support for recursive queries with CTEs (Common Table Expressions).

SQL Server 2005 has also been enhanced with new indexing algorithms, syntax and
better error recovery systems. Data pages are checksummed for better error resiliency, and
optimistic concurrency support has been added for better performance. Permissions and access
control have been made more granular and the query processor handles concurrent execution of
queries in a more efficient way. Partitions on tables and indexes are supported natively, so
scaling out a database onto a cluster is easier.

 SQL CLR was introduced with SQL Server 2005 to let it integrate with the .NET
Framework.
 SQL Server 2005 introduced "MARS" (Multiple Active Results Sets), a method of
allowing usage of database connections for multiple purposes.
 SQL Server 2005 introduced DMVs (Dynamic Management Views), which are
specialized views and functions that return server state information that can be used to
monitor the health of a server instance, diagnose problems, and tune performance.
 Service Pack 1 (SP1) of SQL Server 2005 introduced Database Mirroring, a high
availability option that provides redundancy and failover capabilities at the database
level.



5.7.3 SQL Server 2008

SQL Server 2008 (codename Katmai) was released on August 6, 2008 and aims to make
data management self-organizing and self-maintaining with the development of SQL Server
Always On technologies, to provide near-zero downtime. SQL Server 2008 also includes support
for structured and semi-structured data, including digital media formats for pictures, audio,
video and other multimedia data. In current versions, such multimedia data can be stored as
BLOBs (binary large objects), but they are generic bitstreams. Intrinsic awareness of multimedia
data will allow specialized functions to be performed on them. According to Microsoft, SQL
Server 2008 can be a data storage backend for different varieties of data: XML, email,
time/calendar, file, document, spatial, etc., as well as perform search, query, analysis, sharing,
and synchronization across all data types.

Other new data types include specialized date and time types and a Spatial data type for
location-dependent data. Better support for unstructured and semi-structured data is provided
using the new FILESTREAM data type, which can be used to reference any file stored on the file
system.

Structured data and metadata about the file is stored in the SQL Server database, whereas the
unstructured component is stored in the file system. Such files can be accessed both via file-
handling APIs and via SQL Server; the latter accesses the file data as a BLOB.
Backing up and restoring the database backs up or restores the referenced files as well. SQL
Server 2008 also natively supports hierarchical data, and includes constructs to directly deal with
them, without using recursive queries.

The Full-text search functionality has been integrated with the database engine.
According to a Microsoft technical article, this simplifies management and improves
performance.

Spatial data will be stored in two types,



 A "Flat Earth" (GEOMETRY or planar) data type represents geospatial data which has
been projected from its native, spherical, coordinate system into a plane.
 A "Round Earth" data type (GEOGRAPHY) uses an ellipsoidal model in which the Earth
is defined as a single continuous entity which does not suffer from the singularities such
as the international dateline, poles, or map projection zone "edges". Approximately 70
methods are available to represent spatial operations for the Open Geospatial Consortium
Simple Features for SQL, Version 1.1.
 SQL Server includes better compression features, which also helps in improving
scalability. It enhanced the indexing algorithms and introduced the notion of filtered
indexes. It also includes Resource Governor that allows reserving resources for certain
users or workflows. It also includes capabilities for transparent encryption of data (TDE)
as well as compression of backups.
 SQL Server 2008 supports the ADO.NET Entity Framework and the reporting tools,
replication, and data definition will be built around the Entity Data Model. SQL Server
Reporting Services will gain charting capabilities from the integration of the data
visualization products from Dundas Data Visualization, Inc., which was acquired by
Microsoft.
 On the management side, SQL Server 2008 includes the Declarative Management
Framework which allows configuring policies and constraints, on the entire database or
certain tables, declaratively.
 The version of SQL Server Management Studio included with SQL Server 2008 supports
IntelliSense for SQL queries against a SQL Server 2008 Database Engine.
 SQL Server 2008 also makes the databases available via Windows PowerShell providers
and management functionality available as Cmdlets, so that the server and all the running
instances can be managed from Windows PowerShell.

5.7.4 SQL Server 2008 R2

SQL Server 2008 R2 (formerly codenamed SQL Server "Kilimanjaro") was announced at
TechEd 2009, and was released to manufacturing on April 21, 2010. SQL Server 2008 R2 adds



certain features to SQL Server 2008 including a master data management system branded as
Master Data Services, a central management of master data entities and hierarchies. Also Multi
Server Management, a centralized console to manage multiple SQL Server 2008 instances and
services including relational databases, Reporting Services, Analysis Services & Integration
Services.

SQL Server 2008 R2 includes a number of new services, including

 PowerPivot for Excel and SharePoint,


 Master Data Services,
 StreamInsight, Report Builder 3.0,
 Reporting Services Add-in for SharePoint,
 a Data-tier function in Visual Studio

that enables packaging of tiered databases as part of an application, and a SQL Server
Utility named UCP (Utility Control Point), part of AMSM (Application and Multi-Server
Management), that is used to manage multiple SQL Servers.

5.7.5 SQL Server 2012

At the 2011 Professional Association for SQL Server (PASS) summit on October 11,
Microsoft announced that the next major version of SQL Server, codenamed Denali, would be
SQL Server 2012. It was released to manufacturing on March 6, 2012.

It was announced to be the last version to natively support OLE DB, preferring
ODBC for native connectivity instead. This announcement caused some controversy.

SQL Server 2012's new features and enhancements include AlwaysOn SQL Server
Failover Cluster Instances and Availability Groups, which provide a set of options to

 improve database availability,


 Contained Databases which simplify the moving of databases between instances,



 new and modified Dynamic Management Views and Functions,
 programmability enhancements including new Spatial features,
 Metadata discovery,
 Sequence objects and the THROW statement,
 performance enhancements such as Column Store Indexes as well as
improvements to online and partition-level operations, and
 security enhancements including Provisioning During Setup, new permissions,
improved role management and default schema assignment for groups.

5.7.6 Database Server Architecture

The Mimer SQL DBMS is based on client/server architecture. The Database Server
executes in one single, multi-threaded process with multiple Request and Background threads.
On some platforms Communication threads are used. The Mimer SQL architecture is truly
multi-threaded, with requests being dynamically allocated to the different Request threads. As
threads scale very well over multiple CPUs, Mimer SQL is particularly well suited for symmetric
multiprocessor (SMP) environments. By the use of threads within the Database Server, optimal
efficiency is achieved when context-switching in the Database Server.

It also ensures that the application can only view data that has been explicitly passed to the client
side, which is extremely important from a data security point of view.



Figure 5.7 Mimer SQL Database Server Architecture

The Communication threads are used to handle parts of the communication between the
applications and the database server. On some platforms other mechanisms are used to handle
the communication between the applications and the database server. Whatever the mechanism,
all communication with the database server is multi-threaded, allowing large numbers of
simultaneous user requests.

Both local and remote applications are handled directly by the Database Server. This means that
in Client/Server environments, where Mimer SQL executes in a distributed environment with
the client and server on different machines, all remote clients connect directly to the Database
Server. Thereby avoiding any additional overhead of network service processes being started,
either on the client or on the server machine.

The Request threads perform the SQL operations requested by the applications. When the
Database Server is requested to perform a SQL operation it allocates one of its Request threads



to perform the task. When the SQL operation is complete the result is returned back to the
application, and the Request thread that has performed the operation returns to a waiting state
until it receives another server request. Since the SQL operations are evaluated entirely within
the Database Server, inter-process communication is reduced to a minimum.

When a SQL query or a stored routine is executed by a Request thread, the compiled version of
the query or the routine is stored within the Database Server. In this way the same, compiled
version of the query or routine can be used again by other applications. This leads to improved
performance, since a SQL query or a stored routine only need to be compiled once by the
Database Server.
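The compile-once behaviour described above can be sketched as follows. This is an illustrative model of a server-side compiled-query cache, not Mimer's actual implementation:

```python
class QueryCache:
    """Sketch of a server-side cache of compiled queries (illustrative only)."""

    def __init__(self):
        self._plans = {}        # SQL text -> compiled plan
        self.compilations = 0   # how many times we actually compiled

    def _compile(self, sql):
        self.compilations += 1
        # A real server would parse and optimize here; we fake a "plan".
        return ("PLAN", sql.strip().upper())

    def execute(self, sql):
        plan = self._plans.get(sql)
        if plan is None:                       # compile once, on first use
            plan = self._plans[sql] = self._compile(sql)
        return plan                            # later requests reuse the plan

cache = QueryCache()
cache.execute("select * from orders")   # compiled on first call
cache.execute("select * from orders")   # second call reuses the cached plan
```

Because every application connecting to the server shares the cache, a query or stored routine only pays the compilation cost once.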

The Background threads perform database services including all database updates, online
backup and database shadowing. These services are performed asynchronously in the
background to the application processes, which means that the application process does not have
to wait for the physical completion of a transaction or a shadow update, but can continue as soon
as the transaction has been prepared and secured to disk.

I/O operations are performed in parallel directly by the request and background threads using
asynchronous I/O, so any need for separate I/O threads is avoided.

5.8 Middleware and Federated Database Technology

In a large modern enterprise, it is almost inevitable that different portions of the organization will
use different database management systems to store and search their critical data. Competition,
evolving technology, mergers, acquisitions, geographic distribution, and the inevitable
decentralization of growth all contribute to this diversity. Yet it is only by combining the
information from these systems that the enterprise can realize the full value of the data they
contain.

For example, in the finance industry, mergers are an almost commonplace occurrence. The
newly created entity inherits the data stores of the original institutions. Many of those stores will
be relational database management systems, but often from different manufacturers; for instance,



one company may have used primarily Sybase, and another Informix IDS. They may both have
one or more document management systems -- such as Documentum or IBM Content Manager --
for storing text documents such as copies of loans, etc.

The Garlic project demonstrated the feasibility of extending this idea to build a federated
database system that effectively exploits the query capabilities of diverse, possibly non-relational
data sources. In both of these systems, as in today's DB2, a middleware query processor develops
optimized execution plans and compensates for any functionality that the data sources may lack.

In this article, we describe the key characteristics of IBM's federated technology: transparency,
heterogeneity, a high degree of function, autonomy for the underlying federated sources,
extensibility, openness, and optimized performance. We then "roll back the covers" to show how
IBM's database federation capabilities work. We illustrate how the federated capabilities can be
used in a variety of scenarios, and conclude with some directions for the future.

5.8.1 Characteristics of the Federated Solution

Transparency

If a federated system is transparent, it masks from the user the differences, idiosyncrasies, and
implementations of the underlying data sources.
look to the user like a single system. The user should not need to be aware of where the data is
stored (location transparency), what language or programming interface is supported by the data
source (invocation transparency), if SQL is used, what dialect of SQL the source supports
(dialect transparency), how the data is physically stored, or whether it is partitioned and/or
replicated (physical data independence, fragmentation and replication transparency), or what
networking protocols are used (network transparency). The user should see a single uniform
interface, complete with a single set of error codes (error code transparency). IBM provides all
these features, allowing applications to be written as if all the data were in a single database,
although, in fact, the data may be stored in a heterogeneous collection of data sources.



Heterogeneity

Heterogeneity is the degree of differentiation in the various data sources. Sources can differ in
many ways. They may run on different hardware, use different network protocols, and have
different software to manage their data stores. They may have different query languages,
different query capabilities, and even different data models. They may handle errors differently,
or provide different transaction semantics. They may be as much alike as two Oracle instances,
one running Oracle 8i, and the other Oracle 9i, with the same or different schemas. Or they may
be as diverse as a high-powered relational database, a simple, structured flat file, a web site that
takes queries in the form of URLs and spits back semi-structured XML according to some DTD,
a Web service, and an application that responds to a particular set of function calls. IBM's
federated database can accommodate all of these differences, encompassing systems such as
these in a seamless, transparent federation.

High Degree of Function

IBM's federated capability provides users with the best of both worlds: all the function of its rich,
standard-compliant DB2 SQL capability against all the data in the federation, as well as all the
function of the underlying data sources. DB2's SQL includes support for many complex query
features, including inner and outer joins, nested subqueries and table expressions, recursion,
user-defined functions, aggregation, statistical analyses, automatic summary tables, and others
too numerous to mention. Many data sources may not provide all of these features. However,
users still get the full power of DB2 SQL on these sources' data, because of function
compensation. Function compensation means that if a data source cannot do a particular query
function, the federated database retrieves the necessary data and applies the function itself. For
example, a file system typically cannot do arbitrary sorts. However, users can still request that
data from that source (i.e., some subset of a file) be retrieved in some order, or ask that duplicates
be eliminated from that data. The federated database will simply retrieve the relevant data, and
do the sort itself.
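Function compensation can be sketched in a few lines. Here a hypothetical flat-file source can only scan its rows, so the federated layer applies DISTINCT and ORDER BY itself; the data and function names are invented:

```python
# A "dumb" source that can only scan its rows (like a flat file).
def file_source_scan():
    return ["banana", "apple", "cherry", "apple"]

def federated_query(order_by=False, distinct=False):
    """The federated engine compensates for functions the source lacks:
    it retrieves the raw rows, then deduplicates/sorts them itself."""
    rows = file_source_scan()              # all the source can do
    if distinct:
        rows = list(dict.fromkeys(rows))   # engine eliminates duplicates
    if order_by:
        rows = sorted(rows)                # engine performs the sort
    return rows
```

The user simply writes `SELECT DISTINCT ... ORDER BY ...`; the engine decides which parts the source executes and compensates for the rest.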



While many sources do not provide all the function of DB2 SQL, it is also true that many
sources have specialized functionality that the IBM federated database lacks. For example,
document management systems often have scoring functions that let them estimate the relevancy
of retrieved documents to a user's search. In the financial industry, time-series data is especially
important, and systems exist that can compare, plot, analyse, and subset time-series data in
specialized ways. In the pharmaceutical industry, new drugs are based on existing compounds
with particular properties. Special-purpose systems can compare chemical structures, or simulate
the binding of two molecules. While such functions could be implemented directly, it is often
more efficient and cost-effective to exploit the functionality that already exists in data sources
and application systems.

Extensibility and Openness of the Federation

All systems need to evolve over time. In a federated system, new sources may be needed to meet
the changing needs of the users' business. IBM makes it easy to add new sources. The federated
database engine accesses sources via a software component known as a wrapper. Accessing a new
type of data source is done by acquiring or creating a wrapper for that source. The wrapper
architecture enables the creation of new wrappers. Once a wrapper exists, simple data definition
(DDL) statements allow sources to be dynamically added to the federation without stopping
ongoing queries or transactions.

Any data source can be wrapped. IBM supports the ANSI SQL/MED standard (MED stands for
Management of External Data). This standard documents the protocols used by a federated
server to communicate with external data sources. Any wrapper written to the SQL/MED
interface can be used with IBM's federated database. Thus wrappers can be written by third
parties as well as IBM, and used in conjunction with IBM's federated database.

Autonomy for Data Sources

Typically a data source has existing applications and users. It is important, therefore, that the
operation of the source is not affected when it is brought into a federation. IBM's federated
database does not disturb the local operation of an existing data source. Existing applications will



run unchanged, data is neither moved nor modified, and interfaces remain the same. The way the
data source processes requests for data is not affected by the execution of global queries against
the federated system, though those global queries may touch many different data sources.
Likewise, there is no impact on the consistency of the local system when a data source enters or
leaves a federation. The sole exception is during federated two phase commit processing for
sources that participate. (While not available in V7 of DB2, federated two phase commit was
used in DataJoiner.) Data sources involved in the same unit of work will need to participate in
commit processing and can be requested to roll back the associated changes if necessary.

Unlike other products, our wrapper architecture does not require any software to be installed on
the machine that hosts the data source.

Optimized Performance

The optimizer is the component of a relational database management system that determines the
best way to execute each query. Relational queries are non-procedural and there are typically
several different implementations of each relational operator and many possible orderings of
operators to choose from in executing a query. While some optimizers use heuristic rules to
choose an execution strategy, IBM's federated database considers the various possible strategies,
modeling the likely cost of each, and choosing the one with the least cost. (Typically, cost is
measured in terms of system resources consumed).

In a federated system, the optimizer must decide whether the different operations involved in a
query should be done by the federated server or by the source where the data is stored. It must
also determine the order of the operations, and what implementations to use to do local portions
of the query. To make these decisions, the optimizer must have some way of knowing what each
data source can do, and how much it costs. For example, if the data source is a file, it would not
make sense to assume it was smart, and ask it to perform a sort or to apply some function. On the
other hand, if the source is a relational database system capable of applying predicates and doing
joins, it might be a good idea to take advantage of its power if it will reduce the amount of data
that needs to be brought back to the federated engine. This will typically depend on the details of



the individual query. The IBM optimizer works with the wrappers for the different sources
involved in a query to evaluate the possibilities. Often the difference between a good and a bad
decision on the execution strategy is several orders of magnitude in performance. IBM's
federated database is unique in the industry in its ability to work with wrappers to model the
costs of federated queries over diverse sources. As a result, users can expect the best
performance possible from their federated system.
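The kind of trade-off the optimizer weighs can be sketched with a toy cost model. All cost figures below are invented numbers for illustration, not DB2's actual model:

```python
def plan_costs(total_rows, selectivity, source_can_filter,
               per_row_transfer_cost=1.0, local_filter_cost=0.2,
               remote_filter_cost=0.5):
    """Enumerate candidate strategies: either push the predicate down to
    the source, or fetch everything and filter at the federated server."""
    plans = {}
    # Fetch-all: pay transfer for every row, then filter locally.
    plans["fetch_all_filter_locally"] = total_rows * (per_row_transfer_cost
                                                      + local_filter_cost)
    if source_can_filter:
        # Pushdown: source filters first, so only matching rows cross the wire.
        plans["push_predicate_to_source"] = (total_rows * remote_filter_cost
                                             + total_rows * selectivity
                                               * per_row_transfer_cost)
    return plans

def choose_plan(plans):
    """Pick the candidate with the least estimated cost, as the optimizer does."""
    return min(plans, key=plans.get)
```

With a selective predicate and a capable relational source, `choose_plan(plan_costs(10_000, 0.01, True))` picks pushdown, because shipping 100 rows is cheaper than shipping 10,000; a flat-file source that cannot filter leaves only the fetch-all strategy.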

To further enhance performance, each wrapper implementation takes advantage of the
operational knobs provided by each data source, using the source's native API.

5.8.3 Federated Architecture

The federated database architecture is shown in Figure 5.8. Applications can use any supported
interface (including ODBC, JDBC, or a Web service client) to interact with the federated server.

The federated server communicates with the data sources by means of software modules called
wrappers.

Figure 5.8 Architecture of a federated system

Configuring a federated system



A federated system is created by installing the federated engine and then configuring it to talk to
the data sources. There are several steps to add a new data source to a federated system. First, a
wrapper for the source must be installed, and IBM's federated database must then be told where
to find this wrapper. This is done by means of a CREATE WRAPPER statement. If multiple
sources of the same type are desired, only one wrapper is needed. For example, even if the
federated system will include five Oracle database instances, possibly on different machines,
only one Oracle wrapper is needed, and hence, only one CREATE WRAPPER statement will be
required.

For a data-source function that has no local implementation, a CREATE FUNCTION statement
with the AS TEMPLATE clause tells the federated database that there is no local implementation
of the function. Next, a CREATE FUNCTION MAPPING statement tells the federated database
what server can evaluate the function. Several function mappings may be created for the same
function.
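The exact DB2 statements are not reproduced in these notes. The following Python sketch shows a representative sequence of SQL/MED-style configuration DDL; all object names (`oracle_wrapper`, `ora1`, `loans`, `ratio`) and the library path are invented for illustration, and the syntax is indicative rather than exact:

```python
# Representative federated-configuration DDL, in the SQL/MED style used by
# DB2; every name below is hypothetical.
CONFIG_DDL = [
    "CREATE WRAPPER oracle_wrapper LIBRARY 'libdb2net8.a'",
    "CREATE SERVER ora1 TYPE oracle VERSION '9i' WRAPPER oracle_wrapper",
    "CREATE NICKNAME loans FOR ora1.finance.loans",
    "CREATE FUNCTION ratio(DOUBLE, DOUBLE) RETURNS DOUBLE AS TEMPLATE",
    "CREATE FUNCTION MAPPING ratio_map FOR ratio(DOUBLE, DOUBLE) SERVER ora1",
]

def configure(cursor):
    """Run each statement in order against a federated-server connection."""
    for statement in CONFIG_DDL:
        cursor.execute(statement)
```

Note the order: the wrapper must exist before a server can reference it, the server before a nickname, and the function template before its mapping.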

The above DDL statements produce metadata describing the information about nicknames and
the signatures of mapped functions. This metadata is used by the federated query processing
engine and is stored in the global catalogues of the federated database.

5.9 Query Processing

After the federated system is configured, an application can submit a query written in SQL to a
federated server. The federated server optimizes the query, developing an execution plan in
which the query has been decomposed into fragments that can be executed at individual data
sources. As mentioned above, many decompositions of the query are possible, and the optimizer
chooses among alternatives on the basis of minimum estimated total resource consumption. Once
a plan has been selected, the federated database drives the execution, invoking the wrappers to
execute the fragments assigned to them.



Figure 5.9 Query Processing

The optimizer works differently with relational and non-relational wrappers. The optimizer
models relational sources in detail, using information provided by the wrapper to generate plans
that represent what it expects the source to do.

However, because non-relational sources do not have a common set of operations or common
data model, a more flexible arrangement is required with these sources.



Hence the optimizer works with the non-relational wrappers:

 The IBM federated database submits candidate query fragments called "requests" to a
wrapper if the query fragments apply to a single source.
 When a non-relational wrapper receives a request, it determines what portion, if any, of
the corresponding query fragment can be performed by the data source.
 The wrapper returns a reply that describes the accepted portion of the fragment.
 The reply also includes an estimate of the number of rows that will be produced, an
estimate of the total execution time, and a wrapper plan: an encapsulated representation
of everything the wrapper will need to know to execute the accepted portion of the
fragment.
 The federated database optimizer incorporates the reply into a global plan, introducing
additional operators as necessary to compensate for portions of fragments that were not
accepted by a wrapper.
 The cost and cardinality information from the replies is used to estimate the total cost of
the plan, and the plan with minimum total cost is selected from among all the candidates.
 When a plan is selected, it need not be executed immediately; it can be stored in the
database catalogues and subsequently used one or more times to execute the query.
 Even if a plan is used immediately, it need not be executed in the same process in which
it was created, as illustrated in Figure 5.10.
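The request/reply exchange above can be simulated in miniature. The data shapes, row counts, and cost figures below are invented; only the accept/compensate logic reflects the protocol described:

```python
def wrapper_reply(request_ops):
    """A non-relational wrapper examines a candidate fragment (a "request")
    and accepts only what its source can do -- here: filtering, not sorting."""
    accepted = [op for op in request_ops if op == "filter"]
    return {
        "accepted": accepted,
        "rows": 500,            # estimated result cardinality (invented)
        "cost": 40.0,           # estimated execution time (invented)
        "plan": ("WRAPPER_PLAN", tuple(accepted)),  # opaque to the engine
    }

def build_global_plan(request_ops):
    """The federated engine folds the reply into a global plan, adding
    compensation operators for whatever the wrapper declined."""
    reply = wrapper_reply(request_ops)
    declined = [op for op in request_ops if op not in reply["accepted"]]
    return {
        "remote": reply["plan"],
        "compensation": declined,
        "estimated_cost": reply["cost"] + 0.1 * reply["rows"] * len(declined),
    }

plan = build_global_plan(["filter", "sort"])
# The sort the wrapper declined becomes a compensating operator in the plan.
assert plan["compensation"] == ["sort"]
```

The engine would build such a plan for every candidate decomposition and keep the one with the minimum estimated total cost.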



Figure 5.10 Compilation and runtime for non-relational sources

5.10 Data Warehouse

A data warehouse is a relational database that is designed for query and analysis rather than for
transaction processing. It usually contains historical data derived from transaction data, but it can
include data from other sources. It separates analysis workload from transaction workload and
enables an organization to consolidate data from several sources.

In addition to a relational database, a data warehouse environment includes an extraction,
transportation, transformation, and loading (ETL) solution, an online analytical processing
(OLAP) engine, client analysis tools, and other applications that manage the process of gathering
data and delivering it to business users.
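A toy sketch of the ETL idea: extract rows from source systems whose schemas disagree, transform them to a common schema and unit of measure, and load the warehouse. The source schemas, field names, and exchange rate below are all invented:

```python
# Two "source systems" with inconsistent naming and units of measure.
SALES_EU = [{"cust": "Acme", "amount_eur": 100.0}]
SALES_US = [{"customer": "Zenith", "amount_usd": 50.0}]
EUR_TO_USD = 1.1   # assumed fixed rate, purely for illustration

def extract():
    """Pull raw rows from each source, tagged with their origin."""
    return [("eu", r) for r in SALES_EU] + [("us", r) for r in SALES_US]

def transform(tagged_row):
    """Resolve naming conflicts and units so the warehouse is consistent."""
    origin, row = tagged_row
    if origin == "eu":
        return {"customer": row["cust"],
                "amount_usd": row["amount_eur"] * EUR_TO_USD}
    return {"customer": row["customer"], "amount_usd": row["amount_usd"]}

def load(warehouse, rows):
    warehouse.extend(rows)   # a real loader would write to warehouse tables

warehouse = []
load(warehouse, [transform(r) for r in extract()])
```

After the load, every row uses the same column names and currency, which is exactly the "integrated" property discussed below.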

5.10.1 Characteristics of data warehouses

A common way of introducing data warehousing is to refer to the characteristics of a data
warehouse as set forth by William Inmon:

 Subject Oriented
 Integrated
 Nonvolatile
 Time Variant

Subject Oriented

Data warehouses are designed to help you analyze data. For example, to learn more about your
company's sales data, you can build a warehouse that concentrates on sales. Using this
warehouse, you can answer questions like "Who was our best customer for this item last year?"
This ability to define a data warehouse by subject matter, sales in this case, makes the data
warehouse subject oriented.



Integrated

Integration is closely related to subject orientation. Data warehouses must put data from
disparate sources into a consistent format. They must resolve such problems as naming conflicts
and inconsistencies among units of measure. When they achieve this, they are said to be
integrated.

Nonvolatile

Nonvolatile means that, once entered into the warehouse, data should not change. This is logical
because the purpose of a warehouse is to enable you to analyze what has occurred.

Time Variant

In order to discover trends in business, analysts need large amounts of data. This is very much in
contrast to online transaction processing (OLTP) systems, where performance requirements
demand that historical data be moved to an archive. A data warehouse's focus on change over
time is what is meant by the term time variant.

Figure 5.11 Contrasting OLTP and Data Warehousing Environments

One major difference between the two types of systems is that data warehouses are not usually in
third normal form (3NF), a type of data normalization common in OLTP environments.

Data warehouses and OLTP systems have very different requirements; for example, typical data
warehouses and OLTP systems differ in workload, in the nature of data modifications, in schema
design, and in how much historical data is retained.

5.10.2 Distributed Data Warehouse

Implementing a distributed data warehouse has been shown to provide higher availability and
lower overall cost.

An enterprise can create several data marts that store only high-level summaries of data derived
from the warehouse. With IBM's federated technology, data marts and warehouse can be on
separate systems, yet users of the data mart can still drill down with ease from their local level of
summarization into the warehouse. Federated technology shields the users, who have no need to
know that the data warehouse is distributed, by providing a virtual data warehouse.

5.10.3 Data Warehouse Architectures

Data warehouses and their architectures vary depending upon the specifics of an organization's
situation. Three common architectures are:

 Data Warehouse Architecture (Basic)
 Data Warehouse Architecture (with a Staging Area)
 Data Warehouse Architecture (with a Staging Area and Data Marts)

Data Warehouse Architecture (Basic)

Figure 5.12 shows a simple architecture for a data warehouse. End users directly access data
derived from several source systems through the data warehouse.

Figure 5.12 Architecture of a Data Warehouse

In Figure 5.12, the metadata and raw data of a traditional OLTP system are present, as is an
additional type of data, summary data. Summaries are very valuable in data warehouses because
they pre-compute long operations in advance. For example, a typical data warehouse query is to
retrieve something like August sales. In Oracle, a precomputed summary is called a materialized view.
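The value of a precomputed summary can be sketched with sqlite3: the long aggregation runs once into a summary table (playing the role of a materialized view), and the "August sales" question then becomes a cheap lookup. All names here are illustrative.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (day TEXT, amount REAL)")
conn.executemany("INSERT INTO sales VALUES (?, ?)",
                 [("2023-08-01", 10.0), ("2023-08-15", 20.0),
                  ("2023-09-01", 5.0)])

# Run the long aggregation once, as a materialized view would.
conn.execute("""CREATE TABLE sales_by_month AS
                SELECT substr(day, 1, 7) AS month, SUM(amount) AS total
                FROM sales GROUP BY month""")

# "August sales" is now a single-row lookup against the summary.
total = conn.execute(
    "SELECT total FROM sales_by_month WHERE month = '2023-08'"
).fetchone()[0]
print(total)  # 30.0
```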

Data Warehouse Architecture (with a Staging Area)

Before putting operational data into the warehouse, you need to clean and process it. You can do
this programmatically, although most data warehouses use a staging area instead. A staging area
simplifies building summaries and general warehouse management. Figure 5.13 illustrates this
typical architecture.
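The extract–stage–load flow can be sketched in a few lines. The field names and cleaning rules below are made up for illustration; a real staging area would also log or quarantine rejected rows rather than silently dropping them.

```python
# Minimal ETL sketch with a staging step.
raw_rows = [
    {"cust": " acme ", "amt": "100.5"},
    {"cust": "BOLT",  "amt": "bad"},   # malformed row, rejected in staging
    {"cust": "acme",  "amt": "49.5"},
]

def stage(rows):
    """Clean and normalize rows; drop anything that fails conversion."""
    staged = []
    for r in rows:
        try:
            staged.append({"cust": r["cust"].strip().lower(),
                           "amt": float(r["amt"])})
        except ValueError:
            continue  # real systems would quarantine this row
    return staged

def load(staged):
    """Aggregate the staged rows into a warehouse-side summary."""
    warehouse = {}
    for r in staged:
        warehouse[r["cust"]] = warehouse.get(r["cust"], 0.0) + r["amt"]
    return warehouse

warehouse = load(stage(raw_rows))
print(warehouse)  # {'acme': 150.0}
```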

Figure 5.13 Architecture of a Data Warehouse with a Staging Area

Data Warehouse Architecture (with a Staging Area and Data Marts)

Although the architecture in Figure 5.13 is quite common, you may want to customize your
warehouse's architecture for different groups within your organization. You can do this by
adding data marts, which are systems designed for a particular line of business.

5.10.4 Data Warehouse (PDW Parallel)

SQL Server Parallel Data Warehouse (PDW) is a massively parallel processing (MPP) SQL Server
appliance optimized for large-scale data warehousing, on the order of hundreds of terabytes.

5.10.5 SQL Server Architecture

This architecture of MS SQL Server contains different layers and services.

Protocol layer

The protocol layer implements the external interface to SQL Server. All operations that can be
invoked on SQL Server are communicated to it via a Microsoft-defined format called Tabular
Data Stream (TDS). TDS is an application layer protocol used to transfer data between a
database server and a client. It was initially designed and developed by Sybase Inc. for its Sybase
SQL Server relational database engine in 1984, and was later adopted by Microsoft for Microsoft
SQL Server. TDS packets can be encased in other physical transport-dependent protocols,
including TCP/IP, named pipes, and shared memory; consequently, access to SQL Server is
available over these protocols. In addition, the SQL Server API is also exposed over web services.

Data storage

The main unit of data storage is a database, which is a collection of tables with typed columns.
SQL Server supports different data types, including primitive types such as Integer, Float,
Decimal, Char (including character strings), Varchar (variable-length character strings), Binary
(for unstructured blobs of data), and Text (for textual data), among others. The rounding of floats
to integers uses either Symmetric Arithmetic Rounding or Symmetric Round Down (Fix)
depending on arguments; for example, SELECT Round(2.5, 0) gives 3.
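The two rounding behaviors can be mimicked with Python's decimal module, which makes the contrast concrete. Note that this is an illustration of the rounding rules, not of SQL Server itself, and that Python's built-in round() behaves differently from both (it uses banker's rounding).

```python
from decimal import Decimal, ROUND_HALF_UP, ROUND_DOWN

# Symmetric Arithmetic Rounding: halves go away from zero, so 2.5 -> 3.
assert Decimal("2.5").quantize(Decimal("1"), rounding=ROUND_HALF_UP) == 3

# Symmetric Round Down (Fix): truncate toward zero, so 2.5 -> 2.
assert Decimal("2.5").quantize(Decimal("1"), rounding=ROUND_DOWN) == 2

# Python's built-in round() uses banker's rounding instead:
assert round(2.5) == 2
```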

Microsoft SQL Server also allows user-defined composite types (UDTs) to be defined and used.
It also makes server statistics available as virtual tables and views (called Dynamic Management
Views or DMVs). In addition to tables, a database can also contain other objects including views,
stored procedures, indexes and constraints, along with a transaction log.

Buffer management

SQL Server buffers pages in RAM to minimize disc I/O. Any 8 KB page can be buffered in-
memory, and the set of all pages currently buffered is called the buffer cache. The amount of
memory available to SQL Server decides how many pages will be cached in memory.

The buffer cache is managed by the Buffer Manager. Either reading from or writing to any page
copies it to the buffer cache. Subsequent reads or writes are redirected to the in-memory copy,
rather than the on-disc version. The page is updated on the disc by the Buffer Manager only if
the in-memory cache has not been referenced for some time. While writing pages back to disc,
asynchronous I/O is used whereby the I/O operation is done in a background thread so that other
operations do not have to wait for the I/O operation to complete. Each page is written along with
its checksum when it is written. When reading the page back, its checksum is computed again
and matched with the stored version to ensure the page has not been damaged or tampered with
in the meantime.
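The write-time checksum and read-time verification described above can be sketched as follows. This is a simplification with made-up structures (SQL Server stores checksums in the page header and supports several verification modes), using CRC32 purely as an example checksum.

```python
import zlib

PAGE_SIZE = 8192  # SQL Server buffers 8 KB pages

def write_page(data: bytes):
    """Store a page together with its checksum, computed at write time."""
    return {"data": data, "checksum": zlib.crc32(data)}

def read_page(page):
    """Recompute the checksum on read and compare it to the stored one."""
    if zlib.crc32(page["data"]) != page["checksum"]:
        raise IOError("page damaged or tampered with")
    return page["data"]

page = write_page(b"\x00" * PAGE_SIZE)
assert read_page(page) == b"\x00" * PAGE_SIZE

page["data"] = b"\x01" + page["data"][1:]  # simulate corruption on disc
try:
    read_page(page)
    corruption_detected = False
except IOError:
    corruption_detected = True
```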

Logging and Transaction

SQL Server ensures that any change to the data is ACID-compliant; i.e., it uses transactions to
ensure that the database will always revert to a known consistent state on failure. Each
transaction may consist of multiple SQL statements, all of which will only make a permanent
change to the database if the last statement in the transaction (a COMMIT statement) completes
successfully. If the COMMIT completes successfully, the transaction is safely on disk.

Any changes made to any page will update the in-memory cache of the page; simultaneously, all
the operations performed will be written to a log, along with the ID of the transaction the
operation was a part of. Each log entry is identified by an increasing Log Sequence Number
(LSN), which is used to ensure that all changes are written to the data files. Also, during a log
restore, it is used to check that no logs are duplicated or skipped. SQL Server requires that the log
is written onto the disc before the data page is written back. It must also ensure that all
operations in a transaction are written to the log before any COMMIT operation is reported as
completed.
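The write-ahead rule (log first, data pages later, with increasing LSNs) can be sketched as below. All the structures are toy stand-ins: a real engine flushes log and pages asynchronously and recovers from the log after a crash.

```python
# Write-ahead logging sketch: every change gets a log record with an
# increasing LSN, and the log record exists before the page reaches disc.

log = []            # the transaction log, ordered by LSN
next_lsn = 1
data_pages = {}     # "on-disc" pages
cache = {}          # in-memory page cache

def change(txn_id, page_id, value):
    global next_lsn
    cache[page_id] = value                       # update in-memory copy
    log.append({"lsn": next_lsn, "txn": txn_id,  # log first (WAL rule)
                "page": page_id, "value": value})
    next_lsn += 1

def commit(txn_id):
    # Only after the log records exist may the pages be written to disc.
    for rec in log:
        if rec["txn"] == txn_id:
            data_pages[rec["page"]] = rec["value"]

change(txn_id=7, page_id="p1", value="A")
change(txn_id=7, page_id="p2", value="B")
commit(7)
```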

Concurrency and locking

SQL Server allows multiple clients to use the same database concurrently. As such, it needs to
control concurrent access to shared data, to ensure data integrity - when multiple clients update
the same data, or clients attempt to read data that is in the process of being changed by another
client. SQL Server provides two modes of concurrency control: pessimistic concurrency and
optimistic concurrency.

When pessimistic concurrency control is being used, SQL Server controls concurrent access by
using locks. Locks can be either shared or exclusive. An exclusive lock grants the user exclusive
access to the data - no other user can access the data as long as the lock is held. Shared locks are
used when data is being read - multiple users can read from data locked with a shared lock, but
cannot acquire an exclusive lock; the latter would have to wait for all shared locks to be
released. Locks can be applied at different levels of granularity - on entire tables, pages, or even
on a per-row basis. For indexes, a lock can cover either the entire index or only index leaves.

The level of granularity to be used is defined on a per-database basis by the database
administrator. While a fine-grained locking system allows more users to use the table or index
simultaneously, it requires more resources, so it does not automatically yield a higher-performing
solution. SQL Server also includes two more lightweight mutual exclusion solutions - latches and
spinlocks - which are less robust than locks but are less resource intensive. SQL Server uses them
for DMVs and other resources that are usually not busy. SQL Server also monitors all worker
threads that acquire locks to ensure that they do not end up in deadlocks; in case they do, SQL
Server takes remedial measures, which in many cases means killing one of the threads entangled
in the deadlock and rolling back the transaction it started. To implement locking, SQL Server
contains the Lock Manager.

The Lock Manager maintains an in-memory table that manages the database objects and locks, if
any, on them along with other metadata about the lock. Access to any shared object is mediated
by the lock manager, which either grants access to the resource or blocks it.
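The shared/exclusive compatibility rule can be sketched with a toy lock manager. This is not SQL Server's implementation; it only illustrates that many readers may share a lock while a writer needs exclusivity (here a refused request simply returns False where a real manager would queue the waiter).

```python
class LockManager:
    def __init__(self):
        self.locks = {}  # resource -> {"mode": "S" or "X", "holders": set}

    def acquire(self, resource, user, mode):
        entry = self.locks.get(resource)
        if entry is None:
            self.locks[resource] = {"mode": mode, "holders": {user}}
            return True
        if mode == "S" and entry["mode"] == "S":
            entry["holders"].add(user)   # shared locks are compatible
            return True
        return False                     # caller would block/wait here

    def release(self, resource, user):
        entry = self.locks[resource]
        entry["holders"].discard(user)
        if not entry["holders"]:
            del self.locks[resource]

lm = LockManager()
assert lm.acquire("row42", "alice", "S")
assert lm.acquire("row42", "bob", "S")        # multiple readers OK
assert not lm.acquire("row42", "carol", "X")  # writer must wait
lm.release("row42", "alice")
lm.release("row42", "bob")
assert lm.acquire("row42", "carol", "X")      # now exclusive succeeds
```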

Data retrieval

The main mode of retrieving data from an SQL Server database is querying for it. The query is
expressed using a variant of SQL called T-SQL, a dialect Microsoft SQL Server shares with
Sybase SQL Server due to its legacy. The query declaratively specifies what is to be retrieved.

It is processed by the query processor, which figures out the sequence of steps that will be
necessary to retrieve the requested data. The sequence of actions necessary to execute a query is
called a query plan. There might be multiple ways to process the same query.

5.11 EIS/DSS

5.11.1 Environmental Impact Statement

An environmental impact statement (EIS), under United States environmental law, is a
document required by the National Environmental Policy Act (NEPA) for certain actions
"significantly affecting the quality of the human environment". An EIS is a tool for
decision making. It describes the positive and negative environmental effects of a
proposed action, and it usually also lists one or more alternative actions that may be
chosen instead of the action described in the EIS. Several US state governments require
that a document similar to an EIS be submitted to the state for certain actions. For
example, in California, an Environmental Impact Report (EIR) must be submitted to the
state for certain actions, as described in the California Environmental Quality Act
(CEQA).

Purpose

The purpose of the NEPA is to promote informed decision-making by federal agencies by
making "detailed information concerning significant environmental impacts" available to
both agency leaders and the public. The NEPA was the first piece of legislation that
created a comprehensive method to assess potential and existing environmental risks at
once. One of the primary authors of the act was Lynton K. Caldwell. It also encourages
communication and cooperation between all the actors involved in environmental
decisions, including government officials, private businesses, and citizens.

Contrary to a widespread misconception, NEPA does not prohibit the federal government
or its licensees/permittees from harming the environment, but merely requires that the
prospective impacts be understood and disclosed in advance. The intent of NEPA is to
help key decision makers and stakeholders balance the need to implement an action with
its impacts on the surrounding human and natural environment.

5.11.2 Decision Support System

A decision support system (DSS) is a computer-based information system that supports business
or organizational decision-making activities. DSSs serve the management, operations, and
planning levels of an organization and help people make decisions about problems that may be
rapidly changing and not easily specified in advance.

DSSs include knowledge-based systems. A properly designed DSS is an interactive software-based
system intended to help decision makers compile useful information from a combination of raw
data, documents, personal knowledge, and business models to identify and solve problems and
make decisions.

Typical information that a decision support application might gather and present includes:

 comparative sales figures between one period and the next,
 projected revenue figures based on product sales assumptions.
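Both kinds of DSS output above (a period-over-period comparison and a projection from a sales assumption) amount to simple arithmetic over gathered figures. A sketch, with made-up numbers and a naive same-growth assumption:

```python
# DSS-style comparison sketch: period-over-period change and a naive
# projection assuming the same growth rate continues.

sales = {"Q1": 120_000.0, "Q2": 138_000.0}

change_pct = (sales["Q2"] - sales["Q1"]) / sales["Q1"] * 100
projected_q3 = sales["Q2"] * (1 + change_pct / 100)

print(f"Q2 vs Q1: {change_pct:+.1f}%")       # Q2 vs Q1: +15.0%
print(f"Projected Q3: {projected_q3:,.0f}")  # Projected Q3: 158,700
```

A real DSS wraps this arithmetic in interactive tools for varying the assumptions, which is where the decision support comes from.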

5.12 Data Mining


Generally, data mining (sometimes called data or knowledge discovery) is the process of
analyzing data from different perspectives and summarizing it into useful information -
information that can be used to increase revenue, cut costs, or both. Data mining software is one
of a number of analytical tools for analyzing data. It allows users to analyze data from many
different dimensions or angles, categorize it, and summarize the relationships identified.
Technically, data mining is the process of finding correlations or patterns among dozens of fields
in large relational databases.

5.12.1 Data, Information, and Knowledge

Data

Data are any facts, numbers, or text that can be processed by a computer. Today, organizations
are accumulating vast and growing amounts of data in different formats and different databases.

This includes:

 operational or transactional data such as, sales, cost, inventory, payroll, and accounting

 nonoperational data, such as industry sales, forecast data, and macro economic data

 meta data - data about the data itself, such as logical database design or data dictionary
definitions

Information

The patterns, associations, or relationships among all this data can provide information. For
example, analysis of retail point of sale transaction data can yield information on which products
are selling and when.

Knowledge

Information can be converted into knowledge about historical patterns and future trends. For
example, summary information on retail supermarket sales can be analyzed in light of
promotional efforts to provide knowledge of consumer buying behavior. Thus, a manufacturer or
retailer could determine which items are most susceptible to promotional efforts.

Data mining is primarily used today by companies with a strong consumer focus - retail,
financial, communication, and marketing organizations. It enables these companies to determine
relationships among "internal" factors such as price, product positioning, or staff skills, and
"external" factors such as economic indicators, competition, and customer demographics. And, it
enables them to determine the impact on sales, customer satisfaction, and corporate profits.
Finally, it enables them to "drill down" into summary information to view detail transactional
data.

With data mining, a retailer could use point-of-sale records of customer purchases to send
targeted promotions based on an individual's purchase history. By mining demographic data
from comment or warranty cards, the retailer could develop products and promotions to appeal to
specific customer segments.

5.12.2 How Data Mining Works

While large-scale information technology has been evolving separate transaction and analytical
systems, data mining provides the link between the two. Data mining software analyzes
relationships and patterns in stored transaction data based on open-ended user queries. Several
types of analytical software are available: statistical, machine learning, and neural networks.
Generally, any of four types of relationships are sought:

 Classes: Stored data is used to locate data in predetermined groups. For example, a
restaurant chain could mine customer purchase data to determine when customers visit
and what they typically order. This information could be used to increase traffic by
having daily specials.

 Clusters: Data items are grouped according to logical relationships or consumer
preferences. For example, data can be mined to identify market segments or consumer
affinities.

 Associations: Data can be mined to identify associations. The beer-diaper example is an
example of associative mining.

 Sequential patterns: Data is mined to anticipate behavior patterns and trends. For
example, an outdoor equipment retailer could predict the likelihood of a backpack being
purchased based on a consumer's purchase of sleeping bags and hiking shoes.
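The association relationship above is usually quantified with support (how often items occur together) and confidence (how often the rule holds when its antecedent occurs). A sketch for the beer-diaper example, with made-up transactions:

```python
# Support/confidence sketch for associative mining.
transactions = [
    {"beer", "diapers", "chips"},
    {"beer", "diapers"},
    {"beer", "bread"},
    {"milk", "diapers"},
]

def support(itemset):
    """Fraction of transactions containing every item in the itemset."""
    return sum(itemset <= t for t in transactions) / len(transactions)

def confidence(antecedent, consequent):
    """How often the consequent appears when the antecedent does."""
    return support(antecedent | consequent) / support(antecedent)

# Rule "beer -> diapers": holds in 2 of the 3 beer baskets.
print(support({"beer", "diapers"}))       # 0.5
print(confidence({"beer"}, {"diapers"}))  # 0.666...
```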

Data mining consists of five major elements:

 Extract, transform, and load transaction data onto the data warehouse system.

 Store and manage the data in a multidimensional database system.

 Provide data access to business analysts and information technology professionals.

 Analyze the data by application software.

 Present the data in a useful format, such as a graph or table.

Different levels of analysis are available:

 Artificial neural networks: Non-linear predictive models that learn through training and
resemble biological neural networks in structure.

 Genetic algorithms: Optimization techniques that use processes such as genetic
combination, mutation, and natural selection in a design based on the concepts of natural
evolution.

 Decision trees: Tree-shaped structures that represent sets of decisions. These decisions
generate rules for the classification of a dataset. Specific decision tree methods include
Classification and Regression Trees (CART) and Chi-Square Automatic Interaction
Detection (CHAID). CART and CHAID are decision tree techniques used for
classification of a dataset. They provide a set of rules that you can apply to a new
(unclassified) dataset to predict which records will have a given outcome. CART
segments a dataset by creating 2-way splits while CHAID segments using chi-square tests
to create multi-way splits. CART typically requires less data preparation than CHAID.

 Nearest neighbour method: A technique that classifies each record in a dataset based on
a combination of the classes of the k record(s) most similar to it in a historical dataset
(where k ≥ 1). Sometimes called the k-nearest neighbour technique.

 Rule induction: The extraction of useful if-then rules from data based on statistical
significance.

 Data visualization: The visual interpretation of complex relationships in
multidimensional data. Graphics tools are used to illustrate data relationships.
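The nearest-neighbour level of analysis listed above can be sketched in a few lines: classify a record by majority vote of the k most similar historical records. The data and the two class labels are made up for illustration.

```python
# k-nearest-neighbour sketch over a toy historical dataset (k = 3).
from collections import Counter
import math

history = [  # (features, class)
    ((1.0, 1.0), "buys"), ((1.2, 0.8), "buys"), ((0.9, 1.1), "buys"),
    ((5.0, 5.0), "skips"), ((5.5, 4.5), "skips"),
]

def knn_classify(point, k=3):
    # Sort the history by Euclidean distance and vote among the k nearest.
    nearest = sorted(history, key=lambda rec: math.dist(rec[0], point))[:k]
    return Counter(cls for _, cls in nearest).most_common(1)[0][0]

print(knn_classify((1.1, 0.9)))  # 'buys'
print(knn_classify((5.2, 4.8)))  # 'skips'
```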

5.13 GROUPWARE SERVER

Groupware addresses the management of semi-structured information such as text,
image, mail, bulletin boards and the flow of work. These Client/Server systems have people in
direct contact with other people.

Groupware is a category of software designed to help groups work together by
facilitating the exchange of information among group members who may or may not be located
in the same office. Often, groupware users are collaborating on the same project, although
groupware can be used to share a variety of information throughout an entire organization and
can also be extended to clients, suppliers, and other users outside the organization.

Groupware is an ideal mechanism for sharing less-structured information (for example,
text or diagrams, as opposed to fielded or structured data) that might not otherwise be accessible
to others. It is also used to define workflow, so that as one user completes a step in a project or
process, the person responsible for the next step is notified automatically.

5.13.1 Features Of Groupware Server

Groupware packages offered by different software vendors will include different features
and functions, but most typically include the following components:

 Calendaring and Scheduling. Each user maintains an online calendar to track
appointments, days out of the office, and other times when he or she is unavailable. Other
users can view their colleagues' calendars to look for "free" time for scheduling a new
meeting.
 Discussion Databases. These are topic-specific databases where a user can post an idea,
question, or suggestion on a given subject, and other users can post their responses. A
discussion board may be set up for a short period of time to gather comments, for
example, on an upcoming event, or left up indefinitely, say to solicit new product ideas
on an ongoing basis. Usually, the name of each person who posted an item is recorded,
but anonymous postings are an option.
 Reference Libraries. These are collections of reference materials, such as employee
handbooks, policy and procedure manuals, and similar documents. Typically, only certain
users are able to post materials to a reference database, while other users have "read only"
access—that is, they can view the materials but are not authorized to make any changes
to them.
 Email. This is probably the most heavily used groupware feature and is used to send
messages to other groupware users. A message may be addressed to one or more
individuals or sent to a group, such as "Sales," that includes the names of all people
within a given department. Generally, users are also able to send messages to individuals
located outside the organization.

Figure 5.14 Features Of Groupware Server

Although email is an essential component of groupware, email and groupware employ
different methods for disseminating information. Every email message that is sent must have one
or more recipients listed in the "To:" field. This is called the "push" model because it pushes the
message out to the recipients whether or not a given recipient is interested in receiving it.
Groupware uses the "pull" model, in that each user accesses and pulls from the various
groupware applications that information which is of relevance to him or her.
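The push/pull contrast can be sketched with two toy data structures: email pushes a message into each named recipient's inbox, while a groupware discussion database just holds posts that users pull by relevance. All names are illustrative.

```python
# Push vs. pull sketch.
inboxes = {"alice": [], "bob": []}
discussion_db = []  # a shared groupware application on the server

def send_email(recipients, msg):
    """Push model: the sender decides who receives the message."""
    for r in recipients:
        inboxes[r].append(msg)

def post(msg):
    """Content just sits on the server; no recipients are named."""
    discussion_db.append(msg)

def pull(topic):
    """Pull model: each user selects what is relevant to him or her."""
    return [m for m in discussion_db if topic in m]

send_email(["alice"], "meeting at 3")
post("sales: Q3 numbers are up")
post("hr: new handbook posted")
```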

Groupware functionality may also include the ability to control who sees a given piece of
information. Access can be limited to specifically named individuals or to members of a group,
such as all managers, members of the accounting department, or those employees working on a
sensitive or confidential project. For example, job descriptions may be accessible to all users, but
access to related salary information may be limited to managers and members of the human
resources department.

5.13.2 How Groupware Works

Groupware software can be divided into two categories: server and client. Depending on
the size of an organization and the number of users, the software is installed on one or more
computers (called "servers") in a network. Server software can also be installed on computers
located in other locations, from a nearby city to another state or country. The server houses the
actual applications and the associated information entered into them by the various users. If more
than one server is used, the servers will "speak" to one another in a process called "replication."
In this way, information held in the same database, but in different locations or on different
servers, is exchanged between the servers. Once this is accomplished, the servers are said to be
"synchronized."
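Replication can be sketched as a merge of two replicas in which the newer version of each document wins. This is a deliberately naive model (real groupware servers track conflicts, deletions, and per-field changes); the version numbers and document names are made up.

```python
# Naive replication sketch: two servers exchange per-document entries
# and keep the latest version of each, becoming "synchronized".

server_a = {"doc1": (3, "edited on A")}   # doc -> (version, body)
server_b = {"doc1": (2, "older copy"), "doc2": (1, "created on B")}

def replicate(s1, s2):
    """Merge both replicas, keeping the higher version of each doc."""
    merged = dict(s1)
    for doc, (ver, body) in s2.items():
        if doc not in merged or ver > merged[doc][0]:
            merged[doc] = (ver, body)
    s1.clear(); s1.update(merged)
    s2.clear(); s2.update(merged)

replicate(server_a, server_b)
assert server_a == server_b  # the servers are now synchronized
```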

Each person using the groupware has the client software installed on his or her desktop or
laptop computer. The client software enables the user to access and interact with the applications
stored on the servers. Some users may be "remote;" that is, they are not in the office on a full-
time basis but rather use a modem or other type of connection to access and use the groupware.

KEY NOTES

 Client server and internet
 Web client server
 3 tier client server web style
 CGI
 The server side of web
 CGI and State
 SQL database servers
 Middleware and federated databases
 Data warehouses
 EIS/DSS to data mining
 GroupWare Server
 What is GroupWare?
 Components of GroupWare

5.14 QUESTION BANK
PART-A
1. Define Internet. (April-2010)
2. What is Middleware?
3. What is CGI?
4. What is EIS? (April-2009)
5. What is DSS?
6. Define Data mining.
7. Define Middleware.
8. Define Data warehouses.
9. What is GroupWare?
10. List out Components of GroupWare.
11. What are the eras of the web client/server?
12. What are the web application protocols?
13. Draw the URL structure. (April-2009)
14. What are the types of HTTP header fields?
15. Define CGI.
16. Define hyperlink with its syntax.
17. Define CGI and its state.
18. What are the two security protocols in web?
19. Define SSL.
20. What are the ISO standards and define it?

21. What are the three types of SQL Server Architecture? (April-2010)
22. Differentiate between static and dynamic SQL.
23. Define SQL middleware.
24. What are the middleware solutions?
25. Draw the MDI gateway structure?

PART – B
1. Briefly explain about Client server and internet? (April-2009)
2. Discuss about Web client server.
3. Briefly explain about 3 tier client server web style?
4. Briefly explain about CGI and State?
5. Discuss SQL database servers. (April-2010)
6. Discuss merits and demerits Middleware and federated databases.
7. Briefly explain about Data warehouses?
8. Explain EIS/DSS to data mining? (April-2010)
9. Briefly explain about GroupWare Server?
10. Explain Components of GroupWare?
11. Describe about 3-tier client/server web style. (April-2009)
12. Explain CGI scenario based on the web client/server in the interactive era.
13. Brief description about SQL Database Server Architectures with ISO Standards.
14. Give brief explanation for CGI and STATE.
15. How to structure the flow of text in HTML document.
