UNIT V
APPLICATION LAYER
The documents in the WWW can be grouped into three broad categories:
static, dynamic, and active.
The category is based on the time at which the contents of the document are determined.
Dynamic documents are sometimes referred to as server-site dynamic documents.
Active documents are sometimes referred to as client-site dynamic documents. An active web document consists
of a computer program that the server sends to the browser and that the browser must run locally.
URL (Uniform Resource Locator)
A URL (Uniform Resource Locator) is a unique identifier used to locate a resource on the Internet.
It is also referred to as a web address. URLs consist of multiple parts -- including a protocol and domain name --
that tell a web browser how and where to retrieve a resource.
End users use URLs by typing them directly into the address bar of a browser or by clicking a hyperlink found on
a webpage, in a bookmark list, in an email, or in another application.
The first part of a URL identifies what protocol to use as the primary access medium. The second part identifies
the IP address or domain name.
Optionally, after the domain, a URL can also specify:
a path to a specific page or file within a domain;
a network port to use to make the connection;
a specific reference point within a file, such as a named anchor in an HTML file; and
query or search parameters -- commonly found in URLs for search results.
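As a rough illustration of these parts, the sketch below splits a URL with Python's standard `urllib.parse` module. The URL itself (host `www.example.com`, port 8080, the path, query, and anchor) is a made-up example and does not come from these notes.

```python
from urllib.parse import parse_qs, urlsplit

# Hypothetical URL chosen only so that every optional part is present.
url = "http://www.example.com:8080/docs/unit5.html?topic=http&page=2#cookies"

parts = urlsplit(url)

print(parts.scheme)    # protocol to use
print(parts.hostname)  # domain name (or IP address)
print(parts.port)      # network port for the connection
print(parts.path)      # path to a specific page or file
print(parts.query)     # query or search parameters
print(parts.fragment)  # reference point (named anchor) within the file

# The query string can be broken further into name/value pairs.
query_params = parse_qs(parts.query)
print(query_params)
```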
Hypertext Transfer Protocol (HTTP)
The Hypertext Transfer Protocol (HTTP) is a protocol used mainly to access data on the World Wide Web.
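HTTP functions as a combination of FTP and SMTP: like FTP, it transfers files over TCP; like SMTP, it exchanges messages made of text headers followed by a body.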
HTTP uses the services of TCP on well-known port 80.
When a page references several objects on the same server, the client has two choices: retrieve each object over a
new TCP connection, or open one TCP connection and retrieve them all over it.
The first method is referred to as a nonpersistent connection, the second as a persistent connection.
HTTP, prior to version 1.1, specified nonpersistent connections; persistent connections are the default in
version 1.1, although the user can change this.
Nonpersistent Connections
In a nonpersistent connection, one TCP connection is made for each request/response.
The following lists the steps in this strategy:
1. The client opens a TCP connection and sends a request.
2. The server sends the response and closes the connection.
3. The client reads the data until it encounters an end-of-file marker; it then closes the connection.
In this strategy, if a file contains links to N different pictures in different files (all located on the same server), a
connection must be opened and closed N + 1 times: once for the file itself and once for each picture.
The nonpersistent strategy imposes high overhead on the server because the server must allocate separate buffers
for each of the N + 1 connections.
Persistent Connections
HTTP version 1.1 specifies a persistent connection by default.
In a persistent connection, the server leaves the connection open for more requests after sending a response.
The server can close the connection at the request of a client or if a time-out has been reached.
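The difference between the two strategies can be sketched as a simple count. This is only an illustration under stated assumptions: all N pictures live on the same server, and in the persistent case the connection stays open (no time-out) until every object has been fetched. The function name `connections_needed` is hypothetical.

```python
def connections_needed(num_pictures: int, persistent: bool) -> int:
    """Count TCP connections needed to fetch a page plus its N linked pictures."""
    total_objects = num_pictures + 1  # the page itself plus N pictures
    if persistent:
        # All request/response exchanges reuse one connection
        # (assumes no time-out closes it early).
        return 1
    # Nonpersistent: one connection per request/response.
    return total_objects


for n in (0, 3, 10):
    print(n, connections_needed(n, persistent=False), connections_needed(n, persistent=True))
```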
Basic Features
There are three basic features that make HTTP a simple but powerful protocol:
HTTP is connectionless: The HTTP client, i.e., a browser, initiates an HTTP request and then waits for the
response. The server processes the request and sends a response back, after which the client closes the
connection. The client and server therefore know about each other only during the current request and response. Further
requests are made on a new connection, as if the client and server were new to each other.
HTTP is media independent: Any type of data can be sent by HTTP as long as both the client and the server
know how to handle the data content. Both the client and the server are required to specify the content type using the
appropriate MIME type.
HTTP is stateless: As mentioned above, HTTP is connectionless, which is a direct result of HTTP being a stateless
protocol. The server and client are aware of each other only during the current request. Afterwards, both of them forget
about each other. Because of this, neither the client nor the server can retain information between
different requests across web pages.
NOTE:
HTTP/1.0 uses a new connection for each request/response exchange, whereas an HTTP/1.1 connection may be used for
one or more request/response exchanges.
Message Formats
The HTTP protocol defines the format of the request and response messages.
1. Request Message
A request message consists of a request line, header lines, a blank line, and sometimes a body.
The initial line is different for the request and for the response.
A request-line consists of three parts: a method name, requested resource's local path, and the HTTP
version being used. All these parts are separated by spaces.
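A minimal sketch of how a request line breaks into its three parts, assuming the parts are separated by single spaces and the line ends with a carriage return and line feed. The function name `parse_request_line` and the sample path `/index.html` are illustrative assumptions.

```python
from typing import Tuple


def parse_request_line(line: str) -> Tuple[str, str, str]:
    """Split a request line into (method, path, version)."""
    # Strip the trailing CRLF, then split on the separating spaces.
    method, path, version = line.rstrip("\r\n").split(" ")
    return method, path, version


# Hypothetical request line asking for the page at /index.html.
method, path, version = parse_request_line("GET /index.html HTTP/1.1\r\n")
print(method, path, version)
```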
The method field defines the request types.
In version 1.1 of HTTP, several methods are defined, as shown in Table.
Most of the time, the client uses the GET method to send a request.
In this case, the body of the message is empty.
The HEAD method is used when the client needs only some information about the web page from the server,
such as the last time it was modified.
It can also be used to test the validity of a URL.
The response message in this case has only the header section; the body section is empty.
The PUT method is the inverse of the GET method; it allows the client to post a new web page on the server (if
permitted).
The POST method is similar to the PUT method, but it is used to send some information to the server to be
added to the web page or to modify the web page.
The TRACE method is used for debugging; the client asks the server to echo back the request to check whether
the server is getting the requests.
The DELETE method allows the client to delete a web page on the server if the client has permission to do so.
The CONNECT method was originally defined as a reserved method; it may be used by proxy servers, as discussed
later.
Finally, the OPTIONS method allows the client to ask about the properties of a web page.
The second field, URL, was discussed earlier.
It defines the address and name of the corresponding web page.
The third field, version, gives the version of the protocol; these notes use HTTP 1.1, although HTTP/2 and HTTP/3 have since been standardized.
After the request line, we can have zero or more request header lines.
Each header line sends additional information from the client to the server.
For example, the client can request that the document be sent in a special format.
Each header line has a header name, a colon, a space, and a header value.
Table shows some header names commonly used in a request.
The value field defines the values associated with each header name.
The list of values can be found in the corresponding RFCs.
The body can be present in a request message.
Usually, it contains the comment to be sent or the file to be published on the website when the method is PUT or
POST.
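Putting the pieces together, the sketch below assembles a complete request message: request line, header lines, a blank line, and an optional body. The helper `build_request`, the host `www.example.com`, and the path `/comments` are assumptions made for illustration; real clients add further headers.

```python
from typing import Dict

CRLF = "\r\n"


def build_request(method: str, path: str, headers: Dict[str, str], body: str = "") -> str:
    """Assemble an HTTP/1.1 request message as text."""
    request_line = f"{method} {path} HTTP/1.1"
    # Each header line is: header name, colon, space, header value.
    header_lines = [f"{name}: {value}" for name, value in headers.items()]
    # The empty string produces the blank line that separates headers from body.
    return CRLF.join([request_line, *header_lines, "", body])


# GET request: the body is empty.
get_request = build_request(
    "GET", "/index.html", {"Host": "www.example.com", "Accept": "text/html"}
)

# POST request: the body carries the comment being sent.
post_request = build_request(
    "POST",
    "/comments",
    {"Host": "www.example.com", "Content-Type": "text/plain", "Content-Length": "5"},
    body="hello",
)

print(get_request)
print(post_request)
```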
2. Response message
A response message consists of a status line, header lines, a blank line, and sometimes a body.
The first line in a response message is called the status line.
There are three fields in this line separated by spaces and terminated by a carriage return and line feed.
The first field defines the version of the HTTP protocol, for example HTTP/1.1.
The status code field defines the status of the request.
It consists of three digits.
Whereas the codes in the 100 range are only informational, the codes in the 200 range indicate a
successful request.
The codes in the 300 range redirect the client to another URL, and the codes in the 400 range indicate
an error at the client site.
Finally, the codes in the 500 range indicate an error at the server site.
1xx: Information
Code  Message              Description
100   Continue             The server has received the request headers, and the client should proceed to send the request body
101   Switching Protocols  The requester has asked the server to switch protocols
103   Checkpoint           Used in the resumable requests proposal to resume aborted PUT or POST requests (nonstandard; 103 is now registered as Early Hints)
2xx: Successful
Code  Message                        Description
200   OK                             The request is OK (this is the standard response for successful HTTP requests)
201   Created                        The request has been fulfilled, and a new resource is created
202   Accepted                       The request has been accepted for processing, but the processing has not been completed
203   Non-Authoritative Information  The request has been successfully processed, but is returning information that may be from another source
204   No Content                     The request has been successfully processed, but is not returning any content
205   Reset Content                  The request has been successfully processed, but is not returning any content, and requires that the requester reset the document view
206   Partial Content                The server is delivering only part of the resource due to a range header sent by the client
3xx: Redirection
Code  Message             Description
300   Multiple Choices    A link list. The user can select a link and go to that location. Maximum five addresses
301   Moved Permanently   The requested page has moved to a new URL
302   Found               The requested page has moved temporarily to a new URL
303   See Other           The requested page can be found under a different URL
304   Not Modified        Indicates the requested page has not been modified since last requested
306   Switch Proxy        No longer used
307   Temporary Redirect  The requested page has moved temporarily to a new URL
308   Resume Incomplete   Used in the resumable requests proposal to resume aborted PUT or POST requests (nonstandard; 308 is now registered as Permanent Redirect)
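The status code ranges described above can be sketched as a small lookup on the first digit. The function name `status_class` is an assumption; the reason phrases come from Python's standard `http.HTTPStatus` enumeration, which covers registered codes only.

```python
from http import HTTPStatus


def status_class(code: int) -> str:
    """Map a three-digit status code to its range."""
    classes = {
        1: "informational",
        2: "successful",
        3: "redirection",
        4: "client error",
        5: "server error",
    }
    if not 100 <= code <= 599:
        raise ValueError(f"not a valid HTTP status code: {code}")
    # The first digit selects the range.
    return classes[code // 100]


for code in (100, 200, 301, 404, 500):
    # HTTPStatus supplies the standard reason phrase for each code.
    print(code, HTTPStatus(code).phrase, "->", status_class(code))
```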
After the status line, we can have zero or more response header lines.
Each header line sends additional information from the server to the client.
For example, the sender can send extra information about the document.
Each header line has a header name, a colon, a space, and a header value.
Table shows some header names commonly used in a response message.
The body contains the document to be sent from the server to the client.
The body is usually present; some responses, such as 204 No Content and 304 Not Modified, carry no body.
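As a sketch of the response format, the code below splits a response message into its status line fields, header lines, and body. It assumes a well-formed message with CRLF line endings and one header per line; the function name `parse_response` and the sample message are made up for illustration.

```python
from typing import Dict, Tuple


def parse_response(raw: str) -> Tuple[str, int, str, Dict[str, str], str]:
    """Return (version, status code, status phrase, headers, body)."""
    # The blank line (two CRLFs in a row) separates the headers from the body.
    head, _, body = raw.partition("\r\n\r\n")
    status_line, *header_lines = head.split("\r\n")
    # Three fields separated by spaces; the phrase may itself contain spaces.
    version, code, phrase = status_line.split(" ", 2)
    headers: Dict[str, str] = {}
    for line in header_lines:
        # Header name, colon, space, header value.
        name, _, value = line.partition(": ")
        headers[name] = value
    return version, int(code), phrase, headers, body


# Hypothetical response carrying a short HTML document.
raw_response = (
    "HTTP/1.1 200 OK\r\n"
    "Content-Type: text/html\r\n"
    "Content-Length: 12\r\n"
    "\r\n"
    "<p>Hello</p>"
)
version, code, phrase, headers, body = parse_response(raw_response)
print(version, code, phrase, headers, body)
```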
Conditional Request
A client can add a condition in its request.
In this case, the server will send the requested web page if the condition is met or inform the client
otherwise.
One of the most common conditions imposed by the client is the time and date the web page is
modified.
The client can send the header line If-Modified-Since with the request to tell the server that it needs the
page only if it is modified after a certain point in time.
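The server's side of this exchange might look like the sketch below. It assumes the server answers 304 (Not Modified) when the condition is not met and 200 (OK) with the page otherwise; the function name `conditional_status` and the dates are hypothetical. Date parsing and formatting use Python's standard `email.utils` helpers.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime


def conditional_status(if_modified_since: str, last_modified: datetime) -> int:
    """Return 200 if the page changed after the client's date, else 304."""
    client_time = parsedate_to_datetime(if_modified_since)
    return 200 if last_modified > client_time else 304


# The client holds a copy dated 1 June 2024 and sends that date in the header.
copy_date = datetime(2024, 6, 1, tzinfo=timezone.utc)
header_value = format_datetime(copy_date, usegmt=True)
print("If-Modified-Since:", header_value)

# Page last changed before the client's date: no need to resend it.
unchanged = conditional_status(header_value, datetime(2024, 5, 20, tzinfo=timezone.utc))
# Page changed after the client's date: send the new version.
changed = conditional_status(header_value, datetime(2024, 6, 5, tzinfo=timezone.utc))
print(unchanged, changed)
```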
Cookies
The World Wide Web was originally designed as a stateless entity.
A client sends a request; a server responds.
Their relationship is over.
The original purpose of the Web, retrieving publicly available documents, exactly fits this design.
Today the Web has other functions that need to remember some information about the clients; some are
listed below:
Websites are being used as electronic stores that allow users to browse through the store, select
wanted items, put them in an electronic cart, and pay at the end with a credit card.
Some websites need to allow access to registered clients only.
Some websites are used as portals: the user selects the web pages he wants to see.
Some websites are just advertising agencies.
For these purposes, the cookie mechanism was devised.
Using Cookies
When a client sends a request to a server, the browser looks in the cookie directory to see if it can find a
cookie sent by that server. If found, the cookie is included in the request.
When the server receives the request, it knows that this is an old client, not a new one.
Note that the browser does not interpret the contents of the cookie; it only stores the cookie and returns it to the server.
It is a cookie made by the server and eaten by the server.
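A toy sketch of this lookup, assuming the cookie directory is a simple mapping from server name to the stored cookie text. The names `cookie_directory` and `request_headers_for`, the host names, and the cookie `session_id=abc123` are all hypothetical. The server-side parsing uses Python's standard `http.cookies.SimpleCookie`.

```python
from http.cookies import SimpleCookie
from typing import Dict

# Browser side: cookies received earlier, stored per server (hypothetical data).
cookie_directory: Dict[str, str] = {"shop.example.com": "session_id=abc123"}


def request_headers_for(host: str) -> Dict[str, str]:
    """Build request headers, including a stored cookie if one exists."""
    headers = {"Host": host}
    cookie = cookie_directory.get(host)
    if cookie is not None:
        # The browser returns the cookie unchanged; it does not interpret it.
        headers["Cookie"] = cookie
    return headers


returning = request_headers_for("shop.example.com")  # old client: cookie included
first_visit = request_headers_for("news.example.org")  # new client: no cookie

# Server side: read the cookie back to recognize the client.
session_id = SimpleCookie(returning["Cookie"])["session_id"].value
print(returning, first_visit, session_id)
```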
Web Caching
Proxy Servers
HTTP supports proxy servers.
A proxy server is a computer that keeps copies of responses to recent requests.
The HTTP client sends a request to the proxy server.
The proxy server checks its cache.
If the response is not stored in the cache, the proxy server sends the request to the corresponding server.
Incoming responses are sent to the proxy server and stored for future requests from other clients.
The proxy server reduces the load on the original server, decreases traffic, and improves latency.
However, to use the proxy server, the client must be configured to access the proxy instead of the target
server.
Note that the proxy server acts as both server and client.
When it receives a request from a client for which it has a response, it acts as a server and sends
the response to the client. When it receives a request from a client for which it does not have a
response, it first acts as a client and sends a request to the target server. When the response has
been received, it acts again as a server and sends the response to the client.
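The dual role of the proxy can be sketched with an in-memory cache keyed by URL. This is a toy model, not a real proxy: the class `CachingProxy`, the stand-in `fake_origin` function, and the URL are assumptions, and no network traffic is involved.

```python
from typing import Callable, Dict, List


class CachingProxy:
    """Toy proxy that remembers responses, keyed by URL."""

    def __init__(self, fetch_from_origin: Callable[[str], str]) -> None:
        self._fetch_from_origin = fetch_from_origin
        self._cache: Dict[str, str] = {}

    def handle_request(self, url: str) -> str:
        if url in self._cache:
            # Cache hit: the proxy acts as a server and answers directly.
            return self._cache[url]
        # Cache miss: the proxy acts as a client toward the target server,
        # stores the response, then answers the original client as a server.
        response = self._fetch_from_origin(url)
        self._cache[url] = response
        return response


origin_requests: List[str] = []


def fake_origin(url: str) -> str:
    """Stand-in for the target server; records each request it receives."""
    origin_requests.append(url)
    return f"<html>content of {url}</html>"


proxy = CachingProxy(fake_origin)
first = proxy.handle_request("http://www.example.com/news")
second = proxy.handle_request("http://www.example.com/news")
print(len(origin_requests))  # number of requests that reached the origin
```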
Cache Update
A very important question is how long a response should remain in the proxy server before being
deleted and replaced.
Several different strategies are used for this purpose.
One solution is to store the list of sites whose information remains the same for a while.
For example, a news agency may change its news page every morning.
This means that a proxy server can get the news early in the morning and keep it until the next day.
Another recommendation is to add some headers to show the last modification time of the information.
The proxy server can then use the information in this header to guess how long the information would
be valid.
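One way a proxy might make such a guess is sketched below: treat the response as valid for a fraction of the time that had passed since its last modification when it was fetched. The 10% fraction is an assumed rule of thumb, not something these notes prescribe, and the function names and timestamps are hypothetical.

```python
from datetime import datetime, timedelta, timezone


def heuristic_lifetime(last_modified: datetime, fetched_at: datetime,
                       fraction: float = 0.1) -> timedelta:
    """Guess how long a response stays valid from its last modification time."""
    age_at_fetch = fetched_at - last_modified
    # Assumption: a page unchanged for a long time is likely to stay unchanged.
    return age_at_fetch * fraction


def is_fresh(last_modified: datetime, fetched_at: datetime, now: datetime) -> bool:
    """True while the cached copy is younger than its guessed lifetime."""
    return now - fetched_at < heuristic_lifetime(last_modified, fetched_at)


# Hypothetical timestamps: the page was 10 days old when the proxy fetched it.
last_modified = datetime(2024, 6, 1, tzinfo=timezone.utc)
fetched_at = datetime(2024, 6, 11, tzinfo=timezone.utc)

lifetime = heuristic_lifetime(last_modified, fetched_at)
print(lifetime)
print(is_fresh(last_modified, fetched_at, fetched_at + timedelta(hours=12)))
print(is_fresh(last_modified, fetched_at, fetched_at + timedelta(days=2)))
```

Once the guessed lifetime has passed, the proxy would fetch the response again or revalidate it with the target server.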