HTTP Lecture
Description
Dr Eckhard Pfluegel
Room SB2015 Penrhyn Road
[email protected]
Part I – Overview
1.0 Introduction
1.1 HTTP Language Elements
1.2 HTTP/1.0 Request Methods
1.3 HTTP/1.0 Headers
1.4 HTTP/1.0 Response Classes
1.5 HTTP Extensibility
1.6 SSL and Security
1.0 Introduction
• Definition of the term Protocol: language with
grammar, syntax and semantics
– Well-known examples: networking (TCP/IP), FTP
• Hypertext Transfer Protocol (HTTP): request-
response protocol for World Wide Web
– Is application-level protocol
– Is stateless protocol
– Syntax based on MIME
Overview of HTTP
• Origins:
– 1989: Initial proposal by Tim Berners-Lee (CERN)
– He invents the World Wide Web and proposes a simple
protocol to exchange hypertext documents: HTTP.
• Goals:
– access documents anywhere in the Internet
– help navigate between them via hypertext
• History: two distinct early phases
– HTTP/0.9 to HTTP/1.0 specification: period of four years
(1992 – 1996)
– HTTP/1.0 to HTTP/1.1: another three years (1996 – 1999)
History
• 1991-1996 (HTTP/0.9 & 1.0): Early versions focus on basic file
transfer, lacking features like persistent connections and security
– Limited Security: Vulnerable to eavesdropping and man-in-the-middle
attacks.
• 1997 (HTTP/1.1): Introduces crucial improvements
– Persistent connections: Reduces overhead and improves performance.
– Content negotiation: Allows servers to adjust content based on client
capabilities.
• 1999 (HTTP/1.1 update): Further refines error handling and caching
mechanisms
• 2015 (HTTP/2): Prioritizes security, speed and efficiency
– Encrypted connections (TLS) become the norm: browsers support HTTP/2 only over TLS
– Enables handling multiple requests simultaneously.
– Header compression: Reduces header size for faster transmission.
• 2022 (HTTP/3): Protocol continues to evolve, with ongoing efforts to
enhance security, performance, and adaptability
1.1 HTTP Language Elements
• We will explore building blocks of
HTTP/1.0, based on [1]
• Key elements of HTTP:
1. Messages (request/response)
2. Entities
3. Resources
4. User Agent
• We will review these different notions,
before going into the details of HTTP
HTTP Message
• Sequence of octets sent over transport
connection
• Some parts of HTTP messages can be
coded in ASCII
• Fundamental unit of communication in
HTTP
• Two types of message:
– Request, sent from client to server
– Response, sent from server to client
HTTP Request Format
• Syntax:
Request-Line
General/Request/Entity Header(s)
CRLF
Optional Message Body
• Example:
GET /index.html HTTP/1.0
Date: Wed, 22 Mar 2000 08:09:01 GMT
Pragma: No-cache
From: [email protected]
User-Agent: Mozilla/4.03
CRLF
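The request syntax above can be sketched in a few lines of Python. This is only an illustration, not a full client: build_request is a hypothetical helper name, and the header values are copied from the example on this slide.

```python
CRLF = "\r\n"

def build_request(method, uri, headers, body=b"", version="HTTP/1.0"):
    """Assemble an HTTP request: Request-Line, headers, blank line, optional body."""
    lines = [f"{method} {uri} {version}"]
    lines += [f"{name}: {value}" for name, value in headers]
    head = CRLF.join(lines) + CRLF + CRLF  # the empty line ends the header section
    return head.encode("ascii") + body

request = build_request("GET", "/index.html", [
    ("Date", "Wed, 22 Mar 2000 08:09:01 GMT"),
    ("Pragma", "no-cache"),
    ("User-Agent", "Mozilla/4.03"),
])
print(request.decode("ascii"))
```

Every line ends with the two octets CR and LF; the line labelled CRLF on the slide is the empty line that separates the headers from the optional body.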
HTTP Response Format
• Syntax:
Status-Line
General/Response/Entity Header(s)
CRLF
Optional Message Body
• Example:
HTTP/1.0 200 OK
Date: Wed, 22 Mar 2000 08:09:03 GMT
Server: Netscape-Enterprise/3.5.1
Content-Length: 34
CRLF
<html><body>Welcome!</body></html>
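Reading a response is the reverse operation. The sketch below splits the example response into its parts; parse_response is a hypothetical helper, and it assumes the whole message is already in memory.

```python
def parse_response(raw):
    """Split a raw HTTP response into (version, status code, reason, headers, body)."""
    head, _, body = raw.partition(b"\r\n\r\n")  # blank line separates headers from body
    status_line, *header_lines = head.decode("ascii").split("\r\n")
    version, code, reason = status_line.split(" ", 2)
    headers = {}
    for line in header_lines:
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()  # header names are case-insensitive
    return version, int(code), reason, headers, body

raw = (b"HTTP/1.0 200 OK\r\n"
       b"Date: Wed, 22 Mar 2000 08:09:03 GMT\r\n"
       b"Server: Netscape-Enterprise/3.5.1\r\n"
       b"Content-Length: 34\r\n"
       b"\r\n"
       b"<html><body>Welcome!</body></html>")
version, code, reason, headers, body = parse_response(raw)
print(code, reason, headers["content-length"], len(body))
```

The Content-Length value matches the number of octets in the body, which is how a recipient can tell that the body arrived completely.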
Entity
• Represents a resource, enclosed in
request or response message
• Consists of entity headers and an optional
entity body
• Entity header: metadata about entity
• Entity body: contains data, e.g.
– Request: content of HTML form, submitted by
user
– Response: an HTML document
Resource
• “Network data object or service that can
be identified by URI”
– Can be located anywhere as long as
accessible via network connection
– Note: can be service. Web is used as transfer
medium for initiating service and handling
response
1.2 HTTP/1.0 Request Methods
• Purpose: notify HTTP server what action to perform on
resource identified by Request-URI
• Principle of request-response transaction:
– Request including headers and URI sent to server
– Server applies method to resource and sends back response
• HTTP/1.0 defines the following methods: GET, HEAD
and POST
• Some versions of clients and servers also implemented
PUT, DELETE, LINK, and UNLINK
• We will study the following methods: GET, HEAD, POST
and DELETE
GET (HTTP/1.0)
• Most popular method in use today
• GET request is applied to resource specified by
URI
• There is no request body
• Generated response is current value of resource
– Static file: return content of file
– Service (Application): return generated data (e.g. CGI
script)
• Example:
GET /index.html HTTP/1.0
GET Example
• Request:
GET /index.html HTTP/1.0
• Response:
HTTP/1.0 200 OK
Content-Length: 4175
Last-Modified: Tue, 06 Jul 2004 09:59:23 GMT
Content-Type: text/html
CRLF
<html>
<body>
<h1>A Sample Webpage</h1>
Welcome to my website!
<img src="http://images.com/logo.gif">
</body>
</html>
HEAD (HTTP/1.0)
• Asks only for metadata associated with a resource
• Returned metadata is the same as for GET request
• No response body is returned
• Primary use: debugging server implementation
• Has no request body
• Example:
HEAD /index.html HTTP/1.0
might return
HTTP/1.0 200 OK
Content-Length: 4175
Last-Modified: Tue, 06 Jul 2004 09:59:23 GMT
Content-Type: text/html
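The relationship between GET and HEAD can be illustrated with a toy handler. This is a sketch under stated assumptions: DOCUMENTS and handle are hypothetical names, the document lives in memory, and only the Request-Line is inspected.

```python
DOCUMENTS = {"/index.html": b"<html><body>Welcome!</body></html>"}  # hypothetical content

def handle(request_line):
    """Answer a GET or HEAD Request-Line from the in-memory DOCUMENTS table."""
    method, uri, version = request_line.split()
    if method not in ("GET", "HEAD"):
        return b"HTTP/1.0 501 Not Implemented\r\n\r\n"
    if uri not in DOCUMENTS:
        return b"HTTP/1.0 404 Not Found\r\n\r\n"
    body = DOCUMENTS[uri]
    head = ("HTTP/1.0 200 OK\r\n"
            "Content-Type: text/html\r\n"
            f"Content-Length: {len(body)}\r\n"
            "\r\n").encode("ascii")
    # HEAD returns the same metadata as GET but never a body.
    return head if method == "HEAD" else head + body

get_response = handle("GET /index.html HTTP/1.0")
head_response = handle("HEAD /index.html HTTP/1.0")
print(head_response.decode("ascii"))
```

The HEAD response is byte for byte the header section of the GET response, including the Content-Length of the body that is not sent.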
POST (HTTP/1.0)
• GET and HEAD are used to retrieve information
• POST is used to update existing resource or
provide input to process
• Body of request includes data
• Server carries out specific actions, depending on
Request-URI
• Example:
POST /app/user.php HTTP/1.0
Content-Length: 143
CRLF
<entity body>
Submitting Input: GET vs POST
• Both GET and POST can be used to submit input
to application, e.g. CGI-script
• Using GET: input is encoded in Request-URI
GET /search.cgi?dest=Hawaii&season=winter HTTP/1.0
• Using POST: input is sent in entity body of request
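The two ways of submitting the same form input can be compared side by side. This is a sketch: the form fields are those of the example above, and the Content-Type value is the usual encoding for HTML forms, which the slide does not name explicitly.

```python
from urllib.parse import urlencode

form = {"dest": "Hawaii", "season": "winter"}
encoded = urlencode(form)  # name=value pairs joined by "&"

# GET: the input travels in the Request-URI, there is no body.
get_request = f"GET /search.cgi?{encoded} HTTP/1.0\r\n\r\n"

# POST: the same input travels in the body, described by entity headers.
post_request = (
    "POST /search.cgi HTTP/1.0\r\n"
    "Content-Type: application/x-www-form-urlencoded\r\n"
    f"Content-Length: {len(encoded)}\r\n"
    "\r\n"
    f"{encoded}"
)
print(get_request)
print(post_request)
```

With GET the input ends up in server logs and browser history as part of the URI; with POST it does not, and there is no practical limit imposed by URI length.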
DELETE (HTTP/1.0)
• This method deletes the resource identified by
the Request-URI
• Hence, allows deletion of resources remotely
• But: origin servers have control over deleting
action
• Two types of response:
– Deny action (no success)
– Grant action, either instantly or at some point in the
future
1.3 HTTP/1.0 Headers
• Headers “describe” a request or response
– Provide metadata about resource (e.g. length,
encoding format, language)
– Indicate whether a response can be cached
– Specify how to decode message
• HTTP is extensible: new headers can be defined
• Message can have any number of headers
• Headers can be required or optional
• Generic syntax: <name>: <value>
• Different types: general, request, response and
entity headers – by convention sent in this order
General Headers
• Appear in both request and response
messages
• Refer to message itself and not to entity
which is part of message
• HTTP/1.0 defines only two general header
fields:
– Date
– Pragma
Date General Header
• Date: indicates date and time of message creation
• Syntax for date string follows RFC 822 (although other
formats possible)
• Example: an HTTP response from Google’s Web server
HTTP/1.x 200 OK
X-TR: 1
Cache-Control: private
Content-Type: text/html
Server: GWS/2.1
Date: Wed, 14 Sep 2005 11:39:53 GMT
Content-Length: 3036
CRLF
<Body Omitted>
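A Date header value in this format can be produced and parsed with Python's standard library. The sketch below uses the timestamp from the example response; the variable names are illustrative only.

```python
from datetime import datetime, timezone
from email.utils import format_datetime, parsedate_to_datetime

# Format a timestamp as an HTTP date (RFC 822 style, always expressed in GMT).
created = datetime(2005, 9, 14, 11, 39, 53, tzinfo=timezone.utc)
date_header = "Date: " + format_datetime(created, usegmt=True)
print(date_header)  # Date: Wed, 14 Sep 2005 11:39:53 GMT

# Parse the header value back into a timezone-aware datetime.
parsed = parsedate_to_datetime(date_header[len("Date: "):])
print(parsed == created)
```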
Pragma General Header
• Pragma: can be used to send a directive to recipient of
message
• In general, directives are optional as far as protocol is
concerned
• In practice, obeyed by most components
• Only defined directive, in request message: Pragma: no-cache
• Informs proxies in path not to return cached copy
• Sender is interested in getting response directly from
origin server
• More details on this in Lecture 8 (Web Caching)
Request Headers
• Can be used for sending additional information
about client
• In HTTP/1.0, there are 5 request headers
defined:
– Authorization
– From
– Referer (notice spelling mistake!)
– User-Agent
– If-Modified-Since
• We will discuss the first four request headers
• More details about last header in Lecture 8!
Authorization Request Header
• Used by user agent to include credentials required to access
resource
• Example: retrieving staff Klic homepage
From Request Header
• Includes user e-mail address in request
• Note: might violate privacy of user
• In today’s world of security problems, probably
no user agent would do this
• Example (artificial):
GET /default.aspx HTTP/1.1
Host: staff.kingston.ac.uk
User-Agent: Mozilla/5.0 (Windows; U;
Windows NT 5.0; en-GB; rv:1.7.10)
Gecko/20050717 Firefox/1.0.6
From: [email protected]
Referer Request Header
• Lets client include resource from which Request-
URI was obtained
• Useful for server logs or advertising purposes
• Example:
– User is visiting www.travel.com
– Clicks on link www.hawaii.com/condominiums/catalogue.html
– This sends request to www.hawaii.com, with following Referer
header:
GET /condominiums/catalogue.html HTTP/1.0
Referer: http://www.travel.com
– Host hawaii.com now knows that visitor came via travel.com
User-Agent Request Header
• User-Agent :
– Includes information about version of browser
software being used and client machine’s operating
system version
– Advantage: send alternative version of resource,
supported by particular browser
– Disadvantage: tracking the user (confidentiality)
– Example (same as previous):
GET /default.aspx HTTP/1.1
Host: staff.kingston.ac.uk
User-Agent: Mozilla/5.0 (Windows; U; Windows
NT 5.0; en-GB; rv:1.7.10) Gecko/20050717
Firefox/1.0.6
Response Headers
• Used to send more information about
response itself and/or origin server
– Server: analogous to User-Agent
• Contains server software version number
• Example:
Server: Apache/1.2.6 Red Hat
– Location: redirect request
• Example:
Location: http://www.eurotravel.com/hawaii/accommodation.htm
Entity Headers
• Includes information about body of
entity/resource
• Can be in request or response
• There are six entity headers: Allow, Content-Type,
Content-Encoding, Content-Length,
Expires and Last-Modified
• We will look in more detail at Content-Type
and Content-Length
• Expires and Last-Modified: see Lecture 8
in the context of Web Caching
Content-Type Entity Header
• Content-Type: indicates media type of entity body
• Sample media types:
– text/html
– image/gif
– application/x-javascript
• Example: HTTP response, returning an image resource
HTTP/1.x 200 OK
Content-Type: image/gif
Last-Modified: Mon, 25 Apr 2005 21:11:27 GMT
Expires: Sun, 17 Jan 2038 19:14:07 GMT
Server: GWS/2.1
Content-Length: 1514
Date: Fri, 16 Sep 2005 10:46:48 GMT
Content-Length Entity Header
• Content-Length: indicates length of entity body
(in bytes)
• Allows recipient to verify that entity body was
received completely
• Can be in request and response
• Example: POST request, containing HTML form
data
POST /search.cgi HTTP/1.0
Content-Type: application/x-www-form-urlencoded
Content-Length: 25
CRLF
dest=Hawaii&season=winter
1.4 HTTP/1.0 Response Classes
• Every HTTP response message begins
with Status-Line
• This consists of version number, response
code and natural language reason phrase
• Example: HTTP/1.0 200 OK
• Responses are grouped into response
classes having individual response codes
• We will now look at the five different
response classes
Informational Class Responses
• Response codes: 1xx
• Although this class of response was
defined in HTTP/1.0, no actual response
codes were allocated!
• Used in HTTP/1.1 – see next lecture!
• Good example of extensibility of protocol
Success Class Responses
• Response codes: 2xx
• Sent if server has received and accepted HTTP
request
• Successful response code doesn’t indicate that
result will meet client’s expectations
• Following codes are defined:
– 200 OK
– 201 Created
– 202 Accepted
– 204 No Content
Redirection Class Responses
• Response codes: 3xx
• Inform user agent that additional action is
needed to complete request
• For example, resource has to be retrieved from
different location
• Following four response codes are defined:
– 300 Multiple Choices
– 301 Moved Permanently
– 302 Moved Temporarily
– 304 Not Modified
Client Error Class Responses
• Response codes: 4xx
• Identify errors presumably made by client
• User agent displays error code and reason
phrase
• The defined error codes are:
– 400 Bad Request
– 401 Unauthorized
– 403 Forbidden
– 404 Not Found
• The last error code is probably the most famous
one!
Server Error Class Responses
• Response codes: 5xx
• Return errors related to server
• Difference to previous error codes: problem is on
server side, client cannot solve the problem
• List of defined error codes:
– 500 Internal Server Error
– 501 Not Implemented
– 502 Bad Gateway
– 503 Service Unavailable
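Because the response class is given by the first digit of the code, a client can react sensibly even to a code it has never seen. A minimal sketch, with classify as a hypothetical helper name:

```python
RESPONSE_CLASSES = {1: "Informational", 2: "Success", 3: "Redirection",
                    4: "Client Error", 5: "Server Error"}

def classify(status_line):
    """Return (code, class name, reason phrase) for a Status-Line."""
    version, code, reason = status_line.split(" ", 2)
    code = int(code)
    return code, RESPONSE_CLASSES[code // 100], reason  # first digit selects the class

print(classify("HTTP/1.0 404 Not Found"))
print(classify("HTTP/1.0 302 Moved Temporarily"))
```

An unknown code such as 499 would still be classified as a client error, which is the behaviour the class scheme is designed to allow.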
1.5 HTTP Extensibility
• Design decision: new request methods,
response classes and codes can be introduced
• HTTP/1.1: several new methods, headers and
response codes added
• Example where this is useful: streaming media
on the Web
• In principle, protocol should be independent
from its implementations
• In practice: things are sometimes different. See
next lecture!
1.6 SSL and Security
• Security is a serious and important issue
• More and more business is done on the Web
• Networks, applications and Web transactions
are getting more and more complex
• Two possible levels of protection:
– Security at lower level (SSL)
– Security mechanisms in HTTP/1.0 (although this
needs improvement, see HTTP/1.1)
SSL
• Introduced in 1994 by Netscape
• Different protocols (e.g. HTTP, LDAP, IMAP) can
use SSL
• Is a protocol between transport and application
layers
• Principle: encrypted connection between
authenticated client and server
• Uses encryption algorithms, keys and certificates issued by a
certificate authority [2]
• Note: this is transparent to higher layers
HTTPS
• Uses SSL for transporting the HTTP message
• Everything is encrypted, including the Request-URI
• Hence communication is secret, even if
intercepted
• Runs on port 443, URLs start with https:
• Most popular browsers support SSL these days
(watch the closed lock displayed on status bar)
Summary
• HTTP defines request methods and response
classes
• HTTP requests and responses are made out of
header and body
• The most popular HTTP request method is the
GET method
• Knowing the HTTP protocol means being able to
talk to Web servers; this is what browsers do
• HTTP provides certain levels of security
Further Reading
• [1] RFC 1945: Hypertext Transfer Protocol
– HTTP/1.0.
http://www.rfc-editor.org/rfc/rfc1945.txt
Part II – Overview
2.0 Introduction
2.1 New Concepts in HTTP/1.1
2.2 Connection Management
2.3 Extensibility
2.4 Internet Address Conservation
2.5 Content Negotiation
2.6 Security Aspects in HTTP/1.1
2.0 Introduction
• Problem: the Web developed so quickly and grew so fast
that HTTP could not mature
• Several problems led to proposal of enhancements in
HTTP/1.1
• It took several years for HTTP/1.0 to develop into
HTTP/1.1
• Many browsers are still not HTTP/1.1 compliant
• Note: formal process of protocol development can take
several years
• Standards are overseen by the Internet Engineering
Task Force (IETF)
• HTTP/1.1 was published as RFC 2068 in 1997 and revised as RFC 2616 in 1999 [2]
Problems with HTTP/1.0
• Caching
– Clients, proxies and servers have inadequate control
• Download efficiency:
– Resources are always downloaded fully
– Inability to continue with interrupted transfers
• Extensibility problems
– Impossible to predict capabilities of server implementation
• Poor levels of security
– Authentication information is transmitted in plaintext
• Miscellaneous problems with various methods, headers
and response codes
– Mostly to do with incomplete specifications and hence
inconsistent implementations
2.1 New Concepts in HTTP/1.1
• Apart from attempts to fix old problems, also new
concepts were introduced
• In this lecture, we will look in more detail at:
– Connection management
– Extensibility
– Internet address conservation
– Support for variants of a resource
– Security
• This requires the introduction of new methods, headers
and response codes
• We will look at the differences in syntax/semantics
between HTTP/1.0 and HTTP/1.1
HTTP/1.1 Request Methods
• The following request methods are
formally specified in HTTP/1.1:
– GET, HEAD, POST (HTTP/1.0)
– PUT, DELETE (non-standard in HTTP/1.0)
– OPTIONS (see Section 2.3), TRACE,
CONNECT (not part of HTTP/1.0)
GET, HEAD, POST (HTTP/1.1)
• GET:
– It is now possible to request parts of an entity
– This is useful e.g. when requesting large PDF
documents
• HEAD:
– Conditional request headers are now allowed
– Use of If-Modified-Since similarly as in GET,
see Lecture 8
• POST:
– Affected by improved connection management, see
Section 2.2
PUT, DELETE (HTTP/1.1)
• PUT:
– Was partially described in HTTP/1.0
– Has now been clarified and completed
– Allows clear distinction between PUT and POST:
• PUT allows creation of resource, whose name is given by
URI in request, on origin server
• POST applies resource to entity data in body of request
– There are security issues
• DELETE:
– Similar to PUT, the method has been formalised
– Definition of new response codes
– Similar requirements for proxies
HTTP/1.1 Headers and
Response Codes
• New general headers: Cache-Control,
Connection and 5 more headers
• New request headers: Host, Max-Forwards,
Accept-Language and 11 more headers
• New response headers: Age and 5 more
headers
• New entity headers: Content-Language and
3 more headers
• There is also a new type of header (Hop-by-
Hop) with 8 specific headers
• There are new response codes as well
2.2 Connection Management
• Remember: virtually all implementations of
HTTP use TCP
• TCP is not optimal for short-lived
connections
• This slows down transfer of typical
container documents (embedded
resources)
• Therefore goal: extend TCP connection
beyond single request-response exchange
Keep-Alive Mechanism of
HTTP/1.0
• Some implementations of HTTP/1.0 provided a
new request header which allows connections to
persist
• Client would send following request:
GET /index.html HTTP/1.0
...
Connection: Keep-Alive
• If server agrees to keep connection open:
HTTP/1.0 200 OK
...
Connection: Keep-Alive
Persistent Connections in HTTP/1.1
• Variety of solutions was developed before final
approach was adopted
• Evolution:
– New HTTP methods:
• MGET: the ability to request several resources at once.
Similar to FTP.
• GETLIST/GETALL: similar – request a list of resources/all
embedded resources
– Simultaneous parallel connections
• Browser would open several connections in parallel. This
however increases load on network
– Persistent connections
• This is what finally was adopted!
Persistent Connections (cont.)
• Principle:
– Reuse existing transport connection
– Minor changes to HTTP protocol
• Idea:
– Persistent connections are now default in
HTTP/1.1
– If connection should be closed, use
Connection: close
• Main design principle is pipelining
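The difference between the HTTP/1.0 Keep-Alive extension and the HTTP/1.1 default can be written as one small decision function. This is a sketch: keeps_connection_open is a hypothetical name, and headers is assumed to be a dictionary keyed by lower-cased header names.

```python
def keeps_connection_open(version, headers):
    """Decide whether the connection persists after this message.

    headers maps lower-cased header names to their values.
    """
    tokens = {t.strip().lower() for t in headers.get("connection", "").split(",")}
    if version == "HTTP/1.1":
        return "close" not in tokens   # persistent unless told otherwise
    return "keep-alive" in tokens      # HTTP/1.0: only with the Keep-Alive extension

print(keeps_connection_open("HTTP/1.1", {}))
print(keeps_connection_open("HTTP/1.0", {}))
```

The defaults are inverted: HTTP/1.0 has to opt in to persistence, HTTP/1.1 has to opt out.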
Pipelining on
Persistent Connections
• Idea: a client sending request and waiting
for response introduces unnecessary
delay
• Therefore: keep sending requests without
waiting for responses
• Requires server to send responses in the same
order as the requests were received
• The achieved gain is due to pipelining
Pipelining Example
• The following requests can be sent back to back:
GET /index.html HTTP/1.1
Host: example.com
CRLF
GET /image1.jpg HTTP/1.1
Host: example.com
CRLF
GET /image2.jpg HTTP/1.1
Host: example.com
CRLF
GET /image3.jpg HTTP/1.1
Host: example.com
CRLF
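On the client side, pipelined responses arrive as one continuous byte stream and have to be separated again. The sketch below does this with Content-Length only; split_responses and the sample stream are hypothetical, and chunked or connection-delimited bodies are deliberately ignored.

```python
def split_responses(stream):
    """Split a byte stream of back-to-back responses using Content-Length."""
    responses = []
    while stream:
        head, _, rest = stream.partition(b"\r\n\r\n")
        lines = head.split(b"\r\n")
        length = 0
        for line in lines[1:]:
            name, _, value = line.partition(b":")
            if name.strip().lower() == b"content-length":
                length = int(value.decode("ascii"))
        responses.append((lines[0].decode("ascii"), rest[:length]))
        stream = rest[length:]  # whatever follows belongs to the next response
    return responses

requested = ["/index.html", "/image1.jpg", "/image2.jpg"]
stream = (b"HTTP/1.1 200 OK\r\nContent-Length: 5\r\n\r\nindex"
          b"HTTP/1.1 200 OK\r\nContent-Length: 4\r\n\r\nimg1"
          b"HTTP/1.1 404 Not Found\r\nContent-Length: 0\r\n\r\n")
# Responses carry no request identifier: the n-th response answers the n-th request.
for uri, (status, body) in zip(requested, split_responses(stream)):
    print(uri, status, body)
```

Matching purely by position is the reason the server has to answer in request order.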
Pipelining Example
[Figure: the Web client sends Requests 1, 2 and 3 back to back; the Web server returns Responses 1, 2 and 3 in the same order]
Problems with
Persistent Connections
• “Head of Line Blocking”:
– One of the pipelined requests takes large
amount of time
– Therefore blocks later requests
– Difficult to prevent
• Unexpected closes
– HTTP connection can be aborted in many
ways (by client or server)
– Correct recovery needed
Closing Persistent Connections
• When should persistent connection be
closed?
• Several interests compete (server
resources, optimising client requests)
• Different options were proposed:
– Timeout
– Maximal number of requests per connection
• Final specification remains silent!
• Each server can use own heuristic
2.3 Extensibility
• Extensibility is important when designing
protocols
• In case of HTTP: implementations of Web
components created different standards
• Learning from this, HTTP/1.1 allows future
extensions
• These comprise:
– Learning about server’s capabilities
– Learning about intermediate servers in path
– Support for upgrading to other protocols
• We will look at the first two features!
Learning about the Server
• Idea: new method which explores capabilities of servers
• Syntax:
OPTIONS <URI> HTTP/1.1
• Example:
– Client sends request:
OPTIONS * HTTP/1.1
Host: example.com
– Server replies:
HTTP/1.1 200 OK
Allow: HEAD, GET, POST, TRACE, OPTIONS
• This lists all the methods the server implementation
supports
• Here ’*’ means the server as a whole, independent of a specific URL
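A client can turn the Allow header of such a reply into a set and check it before attempting a method. A small sketch, with allowed_methods as a hypothetical helper and the header value taken from the example above:

```python
def allowed_methods(response_headers):
    """Return the set of methods listed in the Allow header (empty if absent)."""
    value = response_headers.get("allow", "")
    return {m.strip().upper() for m in value.split(",") if m.strip()}

methods = allowed_methods({"allow": "HEAD, GET, POST, TRACE, OPTIONS"})
print("PUT" in methods, "GET" in methods)
```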
Learning about Intermediate
Servers
• The OPTIONS method can also be used in order to learn
about proxies
• Use the Max-Forwards header
• Example:
OPTIONS * HTTP/1.1
Host: example.com
Max-Forwards: 1
2.4 Internet Address Conservation
• Problem: explosion of used IP addresses
• Reason: Web hosting companies giving out
individual domain names to customers
• They all need individual IP addresses since
HTTP request doesn’t contain hostname:
GET /index.html HTTP/1.0
• Solution: introduce Host header line
GET /index.html HTTP/1.1
Host: www.paradise.com
• This new header line is mandatory in HTTP/1.1!
• It can be ignored in HTTP/1.0
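With the Host header, one server on one IP address can serve many sites. The sketch below shows the dispatch step only; SITES, its paths and document_root are hypothetical, and the choice of a default site for HTTP/1.0 requests is an assumption about server configuration.

```python
SITES = {  # hypothetical customers sharing one IP address
    "www.paradise.com": "/var/www/paradise",
    "www.hawaii.com": "/var/www/hawaii",
}
DEFAULT_SITE = "www.paradise.com"

def document_root(request_text):
    """Pick the site for a request from its Host header.

    Returns None when the request should be rejected (400 Bad Request).
    """
    request_line, *header_lines = request_text.split("\r\n")
    version = request_line.split()[-1]
    for line in header_lines:
        name, _, value = line.partition(":")
        if name.strip().lower() == "host":
            host = value.strip().lower().split(":")[0]  # drop an optional port
            return SITES.get(host)
    # No Host header: HTTP/1.1 requires one, HTTP/1.0 falls back to a default site.
    return None if version == "HTTP/1.1" else SITES[DEFAULT_SITE]

print(document_root("GET /index.html HTTP/1.1\r\nHost: www.hawaii.com\r\n\r\n"))
```

Both requests on the slide ask for /index.html; only the Host line tells the server which customer's /index.html is meant.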
2.5 Content Negotiation
• Situation: there are different formats of a
resource
• Client and server should negotiate for preferred
representation
• Example: content in multiple languages
• Two different kinds of negotiations:
– Agent-driven: client receives alternatives and
indicates choice in second request
– Server-driven: server chooses representation based
on information on request and client
Example: Server-Driven
Content Negotiation
• Client uses Accept-* request headers (here Accept-Language) to indicate accepted
entity characteristics
• Example:
GET /index.html HTTP/1.1
Host: www.asterix.com
Accept-Language: en-us, fr-BE
• Server chooses an available option, and provides
Content-Language header:
HTTP/1.1 200 OK
Content-Length: 23819
Content-Language: fr-BE
…
<response in French-Belgian>
…
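The server's side of this exchange can be sketched as follows. The function names are hypothetical, and two simplifications are assumptions of this sketch: among languages with equal quality value the order in which the client listed them is used as tie-breaker, and only exact tag matches are considered.

```python
def parse_accept_language(value):
    """Return language tags from an Accept-Language value, most preferred first."""
    weighted = []
    for item in value.split(","):
        tag, _, params = item.strip().partition(";")
        q = 1.0                                  # default quality when no q= is given
        if params.strip().startswith("q="):
            q = float(params.strip()[2:])
        weighted.append((q, tag.strip().lower()))
    # sorted() is stable, so equal q-values keep the order the client listed them in
    return [tag for q, tag in sorted(weighted, key=lambda pair: -pair[0])]

def choose_language(accept_language, available):
    """Server-driven choice: first client preference the server can satisfy."""
    offered = {lang.lower(): lang for lang in available}
    for tag in parse_accept_language(accept_language):
        if tag in offered:
            return offered[tag]
    return available[0]                          # fall back to the server default

print(choose_language("en-us, fr-BE", ["fr-BE", "de"]))
```

The chosen value is what the server would then announce in the Content-Language header of its response.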
2.6 Security Aspects in HTTP/1.1
• Goal: ensure only authenticated users
have access to resources
• Consider an HTTP message:
– No one else should be able to read its content
– Receiver should be able to ensure that
message was indeed sent by sender
– Message should arrive without modification
Message Authentication
• HTTP/1.0: Basic authentication scheme is
not very secure
– Username and password sent in plain text (merely Base64-encoded)
– Not suitable for e-commerce
• This has been changed in HTTP/1.1 (Digest authentication)
• For more information, see RFC 2617
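Why the Basic scheme offers no confidentiality can be seen by building an Authorization header and then reversing it. This is a sketch with made-up credentials; both function names are hypothetical.

```python
import base64

def basic_authorization(username, password):
    """Build the Authorization header value for the Basic scheme."""
    token = base64.b64encode(f"{username}:{password}".encode("utf-8"))
    return "Basic " + token.decode("ascii")

def recover_credentials(header_value):
    """What an eavesdropper can do: Base64 is an encoding, not encryption."""
    scheme, _, token = header_value.partition(" ")
    username, _, password = base64.b64decode(token).decode("utf-8").partition(":")
    return username, password

header = basic_authorization("alice", "secret")  # hypothetical credentials
print("Authorization:", header)
print(recover_credentials(header))
```

No key is involved in either direction, so anyone who can read the request can read the password.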
Message Integrity
• New entity header in HTTP/1.1 (Content-MD5) allows
(primitive) integrity check of message body
using a checksum
• New response (411 Length Required) asks
for content length of message body before
processing request
– Might help against buffer overflow attacks
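A sketch of such a checksum check, assuming the header in question is Content-MD5 as defined in RFC 2616 (Base64 of the MD5 digest of the entity body). The helper names and the sample body are illustrative.

```python
import base64
import hashlib

def content_md5(body):
    """Content-MD5 value: Base64 of the 128-bit MD5 digest of the entity body."""
    return base64.b64encode(hashlib.md5(body).digest()).decode("ascii")

def body_is_intact(headers, body):
    """Recompute the checksum on receipt and compare it with the header."""
    return headers.get("content-md5") == content_md5(body)

body = b"dest=Hawaii&season=winter"
headers = {"content-md5": content_md5(body), "content-length": str(len(body))}
print(body_is_intact(headers, body))
print(body_is_intact(headers, body + b"&extra=1"))
```

The check is primitive because it only detects accidental corruption: an attacker who modifies the body can simply recompute the header as well.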
Summary
• HTTP/1.1 attempts to fix problems in HTTP/1.0
• This is done by introducing new request
methods, headers and response codes
• A lot of effort has been put into improving caching
• Other improvements concern Connection
Management, Extensibility, Internet Address
Conservation, Content Negotiation and Security
Further Reading
• [1] RFC 1945: Hypertext Transfer Protocol
– HTTP/1.0.
http://www.rfc-editor.org/rfc/rfc1945.txt