6.5.1. Proxy Uris Differ From Server Uris
• In general, proxy servers should strive to be as tolerant as possible. They should not aim to be “protocol policemen” looking
to enforce strict protocol compliance, because this could involve significant disruption of previously functional services.
• If the host isn’t found, many browsers attempt some automatic “expansion” of the hostname, in case you typed
a “shorthand” abbreviation of the host.
6.5.7. URI Resolution Without a Proxy
Figure 6-16 shows an example of browser hostname auto-
expansion without a proxy. In steps 2a–3c, the browser looks
up variations of the hostname until a valid hostname is found.
Here’s what’s going on in this figure:
• In Step 1, the user types “oreilly” into the browser’s URI
window. The browser uses “oreilly” as the hostname and
assumes a default scheme of “http://”, a default port of
“80”, and a default path of “/”.
• In Step 2a, the browser looks up host “oreilly.” This fails.
• In Step 3a, the browser auto-expands the hostname and asks
the DNS to resolve “www.oreilly.com.” This is successful.
The browser then successfully connects to www.oreilly.com.
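The lookup sequence in these steps can be sketched roughly as follows. This is only an illustration: the function names, the fake DNS table, and the rule "try the name as typed, then www.<name>.com" are assumptions made for the example, and real browsers apply their own heuristics.

```python
from typing import Callable, Optional


def expansion_candidates(typed: str) -> list[str]:
    """Hostnames to try, in order (hypothetical rule for illustration)."""
    candidates = [typed]
    # Only bare names such as "oreilly" are expanded in this sketch.
    if "." not in typed:
        candidates.append(f"www.{typed}.com")
    return candidates


def resolve_with_expansion(
    typed: str, lookup: Callable[[str], bool]
) -> Optional[str]:
    """Return the first candidate that the lookup function can resolve."""
    for host in expansion_candidates(typed):
        if lookup(host):
            return host
    return None


# Stand-in for DNS so the example runs without network access.
KNOWN_HOSTS = {"www.oreilly.com"}


def fake_lookup(host: str) -> bool:
    return host in KNOWN_HOSTS


resolved = resolve_with_expansion("oreilly", fake_lookup)
```

Passing the lookup function as a parameter keeps the expansion logic separate from DNS, so it can be exercised with a fake table as above.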
6.5.8. URI Resolution with an Explicit Proxy
When you use an explicit proxy, the browser no longer performs any of these convenience
expansions, because the user’s URI is passed directly to the proxy.
• As shown in Figure 6-17, the browser does not auto-
expand the partial hostname when there is an explicit
proxy. As a result, when the user types “oreilly” into the
browser’s location window, the proxy is sent
“http://oreilly/” (the browser adds the default scheme
and path but leaves the hostname as entered).
• For this reason, some proxies attempt to mimic as many
of the browser’s convenience services as they can,
including “www...com” auto-expansion and
addition of local domain suffixes.
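A proxy that wants to mimic these services might build a list of hostnames to try from the URI it receives. The sketch below is an assumption-laden illustration, not how any particular proxy works: the function name, the suffix "corp.example", and the order of the candidates are all hypothetical.

```python
from urllib.parse import urlsplit


def proxy_host_candidates(request_uri: str, local_suffixes=()) -> list[str]:
    """Hostnames a proxy could try for a partial hostname (illustrative)."""
    host = urlsplit(request_uri).hostname or ""
    candidates = [host]
    # Expand only bare names; fully qualified names are left alone.
    if host and "." not in host:
        # Assumed order: local domain suffixes first, then www...com.
        candidates.extend(f"{host}.{suffix}" for suffix in local_suffixes)
        candidates.append(f"www.{host}.com")
    return candidates


candidates = proxy_host_candidates("http://oreilly/", ["corp.example"])
```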
6.5.9. URI Resolution with an Intercepting Proxy
• In Step 1, the user types “oreilly” into the browser’s URI location
window.
• In Step 2a, the browser looks up the host “oreilly” via DNS, but the
DNS server fails and responds that the host is unknown, as shown in
Step 2b.
• In Step 3a, the browser does auto-expansion, converting “oreilly”
into “www.oreilly.com.” In Step 3b, the browser looks up the host
“www.oreilly.com” via DNS. This time, as shown in Step 3c, the
DNS server is successful and returns IP addresses back to the
browser.
• In Step 4a, the client has already successfully resolved the hostname
and has a list of IP addresses.
• When the proxy is finally ready to interact with the real origin server
(Step 5b), the proxy may find that the IP address actually points to a
down server.
6.6. Tracing Messages
Today, it’s not uncommon for web requests to go through a chain of two or more proxies on their way from the
client to the server (Figure 6-19).
6.6.1. The Via Header
The Via header field lists information about each
intermediate node (proxy or gateway) through which a
message passes. Each time a message goes through another
node, the intermediate node must be added to the end of the
Via list.
The Via header field is used to track the forwarding of
messages, diagnose message loops, and identify the protocol
capabilities of all senders along the request/response chain
(Figure 6-20).
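One way a proxy could use Via to diagnose a loop is to check whether its own name already appears in the list before forwarding. The sketch below is a simplified assumption: the function name and hostnames are hypothetical, and it splits on commas without handling commas inside comments.

```python
def is_loop(via_value: str, my_name: str) -> bool:
    """True if my_name already appears as a node name in the Via value."""
    for waypoint in via_value.split(","):
        fields = waypoint.split()
        # fields[0] is the protocol version, fields[1] the node name.
        if len(fields) >= 2 and fields[1].lower() == my_name.lower():
            return True
    return False


incoming_via = "1.1 a.example, 1.1 b.example"
looped = is_loop(incoming_via, "b.example")
```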
6.6.1.1. Via syntax
• The Via header field contains a comma-separated list of waypoints. Each waypoint represents an
individual proxy server or gateway hop and contains information about the protocol and address of
that intermediate node. Here is an example of a Via header with two waypoints:
Via: 1.1 cache.joes-hardware.com, 1.1 proxy.irenes-isp.net
• Note that each Via waypoint contains up to four components: an optional protocol name (defaults
to HTTP), a required protocol version, a required node name, and an optional descriptive comment.
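The four components can be pulled apart with a small parser. This is a minimal sketch under stated assumptions: the `Waypoint` type and `parse_via` function are hypothetical names, comments are assumed to be a single parenthesized group, and commas inside comments are not handled.

```python
import re
from typing import NamedTuple, Optional


class Waypoint(NamedTuple):
    protocol: str           # optional in the header; defaults to "HTTP"
    version: str            # required
    node: str               # required
    comment: Optional[str]  # optional


_WAYPOINT = re.compile(
    r"^(?:(?P<protocol>[^\s/]+)/)?"   # optional protocol name and slash
    r"(?P<version>\S+)\s+"            # protocol version
    r"(?P<node>\S+)"                  # node name (host or pseudonym)
    r"(?:\s+\((?P<comment>.*)\))?$"   # optional comment in parentheses
)


def parse_via(value: str) -> list[Waypoint]:
    """Split a Via header value into waypoints (simplified)."""
    waypoints = []
    for part in value.split(","):
        part = part.strip()
        if not part:
            continue
        match = _WAYPOINT.match(part)
        if match is None:
            raise ValueError(f"unparseable waypoint: {part!r}")
        waypoints.append(
            Waypoint(
                match["protocol"] or "HTTP",
                match["version"],
                match["node"],
                match["comment"],
            )
        )
    return waypoints


waypoints = parse_via("1.1 cache.joes-hardware.com, 1.1 proxy.irenes-isp.net")
```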
6.6.1.2. Via request and response paths
• Both request and response messages pass through proxies,
so both request and response messages have Via headers.
• Because requests and responses usually travel over the
same TCP connection, response messages travel backward
across the same path as the requests. If a request message
goes through proxies A, B, and C, the corresponding
response message travels through proxies C, B, then A. So,
the Via header for responses is almost always the reverse
of the Via header for requests (Figure 6-21).
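The reversal can be seen in a small simulation in which each proxy appends itself to the end of the Via list. The helper names and the proxy names "proxy-a" through "proxy-c" are hypothetical, and every hop is assumed to speak HTTP/1.1.

```python
def add_via(headers: dict, node: str, version: str = "1.1") -> dict:
    """Return a copy of headers with this node appended to the Via list."""
    entry = f"{version} {node}"
    updated = dict(headers)
    existing = headers.get("Via")
    updated["Via"] = f"{existing}, {entry}" if existing else entry
    return updated


def forward_through(headers: dict, chain) -> dict:
    """Pass a message through each proxy in order."""
    for node in chain:
        headers = add_via(headers, node)
    return headers


request_path = ["proxy-a", "proxy-b", "proxy-c"]
request = forward_through({}, request_path)
# The response retraces the same hops in the opposite direction.
response = forward_through({}, reversed(request_path))
```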
6.6.1.3. Via and gateways
Some proxies provide gateway functionality to servers that speak non-HTTP protocols. The
Via header records these protocol conversions, so HTTP applications can be aware of
protocol capabilities and conversions along the proxy chain. Figure 6-22 shows an HTTP
client requesting an FTP URI through an HTTP/FTP gateway.
6.6.1.4. The Server and Via headers
• The Server response header field describes the software used by the origin server. Here are a few examples:
o Server: Apache/1.3.14 (Unix) PHP/4.0.4
o Server: Netscape-Enterprise/4.1
o Server: Microsoft-IIS/5.0
• If a response message is being forwarded through a proxy, make sure the proxy does not modify the Server header. The
Server header is meant for the origin server. Instead, the proxy should add a Via entry.
• There are some cases when we don’t want exact hostnames in the Via string. In general, unless this behavior is
explicitly enabled, when a proxy server is part of a network firewall it should not forward the names and ports of hosts
behind the firewall, because knowledge of network architecture behind a firewall might be of use to a malicious party.
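A firewall proxy could hide internal names by replacing them with a pseudonym before forwarding. The sketch below is illustrative only: the function name, the ".corp.example" suffix, and the pseudonym "internal" are assumptions, and the host/port split does not handle IPv6 literals.

```python
def redact_via(via_value: str, internal_suffix: str,
               pseudonym: str = "internal") -> str:
    """Replace node names ending in internal_suffix with a pseudonym."""
    redacted = []
    for waypoint in via_value.split(","):
        # Keep any trailing comment together as the third field.
        fields = waypoint.split(None, 2)
        if len(fields) >= 2:
            host = fields[1].rsplit(":", 1)[0]  # drop an optional :port
            if host.lower().endswith(internal_suffix.lower()):
                fields[1] = pseudonym
        redacted.append(" ".join(fields))
    return ", ".join(redacted)


outgoing = redact_via(
    "1.1 cache.corp.example:8080, 1.1 edge.isp.example", ".corp.example"
)
```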
6.6.2. The TRACE Method
• Proxy servers can change messages as the messages are forwarded.
• HTTP/1.1’s TRACE method lets you trace a request message through a chain of proxies, observing what proxies the
message passes through and how each proxy modifies the request message. TRACE is very useful for debugging proxy
flows.
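A TRACE response carries back the request as the final recipient saw it, so comparing the headers that were sent with the headers that were echoed shows what the proxies changed. The sketch below assumes both sets of headers have already been parsed into dictionaries; the function name and sample values are hypothetical.

```python
def diff_headers(sent: dict, echoed: dict) -> dict:
    """Compare sent and echoed headers, ignoring case in header names."""
    sent_l = {k.lower(): v for k, v in sent.items()}
    echo_l = {k.lower(): v for k, v in echoed.items()}
    return {
        "added": {k: v for k, v in echo_l.items() if k not in sent_l},
        "removed": sorted(k for k in sent_l if k not in echo_l),
        "changed": {
            k: (sent_l[k], echo_l[k])
            for k in sent_l
            if k in echo_l and sent_l[k] != echo_l[k]
        },
    }


sent = {"Host": "www.joes-hardware.com", "Accept": "*/*"}
echoed = {
    "Host": "www.joes-hardware.com",
    "Accept": "*/*",
    "Via": "1.1 proxy.irenes-isp.net",
}
changes = diff_headers(sent, echoed)
```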
6.6.2.1. Max-Forwards