Web QOS WL040/Bidgoli-Vol III-Ch-58 July 16, 2003
WEB QUALITY OF SERVICE

A significant body of literature has addressed the issue of guaranteeing temporal QoS attributes in the absence of adequate prior knowledge of operating service
conditions such as load and resource capacity. Until recently, the state of the art in providing acceptable temporal performance to users was over-design.
Throwing money and hardware at a performance prob-
lem eventually ensures that there are enough resources
to service all incoming requests sufficiently fast. This ap-
proach, however, is inadequate for several reasons. First, it
is rather expensive, because more resources are expended
than is strictly necessary. Second, it provides the same
service to all clients. In many cases, however, a service
provider might want to use performance differentiation
as a tool to entice clients to subscribe to a “better” (and
more expensive) service. Third, the server provides only a
best-effort service in that there are no bounds on worst-
case performance. It is sometimes advantageous to be able
to quantitatively state a performance guarantee for which
users can be commensurately charged.
In the following sections, we describe several approaches for QoS guarantees in more detail. We begin with a brief review of the current Web architecture and the
underlying principles of Web server operation. We then
survey the modifications suggested to this architecture to
provide performance guarantees to Web clients.

CURRENT WEB ARCHITECTURE
From an architectural standpoint, the World Wide Web is
a distributed client–server system glued together by the
hypertext transfer protocol (HTTP), which is simply a
request–reply interface that allows clients (browsers) to
download files from servers by (URL) name and allows
one page to reference another, creating a logical mesh
of links. The architecture is completely decentralized. By
creating links from existing content, new content is seamlessly integrated with the rest of the Web.

The HTTP Protocol
The main exchange between clients and servers occurs
using the HTTP protocol. When a client requests a page
from a server, only the text (HTML) portion is initially
downloaded. If the page contains images or other em-
bedded objects, the browser downloads them separately.
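As a rough illustration of this request–reply exchange, the following sketch builds the plain-text request a browser would send for a single URL; the host name and path are invented for the example:

```python
def http_get_request(host: str, path: str, version: str = "1.0") -> str:
    """Build the plain-text GET request a browser sends for one URL."""
    lines = [
        f"GET {path} HTTP/{version}",  # request line: method, path, version
        f"Host: {host}",               # the server the page is requested from
    ]
    return "\r\n".join(lines) + "\r\n\r\n"  # blank line ends the headers

# The HTML page is requested first; each embedded image is then
# fetched with a further, separate request of the same form.
print(http_get_request("www.example.com", "/index.html"))
```

Each such request is answered with a reply carrying a status line, headers, and the requested file.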
At present, two important versions of HTTP are popular, namely HTTP 1.0 and HTTP 1.1. The most frequently cited difference of HTTP 1.1, and the primary motivation for
its existence (Mogul, 1995), is its support for persistent
connections. In HTTP 1.0, each browser request creates a
new TCP connection. Because most Web pages are short,
these connections are short-lived and are closed once the
requested page is downloaded. Unfortunately, TCP is
optimized for long data transfers. Each new TCP connec-
tion begins its life cycle with a connection set-up phase,
followed by a slow-start phase in which connection band-
width gradually ramps up from a low initial value to the
maximum bandwidth the network can support. Unfortu-
nately, short-lived connections, such as those of HTTP
1.0, are closed before reaching the maximum bandwidth.
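The slow-start ramp can be sketched with a deliberately simplified model, assuming an initial window of one segment, doubling per round trip, and no losses:

```python
def slow_start_windows(total_segments: int, initial_window: int = 1):
    """Congestion windows used while sending `total_segments`, assuming
    the window doubles each round trip (no losses, no receiver limit)."""
    windows, window, sent = [], initial_window, 0
    while sent < total_segments:
        window_now = min(window, total_segments - sent)
        windows.append(window_now)           # segments sent this round trip
        sent += window_now
        window *= 2                          # exponential growth in slow start
    return windows

# A short HTTP 1.0 transfer ends while the window is still small:
print(slow_start_windows(10))   # [1, 2, 4, 3]
```

In this model a 10-segment page finishes in four round trips with the window still far below what the network could sustain.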
Hence, transfers are slower than they need to be.

Persistent connections in HTTP 1.1 avoid the above problem by sending all browser requests on the same TCP connection. The connection is reused as long as the browser is downloading additional objects from the same server. This allows TCP to reach a higher connection bandwidth. Additionally, the cost of setting up and tearing down the TCP connection is amortized across multiple
transfers. The debate over which protocol is actually better is still going on. For example, a disadvantage of HTTP
1.1 is its reduced concurrency, because only one TCP con-
nection is used for all objects downloaded from the same
server, instead of multiple concurrent ones. Another prob-
lem with HTTP 1.1 is that the server, having received and
served a request from a client, does not know when to terminate the underlying TCP connection. Ideally, the connection should be kept alive in anticipation of subsequent requests. However, if the client does not intend to
send more requests, keeping the connection open only
wastes server resources. The present default is to wait for
a short period of time (around 30 s) after serving the last
request on a connection. If no additional requests arrive
during that period, the connection is closed. The problem
with this policy is that significant server resources can be
blocked waiting for future requests that may never arrive.
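The persistent-connection trade-off can be sketched with a toy cost model; the round-trip figures below are illustrative assumptions, not measurements:

```python
SETUP_COST = 1.5      # assumed round trips to open each TCP connection
TRANSFER_COST = 1.0   # assumed round trips to move one small object

def page_cost(num_objects: int, persistent: bool) -> float:
    """Round trips to fetch a page's embedded objects."""
    if persistent:
        # HTTP 1.1: one set-up, then every object reuses the connection
        return SETUP_COST + num_objects * TRANSFER_COST
    # HTTP 1.0: a fresh set-up (and slow start) for every object
    return num_objects * (SETUP_COST + TRANSFER_COST)

print(page_cost(10, persistent=False))  # 25.0
print(page_cost(10, persistent=True))   # 11.5
```

The model deliberately omits the server-side cost of holding idle connections open, which is precisely the resource drain that the keep-alive timeout tries to bound.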
It is therefore not obvious that the bandwidth increase of
HTTP 1.1 outweighs its limitations.

Caching and Content Distribution
To improve Web access delays and reduce backbone Web
traffic, caching and Web content distribution services
have gradually emerged. These services attempt to redis-
tribute content around the network backbone so that it is
closer to the clients who access it. The difference between
caching and content distribution lies in a data-pull versus
a data-push model. Whereas caches store content locally
in response to user requests, content distribution proxies
proactively get copies of the content in advance.
There are generally three types of caches, namely proxy
caches, client caches, and server caches. Proxy caches
are typically installed by the ISPs at the interface to
the network backbone. They intercept all Web requests
originating from the ISP’s clients and save copies of the
requested pages when replies are received from the con-
tacted servers. This process is called page caching. A
request to a page that is already cached can be served
directly from the proxy, thereby improving client-side
latency, reducing server load, and minimizing backbone
traffic for which the ISP is responsible to the backbone
provider. An important question is what to do when
the cache becomes full. To maximize its impact, a full cache retains only the most recently requested URLs, evicting the pages that have gone unused the longest. This is known as the least-recently-used (LRU) replacement policy. Several variations and generalizations of this policy have been proposed, e.g., to account for page size, the cost of a page miss, and the importance of the client.
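A minimal sketch of the basic policy, assuming a capacity counted in pages rather than bytes (the class name and interface are invented for illustration):

```python
from collections import OrderedDict

class LRUPageCache:
    """Proxy page cache with least-recently-used replacement."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.pages = OrderedDict()          # URL -> page body, oldest first

    def get(self, url: str):
        if url in self.pages:
            self.pages.move_to_end(url)     # mark as most recently used
            return self.pages[url]          # hit: serve from the proxy
        return None                         # miss: must contact the server

    def put(self, url: str, body: str) -> None:
        self.pages[url] = body
        self.pages.move_to_end(url)
        if len(self.pages) > self.capacity:
            self.pages.popitem(last=False)  # evict the least recently used
```

The weighted variations mentioned above would replace the eviction rule with one that also scores entries by size, miss cost, or client importance.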
To improve performance further, client browsers lo-
cally cache the most recently requested pages. This cache
is consulted when the page is revisited (e.g., when the client clicks the “back” button), obviating an
extra access to the server. Finally, some server installations