PERFORMANCE GUARANTEES IN WEB SERVERS 717

The above discussion identifies five different queues
involved in the Web server's queuing structure: namely, the
IP queue, the listen queue, the ready queue, the I/O queue
(assuming a single I/O resource, e.g., disk), and the out-
going packet queue. The interconnection of these queues
creates a queuing network with loops and other depen-
dencies. For example, threads that repeatedly read and
process disk data essentially loop between the ready queue
and the I/O queue. Moreover, when the number of threads
is fixed, dequeuing from the listen queue is synchronized
with progress in the ready queue in that new requests
are dequeued from the former only when some runnable
thread has finished serving a request or has blocked.
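The coupling between the listen queue and the thread pool can be sketched in a few lines. The code below is a minimal illustration of the synchronization just described, not a real server: with a fixed pool of worker threads, a new request leaves the listen queue only when some thread has finished its current request.

```python
import queue
import threading

# Analogue of the kernel listen queue (bounded, as in a real server).
listen_queue = queue.Queue(maxsize=128)

def worker():
    while True:
        request = listen_queue.get()   # dequeue is coupled to thread progress
        if request is None:
            break                      # shutdown sentinel
        # ... parse the request, read the file (I/O queue),
        #     enqueue the response on the outgoing packet queue ...
        listen_queue.task_done()

threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()

for i in range(10):
    listen_queue.put(f"GET /page{i}")

listen_queue.join()                    # wait until every request is served
for _ in threads:
    listen_queue.put(None)
for t in threads:
    t.join()
print("served 10 requests with 4 worker threads")
```

With four threads, at most four requests are in the ready or I/O queues at once; the remaining requests wait in the listen queue, exactly the dependency described above.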
These factors make accurate analysis of the server’s
queuing structure difficult. Instead, many approxima-
tions are possible. For example, it is possible to consider
only the queue for the bottleneck resource. The general
idea is that a Web request is likely to spend most of its
time waiting in the bottleneck queue.
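The bottleneck approximation lends itself to a back-of-the-envelope calculation. The sketch below uses an M/M/1 model, an assumption made here purely for illustration (as discussed next, real Web arrivals are not Poisson), but it captures how delay grows sharply as bottleneck utilization approaches 1.

```python
def bottleneck_delay(arrival_rate: float, service_rate: float) -> float:
    """Mean time spent at the bottleneck under an M/M/1 approximation.

    Assumes Poisson arrivals and exponential service times, which real
    Web traffic violates; used here only to illustrate the bottleneck
    approximation.
    """
    if arrival_rate >= service_rate:
        raise ValueError("bottleneck is overloaded (utilization >= 1)")
    # M/M/1 mean response time: W = 1 / (mu - lambda)
    return 1.0 / (service_rate - arrival_rate)

# Example: a disk serving 100 requests/s under increasing load.
for lam in (50, 90, 99):
    ms = bottleneck_delay(lam, 100) * 1000
    print(f"load {lam}/s -> mean delay {ms:.1f} ms")  # 20.0, 100.0, 1000.0 ms
```

Doubling utilization from 50% to 99% inflates the predicted delay fifty-fold, which is why the bottleneck queue dominates a request's total time in the system.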
The second challenge in providing statistical guaran-
tees in Web servers is to identify the stochastic nature of
the arrival process. Many queuing theory results assume
a continuous Poisson arrival process. This process is char-
acterized by an exponential distribution of interarrival
times. This assumption does not hold for Web traffic.
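The failure of the exponential assumption can be seen by sampling from a heavy-tailed alternative such as the Pareto distribution discussed next. The parameters below are illustrative, not measured values.

```python
import random

def pareto_interarrivals(alpha: float, x_min: float, n: int, seed: int = 0):
    """Draw n Pareto-distributed interarrival times.

    For alpha <= 2 the variance is infinite, and for alpha <= 1 even
    the mean diverges: the hallmark of heavy-tailed Web traffic.
    """
    rng = random.Random(seed)
    samples = []
    for _ in range(n):
        u = 1.0 - rng.random()                 # u in (0, 1], avoids div by 0
        # Inverse-transform sampling: X = x_min / u^(1/alpha)
        samples.append(x_min / u ** (1.0 / alpha))
    return samples

samples = pareto_interarrivals(alpha=1.2, x_min=0.01, n=10_000)
mean = sum(samples) / len(samples)
print("largest / mean interarrival ratio:", max(samples) / mean)
# Under an exponential distribution this ratio stays small; under a
# heavy-tailed one, a few extreme gaps dominate the whole sample.
```

This burstiness is precisely what invalidates delay predictions derived from Poisson-based queuing formulas.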
Research on Web characterization has identified that
arrival of Web requests is generally modeled by a
heavy-tailed distribution (Crovella & Bestavros, 1997). One
distribution that is commonly used to model request
interarrival times and request execution times is the
Pareto distribution. Breslau, Cao, Fan, Phillips, and Shenker (1999)
also determined that URL popularity follows a Zipf-like
distribution. This information is important for studies of
Web performance because it helps quantify the effects of
caching. To experiment with Web performance in labo-
ratory testbeds, Barford and Crovella (1998) developed a
synthetic Web workload generator that faithfully repro-
duces the characteristics of realistic Web traffic, includ-
ing its heavy-tailed distribution, URL popularity, and ref-
erence locality characteristics. The problem of providing
queuing-theoretic performance predictions for such Web
traffic arriving at a server modeled by the queuing structure
outlined above is still an open research topic.

Relative Guarantees
From queuing theory we know that delay experienced by
a request is a function of server load. Ultimately, the only
way one can reduce delay to meet deadline guarantees is
to keep the load low enough. This implies denying ser-
vice to some requests under high load to improve perfor-
mance. In many cases, however, it is preferred that all
clients receive service when capacity permits. One QoS-
provisioning paradigm that subscribes to this model is
proportional relative differentiated services.
Proportional relative differentiation was first proposed
in the networking community in the context of delay dif-
ferentiation in routers. It has since been extended to server end
systems. In the relative differentiated services model, it is
desired that the ratio between the performance levels of
different classes of traffic be fixed; e.g., it may be that the
delays of two traffic classes in a network router should
be fixed at a ratio of 3:1. In general, if there are multiple
traffic classes in the system, and if Hi is the measured
performance of class i, the relative guarantee specifies that
H1 : H2 : ... : Hn = C1 : C2 : ... : Cn, where Ci is the weight
of class i. Hence, only the relative delay between any pair of
classes is specified. The absolute delay can take any value.
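One well-known way to enforce such ratios is a waiting-time-priority scheduler. The sketch below is an assumed mechanism, not a design from the text above: the head-of-line request with the largest waiting time divided by its class weight Ci is served next, which drives the measured delays toward the ratio C1 : C2 : ... : Cn.

```python
def pick_next(queues, now, weights):
    """Waiting-time-priority sketch: serve the head-of-line request
    with the largest normalized waiting time (now - arrival) / C_i.
    Classes with a small weight C_i are served sooner, so measured
    delays tend toward the ratio C_1 : C_2 : ... : C_n.
    """
    best_class, best_score = None, -1.0
    for cls, q in queues.items():
        if q:  # q[0] is the arrival time of the oldest request
            score = (now - q[0]) / weights[cls]
            if score > best_score:
                best_class, best_score = cls, score
    return best_class

# Two hypothetical classes with a target delay ratio of 3:1.
weights = {"bronze": 3.0, "gold": 1.0}
queues = {"bronze": [0.0], "gold": [0.5]}   # head-of-line arrival times
print(pick_next(queues, now=1.0, weights=weights))   # prints "gold"
```

Even though the bronze request arrived earlier, its larger weight lets the later gold request jump ahead; repeated over many requests, this keeps bronze delays near three times gold delays.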
When the system approaches overload, the delay of all
classes increases, although some classes see a larger delay
increase than others in accordance with the relative de-
lay guarantee. At present, mechanisms for providing rela-
tive delay differentiation are well understood, but not yet
deployed.
Relative delay guarantees make most sense under mod-
erate server load. When the load is light, all classes see
identically good service (no delay). When the load is very
high, all classes suffer unacceptable delays and timeouts.
Hence, it is often useful to combine relative and absolute
delay guarantees in a single framework. The combined
architecture bounds the maximum delay of a class in ad-
dition to providing the correct delay ratio when the bound
has not been reached. The architecture allows specifying
a partial order on absolute and relative time constraints
that determines which constraints should be relaxed first
when network conditions make the combination unrealizable.

Convergence Guarantees
In an environment where unpredictable traffic conditions
make it impossible to satisfy absolute constraints, an
alternative type of performance guarantee has recently
been defined. This guarantee views QoS provisioning as a
convergence problem and employs control theory to en-
sure stability and timeliness of convergence of system per-
formance to the right specification (Abdelzaher, Shin, &
Bhatti, 2002). The statement of the guarantee is that a
performance metric,R, will converge within a specified
exponentially decaying envelope to a fixed value, called
the set point, and that the maximum deviation from the
set point will be bounded at all times.
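Stated more precisely (with notation assumed here rather than taken from the original), the guarantee requires that the error e(t) = R(t) - Rset satisfy

    |e(t)| <= A exp(-a t)   (exponentially decaying envelope), and
    |e(t)| <= Dmax for all t >= 0   (bounded worst-case deviation),

where A and a > 0 define the convergence envelope and Dmax is the maximum tolerated deviation from the set point.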
The absolute convergence guarantee is translated into
a control loop such as those used in industrial control
plants. The loop samples the measured performance met-
ric (e.g., delay), compares it to the set point, and uses
the difference to induce changes in resource allocation
and load. Typically, it performs admission control to re-
duce load, and reallocates resources to alleviate the bot-
tlenecks.
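The loop just described can be sketched as a proportional-integral (PI) admission controller. The structure and gains below are illustrative assumptions, not the cited design:

```python
class AdmissionController:
    """Sketch of a PI control loop tracking a delay set point.

    Each sampling period, the measured delay is compared to the set
    point and the admission probability is adjusted.  The gains are
    illustrative; a real deployment would tune them against a model of
    the server (e.g., via the analysis in Abdelzaher et al., 2002).
    """

    def __init__(self, set_point, kp=0.1, ki=0.02):
        self.set_point = set_point
        self.kp, self.ki = kp, ki
        self.integral = 0.0
        self.admit_prob = 1.0        # fraction of requests admitted

    def update(self, measured_delay):
        # Positive error means the server is too slow: admit less load.
        error = measured_delay - self.set_point
        self.integral += error
        correction = self.kp * error + self.ki * self.integral
        self.admit_prob = min(1.0, max(0.0, self.admit_prob - correction))
        return self.admit_prob

ctl = AdmissionController(set_point=0.1)    # 100 ms delay target
print(ctl.update(0.3))   # overload: admission probability drops below 1.0
```

The integral term removes steady-state error (delay settles at the set point rather than near it), which is what distinguishes this loop from a simple threshold-based admission policy.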
The use of control theory to control the performance of
software processes has gained much popularity in recent
years. Traditionally, control theory was used to model and
control industrial processes described by difference equa-
tions. The intuitive reason that such a theory would be
applicable to server performance control is that input load
on a server resembles input flow into a water tank. The fill
level of the server queue resembles the tank fill level. Ad-
mission control resembles a valve on the input flow pipe.
Hence, the delay dynamics of a Web server are similar to
flow and level control dynamics in a physical plant. The
latter are well understood and can be described by dif-
ference equations. The second reason that control theory
is becoming popular for server control is that feedback
control loops are very robust to modeling errors. Hence,
accurate models of software dynamics are not needed.