The Internet Encyclopedia (Volume 3)

Bidgoli (Ed.), Vol. III, Ch. 58, July 16, 2003

PERFORMANCE GUARANTEES IN WEB SERVERS

therefore called rejection cost. The rejection cost of a
request can be more than half the cost of processing the
request successfully. Hence, at overload, a significant por-
tion of server capacity is wasted on request rejection.
Note, in comparison, that a best-effort server, which
does not need to classify requests, incurs a lower cost per
failed request at overload. This is because when such a
server gets overloaded, the socket queue overflows in the
kernel. Subsequent requests fail to get enqueued in the
listen queue and are dropped much earlier in the protocol
stack, hence incurring a lower rejection cost. QoS-aware
servers ensure that indiscriminate tail dropping does not
occur. For example, a high-priority thread is often ded-
icated to dequeuing the listen queue and classifying the
requests, thereby increasing the cost of rejection. Mini-
mizing rejection cost in QoS-aware servers with complex
request classification policies is an important research
topic.
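The effect of rejection cost on usable capacity can be captured with a back-of-envelope model. The sketch below is illustrative only: the function name and all parameter values (one unit of work per served request, half a unit per rejection) are assumptions, not measurements from the text. At overload, capacity is split between serving and rejecting, so goodput falls as rejection cost rises.

```python
def goodput(capacity, arrival_rate, service_cost=1.0, rejection_cost=0.5):
    """Requests/second the server can complete when every excess
    arrival must be explicitly rejected at rejection_cost.
    capacity is in units of work per second; service_cost is the
    work needed to serve one request."""
    if arrival_rate * service_cost <= capacity:
        return arrival_rate  # underload: everything is served
    # At overload, the server serves s req/s and rejects the rest:
    #   s * service_cost + (arrival_rate - s) * rejection_cost = capacity
    s = (capacity - arrival_rate * rejection_cost) / (service_cost - rejection_cost)
    return max(s, 0.0)
```

With a rejection cost of half the service cost, a server of capacity 100 facing 150 arrivals/second completes only 50 requests/second; a best-effort server that drops excess requests for free (rejection cost near zero) would still complete 100.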

Consistent Prioritization
Many guarantee types, such as absolute delay guarantees,
usually rely on priority-driven scheduling. Prioritization
imposes a significant challenge in most mainstream oper-
ating systems. To be effective, all resource queues should
be identically prioritized. Unfortunately, CPU priorities,
which can be set explicitly in many operating systems,
control only the order of the ready queue. It has been
shown in recent studies that this queue is often not the bot-
tleneck. In a previous section, we have identified at least
five resource queues involved in a Web server. In many
cases, the largest queue in the server is the listen queue
on the server’s well-known port. This queue is maintained
in the TCP layer and is handled in FIFO order. Correct
prioritization would imply prioritizing the socket listen
queues as well. In I/O intensive servers, such as those serv-
ing dynamically generated content, the I/O queue may be
the bottleneck. Hence, disk access should be prioritized.
Moreover, in a server implementing data structures pro-
tected by semaphores, it must be ensured that processes
queued on a semaphore are awakened in consistent pri-
ority order.
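The scheme described above, in which a dedicated thread drains the FIFO listen queue and reorders requests by class, can be sketched as follows. The classification policy (premium clients first) and all names are hypothetical; a real server would dequeue from a socket rather than a list.

```python
import heapq

def classify(request):
    # Hypothetical policy: premium clients get priority 0, others 1
    # (lower number = higher priority).
    return 0 if request.get("premium") else 1

def drain_listen_queue(listen_fifo, ready):
    """Move requests from the FIFO listen queue into a priority queue,
    so workers see them in priority order rather than arrival order.
    The sequence number preserves FIFO order within one class."""
    for seq, req in enumerate(listen_fifo):
        heapq.heappush(ready, (classify(req), seq, req))

listen_fifo = [{"id": 1}, {"id": 2, "premium": True}, {"id": 3}]
ready = []
drain_listen_queue(listen_fifo, ready)
order = [heapq.heappop(ready)[2]["id"] for _ in range(len(ready))]
# The premium request (id 2) is dequeued first, then 1 and 3 in arrival order.
```

Note that this is exactly the design whose downside the previous subsection identified: the classifying thread must accept every connection before it can reject any, which raises the rejection cost at overload.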
Communicating priority information among multiple
resources is a nontrivial undertaking. Proper operating
system support must exist for priority inheritance across
different resources. This support is complicated by the
fact that blocking over nonpreemptive resources may
cause involuntary priority inversion. The classical exam-
ple of that is the case of two requests, A and B, where A is
of higher priority. Let request B arrive first at some server
and be blocked on a nonpreemptive resource such as a
shared data structure protected by a semaphore. Request
A arrives later and is blocked waiting for B to release the
lock. Meanwhile, the progress of B may be interrupted
by an arbitrary number of requests of intermediate pri-
ority. In this scenario, A is forced to wait for an arbitrary
number of lower priority requests, even when all resource
queues (including the semaphore queue) are correctly pri-
oritized. The problem may be solved by the priority ceiling
protocol developed at CMU, which bounds priority inversion. Unfortunately, current mainstream operating systems neither enforce resource priorities nor implement
mechanisms for bounding priority inversion, such as the

priority ceiling protocol. Thus, the current state of deploy-
ment is far from adequate for the purposes of implement-
ing priority-based QoS support on Web server platforms.
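The priority ceiling idea can be illustrated with a minimal sketch. The classes below are invented for illustration and do not interact with any real scheduler: a lock is assigned a ceiling equal to the highest priority of any task that may use it, and a task holding the lock temporarily runs at that ceiling, so medium-priority requests cannot preempt it and starve a blocked high-priority request.

```python
class Task:
    def __init__(self, name, priority):  # lower number = higher priority
        self.name = name
        self.base = priority
        self.effective = priority

class CeilingLock:
    """Priority-ceiling emulation: while a task holds the lock, its
    effective priority is raised to the lock's ceiling; it is restored
    on release. Contention handling is omitted in this sketch."""
    def __init__(self, ceiling):
        self.ceiling = ceiling
        self.holder = None

    def acquire(self, task):
        assert self.holder is None, "sketch handles uncontended acquire only"
        self.holder = task
        task.effective = min(task.effective, self.ceiling)

    def release(self, task):
        self.holder = None
        task.effective = task.base

low = Task("B", priority=3)
lock = CeilingLock(ceiling=1)   # highest priority among the lock's users
lock.acquire(low)
held_priority = low.effective   # boosted to the ceiling while holding
lock.release(low)
restored_priority = low.effective
```

In the scenario from the text, request B (priority 3) holding the semaphore runs at priority 1, so intermediate-priority requests can no longer delay it, and request A's blocking time is bounded by B's critical section.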

Automated Profiling and Capacity Planning
In many cases, providing QoS guarantees requires devel-
oping a service execution model that describes server ca-
pacity in units of contracted work. This problem is gen-
erally called capacity planning. For example, a content
provider may wish to make an agreement with a hosting
service to host its business Web site. The
content provider may agree to pay for an expected client
access rate of 100 requests/second on static content of
an average size of 10 KB/request. The host contractually
agrees to serve that rate. The problem of the host is to de-
termine how much server capacity should be allocated to
this site so that the contractual service obligations are met.
This, in turn, requires knowing the execution overhead
per request received and per byte sent of the response. A
common approximation of service time of a request for
static content is time = A + B·x, where time is the service
time, A is a fixed per-request overhead associated with
protocol processing, x is the size of the response, and B is
the overhead per unit of response data sent.
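Plugging the contracted workload into this model gives the required capacity directly. The values of A and B below are invented for illustration (they are platform-dependent, as the next paragraph explains); only the 100 requests/second and 10 KB/request figures come from the text.

```python
# Hypothetical measured parameters for time = A + B*x:
A = 1.2e-3        # seconds of fixed per-request (protocol) overhead
B = 4.0e-8        # seconds per byte of response data sent

rate = 100        # contracted request rate (requests/second)
size = 10 * 1024  # average response size (bytes)

per_request = A + B * size        # service time of one average request
utilization = rate * per_request  # fraction of one server's capacity
                                  # that must be reserved for this site
```

With these assumed parameters, each request costs about 1.61 ms, so the contracted site needs roughly 16% of one server's capacity.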
The problem with computing such execution over-
heads is that they depend on the hardware and software
of the underlying platform. Thus, they need to be recom-
puted upon every platform or software upgrade. The cost
of recomputing these parameters may be excessive. Fortu-
nately, it can be reduced using automated profiling mid-
dleware. Automated profiling middleware transparently
instruments the server to measure various overheads dur-
ing normal operation. These overheads are then corre-
lated with measured load (such as the measured request
rate and response bandwidth) to yield the best value of
execution parameters A and B. Least squares estimation
is a particularly useful tool to perform such correlation.
Automated profiling eliminates manual profiling costs,
hence making it feasible to do accurate capacity planning
in QoS-aware Web services. Techniques for accurate and
robust automated profiling are currently under investiga-
tion.
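The least squares step reduces to fitting a straight line through the profiled (size, time) samples. A minimal sketch, using the standard closed-form simple linear regression (function name and sample values are assumptions):

```python
def fit_service_model(sizes, times):
    """Least-squares fit of time = A + B*x from profiled samples:
    sizes are response sizes x, times are measured service times."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    B = sxy / sxx            # per-byte overhead (slope)
    A = mean_y - B * mean_x  # fixed per-request overhead (intercept)
    return A, B

# Synthetic samples generated from known parameters, to show the fit
# recovers them; real samples would come from the profiling middleware.
sizes = [1000, 5000, 10000]
times = [0.001 + 2e-7 * x for x in sizes]
A_hat, B_hat = fit_service_model(sizes, times)
```

In practice the middleware would feed in many noisy (rate, bandwidth, utilization) measurements rather than three clean points, but the estimator is the same.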

QoS Adaptation
The foregoing discussion focused on controlling load to
provide time-related guarantees. The underlying assump-
tion is that service must be provided by the deadline.
There are no intermediate compromises. In the following,
we present a case for QoS adaptation algorithms, which
can negotiate intermediate performance levels within a
predefined range deemed acceptable by the user. We de-
scribe mechanisms that implement adaptation in Web
servers.

The Case for QoS Adaptation
Most QoS-sensitive applications have a certain degree of
flexibility in terms of resource requirements. For example,
JPEG images can be adapted to bandwidth limitations by
lossy compression or resolution reduction. Dynamically
generated pages can be replaced by approximate static