Rather than a choice between acceptance and rejection, it is more realistic to choose between acceptance and background service. In other words, for the purposes of the following discussion, a rejected client is put in a separate lowest-priority queue to be served if resources permit.
Admission control has received much attention in the Web QoS literature. Admission control algorithms may be classified as optimistic or pessimistic. The former may admit clients even when they could miss their deadlines; the latter may reject clients unnecessarily. Note that the absence of admission control can be thought of as the extreme case of an optimistic admission control test. Recent results (Abdelzaher & Lu, 2001) have shown that in a server with randomly arriving requests it is possible to use a constant-time admission control test to distinguish clients who will meet their deadlines from those who may not. The test is based on a running counter that maintains a utilization-like metric. The counter is updated by a constant-time operation upon request arrivals and departures. If a request arrives while the counter is below a certain high-water mark, it is guaranteed to meet its deadline. Otherwise, the deadline may be missed and the request is served at the lowest priority level. Evaluation results of this admission control algorithm show that it rarely errs in the sense of unnecessarily rejecting requests. The test is shown to improve revenue at overload not only by eliminating resource consumption wasted on requests that miss their deadlines but also by favoring smaller requests (all other factors being equal) and hence improving server throughput. The derivation of the high-water mark, which represents the client admission threshold, assumes that clients are served in priority order such that more urgent requests are served first. A generalization of this test was later proposed for FIFO scheduling.
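To make the flavor of this test concrete, the following Python sketch maintains such a utilization-like counter with constant-time updates. The high-water mark value and the per-request cost metric are illustrative placeholders, not the actual threshold derived in Abdelzaher and Lu (2001).

```python
class AdmissionController:
    """Sketch of a constant-time admission control test.

    A running utilization-like counter is updated in O(1) on each
    arrival and departure. Requests arriving while the counter is
    below the high-water mark are admitted with a deadline guarantee;
    the rest are demoted to background (lowest-priority) service.
    """

    def __init__(self, high_water_mark):
        self.high_water_mark = high_water_mark  # illustrative threshold
        self.counter = 0.0                      # utilization-like metric

    def on_arrival(self, cost):
        # `cost` is the request's contribution to the utilization metric
        # (assumed known, e.g., service time / relative deadline).
        if self.counter < self.high_water_mark:
            self.counter += cost
            return "guaranteed"   # admitted: will meet its deadline
        return "background"       # deadline may be missed: lowest priority

    def on_departure(self, cost):
        # Constant-time counter update when an admitted request completes.
        self.counter -= cost


ac = AdmissionController(high_water_mark=0.5)   # placeholder value
print(ac.on_arrival(0.3))   # guaranteed (counter 0.0 < 0.5, now 0.3)
print(ac.on_arrival(0.3))   # guaranteed (counter 0.3 < 0.5, now 0.6)
print(ac.on_arrival(0.3))   # background (counter 0.6 >= 0.5)
```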
When priority-driven scheduling is used to meet deadlines, priority should be set proportional to urgency. Urgency can be defined in two ways. In the first, clients of classes with shorter maximum response-time constraints are considered more urgent; the resulting priority assignment is called deadline-monotonic. In the second, urgency is defined by absolute deadline; the resulting priority assignment is called earliest-deadline-first (EDF). For example, consider a request arriving at time 0 with a maximum response time constraint of 10 s. At time t = 9, a second request of a different class arrives with a maximum response time constraint of 2 s. According to the deadline-monotonic priority assignment, the second request should receive higher priority because its maximum response time constraint, 2 s, is tighter than 10 s. According to EDF, the first request should receive higher priority because its absolute deadline, 0 + 10 = 10, is before the absolute deadline of the second request, 9 + 2 = 11.
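The two priority assignments can be checked mechanically. A minimal Python sketch, with the request parameters taken directly from the example, keys each request on its relative deadline for deadline-monotonic scheduling and on its absolute deadline for EDF; a smaller key means higher priority.

```python
# Requests from the example: (arrival_time, relative_deadline) in seconds.
r1 = (0, 10)   # arrives at t = 0, maximum response time 10 s
r2 = (9, 2)    # arrives at t = 9, maximum response time 2 s

dm_key = lambda r: r[1]          # deadline-monotonic: relative deadline
edf_key = lambda r: r[0] + r[1]  # EDF: absolute deadline

print(min([r1, r2], key=dm_key))   # (9, 2): r2 first under deadline-monotonic
print(min([r1, r2], key=edf_key))  # (0, 10): r1 first under EDF (10 < 11)
```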
EDF has been proved to be the optimal priority-driven scheduling policy. Deadline-monotonic scheduling is optimal among time-independent scheduling policies, i.e., those in which priorities are assigned independently of absolute request arrival times. Optimality, here, is defined in terms of the ability to meet a larger number of deadlines. EDF can meet deadlines where deadline-monotonic scheduling fails because it takes arrival times into account. For instance, in the above example, if the first request has an execution time of 9.5 s and the second request has an execution time of 1.5 s, both requests meet their deadlines under EDF, but not under deadline-monotonic scheduling. A problem with EDF is that it is less commonly implemented on standard operating systems, with the exception of embedded real-time operating systems, where application timing is of prime importance. Deadline-monotonic scheduling is therefore a good compromise.
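A small time-stepped simulation (a sketch, not a general-purpose scheduler) reproduces this outcome; a step of 0.5 s suffices because every quantity in the example is a multiple of 0.5 s.

```python
def simulate(requests, key, dt=0.5, horizon=20.0):
    """Tiny preemptive uniprocessor simulator (time-stepped).

    requests: list of (arrival, relative_deadline, exec_time) tuples.
    key(req): priority key; smaller means more urgent.
    Returns a dict mapping each request to its finish time."""
    remaining = {r: r[2] for r in requests}
    finish = {}
    t = 0.0
    while t < horizon and len(finish) < len(requests):
        ready = [r for r in requests if r[0] <= t and r not in finish]
        if ready:
            r = min(ready, key=key)      # highest-priority ready request runs
            remaining[r] -= dt
            if remaining[r] <= 1e-9:
                finish[r] = t + dt
        t += dt
    return finish

# The two requests from the example: (arrival, rel_deadline, exec_time).
reqs = [(0, 10, 9.5), (9, 2, 1.5)]

edf = simulate(reqs, key=lambda r: r[0] + r[1])  # absolute deadline
dm = simulate(reqs, key=lambda r: r[1])          # relative deadline

for r in reqs:
    print(r,
          "EDF finish:", edf[r], "meets:", edf[r] <= r[0] + r[1],
          "| DM finish:", dm[r], "meets:", dm[r] <= r[0] + r[1])
```

Under EDF the first request finishes at 9.5 s and the second at 11 s, meeting both deadlines; under deadline-monotonic scheduling the second request preempts at t = 9, so the first request finishes at 11 s and misses its 10 s deadline.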

Statistical Delay Guarantees
An entirely different line of reasoning is to provide statistical guarantees on delays and deadline misses. Statistical guarantees require queuing analysis of the Web server. This analysis makes two types of assumptions: first, assumptions regarding the queuing structure of the Web server; second, assumptions regarding the request arrival process. In the following, we outline the most important challenges underlying the statistical approach to QoS guarantees.
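As a simple illustration of what such an analysis yields, suppose (purely as a modeling assumption) that the server behaves as an M/M/1 queue with Poisson arrivals at rate λ and exponential service at rate μ. The response time T is then exponentially distributed, with P(T > t) = e^(-(μ-λ)t), so a statistical guarantee of the form "at most 5% of requests are delayed beyond d" can be verified in closed form. The numbers below are illustrative; real Web servers require the more detailed queuing structure derived next.

```python
import math

def mm1_delay_violation_prob(lam, mu, d):
    """P(response time > d) in an M/M/1 queue with arrival rate `lam`
    and service rate `mu`, both in requests/s. Requires lam < mu."""
    assert lam < mu, "the queue must be stable (lam < mu)"
    return math.exp(-(mu - lam) * d)

# Illustrative numbers: 80 req/s offered load, 100 req/s capacity.
p = mm1_delay_violation_prob(lam=80.0, mu=100.0, d=0.15)
print(f"P(delay > 150 ms) = {p:.3f}")   # ~0.050, i.e., a 95% guarantee at 150 ms
```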
Consider the first challenge, namely, deriving the queuing structure of the Web server. This queuing structure depends on the protocol used; we describe HTTP 1.0 for illustration. Consider a typical multithreaded (or multiprocess) Web server. Packets arrive from the network, causing hardware and software interrupts that, respectively, read the packets from the network interface cards and deposit them into a kernel-level input queue called the IP queue. In between interrupts, packets in the IP queue are processed by the kernel and queued for the particular application ports for which they are destined. These ports are often called sockets. Each socket is associated with an input queue, called the listen queue, where incoming connection requests are queued. Independently schedulable entities in the Web server, called worker threads, block waiting for data to be deposited in the listen queue. These threads are unblocked by request arrivals to execute the arriving requests. Each thread implements a loop that processes each incoming request and generates a response. A worker thread that has been unblocked by a request arrival becomes runnable.
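The sketch below shows this structure in miniature for an HTTP 1.0 server in Python: the listen() backlog plays the role of the listen queue, and each worker thread blocks in accept() until the kernel hands it a queued connection. The port number, pool size, and one-line response are illustrative.

```python
import socket
import threading

NUM_WORKERS = 8             # worker pool size (illustrative)

def handle(conn):
    """Process one request and generate a response (placeholder logic)."""
    conn.recv(4096)                     # read the HTTP request
    conn.sendall(b"HTTP/1.0 200 OK\r\nContent-Length: 5\r\n\r\nhello")
    conn.close()                        # HTTP 1.0: one request per connection

def worker(server_sock):
    """Worker thread loop: block on the listen queue, serve, repeat."""
    while True:
        conn, _addr = server_sock.accept()  # blocks until a connection is queued
        handle(conn)

server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("", 8080))                 # illustrative port
server.listen(128)                      # the kernel listen-queue (backlog) length

for _ in range(NUM_WORKERS):
    threading.Thread(target=worker, args=(server,), daemon=True).start()
threading.Event().wait()                # keep the main thread alive
```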
Multiple runnable threads may exist at a given time. The order in which such threads get the CPU to execute a request is determined by the CPU scheduling policy. This policy maintains a priority queue called the ready queue. The thread at the top of this queue executes for a particular time quantum or until it blocks. Request processing by a worker thread typically entails access to one or more auxiliary server resources, the most notable being disk I/O. For example, in a Web server, disk I/O is usually needed to read the requested Web page from disk. Access to an auxiliary resource blocks the calling thread, which is then queued for I/O until the awaited resource becomes available. Each resource usually has a queue, which determines the order in which accesses are served; we call it the I/O queue. The resource is made available to the thread at the top of the queue, at which time the corresponding thread becomes runnable again and re-enters the CPU ready queue. When request processing is done, the worker thread sends a response back to the client. Sending the response entails queuing data into the outgoing packet queue for transmission on the network.
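Putting the pieces together, a toy tandem-queue model (a sketch with uniform, illustrative service times) traces a request through the chain of queues named above. A real analysis would model the ready queue with the CPU scheduling policy rather than FIFO, but the chain itself is the queuing structure that a statistical analysis must capture.

```python
import queue
import threading
import time

# One FIFO per server queue named in the text, in the order traversed.
STAGES = ["IP queue", "listen queue", "ready queue",
          "I/O queue", "ready queue (post-I/O)", "outgoing packet queue"]
queues = {name: queue.Queue() for name in STAGES}

def stage_server(name, next_name, service_time):
    """Serve one stage FIFO; forward each request to the next stage."""
    while True:
        req = queues[name].get()
        time.sleep(service_time)        # hold this resource for the request
        print(f"request {req} leaves the {name}")
        if next_name is not None:
            queues[next_name].put(req)

# Chain the stages with a uniform illustrative service time of 10 ms.
for i, name in enumerate(STAGES):
    nxt = STAGES[i + 1] if i + 1 < len(STAGES) else None
    threading.Thread(target=stage_server, args=(name, nxt, 0.01),
                     daemon=True).start()

for r in range(3):                      # three requests arrive at the IP queue
    queues["IP queue"].put(r)
time.sleep(1.0)                         # let the pipeline drain before exiting
```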