The Internet Encyclopedia (Volume 3)

Bidgoli (Ed.), Vol. III, Ch. 58, July 16, 2003

PERFORMANCE GUARANTEES IN WEB SERVERS

therefore called rejection cost. The rejection cost of a
request can be more than half the cost of processing the
request successfully. Hence, at overload, a significant por-
tion of server capacity is wasted on request rejection.
Note, in comparison, that a best-effort server, which
does not need to classify requests, incurs a lower cost per
failed request at overload. This is because when such a
server gets overloaded, the socket queue overflows in the
kernel. Subsequent requests fail to get enqueued in the
listen queue and are dropped much earlier in the protocol
stack, hence incurring a lower rejection cost. QoS-aware
servers ensure that indiscriminate tail dropping does not
occur. For example, a high-priority thread is often ded-
icated to dequeuing the listen queue and classifying the
requests, thereby increasing the cost of rejection. Mini-
mizing rejection cost in QoS-aware servers with complex
request classification policies is an important research
topic.
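The effect of rejection cost on usable capacity can be captured with a back-of-envelope model. The sketch below is illustrative only: the function name and all parameter values (one unit of work per served request, half a unit per rejection) are assumptions, not measurements from the text. At overload, capacity is split between serving and rejecting, so goodput falls as rejection cost rises.

```python
def goodput(capacity, arrival_rate, service_cost=1.0, rejection_cost=0.5):
    """Requests/second the server can complete when every excess
    arrival must be explicitly rejected at rejection_cost.
    capacity is in units of work per second; service_cost is the
    work needed to serve one request."""
    if arrival_rate * service_cost <= capacity:
        return arrival_rate  # underload: everything is served
    # At overload, the server serves s req/s and rejects the rest:
    #   s * service_cost + (arrival_rate - s) * rejection_cost = capacity
    s = (capacity - arrival_rate * rejection_cost) / (service_cost - rejection_cost)
    return max(s, 0.0)
```

With a rejection cost of half the service cost, a server of capacity 100 facing 150 arrivals/second completes only 50 requests/second; a best-effort server that drops excess requests for free (rejection cost near zero) would still complete 100.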

Consistent Prioritization
Many guarantee types, such as absolute delay guarantees,
usually rely on priority-driven scheduling. Prioritization
imposes a significant challenge in most mainstream oper-
ating systems. To be effective, all resource queues should
be identically prioritized. Unfortunately, CPU priorities,
which can be set explicitly in many operating systems,
control only the order of the ready queue. It has been
shown in recent studies that this queue is often not the bot-
tleneck. In a previous section, we have identified at least
five resource queues involved in a Web server. In many
cases, the largest queue in the server is the listen queue
on the server’s well-known port. This queue is maintained
in the TCP layer and is handled in FIFO order. Correct
prioritization would imply prioritizing the socket listen
queues as well. In I/O intensive servers, such as those serv-
ing dynamically generated content, the I/O queue may be
the bottleneck. Hence, disk access should be prioritized.
Moreover, in a server implementing data structures pro-
tected by semaphores, it must be ensured that processes
queued on a semaphore are awakened in consistent pri-
ority order.
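The scheme described above, in which a dedicated thread drains the FIFO listen queue and reorders requests by class, can be sketched as follows. The classification policy (premium clients first) and all names are hypothetical; a real server would dequeue from a socket rather than a list.

```python
import heapq

def classify(request):
    # Hypothetical policy: premium clients get priority 0, others 1
    # (lower number = higher priority).
    return 0 if request.get("premium") else 1

def drain_listen_queue(listen_fifo, ready):
    """Move requests from the FIFO listen queue into a priority queue,
    so workers see them in priority order rather than arrival order.
    The sequence number preserves FIFO order within one class."""
    for seq, req in enumerate(listen_fifo):
        heapq.heappush(ready, (classify(req), seq, req))

listen_fifo = [{"id": 1}, {"id": 2, "premium": True}, {"id": 3}]
ready = []
drain_listen_queue(listen_fifo, ready)
order = [heapq.heappop(ready)[2]["id"] for _ in range(len(ready))]
# The premium request (id 2) is dequeued first, then 1 and 3 in arrival order.
```

Note that this is exactly the design whose downside the previous subsection identified: the classifying thread must accept every connection before it can reject any, which raises the rejection cost at overload.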
Communicating priority information among multiple
resources is a nontrivial undertaking. Proper operating
system support must exist for priority inheritance across
different resources. This support is complicated by the
fact that blocking over nonpreemptive resources may
cause involuntary priority inversion. The classical exam-
ple of that is the case of two requests, A and B, where A is
of higher priority. Let request B arrive first at some server
and be blocked on a nonpreemptive resource such as a
shared data structure protected by a semaphore. Request
A arrives later and is blocked waiting for B to release the
lock. Meanwhile, the progress of B may be interrupted
by an arbitrary number of requests of intermediate pri-
ority. In this scenario, A is forced to wait for an arbitrary
number of lower priority requests, even when all resource
queues (including the semaphore queue) are correctly pri-
oritized. The problem may be solved by the priority ceiling
protocol developed at CMU, which bounds priority inversion. Unfortunately, current mainstream operating systems neither enforce resource priorities nor implement
mechanisms for bounding priority inversion, such as the

priority ceiling protocol. Thus, the current state of deploy-
ment is far from adequate for the purposes of implement-
ing priority-based QoS support on Web server platforms.
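The priority ceiling idea can be illustrated with a minimal sketch. The classes below are invented for illustration and do not interact with any real scheduler: a lock is assigned a ceiling equal to the highest priority of any task that may use it, and a task holding the lock temporarily runs at that ceiling, so medium-priority requests cannot preempt it and starve a blocked high-priority request.

```python
class Task:
    def __init__(self, name, priority):  # lower number = higher priority
        self.name = name
        self.base = priority
        self.effective = priority

class CeilingLock:
    """Priority-ceiling emulation: while a task holds the lock, its
    effective priority is raised to the lock's ceiling; it is restored
    on release. Contention handling is omitted in this sketch."""
    def __init__(self, ceiling):
        self.ceiling = ceiling
        self.holder = None

    def acquire(self, task):
        assert self.holder is None, "sketch handles uncontended acquire only"
        self.holder = task
        task.effective = min(task.effective, self.ceiling)

    def release(self, task):
        self.holder = None
        task.effective = task.base

low = Task("B", priority=3)
lock = CeilingLock(ceiling=1)   # highest priority among the lock's users
lock.acquire(low)
held_priority = low.effective   # boosted to the ceiling while holding
lock.release(low)
restored_priority = low.effective
```

In the scenario from the text, request B (priority 3) holding the semaphore runs at priority 1, so intermediate-priority requests can no longer delay it, and request A's blocking time is bounded by B's critical section.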

Automated Profiling and Capacity Planning
In many cases, providing QoS guarantees requires devel-
oping a service execution model that describes server ca-
pacity in units of contracted work. This problem is gen-
erally called capacity planning. For example, a content
provider may wish to make an agreement with a hosting
service to host its business Web site. The
content provider may agree to pay for an expected client
access rate of 100 requests/second on static content of
an average size of 10 KB/request. The host contractually
agrees to serve that rate. The problem of the host is to de-
termine how much server capacity should be allocated to
this site so that the contractual service obligations are met.
This, in turn, requires knowing the execution overhead
per request received and per byte sent of the response. A
common approximation of service time of a request for
static content is time = A + B·x, where time is the service
time, A is a fixed per-request overhead associated with
protocol processing, x is the size of the response, and B is
the overhead per unit of response data sent.
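Plugging the contracted workload into this model gives the required capacity directly. The values of A and B below are invented for illustration (they are platform-dependent, as the next paragraph explains); only the 100 requests/second and 10 KB/request figures come from the text.

```python
# Hypothetical measured parameters for time = A + B*x:
A = 1.2e-3        # seconds of fixed per-request (protocol) overhead
B = 4.0e-8        # seconds per byte of response data sent

rate = 100        # contracted request rate (requests/second)
size = 10 * 1024  # average response size (bytes)

per_request = A + B * size        # service time of one average request
utilization = rate * per_request  # fraction of one server's capacity
                                  # that must be reserved for this site
```

With these assumed parameters, each request costs about 1.61 ms, so the contracted site needs roughly 16% of one server's capacity.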
The problem with computing such execution over-
heads is that they depend on the hardware and software
of the underlying platform. Thus, they need to be recom-
puted upon every platform or software upgrade. The cost
of recomputing these parameters may be excessive. Fortu-
nately, it can be reduced using automated profiling mid-
dleware. Automated profiling middleware transparently
instruments the server to measure various overheads dur-
ing normal operation. These overheads are then corre-
lated with measured load (such as the measured request
rate and response bandwidth) to yield the best value of
execution parameters A and B. Least squares estimation
is a particularly useful tool to perform such correlation.
Automated profiling eliminates manual profiling costs,
hence making it feasible to do accurate capacity planning
in QoS-aware Web services. Techniques for accurate and
robust automated profiling are currently under investiga-
tion.
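The least squares step reduces to fitting a straight line through the profiled (size, time) samples. A minimal sketch, using the standard closed-form simple linear regression (function name and sample values are assumptions):

```python
def fit_service_model(sizes, times):
    """Least-squares fit of time = A + B*x from profiled samples:
    sizes are response sizes x, times are measured service times."""
    n = len(sizes)
    mean_x = sum(sizes) / n
    mean_y = sum(times) / n
    sxx = sum((x - mean_x) ** 2 for x in sizes)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(sizes, times))
    B = sxy / sxx            # per-byte overhead (slope)
    A = mean_y - B * mean_x  # fixed per-request overhead (intercept)
    return A, B

# Synthetic samples generated from known parameters, to show the fit
# recovers them; real samples would come from the profiling middleware.
sizes = [1000, 5000, 10000]
times = [0.001 + 2e-7 * x for x in sizes]
A_hat, B_hat = fit_service_model(sizes, times)
```

In practice the middleware would feed in many noisy (rate, bandwidth, utilization) measurements rather than three clean points, but the estimator is the same.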

QoS Adaptation
The foregoing discussion focused on controlling load to
provide time-related guarantees. The underlying assump-
tion is that service must be provided by the deadline.
There are no intermediate compromises. In the following,
we present a case for QoS adaptation algorithms, which
can negotiate intermediate performance levels within a
predefined range deemed acceptable by the user. We de-
scribe mechanisms that implement adaptation in Web
servers.

The Case for QoS Adaptation
Most QoS-sensitive applications have a certain degree of
flexibility in terms of resource requirements. For example,
JPEG images can be adapted to bandwidth limitations by
lossy compression or resolution reduction. Dynamically
generated pages can be replaced by approximate static