Mathematics for Computer Science

(avery) #1
19.6. Sums of Random Variables 807

So confidence levels refer to the results of estimation procedures for real-world
quantities. The phrase “confidence level” should be heard as a reminder that some
statistical procedure was used to obtain an estimate, and in judging the credibility
of the estimate, it may be important to learn just what this procedure was.

19.6 Sums of Random Variables


If all you know about a random variable is its mean and variance, then Cheby-
shev’s Theorem is the best you can do when it comes to bounding the probabil-
ity that the random variable deviates from its mean. In some cases, however, we
know more—for example, that the random variable has a binomial distribution—
and then it is possible to prove much stronger bounds. Instead of polynomially
small bounds such as1=c^2 , we can sometimes even obtain exponentially small
bounds such as1=ec. As we will soon discover, this is the case whenever the ran-
dom variableTis the sum ofnmutually independent random variablesT 1 ,T 2 ,... ,
Tnwhere 0 Ti 1. A random variable with a binomial distribution is just one
of many examples of such aT. Here is another.

19.6.1 A Motivating Example
Fussbook is a new social networking site oriented toward unpleasant people. Like
all major web services, Fussbook has a load balancing problem: it receives lots of
forum posts that computer servers have to process. If any server is assigned more
work than it can complete in a given interval, then it is overloaded and system
performance suffers. That would be bad, because Fussbook users arenota tolerant
bunch. So balancing the work load across mutliple servers is vital.
An early idea was to assign each server an alphabetic range of forum topics.
(“That oughta work!”, one programmer said.) But after the computer handling the
“privacy” and “preferred text editor” threads melted from overload, the drawback
of anad hocapproach was clear: it’s easy to miss something that will mess up your
plan.
If the length of every task were known in advance, then finding a balanced distri-
bution would be a kind of “bin packing” problem. Such problems are hard to solve
exactly, but approximation algorithms can come close. Unfortunately, in this case
task lengths are not known in advance, which is typical of workload problems in
the real world.
So the load balancing problem seems sort of hopeless, because there is no data
available to guide decisions. So the programmers of Fussbook gave up and just
Free download pdf