30 Chapter 2: Descriptive Statistics
Proof
Let $y_i = x_i - \bar{x}$, $i = 1, \ldots, n$. For any $b > 0$, we have that
$$\sum_{i=1}^{n} (y_i + b)^2 \ge \sum_{i:\, y_i \ge ks} (y_i + b)^2 \ge \sum_{i:\, y_i \ge ks} (ks + b)^2 = N(k)(ks + b)^2 \tag{2.4.1}$$
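where the first inequality follows because every term $(y_i + b)^2$ is nonnegative, so dropping terms cannot increase the sum. The second follows because both $ks$ and $b$ are positive, so $y_i + b \ge ks + b > 0$ whenever $y_i \ge ks$. However,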
$$\begin{aligned}
\sum_{i=1}^{n} (y_i + b)^2 &= \sum_{i=1}^{n} \left(y_i^2 + 2by_i + b^2\right) \\
&= \sum_{i=1}^{n} y_i^2 + 2b \sum_{i=1}^{n} y_i + nb^2 \\
&= (n-1)s^2 + nb^2
\end{aligned}$$
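where the final equality used the definition of the sample variance, $\sum_{i=1}^{n} y_i^2 = \sum_{i=1}^{n} (x_i - \bar{x})^2 = (n-1)s^2$, together with
$$\sum_{i=1}^{n} y_i = \sum_{i=1}^{n} (x_i - \bar{x}) = \sum_{i=1}^{n} x_i - n\bar{x} = 0.$$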
Therefore, we obtain from Equation (2.4.1) that
$$N(k) \le \frac{(n-1)s^2 + nb^2}{(ks + b)^2}$$
implying, upon dividing by $n$ and using $(n-1)s^2 \le ns^2$, that
$$\frac{N(k)}{n} \le \frac{s^2 + b^2}{(ks + b)^2}$$
Because the preceding is valid for all $b > 0$, we can set $b = s/k$ (which is the value of $b$ that minimizes the right-hand side of the preceding inequality) to obtain that
$$\frac{N(k)}{n} \le \frac{s^2 + s^2/k^2}{(ks + s/k)^2}$$
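The claim that $b = s/k$ minimizes $(s^2 + b^2)/(ks + b)^2$ follows from setting the derivative with respect to $b$ to zero, which gives $b(ks + b) = s^2 + b^2$, that is, $bks = s^2$. It can also be checked by a crude grid search. The sketch below assumes $k = 2$ and $s = 3$ purely for illustration, and the helper name `bound` is hypothetical. At the minimizer the value is $1/(k^2 + 1)$, whatever $s$ is.

```python
def bound(b, k, s):
    """Right-hand side (s^2 + b^2) / (k*s + b)^2 of the bound on N(k)/n."""
    return (s ** 2 + b ** 2) / (k * s + b) ** 2

k, s = 2.0, 3.0  # illustrative values, not from the text

# Crude grid search over b in (0, 20] with step 0.001.
grid = [i / 1000 for i in range(1, 20001)]
b_best = min(grid, key=lambda b: bound(b, k, s))

print(b_best, s / k)                          # grid minimizer vs. s/k
print(bound(s / k, k, s), 1 / (k ** 2 + 1))   # both equal 1/(k^2 + 1)
```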
Multiplying the numerator and the denominator of the right side of the preceding by $k^2/s^2$ gives
$$\frac{N(k)}{n} \le \frac{k^2 + 1}{(k^2 + 1)^2} = \frac{1}{k^2 + 1}$$
and the result is proven. Thus, for instance, whereas the usual Chebyshev inequality shows that at most 25 percent ($1/k^2$ with $k = 2$) of the data values are at least 2 standard deviations greater than the sample mean, the one-sided Chebyshev inequality lowers this bound to "at most 20 percent" ($1/(k^2 + 1) = 1/5$). ■
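To see the one-sided bound at work, the sketch below computes $N(k)/n$ for a right-skewed sample and compares it with $1/(1 + k^2)$. The exponential data, the seed, the chosen values of $k$, and the function name `one_sided_fraction` are illustrative assumptions. Because the inequality holds for every data set (with $k > 0$ and $s > 0$), the assertion inside the loop should never fail.

```python
import random
import statistics

def one_sided_fraction(xs, k):
    """Return N(k)/n, the fraction of values with x_i - xbar >= k*s."""
    xbar = statistics.fmean(xs)
    s = statistics.stdev(xs)  # sample standard deviation (n - 1 denominator)
    return sum(1 for x in xs if x - xbar >= k * s) / len(xs)

random.seed(1)
data = [random.expovariate(1.0) for _ in range(1000)]  # right-skewed sample

for k in (0.5, 1.0, 2.0, 3.0):
    frac = one_sided_fraction(data, k)
    print(k, frac, 1 / (1 + k ** 2))
    assert frac <= 1 / (1 + k ** 2)  # one-sided Chebyshev bound

# The k = 2 comparison from the text: two-sided 1/k^2 versus one-sided 1/(1 + k^2).
print(1 / 2 ** 2, 1 / (1 + 2 ** 2))  # 0.25 0.2
```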