5.2 The inequalities of Markov and Chebyshev
is well behaved the inequality is rather loose. By assuming that higher moments of
$X$ exist, Chebyshev's inequality can be improved by applying Markov's inequality
to $|\hat\mu - \mu|^k$ with the positive integer $k$ chosen so that the resulting bound is
optimized. This is a bit cumbersome, so instead we present the continuous
analog of this idea, known as the Cramér-Chernoff method.
To calibrate our expectations on what gains to expect over Chebyshev's
inequality, let us start by recalling the central limit theorem. Let
$S_n = \sum_{t=1}^n (X_t - \mu)$. The central limit theorem (CLT) says that under no additional
assumptions than the existence of the variance, the limiting distribution of
$S_n / \sqrt{\sigma^2 n}$ as $n \to \infty$ is a Gaussian with mean zero and unit variance. If
$Z \sim \mathcal{N}(0, 1)$, then
\[
P(Z \geq u) = \int_u^\infty \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) dx\,.
\]
The integral has no closed-form solution, but it is easy to bound:
\[
\int_u^\infty \frac{1}{\sqrt{2\pi}} \exp\left(-\frac{x^2}{2}\right) dx
\leq \frac{1}{u\sqrt{2\pi}} \int_u^\infty x \exp\left(-\frac{x^2}{2}\right) dx
= \sqrt{\frac{1}{2\pi u^2}} \exp\left(-\frac{u^2}{2}\right)\,, \tag{5.3}
\]
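As a quick numerical sanity check (not part of the original text), the tail bound in (5.3) can be compared against the exact Gaussian tail, which is expressible via the complementary error function as $P(Z \geq u) = \tfrac{1}{2}\,\mathrm{erfc}(u/\sqrt{2})$ and is available in Python's standard-library `math` module. The function names here are illustrative:

```python
import math

def gaussian_tail(u):
    # Exact tail P(Z >= u) for Z ~ N(0, 1), via the complementary error function.
    return 0.5 * math.erfc(u / math.sqrt(2))

def tail_bound(u):
    # Upper bound from (5.3): exp(-u^2/2) / (u * sqrt(2*pi)), valid for u > 0.
    return math.exp(-u * u / 2) / (u * math.sqrt(2 * math.pi))

for u in [0.5, 1.0, 2.0, 3.0]:
    # The bound should dominate the exact tail for every positive u.
    assert gaussian_tail(u) <= tail_bound(u)
    print(f"u={u}: P(Z>=u)={gaussian_tail(u):.6f}, bound={tail_bound(u):.6f}")
```

Note that the bound is loose for small $u$ but tightens quickly as $u$ grows, since the ratio of the two sides tends to one as $u \to \infty$.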
which gives
\[
P(\hat\mu \geq \mu + \varepsilon)
= P\left(S_n / \sqrt{\sigma^2 n} \geq \varepsilon \sqrt{n / \sigma^2}\right)
\approx P\left(Z \geq \varepsilon \sqrt{n / \sigma^2}\right)
\leq \sqrt{\frac{\sigma^2}{2\pi n \varepsilon^2}} \exp\left(-\frac{n\varepsilon^2}{2\sigma^2}\right)\,. \tag{5.4}
\]
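To get a feel for the size of the gap, one can numerically compare the Chebyshev bound $\sigma^2/(n\varepsilon^2)$ with the right-hand side of (5.4). The sketch below uses illustrative values $n = 100$, $\varepsilon = 1/2$, $\sigma^2 = 1$; the function names are not from the text:

```python
import math

def chebyshev_bound(n, eps, var):
    # Chebyshev: P(|mu_hat - mu| >= eps) <= var / (n * eps^2).
    return var / (n * eps * eps)

def clt_bound(n, eps, var):
    # Right-hand side of (5.4): sqrt(var / (2*pi*n*eps^2)) * exp(-n*eps^2 / (2*var)).
    return math.sqrt(var / (2 * math.pi * n * eps * eps)) * math.exp(-n * eps * eps / (2 * var))

for n in [10, 100, 1000]:
    print(f"n={n}: Chebyshev={chebyshev_bound(n, 0.5, 1.0):.2e}, "
          f"CLT approx={clt_bound(n, 0.5, 1.0):.2e}")
```

Already at $n = 100$ the approximation from (5.4) is several orders of magnitude below the Chebyshev bound, consistent with the exponential versus polynomial decay in $n$.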
This is usually much smaller than what we obtained with Chebyshev's inequality
(Exercise 5.3). In particular, the bound on the right-hand side of (5.4) decays
slightly faster than $\exp(-n\varepsilon^2 / (2\sigma^2))$, which means that $\hat\mu$
rapidly concentrates around its mean.
An oft-taught rule of thumb is that the central limit theorem provides a
reasonable approximation for $n \geq 30$. We advise caution. Suppose that
$X_1, \ldots, X_n$ are independent Bernoulli with bias $p = 1/n$. As $n$ tends to
infinity, the distribution of $\sum_{t=1}^n X_t$ converges to a Poisson distribution with
parameter 1, which does not look Gaussian at all.
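This failure mode is easy to see by simulation. The sketch below (with illustrative sample sizes, not from the text) draws sums of $n = 30$ Bernoulli$(1/n)$ variables and compares the empirical distribution with the Poisson$(1)$ probability mass function:

```python
import math
import random
from collections import Counter

random.seed(0)
n, trials = 30, 100_000  # n = 30: exactly where the rule of thumb claims the CLT "works"

# Each trial draws S = sum of n independent Bernoulli(1/n) variables.
counts = Counter(
    sum(1 for _ in range(n) if random.random() < 1 / n)
    for _ in range(trials)
)

for k in range(5):
    empirical = counts[k] / trials
    poisson = math.exp(-1) / math.factorial(k)  # Poisson(1) pmf at k
    print(f"P(S={k}): empirical {empirical:.3f}, Poisson(1) {poisson:.3f}")
```

The empirical frequencies track the Poisson$(1)$ pmf closely (e.g. around $e^{-1} \approx 0.368$ for $k = 0$ and $k = 1$), a sharply skewed discrete shape that no Gaussian can mimic.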
The asymptotic nature of the central limit theorem makes it unsuitable for
designing bandit algorithms. In the next section we derive finite-time analogs,
which are only possible by making additional assumptions.