Understanding Machine Learning: From Theory to Algorithms


Therefore,
\[
\mathbb{P}[Z > 1-a] \ge 1 - \frac{1-\mu}{a} = \frac{a+\mu-1}{a} .
\]

B.2 Chebyshev’s Inequality


Applying Markov's inequality to the random variable $(Z-\mathbb{E}[Z])^2$ we obtain Chebyshev's inequality:
\[
\forall a > 0,\quad \mathbb{P}\bigl[|Z-\mathbb{E}[Z]| \ge a\bigr] = \mathbb{P}\bigl[(Z-\mathbb{E}[Z])^2 \ge a^2\bigr] \le \frac{\operatorname{Var}[Z]}{a^2} , \tag{B.4}
\]
where $\operatorname{Var}[Z] = \mathbb{E}[(Z-\mathbb{E}[Z])^2]$ is the variance of $Z$.
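As a quick numerical illustration (not from the text), the following Python sketch compares the empirical tail probability on the left-hand side of (B.4) with the bound $\operatorname{Var}[Z]/a^2$; the exponential distribution, the sample size, and the values of $a$ are arbitrary choices.

import numpy as np

# Illustrative check of Chebyshev's inequality (B.4); the distribution
# and constants here are arbitrary choices, not from the text.
rng = np.random.default_rng(0)
z = rng.exponential(scale=1.0, size=1_000_000)  # E[Z] = 1, Var[Z] = 1

mean, var = z.mean(), z.var()
for a in (1.0, 2.0, 3.0):
    empirical = np.mean(np.abs(z - mean) >= a)  # estimate of P[|Z - E[Z]| >= a]
    bound = var / a**2                          # Chebyshev bound Var[Z]/a^2
    print(f"a={a}: empirical tail {empirical:.4f} <= bound {bound:.4f}")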
Consider the random variable $\frac{1}{m}\sum_{i=1}^{m} Z_i$. Since $Z_1,\ldots,Z_m$ are i.i.d. it is easy to verify that
\[
\operatorname{Var}\left[\frac{1}{m}\sum_{i=1}^{m} Z_i\right] = \frac{\operatorname{Var}[Z_1]}{m} .
\]
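For completeness, the verification (a standard step that the text leaves to the reader) uses the fact that the variance of a sum of independent random variables is the sum of their variances:
\[
\operatorname{Var}\left[\frac{1}{m}\sum_{i=1}^{m} Z_i\right]
= \frac{1}{m^2}\operatorname{Var}\left[\sum_{i=1}^{m} Z_i\right]
= \frac{1}{m^2}\sum_{i=1}^{m}\operatorname{Var}[Z_i]
= \frac{m\operatorname{Var}[Z_1]}{m^2}
= \frac{\operatorname{Var}[Z_1]}{m} .
\]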

Applying Chebyshev’s inequality, we obtain the following:

Lemma B.2 Let $Z_1,\ldots,Z_m$ be a sequence of i.i.d. random variables and assume that $\mathbb{E}[Z_1] = \mu$ and $\operatorname{Var}[Z_1] \le 1$. Then, for any $\delta \in (0,1)$, with probability of at least $1-\delta$ we have
\[
\left| \frac{1}{m}\sum_{i=1}^{m} Z_i - \mu \right| \le \sqrt{\frac{1}{\delta m}} .
\]

Proof Applying Chebyshev's inequality we obtain that for all $a > 0$
\[
\mathbb{P}\left[ \left| \frac{1}{m}\sum_{i=1}^{m} Z_i - \mu \right| > a \right]
\le \frac{\operatorname{Var}[Z_1]}{m a^2}
\le \frac{1}{m a^2} .
\]
The proof follows by denoting the right-hand side $\delta$ and solving for $a$: setting $\delta = \frac{1}{m a^2}$ gives $a = \sqrt{\frac{1}{\delta m}}$.
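As an illustration (mine, not the book's), the following Python sketch checks Lemma B.2 empirically: over many trials it measures how often the empirical average deviates from $\mu$ by more than $\sqrt{1/(\delta m)}$. The uniform distribution on $[0,1]$ (whose variance $1/12$ satisfies the assumption $\operatorname{Var}[Z_1] \le 1$) and the constants are arbitrary choices.

import numpy as np

# Empirical check of Lemma B.2; the distribution and constants are
# arbitrary choices made for illustration.
rng = np.random.default_rng(1)
m, delta, trials = 500, 0.1, 20_000
mu = 0.5  # mean of Uniform[0, 1]; its variance 1/12 satisfies Var[Z_1] <= 1

samples = rng.uniform(0.0, 1.0, size=(trials, m))
deviations = np.abs(samples.mean(axis=1) - mu)  # |1/m sum Z_i - mu| per trial
threshold = np.sqrt(1.0 / (delta * m))          # sqrt(1/(delta m))

# Lemma B.2 guarantees the failure rate is at most delta; since Chebyshev
# is conservative and Var[Z_1] = 1/12 here, it is in fact far smaller.
failure_rate = np.mean(deviations > threshold)
print(f"fraction of trials exceeding the bound: {failure_rate:.4f} (allowed: {delta})")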

The bound given previously on the deviation between the empirical average and the mean decreases only polynomially with m. It is possible to obtain a significantly faster decrease, as illustrated by the sketch below. In the sections that follow we derive bounds that decrease exponentially fast.
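To make the contrast concrete, the following Python sketch (not from the text) compares the polynomial Chebyshev tail bound $1/(m a^2)$ with an exponential bound of the form $2\exp(-2 m a^2)$, a Hoeffding-style form used here purely for illustration; the exact constants derived in the following sections may differ.

import math

# Polynomial (Chebyshev) versus exponential tail bounds as m grows.
# The form 2*exp(-2*m*a^2) is an assumed Hoeffding-style bound for
# illustration only; the exact constants in the text may differ.
a = 0.1
for m in (100, 1_000, 10_000):
    chebyshev = 1.0 / (m * a**2)
    exponential = 2.0 * math.exp(-2.0 * m * a**2)
    print(f"m={m:>6}: Chebyshev {chebyshev:.2e} vs exponential {exponential:.2e}")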

B.3 Chernoff’s Bounds


Let $Z_1,\ldots,Z_m$ be independent Bernoulli variables where for every $i$, $\mathbb{P}[Z_i = 1] = p_i$ and $\mathbb{P}[Z_i = 0] = 1 - p_i$. Let $p = \sum_{i=1}^{m} p_i$ and let $Z = \sum_{i=1}^{m} Z_i$. Using the