Understanding Machine Learning: From Theory to Algorithms

Appendix B Measure Concentration


Let $Z_1,\ldots,Z_m$ be an i.i.d. sequence of random variables and let $\mu$ be their mean. The strong law of large numbers states that when $m$ tends to infinity, the empirical average, $\frac{1}{m}\sum_{i=1}^{m} Z_i$, converges to the expected value $\mu$, with probability 1. Measure concentration inequalities quantify the deviation of the empirical average from the expectation when $m$ is finite.
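As a quick illustration (a minimal simulation sketch of my own, not from the text; the Bernoulli distribution and the mean $\mu = 0.3$ are arbitrary choices), the following snippet shows the empirical average approaching $\mu$ as $m$ grows:

```python
# A minimal simulation sketch (not from the text): i.i.d. Bernoulli draws,
# with mu = 0.3 chosen arbitrarily for illustration.
import numpy as np

rng = np.random.default_rng(0)
mu = 0.3  # true mean of each Z_i

for m in [10, 100, 1_000, 10_000, 100_000]:
    Z = rng.binomial(1, mu, size=m)   # Z_1, ..., Z_m
    avg = Z.mean()                    # (1/m) * sum_i Z_i
    print(f"m={m:>6}  |avg - mu| = {abs(avg - mu):.4f}")
```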


B.1 Markov’s Inequality


We start with an inequality called Markov's inequality. Let $Z$ be a nonnegative random variable. The expectation of $Z$ can be written as follows:

$$\mathbb{E}[Z] = \int_{x=0}^{\infty} \mathbb{P}[Z \geq x] \, dx. \tag{B.1}$$
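The identity (B.1) is easy to check numerically; here is a sketch of mine (the Exponential(1) distribution is an arbitrary choice, picked because its tail $\mathbb{P}[Z \geq x] = e^{-x}$ and mean $\mathbb{E}[Z] = 1$ are known in closed form):

```python
# A numerical sanity check of (B.1), my own sketch: for Z ~ Exponential(1),
# P[Z >= x] = exp(-x) and E[Z] = 1, so the tail integral should be ~1.
import numpy as np

xs = np.linspace(0.0, 50.0, 200_001)          # truncation of the infinite range
tail = np.exp(-xs)                            # P[Z >= x]
integral = np.sum(tail[:-1] * np.diff(xs))    # left Riemann sum
print(integral)                               # ~1.0 = E[Z]
```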

Since $\mathbb{P}[Z \geq x]$ is monotonically nonincreasing we obtain

$$\forall a \geq 0, \quad \mathbb{E}[Z] \geq \int_{x=0}^{a} \mathbb{P}[Z \geq x] \, dx \geq \int_{x=0}^{a} \mathbb{P}[Z \geq a] \, dx = a \, \mathbb{P}[Z \geq a]. \tag{B.2}$$

Rearranging the inequality yields Markov's inequality:

$$\forall a > 0, \quad \mathbb{P}[Z \geq a] \leq \frac{\mathbb{E}[Z]}{a}. \tag{B.3}$$
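An empirical check of (B.3), again a sketch of my own (the Exponential distribution with mean 2 is an arbitrary nonnegative choice), comparing both sides of the bound on simulated data:

```python
# An empirical check of Markov's inequality (B.3), my own sketch; the
# Exponential distribution with mean 2 is an arbitrary nonnegative choice.
import numpy as np

rng = np.random.default_rng(1)
Z = rng.exponential(scale=2.0, size=1_000_000)   # nonnegative, E[Z] = 2

for a in [1.0, 2.0, 4.0, 8.0]:
    lhs = np.mean(Z >= a)    # empirical P[Z >= a]
    rhs = Z.mean() / a       # Markov bound E[Z]/a
    print(f"a={a}:  P[Z>=a] ~ {lhs:.4f}  <=  E[Z]/a ~ {rhs:.4f}")
```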

For random variables that take values in $[0,1]$, we can derive from Markov's inequality the following.

Lemma B.1 Let $Z$ be a random variable that takes values in $[0,1]$. Assume that $\mathbb{E}[Z] = \mu$. Then, for any $a \in (0,1)$,

$$\mathbb{P}[Z > 1-a] \geq \frac{\mu - (1-a)}{a}.$$

This also implies that for every $a \in (0,1)$,

$$\mathbb{P}[Z > a] \geq \frac{\mu - a}{1-a} \geq \mu - a.$$

Proof Let $Y = 1-Z$. Then $Y$ is a nonnegative random variable with $\mathbb{E}[Y] = 1 - \mathbb{E}[Z] = 1-\mu$. Applying Markov's inequality to $Y$ we obtain

$$\mathbb{P}[Z \leq 1-a] = \mathbb{P}[1-Z \geq a] = \mathbb{P}[Y \geq a] \leq \frac{\mathbb{E}[Y]}{a} = \frac{1-\mu}{a}.$$

Therefore,

$$\mathbb{P}[Z > 1-a] \geq 1 - \frac{1-\mu}{a} = \frac{\mu - (1-a)}{a},$$

which proves the first claim. The second claim follows by applying the first claim with $1-a$ in place of $a$.
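A numerical illustration of Lemma B.1, again a sketch of mine (the Beta(5,2) distribution is an arbitrary $[0,1]$-valued choice), checking that $\mathbb{P}[Z > 1-a]$ dominates the lemma's lower bound:

```python
# A numerical illustration of Lemma B.1, my own sketch; Beta(5,2) is an
# arbitrary [0,1]-valued distribution with mean mu = 5/7.
import numpy as np

rng = np.random.default_rng(2)
Z = rng.beta(5.0, 2.0, size=1_000_000)   # values in [0,1]
mu = Z.mean()

for a in [0.2, 0.4, 0.6, 0.8]:
    lhs = np.mean(Z > 1.0 - a)           # P[Z > 1-a]
    bound = (mu - (1.0 - a)) / a         # the lemma's lower bound
    print(f"a={a}:  P[Z > 1-a] ~ {lhs:.4f}  >=  {bound:.4f}")
```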
