Understanding Machine Learning: From Theory to Algorithms

(Jeff_L) #1

Appendix A Technical Lemmas


lemmaA.1 Leta > 0. Then:x≥ 2 alog(a) ⇒ x≥alog(x). It follows that a
necessary condition for the inequalityx < alog(x)to hold is thatx < 2 alog(a).

Proof First note that fora∈(0,


e] the inequalityx≥alog(x) holds uncon-
ditionally and therefore the claim is trivial. From now on, assume thata >


e.
Consider the functionf(x) =x−alog(x). The derivative isf′(x) = 1−a/x.
Thus, forx > athe derivative is positive and the function increases. In addition,

f(2alog(a)) = 2alog(a)−alog(2alog(a))
= 2alog(a)−alog(a)−alog(2 log(a))
=alog(a)−alog(2 log(a)).

Sincea−2 log(a)>0 for alla >0, the proof follows.

lemmaA.2 Leta≥ 1 andb > 0. Then:x≥ 4 alog(2a)+2b⇒ x≥alog(x)+b.

Proof It suffices to prove thatx≥ 4 alog(2a) + 2bimplies that bothx ≥
2 alog(x) andx≥ 2 b. Since we assumea ≥1 we clearly have thatx≥ 2 b.
In addition, sinceb >0 we have thatx≥ 4 alog(2a) which using Lemma A.1
implies thatx≥ 2 alog(x). This concludes our proof.

lemmaA.3 LetXbe a random variable andx′∈Rbe a scalar and assume
that there existsa > 0 such that for allt≥ 0 we haveP[|X−x′|> t]≤ 2 e−t

(^2) /a 2
.
Then,E[|X−x′|] ≤ 4 a.
Proof For alli= 0, 1 , 2 ,...denoteti=ai. Sincetiis monotonically increasing
we have thatE[|X−x′|] is at most


∑∞

i=1tiP[|X−x

′|> ti− 1 ]. Combining this
with the assumption in the lemma we get thatE[|X−x′|] ≤ 2 a

∑∞

i=1ie

−(i−1)^2.
The proof now follows from the inequalities
∑∞

i=1

ie−(i−1)

2

∑^5

i=1

ie−(i−1)

2
+

∫∞

5

xe−(x−1)

2
dx < 1 .8 + 10−^7 < 2.

lemmaA.4 LetXbe a random variable andx′∈Rbe a scalar and assume
that there existsa > 0 andb≥esuch that for allt≥ 0 we haveP[|X−x′|>
t]≤ 2 be−t

(^2) /a 2


. Then,E[|X−x′|] ≤ a(2 +



log(b)).

Understanding Machine Learning,©c2014 by Shai Shalev-Shwartz and Shai Ben-David
Published 2014 by Cambridge University Press.
Personal use only. Not for distribution. Do not post.
Please link tohttp://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning
Free download pdf