Understanding Machine Learning: From Theory to Algorithms

Technical Lemmas 421 Using Stirling’s approximation we further have that ≤ (em d )d( 1 + ( d e )d (m−d) (d+ 1) √ 2 πd(d/e)d ) = ...

Appendix B Measure Concentration LetZ 1 ,...,Zmbe an i.i.d. sequence of random variables and letμbe their mean. The strong law o ...

B.2 Chebyshev’s Inequality 423 Therefore, P[Z > 1 −a]≥ 1 − 1 −μ a = a+μ− 1 a. B.2 Chebyshev’s Inequality Applying Markov’s in ...

424 Measure Concentration monotonicity of the exponent function and Markov’s inequality, we have that for everyt > 0 P[Z > ...

B.4 Hoeffding’s Inequality 425 and, E[e−tZ] =E[e−t ∑ iZi] =E[ ∏ i e−tZi] = ∏ i E[e−tZi] by independence = ∏ i ( 1 +pi(e−t−1) ) ≤ ...

426 Measure Concentration Therefore, P[X ̄≥]≤e−λ ∏ i e λ^2 (b−a)^2 8 m^2 =e−λ+λ (^2) (b−a) 2 8 m. Settingλ= 4m/(b−a)^2 we ob ...

B.5 Bennet’s and Bernstein’s Inequalities 427 Then for all > 0 , P [m ∑ i=1 Zi> ] ≤e−mσ (^2) h(mσ 2 ) . where h(a) = ( ...

428 Measure Concentration Solving fortyields t^2 / 2 mLD(h) +t/ 3 = log(1/δ) ⇒ t^2 / 2 − log(1/δ) 3 t−log(1/δ)mLD(h) = 0 ⇒ t= lo ...

B.7 Concentration ofχ^2 Variables 429 Finally, for all∈(0,3), P[(1−)k≤Z≤(1 +)k]≥ 1 − 2 e− (^2) k/ 6 . Proof Let us writeZ= ∑ ...

Appendix C Linear Algebra C.1 Basic Definitions In this chapter we only deal with linear algebra over finite dimensional Euclide ...

C.2 Eigenvalues and Eigenvectors 431 C.2 Eigenvalues and Eigenvectors LetA∈Rd,dbe a matrix. A non-zero vectoruis an eigenvector ...

432 Linear Algebra lemmaC.3 LetA∈Rm,nbe a matrix of rankr. Assume thatv 1 ,...,vris an orthonormal set of right singular vectors ...

C.4 Singular Value Decomposition (SVD) 433 Finally, we show that ifA has rankr then it hasr orthonormal singular vectors. lemmaC ...

434 Linear Algebra corollaryC.6 (The SVD theorem) LetA∈Rm,nwith rankr. ThenA= UDV>whereDis anr×rmatrix with nonzero singular ...

Notes Understanding Machine Learning,©c2014 by Shai Shalev-Shwartz and Shai Ben-David Published 2014 by Cambridge University Pre ...

References Abernethy, J., Bartlett, P. L., Rakhlin, A. & Tewari, A. (2008), Optimal strategies and minimax lower bounds for ...

438 References Bartlett, P. L. & Mendelson, S. (2002), ‘Rademacher and Gaussian complexities: Risk bounds and structural res ...

References 439 Breiman, L. (1996), Bias, variance, and arcing classifiers, Technical Report 460, Statis- tics Department, Univer ...

440 References Dietterich, T. G. & Bakiri, G. (1995), ‘Solving multiclass learning problems via error- correcting output cod ...

«
14
15
16
17
18
19
20
21
22
23
»

Free download pdf

Get our desktop app

Company

Features

Documentation

Resources