Understanding Machine Learning: From Theory to Algorithms


Introduction


Table 1.1  Summary of notation

symbol                  meaning
R                       the set of real numbers
R^d                     the set of d-dimensional vectors over R
R_+                     the set of non-negative real numbers
N                       the set of natural numbers
O, o, Θ, ω, Ω, Õ        asymptotic notation (see text)
1[Boolean expression]   indicator function (equals 1 if the expression is true and 0 otherwise)
[a]_+                   = max{0, a}
[n]                     the set {1, ..., n} (for n ∈ N)
x, v, w                 (column) vectors
x_i, v_i, w_i           the i-th element of a vector
⟨x, v⟩                  = Σ_{i=1}^d x_i v_i (inner product)
‖x‖_2 or ‖x‖            = √⟨x, x⟩ (the ℓ2 norm of x)
‖x‖_1                   = Σ_{i=1}^d |x_i| (the ℓ1 norm of x)
‖x‖_∞                   = max_i |x_i| (the ℓ∞ norm of x)
‖x‖_0                   the number of nonzero elements of x
A ∈ R^{d,k}             a d × k matrix over R
A^⊤                     the transpose of A
A_{i,j}                 the (i, j) element of A
x x^⊤                   the d × d matrix A s.t. A_{i,j} = x_i x_j (where x ∈ R^d)
x_1, ..., x_m           a sequence of m vectors
x_{i,j}                 the j-th element of the i-th vector in the sequence
w^(1), ..., w^(T)       the values of a vector w during an iterative algorithm
w_i^(t)                 the i-th element of the vector w^(t)
X                       instances domain (a set)
Y                       labels domain (a set)
Z                       examples domain (a set)
H                       hypothesis class (a set)
ℓ : H × Z → R_+         loss function
D                       a distribution over some set (usually over Z or over X)
D(A)                    the probability of a set A ⊆ Z according to D
z ∼ D                   sampling z according to D
S = z_1, ..., z_m       a sequence of m examples
S ∼ D^m                 sampling S = z_1, ..., z_m i.i.d. according to D
P, E                    probability and expectation of a random variable
P_{z∼D}[f(z)]           = D({z : f(z) = true}) for f : Z → {true, false}
E_{z∼D}[f(z)]           expectation of the random variable f : Z → R
N(μ, C)                 Gaussian distribution with expectation μ and covariance C
f′(x)                   the derivative of a function f : R → R at x
f″(x)                   the second derivative of a function f : R → R at x
∂f(w)/∂w_i              the partial derivative of a function f : R^d → R at w w.r.t. w_i
∇f(w)                   the gradient of a function f : R^d → R at w
∂f(w)                   the differential set of a function f : R^d → R at w
min_{x∈C} f(x)          = min{f(x) : x ∈ C} (the minimal value of f over C)
max_{x∈C} f(x)          = max{f(x) : x ∈ C} (the maximal value of f over C)
argmin_{x∈C} f(x)       the set {x ∈ C : f(x) = min_{z∈C} f(z)}
argmax_{x∈C} f(x)       the set {x ∈ C : f(x) = max_{z∈C} f(z)}
log                     the natural logarithm
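The vector notation in the table maps directly onto standard numerical code. A minimal NumPy sketch (the example vectors below are arbitrary, chosen only to make each quantity easy to verify by hand):

```python
import numpy as np

# x, v ∈ R^d with d = 3 (arbitrary illustrative values)
x = np.array([3.0, -4.0, 0.0])
v = np.array([1.0, 2.0, 3.0])

inner = float(np.dot(x, v))               # ⟨x, v⟩ = 3·1 + (−4)·2 + 0·3 = −5
l2 = float(np.linalg.norm(x))             # ‖x‖₂ = √(9 + 16 + 0) = 5
l1 = float(np.linalg.norm(x, 1))          # ‖x‖₁ = 3 + 4 + 0 = 7
linf = float(np.linalg.norm(x, np.inf))   # ‖x‖_∞ = max(3, 4, 0) = 4
l0 = int(np.count_nonzero(x))             # ‖x‖₀ = number of nonzeros = 2
pos_part = max(0.0, -2.5)                 # [a]_+ = max{0, a} with a = −2.5 → 0.0

outer = np.outer(x, x)                    # x x^⊤, the d × d matrix with A_{i,j} = x_i x_j
```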
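The sampling notation S ∼ D^m and the operators P and E can likewise be illustrated by Monte Carlo estimation. A small sketch, where D is taken (for illustration only) to be the standard Gaussian N(0, 1), so that P_{z∼D}[z > 0] = 1/2 and E_{z∼D}[z²] = 1:

```python
import numpy as np

rng = np.random.default_rng(0)

# S = z_1, ..., z_m sampled i.i.d. from D = N(0, 1)
m = 10_000
S = rng.normal(loc=0.0, scale=1.0, size=m)

# P_{z∼D}[f(z)] estimated by averaging the indicator 1[z > 0] over S
p_est = float(np.mean(S > 0))

# E_{z∼D}[z^2] estimated by the empirical mean of z^2 over S
e_est = float(np.mean(S ** 2))
```

Both estimates converge to the true values 0.5 and 1.0 as m grows, which is the pattern behind the empirical-risk quantities used throughout the book.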
