Understanding Machine Learning: From Theory to Algorithms


Boosting


threshold $\theta$. Therefore, instead of minimizing over $\theta \in \mathbb{R}$ we can minimize over
$\theta \in \Theta_j$.
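
For reference, Equation (10.1), reconstructed from the surrounding discussion (it is the weighted empirical risk of the decision stump $h(x) = \mathrm{sign}(\theta - x_j)$), reads

$$\min_{j \in [d]} \; \min_{\theta \in \mathbb{R}} \left( \sum_{i: y_i = 1} D_i \mathbb{1}_{[x_{i,j} > \theta]} \;+\; \sum_{i: y_i = -1} D_i \mathbb{1}_{[x_{i,j} \le \theta]} \right).$$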
This already gives us an efficient procedure: Choose $j \in [d]$ and $\theta \in \Theta_j$ that
minimize the objective value of Equation (10.1). For every $j$ and $\theta \in \Theta_j$ we
have to calculate a sum over $m$ examples; therefore the runtime of this approach
would be $O(dm^2)$. We next show a simple trick that enables us to minimize the
objective in time $O(dm)$.
The observation is as follows. Suppose we have calculated the objective for
$\theta \in (x_{i-1,j}, x_{i,j})$. Let $F(\theta)$ be the value of the objective. Then, when we consider
$\theta' \in (x_{i,j}, x_{i+1,j})$ we have that

$$F(\theta') = F(\theta) - D_i \mathbb{1}_{[y_i = 1]} + D_i \mathbb{1}_{[y_i = -1]} = F(\theta) - y_i D_i.$$
Therefore, we can calculate the objective at $\theta'$ in constant time, given the
objective at the previous threshold, $\theta$. It follows that after a preprocessing step
in which we sort the examples with respect to each coordinate, the minimization
problem can be performed in time $O(dm)$. This yields the following pseudocode.
ERM for Decision Stumps

input:
  training set $S = (x_1, y_1), \ldots, (x_m, y_m)$
  distribution vector $D$
goal: find $j^\star, \theta^\star$ that solve Equation (10.1)
initialize: $F^\star = \infty$
for $j = 1, \ldots, d$
  sort $S$ using the $j$'th coordinate, and denote
    $x_{1,j} \le x_{2,j} \le \cdots \le x_{m,j} \le x_{m+1,j} \stackrel{\text{def}}{=} x_{m,j} + 1$
  $F = \sum_{i: y_i = 1} D_i$
  if $F < F^\star$
    $F^\star = F$, $\theta^\star = x_{1,j} - 1$, $j^\star = j$
  for $i = 1, \ldots, m$
    $F = F - y_i D_i$
    if $F < F^\star$ and $x_{i,j} \neq x_{i+1,j}$
      $F^\star = F$, $\theta^\star = \frac{1}{2}(x_{i,j} + x_{i+1,j})$, $j^\star = j$
output $j^\star, \theta^\star$
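
To make the pseudocode concrete, here is a minimal Python sketch of the procedure. The function name `erm_decision_stump` and the NumPy interface are illustrative choices, not the book's; the logic follows the pseudocode above (the sort makes the preprocessing $O(dm \log m)$, after which the scan over thresholds costs $O(dm)$).

```python
import numpy as np

def erm_decision_stump(X, y, D):
    """Weighted ERM for decision stumps h(x) = sign(theta - x[j]).

    X: (m, d) array of examples; y: (m,) labels in {+1, -1};
    D: (m,) nonnegative weights summing to 1.
    Returns (j_star, theta_star) minimizing the objective of Equation (10.1).
    """
    m, d = X.shape
    F_star, theta_star, j_star = np.inf, None, None
    for j in range(d):
        # Sort the examples by the j'th coordinate.
        order = np.argsort(X[:, j])
        xs, ys, Ds = X[order, j], y[order], D[order]
        # Sentinel so the threshold can also lie above every point.
        xs = np.append(xs, xs[-1] + 1)
        # Threshold below all points: every positive example is misclassified.
        F = Ds[ys == 1].sum()
        if F < F_star:
            F_star, theta_star, j_star = F, xs[0] - 1, j
        for i in range(m):
            # Moving theta past xs[i] flips example i's prediction,
            # so the objective changes by -ys[i] * Ds[i].
            F -= ys[i] * Ds[i]
            if F < F_star and xs[i] != xs[i + 1]:
                F_star, theta_star, j_star = F, 0.5 * (xs[i] + xs[i + 1]), j
    return j_star, theta_star
```

For example, `erm_decision_stump(np.array([[0.], [1.], [2.]]), np.array([1, 1, -1]), np.ones(3) / 3)` returns $j^\star = 0$ with $\theta^\star = 1.5$: the stump that separates the two positive points from the negative one, with zero weighted error.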

10.2 AdaBoost


AdaBoost (short for Adaptive Boosting) is an algorithm that has access to a
weak learner and finds a hypothesis with a low empirical risk. The AdaBoost
algorithm receives as input a training set of examples $S = (x_1, y_1), \ldots, (x_m, y_m)$,
where for each $i$, $y_i = f(x_i)$ for some labeling function $f$. The boosting process
proceeds in a sequence of consecutive rounds. At round $t$, the booster first defines
a distribution over the examples in $S$, denoted $D^{(t)}$.
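
The excerpt ends at this point; as a complement, here is a minimal sketch of the standard AdaBoost round structure, assuming the usual weight rule $w_t = \frac{1}{2}\log\left(\frac{1}{\epsilon_t} - 1\right)$ and exponential reweighting. The names `adaboost` and `weak_learner` are illustrative, not taken from the text.

```python
import numpy as np

def adaboost(X, y, weak_learner, T):
    """Standard AdaBoost loop (a sketch under the assumptions stated above).

    weak_learner(X, y, D) is assumed to return a callable h with
    h(X) -> (m,) predictions in {+1, -1}; T is the number of rounds.
    """
    m = len(y)
    D = np.full(m, 1.0 / m)        # D^(1): the uniform distribution
    hypotheses, weights = [], []
    for t in range(T):
        h = weak_learner(X, y, D)  # e.g. built on the decision-stump ERM above
        preds = h(X)
        eps = D[preds != y].sum()  # weighted error of h w.r.t. D^(t)
        w = 0.5 * np.log(1.0 / eps - 1.0)   # assumes 0 < eps < 1/2
        hypotheses.append(h)
        weights.append(w)
        # Exponential reweighting: misclassified examples gain weight.
        D = D * np.exp(-w * y * preds)
        D /= D.sum()               # renormalize to obtain D^(t+1)
    # Output: the sign of a weighted vote over the weak hypotheses.
    return lambda X_new: np.sign(
        sum(w * h(X_new) for w, h in zip(weights, hypotheses)))
```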