Understanding Machine Learning: From Theory to Algorithms


Boosting


threshold $\theta$. Therefore, instead of minimizing over $\theta \in \mathbb{R}$ we can minimize over
$\theta \in \Theta_j$.
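
For reference, Equation (10.1), reconstructed from the surrounding discussion (it is the weighted empirical risk of the decision stump $h(x) = \mathrm{sign}(\theta - x_j)$), reads

$$\min_{j \in [d]} \; \min_{\theta \in \mathbb{R}} \left( \sum_{i: y_i = 1} D_i \mathbb{1}_{[x_{i,j} > \theta]} \;+\; \sum_{i: y_i = -1} D_i \mathbb{1}_{[x_{i,j} \le \theta]} \right).$$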
This already gives us an efficient procedure: Choose $j \in [d]$ and $\theta \in \Theta_j$ that
minimize the objective value of Equation (10.1). For every $j$ and $\theta \in \Theta_j$ we
have to calculate a sum over $m$ examples; therefore the runtime of this approach
would be $O(dm^2)$. We next show a simple trick that enables us to minimize the
objective in time $O(dm)$.
The observation is as follows. Suppose we have calculated the objective for
$\theta \in (x_{i-1,j}, x_{i,j})$. Let $F(\theta)$ be the value of the objective. Then, when we consider
$\theta' \in (x_{i,j}, x_{i+1,j})$ we have that

$$F(\theta') = F(\theta) - D_i \mathbb{1}_{[y_i = 1]} + D_i \mathbb{1}_{[y_i = -1]} = F(\theta) - y_i D_i.$$
Therefore, we can calculate the objective at $\theta'$ in constant time, given the
objective at the previous threshold, $\theta$. It follows that after a preprocessing step
in which we sort the examples with respect to each coordinate, the minimization
problem can be performed in time $O(dm)$. This yields the following pseudocode.
ERM for Decision Stumps

input:
  training set $S = (x_1, y_1), \ldots, (x_m, y_m)$
  distribution vector $D$
goal: find $j^\star, \theta^\star$ that solve Equation (10.1)
initialize: $F^\star = \infty$
for $j = 1, \ldots, d$
  sort $S$ using the $j$'th coordinate, and denote
    $x_{1,j} \le x_{2,j} \le \cdots \le x_{m,j} \le x_{m+1,j} \stackrel{\text{def}}{=} x_{m,j} + 1$
  $F = \sum_{i: y_i = 1} D_i$
  if $F < F^\star$
    $F^\star = F$, $\theta^\star = x_{1,j} - 1$, $j^\star = j$
  for $i = 1, \ldots, m$
    $F = F - y_i D_i$
    if $F < F^\star$ and $x_{i,j} \neq x_{i+1,j}$
      $F^\star = F$, $\theta^\star = \frac{1}{2}(x_{i,j} + x_{i+1,j})$, $j^\star = j$
output $j^\star, \theta^\star$
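
To make the pseudocode concrete, here is a minimal Python sketch of the procedure. The function name `erm_decision_stump` and the NumPy interface are illustrative choices, not the book's; the logic follows the pseudocode above (the sort makes the preprocessing $O(dm \log m)$, after which the scan over thresholds costs $O(dm)$).

```python
import numpy as np

def erm_decision_stump(X, y, D):
    """Weighted ERM for decision stumps h(x) = sign(theta - x[j]).

    X: (m, d) array of examples; y: (m,) labels in {+1, -1};
    D: (m,) nonnegative weights summing to 1.
    Returns (j_star, theta_star) minimizing the objective of Equation (10.1).
    """
    m, d = X.shape
    F_star, theta_star, j_star = np.inf, None, None
    for j in range(d):
        # Sort the examples by the j'th coordinate.
        order = np.argsort(X[:, j])
        xs, ys, Ds = X[order, j], y[order], D[order]
        # Sentinel so the threshold can also lie above every point.
        xs = np.append(xs, xs[-1] + 1)
        # Threshold below all points: every positive example is misclassified.
        F = Ds[ys == 1].sum()
        if F < F_star:
            F_star, theta_star, j_star = F, xs[0] - 1, j
        for i in range(m):
            # Moving theta past xs[i] flips example i's prediction,
            # so the objective changes by -ys[i] * Ds[i].
            F -= ys[i] * Ds[i]
            if F < F_star and xs[i] != xs[i + 1]:
                F_star, theta_star, j_star = F, 0.5 * (xs[i] + xs[i + 1]), j
    return j_star, theta_star
```

For example, `erm_decision_stump(np.array([[0.], [1.], [2.]]), np.array([1, 1, -1]), np.ones(3) / 3)` returns $j^\star = 0$ with $\theta^\star = 1.5$: the stump that separates the two positive points from the negative one, with zero weighted error.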

10.2 AdaBoost


AdaBoost (short for Adaptive Boosting) is an algorithm that has access to a
weak learner and finds a hypothesis with a low empirical risk. The AdaBoost
algorithm receives as input a training set of examples $S = (x_1, y_1), \ldots, (x_m, y_m)$,
where for each $i$, $y_i = f(x_i)$ for some labeling function $f$. The boosting process
proceeds in a sequence of consecutive rounds. At round $t$, the booster first defines
a distribution over the examples in $S$, denoted $D^{(t)}$.
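
The excerpt ends at this point; as a complement, here is a minimal sketch of the standard AdaBoost round structure, assuming the usual weight rule $w_t = \frac{1}{2}\log\left(\frac{1}{\epsilon_t} - 1\right)$ and exponential reweighting. The names `adaboost` and `weak_learner` are illustrative, not taken from the text.

```python
import numpy as np

def adaboost(X, y, weak_learner, T):
    """Standard AdaBoost loop (a sketch under the assumptions stated above).

    weak_learner(X, y, D) is assumed to return a callable h with
    h(X) -> (m,) predictions in {+1, -1}; T is the number of rounds.
    """
    m = len(y)
    D = np.full(m, 1.0 / m)        # D^(1): the uniform distribution
    hypotheses, weights = [], []
    for t in range(T):
        h = weak_learner(X, y, D)  # e.g. built on the decision-stump ERM above
        preds = h(X)
        eps = D[preds != y].sum()  # weighted error of h w.r.t. D^(t)
        w = 0.5 * np.log(1.0 / eps - 1.0)   # assumes 0 < eps < 1/2
        hypotheses.append(h)
        weights.append(w)
        # Exponential reweighting: misclassified examples gain weight.
        D = D * np.exp(-w * y * preds)
        D /= D.sum()               # renormalize to obtain D^(t+1)
    # Output: the sign of a weighted vote over the weak hypotheses.
    return lambda X_new: np.sign(
        sum(w * h(X_new) for w, h in zip(weights, hypotheses)))
```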