Understanding Machine Learning: From Theory to Algorithms




We will focus here on the class of one-dimensional, degree-$n$ polynomial regression predictors, namely,

$$\mathcal{H}_{\text{poly}}^{n} = \{x \mapsto p(x)\},$$

where $p$ is a one-dimensional polynomial of degree $n$, parameterized by a vector of coefficients $(a_0, \ldots, a_n)$. Note that $\mathcal{X} = \mathbb{R}$, since this is a one-dimensional polynomial, and $\mathcal{Y} = \mathbb{R}$, as this is a regression problem.
One way to learn this class is by reduction to the problem of linear regression, which we have already shown how to solve. To translate a polynomial regression problem to a linear regression problem, we define the mapping $\psi : \mathbb{R} \to \mathbb{R}^{n+1}$ such that $\psi(x) = (1, x, x^2, \ldots, x^n)$. Then we have that

$$p(x) = a_0 + a_1 x + a_2 x^2 + \cdots + a_n x^n = \langle a, \psi(x) \rangle,$$

and we can find the optimal vector of coefficients $a$ by applying the Least Squares algorithm, as shown earlier, to the transformed examples $(\psi(x), y)$.
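
To make the reduction concrete, the following is a minimal sketch in NumPy; the helper names poly_features and fit_poly, and the sample data, are illustrative assumptions rather than anything from the text. It maps each scalar example $x$ to $\psi(x)$ and recovers the coefficient vector $a$ with ordinary least squares.

import numpy as np

def poly_features(x, n):
    # Map each scalar x_i to psi(x_i) = (1, x_i, x_i^2, ..., x_i^n).
    return np.vander(np.asarray(x, dtype=float), N=n + 1, increasing=True)

def fit_poly(x, y, n):
    # Reduce degree-n polynomial regression to linear regression:
    # solve the least-squares problem over the features psi(x).
    a, *_ = np.linalg.lstsq(poly_features(x, n), np.asarray(y, dtype=float), rcond=None)
    return a  # coefficient vector (a_0, ..., a_n)

# Usage: recover a cubic from noisy samples (assumed, illustrative data).
rng = np.random.default_rng(0)
x = rng.uniform(-1.0, 1.0, size=200)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(scale=0.05, size=x.shape)
print(fit_poly(x, y, n=3))  # approximately (1.0, -2.0, 0.0, 0.5)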

9.3 Logistic Regression


In logistic regression we learn a family of functions $h$ from $\mathbb{R}^d$ to the interval $[0,1]$. However, logistic regression is used for classification tasks: we can interpret $h(x)$ as the probability that the label of $x$ is 1. The hypothesis class associated with logistic regression is the composition of a sigmoid function $\phi_{\text{sig}} : \mathbb{R} \to [0,1]$ over the class of linear functions $L_d$. In particular, the sigmoid function used in logistic regression is the logistic function, defined as

$$\phi_{\text{sig}}(z) = \frac{1}{1 + \exp(-z)}. \qquad (9.9)$$

The name “sigmoid” means “S-shaped,” referring to the S-shaped plot of this function.
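
As a companion sketch, the logistic-regression hypothesis $h_w(x) = \phi_{\text{sig}}(\langle w, x \rangle)$ can be written directly in NumPy; the weight vector $w$ below is an assumed, illustrative value, since fitting $w$ has not yet been discussed.

import numpy as np

def sigmoid(z):
    # The logistic function of Equation (9.9): 1 / (1 + exp(-z)).
    return 1.0 / (1.0 + np.exp(-z))

def h(w, x):
    # Logistic-regression hypothesis: sigmoid composed with a linear function.
    # Returns a value in [0, 1], interpreted as the probability that x's label is 1.
    return sigmoid(np.dot(w, x))

# Usage with an assumed weight vector in R^3:
w = np.array([0.5, -1.0, 2.0])
x = np.array([1.0, 0.0, 0.5])
print(h(w, x))  # ~0.82: the predicted probability that the label of x is 1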