Understanding Machine Learning: From Theory to Algorithms


This loss function is used in regression problems.

We will later see more examples of useful instantiations of loss functions.
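To make the abstract form $\ell : \mathcal{H} \times Z \to \mathbb{R}_+$ concrete, here is a minimal Python sketch of our own (not from the text; the names `squared_loss` and `zero_one_loss` are illustrative) showing two instantiations already encountered, with $z = (x, y)$:

```python
# Two instantiations of a loss function l(h, z), where z = (x, y)
# and a hypothesis h is any callable mapping x to a prediction.

def squared_loss(h, z):
    """Squared loss: the usual choice for regression problems."""
    x, y = z
    return (h(x) - y) ** 2

def zero_one_loss(h, z):
    """0-1 loss: the usual choice for binary classification."""
    x, y = z
    return 0.0 if h(x) == y else 1.0

# Example: h(x) = 2x on the single example z = (x, y) = (3, 5).
h = lambda x: 2 * x
print(squared_loss(h, (3, 5)))   # (6 - 5)**2 == 1
print(zero_one_loss(h, (3, 5)))  # 1.0, since 6 != 5
```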

To summarize, we formally define agnostic PAC learnability for general loss
functions.
Definition 3.4 (Agnostic PAC Learnability for General Loss Functions) A hypothesis class $\mathcal{H}$ is agnostic PAC learnable with respect to a set $Z$ and a loss function $\ell : \mathcal{H} \times Z \to \mathbb{R}_+$, if there exist a function $m_{\mathcal{H}} : (0,1)^2 \to \mathbb{N}$ and a learning algorithm with the following property: For every $\epsilon, \delta \in (0,1)$ and for every distribution $\mathcal{D}$ over $Z$, when running the learning algorithm on $m \ge m_{\mathcal{H}}(\epsilon, \delta)$ i.i.d. examples generated by $\mathcal{D}$, the algorithm returns $h \in \mathcal{H}$ such that, with probability of at least $1 - \delta$ (over the choice of the $m$ training examples),

$$L_{\mathcal{D}}(h) \le \min_{h' \in \mathcal{H}} L_{\mathcal{D}}(h') + \epsilon,$$

where $L_{\mathcal{D}}(h) = \mathbb{E}_{z \sim \mathcal{D}}[\ell(h, z)]$.
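To connect the definition to an algorithm, the following Python sketch (ours, not the book's; `empirical_risk` and `erm` are illustrative names) implements Empirical Risk Minimization over a finite hypothesis class, estimating $L_{\mathcal{D}}(h)$ by the empirical average of $\ell(h, z)$ over the i.i.d. sample:

```python
def empirical_risk(h, sample, loss):
    """Average loss of h on the sample; an estimate of L_D(h)."""
    return sum(loss(h, z) for z in sample) / len(sample)

def erm(hypothesis_class, sample, loss):
    """Empirical Risk Minimization over a finite hypothesis class:
    return a hypothesis with the smallest average loss on the sample."""
    return min(hypothesis_class, key=lambda h: empirical_risk(h, sample, loss))
```

Note that the definition itself says nothing about how the algorithm finds $h$; it only demands the guarantee on $L_{\mathcal{D}}(h)$. For a finite class, `erm` realizes this by simply scanning all hypotheses.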

Remark 3.1 (A Note About Measurability*) In the aforementioned definition, for every $h \in \mathcal{H}$, we view the function $\ell(h, \cdot) : Z \to \mathbb{R}_+$ as a random variable and define $L_{\mathcal{D}}(h)$ to be the expected value of this random variable. For that, we need to require that the function $\ell(h, \cdot)$ is measurable. Formally, we assume that there is a $\sigma$-algebra of subsets of $Z$, over which the probability $\mathcal{D}$ is defined, and that the preimage of every initial segment in $\mathbb{R}_+$ is in this $\sigma$-algebra. In the specific case of binary classification with the $0{-}1$ loss, the $\sigma$-algebra is over $\mathcal{X} \times \{0, 1\}$ and our assumption on $\ell$ is equivalent to the assumption that for every $h$, the set $\{(x, h(x)) : x \in \mathcal{X}\}$ is in this $\sigma$-algebra.
Remark 3.2 (Proper versus Representation-Independent Learning*) In the preceding definition, we required that the algorithm return a hypothesis from $\mathcal{H}$. In some situations, $\mathcal{H}$ is a subset of a set $\mathcal{H}'$, and the loss function can be naturally extended to be a function from $\mathcal{H}' \times Z$ to the reals. In this case, we may allow the algorithm to return a hypothesis $h' \in \mathcal{H}'$, as long as it satisfies the requirement $L_{\mathcal{D}}(h') \le \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h) + \epsilon$. Allowing the algorithm to output a hypothesis from $\mathcal{H}'$ is called representation independent learning, while proper learning occurs when the algorithm must output a hypothesis from $\mathcal{H}$. Representation independent learning is sometimes called "improper learning," although there is nothing improper about it.
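As a toy illustration of the distinction (a sketch of our own; `improper_erm` is a hypothetical name), an improper learner may run ERM over the extended class $\mathcal{H}'$, while its guarantee is still measured against the best hypothesis in the original class $\mathcal{H}$:

```python
def improper_erm(extended_class, sample, loss):
    """ERM over the larger class H'; the output may fall outside H.

    Representation-independent learning only requires
    L_D(h') <= min over h in H of L_D(h) + epsilon, where the
    benchmark is still the best hypothesis in the original class H.
    """
    def risk(h):
        return sum(loss(h, z) for z in sample) / len(sample)
    return min(extended_class, key=risk)
```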

3.3 Summary


In this chapter we defined our main formal learning model – PAC learning. The basic model relies on the realizability assumption, while the agnostic variant does not.