Understanding Machine Learning: From Theory to Algorithms


This loss function is used in regression problems.

We will later see more examples of useful instantiations of loss functions.
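To make the abstract form $\ell : \mathcal{H} \times Z \to \mathbb{R}_+$ concrete, here is a minimal Python sketch of our own (not from the text; the names `squared_loss` and `zero_one_loss` are illustrative) showing two instantiations already encountered, with $z = (x, y)$:

```python
# Two instantiations of a loss function l(h, z), where z = (x, y)
# and a hypothesis h is any callable mapping x to a prediction.

def squared_loss(h, z):
    """Squared loss: the usual choice for regression problems."""
    x, y = z
    return (h(x) - y) ** 2

def zero_one_loss(h, z):
    """0-1 loss: the usual choice for binary classification."""
    x, y = z
    return 0.0 if h(x) == y else 1.0

# Example: h(x) = 2x on the single example z = (x, y) = (3, 5).
h = lambda x: 2 * x
print(squared_loss(h, (3, 5)))   # (6 - 5)**2 == 1
print(zero_one_loss(h, (3, 5)))  # 1.0, since 6 != 5
```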

To summarize, we formally define agnostic PAC learnability for general loss
functions.
Definition 3.4 (Agnostic PAC Learnability for General Loss Functions) A hypothesis class $\mathcal{H}$ is agnostic PAC learnable with respect to a set $Z$ and a loss function $\ell : \mathcal{H} \times Z \to \mathbb{R}_+$, if there exist a function $m_{\mathcal{H}} : (0,1)^2 \to \mathbb{N}$ and a learning algorithm with the following property: For every $\epsilon, \delta \in (0,1)$ and for every distribution $\mathcal{D}$ over $Z$, when running the learning algorithm on $m \ge m_{\mathcal{H}}(\epsilon, \delta)$ i.i.d. examples generated by $\mathcal{D}$, the algorithm returns $h \in \mathcal{H}$ such that, with probability of at least $1 - \delta$ (over the choice of the $m$ training examples),

$$L_{\mathcal{D}}(h) \le \min_{h' \in \mathcal{H}} L_{\mathcal{D}}(h') + \epsilon,$$

where $L_{\mathcal{D}}(h) = \mathbb{E}_{z \sim \mathcal{D}}[\ell(h, z)]$.
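To connect the definition to an algorithm, the following Python sketch (ours, not the book's; `empirical_risk` and `erm` are illustrative names) implements Empirical Risk Minimization over a finite hypothesis class, estimating $L_{\mathcal{D}}(h)$ by the empirical average of $\ell(h, z)$ over the i.i.d. sample:

```python
def empirical_risk(h, sample, loss):
    """Average loss of h on the sample; an estimate of L_D(h)."""
    return sum(loss(h, z) for z in sample) / len(sample)

def erm(hypothesis_class, sample, loss):
    """Empirical Risk Minimization over a finite hypothesis class:
    return a hypothesis with the smallest average loss on the sample."""
    return min(hypothesis_class, key=lambda h: empirical_risk(h, sample, loss))
```

Note that the definition itself says nothing about how the algorithm finds $h$; it only demands the guarantee on $L_{\mathcal{D}}(h)$. For a finite class, `erm` realizes this by simply scanning all hypotheses.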

Remark 3.1 (A Note About Measurability*) In the aforementioned definition, for every $h \in \mathcal{H}$, we view the function $\ell(h, \cdot) : Z \to \mathbb{R}_+$ as a random variable and define $L_{\mathcal{D}}(h)$ to be the expected value of this random variable. For that, we need to require that the function $\ell(h, \cdot)$ is measurable. Formally, we assume that there is a $\sigma$-algebra of subsets of $Z$, over which the probability $\mathcal{D}$ is defined, and that the preimage of every initial segment in $\mathbb{R}_+$ is in this $\sigma$-algebra. In the specific case of binary classification with the $0{-}1$ loss, the $\sigma$-algebra is over $\mathcal{X} \times \{0, 1\}$ and our assumption on $\ell$ is equivalent to the assumption that for every $h$, the set $\{(x, h(x)) : x \in \mathcal{X}\}$ is in this $\sigma$-algebra.
Remark 3.2 (Proper versus Representation-Independent Learning*) In the preceding definition, we required that the algorithm return a hypothesis from $\mathcal{H}$. In some situations, $\mathcal{H}$ is a subset of a set $\mathcal{H}'$, and the loss function can be naturally extended to be a function from $\mathcal{H}' \times Z$ to the reals. In this case, we may allow the algorithm to return a hypothesis $h' \in \mathcal{H}'$, as long as it satisfies the requirement $L_{\mathcal{D}}(h') \le \min_{h \in \mathcal{H}} L_{\mathcal{D}}(h) + \epsilon$. Allowing the algorithm to output a hypothesis from $\mathcal{H}'$ is called representation independent learning, while proper learning occurs when the algorithm must output a hypothesis from $\mathcal{H}$. Representation independent learning is sometimes called "improper learning," although there is nothing improper about it.
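As a toy illustration of the distinction (a sketch of our own; `improper_erm` is a hypothetical name), an improper learner may run ERM over the extended class $\mathcal{H}'$, while its guarantee is still measured against the best hypothesis in the original class $\mathcal{H}$:

```python
def improper_erm(extended_class, sample, loss):
    """ERM over the larger class H'; the output may fall outside H.

    Representation-independent learning only requires
    L_D(h') <= min over h in H of L_D(h) + epsilon, where the
    benchmark is still the best hypothesis in the original class H.
    """
    def risk(h):
        return sum(loss(h, z) for z in sample) / len(sample)
    return min(extended_class, key=risk)
```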

3.3 Summary


In this chapter we defined our main formal learning model – PAC learning. The basic model relies on the realizability assumption, while the agnostic variant does not.