31 PAC-Bayes
The Minimum Description Length (MDL) and Occam's razor principles allow a
potentially very large hypothesis class but define a hierarchy over hypotheses and
prefer hypotheses that appear higher in the hierarchy. In this chapter
we describe the PAC-Bayesian approach, which further generalizes this idea. In
the PAC-Bayesian approach, one expresses prior knowledge by defining a prior
distribution over the hypothesis class.
31.1 PAC-Bayes Bounds
As in the MDL paradigm, we define a hierarchy over hypotheses in our class H.
Now, the hierarchy takes the form of a prior distribution over H. That is, we
assign a probability (or density if H is continuous) P(h) ≥ 0 for each h ∈ H
and refer to P(h) as the prior score of h. Following the Bayesian reasoning
approach, the output of the learning algorithm is not necessarily a single
hypothesis. Instead, the learning process defines a posterior probability over H,
which we denote by Q. In the context of a supervised learning problem, where
H contains functions from X to Y, one can think of Q as defining a randomized
prediction rule as follows. Whenever we get a new instance x, we randomly pick
a hypothesis h ∈ H according to Q and predict h(x). We define the loss of Q on
an example z to be
\ell(Q,z) \overset{\mathrm{def}}{=} \mathbb{E}_{h \sim Q}[\ell(h,z)].
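For a finite hypothesis class, the randomized prediction rule and the loss of Q on an example can be sketched as follows. The threshold classifiers, the 0-1 loss, and all names below are illustrative assumptions for this sketch, not taken from the text:

```python
import random

# Illustrative finite hypothesis class: threshold classifiers on the reals.
# Each hypothesis predicts 1 if x >= threshold, else 0 (an assumption made
# for this sketch; the text works with an abstract class H).
def make_hypothesis(theta):
    return lambda x: 1 if x >= theta else 0

H = [make_hypothesis(t) for t in (0.0, 1.0, 2.0)]
Q = [0.5, 0.3, 0.2]  # posterior distribution over H (weights sum to 1)

def predict(x, rng=random):
    """Randomized prediction rule: draw h ~ Q, then output h(x)."""
    h = rng.choices(H, weights=Q, k=1)[0]
    return h(x)

def zero_one_loss(h, z):
    """0-1 loss of a single hypothesis on an example z = (x, y)."""
    x, y = z
    return 0.0 if h(x) == y else 1.0

def loss_Q(z):
    """Loss of Q on z: the expectation of the loss over h ~ Q,
    computed exactly since H is finite."""
    return sum(q * zero_one_loss(h, z) for h, q in zip(H, Q))
```

For instance, on the example (1.5, 1) only the threshold-2.0 hypothesis errs, so the loss of Q is its posterior weight, 0.2.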
By the linearity of expectation, the generalization loss and training loss of Q can
be written as
L_{\mathcal{D}}(Q) \overset{\mathrm{def}}{=} \mathbb{E}_{h \sim Q}[L_{\mathcal{D}}(h)] \qquad \text{and} \qquad L_S(Q) \overset{\mathrm{def}}{=} \mathbb{E}_{h \sim Q}[L_S(h)].
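As a concrete check of the linearity-of-expectation step, the following sketch computes the training loss of Q in two ways: as the expectation of the empirical losses over h ~ Q, and as the average over the sample of the per-example loss of Q. The toy class of constant predictors, the 0-1 loss, and the sample are assumptions made for this sketch:

```python
# Assumed toy setup: two constant predictors and a three-example sample.
H = [lambda x: 0, lambda x: 1]
Q = [0.25, 0.75]                     # posterior distribution over H
S = [(0.0, 1), (1.0, 1), (2.0, 0)]   # training sample of (x, y) pairs

def loss(h, z):
    """0-1 loss of hypothesis h on example z = (x, y)."""
    x, y = z
    return 0.0 if h(x) == y else 1.0

def L_S(h):
    """Empirical loss of a single hypothesis on the sample S."""
    return sum(loss(h, z) for z in S) / len(S)

def L_S_Q():
    """Training loss of Q: expectation of L_S(h) over h ~ Q."""
    return sum(q * L_S(h) for h, q in zip(H, Q))

def loss_Q(z):
    """Loss of Q on one example: expectation of loss(h, z) over h ~ Q."""
    return sum(q * loss(h, z) for h, q in zip(H, Q))

# By linearity of expectation, averaging loss_Q over S yields the same value:
avg = sum(loss_Q(z) for z in S) / len(S)
```

Here the constant-0 predictor has empirical loss 2/3 and the constant-1 predictor 1/3, so both computations give 0.25 * 2/3 + 0.75 * 1/3 = 5/12.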
The following theorem tells us that the difference between the generalization
loss and the empirical loss of a posterior Q is bounded by an expression that
depends on the Kullback-Leibler divergence between Q and the prior distribution
P. The Kullback-Leibler divergence is a natural measure of the distance between two
distributions. The theorem suggests that if we would like to minimize the generalization
loss of Q, we should jointly minimize both the empirical loss of Q
and the Kullback-Leibler distance between Q and the prior distribution. We will
Understanding Machine Learning, © 2014 by Shai Shalev-Shwartz and Shai Ben-David
Published 2014 by Cambridge University Press.
Personal use only. Not for distribution. Do not post.
Please link to http://www.cs.huji.ac.il/~shais/UnderstandingMachineLearning