Understanding Machine Learning: From Theory to Algorithms

3.5 Exercises


  1. Describe an algorithm that implements the ERM rule for learning H_Singleton
    in the realizable setup.

  2. Show that H_Singleton is PAC learnable. Provide an upper bound on the
    sample complexity.
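
    A minimal sketch of one possible ERM rule for item 1, not the book's prescribed
    solution. It assumes H_Singleton is the class of hypotheses that label at most
    one point of X positively (each h_z labels only z positively, plus the
    all-negative hypothesis h^−); the name erm_singleton is illustrative.

        # Under realizability, all positive examples in the sample share the
        # same instance z, so returning h_z (or h^- when there are no
        # positives) has zero empirical error.
        def erm_singleton(sample):
            """sample: list of (x, y) pairs with y in {0, 1}."""
            for x, y in sample:
                if y == 1:
                    return lambda t, z=x: 1 if t == z else 0  # h_z
            return lambda t: 0  # all-negative hypothesis h^-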

  3. Let X = R^2, Y = {0, 1}, and let H be the class of concentric circles in the
    plane, that is, H = {h_r : r ∈ R_+}, where h_r(x) = 1_[‖x‖ ≤ r]. Prove that H
    is PAC learnable (assume realizability), and that its sample complexity is
    bounded by

        m_H(ε, δ) ≤ ⌈log(1/δ)/ε⌉.
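
    As a hint toward the PAC proof, here is a minimal sketch of a natural ERM
    rule: return the tightest circle enclosing the positively labeled points.
    The name erm_circles and the input encoding are illustrative choices.

        import math

        # ERM for concentric circles (realizable case): the smallest
        # consistent radius is the largest norm among positive points.
        def erm_circles(sample):
            """sample: list of ((x1, x2), y) pairs with y in {0, 1}."""
            r = max((math.hypot(*x) for x, y in sample if y == 1), default=0.0)
            return lambda x, r=r: 1 if math.hypot(*x) <= r else 0

    Since the returned radius never exceeds the true one, the error region is a
    ring just inside the true circle; the bound then follows by arguing that m
    independent points all miss a ring of probability mass ε with probability at
    most (1 − ε)^m ≤ e^(−εm).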


  4. In this question, we study the hypothesis class of Boolean conjunctions, defined
    as follows. The instance space is X = {0, 1}^d and the label set is Y = {0, 1}. A
    literal over the variables x_1, ..., x_d is a simple Boolean function that takes
    the form f(x) = x_i, for some i ∈ [d], or f(x) = 1 − x_i for some i ∈ [d]. We use
    the notation x̄_i as a shorthand for 1 − x_i. A conjunction is any product of
    literals. In Boolean logic, the product is denoted using the ∧ sign. For example,
    the function h(x) = x_1 · (1 − x_2) is written as x_1 ∧ x̄_2.
    We consider the hypothesis class of all conjunctions of literals over the d
    variables. The empty conjunction is interpreted as the all-positive hypothesis
    (namely, the function that returns h(x) = 1 for all x). The conjunction x_1 ∧ x̄_1
    (and similarly any conjunction involving a literal and its negation) is allowed
    and interpreted as the all-negative hypothesis (namely, the conjunction that
    returns h(x) = 0 for all x). We assume realizability: namely, we assume
    that there exists a Boolean conjunction that generates the labels. Thus, each
    example (x, y) ∈ X × Y consists of an assignment to the d Boolean variables
    x_1, ..., x_d, and its truth value (0 for false and 1 for true).
    For instance, let d = 3 and suppose that the true conjunction is x_1 ∧ x̄_2.
    Then, the training set S might contain the following instances:

        ((1, 1, 1), 0), ((1, 0, 1), 1), ((0, 1, 0), 0), ((1, 0, 0), 1).


Prove that the hypothesis class of all conjunctions over d variables is
PAC learnable and bound its sample complexity. Propose an algorithm that
implements the ERM rule, whose runtime is polynomial in d · m.
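
One classical ERM rule for conjunctions, sketched under an illustrative
encoding (the pair representation below is not the book's notation): start from
the conjunction of all 2d literals, which is the all-negative hypothesis, and
on each positive example delete every literal that the example falsifies;
negative examples are ignored.

    # ERM for Boolean conjunctions over d variables (realizable case).
    # A literal is encoded as a pair (i, v): it is satisfied when x[i] == v,
    # so (i, 1) stands for x_i and (i, 0) for its negation x̄_i.
    def erm_conjunctions(sample, d):
        """sample: list of (x, y) with x a 0/1 tuple of length d."""
        literals = {(i, v) for i in range(d) for v in (0, 1)}
        for x, y in sample:
            if y == 1:
                # Keep only the literals this positive example satisfies.
                literals = {(i, v) for (i, v) in literals if x[i] == v}
        # h(x) = 1 iff every remaining literal is satisfied.
        return lambda x, L=literals: 1 if all(x[i] == v for (i, v) in L) else 0

Each example is processed in O(d) time, so the total runtime is O(d · m). Under
realizability the returned conjunction is the most specific one consistent with
the positive examples; its literal set contains all literals of the true
conjunction, so it also classifies every negative example correctly.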


  5. Let X be a domain and let D_1, D_2, ..., D_m be a sequence of distributions
    over X. Let H be a finite class of binary classifiers over X and let f ∈ H.
    Suppose we are getting a sample S of m examples, such that the instances are
    independent but are not identically distributed; the i-th instance is sampled
    from D_i, and then y_i is set to be f(x_i). Let D̄_m denote the average
    distribution, that is, D̄_m = (D_1 + ··· + D_m)/m.


Fix an accuracy parameter ε ∈ (0, 1). Show that

    P[∃h ∈ H s.t. L_(D̄_m,f)(h) > ε and L_(S,f)(h) = 0] ≤ |H| e^(−εm).
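
One possible route to this bound, sketched as a hint under the chapter's
notation (L_(D,f)(h) is the true error of h with respect to D and f): fix a
single h ∈ H with L_(D̄_m,f)(h) > ε, bound the probability that h is consistent
with S using independence and the inequality 1 − a ≤ e^(−a), and finish with a
union bound over the finite class H. In LaTeX:

    \Pr\bigl[L_{(S,f)}(h) = 0\bigr]
      = \prod_{i=1}^{m} \bigl(1 - L_{(D_i,f)}(h)\bigr)
      \le \prod_{i=1}^{m} e^{-L_{(D_i,f)}(h)}
      = e^{-\sum_{i=1}^{m} L_{(D_i,f)}(h)}
      = e^{-m \, L_{(\bar{D}_m,f)}(h)}
      \le e^{-\epsilon m}.

Summing this bound over the at most |H| hypotheses whose true error exceeds ε
gives the stated |H| e^(−εm).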