3.5 Exercises
- Describe an algorithm that implements the ERM rule for learning H_Singleton in the realizable setup.
- Show that H_Singleton is PAC learnable. Provide an upper bound on the sample complexity.
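As a concrete starting point for the first exercise, here is one possible sketch of an ERM learner in Python. It assumes the usual convention that H_Singleton contains, for each point z, the hypothesis h_z with h_z(x) = 1 iff x = z, together with the all-negative hypothesis; the function name `erm_singleton` is ours, not the book's.

```python
def erm_singleton(S):
    """A sketch of ERM for H_Singleton in the realizable case.

    S is a list of (x, y) pairs with y in {0, 1} and hashable x.
    Assumed convention: H_Singleton = {h_z : z in X} together with
    the all-negative hypothesis.
    """
    positives = {x for x, y in S if y == 1}
    if not positives:
        # No positive example seen: the all-negative hypothesis has
        # zero empirical error.
        return lambda x: 0
    # Realizability implies every positive example is the same point z.
    z = positives.pop()
    return lambda x: 1 if x == z else 0
```

Note that when the sample contains no positive example, returning the all-negative hypothesis is the natural ERM choice, since it is consistent with the entire sample.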
sample complexity. - LetX=R^2 ,Y={ 0 , 1 }, and letHbe the class of concentric circles in the
plane, that is,H={hr:r∈R+}, wherehr(x) = (^1) [‖x‖≤r]. Prove thatHis
PAC learnable (assume realizability), and its sample complexity is bounded
by
mH(,δ)≤
⌈
log(1/δ)
⌉
.
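For intuition about the ERM rule underlying this bound, a minimal sketch in Python: return the smallest circle that contains all positively labeled points (the function name `erm_concentric_circles` and the convention r = 0 when there are no positive examples are our assumptions, not the book's).

```python
import math

def erm_concentric_circles(S):
    """A sketch of ERM for concentric circles in the realizable case.

    S is a list of ((x1, x2), y) pairs with y in {0, 1}.  Returns h_r
    where r is the largest norm among positive examples (0.0 if there
    are none), i.e. the tightest circle consistent with the positives.
    """
    norms = [math.hypot(x[0], x[1]) for x, y in S if y == 1]
    r = max(norms, default=0.0)
    return lambda x: 1 if math.hypot(x[0], x[1]) <= r else 0
```

Under realizability this hypothesis has zero empirical error: every positive point lies inside the returned circle by construction, and every negative point lies outside it because the true radius is at least r.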
- In this question, we study the hypothesis class of Boolean conjunctions, defined as follows. The instance space is X = {0, 1}^d and the label set is Y = {0, 1}. A literal over the variables x_1, …, x_d is a simple Boolean function that takes the form f(x) = x_i, for some i ∈ [d], or f(x) = 1 − x_i for some i ∈ [d]. We use the notation x̄_i as a shorthand for 1 − x_i. A conjunction is any product of literals. In Boolean logic, the product is denoted using the ∧ sign. For example, the function h(x) = x_1 · (1 − x_2) is written as x_1 ∧ x̄_2.

  We consider the hypothesis class of all conjunctions of literals over the d variables. The empty conjunction is interpreted as the all-positive hypothesis (namely, the function that returns h(x) = 1 for all x). The conjunction x_1 ∧ x̄_1 (and similarly any conjunction involving a literal and its negation) is allowed and interpreted as the all-negative hypothesis (namely, the conjunction that returns h(x) = 0 for all x). We assume realizability: namely, we assume that there exists a Boolean conjunction that generates the labels. Thus, each example (x, y) ∈ X × Y consists of an assignment to the d Boolean variables x_1, …, x_d and its truth value (0 for false and 1 for true).

  For instance, let d = 3 and suppose that the true conjunction is x_1 ∧ x̄_2. Then, the training set S might contain the following instances:

  ((1, 1, 1), 0), ((1, 0, 1), 1), ((0, 1, 0), 0), ((1, 0, 0), 1).
  Prove that the hypothesis class of all conjunctions over d variables is PAC learnable and bound its sample complexity. Propose an algorithm that implements the ERM rule whose runtime is polynomial in d · m.
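One standard candidate for such an algorithm, sketched here in Python: start from the conjunction of all 2d literals and, for each positive example, delete every literal the example falsifies (the function name `erm_conjunctions` and the pair encoding of literals are our choices for this sketch).

```python
def erm_conjunctions(S, d):
    """A sketch of ERM for Boolean conjunctions over d variables.

    S is a list of (x, y) pairs with x a {0,1}-tuple of length d.
    Encoding: the pair (i, 1) stands for the literal x_i and (i, 0)
    for its negation 1 - x_i.  Runtime is O(d * m).
    """
    # Begin with the conjunction of all 2d literals (the all-negative
    # hypothesis, since it contains each variable and its negation).
    literals = {(i, v) for i in range(d) for v in (0, 1)}
    for x, y in S:
        if y == 1:
            # Keep only the literals this positive example satisfies.
            literals = {(i, v) for (i, v) in literals if x[i] == v}

    def h(x):
        return 1 if all(x[i] == v for (i, v) in literals) else 0

    return h
```

On the book's d = 3 example with true conjunction x_1 ∧ x̄_2, the two positive examples (1, 0, 1) and (1, 0, 0) prune the literal set down to exactly {x_1, x̄_2}, so the learner recovers the target; under realizability the surviving conjunction is always consistent with the sample.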
- Let X be a domain and let D_1, D_2, …, D_m be a sequence of distributions over X. Let H be a finite class of binary classifiers over X and let f ∈ H. Suppose we receive a sample S of m examples such that the instances are independent but not identically distributed; the i-th instance is sampled from D_i, and then y_i is set to be f(x_i). Let D̄_m denote the average distribution, that is, D̄_m = (D_1 + ··· + D_m)/m.

  Fix an accuracy parameter ε ∈ (0, 1). Show that
  P[∃h ∈ H s.t. L_(D̄_m,f)(h) > ε and L_(S,f)(h) = 0] ≤ |H| e^(−εm).
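One possible line of attack (a hint of ours, not part of the exercise statement): fix a single h ∈ H with L_(D̄_m,f)(h) > ε, bound the probability that h is consistent with the independent but non-identically distributed sample, and finish with a union bound over H. The key single-hypothesis step is

\[
\Pr\bigl[L_{(S,f)}(h) = 0\bigr]
= \prod_{i=1}^{m}\bigl(1 - L_{(D_i,f)}(h)\bigr)
\le \prod_{i=1}^{m} e^{-L_{(D_i,f)}(h)}
= e^{-m\,L_{(\bar{D}_m,f)}(h)}
\le e^{-\epsilon m},
\]

using 1 − a ≤ e^{−a} and the fact that, since D̄_m is the average of the D_i, the true error with respect to D̄_m equals the average of the per-distribution errors L_(D_i,f)(h).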