Understanding Machine Learning: From Theory to Algorithms

3.5 Exercises


  1. Describe an algorithm that implements the ERM rule for learning H_Singleton
    in the realizable setup.

  2. Show that H_Singleton is PAC learnable. Provide an upper bound on the
    sample complexity.
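
    A minimal sketch of one possible ERM rule for item 1, not the book's prescribed
    solution. It assumes H_Singleton is the class of hypotheses that label at most
    one point of X positively (each h_z labels only z positively, plus the
    all-negative hypothesis h^−); the name erm_singleton is illustrative.

        # Under realizability, all positive examples in the sample share the
        # same instance z, so returning h_z (or h^- when there are no
        # positives) has zero empirical error.
        def erm_singleton(sample):
            """sample: list of (x, y) pairs with y in {0, 1}."""
            for x, y in sample:
                if y == 1:
                    return lambda t, z=x: 1 if t == z else 0  # h_z
            return lambda t: 0  # all-negative hypothesis h^-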

  3. Let X = R^2, Y = {0, 1}, and let H be the class of concentric circles in the
    plane, that is, H = {h_r : r ∈ R_+}, where h_r(x) = 1_[‖x‖ ≤ r]. Prove that H
    is PAC learnable (assume realizability), and that its sample complexity is
    bounded by

        m_H(ε, δ) ≤ ⌈log(1/δ)/ε⌉.
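
    As a hint toward the PAC proof, here is a minimal sketch of a natural ERM
    rule: return the tightest circle enclosing the positively labeled points.
    The name erm_circles and the input encoding are illustrative choices.

        import math

        # ERM for concentric circles (realizable case): the smallest
        # consistent radius is the largest norm among positive points.
        def erm_circles(sample):
            """sample: list of ((x1, x2), y) pairs with y in {0, 1}."""
            r = max((math.hypot(*x) for x, y in sample if y == 1), default=0.0)
            return lambda x, r=r: 1 if math.hypot(*x) <= r else 0

    Since the returned radius never exceeds the true one, the error region is a
    ring just inside the true circle; the bound then follows by arguing that m
    independent points all miss a ring of probability mass ε with probability at
    most (1 − ε)^m ≤ e^(−εm).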


  4. In this question, we study the hypothesis class of Boolean conjunctions, defined
    as follows. The instance space is X = {0, 1}^d and the label set is Y = {0, 1}. A
    literal over the variables x_1, ..., x_d is a simple Boolean function that takes
    the form f(x) = x_i, for some i ∈ [d], or f(x) = 1 − x_i for some i ∈ [d]. We use
    the notation x̄_i as a shorthand for 1 − x_i. A conjunction is any product of
    literals. In Boolean logic, the product is denoted using the ∧ sign. For example,
    the function h(x) = x_1 · (1 − x_2) is written as x_1 ∧ x̄_2.
    We consider the hypothesis class of all conjunctions of literals over the d
    variables. The empty conjunction is interpreted as the all-positive hypothesis
    (namely, the function that returns h(x) = 1 for all x). The conjunction x_1 ∧ x̄_1
    (and similarly any conjunction involving a literal and its negation) is allowed
    and interpreted as the all-negative hypothesis (namely, the conjunction that
    returns h(x) = 0 for all x). We assume realizability: namely, we assume
    that there exists a Boolean conjunction that generates the labels. Thus, each
    example (x, y) ∈ X × Y consists of an assignment to the d Boolean variables
    x_1, ..., x_d, and its truth value (0 for false and 1 for true).
    For instance, let d = 3 and suppose that the true conjunction is x_1 ∧ x̄_2.
    Then, the training set S might contain the following instances:

        ((1, 1, 1), 0), ((1, 0, 1), 1), ((0, 1, 0), 0), ((1, 0, 0), 1).


Prove that the hypothesis class of all conjunctions over d variables is
PAC learnable and bound its sample complexity. Propose an algorithm that
implements the ERM rule, whose runtime is polynomial in d · m.
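
One classical ERM rule for conjunctions, sketched under an illustrative
encoding (the pair representation below is not the book's notation): start from
the conjunction of all 2d literals, which is the all-negative hypothesis, and
on each positive example delete every literal that the example falsifies;
negative examples are ignored.

    # ERM for Boolean conjunctions over d variables (realizable case).
    # A literal is encoded as a pair (i, v): it is satisfied when x[i] == v,
    # so (i, 1) stands for x_i and (i, 0) for its negation x̄_i.
    def erm_conjunctions(sample, d):
        """sample: list of (x, y) with x a 0/1 tuple of length d."""
        literals = {(i, v) for i in range(d) for v in (0, 1)}
        for x, y in sample:
            if y == 1:
                # Keep only the literals this positive example satisfies.
                literals = {(i, v) for (i, v) in literals if x[i] == v}
        # h(x) = 1 iff every remaining literal is satisfied.
        return lambda x, L=literals: 1 if all(x[i] == v for (i, v) in L) else 0

Each example is processed in O(d) time, so the total runtime is O(d · m). Under
realizability the returned conjunction is the most specific one consistent with
the positive examples; its literal set contains all literals of the true
conjunction, so it also classifies every negative example correctly.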


  5. Let X be a domain and let D_1, D_2, ..., D_m be a sequence of distributions
    over X. Let H be a finite class of binary classifiers over X and let f ∈ H.
    Suppose we are getting a sample S of m examples, such that the instances are
    independent but are not identically distributed; the i-th instance is sampled
    from D_i, and then y_i is set to be f(x_i). Let D̄_m denote the average
    distribution, that is, D̄_m = (D_1 + ··· + D_m)/m.


Fix an accuracy parameter ε ∈ (0, 1). Show that

    P[∃h ∈ H s.t. L_(D̄_m,f)(h) > ε and L_(S,f)(h) = 0] ≤ |H| e^(−εm).
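
One possible route to this bound, sketched as a hint under the chapter's
notation (L_(D,f)(h) is the true error of h with respect to D and f): fix a
single h ∈ H with L_(D̄_m,f)(h) > ε, bound the probability that h is consistent
with S using independence and the inequality 1 − a ≤ e^(−a), and finish with a
union bound over the finite class H. In LaTeX:

    \Pr\bigl[L_{(S,f)}(h) = 0\bigr]
      = \prod_{i=1}^{m} \bigl(1 - L_{(D_i,f)}(h)\bigr)
      \le \prod_{i=1}^{m} e^{-L_{(D_i,f)}(h)}
      = e^{-\sum_{i=1}^{m} L_{(D_i,f)}(h)}
      = e^{-m \, L_{(\bar{D}_m,f)}(h)}
      \le e^{-\epsilon m}.

Summing this bound over the at most |H| hypotheses whose true error exceeds ε
gives the stated |H| e^(−εm).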