3.5 Exercises

- Describe an algorithm that implements the ERM rule for learning $\mathcal{H}_{\text{Singleton}}$ in the realizable setup.
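One way to carry out the first exercise is the following sketch. It assumes $\mathcal{H}_{\text{Singleton}}$ consists of every hypothesis $h_z(x) = \mathbb{1}_{[x = z]}$ together with the all-negative hypothesis; the function names are illustrative, not prescribed by the exercise.

```python
def erm_singleton(S):
    """ERM for the singleton class in the realizable setup.

    S is a list of (x, y) pairs. Under realizability at most one point
    carries a positive label, and that point must be the target singleton.
    Returns the singleton point z, or None for the all-negative hypothesis.
    """
    for x, y in S:
        if y == 1:
            return x  # the positive point identifies the target singleton
    return None  # no positives observed: the all-negative hypothesis is an ERM

def predict(z, x):
    """Evaluate the learned hypothesis h_z (z=None means all-negative)."""
    return 1 if z is not None and x == z else 0
```

The returned hypothesis has zero empirical risk: every positive example equals the returned point, and (by realizability) no negative example does.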
- Show that $\mathcal{H}_{\text{Singleton}}$ is PAC learnable. Provide an upper bound on the sample complexity.
- Let $\mathcal{X}=\mathbb{R}^2$, $\mathcal{Y}=\{0,1\}$, and let $\mathcal{H}$ be the class of concentric circles in the plane, that is, $\mathcal{H}=\{h_r : r\in\mathbb{R}_+\}$, where $h_r(x)=\mathbb{1}_{[\|x\|\le r]}$. Prove that $\mathcal{H}$ is PAC learnable (assume realizability), and that its sample complexity is bounded by
$$m_{\mathcal{H}}(\epsilon,\delta)\le\left\lceil\frac{\log(1/\delta)}{\epsilon}\right\rceil.$$
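An ERM rule for the concentric-circle class can be sketched as follows: return the smallest circle that contains all positively labeled points. This is a minimal sketch under the realizability assumption; the function names are my own.

```python
import math

def erm_circles(S):
    """ERM for concentric circles h_r(x) = 1[||x|| <= r] (realizable case).

    S is a list of (x, y) pairs with x a point in R^2. Returns the smallest
    radius consistent with the positive examples. Under realizability this
    radius is at most the true one, so all negative examples lie outside.
    """
    radii = [math.hypot(*x) for x, y in S if y == 1]
    return max(radii, default=0.0)  # no positives: the circle of radius 0

def h(r, x):
    """Evaluate the hypothesis h_r at the point x."""
    return 1 if math.hypot(*x) <= r else 0
```

Returning the *smallest* consistent radius is what makes the sample-complexity analysis go through: the learned hypothesis can only err on a thin annulus between the learned and true radii.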
- In this question, we study the hypothesis class of Boolean conjunctions, defined as follows. The instance space is $\mathcal{X}=\{0,1\}^d$ and the label set is $\mathcal{Y}=\{0,1\}$. A literal over the variables $x_1,\ldots,x_d$ is a simple Boolean function that takes the form $f(x)=x_i$, for some $i\in[d]$, or $f(x)=1-x_i$ for some $i\in[d]$. We use the notation $\bar{x}_i$ as a shorthand for $1-x_i$. A conjunction is any product of literals. In Boolean logic, the product is denoted using the $\wedge$ sign. For example, the function $h(x)=x_1\cdot(1-x_2)$ is written as $x_1\wedge\bar{x}_2$.
 We consider the hypothesis class of all conjunctions of literals over the $d$ variables. The empty conjunction is interpreted as the all-positive hypothesis (namely, the function that returns $h(x)=1$ for all $x$). The conjunction $x_1\wedge\bar{x}_1$ (and similarly any conjunction involving a literal and its negation) is allowed and interpreted as the all-negative hypothesis (namely, the conjunction that returns $h(x)=0$ for all $x$). We assume realizability: namely, we assume that there exists a Boolean conjunction that generates the labels. Thus, each example $(x,y)\in\mathcal{X}\times\mathcal{Y}$ consists of an assignment to the $d$ Boolean variables $x_1,\ldots,x_d$, and its truth value (0 for false and 1 for true).
 For instance, let $d=3$ and suppose that the true conjunction is $x_1\wedge\bar{x}_2$. Then, the training set $S$ might contain the following instances:
$$((1,1,1),0),\ ((1,0,1),1),\ ((0,1,0),0),\ ((1,0,0),1).$$
 Prove that the hypothesis class of all conjunctions over $d$ variables is PAC learnable and bound its sample complexity. Propose an algorithm that implements the ERM rule, whose runtime is polynomial in $d\cdot m$.
- Let $\mathcal{X}$ be a domain and let $\mathcal{D}_1,\mathcal{D}_2,\ldots,\mathcal{D}_m$ be a sequence of distributions
 over $\mathcal{X}$. Let $\mathcal{H}$ be a finite class of binary classifiers over $\mathcal{X}$ and let $f\in\mathcal{H}$. Suppose we are getting a sample $S$ of $m$ examples, such that the instances are independent but not identically distributed; the $i$th instance is sampled from $\mathcal{D}_i$, and then $y_i$ is set to be $f(x_i)$. Let $\bar{D}_m$ denote the average distribution, that is, $\bar{D}_m=(\mathcal{D}_1+\cdots+\mathcal{D}_m)/m$.
 Fix an accuracy parameter $\epsilon\in(0,1)$. Show that
$$\mathbb{P}\left[\exists h\in\mathcal{H}\ \text{s.t.}\ L_{(\bar{D}_m,f)}(h)>\epsilon\ \text{and}\ L_{(S,f)}(h)=0\right]\le|\mathcal{H}|\,e^{-\epsilon m}.$$
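The polynomial-time ERM rule requested in the Boolean-conjunctions exercise above admits a classical sketch: start from the conjunction of all $2d$ literals and, for each positive example, delete every literal the example falsifies. This is a minimal sketch under realizability; the representation of literals as $(i, b)$ pairs and the function names are my own choices, not prescribed by the exercise.

```python
def erm_conjunctions(S, d):
    """ERM for Boolean conjunctions over d variables (realizable case).

    Start from the conjunction of all 2d literals (x_i and its negation
    for every i); for each positive example, keep only the literals it
    satisfies. Negative examples are never violated under realizability,
    since the output conjunction is always at least as restrictive as
    the true one. Runtime is O(d * m).
    """
    # (i, 1) represents the literal x_i; (i, 0) represents 1 - x_i.
    literals = {(i, b) for i in range(d) for b in (0, 1)}
    for x, y in S:
        if y == 1:
            literals = {(i, b) for (i, b) in literals if x[i] == b}
    return literals

def evaluate(literals, x):
    """Evaluate the conjunction on x (the contradictory start set is all-negative)."""
    return 1 if all(x[i] == b for (i, b) in literals) else 0
```

On the example training set from the exercise (with $d=3$ and true conjunction $x_1\wedge\bar{x}_2$), the algorithm recovers exactly the literals $x_1$ and $\bar{x}_2$ from the two positive examples.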