306 Online Learning
Show that if the regret ofAon each period of 2mrounds is at mostα
√
2 m,
then the total regret is at most
√
2
√
2 − 1
α
√
T.
5.Online-to-batch Conversions:In this exercise we demonstrate how a suc-
cessful online learning algorithm can be used to derive a successful PAC
learner as well.
Consider a PAC learning problem for binary classification parameterized
by an instance domain,X, and a hypothesis class,H. Suppose that there exists
an online learning algorithm,A, which enjoys a mistake boundMA(H)<∞.
Consider running this algorithm on a sequence ofTexamples which are sam-
pled i.i.d. from a distributionDover the instance spaceX, and are labeled by
someh?∈H. Suppose that for every roundt, the prediction of the algorithm
is based on a hypothesisht:X →{ 0 , 1 }. Show that
E[LD(hr)]≤MA(H)
T
,
where the expectation is over the random choice of the instances as well as a
random choice ofraccording to the uniform distribution over [T].
Hint: Use similar arguments to the ones appearing in the proof of Theo-
rem 14.8.