set $I(p) \subseteq \mathcal{P}_{d-1}$ by
\[
I(p) = \left\{ q \in \mathcal{P}_{d-1} : \sum_{i=1}^d (p_i - q_i) \mathbb{1}\{\Phi_{ai} = f\} = 0 \text{ for all } a \in [k] \text{ and } f \in [F] \right\}.
\]
This is the set of distributions over the outcomes that are indistinguishable
from $p$ by the learner using any actions. Then define
\[
f(p) = \max_{q \in I(p)} \min_{a \in [k]} \sum_{i=1}^d q_i L_{ai}.
\]
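To make the definitions concrete, the following sketch approximates $f(p)$ for a tiny game with $d = 2$ outcomes by brute-force search over a grid on the simplex. The loss matrix `L` and feedback matrix `Phi` are illustrative choices, not taken from the text: with full feedback $I(p) = \{p\}$ and the benchmark is the usual best action in hindsight, while with no feedback $I(p)$ is the whole simplex and the benchmark is strictly larger.

```python
# Sketch: approximate f(p) = max_{q in I(p)} min_a sum_i q_i L[a][i]
# for a game with d = 2 outcomes by searching a grid on the simplex.
# L (losses) and Phi (feedback symbols) are illustrative, not from the text.

def in_I(p, q, Phi, tol=1e-9):
    """q is in I(p) iff for every action a and symbol f,
    sum_i (p_i - q_i) * 1{Phi[a][i] == f} = 0, i.e. p and q induce the
    same feedback distribution under every action."""
    symbols = {f for row in Phi for f in row}
    for row in Phi:                     # one row of Phi per action
        for f in symbols:
            s = sum(pi - qi for pi, qi, phi in zip(p, q, row) if phi == f)
            if abs(s) > tol:
                return False
    return True

def f_value(p, L, Phi, grid=1000):
    """Grid approximation of f(p) over the 1-simplex (d = 2)."""
    best = float("-inf")
    for j in range(grid + 1):
        q = (j / grid, 1 - j / grid)
        if in_I(p, q, Phi):
            # min over actions of the expected loss under q
            worst = min(sum(qi * lai for qi, lai in zip(q, row)) for row in L)
            best = max(best, worst)
    return best

L = [[1.0, 0.0], [0.0, 1.0]]    # loss of action a on outcome i
Phi_full = [[1, 2], [1, 2]]     # every action reveals the outcome
Phi_blind = [[1, 1], [1, 1]]    # no action reveals anything
p = (0.8, 0.2)

print(f_value(p, L, Phi_full))   # I(p) = {p}: min_a <p, L_a> = 0.2
print(f_value(p, L, Phi_blind))  # I(p) = simplex: max_q min(q_1, q_2) = 0.5
```

A grid search only works for such toy cases; in general $I(p)$ is an intersection of affine constraints with the simplex, and both the inner minimum and the outer maximum can be handled with linear programming.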
Rustichini [1999] proved there exist policies such that
\[
\lim_{n \to \infty} \max_{i_{1:n}} \mathbb{E}\left[ \frac{1}{n} \sum_{t=1}^n L_{A_t i_t} - f(\bar{u}_n) \right] = 0,
\]
where $\bar{u}_n = \frac{1}{n} \sum_{t=1}^n e_{i_t} \in \mathcal{P}_{d-1}$ is the average outcome chosen by the adversary.
Intuitively this means the learner does not compete with the best action in
hindsight with respect to the actual outcomes. Instead, the learner competes
with the best action in hindsight with respect to an outcome sequence that is
indistinguishable from the actual outcome sequence. Rustichini did not prove
rates of convergence for this limit. This has been remedied recently, and we
give some references in the bibliographic remarks.
11 Finally, we want to emphasize that partial monitoring is still quite poorly
understood. We do not know how the regret should depend on $d$, $F$, $k$ or the
structure of $G$. Lower bounds that depend on these quantities are also missing
and the lower bounds proven in Section 37.4 are surely very conservative. We
hope this chapter inspires more activity in this area. The setting described in
the previous note is even more wide open, with even the dependence on $n$ still
not completely nailed down.
37.9 Bibliographical remarks
The first work on partial monitoring is by Rustichini [1999], who focused on
finding Hannan consistent policies in the adversarial setting. Rustichini shows how
to reduce the problem to Blackwell approachability (see Cesa-Bianchi and Lugosi
[2006]) and uses this to deduce the existence of a Hannan consistent strategy.
Rustichini also used a slightly different notion of regret, which eliminates the
hopeless games. The first nonasymptotic result in the setting of this chapter is
due to Piccolboni and Schindelhauer [2001], where a policy with regret $O(n^{3/4})$
is given for problems that are not hopeless. Cesa-Bianchi et al. [2006] reduced
the dependence to $O(n^{2/3})$ and proved a wide range of other results for specific
classes of problems. The classification theorem when $d = 2$ is due to Bartók et al.
[2010] (extended version: Antos et al. [2013]). The classification of general partial
monitoring games is by Bartók et al. [2014]. The NeighborhoodWatch policy is