25.2 Clouds looming for optimism
The constraints mean that
\[
\frac{1}{\alpha(a_3)\varepsilon^2\gamma^2 + \alpha(a_2)}
= \lim_{\alpha(a_1)\to\infty} \|a_2\|^2_{H(\alpha)^{-1}} \le \frac{1}{2}
\qquad\text{and}\qquad
\frac{\gamma^2\varepsilon^2}{\alpha(a_3)\varepsilon^2\gamma^2 + \alpha(a_2)}
= \lim_{\alpha(a_1)\to\infty} \|a_3\|^2_{H(\alpha)^{-1}} \le \frac{\varepsilon^2}{2}\,.
\]
The first inequality requires that $\alpha(a_3)\varepsilon^2\gamma^2 + \alpha(a_2) \ge 2$ and the second that this quantity be at least $2\gamma^2$, so provided that $\gamma \ge 1$ the pair reduces to the single constraint
\[
\alpha(a_3)\varepsilon^2\gamma^2 + \alpha(a_2) \ge 2\gamma^2\,.
\]
Since we are minimizing $\alpha(a_2) + \varepsilon\alpha(a_3)$, we can easily see that $\alpha(a_2) = 2\gamma^2$ and $\alpha(a_3) = 0$, provided that $2\gamma^2 \le 2/\varepsilon$. Therefore, if $\varepsilon$ is chosen sufficiently small relative to $\gamma$, then the optimal rate of the regret is $c(\mathcal{A}, \theta) = 2\gamma^2$, and so by Theorem 25.3 there exists a policy such that
\[
\limsup_{n\to\infty} \frac{R_n(\mathcal{A}, \theta)}{\log(n)} = 2\gamma^2\,.
\]
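The minimization above is a linear program in two variables with a single linear constraint, so its optimum sits at one of two vertices: all mass on $a_2$ ($\alpha(a_2) = 2\gamma^2$) or all mass on $a_3$ ($\alpha(a_3) = 2/\varepsilon^2$). A minimal numeric sketch of this comparison; the function name `optimal_rate` and the example values are ours, not the book's:

```python
# Sketch (not from the text): check the claimed optimal allocation for
#   minimize  alpha2 + eps * alpha3
#   subject to alpha3 * eps^2 * gamma^2 + alpha2 >= 2 * gamma^2.
# A two-variable LP with one constraint attains its optimum at a vertex:
# either alpha2 = 2*gamma^2 (alpha3 = 0) or alpha3 = 2/eps^2 (alpha2 = 0).

def optimal_rate(eps: float, gamma: float) -> float:
    """Return c(A, theta): the cheaper of the two vertex solutions."""
    cost_a2_only = 2 * gamma**2        # alpha2 = 2*gamma^2, objective = alpha2
    cost_a3_only = eps * (2 / eps**2)  # alpha3 = 2/eps^2, objective = eps*alpha3 = 2/eps
    return min(cost_a2_only, cost_a3_only)

# With eps small relative to gamma (2*gamma^2 <= 2/eps), a_2 alone is optimal:
print(optimal_rate(eps=0.01, gamma=2.0))  # 8.0, i.e. 2*gamma^2
```

For $\varepsilon = 0.01$ and $\gamma = 2$, playing $a_2$ alone costs $2\gamma^2 = 8$ while playing $a_3$ alone costs $2/\varepsilon = 200$, matching the claim that $a_2$ carries all the exploration once $2\gamma^2 \le 2/\varepsilon$.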
Now we argue that, for $\gamma$ sufficiently large and $\varepsilon$ arbitrarily small, the regret of any consistent optimistic algorithm is at least
\[
\limsup_{n\to\infty} \frac{R_n(\mathcal{A}, \theta)}{\log(n)} = \Omega(1/\varepsilon)\,,
\]
which can be arbitrarily worse than the optimal rate! So why is this so? Recall that optimistic algorithms choose
\[
A_t = \operatorname*{argmax}_{a \in \mathcal{A}} \max_{\tilde\theta \in \mathcal{C}_t} \langle a, \tilde\theta \rangle\,,
\]
where $\mathcal{C}_t \subset \mathbb{R}^d$ is a confidence set that we assume contains the true $\theta$ with high probability. So far this does not greatly restrict the class of algorithms that we might call optimistic. We now assume that there exists a constant $c > 0$ such that
\[
\mathcal{C}_t \subseteq \left\{ \tilde\theta : \|\hat\theta_t - \tilde\theta\|_{V_t} \le c\sqrt{\log(n)} \right\},
\]
where $V_t = \sum_{s=1}^t A_s A_s^\top$. So now we ask how often we can expect the optimistic algorithm to choose action $a_2 = e_2$ in the example described above. Since we have assumed that $\theta \in \mathcal{C}_t$ with high probability, we have that
\[
\max_{\tilde\theta \in \mathcal{C}_t} \langle a_1, \tilde\theta \rangle \ge 1\,.
\]
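For an ellipsoidal confidence set of this form, the inner maximum has a standard closed form, $\max_{\tilde\theta \in \mathcal{C}_t} \langle a, \tilde\theta \rangle = \langle a, \hat\theta_t \rangle + c\sqrt{\log(n)}\,\|a\|_{V_t^{-1}}$, and since $\theta \in \mathcal{C}_t$ and $\langle a_2, \theta \rangle = 0$, the index of $a_2$ is at most $2c\sqrt{\log(n)}\,\|a_2\|_{V_t^{-1}}$. A numeric sketch of this bound on the chapter's example; the helper `index_upper_bound` and the specific counts are our own illustration, not the book's code:

```python
import math
import numpy as np

# Sketch (our illustration): with the ellipsoidal confidence set C_t, once
# theta lies in C_t the optimistic index of an action a with <a, theta> = 0
# is at most 2 * c * sqrt(log n) * ||a||_{V_t^{-1}}.

def index_upper_bound(a, counts, actions, c, n):
    """Worst-case optimistic index of action a when <a, theta> = 0."""
    V = sum(T * np.outer(b, b) for T, b in zip(counts, actions))
    norm = math.sqrt(a @ np.linalg.solve(V, a))  # ||a||_{V^{-1}}
    return 2 * c * math.sqrt(math.log(n)) * norm

eps, gamma, c, n = 0.1, 10.0, 1.0, 10_000
actions = [np.array([1.0, 0.0]),               # a_1 = e_1
           np.array([0.0, 1.0]),               # a_2 = e_2
           np.array([1 - eps, gamma * eps])]   # a_3 = (1 - eps, gamma * eps)
counts = [9000, 100, 900]  # T_{a_2} = 100 > 4 * c^2 * log(n) ~ 36.8

print(index_upper_bound(actions[1], counts, actions, c, n) < 1)  # True
```

Here the upper bound on the index of $a_2$ has dropped below $\langle a_1, \theta \rangle = 1$, so an optimistic algorithm stops choosing $a_2$, exactly as the argument below requires.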
On the other hand, if $T_{a_2}(t-1) > 4c^2\log(n)$, then
\[
\max_{\tilde\theta \in \mathcal{C}_t} \langle a_2, \tilde\theta \rangle
= \max_{\tilde\theta \in \mathcal{C}_t} \langle a_2, \tilde\theta - \theta \rangle
\le 2c\,\|a_2\|_{V_t^{-1}} \sqrt{\log(n)}
\le 2c\,\sqrt{\frac{\log(n)}{T_{a_2}(t-1)}} < 1\,,
\]
which means that $a_2$ will not be chosen more than $1 + 4c^2\log(n)$ times. So if $\gamma = \Omega(c^2)$, then the optimistic algorithm will not choose $a_2$ sufficiently often, and a simple computation shows it must choose $a_3$ at least $\Omega(\log(n)/\varepsilon^2)$ times