30.4 Follow the perturbed leader 348calculate the Hessian we use a change of variable to avoid applying the gradient
to the non-differentiable argmax:∇^2 F∗(x) =∇(∇F∗(x)) =∇E[a(x+Z)] =∇∫
Rda(x+z)f(z)dz=∇∫
Rda(u)f(u−x)du=∫
Rda(u)(∇f(u−x))>du=
∫
Rda(u) sign(u−x)>f(u−x)du=∫
Rda(x+z) sign(z)>f(z)dz.Using the definition ofξand the fact thata(x) is nonnegative,∇^2 F∗(ξ)ij=∫
Rda(ξ+z)isign(z)jf(z)dz (30.9)≤∫
Rda(ξ+z)if(z)dz=
∫
Rda(z−ηLˆt− 1 −αηYˆt)if(z)dz=∫
Rda(u−ηLˆt− 1 )if(u+αηYˆt)du≤exp(
‖αηYˆt‖ 1)∫
Rda(u−ηLˆt− 1 )if(u)du≤ePti, (30.10)where the last inequality follows sinceα∈[0,1] andYˆti≤β= 1/(mη) andYˆt
has at mostmnonzero entries. Continuing on from Eq. (30.8) we have
η^2
2‖Yˆt‖^2 ∇ (^2) F∗(ξ)≤
eη^2
2
∑d
i=1
PtiYˆti
∑d
j=1
Yˆtj≤eη
2
2
∑d
i=1
∑d
j=1
PtiKtiAtiKtjAtj.
Chaining together the parts and taking the expectation shows that
E[DF(A ̄t,A ̄t+1)]≤
eη
2 E
∑di=1∑dj=1PtiKtiAtiKtjAtj
=eη2
2E
∑di=1∑dj=1AtiAtj
Ptj
≤emdη2
2.
The last step is to control the bias term:
E
[n
∑t=1〈A ̄t−a,yt−Yˆt〉]
≤E
[n
∑t=1〈A ̄t,yt−Yˆt〉]
=E
[n
∑t=1∑di=1Pti(yti−min{Ptiβ,yti})]
≤
dn
2 β=
dnmη
2