Bandit Algorithms

19.3 Regret analysis 231

Soˆθtis an estimate ofθ∗, which makes it natural to chooseCtto be centered at θˆt− 1. For what follows we will simply assume that the confidence setCtis closed and satisfies Ct⊆Et=

{

θ∈Rd:‖θ−θˆt− 1 ‖^2 Vt− 1 ≤βt

}

, (19.6)

where (βt)tis an increasing sequence of constants withβ 1 ≥1. The setEtis an ellipsoid centered atθˆt− 1 and with principle axis being the eigenvectors ofVt with corresponding lengths being the reciprocal of the eigenvalues. Notice that as tgrows the matrixVthas increasing eigenvalues, which means the volume of the ellipse is also shrinking (at least, providedβtdoes not grow too fast). As noted beforehand, the next chapter will be devoted to show thatCt=Etis a natural choice for carefully chosenβt. In the rest of this chapter, we simply examine the consequence of using a confidence set satisfying Eq. (19.6) and assume all the desirable properties.

19.3 Regret analysis

We prove a regret bound for LinUCB under the assumption that the confidence intervals indeed contain the true parameter with high probability and boundedness conditions on the action set and rewards.

Assumption19.1.The following hold:

(a)|〈a,θ∗〉|≤1 for alla∈

⋃n t=1At. (b)‖a‖ 2 ≤Lfor alla∈

⋃n t=1At. (c)There exists aδ∈(0,1) such that with probability 1−δ, for allt∈[n], θ∗∈CtwhereCtsatisfies Eq. (19.6).

Theorem19.2. Under the conditions of Assumption 19.1 with probability 1 −δ the regret of LinUCB satisfies

Rˆn≤

√

8 nβnlog

(

detVn detV 0

)

≤

√

8 dnβnlog

(

dλ+nL^2 dλ

)

.

Theorem 20.5 in the next chapter shows thatβnmay be chosen to be

√ βn=

√

λL+

√

2 log

(

1

δ

)

+dlog

(

dλ+nL^2 dλ

)

.

By choosingδ= 1/nwe obtain the follow corollary bounding the expected regret.

Corollary19.3.Under the conditions of Assumption 19.1 the expected regret of LinUCB withδ= 1/nis bounded by

Rn≤Cd

√

nlog(nL),

whereC > 0 is a suitably large universal constant.

Bandit Algorithms

{

}

, (19.6)

19.3 Regret analysis

√

(

)

≤

√

(

)

.

√

√

(

1

)

(

)

.

√

Get our desktop app

Company

Features

Documentation

Resources