35.5 Computing the Gittins index 423any policyπ,Eπ[∞
∑
t=1αtr(SAt(t))]
≤Eπ[∞
∑
t=1αtGAt(t)]
=Eπ[n
∑t=1αtIt(H,A)]
≤Eπ[n
∑t=1αtI∗t(H)]
,
where the last line follows from Lemma 35.10. Finally note that the law ofH
underPπdoes not depend onπand hence
Eπ[n
∑t=1αtIt∗(H)]
=Eπ∗[n
∑t=1αtIt∗(H)]
Therefore for allπ,
Eπ∗[∞
∑
t=1αtr(SAt(t))]
≥Eπ[∞
∑
t=1αtr(SAt(t))]
,
which completes the proof.
35.5 Computing the Gittins index
We describe a simple approach that depends on the state space being finite.
References to more general methods are given in the bibliographic remarks.
Assume without loss of generality thatS={ 1 , 2 ,...,|S|}andG= 2S. The matrix
form of the transition kernel isP∈[0,1]|S|×|S|and is defined byPij=μ(i,{j}).
We also letr∈[0,1]|S|be the vector of rewards so thatri=r(i). The standard
basis vector isei∈R|S|and 1 ∈R|S|is the vector with 1 in every coordinate.
ForC⊂Swe letQCbe the transition matrix with (QC)ij=PijIC(j). For each
i∈Sour goal is to find
g(i) = sup
τ≥ 2Ei[∑τ− 1
t=1αtr(St)]
Ei[∑τ− 1
t=1αt] ,
whereEiis the expectation with respect the measurePifor which the initial state
isS 1 =i. Lemma 35.7 shows that the stopping timeτ=min{t≥2 :g(St)≤g(i)}
attains the supremum in the above display. The setCi={j:g(j)> g(i)}is
called the continuation region andSi=S\Ciis the stopping region. Then the
Gittins index can be calculated as
g(i) =Ei[∑τ− 1
t=1αtr(St)]
Ei[∑τ− 1
t=1αt] =
∑∞
t=1αte>
iQt− 1
∑ Cir
∞
t=1αte>iQt− 1
Ci^1=
e>i(I−αQCi)−^1 r
e>i(I−αQCi)−^11