The advantage of this model relative to the previous ones is that it offers more flexibility, and yet it is not so flexible that learning is impossible.
32.2 Policy
We now explain the policy for learning to rank when $v$ is unknown but satisfies Assumption 32.1. After the description is an illustration that may prove helpful.
Step 0: Initialization
The policy takes as input a confidence parameter $\delta \in (0, 1)$, together with $\ell$ and $m$. The policy maintains a binary relation $G_t \subseteq [\ell] \times [\ell]$. In the first round $t = 1$ the relation is empty: $G_1 = \emptyset$. You should think of $G_t$ as maintaining pairs $(i, j)$ for which the policy has proven with high probability that $\alpha(i) < \alpha(j)$. Ideally,
\[
G_t \subseteq \{(i, j) \in [\ell] \times [\ell] : \alpha(i) < \alpha(j)\}\,.
\]
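A minimal Python sketch of this bookkeeping, assuming items are labelled $0, \ldots, \ell - 1$ and the relation is stored as a set of pairs; the instance sizes below are hypothetical.

```python
# Hypothetical instance: ell items, m display positions, confidence delta.
ell, m = 10, 5
delta = 0.05          # confidence parameter, delta in (0, 1)

# G_t is stored as a set of pairs (i, j); membership of (i, j) encodes
# that the policy has proven with high probability that alpha(i) < alpha(j).
G = set()             # G_1 is empty: nothing has been proven in round 1
```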
Step 1: Defining a partition
In each round $t$ the learner computes a partition of the actions based on a topological sort according to relation $G_t$. Given $A \subseteq [\ell]$, define $\min_{G_t}(A)$ to be the set of minimal elements of $A$ according to relation $G_t$:
\[
\min\nolimits_{G_t}(A) = \{i \in A : (j, i) \notin G_t \text{ for all } j \in A\}\,.
\]
Then let $P_{t1}, P_{t2}, \ldots$ be the partition of $[\ell]$ defined inductively by
\[
P_{td} = \min\nolimits_{G_t}\left([\ell] \setminus \bigcup_{c=1}^{d-1} P_{tc}\right).
\]
Finally, let $M_t = \max\{d : P_{td} \neq \emptyset\}$. The reader should check that if $G_t$ does not have cycles, then $M_t$ is well defined and finite and that $P_{t1}, \ldots, P_{tM_t}$ is indeed a partition of $[\ell]$ (Exercise 32.5). The event that $G_t$ contains cycles is a failure event. In order for the policy to be well defined, we assume it chooses some arbitrary fixed action in this case.
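The peeling procedure above translates directly into code. A sketch under the same representation as before; the helper names `min_g` and `partition` are hypothetical.

```python
def min_g(G, A):
    """Minimal elements of A: those i that no j in A provably precedes."""
    return {i for i in A if all((j, i) not in G for j in A)}

def partition(G, ell):
    """Blocks P_1, ..., P_M of the topological sort, or None on a cycle."""
    remaining = set(range(ell))
    blocks = []
    while remaining:
        block = min_g(G, remaining)
        if not block:        # no minimal element left: G_t has a cycle,
            return None      # which is the failure event described above
        blocks.append(block)
        remaining -= block
    return blocks
```

For example, with $\ell = 4$ and $G = \{(0, 2), (1, 2), (2, 3)\}$, the call `partition(G, 4)` returns the blocks $\{0, 1\}, \{2\}, \{3\}$.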
Step 2: Choosing an action
Let $I_{t1}, \ldots, I_{tM_t}$ be a partition of $[\ell]$ defined inductively by
\[
I_{td} = \Bigl[\Bigl|\bigcup_{c \leq d} P_{tc}\Bigr|\Bigr] \setminus \Bigl[\Bigl|\bigcup_{c < d} P_{tc}\Bigr|\Bigr]\,.
\]
In words, $I_{td}$ is the set of $|P_{td}|$ consecutive positions immediately following those allotted to $P_{t1}, \ldots, P_{t,d-1}$.
Next let $\Sigma_t \subseteq \mathcal{A}$ be the set of actions $\sigma$ such that $\sigma(I_{td}) = P_{td}$ for all $d \in [M_t]$. The algorithm chooses $A_t$ uniformly at random from $\Sigma_t$. Intuitively, the policy first shuffles the items in $P_{t1}$ and uses these as the first $|P_{t1}|$ entries in the ranking. Then $P_{t2}$ is shuffled and the items are appended to the ranking. This process is repeated until the ranking is complete. For an item $i \in [\ell]$, we denote by $D_{ti}$ the unique index $d$ such that $i \in P_{td}$.
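Sampling uniformly from $\Sigma_t$ does not require enumerating it: shuffling each block independently and concatenating the results already gives a uniform draw. A sketch, assuming a ranking is represented as a list whose $k$-th entry is the item in position $k$; `choose_action` and `block_index` are hypothetical names.

```python
import random

def choose_action(blocks):
    """Sample a ranking uniformly at random from Sigma_t."""
    ranking = []
    for block in blocks:          # blocks = [P_1, ..., P_M] from Step 1
        items = sorted(block)
        random.shuffle(items)     # uniform permutation within the block
        ranking.extend(items)     # these items occupy the slots I_{td}
    return ranking

def block_index(blocks, i):
    """D_{ti}: the index d (1-based) of the block containing item i."""
    return next(d for d, block in enumerate(blocks, start=1) if i in block)
```

Each permutation in $\Sigma_t$ arises from exactly one combination of within-block orderings, so the draw is indeed uniform.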