Bandit Algorithms

(Jeff_L) #1
2.1 Probability spaces and random elements 23

prevents us from definingF= 2Ω. There are two justifications not to do this,
the first technical and the second conceptual. The technical issue is highlighted
by the following surprising theorem, which shows there does not exist a uniform
probability distribution on Ω = [0,1] ifFis chosen to be the powerset of Ω (a
uniform probability distribution over [0,1], if existed, would have the property of
assigning the length to every interval). In other words, if you want to be able
to define the uniform measure, thenFcannot be too large. By contrast, the
uniform measure can be defined on the Borelσ-algebra, though proving this is
not elementary.

Theorem2.3.LetΩ = [0,1]andFis the powerset ofΩ. Then there does not
exist a measurePon(Ω,F)such thatP([a,b]) =b−afor all 0 ≤a≤b≤ 1.


A second technical reason to prefer the measure-theoretic approach to
probabilities is that this approach allows for the unification of distributions
on discrete spaces and densities on continuous ones (the uninitiated reader will
find the definitions of these later). This unification can be necessary when dealing
with random variables that combine elements of both. For example, when dealing
with a random variable that is zero with probability 1/2 and otherwise behaves
like a standard Gaussian.
The main conceptual reason not to focus exclusively on the case whereFis
the powerset is thatσ-algebras are a way of representing information. This is
especially useful in the study of bandits where the learner is interacting with an
environment and slowly gaining knowledge. One useful way to represent this idea
is by the means of a sequence ofσ-algebras as we explain in the next section. One
might be worried that the Borelσ-algebra does not contain enough measurable
sets. Rest assured that this is not a problem and you will not easily find a
non-measurable set. For completeness, an example will still be given in the notes,
along with a little more discussion on this topic.


From laws to probability spaces and random variables
A big ‘conspiracy’ in probability theory is that probability spaces are seldom
mentioned in theorem statements, despite the fact that a measure cannot be
defined without one. Statements are instead given in terms of random elements
and constraints on their joint probabilities. For example, suppose thatXandY
are random variables such that


P(X∈A,Y∈B) =

|A∩[6]|


6


·


|B∩[2]|


2


for allA,B∈B(R), (2.1)

which represents the joint distribution for the values of a dice (X∈[6]) and coin
(Y∈[2]). The formula describes some constraints on the probabilistic interactions
between the outputs ofXandY, but says nothing about their domain. In a way,
the domain is an unimportant detail. Nevertheless, onemustask whether or not
an appropriate domain exists at all. More generally, one may ask whether an
appropriate probability space exists given some constraints on the joint law of a

Free download pdf