STATISTICS
therefore consider the x_i as a set of N random variables. In the most general case,
these random variables will be described by some N-dimensional joint probability
density function P(x_1, x_2, ..., x_N).§ In other words, an experiment consisting of N
measurements is considered as a single random sample from the joint distribution
(or population) P(x), where x denotes a point in the N-dimensional data space
having coordinates (x_1, x_2, ..., x_N).
The situation is simplified considerably if the sample values x_i are independent.
In this case, the N-dimensional joint distribution P(x) factorises into the product
of N one-dimensional distributions,

P(x) = P(x_1) P(x_2) ··· P(x_N).    (31.1)
In the general case, each of the one-dimensional distributions P(x_i) may be
different. A typical example of this occurs when N independent measurements
are made of some quantity x but the accuracy of the measuring procedure varies
between measurements.
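As a minimal sketch of this situation, the snippet below draws N independent measurements of a single quantity, each from its own Gaussian P(x_i) with a different standard deviation. The true value and the list of accuracies are illustrative assumptions, not quantities from the text.

```python
import random

random.seed(0)  # fixed seed so the sketch is reproducible

true_x = 10.0                   # assumed value of the measured quantity x
sigmas = [0.1, 0.5, 0.2, 1.0]   # accuracy varies between measurements

# Each x_i is drawn from its own one-dimensional distribution P(x_i);
# the joint density still factorises as in (31.1) because the draws
# are independent, even though the factors differ.
sample = [random.gauss(true_x, s) for s in sigmas]
print(sample)
```

Here the sample is independent but not identically distributed, which is exactly the "general case" described above.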
It is often the case, however, that each sample value x_i is drawn independently
from the same population. In this case, P(x) is of the form (31.1), but, in addition,
P(x_i) has the same form for each value of i. The measurements x_1, x_2, ..., x_N
are then said to form a random sample of size N from the one-dimensional
population P(x). This is the most common situation met in practice and, unless
stated otherwise, we will assume from now on that this is the case.
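The factorisation (31.1) for a random sample can be checked numerically. The sketch below assumes a standard normal population P(x) and evaluates the joint density of a sample as the product of identical one-dimensional factors; the sample point itself is an arbitrary illustrative choice.

```python
import math

def normal_pdf(x, mu=0.0, sigma=1.0):
    # One-dimensional Gaussian density, playing the role of the
    # common population P(x) (an assumption for this sketch).
    return math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) / (
        sigma * math.sqrt(2 * math.pi)
    )

# For a random sample of size N, equation (31.1) says the joint density
# P(x) is the product of N copies of the same one-dimensional P(x_i).
point = [0.3, -1.2, 0.7]
joint = math.prod(normal_pdf(xi) for xi in point)
print(joint)
```

Because every factor has the same functional form, only the population P(x), not N separate distributions, needs to be specified.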
31.2 Sample statistics
Suppose we have a set of N measurements x_1, x_2, ..., x_N. Any function of these
measurements (that contains no unknown parameters) is called a sample statistic,
or often simply a statistic. Sample statistics provide a means of characterising the
data. Although the resulting characterisation is inevitably incomplete, it is useful
to be able to describe a set of data in terms of a few pertinent numbers. We now
discuss the most commonly used sample statistics.
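To make the definition concrete, the sketch below implements two familiar sample statistics as functions of the measurements alone, with no unknown population parameters; the data values are hypothetical.

```python
def sample_mean(xs):
    # A statistic: depends only on the measurements themselves.
    return sum(xs) / len(xs)

def sample_variance(xs):
    # Also a statistic: uses the sample mean, not any unknown
    # population parameter.
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / len(xs)

data = [188.7, 204.7, 193.2, 169.0, 168.1]  # hypothetical measurements
print(sample_mean(data))      # 184.74
print(sample_variance(data))  # 202.0584
```

By contrast, a quantity such as sum(x_i - mu) for an unknown population mean mu is not a statistic, since it cannot be computed from the data alone.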
§ In this chapter, we will adopt the common convention that P(x) denotes the particular probability
density function that applies to its argument, x. This obviates the need to use a different letter
for the PDF of each new variable. For example, if X and Y are random variables with different
PDFs, then properly one should denote these distributions by f(x) and g(y), say. In our shorthand
notation, these PDFs are denoted by P(x) and P(y), where it is understood that the functional
form of the PDF may be different in each case.