
Descriptive Statistics


population or just a sample from that population. The key numbers when
dealing with populations are called parameters, while we refer to statis-
tics when we observe only a sample. Parameters are commonly denoted by
Greek letters while statistics are usually assigned Roman letters.
The difference between the two is that parameters hold for the entire
population or universe of data and hence remain constant, whereas statistics
may vary from sample to sample even though every sample is drawn from the
very same population. This is
easily understood using the following example. Consider the average return
of all stocks listed in the S&P 500 index during a particular year. This
quantity is a parameter, denoted μ for example, since it is computed from all
of these stocks. If one
randomly selects 10 stocks included in the S&P 500, however, one may end
up with an average return for this sample that deviates from the popula-
tion average, μ. The reason would be that by chance one has picked stocks
that do not represent the population very well. For example, one might by
chance select the top 10 performing stocks included in the S&P 500. Their
returns will yield an average (statistic) that is above the average of all 500
stocks (parameter). The opposite case arises if one happens to pick the 10
worst performers. In general, deviations of the statistics from the parameters
result from which observations happen to end up in the sample.
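
The distinction between parameter and statistic can be illustrated with a small simulation. The sketch below is only an illustration under stated assumptions: the 500 "returns" are made-up numbers drawn from a normal distribution, not actual S&P 500 data, and the names population, mu, and sample_means are hypothetical.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is reproducible

# Hypothetical population: 500 made-up annual returns (NOT real S&P 500 data).
population = [random.gauss(0.08, 0.20) for _ in range(500)]

# Parameter: computed from every member of the population, hence a constant.
mu = statistics.fmean(population)

# Statistics: means of random samples of 10 stocks; they vary from draw to draw.
sample_means = [statistics.fmean(random.sample(population, 10)) for _ in range(5)]

# Extreme samples: the 10 worst and the 10 best performers.
ranked = sorted(population)
worst_mean = statistics.fmean(ranked[:10])
best_mean = statistics.fmean(ranked[-10:])

print(f"population mean (parameter): {mu:.4f}")
for i, m in enumerate(sample_means, start=1):
    print(f"sample {i} mean (statistic):  {m:.4f}")
print(f"mean of 10 worst performers: {worst_mean:.4f}")
print(f"mean of 10 best performers:  {best_mean:.4f}")
```

Each random sample yields its own mean, while mu stays fixed; the means of the 10 worst and 10 best performers lie below and above mu, respectively.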


Center and Location


The measures we present first are those revealing the center and the location
of the data. The center and location are expressed by three different mea-
sures: mean, mode, and median.
The mean is the quantity given by the sum of all values divided by the
size of the data set. The size is the number of values or observations. The
mode is the value that occurs most often in a data set. If the distribution of
some population or the empirical distribution of some sample is known,
the mode can be determined to be the value corresponding to the highest
frequency. Roughly speaking, the median divides data by value into a lower
half and an upper half. A more rigorous definition for the median is that we
require that at least half of the data are no greater and at least half of the
data are no smaller than the median itself.
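
The three measures just defined can be computed as in the following minimal sketch. The data values are hypothetical and chosen only for illustration; the final check mirrors the more rigorous definition of the median given above.

```python
import statistics

# Hypothetical observations, chosen only for illustration.
data = [2, 3, 3, 5, 7, 10]
n = len(data)  # size of the data set

# Mean: sum of all values divided by the size of the data set.
mean = sum(data) / n

# Mode: the value that occurs most often.
mode = statistics.mode(data)

# Median: divides the data by value into a lower and an upper half.
median = statistics.median(data)

# Rigorous check: at least half of the data are no greater than the median,
# and at least half are no smaller than the median.
no_greater = sum(1 for x in data if x <= median)
no_smaller = sum(1 for x in data if x >= median)
median_ok = 2 * no_greater >= n and 2 * no_smaller >= n

print(f"mean = {mean}, mode = {mode}, median = {median}, median check: {median_ok}")
```

For this data set the mean is 5.0, the mode is 3, and the median is 4.0, so the three measures need not coincide.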
The interpretation of the mean is as follows: the mean indicates the value
about which the data are scattered. Moreover, on average, one should expect
a data value equal to the mean when selecting an observation at random.
However, summarizing the data by the mean alone incurs a significant loss of
information. Given a certain data size, a particular mean can be obtained
from different values. One extreme would be that all values are equal to
the mean. The other extreme could be that half of the observations are
