Also define a cumulative frequency function F(x) for the sample, sometimes referred to as a sample distribution function. The cumulative frequency function is defined by
$$F(x) = \sum_{t \le x} f(t) = \text{sum of the relative frequencies of all values less than or equal to } x$$
When the data take on too many distinct numerical values, one usually defines
class intervals and class midpoints with class frequencies, as in table 11.3. This is
called grouping of the data, and the corresponding frequency function and cumulative
frequency function are then associated with the grouped data.
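The grouping step can be sketched in a few lines of Python. This is only an illustration: the function name `group_data`, the sample values, and the choice of equal-width, half-open class intervals [a, b) are assumptions made here and are not taken from table 11.3.

```python
def group_data(sample, low, width, n_classes):
    """Group data into n_classes equal-width class intervals [a, a + width).

    Assumption: intervals are half-open, so a value equal to an upper
    boundary is counted in the next class.
    """
    classes = []
    for i in range(n_classes):
        a = low + i * width
        b = a + width
        count = sum(1 for x in sample if a <= x < b)
        classes.append({
            "interval": (a, b),
            "midpoint": (a + b) / 2,   # class midpoint
            "frequency": count,        # class frequency
        })
    return classes

# Hypothetical data, chosen only for illustration.
sample = [12, 15, 21, 22, 28, 31, 33, 34, 39, 45]
groups = group_data(sample, low=10, width=10, n_classes=4)
for g in groups:
    print(g["interval"], g["midpoint"], g["frequency"])
```

Dividing each class frequency by the total number of data points gives the relative frequency function of the grouped data.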
The relative frequency distribution f(x) is also called a discrete probability distribution for the sample, and the cumulative relative frequency function F(x), or distribution function, represents a probability. In particular,
$$\begin{aligned}
F(x) &= P(X \le x) = \text{Probability that population variable } X \text{ is less than or equal to } x \\
1 - F(x) &= P(X > x) = \text{Probability that population variable } X \text{ is greater than } x
\end{aligned} \tag{11.2}$$
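As a rough illustration of the definitions of f(x), F(x) and 1 − F(x), the following Python sketch computes them for a small sample. The helper names `relative_frequencies` and `cumulative_frequency` and the sample values are hypothetical and introduced only for this example.

```python
from collections import Counter

def relative_frequencies(sample):
    """Map each distinct value x to its relative frequency f(x) = count / N."""
    n = len(sample)
    counts = Counter(sample)
    return {x: counts[x] / n for x in sorted(counts)}

def cumulative_frequency(sample, x):
    """F(x): sum of f(t) over all distinct sample values t <= x."""
    f = relative_frequencies(sample)
    return sum(f_t for t, f_t in f.items() if t <= x)

# Hypothetical sample, chosen only for illustration.
sample = [2, 3, 3, 5, 5, 5, 7, 8]
F_5 = cumulative_frequency(sample, 5)
print("F(5)     =", F_5)        # estimate of P(X <= 5)
print("1 - F(5) =", 1 - F_5)    # estimate of P(X > 5)
```

For this sample, 6 of the 8 values are less than or equal to 5, so F(5) = 0.75 and 1 − F(5) = 0.25.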
Arithmetic Mean or Sample Mean
Given a set of data points $X_1, X_2, \ldots, X_N$, define the arithmetic mean or sample
mean of the data set by
$$\text{sample mean} = \bar{X} = \frac{X_1 + X_2 + \cdots + X_N}{N} = \frac{\sum_{j=1}^{N} X_j}{N} \tag{11.3}$$
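Equation (11.3) translates directly into code. The sketch below is a minimal illustration; the function name `sample_mean` and the data values are assumptions made for the example.

```python
def sample_mean(data):
    """Arithmetic mean of equation (11.3): sum of the data points divided by N."""
    if not data:
        raise ValueError("sample mean is undefined for an empty data set")
    return sum(data) / len(data)

# Hypothetical data points X_1, ..., X_5.
data = [4, 8, 6, 5, 7]
x_bar = sample_mean(data)
print(x_bar)  # 30 / 5 = 6.0
```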
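If the frequencies of the data points are known, say $X_1, X_2, \ldots, X_k$ occur with frequencies $\tilde{f}_1, \tilde{f}_2, \ldots, \tilde{f}_k$, then the arithmetic mean is calculated as
$$\bar{X} = \frac{\tilde{f}_1 X_1 + \tilde{f}_2 X_2 + \cdots + \tilde{f}_k X_k}{\tilde{f}_1 + \tilde{f}_2 + \cdots + \tilde{f}_k} = \frac{\sum_{j=1}^{k} \tilde{f}_j X_j}{\sum_{j=1}^{k} \tilde{f}_j} = \frac{\sum_{j=1}^{k} \tilde{f}_j X_j}{N} \tag{11.4}$$
where $N = \sum_{j=1}^{k} \tilde{f}_j$ is the total number of data points.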
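The frequency form (11.4) gives the same result as averaging the full list of data points. The following sketch checks this on a small example; the function name `mean_from_frequencies` and the values and frequencies used are hypothetical.

```python
def mean_from_frequencies(values, freqs):
    """Arithmetic mean of equation (11.4): sum of f_j * X_j divided by N = sum of f_j."""
    if len(values) != len(freqs):
        raise ValueError("values and freqs must have the same length")
    n = sum(freqs)  # total number of data points N
    if n == 0:
        raise ValueError("total frequency must be positive")
    return sum(f * x for x, f in zip(values, freqs)) / n

# Hypothetical distinct values X_j and their frequencies.
values = [1, 2, 3, 4]
freqs = [2, 3, 4, 1]
x_bar = mean_from_frequencies(values, freqs)

# The same data written out point by point, as in equation (11.3).
expanded = [x for x, f in zip(values, freqs) for _ in range(f)]
print(x_bar, sum(expanded) / len(expanded))  # both are 24 / 10 = 2.4
```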
Note that the finite data collected are used to calculate an estimate of the true population mean μ associated with the total population.
Median, Mode and Percentiles
After arranging the data from low to high, the median of the data set is the
middle value when the number of data points is odd, or the arithmetic mean of the
two middle values when it is even. This value divides the data set into two parts
containing equal numbers of points. In a similar fashion, find those points
which divide the data set, arranged in order of magnitude, into four equal parts.
These values are usually denoted $Q_1, Q_2, Q_3$ and are called the first, second and third