Robert V. Hogg, Joseph W. McKean, Allen T. Craig

Some Elementary Statistical Inferences

where $g_k(y_k)$ is the pdf of $Y_k$ given in expression (4.4.2). If, in this integral, we
make a change of variables through the transformation $z = F(y_k)$, we have


$$E(F(Y_k)) = \int_0^1 \frac{n!}{(k-1)!\,(n-k)!}\, z^k (1-z)^{n-k}\, dz.$$

Comparing this to the integral of a beta pdf, we see that it is equal to

$$E(F(Y_k)) = \frac{n!\,k!\,(n-k)!}{(k-1)!\,(n-k)!\,(n+1)!} = \frac{k}{n+1}.$$

On the average, there is $k/(n+1)$ of the total area to the left of $Y_k$. Because
$p \doteq k/(n+1)$, it seems reasonable to take $Y_k$ as an estimator of the quantile $\xi_p$.
Hence, we call $Y_k$ the $p$th sample quantile. It is also called the $100p$th percentile
of the sample.
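
As a quick check of the result $E(F(Y_k)) = k/(n+1)$, the following Python sketch evaluates the factorial ratio exactly and also runs a small simulation. It assumes a Uniform(0, 1) population, for which $F(y) = y$, so that $F(Y_k)$ is simply the $k$th order statistic. The function names, the seed, and the values of n and k are illustrative choices, not part of the text.

```python
import random
from fractions import Fraction
from math import factorial


def expected_area_exact(n, k):
    """Evaluate n! k! (n-k)! / ((k-1)! (n-k)! (n+1)!) as an exact fraction."""
    numerator = factorial(n) * factorial(k) * factorial(n - k)
    denominator = factorial(k - 1) * factorial(n - k) * factorial(n + 1)
    return Fraction(numerator, denominator)


def expected_area_simulated(n, k, reps=20000, seed=1):
    """Average F(Y_k) over simulated samples.

    Assumption: the population is Uniform(0, 1), so F(y) = y and
    F(Y_k) equals Y_k itself.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        sample = sorted(rng.random() for _ in range(n))
        total += sample[k - 1]  # Y_k is the k-th smallest value (1-based)
    return total / reps


if __name__ == "__main__":
    n, k = 9, 3
    print(expected_area_exact(n, k))      # exact value, equal to k/(n+1)
    print(expected_area_simulated(n, k))  # should be close to k/(n+1)
```

The exact version confirms the algebra for any admissible n and k, while the simulated average illustrates the "on the average" interpretation.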


Remark 4.4.2. Some statisticians define sample quantiles slightly differently from
what we have. For one modification with $1/(n+1) < p < n/(n+1)$, if $(n+1)p$
is not equal to an integer, then the $p$th quantile of the sample may be defined as
follows. Write $(n+1)p = k + r$, where $k = [(n+1)p]$ is the greatest integer in
$(n+1)p$ and $r$ is a proper fraction. Then the $p$th quantile of the sample is the weighted
average
$$(1-r)Y_k + rY_{k+1}, \tag{4.4.5}$$
which is an estimator of the $p$th quantile. As $n$ becomes large, however, all these
modified definitions are essentially the same. For R code, let the R vector x contain
the realization of the sample. Then the call quantile(x, p) computes a $p$th quantile
of form (4.4.5).
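
The weighted average (4.4.5) can be sketched in a few lines of Python. This is a minimal illustration under the remark's stated condition on p; the function name sample_quantile is hypothetical. R's quantile supports several interpolation rules, so this sketch implements only (4.4.5) as written and may not reproduce the default output of quantile.

```python
import math


def sample_quantile(x, p):
    """Return (1 - r) * Y_k + r * Y_{k+1}, where (n + 1) * p = k + r."""
    y = sorted(x)                      # order statistics Y_1 <= ... <= Y_n
    n = len(y)
    if not 1 / (n + 1) < p < n / (n + 1):
        raise ValueError("p must satisfy 1/(n+1) < p < n/(n+1)")
    h = (n + 1) * p
    k = math.floor(h)                  # k = [(n + 1) p], the integer part
    r = h - k                          # fractional part, 0 <= r < 1
    # y[k - 1] is Y_k and y[k] is Y_{k+1} because Python lists are 0-based.
    return (1 - r) * y[k - 1] + r * y[k]


if __name__ == "__main__":
    data = [7, 1, 5, 3, 9, 2, 8, 4, 6]   # illustrative data, n = 9
    print(sample_quantile(data, 0.25))   # (n+1)p = 2.5, halfway between Y_2 and Y_3
    print(sample_quantile(data, 0.50))   # (n+1)p = 5, so r = 0 and the result is Y_5
```

When $(n+1)p$ is an integer, $r = 0$ and the formula reduces to the order statistic $Y_k$.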


Sample quantiles are useful descriptive statistics. For instance, if $y_k$ is the $p$th
quantile of the realized sample, then we know that approximately $100p\%$ of the data
are less than or equal to $y_k$ and approximately $100(1-p)\%$ of the data are greater
than or equal to $y_k$. Next we discuss two statistical applications of quantiles.
A five-number summary of the data consists of the following five sample quantiles:
the minimum ($Y_1$), the first quartile ($Y_{.25(n+1)}$), the median defined in
expression (4.4.4), the third quartile ($Y_{.75(n+1)}$), and the maximum ($Y_n$). For this section,
we use the notation $Q_1$, $Q_2$, and $Q_3$ to denote, respectively, the first quartile,
median, and third quartile of the sample.
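
A five-number summary built from the positions $.25(n+1)$, $.5(n+1)$, and $.75(n+1)$ might be computed as in the sketch below. Two assumptions are made here that the text does not spell out: a fractional position $k + r$ is resolved by the weighted average $(1-r)Y_k + rY_{k+1}$, and the median at position $.5(n+1)$ is taken to agree with expression (4.4.4), which is not reproduced in this section. The function names are hypothetical.

```python
import math


def order_stat_at(y, h):
    """Value at position h = k + r in the sorted list y.

    Assumption: fractional positions are interpolated as
    (1 - r) * Y_k + r * Y_{k+1}.
    """
    k = math.floor(h)
    r = h - k
    if r == 0:
        return float(y[k - 1])         # integer position: exactly Y_k
    return (1 - r) * y[k - 1] + r * y[k]


def five_number_summary(x):
    """Return (minimum, Q1, Q2, Q3, maximum)."""
    y = sorted(x)
    n = len(y)
    if n < 3:
        raise ValueError("need at least 3 observations")
    q1, q2, q3 = (order_stat_at(y, p * (n + 1)) for p in (0.25, 0.5, 0.75))
    return (y[0], q1, q2, q3, y[-1])


if __name__ == "__main__":
    # Illustrative data with n = 15, so the quartile positions 4, 8, 12 are integers.
    print(five_number_summary(range(1, 16)))
```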
The five-number summary divides the data into their quartiles, offering a simple
and easily interpretable description of the data. Five-number summaries were
made popular by the work of the late Professor John Tukey [see Tukey (1977) and
Mosteller and Tukey (1977)]. Tukey used the median of the lower half of the data
(from minimum to median) and the median of the upper half of the data instead
of the first and third quartiles. He referred to these quantities as the hinges of
the data. The R function fivenum(x) returns the hinges along with the minimum,
median, and maximum of the data.
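
The hinge construction described above can be sketched in Python as follows. It assumes that when $n$ is odd the median itself belongs to both halves ("from minimum to median"). The function names are hypothetical, and the sketch is not claimed to reproduce fivenum in every case.

```python
def median_sorted(y):
    """Median of an already sorted, nonempty list."""
    n = len(y)
    mid = n // 2
    if n % 2 == 1:
        return float(y[mid])
    return (y[mid - 1] + y[mid]) / 2


def tukey_five_numbers(x):
    """Return (minimum, lower hinge, median, upper hinge, maximum)."""
    y = sorted(x)
    n = len(y)
    if n == 0:
        raise ValueError("need at least one observation")
    # Assumption: for odd n the median is included in both halves.
    lower = y[: (n + 1) // 2]          # minimum through median
    upper = y[n // 2:]                 # median through maximum
    return (y[0], median_sorted(lower), median_sorted(y),
            median_sorted(upper), y[-1])


if __name__ == "__main__":
    print(tukey_five_numbers(range(1, 10)))   # illustrative data 1, 2, ..., 9
```

For the data 1, 2, ..., 9 the hinges are 3 and 7, whereas the position rule $.25(n+1) = 2.5$ places the first quartile halfway between $Y_2$ and $Y_3$, at 2.5. The two conventions can therefore differ for small samples.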


Example 4.4.4. The following data are the ordered realizations of a random sample
of size 15 on a random variable $X$.
