Robert V. Hogg, Joseph W. McKean, Allen T. Craig

4.1. Sampling and Statistics 231

Next, suppose that the space of $X$ is infinite, say, $D = \{a_1, a_2, \ldots\}$. In practice,
we select a value, say, $a_m$, and make the groupings

$$\{a_1\}, \{a_2\}, \ldots, \{a_m\}, \quad \tilde{a}_{m+1} = \{a_{m+1}, a_{m+2}, \ldots\}. \tag{4.1.12}$$

Let $\hat{p}(\tilde{a}_{m+1})$ be the proportion of sample items that are greater than or equal
to $a_{m+1}$. Then the estimates $\{\hat{p}(a_1), \ldots, \hat{p}(a_m), \hat{p}(\tilde{a}_{m+1})\}$ form our estimate of
$p(x)$. For the merging of groups, a rule of thumb is to select $m$ so that the
frequency of the category $a_m$ exceeds twice the combined frequencies of the categories
$a_{m+1}, a_{m+2}, \ldots$.
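
The grouping in (4.1.12) and the rule of thumb can be sketched in R as follows. This is only an illustration under stated assumptions: the sample `x` is made up, the support is assumed to be the nonnegative integers 0, 1, 2, ..., and the helper `group_counts` and the variable `cutoff` (playing the role of $a_m$) are hypothetical names, not from the text.

```r
# Made-up sample from a count-valued variable (assumed support 0, 1, 2, ...)
x <- c(0, 1, 1, 2, 0, 3, 1, 2, 2, 1, 0, 3, 1, 2, 3, 5)

# Hypothetical helper: count each value 0, ..., cutoff separately and
# lump every value above cutoff into one tail category
group_counts <- function(x, cutoff) {
  labels <- c(as.character(0:cutoff), paste0(">=", cutoff + 1))
  grp <- ifelse(x <= cutoff, as.character(x), labels[cutoff + 2])
  table(factor(grp, levels = labels))
}

cutoff <- 3
counts <- group_counts(x, cutoff)
phat <- counts / length(x)   # estimated pmf over the grouped categories
print(round(phat, 4))

# Rule of thumb: frequency of the last single category should exceed
# twice the combined frequency of the tail category
n_last <- as.vector(counts)[cutoff + 1]
n_tail <- as.vector(counts)[cutoff + 2]
rule_ok <- n_last > 2 * n_tail
print(rule_ok)
```

For this sample the category 3 has frequency 3 and the tail has frequency 1, so the rule of thumb is satisfied at `cutoff = 3`.
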
A histogram is a barplot of $\hat{p}(a_j)$ versus $a_j$. There are two cases to consider. For
the first case, suppose the values $a_j$ represent qualitative categories, for example,
hair colors of a population of people. In this case, there is no ordinal information
in the $a_j$s. The usual histogram for such data consists of nonabutting bars with
heights $\hat{p}(a_j)$ that are plotted in decreasing order of the $\hat{p}(a_j)$s. Such histograms
are usually called bar charts. An example is helpful here.


Example 4.1.5 (Hair Color of Scottish Schoolchildren). Kendall and Stuart
(1979) present data on the eye and hair color of Scottish schoolchildren in the
early 1900s. The data are also in the file scotteyehair.rda at the site listed in the
Preface. In this example, we consider hair color. The discrete random variable is
the hair color of a Scottish child with categories fair, red, medium, dark, and black.
The results that Kendall and Stuart present are based on a sample of $n = 22{,}361$
Scottish schoolchildren. The frequency distribution of this sample and the estimate
of the pmf are


                  Fair    Red     Medium  Dark    Black
Count             5789    1319    9418    5678    157
$\hat{p}(a_j)$    0.259   0.059   0.421   0.254   0.007

The bar chart of this sample is shown in Figure 4.1.1. Assume that the counts
(second row of the table) are in the R vector vec. Then the following R segment
computes this bar chart:
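# vec holds the counts in table order: Fair, Red, Medium, Dark, Black
n <- sum(vec); vecs <- sort(vec, decreasing = TRUE) / n
# labels are listed in decreasing order of count so that they match vecs
nms <- c("Medium", "Fair", "Dark", "Red", "Black")
barplot(vecs, beside = TRUE, names.arg = nms, ylab = "", xlab = "Haircolor")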
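
As a self-contained variant, the counts from the table can be entered as a named vector, so that the labels travel with the values when they are sorted and need not be typed in sorted order by hand. This is a sketch, not the book's segment; storing the category names with `names` is an assumption about how `vec` might be built.

```r
# Counts from the table, entered as a named vector (an assumed construction)
vec <- c(Fair = 5789, Red = 1319, Medium = 9418, Dark = 5678, Black = 157)

n <- sum(vec)        # sample size
phat <- vec / n      # estimated pmf, in table order
print(round(phat, 3))

# Sorting keeps the names attached, and barplot uses them as bar labels
vecs <- sort(phat, decreasing = TRUE)
barplot(vecs, ylab = "", xlab = "Haircolor")
```

The rounded proportions agree with the third row of the table, and `sum(vec)` recovers the sample size of 22,361.
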

For the second case, assume that the values in the space $D$ are ordinal in nature;
i.e., the natural ordering of the $a_j$s is numerically meaningful. In this case, the usual
histogram is an abutting bar chart with heights $\hat{p}(a_j)$ that are plotted in the natural
order of the $a_j$s, as in the following example.


Example 4.1.6 (Simulated Poisson Variates). The following 30 data points are
simulated values drawn from a Poisson distribution with mean $\lambda = 2$; see Example
4.8.2 for the generation of Poisson variates.


2 1 1 1 1 5 1 1 3 0 2 1 1 3 4
2 1 2 2 6 5 2 3 2 4 1 3 1 3 0
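
For ordinal data of this kind, the frequency table and an abutting bar chart can be sketched in R as follows. The sketch assumes the 30 values are read row by row into a vector named `x`; the axis label and the use of `space = 0` to make the bars abut are illustrative choices, not taken from the text.

```r
# The 30 simulated values, read row by row (vector name x is an assumption)
x <- c(2, 1, 1, 1, 1, 5, 1, 1, 3, 0, 2, 1, 1, 3, 4,
       2, 1, 2, 2, 6, 5, 2, 3, 2, 4, 1, 3, 1, 3, 0)

# Count every value from 0 up to the sample maximum, including empty cells
tab <- table(factor(x, levels = 0:max(x)))
phat <- tab / length(x)   # estimated pmf in the natural order 0, 1, 2, ...
print(tab)
print(round(phat, 3))

# space = 0 removes the gaps between bars, giving an abutting bar chart
barplot(phat, space = 0, xlab = "Value", ylab = "")
```

Using `factor` with explicit levels keeps the categories in their natural numerical order and would retain any value between 0 and the maximum that happened not to occur in the sample.
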