Robert V. Hogg, Joseph W. McKean, Allen T. Craig

238 Some Elementary Statistical Inferences

(d) Obtain the sample mean and standard deviation and on the histogram overlay
the normal pdf with these estimates as parameters, using mris=sort(mri) and
lines(dnorm(mris,mean(mris),sd(mris))~mris,lty=2). Comment on the
fit.

(e) Determine the proportions of the data within 1 and 2 standard deviations of
the sample mean and compare these with the empirical rule.

4.1.11. This is a famous data set on the speed of light recorded by the scientist
Simon Newcomb. The data set was obtained at the Carnegie Mellon site given in
Exercise 4.1.10 and it can also be found in the rda file speedlight.rda at the sites
referenced in the Preface. Stigler (1977) presents an informative discussion of this
data set.


(a) Load the rda file and type the command print(speed). As Stigler notes, the
data values $\times 10^{-3} + 24.8$ are Newcomb’s data values; hence, negative items
can occur. Also, in the unit of the data the “true value” is 33.02. Discuss the
data.

(b) Obtain a histogram of the data. Comment on the shape.

(c) On the histogram overlay the default density estimator. Comment on the
shape.

(d) Obtain the sample mean and standard deviation and on the histogram overlay
the normal pdf with these estimates as parameters. Comment on the fit.

(e) Determine the proportions of the data within 1 and 2 standard deviations of
the sample mean and compare these with the empirical rule.
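
The computation asked for in part (e) can be sketched in R as follows. This is
only an illustration under stated assumptions, not a solution to the exercise:
the vector x is simulated normal data standing in for the actual data set, and
prop_within is a hypothetical helper name introduced here. For roughly normal
data, the empirical rule suggests proportions near 68% and 95%.

```r
# Illustrative sketch only: x is SIMULATED data (an assumption), not the
# exercise's data set; replace it with the data vector loaded from the rda file.
set.seed(411)                          # arbitrary seed, for reproducibility
x <- rnorm(1000, mean = 25, sd = 5)    # hypothetical stand-in sample

xbar <- mean(x)                        # sample mean
s    <- sd(x)                          # sample standard deviation

# Hypothetical helper: proportion of observations within k sample standard
# deviations of the sample mean.
prop_within <- function(x, k) {
  mean(abs(x - mean(x)) <= k * sd(x))
}

p1 <- prop_within(x, 1)                # compare with about 0.68
p2 <- prop_within(x, 2)                # compare with about 0.95
print(c(within_1sd = p1, within_2sd = p2))
```

Proportions that differ noticeably from these benchmarks are one indication of
skewness, heavy tails, or outliers in the data.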

4.2 Confidence Intervals


Let us continue with the statistical problem that we were discussing in Section
4.1. Recall that the random variable of interest $X$ has density $f(x;\theta)$,
$\theta \in \Omega$, where $\theta$ is unknown. In that section, we discussed
estimating $\theta$ by a statistic $\hat{\theta} = \hat{\theta}(X_1, \ldots, X_n)$,
where $X_1, \ldots, X_n$ is a sample from the distribution of $X$. When
the sample is drawn, it is unlikely that the value of $\hat{\theta}$ is the true
value of the parameter. In fact, if $\hat{\theta}$ has a continuous distribution,
then $P_\theta(\hat{\theta} = \theta) = 0$, where the notation $P_\theta$ denotes
that the probability is computed when $\theta$ is the true parameter.
What is needed is an estimate of the error of the estimation; i.e., by how much did
$\hat{\theta}$ miss $\theta$? In this section, we embody this estimate of error in
terms of a confidence interval, which we now formally define:


Definition 4.2.1 (Confidence Interval). Let $X_1, X_2, \ldots, X_n$ be a sample on
a random variable $X$, where $X$ has pdf $f(x;\theta)$, $\theta \in \Omega$. Let
$0 < \alpha < 1$ be specified. Let $L = L(X_1, X_2, \ldots, X_n)$ and
$U = U(X_1, X_2, \ldots, X_n)$ be two statistics. We say that the interval $(L, U)$
is a $(1-\alpha)100\%$ confidence interval for $\theta$ if

$$1 - \alpha = P_\theta[\theta \in (L, U)]. \tag{4.2.1}$$
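
The probability in (4.2.1) refers to the random endpoints L and U, not to the
fixed parameter. A small simulation can make this concrete. The sketch below
rests on assumptions that are not part of the definition: the data are taken to
be normal, the parameter of interest is the mean, and the interval used is the
Student t interval. The values of mu, sigma, n, and nsims are hypothetical
choices made only for illustration.

```r
# Monte Carlo sketch of the coverage statement (4.2.1).
# Assumptions (not from the definition): N(mu, sigma^2) samples and the
# Student t interval for the mean; all numeric settings are hypothetical.
set.seed(421)      # arbitrary seed, for reproducibility
mu    <- 10        # the "true" parameter, known here only because we simulate
sigma <- 2
n     <- 20
alpha <- 0.05
nsims <- 10000

covered <- logical(nsims)
for (i in seq_len(nsims)) {
  x    <- rnorm(n, mean = mu, sd = sigma)
  half <- qt(1 - alpha / 2, df = n - 1) * sd(x) / sqrt(n)
  L    <- mean(x) - half           # lower endpoint, a statistic
  U    <- mean(x) + half           # upper endpoint, a statistic
  covered[i] <- (L < mu) && (mu < U)
}

coverage <- mean(covered)          # proportion of intervals that trap mu
print(coverage)
```

Under these assumptions the proportion of simulated intervals that contain mu
should be close to 1 − α = 0.95, up to simulation error.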