240 Some Elementary Statistical Inferences
In Example 4.1.3, we presented a data set on sulfur dioxide concentrations in a
damaged Bavarian forest. Let $\mu$ denote the true mean sulfur dioxide concentration.
Recall, based on the data, that our estimate of $\mu$ is $\bar{x} = 53.92$ with sample
standard deviation $s = \sqrt{101.48} = 10.07$. Since the sample size is $n = 24$, for a
99% confidence interval the $t$-critical value is $t_{0.005,23}$ = qt(.995,23) = 2.807.
Based on these values, the confidence interval in expression (4.2.3) can be calculated.
Assuming that the R vector sulfurdioxide contains the sample, the R code to compute
this interval is t.test(sulfurdioxide,conf.level=0.99), which results in the 99%
confidence interval $(48.14, 59.69)$. Many scientists write this interval as
$53.92 \pm 5.78$. In this way, we can see our estimate of $\mu$ and the margin of error.
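As a check on the arithmetic, the interval can also be rebuilt from the summary statistics quoted above rather than from the raw data. The following is a minimal sketch; the variable names are illustrative, and because it starts from the rounded values 53.92 and 101.48, its endpoints may differ from the t.test output in the last reported digit.

```r
# Summary statistics quoted in the text (rounded values).
xbar <- 53.92            # sample mean
s    <- sqrt(101.48)     # sample standard deviation
n    <- 24               # sample size

# t-critical value for a 99% interval with n - 1 = 23 degrees of freedom.
tcrit <- qt(0.995, df = n - 1)

# Margin of error and the two endpoints of the interval.
moe <- tcrit * s / sqrt(n)
ci  <- c(lower = xbar - moe, upper = xbar + moe)

print(round(tcrit, 3))
print(round(moe, 2))
print(round(ci, 2))
```

Writing the interval as estimate plus or minus margin of error corresponds to reporting xbar and moe separately.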
The distribution of the pivot random variable $T = (\bar{X} - \mu)/(S/\sqrt{n})$ of the
last example depends on the normality of the sampled items; however, the
distributional result holds approximately, for large $n$, even if the sampled items
are not drawn from a normal distribution. The Central Limit Theorem (CLT) shows
that the distribution of $T$ is then approximately $N(0,1)$. Because we use this result
in this section, we state the CLT here, leaving its proof to Chapter 5; see
Theorem 5.3.1.
Theorem 4.2.1 (Central Limit Theorem). Let $X_1, X_2, \ldots, X_n$ denote the
observations of a random sample from a distribution that has mean $\mu$ and finite
variance $\sigma^2$. Then the distribution function of the random variable
$W_n = (\bar{X} - \mu)/(\sigma/\sqrt{n})$ converges to $\Phi$, the distribution function of
the $N(0,1)$ distribution, as $n \to \infty$.
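A small simulation can make the statement of the theorem concrete. The sketch below is an illustration only, not part of the text's development: the exponential distribution with rate 1 (so that $\mu = 1$ and $\sigma = 1$), the sample size, the number of replicates, and the seed are all assumed choices. It compares the empirical distribution function of $W_n$ with $\Phi$ at a few points.

```r
set.seed(1)        # arbitrary seed, for reproducibility only
n     <- 50        # sample size (illustrative choice)
nsim  <- 10000     # number of simulated samples
mu    <- 1         # mean of the exponential distribution with rate 1
sigma <- 1         # standard deviation of the same distribution

# Each replicate draws one sample of size n and returns the value of W_n.
wn <- replicate(nsim, {
  x <- rexp(n, rate = 1)
  (mean(x) - mu) / (sigma / sqrt(n))
})

# Empirical distribution function of W_n versus Phi at selected points.
pts <- c(-1.645, 0, 1.645)
comparison <- cbind(point = pts,
                    empirical = ecdf(wn)(pts),
                    normal = pnorm(pts))
print(round(comparison, 3))
```

Increasing n should bring the two columns closer together, which is what convergence of the distribution function means here.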
As we further show in Chapter 5, the result stays the same if we replace $\sigma$ by
the sample standard deviation $S$; that is, under the assumptions of Theorem 4.2.1,
the distribution of
\[
Z_n = \frac{\bar{X} - \mu}{S/\sqrt{n}} \tag{4.2.4}
\]
is approximately $N(0,1)$. For the nonnormal case, as the next example shows, with
this result we can obtain an approximate confidence interval for $\mu$.
Example 4.2.2 (Large Sample Confidence Interval for the Mean $\mu$). Suppose
$X_1, X_2, \ldots, X_n$ is a random sample on a random variable $X$ with mean $\mu$ and
variance $\sigma^2$, but, unlike the last example, the distribution of $X$ is not normal.
However, from the above discussion we know that the distribution of $Z_n$, (4.2.4), is
approximately $N(0,1)$. Hence
\[
1 - \alpha \approx P_\mu\left( -z_{\alpha/2} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < z_{\alpha/2} \right).
\]
Using the same algebraic derivation as in the last example, we obtain
\[
1 - \alpha \approx P_\mu\left( \bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}} \right). \tag{4.2.5}
\]
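For reference, the algebra behind this step can be spelled out as a chain of equivalent inequalities: multiply through by $S/\sqrt{n}$, subtract $\bar{X}$, and then multiply by $-1$, which reverses the inequalities.

\[
\begin{aligned}
-z_{\alpha/2} < \frac{\bar{X} - \mu}{S/\sqrt{n}} < z_{\alpha/2}
&\iff -z_{\alpha/2}\frac{S}{\sqrt{n}} < \bar{X} - \mu < z_{\alpha/2}\frac{S}{\sqrt{n}} \\
&\iff -\bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}} < -\mu < -\bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}} \\
&\iff \bar{X} - z_{\alpha/2}\frac{S}{\sqrt{n}} < \mu < \bar{X} + z_{\alpha/2}\frac{S}{\sqrt{n}}.
\end{aligned}
\]

Since these events are identical, they have the same probability under $P_\mu$.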
Again, letting $\bar{x}$ and $s$ denote the realized values of the statistics $\bar{X}$
and $S$, respectively, after the sample is drawn, an approximate $(1-\alpha)100\%$
confidence interval for $\mu$ is given by
\[
\left(\bar{x} - z_{\alpha/2}\, s/\sqrt{n},\; \bar{x} + z_{\alpha/2}\, s/\sqrt{n}\right). \tag{4.2.6}
\]
This is called a large sample confidence interval for $\mu$.
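The large sample interval is straightforward to compute in R. The following is a minimal sketch: the function name large_sample_ci is hypothetical (it is not a base R function), and the simulated exponential sample with true mean 3 is an assumed stand-in for nonnormal data.

```r
# Hypothetical helper, not part of base R: the large sample interval
# with endpoints xbar -/+ z_{alpha/2} * s / sqrt(n).
large_sample_ci <- function(x, conf.level = 0.95) {
  n     <- length(x)
  alpha <- 1 - conf.level
  z     <- qnorm(1 - alpha / 2)     # normal critical value z_{alpha/2}
  moe   <- z * sd(x) / sqrt(n)      # margin of error
  c(lower = mean(x) - moe, upper = mean(x) + moe)
}

# Illustration on simulated nonnormal data (assumed example, true mean 3).
set.seed(2)
x <- rexp(200, rate = 1 / 3)
print(large_sample_ci(x, conf.level = 0.95))
```

The only change from the $t$-interval is the critical value: qnorm replaces qt, so for moderate $n$ this interval is slightly narrower than the one returned by t.test.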