Robert V. Hogg, Joseph W. McKean, Allen T. Craig

Some Special Distributions

3.1.3 Hypergeometric Distribution


In Chapter 1, for a particular problem, we introduced the hypergeometric distribution; see expression (1.6.4). We now formally define it. Suppose we have a lot of $N$ items of which $D$ are defective. Let $X$ denote the number of defective items in a sample of size $n$. If the sampling is done with replacement and the items are chosen at random, then $X$ has a binomial distribution with parameters $n$ and $D/N$. In this case the mean and variance of $X$ are $n(D/N)$ and $n(D/N)[(N-D)/N]$, respectively. Suppose, however, that the sampling is without replacement, which is often the case in practice. The pmf of $X$ follows by noting in this case that each of the $\binom{N}{n}$ samples is equilikely and that there are $\binom{N-D}{n-x}\binom{D}{x}$ samples that have $x$ defective items. Hence, the pmf of $X$ is


$$
p(x) = \frac{\binom{N-D}{n-x}\binom{D}{x}}{\binom{N}{n}}, \quad x = 0, 1, \ldots, n, \tag{3.1.7}
$$

where, as usual, a binomial coefficient is taken to be 0 when the top value is less than the bottom value. We say that $X$ has a hypergeometric distribution with parameters $(N, D, n)$.
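
As a quick check of (3.1.7), the pmf can be evaluated directly from binomial coefficients. The following is a minimal sketch in R; the helper name hyper_pmf and the parameter values N = 20, D = 3, n = 5 are illustrative assumptions, not taken from the text. With these values, x = 4 and x = 5 exceed D, so the sketch also exercises the convention that a binomial coefficient is 0 when the top value is less than the bottom value (R's choose() follows the same convention).

```r
# Hypothetical helper: hypergeometric pmf written out as in expression (3.1.7).
hyper_pmf <- function(x, N, D, n) {
  choose(N - D, n - x) * choose(D, x) / choose(N, n)
}

# Illustrative parameter values (assumed for this sketch).
N <- 20; D <- 3; n <- 5
x <- 0:n

p <- hyper_pmf(x, N, D, n)
p                                      # entries for x = 4, 5 are 0 because x > D
sum(p)                                 # probabilities over the full range add up to 1
all.equal(p, dhyper(x, D, N - D, n))   # compare with R's built-in pmf
```

Note the argument order of the built-in function: dhyper takes the number of defective items, then the number of non-defective items, then the sample size.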
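The mean of $X$ is

$$
\begin{aligned}
E(X) &= \sum_{x=0}^{n} x\,p(x) = \sum_{x=1}^{n} x \binom{N-D}{n-x} \frac{[D(D-1)!]/[x(x-1)!(D-x)!]}{[N(N-1)!]/[(N-n)!\,n(n-1)!]} \\
&= n\frac{D}{N} \sum_{x=1}^{n} \binom{(N-1)-(D-1)}{(n-1)-(x-1)} \binom{D-1}{x-1} \binom{N-1}{n-1}^{-1} = n\frac{D}{N}.
\end{aligned}
$$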

In the next-to-last step, we used the fact that the probabilities of a hypergeometric $(N-1, D-1, n-1)$ distribution sum to 1 over its entire range. So the means for both types of sampling (with and without replacement) are the same. The variances, though, differ. As Exercise 3.1.31 shows, the variance of a hypergeometric $(N, D, n)$ distribution is


$$
\operatorname{Var}(X) = n\,\frac{D}{N}\,\frac{N-D}{N}\,\frac{N-n}{N-1}. \tag{3.1.8}
$$


The last factor, $(N-n)/(N-1)$, is often thought of as the correction term when sampling without replacement. Note that it is close to 1 if $N$ is much larger than $n$.
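
The mean formula and the variance formula (3.1.8) can be checked numerically by summing over the pmf. The sketch below is in R; the parameter values N = 50, D = 12, n = 10 and all variable names are assumptions chosen for illustration.

```r
# Illustrative parameter values (assumed for this sketch).
N <- 50; D <- 12; n <- 10
x <- 0:n
p <- dhyper(x, D, N - D, n)                # pmf (3.1.7)

# Moments computed directly from the pmf.
mean_num <- sum(x * p)
var_num  <- sum((x - mean_num)^2 * p)

# Closed-form expressions from the text.
mean_formula <- n * D / N
correction   <- (N - n) / (N - 1)          # last factor in (3.1.8)
var_formula  <- n * (D / N) * ((N - D) / N) * correction

all.equal(mean_num, mean_formula)          # means agree
all.equal(var_num, var_formula)            # variances agree
correction                                 # below 1 here, since n is not small relative to N
```

Increasing N while holding n fixed moves the correction factor toward 1, at which point the hypergeometric variance approaches the binomial variance.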
The pmf (3.1.7) can be computed in R with the code dhyper(x, D, N-D, n). Suppose we draw 2 cards from a well-shuffled standard deck of 52 cards and record the number of aces. The next R segment shows the probabilities over the range $\{0, 1, 2\}$ for sampling with and without replacement, respectively:


rng <- 0:2; dbinom(rng,2,1/13); dhyper(rng,4,48,2)
[1] 0.85207101 0.14201183 0.00591716
[1] 0.850678733 0.144796380 0.004524887

Notice how close the probabilities are.
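
To put a number on this closeness, the following R sketch compares the two pmfs for the card example and then their means and variances. The variable names are assumptions introduced for illustration. For these parameters, no two corresponding probabilities differ by more than 0.003, the means coincide, and the variances differ by the factor (N - n)/(N - 1) = 50/51.

```r
rng <- 0:2
p_with    <- dbinom(rng, 2, 1/13)      # sampling with replacement
p_without <- dhyper(rng, 4, 48, 2)     # sampling without replacement

# Largest pointwise difference between the two pmfs.
max(abs(p_with - p_without))

# Means computed from each pmf; both equal n * D / N = 2 * 4 / 52.
m_with    <- sum(rng * p_with)
m_without <- sum(rng * p_without)
all.equal(m_with, m_without)

# Variances computed from each pmf; their ratio is the correction factor.
v_with    <- sum(rng^2 * p_with)    - m_with^2
v_without <- sum(rng^2 * p_without) - m_without^2
all.equal(v_without / v_with, 50 / 51)
```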