The Essentials of Biostatistics for Physicians, Nurses, and Clinicians

2.3 Selecting Simple Random Samples

method is preferred on the computer because generating new random
numbers is faster than calculating new partitions. So although it may look
wasteful, in practice it is usually computationally faster.
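This section does not spell out the rejection technique itself. As a hedged sketch, assume it means the following: draw a patient index uniformly at random, reject any index that has already been drawn, and repeat until the sample has the required size. The function name `srs_by_rejection` and the seed value are hypothetical and are used only for illustration.

```python
import random


def srs_by_rejection(population, n, rng=random):
    """Simple random sample of size n without replacement, built by drawing
    random indices and rejecting any index that was already accepted.
    (Hypothetical helper illustrating the rejection idea.)"""
    if n > len(population):
        raise ValueError("sample size exceeds population size")
    chosen = []  # indices accepted so far, in the order they were drawn
    while len(chosen) < n:
        i = rng.randrange(len(population))  # uniform over 0 .. N-1
        if i in chosen:
            continue  # duplicate: reject it and draw a new random number
        chosen.append(i)
    return [population[i] for i in chosen]


patients = ["A", "B", "C", "D", "E", "F"]
rng = random.Random(2024)  # seeded only so the draw is reproducible
print(srs_by_rejection(patients, 4, rng))
```

Some draws are wasted on duplicates, especially when the sample size is close to the population size. Each new draw is cheap, though, which matches the trade-off described above.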
Now, for each patient, we are interested in a particular characteristic
that we can measure. For this example, we choose age in years at the
patient's last birthday. Let us assume the ages of the patients are as follows:

A is 26
B is 17
C is 45
D is 70
E is 32
F is 9

The parameter of interest is the average age of the population. We
will estimate it using the sample mean. Since the population size is 6
and we know all six values, the population mean, denoted μ, is
(26 + 17 + 45 + 70 + 32 + 9)/6 = 199/6 = 33.1667. In practice, we would
not know the parameter value, because we would see only the sample of size
4 and would not know the ages of the two patients who were not selected.
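The population mean is plain arithmetic over the six ages listed above. The dictionary `ages` is simply those values keyed by patient letter.

```python
# Ages (years at last birthday) of the six patients in the example
ages = {"A": 26, "B": 17, "C": 45, "D": 70, "E": 32, "F": 9}

# Population mean mu: sum of all six ages divided by the population size
mu = sum(ages.values()) / len(ages)  # 199 / 6
print(round(mu, 4))  # 33.1667
```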
Now, if we generated the random sample using exactly the random
numbers that we got from the rejection technique, we would have {B, C,
E, F} as our sample, and the sample mean would be (17 + 45 +
32 + 9)/4 = 103/4 = 25.75. This is our estimate. It is considerably smaller
than the true population mean of 33.1667.
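The sample mean for the sample {B, C, E, F} can be checked directly, using the same ages as above.

```python
ages = {"A": 26, "B": 17, "C": 45, "D": 70, "E": 32, "F": 9}
sample = ["B", "C", "E", "F"]

# Sample mean: sum of the sampled ages divided by the sample size
xbar = sum(ages[p] for p in sample) / len(sample)  # (17 + 45 + 32 + 9) / 4
print(xbar)  # 25.75
```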
This is because patient D, the oldest patient at 70, is not in the
sample. Including him in the average increases the mean, and his
absence decreases it. If we added D to the sample, the average
would be (17 + 45 + 70 + 32 + 9)/5 = 173/5 = 34.6. So adding D to the
sample increases the mean from 25.75 to 34.6. Conversely, if we think
of the sample as being {B, C, D, E, F}, removing D drops the
mean from 34.6 to 25.75. So the influence of D is 34.6 - 25.75 = 8.85
years! This is how much D influences the mean, and it shows that the
mean is heavily influenced by outliers. We will address this again
later.
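The influence of D can be computed as the difference between the sample mean with D and the sample mean without D. The helper `sample_mean` is a hypothetical name chosen for this illustration.

```python
ages = {"A": 26, "B": 17, "C": 45, "D": 70, "E": 32, "F": 9}


def sample_mean(sample):
    """Mean age of the listed patients (hypothetical helper)."""
    return sum(ages[p] for p in sample) / len(sample)


without_d = sample_mean(["B", "C", "E", "F"])  # 103 / 4 = 25.75
with_d = sample_mean(["B", "C", "D", "E", "F"])  # 173 / 5 = 34.6

# Influence of D: how much adding D shifts the sample mean
influence = with_d - without_d
print(round(influence, 2))  # 8.85
```

A single large value moves the mean by several years in a sample this small. This is the sensitivity to outliers that the text points out.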
The sample mean as an estimator is unbiased. That means that if
we averaged the estimates over the 15 possible samples of size 4, we

Free download pdf