The Essentials of Biostatistics for Physicians, Nurses, and Clinicians

2.5 Generating Bootstrap Samples

replacement m times from the data, with each data point having
probability 1/n on each of the m draws, where n is the number of data
points. As mentioned in Section 2.3, m is usually equal to n, but
sometimes it is advantageous to take m << n.
In this section, we will show how bootstrap samples can be generated,
much as we did for simple random samples in Section 2.3. We will
discuss the bootstrap further when we get to hypothesis testing and
confidence intervals, where it is commonly applied. Without going into
detail now, bootstrap estimation is based on the sampling distribution
of estimates computed from bootstrap samples.
In theory, that sampling distribution can be derived directly from
the data. However, this is often not easy to do (especially as n gets
large), so the distribution is approximated by Monte Carlo methods.
That means that we generate a collection of B bootstrap samples by
sampling with replacement from the original data B times, each time
taking a sample of size m. In our example, we will take m = n, where n
is the size of the original sample.
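The Monte Carlo step described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the book's own procedure: the function names bootstrap_samples and bootstrap_means, the seed argument, and the choice of the sample mean as the estimate are assumptions made here for the example.

```python
import random


def bootstrap_samples(data, B, m=None, seed=None):
    """Return B bootstrap samples, each of size m, drawn with replacement.

    The names and the seed argument are hypothetical; they are not from the text.
    """
    rng = random.Random(seed)
    if m is None:
        m = len(data)  # the usual choice, m = n
    # random.choices samples with replacement; with no weights given,
    # each data point has probability 1/n on every draw.
    return [rng.choices(data, k=m) for _ in range(B)]


def bootstrap_means(data, B, seed=None):
    """Approximate the sampling distribution of the mean by Monte Carlo.

    The mean is used only as an example of an estimate.
    """
    samples = bootstrap_samples(data, B, seed=seed)
    return [sum(s) / len(s) for s in samples]
```

Calling bootstrap_means(values, B=1000) on a list of n observations returns 1000 bootstrap estimates of the mean; their spread is the Monte Carlo approximation to the sampling distribution that the text refers to.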
A bootstrap sample typically differs from the original sample
because some observations are repeated in the bootstrap sample and
others are left out. This will become apparent in the example. To
generate a bootstrap sample, we again partition the interval [0, 1).
In this case, since we have n observations indexed 1, 2, 3, ..., n, we
divide the interval into n equal disjoint parts. Again taking U to be a
uniform random number from a table of random numbers, we get:


If 0 ≤ U < 1/n, the index is 1.
If 1/n ≤ U < 2/n, the index is 2.
If 2/n ≤ U < 3/n, the index is 3.
...
If (n − 2)/n ≤ U < (n − 1)/n, the index is n − 1.
If (n − 1)/n ≤ U < n/n = 1, the index is n.
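
The partition rule above is equivalent to taking the index to be floor(nU) + 1, since (k − 1)/n ≤ U < k/n exactly when k − 1 ≤ nU < k. A minimal Python sketch of that rule follows. It assumes a pseudo-random number generator in place of the table of random numbers used in the text, and the function names index_from_uniform and bootstrap_indices are hypothetical.

```python
import random


def index_from_uniform(u, n):
    """Map a uniform number u in [0, 1) to an index in 1..n.

    (k - 1)/n <= u < k/n holds exactly when k - 1 <= n*u < k,
    so the index is k = floor(n*u) + 1.
    """
    if not 0.0 <= u < 1.0:
        raise ValueError("u must lie in [0, 1)")
    # min(...) guards against floating-point rounding of n*u up to n
    # when u is extremely close to 1.
    return min(int(n * u) + 1, n)


def bootstrap_indices(n, m=None, seed=None):
    """Draw m indices (default m = n), one per uniform number.

    Uses Python's generator instead of a random number table (an assumption).
    """
    rng = random.Random(seed)
    if m is None:
        m = n
    return [index_from_uniform(rng.random(), n) for _ in range(m)]
```

For n = 6, a value U = 0.5 falls in the interval 3/6 ≤ U < 4/6, so index_from_uniform(0.5, 6) gives index 4. Because each U is drawn independently, the same index can appear more than once in the list returned by bootstrap_indices, which is how repeated observations arise.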

Let us take the same six patients {A, B, C, D, E, F} that we used as
the population in Section 2.3, but now treat them as the sample of
patients. Again, the correspondence of patients to indices is:
