Introductory Biostatistics


a measure of location, a typical value representing the data set. In addition, the
variance and/or standard deviation is formed and used to measure the degree of
variation or dispersion of data around the mean. In this short section we will
see that binary data can be treated as a special case of continuous data.
Many outcomes can be classified as belonging to one of two possible categories:
presence and absence, nonwhite and white, male and female, improved
and not improved. Of course, one of these two categories is usually identified as
being of primary interest; for example, presence in the presence and absence
classification, or nonwhite in the white and nonwhite classification. We can, in
general, relabel the two outcome categories as positive (+) and negative (−).
An outcome is positive if the primary category is observed and is negative if the
other category is observed. The proportion is defined as in Chapter 1:

\[ p = \frac{x}{n} \]

where $x$ is the number of positive outcomes and $n$ is the sample size. However,
it can also be expressed as

\[ p = \frac{\sum x_i}{n} \]

where $x_i$ is "1" if the $i$th outcome is positive and "0" otherwise. In other words,
a sample proportion can be viewed as a special case of sample means where
data are coded as 0 or 1. But what do we mean by variation or dispersion, and
how do we measure it?
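
The equivalence between a proportion and a mean of 0/1 codes can be checked numerically. The sketch below is only an illustration: the ten outcomes are made-up values, not data from the text, and the variable names are hypothetical.

```python
from fractions import Fraction

# Hypothetical sample, coded as in the text: 1 = positive, 0 = negative.
outcomes = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]

n = len(outcomes)                          # sample size
x = sum(1 for o in outcomes if o == 1)     # number of positive outcomes

p_from_count = Fraction(x, n)              # p = x / n
p_from_mean = Fraction(sum(outcomes), n)   # p = (sum of x_i) / n

# Both definitions give the same proportion.
assert p_from_count == p_from_mean
print(float(p_from_mean))  # 0.4
```

Exact fractions are used here so that the two definitions can be compared with `==` without floating-point rounding.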
Let us write out the variance $s^2$ using the shortcut formula of Section 2.2 but
with the denominator $n$ instead of $n - 1$ (this would make little difference
because we almost always deal with large samples of binary data):

\[ s = \sqrt{\frac{\sum x_i^2 - \left(\sum x_i\right)^2 / n}{n}} \]

Since $x_i$ is binary, with "1" if the $i$th outcome is positive and "0" otherwise, we
have

\[ x_i^2 = x_i \]

and therefore,

\[
s^2 = \frac{\sum x_i - \left(\sum x_i\right)^2 / n}{n}
    = \frac{\sum x_i}{n}\left(1 - \frac{\sum x_i}{n}\right)
    = p(1 - p)
\]
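
As a numerical check of this result, the following sketch applies the shortcut formula with denominator $n$ to a made-up 0/1 sample and compares it with $p(1 - p)$. The data and the helper name `variance_n` are assumptions introduced for illustration only.

```python
from fractions import Fraction

def variance_n(values):
    """Shortcut-formula variance with denominator n (not n - 1)."""
    n = len(values)
    sum_x = sum(values)
    sum_x2 = sum(v * v for v in values)
    return (Fraction(sum_x2) - Fraction(sum_x ** 2, n)) / n

# Hypothetical sample, coded 1 = positive, 0 = negative (not from the text).
outcomes = [1, 0, 0, 1, 1, 0, 1, 0, 0, 0]

n = len(outcomes)
p = Fraction(sum(outcomes), n)   # sample proportion
s2 = variance_n(outcomes)        # variance with denominator n

# For 0/1 data the variance reduces to p(1 - p).
assert s2 == p * (1 - p)

# With denominator n - 1 the value is larger by the factor n / (n - 1).
s2_n_minus_1 = s2 * n / (n - 1)
print(s2, s2_n_minus_1)  # 6/25 4/15
```

With only ten observations the two denominators give visibly different values (0.24 versus about 0.267); the factor $n/(n - 1)$ approaches 1 as the sample grows, which is why the choice matters little for large samples.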

