Basic Statistics

REGRESSION AND CORRELATION


other time-consuming and expensive. It would be useful to have a regression equation
that would allow us to predict the more expensive result from the inexpensive one.
In Section 12.1 we discuss interpreting scatter diagrams, a widely used graphic
method. Section 12.2 deals with linear regression analysis when the observations
are from a single sample. Formulas for computing the regression line, confidence
intervals, and tests of hypotheses are covered. In Section 12.3 the correlation coefficient
is defined. Confidence intervals, tests of hypotheses, and interpretation of the
correlation coefficient are discussed. In Section 12.4 we discuss regression analysis
for the fixed-X model: when the model is used and what can be estimated from the
model. In Section 12.5 we discuss the use of transformations in regression analysis,
the detection and effect of outliers, and briefly mention multiple regression.


12.1 THE SCATTER DIAGRAM: SINGLE SAMPLE

The simplest and yet probably the most useful graphical technique for displaying the
relation between two variables is the scatter diagram (also called a scatterplot). The
first step in making a scatter diagram is to decide which variable to call the outcome
variable (also called the dependent or response variable) and which variable to call
the predictor or independent variable. As the names imply, the predictor variable
is the variable that we think predicts the outcome variable (the outcome variable is
dependent on the predictor variable). For example, for children we would assume
that age predicts height, so that age would be the predictor variable and height the
outcome variable, not the other way around.
The predictor variable is called the X variable and is plotted on the horizontal or
X axis of the scatter diagram. The outcome variable is called the Y variable and is
depicted on the vertical or Y axis. Each point on the scatter diagram must have both
an X value and a Y value and is plotted on the diagram at the appropriate horizontal
and vertical distances. As a small example of a scatter diagram, we will use the
hypothetical data in Table 12.1, consisting of weights in pounds (lb) from a sample of
10 adult men as the predictor or X variable and their systolic blood pressure (SBP)
in millimeters of mercury (mmHg) as the outcome or Y variable. The pair of values
for each point is written as (X, Y). For example, in Table 12.1 the pair of values for the
first adult male would be written as (165,134). Statistical programs such as Minitab,
SAS, SPSS, and Stata can all produce scatterplots.
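
As a rough illustration of the (X, Y) pairing described above, the following Python sketch builds the pairs and, optionally, draws the diagram. Only the first pair, (165, 134), comes from the text; the remaining values are invented placeholders and are not the Table 12.1 data. The names make_pairs and plot_scatter are hypothetical helpers, and the plotting step assumes the third-party matplotlib package is installed.

```python
from typing import List, Tuple

# Placeholder data: only the first pair, (165, 134), appears in the text.
# The other values are invented for illustration and are NOT Table 12.1.
weights_lb = [165, 150, 180, 200, 172]   # predictor, X (lb)
sbp_mmhg = [134, 128, 138, 146, 136]     # outcome, Y (mmHg)


def make_pairs(x: List[float], y: List[float]) -> List[Tuple[float, float]]:
    """Pair each X value with its Y value; every point needs both."""
    if len(x) != len(y):
        raise ValueError("each point needs both an X value and a Y value")
    return list(zip(x, y))


def plot_scatter(pairs, path="scatter.png"):
    """Draw a scatter diagram: predictor on the X axis, outcome on the Y axis."""
    # matplotlib is a third-party package (an assumption); it is imported
    # here so that make_pairs can be used without it.
    import matplotlib
    matplotlib.use("Agg")  # render to a file, no display needed
    import matplotlib.pyplot as plt

    xs = [p[0] for p in pairs]
    ys = [p[1] for p in pairs]
    fig, ax = plt.subplots()
    ax.scatter(xs, ys)
    ax.set_xlabel("Weight (lb)")
    ax.set_ylabel("Systolic blood pressure (mmHg)")
    fig.savefig(path)
    plt.close(fig)


pairs = make_pairs(weights_lb, sbp_mmhg)
print(pairs[0])  # (165, 134)
```

Calling plot_scatter(pairs) would write the diagram to scatter.png. Keeping X and Y in separate, equal-length lists mirrors how most statistical programs expect two columns of data.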
There are 10 points in the scatter diagram (Figure 12.1), one for each male. Scales have
been chosen for the X and Y axes that include the range of the weights and of the systolic
blood pressure. Tick marks have been placed every 5 mmHg of systolic blood
pressure and every 20 lb of weight.
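
One possible way to choose axis scales that cover the data, with tick marks at fixed intervals, is sketched below in Python. The function tick_positions is a hypothetical helper, and the data values are invented placeholders (only the pair 165 lb, 134 mmHg appears in the text); the intervals of 20 lb and 5 mmHg are those stated above.

```python
import math

# Placeholder data, NOT Table 12.1; only (165, 134) appears in the text.
weights_lb = [165, 150, 180, 200, 172]
sbp_mmhg = [134, 128, 138, 146, 136]


def tick_positions(values, step):
    """Return ticks spaced `step` apart that span the range of `values`."""
    lo = math.floor(min(values) / step) * step   # round the minimum down
    hi = math.ceil(max(values) / step) * step    # round the maximum up
    n = int(round((hi - lo) / step))             # number of intervals
    return [lo + i * step for i in range(n + 1)]


print(tick_positions(weights_lb, 20))  # [140, 160, 180, 200]
print(tick_positions(sbp_mmhg, 5))     # [125, 130, 135, 140, 145, 150]
```

Rounding the minimum down and the maximum up to a multiple of the interval guarantees that every point falls inside the plotted range.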
The scatter diagram is extremely useful in indicating the relationship between the
predictor and outcome variables (see Figure 12.1). One thing we note is whether the
relationship between X and Y is positive or negative. In Figure 12.1 it can be seen
that systolic blood pressure increases as weight increases. Adult males in this sample
who have higher weight tend to have higher blood pressure. This is called a positive
relationship. If we had plotted data from adults using vital capacity as the outcome
