are discussed in many more specialized texts. In general, smoothing takes place by the
averaging of Y values close to the target value of the predictor. In other words, we move
across the graph computing lines as we go (Everitt, 2005). An example of a smoothed plot
is shown in Figure 9.3. This plot was produced using R, but similar plots can be produced
using SPSS by clicking on the Fit panel as you define the scatterplot you want. The advantage
of smoothed lines is that they give you a better idea of the overall form of
the relationship. Given the amount of variability that we see in our data, it is difficult to tell
whether the smoothed plot fits significantly better than a straight line. It is nonetheless reasonable
to assume that symptoms would increase with the level of stress but that this increase
would start to level off at some point.
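To make the idea of averaging nearby Y values concrete, here is a minimal sketch in Python. It is a deliberately simplified local-mean smoother, not the procedure that produced Figure 9.3 (smoothers such as lowess fit weighted local lines rather than plain averages). The function name `local_mean_smooth`, the `half_width` parameter, and the data values are all made up for illustration.

```python
def local_mean_smooth(x, y, half_width):
    """For each target x value, average the y values whose x lies
    within +/- half_width of that target (a simple local mean)."""
    smoothed = []
    for target in x:
        # Collect the Y values of points whose X is close to the target X.
        neighbours = [yi for xi, yi in zip(x, y) if abs(xi - target) <= half_width]
        # The target point itself is always a neighbour, so the list is never empty.
        smoothed.append(sum(neighbours) / len(neighbours))
    return smoothed


# Hypothetical values for illustration only (NOT the data behind Figure 9.3).
stress = [0, 5, 10, 15, 20, 25, 30, 35, 40, 45, 50]
ln_symptoms = [4.30, 4.42, 4.38, 4.50, 4.47, 4.58, 4.55, 4.62, 4.60, 4.63, 4.61]

smooth = local_mean_smooth(stress, ln_symptoms, half_width=10)

# Print the raw and smoothed values side by side.
for s, raw, fit in zip(stress, ln_symptoms, smooth):
    print(f"{s:3d}  {raw:.2f}  {fit:.3f}")
```

A wider `half_width` averages over more points and gives a smoother, flatter curve, while a narrower one follows the raw data more closely. Choosing that width is the main judgment call in any smoother of this kind.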
9.7 The Accuracy of Prediction
The fact that we can fit a regression line to a set of data does not mean that our problems
are solved. On the contrary, they have only begun. The important point is not whether a
straight line can be drawn through the data (you can always do that) but whether that line
represents a reasonable fit to the data—in other words, whether our effort was worthwhile.
In beginning a discussion of errors of prediction, it is instructive to consider the situation
in which we wish to predict Y without any knowledge of the value of X.
The Standard Deviation as a Measure of Error
As mentioned earlier, the data plotted in Figure 9.2 represent the log of the number of
symptoms shown by students (Y) as a function of the number of stressful life events (X).
Assume that you are now given the task of predicting the number of symptoms that will be
shown by a particular individual, but that you have no knowledge of the number of stressful
life events he or she has experienced. Your best prediction in this case would be the
mean value of lnSymptoms^8 (Ȳ) (averaged across all subjects), and the error associated
Figure 9.3 A scatterplot of lnSymptoms as a function of Stress with a smoothed
regression line superimposed (x-axis: Stress; y-axis: lnSymptoms)
(^8) Rather than constantly repeating “log of symptoms,” I will refer to symptoms with the understanding that I am
referring to the log-transformed values.