8.1.6 Analysis-of-Variance Approach
The variation in $Y$ is conventionally measured in terms of the deviations $(Y_i - \bar{Y})$; the total variation, denoted by SST, is the sum of squared deviations:
\[
\text{SST} = \sum (Y_i - \bar{Y})^2
\]
For example, SST $= 0$ when all observations are the same. SST is the numerator of the sample variance of $Y$. The larger the SST, the greater the variation among $Y$ values.
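As a quick numeric illustration (using a small made-up sample, not data from the text), SST can be computed directly and compared with the sample variance:

```python
# Hypothetical sample for illustration only.
values = [4.0, 7.0, 6.0, 5.0, 8.0]
n = len(values)
mean = sum(values) / n

# SST: sum of squared deviations from the mean.
sst = sum((y - mean) ** 2 for y in values)

# SST is the numerator of the sample variance: s^2 = SST / (n - 1).
sample_variance = sst / (n - 1)

print(sst, sample_variance)  # 10.0 2.5
```

If every observation equaled 6.0, each deviation would be zero and SST would be 0, matching the remark above.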
When we use the regression approach, the variation in $Y$ is decomposed into two components:
\[
Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})
\]
- The first term reflects the variation around the regression line, the part that cannot be explained by the regression itself; its sum of squared deviations,
\[
\text{SSE} = \sum (Y_i - \hat{Y}_i)^2
\]
is called the error sum of squares.
- The difference between the two sums of squares,
\[
\text{SSR} = \text{SST} - \text{SSE} = \sum (\hat{Y}_i - \bar{Y})^2
\]
is called the regression sum of squares. SSR may be considered a measure of the variation in $Y$ associated with the regression line. In fact, we can express the coefficient of determination as
\[
r^2 = \frac{\text{SSR}}{\text{SST}}
\]
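The decomposition can be verified numerically. The following sketch (with hypothetical data) fits a least-squares line from first principles and checks that SST $=$ SSE $+$ SSR and that $r^2 = \text{SSR}/\text{SST}$:

```python
# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 3.0, 5.0, 4.0, 6.0]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Fitted values.
y_hat = [b0 + b1 * x for x in xs]

# The three sums of squares.
sst = sum((y - y_bar) ** 2 for y in ys)
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)

# Partition of the total variation: SST = SSE + SSR.
assert abs(sst - (sse + ssr)) < 1e-9

# Coefficient of determination.
r_squared = ssr / sst
print(sst, sse, ssr, r_squared)  # 10.0 1.9 8.1 0.81
```

Here 81% of the variation in $Y$ is associated with the fitted line, the remaining 19% with the scatter around it.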
Corresponding to the partitioning of the total sum of squares SST, there is a partitioning of the associated degrees of freedom (df). We have $n - 1$ degrees of freedom associated with SST, $n - 1$ being the denominator of the sample variance of $Y$; SSR has 1 degree of freedom representing the slope, and the remaining $n - 2$ are associated with SSE. These results lead to the usual presentation of regression analysis by most computer programs:
- The error mean square,
\[
\text{MSE} = \frac{\text{SSE}}{n - 2}
\]
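The degrees-of-freedom partition and the error mean square can be checked the same way; a minimal sketch with hypothetical data:

```python
# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 3.0, 5.0, 4.0, 6.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Least-squares fit and the error sum of squares.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# df partition: (n - 1) for SST = 1 for SSR + (n - 2) for SSE.
df_total, df_regression, df_error = n - 1, 1, n - 2
assert df_total == df_regression + df_error

# Error mean square: SSE divided by its degrees of freedom.
mse = sse / df_error
print(mse)
```

Dividing SSE by $n - 2$ rather than $n$ accounts for the two estimated parameters (intercept and slope), just as the sample variance divides by $n - 1$ for the one estimated mean.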