Introductory Biostatistics


8.1.6 Analysis-of-Variance Approach


The variation in Y is conventionally measured in terms of the deviations
(Y_i - \bar{Y}); the total variation, denoted by SST, is the sum of squared deviations:

\text{SST} = \sum (Y_i - \bar{Y})^2

For example, SST = 0 when all observations are the same. SST is the numerator of the sample variance of Y. The larger the SST, the greater the variation among Y values.
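As a quick numerical illustration of the definition, SST can be computed directly as the sum of squared deviations around the sample mean. The data values below are made up for the sketch and are not from the text:

```python
# Illustrative data (not from the text)
y = [4.0, 7.0, 6.0, 8.0, 5.0]

n = len(y)
y_bar = sum(y) / n  # sample mean of Y

# SST: sum of squared deviations of Y around its mean
sst = sum((yi - y_bar) ** 2 for yi in y)

print(sst)  # 10.0
```

Here the mean is 6, so the deviations are -2, 1, 0, 2, -1 and SST = 4 + 1 + 0 + 4 + 1 = 10; dividing by n - 1 = 4 would give the sample variance 2.5.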
When we use the regression approach, the variation in Y is decomposed into
two components:

Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})


  1. The first term reflects the variation around the regression line, the part
     that cannot be explained by the regression itself, with the sum of squared
     deviations

     \text{SSE} = \sum (Y_i - \hat{Y}_i)^2

     called the error sum of squares.


  2. The difference between the two sums of squares,

     \text{SSR} = \text{SST} - \text{SSE}
                = \sum (\hat{Y}_i - \bar{Y})^2

     is called the regression sum of squares. SSR may be considered a measure
     of the variation in Y associated with the regression line. In fact, we can
     express the coefficient of determination as

     r^2 = \frac{\text{SSR}}{\text{SST}}

Corresponding to the partitioning of the total sum of squares SST, there is a
partitioning of the associated degrees of freedom (df). We have n - 1 degrees of
freedom associated with SST, the denominator of the sample variance of Y;
SSR has 1 degree of freedom representing the slope, and the remaining n - 2 are
associated with SSE. These results lead to the usual presentation of regression
analysis by most computer programs:



  1. The error mean square,

     \text{MSE} = \frac{\text{SSE}}{n - 2}
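A minimal sketch of the df bookkeeping and the error mean square, using an illustrative SSE value (not taken from the text):

```python
n = 5       # illustrative sample size
sse = 6.0   # illustrative error sum of squares

df_total = n - 1       # df for SST
df_regression = 1      # df for SSR (the slope)
df_error = n - 2       # df for SSE

# The df partition mirrors the SS partition SST = SSR + SSE
assert df_total == df_regression + df_error

mse = sse / df_error   # error mean square
print(mse)  # 2.0
```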

CORRELATION AND REGRESSION
