8.1.6 Analysis-of-Variance Approach
The variation in $Y$ is conventionally measured in terms of the deviations $(Y_i - \bar{Y})$; the total variation, denoted by SST, is the sum of squared deviations:
\[
\text{SST} = \sum (Y_i - \bar{Y})^2
\]
For example, SST $= 0$ when all observations are the same. SST is the numerator of the sample variance of $Y$. The larger the SST, the greater the variation among $Y$ values.
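As a quick numeric illustration (using a small made-up sample, not data from the text), SST can be computed directly and compared with the sample variance:

```python
# Hypothetical sample for illustration only.
values = [4.0, 7.0, 6.0, 5.0, 8.0]
n = len(values)
mean = sum(values) / n

# SST: sum of squared deviations from the mean.
sst = sum((y - mean) ** 2 for y in values)

# SST is the numerator of the sample variance: s^2 = SST / (n - 1).
sample_variance = sst / (n - 1)

print(sst, sample_variance)  # 10.0 2.5
```

If every observation equaled 6.0, each deviation would be zero and SST would be 0, matching the remark above.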
When we use the regression approach, the variation in $Y$ is decomposed into two components:
\[
Y_i - \bar{Y} = (Y_i - \hat{Y}_i) + (\hat{Y}_i - \bar{Y})
\]
- The first term reflects the variation around the regression line, the part that cannot be explained by the regression itself; its sum of squared deviations,
\[
\text{SSE} = \sum (Y_i - \hat{Y}_i)^2
\]
is called the error sum of squares.
- The difference between the two sums of squares,
\[
\text{SSR} = \text{SST} - \text{SSE} = \sum (\hat{Y}_i - \bar{Y})^2
\]
is called the regression sum of squares. SSR may be considered a measure of the variation in $Y$ associated with the regression line. In fact, we can express the coefficient of determination as
\[
r^2 = \frac{\text{SSR}}{\text{SST}}
\]
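The decomposition can be verified numerically. The following sketch (with hypothetical data) fits a least-squares line from first principles and checks that SST $=$ SSE $+$ SSR and that $r^2 = \text{SSR}/\text{SST}$:

```python
# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 3.0, 5.0, 4.0, 6.0]
n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
sxx = sum((x - x_bar) ** 2 for x in xs)
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Fitted values.
y_hat = [b0 + b1 * x for x in xs]

# The three sums of squares.
sst = sum((y - y_bar) ** 2 for y in ys)
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)

# Partition of the total variation: SST = SSE + SSR.
assert abs(sst - (sse + ssr)) < 1e-9

# Coefficient of determination.
r_squared = ssr / sst
print(sst, sse, ssr, r_squared)  # 10.0 1.9 8.1 0.81
```

Here 81% of the variation in $Y$ is associated with the fitted line, the remaining 19% with the scatter around it.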
Corresponding to the partitioning of the total sum of squares SST, there is a partitioning of the associated degrees of freedom (df). We have $n - 1$ degrees of freedom associated with SST, $n - 1$ being the denominator of the sample variance of $Y$; SSR has 1 degree of freedom representing the slope, and the remaining $n - 2$ are associated with SSE. These results lead to the usual presentation of regression analysis by most computer programs:
- The error mean square,
\[
\text{MSE} = \frac{\text{SSE}}{n - 2}
\]
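The degrees-of-freedom partition and the error mean square can be checked the same way; a minimal sketch with hypothetical data:

```python
# Hypothetical data for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0, 3.0, 5.0, 4.0, 6.0]
n = len(xs)
x_bar, y_bar = sum(xs) / n, sum(ys) / n

# Least-squares fit and the error sum of squares.
b1 = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
      / sum((x - x_bar) ** 2 for x in xs))
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(xs, ys))

# df partition: (n - 1) for SST = 1 for SSR + (n - 2) for SSE.
df_total, df_regression, df_error = n - 1, 1, n - 2
assert df_total == df_regression + df_error

# Error mean square: SSE divided by its degrees of freedom.
mse = sse / df_error
print(mse)
```

Dividing SSE by $n - 2$ rather than $n$ accounts for the two estimated parameters (intercept and slope), just as the sample variance divides by $n - 1$ for the one estimated mean.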