400 Statistical Methods
squares (the sum of squared deviations from the mean) into two parts: a part
attributed to differences between the groups and a part due to random error
or random chance. To see how this is done, recall that the formula for the
total sum of squares is
Total SS (^5) a
n
i 511 yi^2 y^2
2
Here, the total number of observations is n, and the average of all obser-
vations is y. The value for the hotel data is 933,747.9 and is shown in cell
B16. The sample average (not shown) is 272.5625.
Let’s express the total SS in a different way. We’ll break the calculation
down by the various groups. Assume that there are a total of P groups and
that the size of each group is ni (groups need not be equal in size, so ni
would indicate the sample size of the ith group), and we calculate the total
sum of squares for each group separately. We can write this as
Total SS (^5) a
P
i 51 a
ni
j 51
(^1) yij 2 y 22
Here, yij identifi es the jth observation from the ith group (for example,
y 23 would mean the third observation from the second group). Notice that
we haven’t changed the value; all we’ve done is specify the order in which
we’ll calculate the total sum of squares. We’ll calculate the sum of squares
in the fi rst group, and then in the second and so forth, adding up all of the
sums of squares in each group to arrive at the total sum of squares.
Next we’ll calculate the sample average for each group, labeling it yi,
which is the sample average of the ith group. For example, in the hotel data,
the values (shown in cells D5:D8) are
NYC 481.125
CHI 227.625
DEN 181.125
SF 200.375
Using the group averages, we can calculate the total sum of squares within
each group. This is equal to the sum of the squared deviations, where the
deviation is from each observation to its group average. We’ll call this value
the error sum of squares, or SSE, and express it as
SSE (^5) a
P
i 51 a
ni
j 51
(^1) yij 2 yi 22
Another term for this value is the within-groups sum of squares because
it is the sum of squares within each group. The value for SSE in the hotel
data is 461,031.5 (shown in cell B14).
The final piece of the analysis of variance is to calculate the sum of
squares between each of the group averages and the overall average. This
value, called the between-groups sum of squares and otherwise known as
the treatment sum of squares, or SST, is
SST (^5) a
P
i 51 ni^1 yi^2 y^2
2