Statistical Methods for Psychology

In this equation, $SS_Y$, which you know to be equal to $\sum (Y - \bar{Y})^2$, is the sum of squares of Y
and represents the total of

  1. The part of the sum of squares of Y that is related to X [i.e., $SS_Y(r^2)$]

  2. The part of the sum of squares of Y that is independent of X [i.e., $SS_{\text{residual}}$]

In the context of our example, we are talking about that part of the number of symptoms
people exhibited that is related to how many stressful life events they had experienced, and
that part that is related to other things. The quantity $SS_{\text{residual}}$ is the sum of squares of Y that is
independent of X and is a measure of the amount of error remaining even after we use X to
predict Y. These concepts can be made clearer with a second example.
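
Written out, the two-part decomposition just described takes the form below. This is a sketch that assumes the equation referred to at the start of the passage (not reproduced in this excerpt) is a rearrangement of the standard least-squares identity $SS_{\text{residual}} = SS_Y(1 - r^2)$.

```latex
\begin{align*}
SS_Y &= \sum (Y - \bar{Y})^2 \\
     &= SS_Y(r^2) + SS_{\text{residual}} \\
     &= SS_Y(r^2) + SS_Y(1 - r^2)
\end{align*}
```

The first term is the part of the variability in Y that is related to X, and the second is the part that is independent of X.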

Suppose we were interested in studying the relationship between amount of cigarette
smoking (X) and age at death (Y). As we watch people die over time, we notice several
things. First, we see that not all die at precisely the same age. There is variability in age at
death regardless of smoking behavior, and this variability is measured by
$SS_Y = \sum (Y - \bar{Y})^2$. We also notice that some people smoke more than others. This variability
in smoking regardless of age at death is measured by $SS_X = \sum (X - \bar{X})^2$. We further
find that cigarette smokers tend to die earlier than nonsmokers, and heavy smokers earlier
than light smokers. Thus, we write a regression equation to predict Y from X. Since people
differ in their smoking behavior, they will also differ in their predicted life expectancy ($\hat{Y}$),
and we will label this variability $SS_{\hat{Y}} = \sum (\hat{Y} - \bar{Y})^2$. This last measure is variability in Y
that is directly attributable to variability in X, since different values of $\hat{Y}$ arise from different
values of X and the same values of $\hat{Y}$ arise from the same value of X; that is, $\hat{Y}$ does
not vary unless X varies.

We have one last source of variability: the variability in the life expectancy of those
people who smoke exactly the same amount. This is measured by $SS_{\text{residual}}$ and is the variability
in Y that cannot be explained by the variability in X (since these people do not differ
in the amount they smoke). These several sources of variability (sums of squares) are summarized
in Table 9.5.
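
The four sources of variability can be computed directly from paired observations. The following is a minimal sketch, not the book's own procedure: the function name `sums_of_squares` and the eight data pairs are hypothetical, invented only to illustrate the arithmetic, and the regression line is fitted by ordinary least squares.

```python
def sums_of_squares(x, y):
    """Return SS_X, SS_Y, SS_Yhat and SS_residual for the regression of y on x."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Variability in X and in Y around their own means
    ss_x = sum((xi - mean_x) ** 2 for xi in x)
    ss_y = sum((yi - mean_y) ** 2 for yi in y)

    # Least-squares slope and intercept for predicting Y from X
    sp_xy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sp_xy / ss_x
    intercept = mean_y - slope * mean_x
    y_hat = [intercept + slope * xi for xi in x]

    # Variability of the predictions around the mean of Y, and of Y around the predictions
    ss_yhat = sum((yh - mean_y) ** 2 for yh in y_hat)
    ss_residual = sum((yi - yh) ** 2 for yi, yh in zip(y, y_hat))

    return {"SS_X": ss_x, "SS_Y": ss_y, "SS_Yhat": ss_yhat, "SS_residual": ss_residual}


# Hypothetical data (not from the text): cigarettes per day and age at death
cigarettes = [0, 0, 10, 10, 20, 20, 30, 30]
age_at_death = [74, 70, 71, 69, 70, 66, 67, 65]

ss = sums_of_squares(cigarettes, age_at_death)
for name, value in ss.items():
    print(name, round(value, 4))
```

For these made-up numbers, $SS_Y$ is 60, of which 40 is $SS_{\hat{Y}}$ and 20 is $SS_{\text{residual}}$ (up to floating-point rounding), so the two parts add back to the total.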

If we consider the absurd extreme in which all of the nonsmokers die at exactly age
72 and all of the smokers smoke precisely the same amount and die at exactly age 68, then
all of the variability in life expectancy is directly predictable from variability in smoking
behavior. If you smoke you will die at 68, and if you don’t you will die at 72. Here
$SS_{\hat{Y}} = SS_Y$, and $SS_{\text{residual}} = 0$.
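
The extreme case can be checked numerically. This is a small sketch under stated assumptions: the group sizes (three nonsmokers, three smokers) and the smokers' common amount (20) are hypothetical, since the text specifies only the ages at death, 72 and 68.

```python
# Hypothetical data: three nonsmokers (0) and three smokers who all smoke the same amount (20)
amount_smoked = [0, 0, 0, 20, 20, 20]
age_at_death = [72, 72, 72, 68, 68, 68]

n = len(amount_smoked)
mean_x = sum(amount_smoked) / n
mean_y = sum(age_at_death) / n

# Least-squares regression of age at death on amount smoked
sp_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(amount_smoked, age_at_death))
ss_x = sum((x - mean_x) ** 2 for x in amount_smoked)
slope = sp_xy / ss_x
intercept = mean_y - slope * mean_x
predicted = [intercept + slope * x for x in amount_smoked]

# Total, predictable, and residual variability in age at death
ss_y = sum((y - mean_y) ** 2 for y in age_at_death)
ss_yhat = sum((p - mean_y) ** 2 for p in predicted)
ss_residual = sum((y - p) ** 2 for y, p in zip(age_at_death, predicted))

print(ss_y, ss_yhat, ss_residual)
```

With these values the predictions reproduce every observation, so $SS_{\hat{Y}}$ equals $SS_Y$ (both 24 here) and $SS_{\text{residual}}$ is 0, up to floating-point rounding.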

As a more realistic example, assume smokers tend to die earlier than nonsmokers, but
within each group there is a certain amount of variability in life expectancy. This is a situation
in which some of $SS_Y$ is attributable to smoking ($SS_{\hat{Y}}$) and some is not ($SS_{\text{residual}}$).
What we want to be able to do is to specify what percentage of the overall variability in



262 Chapter 9 Correlation and Regression


Table 9.5 Sources of variance in regression for the study of smoking and life
expectancy

$SS_X$ = variability in amount smoked = $\sum (X - \bar{X})^2$
$SS_Y$ = variability in life expectancy = $\sum (Y - \bar{Y})^2$
$SS_{\hat{Y}}$ = variability in life expectancy directly attributable to variability in
smoking behavior = $\sum (\hat{Y} - \bar{Y})^2$
$SS_{\text{residual}}$ = variability in life expectancy that cannot be attributed to variability in
smoking behavior = $\sum (Y - \hat{Y})^2 = SS_Y - SS_{\hat{Y}}$