variables in the candidate model. The following SAS code produces these statistics for
the two variables SMATHS and MATHS.
/* maxdec=3: three decimal places; fw=10: field width of 10 */
proc means maxdec=3 fw=10 n nmiss min max range mean
           std stderr uss css t prt;
  var maths smaths;
run;
The SAS output is shown:
Variable  Label           N     Nmiss   Minimum   Maximum     Range
MATHS                    10         0     1.000    10.000     9.000
SMATHS                   10         0    95.000   133.000    38.000

Variable  Label        Mean   Std Dev Std Error
MATHS                 5.600     2.591     0.819
SMATHS              115.400    10.700     3.384

Variable  Label           USS         CSS           T    Prob>|T|
MATHS                 374.000      60.400       6.836      0.0001
SMATHS             134202.000    1030.400      34.105      0.0001
Notice under the heading CSS that the corrected sums of squares for both the response
variable SMATHS and the explanatory variable MATHS correspond with the values for
the sums of squares for Y and X computed in step 1 of the worked example (see pp. 265–6).
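The entries in this output are linked by simple identities, which gives a quick arithmetic
check on the CSS values. The DATA step below is a minimal sketch and not part of the
original analysis: it retypes N, Mean and USS from the output above and recomputes CSS,
Std Dev, Std Error and T. The data set name CHECK and the variable names NOBS, XBAR, SD,
SE and TVAL are illustrative assumptions.

```sas
data check;
  input variable $ nobs xbar uss;
  css  = uss - nobs*xbar**2;     /* corrected SS = uncorrected SS - n*mean**2 */
  sd   = sqrt(css/(nobs - 1));   /* sample standard deviation */
  se   = sd/sqrt(nobs);          /* standard error of the mean */
  tval = xbar/se;                /* t statistic for H0: mean = 0 */
  datalines;
MATHS 10 5.6 374
SMATHS 10 115.4 134202
;
run;

proc print data=check;
run;
```

With these inputs CSS works out as 60.4 for MATHS and 1030.4 for SMATHS, in agreement with
the PROC MEANS output. The remaining columns should match the printed values up to
rounding.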
An important regression assumption to check is whether there is a linear trend between
the explanatory and response variables. A separate plot of the response variable SMATHS
against the explanatory variable MATHS could be produced, but it is more convenient to
work from the output of PROC REG, which has the advantage of allowing an overlay plot to
be produced. The overlay plot is a plot of the response against the explanatory variable,
with a plot of the predicted scores against the explanatory variable overlaid. It gives a
visual indication of model fit by showing the extent to which the observed scores are
scattered about the fitted regression line (indicated by the plot of predicted values).
The SAS code that generates the regression output and the overlay plot is:
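proc reg;
  /* p: predicted values; r: residual analysis;
     clm, cli: 95% confidence limits for the mean and for an individual prediction */
  model smaths=maths / p r clm cli;
  /* save predicted values (p) and residuals (r) in the data set outreg */
  output out=outreg p=p r=r;
  id id;
run;
proc plot data=outreg vpercent=75 hpercent=75;
  plot smaths*maths='#' p*maths='*' / overlay;
  title1 'Plot of Observed response var vs Independent var (#) and';
  title2 'Predicted values vs Independent var (*)';
run;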
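If only the raw scatter of response against explanatory variable is wanted, the separate
plot mentioned earlier could be sketched as follows. This is an illustrative assumption
rather than the author's code: with no DATA= option, PROC PLOT uses the most recently
created data set, which is assumed here to contain both SMATHS and MATHS.

```sas
proc plot vpercent=75 hpercent=75;
  /* response on the vertical axis, explanatory variable on the horizontal axis */
  plot smaths*maths='#';
  title1 'Plot of observed response (SMATHS) vs explanatory variable (MATHS)';
run;
```

A roughly straight band of points in this plot would support the assumption of a linear
trend. The overlay plot adds the fitted line for comparison.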