J
Statistics and probability
60 Linear regression
60.1 Introduction to linear regression
Regression analysis, usually termed regression, is used to draw the line of ‘best fit’ through co-ordinates on a graph. The techniques used enable a mathematical equation of the straight-line form $y = mx + c$ to be deduced for a given set of co-ordinate values, the line being such that the sum of the squares of the deviations of the co-ordinate values from the line is a minimum, i.e. it is the line of ‘best fit’. When a regression analysis is made, it is possible to obtain two lines of best fit, depending on which variable is selected as the dependent variable and which variable is the independent variable. For example, in a resistive electrical circuit, the current flowing is directly proportional to the voltage applied to the circuit. There are two ways of obtaining experimental values relating the current and voltage. Either certain voltages are applied to the circuit and the current values are measured, in which case the voltage is the independent variable and the current is the dependent variable; or the voltage is adjusted until a desired value of current is flowing and the value of voltage is measured, in which case the current is the independent variable and the voltage is the dependent variable.
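To make the idea of two lines of best fit concrete, the sketch below fits both lines to a small set of voltage and current readings. The readings are invented purely for illustration (they do not come from the text), and NumPy's `polyfit` with degree 1 is assumed as the least-squares fitting routine.

```python
import numpy as np

# Hypothetical readings for a resistive circuit (illustrative values only).
voltage = np.array([1.0, 2.0, 3.0, 4.0, 5.0])       # volts
current = np.array([0.11, 0.19, 0.32, 0.38, 0.52])  # amperes

# Voltage as the independent variable: current = a0 + a1 * voltage.
a1, a0 = np.polyfit(voltage, current, 1)

# Current as the independent variable: voltage = b0 + b1 * current.
b1, b0 = np.polyfit(current, voltage, 1)

# Express the second line's gradient as dI/dV so the two can be compared.
slope_from_second_line = 1.0 / b1

print(f"current on voltage: I = {a0:.4f} + {a1:.4f} V")
print(f"voltage on current: V = {b0:.4f} + {b1:.4f} I")
print(f"gradients dI/dV: {a1:.4f} versus {slope_from_second_line:.4f}")
```

With scattered readings such as these, the two gradients differ slightly; they coincide only when every point lies exactly on one straight line.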
60.2 The least-squares regression lines
For a given set of co-ordinate values, $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, let the X-values be the independent variables and the Y-values be the dependent variables. Also let $D_1, \ldots, D_n$ be the vertical distances between the line shown as PQ in Fig. 60.1 and the points representing the co-ordinate values. The least-squares regression line, i.e. the line of best fit, is the line which makes the value of $D_1^2 + D_2^2 + \cdots + D_n^2$ a minimum value.
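The minimum property can be checked numerically. The sketch below is a minimal illustration rather than the text's own method: the helper `sum_squared_deviations` and the sample co-ordinates are hypothetical, and NumPy's `polyfit` is assumed as the least-squares solver. It compares the sum $D_1^2 + \cdots + D_n^2$ for the fitted line with the sums for slightly shifted lines.

```python
import numpy as np


def sum_squared_deviations(a0, a1, X, Y):
    """Return D_1^2 + ... + D_n^2 for the line Y = a0 + a1*X."""
    D = Y - (a0 + a1 * X)  # signed vertical deviations of each point from the line
    return float(np.sum(D ** 2))


# Hypothetical co-ordinate values (illustrative only).
X = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
Y = np.array([2.1, 3.9, 6.2, 7.8, 10.1])

# Degree-1 least-squares fit; polyfit returns the gradient first.
a1, a0 = np.polyfit(X, Y, 1)
best = sum_squared_deviations(a0, a1, X, Y)

# Shift the intercept or the gradient slightly and recompute the sum.
nudged = [
    sum_squared_deviations(a0 + da0, a1 + da1, X, Y)
    for da0, da1 in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.05), (0.0, -0.05)]
]

print(f"fitted line: Y = {a0:.3f} + {a1:.3f} X, sum of squares = {best:.4f}")
print("nudged lines:", [round(s, 4) for s in nudged])
```

Every shifted line gives a larger sum of squared deviations than the fitted line, which is what the least-squares definition requires.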
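[Figure 60.1: the line PQ and the points $(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$ plotted on X and Y axes, showing the vertical distances $D_1, D_2, \ldots, D_n$ and the horizontal distances $H_3, H_4$ between the points and the line.]

The equation of the least-squares regression line is usually written as $Y = a_0 + a_1 X$, where $a_0$ is the Y-axis intercept value and $a_1$ is the gradient of the line (analogous to $c$ and $m$ in the equation $y = mx + c$). The values of $a_0$ and $a_1$ that make the sum of the ‘deviations squared’ a minimum can be obtained from the two equations: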
$$\sum Y = a_0 N + a_1 \sum X \qquad (1)$$

$$\sum (XY) = a_0 \sum X + a_1 \sum X^2 \qquad (2)$$
where X and Y are the co-ordinate values, N is the number of co-ordinates and $a_0$ and $a_1$ are called the regression coefficients of Y on X. Equations (1) and (2) are called the normal equations of the regression line of Y on X. The regression line of Y on X is used to estimate values of Y for given values of X. If the Y-values (vertical axis) are selected as the independent variables, the horizontal distances between the line shown as PQ in Fig. 60.1 and the co-ordinate values ($H_3$, $H_4$, etc.) are taken as the deviations. The equation of the regression line is of the form $X = b_0 + b_1 Y$ and the normal equations become:
$$\sum X = b_0 N + b_1 \sum Y \qquad (3)$$

$$\sum (XY) = b_0 \sum Y + b_1 \sum Y^2 \qquad (4)$$
where X and Y are the co-ordinate values, $b_0$ and $b_1$ are the regression coefficients of X on Y and N is the number of co-ordinates. These normal equations