Higher Engineering Mathematics

J

Statistics and probability

60

Linear regression

60.1 Introduction to linear regression

Regression analysis, usually termed regression, is used to draw
the line of ‘best fit’ through co-ordinates on a graph. The
techniques used enable a mathematical equation of the
straight-line form $y = mx + c$ to be deduced for a given set of
co-ordinate values, the line being such that the sum of the
squares of the deviations of the co-ordinate values from the line
is a minimum, i.e. it is the line of ‘best fit’. When a regression
analysis is made, it is possible to obtain two lines of best fit,
depending on which variable is selected as the dependent variable
and which variable is the independent variable. For example, in a
resistive electrical circuit, the current flowing is directly
proportional to the voltage applied to the circuit. There are two
ways of obtaining experimental values relating the current and
voltage. Either certain voltages are applied to the circuit and
the current values are measured, in which case the voltage is the
independent variable and the current is the dependent variable;
or the voltage can be adjusted until a desired value of current is
flowing and the value of voltage is measured, in which case the
current is the independent variable and the voltage is the
dependent variable.

60.2 The least-squares regression lines

For a given set of co-ordinate values,
$(X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n)$, let the $X$-values be
the independent variables and the $Y$-values be the dependent
values. Also let $D_1, \ldots, D_n$ be the vertical distances
between the line shown as $PQ$ in Fig. 60.1 and the points
representing the co-ordinate values. The least-squares regression
line, i.e. the line of best fit, is the line which makes the value
of $D_1^2 + D_2^2 + \cdots + D_n^2$ a minimum value.
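
As a minimal sketch of this criterion, the Python function below evaluates $D_1^2 + D_2^2 + \cdots + D_n^2$ for a candidate straight line and compares two candidates. The function name `sum_squared_deviations`, its parameters and the four co-ordinate values are hypothetical, chosen only for illustration; they do not come from the text.

```python
def sum_squared_deviations(xs, ys, intercept, gradient):
    """Return D_1^2 + ... + D_n^2 for the candidate line Y = intercept + gradient * X.

    Each D_i is the vertical distance between the point (X_i, Y_i) and the line.
    """
    return sum((y - (intercept + gradient * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical co-ordinate values, made up for illustration only.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 5.0, 8.0]

# Two candidate lines: Y = 0 + 2X and Y = 1 + 1X.
line_a = sum_squared_deviations(xs, ys, intercept=0.0, gradient=2.0)
line_b = sum_squared_deviations(xs, ys, intercept=1.0, gradient=1.0)

print(line_a, line_b)
```

For these made-up values the first candidate gives a sum of 1.0 and the second gives 11.0, so the first is the better fit of the two. The least-squares regression line is the line for which this sum is smaller than for any other straight line.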

[Figure 60.1: the points $(X_1, Y_1)$, $(X_2, Y_2)$, ..., $(X_n, Y_n)$ plotted on $X$ and $Y$ axes about the line $PQ$, with vertical deviations $D_1$, $D_2$, ..., $D_n$ and horizontal deviations $H_3$, $H_4$.]

The equation of the least-squares regression line is usually
written as $Y = a_0 + a_1 X$, where $a_0$ is the $Y$-axis intercept
value and $a_1$ is the gradient of the line (analogous to $c$ and
$m$ in the equation $y = mx + c$). The values of $a_0$ and $a_1$ to
make the sum of the ‘deviations squared’ a minimum can be obtained
from the two equations:

$$\sum Y = a_0 N + a_1 \sum X \qquad (1)$$

$$\sum (XY) = a_0 \sum X + a_1 \sum X^2 \qquad (2)$$

where $X$ and $Y$ are the co-ordinate values, $N$ is the number of co-ordinates and $a_0$ and $a_1$ are called the regression coefficients of $Y$ on $X$. Equations (1) and (2) are called the normal equations of the regression line of $Y$ on $X$. The regression line of $Y$ on $X$ is used to estimate values of $Y$ for given values of $X$.

If the $Y$-values (vertical axis) are selected as the independent variables, the horizontal distances between the line shown as $PQ$ in Fig. 60.1 and the co-ordinate values ($H_3$, $H_4$, etc.) are taken as the deviations. The equation of the regression line is of the form $X = b_0 + b_1 Y$ and the normal equations become:

$$\sum X = b_0 N + b_1 \sum Y \qquad (3)$$

$$\sum (XY) = b_0 \sum Y + b_1 \sum Y^2 \qquad (4)$$

where $X$ and $Y$ are the co-ordinate values, $b_0$ and $b_1$ are the regression coefficients of $X$ on $Y$ and $N$ is the number of co-ordinates. These normal equations
