556 Chapter 9. Essential Statistics for Data Analysis
our task is to find their best values. By best values we mean the ones for which each
of the data points is as close to the line as possible. To quantify this statement we
first note that the value of $y$ at any $x_i$ is $y_i$ and the value of the straight line at that
point is $m x_i + c$. The difference between the two at each data point, which is
called the residual, is then given by
\[ R_i = m x_i + c - y_i. \tag{9.7.2} \]
Ideally this residual should be as small as possible for every data point, which suggests
minimizing the sum of the residuals. There is a problem with this scheme, however:
positive and negative residuals cancel each other, so the summed residuals can be close
to zero even for a line that fits the data poorly. To overcome this problem one can minimize the
sum of the squared residuals instead. That is, we can demand that
\[ \chi^2 = \sum_i (m x_i + c - y_i)^2 \tag{9.7.3} \]
is a minimum. Here we have represented the sum of the squared residuals by $\chi^2$, as
it is the most commonly used notation for this quantity. Now we need to minimize
this function with respect to $c$ and $m$. Since the partial derivatives of a function vanish
at its minimum, we have the following two conditions:
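\[ \frac{\partial \chi^2}{\partial c} = 0 = \frac{\partial}{\partial c} \left[ \sum_i (m x_i + c - y_i)^2 \right] \tag{9.7.4} \]
\[ \frac{\partial \chi^2}{\partial m} = 0 = \frac{\partial}{\partial m} \left[ \sum_i (m x_i + c - y_i)^2 \right] \tag{9.7.5} \]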
Performing these differentiations and dividing out the common factor of 2 gives
\[ \sum_i (m x_i + c - y_i) = 0 \tag{9.7.6} \]
and
\[ \sum_i (m x_i + c - y_i)\, x_i = 0. \tag{9.7.7} \]
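As an intermediate step, the two conditions can be rearranged into a pair of linear equations in $m$ and $c$. The sketch below assumes there are $N$ data points (the symbol $N$ is introduced here for illustration), so that summing the constant $c$ over all points gives $Nc$:

```latex
\begin{align*}
m \sum_i x_i + N c &= \sum_i y_i, \\
m \sum_i x_i^2 + c \sum_i x_i &= \sum_i x_i y_i.
\end{align*}
```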
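We now have two equations, which can be solved for the required $c$ and
$m$. With $N$ denoting the number of data points, a few algebraic manipulations finally yield
\[ m = \frac{N \sum_i x_i y_i - \sum_i x_i \sum_i y_i}{N \sum_i x_i^2 - \left( \sum_i x_i \right)^2} \tag{9.7.8} \]
and
\[ c = \frac{\sum_i y_i \sum_i x_i^2 - \sum_i x_i \sum_i x_i y_i}{N \sum_i x_i^2 - \left( \sum_i x_i \right)^2}. \tag{9.7.9} \]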
Note that these are the values for which the sum of the squared residuals is minimum.
In other words, these values represent a line that is the best fit to the data.
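The closed-form result reduces to a handful of sums, which can be sketched in Python. This is an illustration only: the function names fit_line and chi_squared and the sample data are hypothetical, and the sketch assumes the slope and intercept are evaluated for N data points with the common denominator N Σx² − (Σx)².

```python
def fit_line(xs, ys):
    """Return (m, c) of the least-squares line y = m*x + c.

    Assumed closed-form expressions, with N the number of points:
        m = (N*Sxy - Sx*Sy) / (N*Sxx - Sx**2)
        c = (Sy*Sxx - Sx*Sxy) / (N*Sxx - Sx**2)
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    sx = sum(xs)                                # sum of x_i
    sy = sum(ys)                                # sum of y_i
    sxx = sum(x * x for x in xs)                # sum of x_i^2
    sxy = sum(x * y for x, y in zip(xs, ys))    # sum of x_i * y_i
    denom = n * sxx - sx * sx
    if denom == 0:
        raise ValueError("all x values are identical; the slope is undefined")
    m = (n * sxy - sx * sy) / denom
    c = (sy * sxx - sx * sxy) / denom
    return m, c


def chi_squared(m, c, xs, ys):
    """Sum of the squared residuals (m*x_i + c - y_i)^2."""
    return sum((m * x + c - y) ** 2 for x, y in zip(xs, ys))


# Made-up illustrative data, scattered around a straight line.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
ys = [1.1, 2.9, 5.2, 6.8, 9.1]

m, c = fit_line(xs, ys)
best = chi_squared(m, c, xs, ys)
print(f"m = {m:.3f}, c = {c:.3f}, chi^2 = {best:.4f}")

# Nudging either parameter away from the fitted value increases chi^2.
print(chi_squared(m + 0.1, c, xs, ys) > best)
print(chi_squared(m, c - 0.1, xs, ys) > best)
```

The last two lines are a quick sanity check that the fitted values sit at a minimum of the summed squared residuals rather than a proof of it.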
It is evident that for a large dataset the computations involved in determining the
best fit are tedious to carry out by hand. Therefore one normally uses computer codes to perform the
regression analysis. Fortunately, most standard statistical analysis packages now
have built-in routines that can handle linear as well as more complicated regressions.
9.7.B Nonlinear Regression
By nonlinear regression we mean fitting any nonlinear function to the data. This
could be a polynomial of order 2 or higher, an exponential, a logarithmic, a