In [21]: np.sum((f(x) - ry) ** 2) / len(x)
Out[21]: 2.2749084503102031e-31
In fact, the minimization routine recovers the correct parameters of 1 for the sin part and
0.5 for the linear part:
In [22]: reg
Out[22]: array([  1.55428020e-16,   5.00000000e-01,   0.00000000e+00,
                  1.00000000e+00])
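The exact setup of this fit is not shown in the excerpt above; a minimal, self-contained sketch of how such an exact fit with individual basis functions might look is given below. The function f(x) = sin(x) + 0.5 * x and the use of np.linalg.lstsq are assumptions here, chosen because they reproduce the coefficient ordering seen in the output (constant, x, x**2, sin):

    # minimal sketch (assumptions, not the original listing): fit
    # f(x) = sin(x) + 0.5 * x exactly with individual basis functions
    import numpy as np

    def f(x):
        return np.sin(x) + 0.5 * x

    x = np.linspace(-2 * np.pi, 2 * np.pi, 50)
    matrix = np.zeros((3 + 1, len(x)))
    matrix[3, :] = np.sin(x)  # coefficient expected to be 1
    matrix[2, :] = x ** 2     # superfluous basis function, coefficient ~0
    matrix[1, :] = x          # coefficient expected to be 0.5
    matrix[0, :] = 1          # constant term, coefficient ~0
    reg = np.linalg.lstsq(matrix.T, f(x))[0]  # least-squares coefficients
    ry = np.dot(reg, matrix)                  # regression values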
Noisy data
Regression can cope equally well with noisy data, be it data from simulation or from (non-perfect) measurements. To illustrate this point, let us generate both independent observations with noise and also dependent observations with noise:
In [23]: xn = np.linspace(-2 * np.pi, 2 * np.pi, 50)
         xn = xn + 0.15 * np.random.standard_normal(len(xn))
         yn = f(xn) + 0.25 * np.random.standard_normal(len(xn))
The regression itself is the same as before:
In [24]: reg = np.polyfit(xn, yn, 7)
         ry = np.polyval(reg, xn)
Figure 9-7 reveals that the regression results are closer to the original function than the
noisy data points. In a sense, the regression averages out the noise to some extent:
In [25]: plt.plot(xn, yn, 'b^', label='f(x)')
         plt.plot(xn, ry, 'ro', label='regression')
         plt.legend(loc=0)
         plt.grid(True)
         plt.xlabel('x')
         plt.ylabel('f(x)')
Figure 9-7. Regression with noisy data
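To put a number on the claim that the regression averages out the noise, a rough check (not part of the original session, assuming xn, yn, ry, and f are defined as above) is to compare the mean squared deviation of the noisy observations from the true function values with that of the regression values:

    # illustrative check: the regression values ry should deviate less
    # from the true function values f(xn) than the noisy observations yn
    mse_data = np.sum((yn - f(xn)) ** 2) / len(xn)  # pure noise in the y values
    mse_reg = np.sum((ry - f(xn)) ** 2) / len(xn)   # deviation of the fit
    print mse_data, mse_reg  # the second number is typically smaller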
Unsorted data
Another important aspect of regression is that the approach also works seamlessly with
unsorted data. The previous examples all rely on sorted x data. This does not have to be
the case. To make the point, let us randomize the independent data points as follows:
In [26]: xu = np.random.rand(50) * 4 * np.pi - 2 * np.pi
         yu = f(xu)
In this case, you can hardly identify any structure by just visually inspecting the raw data:
In [27]: print xu[:10].round(2)
         print yu[:10].round(2)
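A simple way to convince oneself that the ordering of the sample is indeed irrelevant (again an illustrative check, not part of the original session) is to fit the same polynomial once to the unsorted points and once to a sorted copy and compare the coefficients:

    # illustrative check (assumes xu, yu from above): the least-squares
    # solution does not depend on the order of the sample points
    idx = np.argsort(xu)                     # indices that sort the x values
    reg_u = np.polyfit(xu, yu, 7)            # fit on the unsorted sample
    reg_s = np.polyfit(xu[idx], yu[idx], 7)  # fit on the sorted sample
    print np.allclose(reg_u, reg_s)          # True -- identical coefficients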