In [21]: np.sum((f(x) - ry) ** 2) / len(x)
Out[21]: 2.2749084503102031e-31
In fact, the minimization routine recovers the correct parameters of 1 for the sin part and
0.5 for the linear part:
In [22]: reg
Out[22]: array([  1.55428020e-16,   5.00000000e-01,   0.00000000e+00,
                  1.00000000e+00])
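The exact setup of this fit is not shown in the excerpt above; a minimal, self-contained sketch of how such an exact fit with individual basis functions might look is given below. The function f(x) = sin(x) + 0.5 * x and the use of np.linalg.lstsq are assumptions here, chosen because they reproduce the coefficient ordering seen in the output (constant, x, x**2, sin):

    # minimal sketch (assumptions, not the original listing): fit
    # f(x) = sin(x) + 0.5 * x exactly with individual basis functions
    import numpy as np

    def f(x):
        return np.sin(x) + 0.5 * x

    x = np.linspace(-2 * np.pi, 2 * np.pi, 50)
    matrix = np.zeros((3 + 1, len(x)))
    matrix[3, :] = np.sin(x)  # coefficient expected to be 1
    matrix[2, :] = x ** 2     # superfluous basis function, coefficient ~0
    matrix[1, :] = x          # coefficient expected to be 0.5
    matrix[0, :] = 1          # constant term, coefficient ~0
    reg = np.linalg.lstsq(matrix.T, f(x))[0]  # least-squares coefficients
    ry = np.dot(reg, matrix)                  # regression values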
Noisy data
Regression can cope equally well with noisy data, be it data from simulation or from (non-perfect) measurements. To illustrate this point, let us generate both independent observations with noise and also dependent observations with noise:
In [23]: xn = np.linspace(-2 * np.pi, 2 * np.pi, 50)
         xn = xn + 0.15 * np.random.standard_normal(len(xn))
         yn = f(xn) + 0.25 * np.random.standard_normal(len(xn))
The regression itself is the same as before:
In [24]: reg = np.polyfit(xn, yn, 7)
         ry = np.polyval(reg, xn)
Figure 9-7 reveals that the regression results are closer to the original function than the
noisy data points. In a sense, the regression averages out the noise to some extent:
In [25]: plt.plot(xn, yn, 'b^', label='f(x)')
         plt.plot(xn, ry, 'ro', label='regression')
         plt.legend(loc=0)
         plt.grid(True)
         plt.xlabel('x')
         plt.ylabel('f(x)')
Figure 9-7. Regression with noisy data
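To put a number on the claim that the regression averages out the noise, a rough check (not part of the original session, assuming xn, yn, ry, and f are defined as above) is to compare the mean squared deviation of the noisy observations from the true function values with that of the regression values:

    # illustrative check: the regression values ry should deviate less
    # from the true function values f(xn) than the noisy observations yn
    mse_data = np.sum((yn - f(xn)) ** 2) / len(xn)  # pure noise in the y values
    mse_reg = np.sum((ry - f(xn)) ** 2) / len(xn)   # deviation of the fit
    print mse_data, mse_reg  # the second number is typically smaller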
Unsorted data
Another important aspect of regression is that the approach also works seamlessly with
unsorted data. The previous examples all rely on sorted x data. This does not have to be
the case. To make the point, let us randomize the independent data points as follows:
In [26]: xu = np.random.rand(50) * 4 * np.pi - 2 * np.pi
         yu = f(xu)
In this case, you can hardly identify any structure by just visually inspecting the raw data:
In [27]: print xu[:10].round(2)
         print yu[:10].round(2)
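A simple way to convince oneself that the ordering of the sample is indeed irrelevant (again an illustrative check, not part of the original session) is to fit the same polynomial once to the unsorted points and once to a sorted copy and compare the coefficients:

    # illustrative check (assumes xu, yu from above): the least-squares
    # solution does not depend on the order of the sample points
    idx = np.argsort(xu)                     # indices that sort the x values
    reg_u = np.polyfit(xu, yu, 7)            # fit on the unsorted sample
    reg_s = np.polyfit(xu[idx], yu[idx], 7)  # fit on the sorted sample
    print np.allclose(reg_u, reg_s)          # True -- identical coefficients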