In [10]: plt.plot(x, f(x), 'b', label='f(x)')
         plt.plot(x, ry, 'r.', label='regression')
         plt.legend(loc=0)
         plt.grid(True)
         plt.xlabel('x')
         plt.ylabel('f(x)')
Figure 9-4. Regression with monomials up to order 7
A brief check reveals that the result is not perfect:
In [11]: np.allclose(f(x), ry)
Out[11]: False
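np.allclose checks for element-wise equality within numerical tolerances (by default rtol=1e-05 and atol=1e-08), which is a rather strict criterion for a regression fit. A minimal sketch with an arbitrary, purely illustrative absolute tolerance:

# loosen the absolute tolerance (0.1 is an arbitrary illustrative value);
# with a tolerance of this order the fitted values may already count as "close"
np.allclose(f(x), ry, atol=0.1)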
However, the mean squared error (MSE) is not too large — at least, over this narrow range
of x values:
In [12]: np.sum((f(x) - ry) ** 2) / len(x)
Out[12]: 0.0017769134759517413
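Written out, the quantity computed here is the mean squared error over the n = len(x) sample points:

\[
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl( f(x_i) - \hat{f}(x_i) \bigr)^2
\]

where \(\hat{f}(x_i)\) denotes the regression estimate stored in ry[i].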
Individual basis functions
In general, you can achieve better regression results when you choose a better set of basis
functions, e.g., by exploiting knowledge about the function to be approximated. In this case,
the individual basis functions have to be defined via a matrix approach (i.e., using a NumPy
ndarray object). First, the case with monomials up to order 3:
In [13]: matrix = np.zeros((3 + 1, len(x)))
         matrix[3, :] = x ** 3
         matrix[2, :] = x ** 2
         matrix[1, :] = x
         matrix[0, :] = 1
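As a side note, NumPy can build the same design matrix in a single call via np.vander; with increasing=True, column j of the resulting Vandermonde matrix holds x ** j, so it corresponds to matrix.T above. A minimal sketch:

# Vandermonde matrix with columns 1, x, x ** 2, x ** 3 -- the transpose of matrix
alt = np.vander(x, 3 + 1, increasing=True)
np.allclose(alt, matrix.T)  # should evaluate to True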
The sublibrary numpy.linalg provides the function lstsq to solve least-squares
optimization problems like the one in Equation 9-1:
In [14]: reg = np.linalg.lstsq(matrix.T, f(x))[0]
Applying lstsq to our problem in this way yields the optimal parameters for the individual
basis functions:
In [15]: reg
Out[15]: array([  1.13968447e-14,   5.62777448e-01,  -8.88178420e-16,
                 -5.43553615e-03])
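Note that more recent NumPy releases emit a FutureWarning unless the rcond cut-off parameter is passed to lstsq explicitly; the function also returns the residuals, the rank, and the singular values of the coefficient matrix alongside the solution vector. A minimal sketch of the same call in that style:

# pass rcond explicitly to silence the FutureWarning of newer NumPy versions
reg, residuals, rank, sing_vals = np.linalg.lstsq(matrix.T, f(x), rcond=None)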
To get the regression estimates, we apply the dot function to the reg and matrix arrays;
np.dot(a, b) simply gives the dot product for the two arrays a and b. Figure 9-5 shows the
result:
In [16]: ry = np.dot(reg, matrix)
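Equivalently, the matrix multiplication operator @ (available with Python 3.5+ and NumPy 1.10+) can be used instead of np.dot; a minimal sketch:

# reg has shape (4,), matrix has shape (4, len(x)); the product has shape (len(x),)
ry = reg @ matrix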