In [10]: plt.plot(x, f(x), 'b', label='f(x)')
         plt.plot(x, ry, 'r.', label='regression')
         plt.legend(loc=0)
         plt.grid(True)
         plt.xlabel('x')
         plt.ylabel('f(x)')
Figure 9-4. Regression with monomials up to order 7
A brief check reveals that the result is not perfect:
In [11]: np.allclose(f(x), ry)
Out[11]: False
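np.allclose checks for element-wise equality within numerical tolerances (by default rtol=1e-05 and atol=1e-08), which is a rather strict criterion for a regression fit. A minimal sketch with an arbitrary, purely illustrative absolute tolerance:

# loosen the absolute tolerance (0.1 is an arbitrary illustrative value);
# with a tolerance of this order the fitted values may already count as "close"
np.allclose(f(x), ry, atol=0.1)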
However, the mean squared error (MSE) is not too large — at least, over this narrow range
of x values:
In [12]: np.sum((f(x) - ry) ** 2) / len(x)
Out[12]: 0.0017769134759517413
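Written out, the quantity computed here is the mean squared error over the n = len(x) sample points:

\[
\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} \bigl( f(x_i) - \hat{f}(x_i) \bigr)^2
\]

where \(\hat{f}(x_i)\) denotes the regression estimate stored in ry[i].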
Individual basis functions
In general, you can achieve better regression results when you choose a better set of basis
functions, e.g., by exploiting knowledge about the function to be approximated. In this case,
the individual basis functions have to be defined via a matrix approach (i.e., using a NumPy
ndarray object). First, the case with monomials up to order 3:
In [13]: matrix = np.zeros((3 + 1, len(x)))
         matrix[3, :] = x ** 3
         matrix[2, :] = x ** 2
         matrix[1, :] = x
         matrix[0, :] = 1
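As a side note, NumPy can build the same design matrix in a single call via np.vander; with increasing=True, column j of the resulting Vandermonde matrix holds x ** j, so it corresponds to matrix.T above. A minimal sketch:

# Vandermonde matrix with columns 1, x, x ** 2, x ** 3 -- the transpose of matrix
alt = np.vander(x, 3 + 1, increasing=True)
np.allclose(alt, matrix.T)  # should evaluate to True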
The sublibrary numpy.linalg provides the function lstsq to solve least-squares
optimization problems like the one in Equation 9-1:
In [14]: reg = np.linalg.lstsq(matrix.T, f(x))[0]
Applying lstsq to our problem in this way yields the optimal parameters for the individual
basis functions:
In [15]: reg
Out[15]: array([  1.13968447e-14,   5.62777448e-01,  -8.88178420e-16,
                 -5.43553615e-03])
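Note that more recent NumPy releases emit a FutureWarning unless the rcond cut-off parameter is passed to lstsq explicitly; the function also returns the residuals, the rank, and the singular values of the coefficient matrix alongside the solution vector. A minimal sketch of the same call in that style:

# pass rcond explicitly to silence the FutureWarning of newer NumPy versions
reg, residuals, rank, sing_vals = np.linalg.lstsq(matrix.T, f(x), rcond=None)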
To get the regression estimates, we apply the dot function to the reg and matrix arrays;
np.dot(a, b) simply gives the dot product for the two arrays a and b. Figure 9-5 shows the
result:
In [16]: ry = np.dot(reg, matrix)
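Equivalently, the matrix multiplication operator @ (available with Python 3.5+ and NumPy 1.10+) can be used instead of np.dot; a minimal sketch:

# reg has shape (4,), matrix has shape (4, len(x)); the product has shape (len(x),)
ry = reg @ matrix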