Python for Finance: Analyze Big Financial Data

No2    0.049462
No3    0.570157
No4   -0.327594
dtype: float64

In [29]: df.cumsum()
Out[29]:
                 No1       No2       No3       No4
2015-01-31 -0.737304  1.065173  0.073406  1.301174
2015-02-28 -1.526122  0.079354  0.477201 -0.452609
2015-03-31 -1.682003 -1.673318  1.514645 -0.853403
2015-04-30 -2.459549  0.056960  1.931759 -0.669323
2015-05-31 -4.223209 -0.318508  2.030438 -2.223147
2015-06-30 -5.357467  1.083313  3.257562 -1.243758
2015-07-31 -4.898629  0.940126  4.823263 -3.329621
2015-08-31 -5.001687  0.573956  4.345227 -3.362430
2015-09-30 -3.961370  0.445156  5.131414 -2.948346
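As a minimal sketch of what cumsum computes, the following uses a small hand-made DataFrame (the name small and its values are illustrative assumptions, not the book's random data). It shows that the running total is taken down each column, so the last row equals the column sums and, divided by the number of rows, the column means.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for df: three month-end dates, two columns.
dates = pd.to_datetime(["2015-01-31", "2015-02-28", "2015-03-31"])
small = pd.DataFrame({"No1": [1.0, -2.0, 0.5],
                      "No2": [0.5, 0.5, -1.0]}, index=dates)

# cumsum works column-wise: row k holds the sum of rows 0..k.
running = small.cumsum()
print(running)

# The last row of the running total is the plain column sum ...
assert np.allclose(running.iloc[-1], small.sum())
# ... and dividing it by the row count gives the column mean.
assert np.allclose(running.iloc[-1] / len(small), small.mean())
```

The same relation can be read off the output above: the last cumsum value for No2, 0.445156, divided by the nine rows gives approximately the mean 0.049462.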

There is also a shortcut to a number of often-used statistics for numerical data sets, the describe method:

In [30]: df.describe()
Out[30]:
            No1       No2       No3       No4
count  9.000000  9.000000  9.000000  9.000000
mean  -0.440152  0.049462  0.570157 -0.327594
std    0.847907  1.141676  0.642904  1.219345
min   -1.763660 -1.752672 -0.478036 -2.085863
25%   -0.788818 -0.375469  0.098678 -1.553824
50%   -0.737304 -0.143187  0.417114 -0.032810
75%   -0.103058  1.065173  1.037444  0.414084
max    1.040318  1.730278  1.565701  1.301174
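The result of describe is itself a DataFrame indexed by the name of each statistic, so single values can be selected by label. The sketch below illustrates this on a small made-up data set (the name sample and its values are assumptions for illustration only).

```python
import pandas as pd

# Hypothetical data; any numerical DataFrame would do.
sample = pd.DataFrame({"No1": [1.0, 2.0, 3.0, 4.0],
                       "No2": [-2.0, 0.0, 0.0, 2.0]})

summary = sample.describe()

# Rows are labeled count, mean, std, min, 25%, 50%, 75%, max.
print(summary.index.tolist())

# Select one statistic for all columns, or one cell by row and column label.
print(summary.loc["mean"])
print(summary.loc["50%", "No1"])

# The 50% row is the median; std is the sample standard deviation (ddof=1).
assert summary.loc["50%", "No1"] == sample["No1"].median()
assert abs(summary.loc["std", "No1"] - sample["No1"].std(ddof=1)) < 1e-12
```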

You can also apply the majority of NumPy universal functions to DataFrame objects:

In [31]: np.sqrt(df)
Out[31]:
                 No1       No2       No3       No4
2015-01-31       NaN  1.032072  0.270935  1.140690
2015-02-28       NaN       NaN  0.635449       NaN
2015-03-31       NaN       NaN  1.018550       NaN
2015-04-30       NaN  1.315400  0.645844  0.429045
2015-05-31       NaN       NaN  0.314131       NaN
2015-06-30       NaN  1.183985  1.107756  0.989641
2015-07-31  0.677376       NaN  1.251280       NaN
2015-08-31       NaN       NaN       NaN       NaN
2015-09-30  1.019960       NaN  0.886672  0.643494

NUMPY UNIVERSAL FUNCTIONS

In general, you can apply NumPy universal functions to pandas DataFrame objects whenever they could be applied to an ndarray object containing the same data.
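The rule in this note can be checked directly. The following minimal sketch, using a made-up two-column frame (the name frame and its values are assumptions), applies np.sqrt once to the DataFrame and once to the underlying ndarray and compares the results.

```python
import numpy as np
import pandas as pd

# Hypothetical data with non-negative values only.
frame = pd.DataFrame({"a": [1.0, 4.0], "b": [9.0, 16.0]})

via_frame = np.sqrt(frame)         # result is again a DataFrame
via_array = np.sqrt(frame.values)  # result is a plain ndarray

# Same numbers either way; the DataFrame version keeps the labels.
assert isinstance(via_frame, pd.DataFrame)
assert np.array_equal(via_frame.values, via_array)
assert list(via_frame.columns) == list(frame.columns)
```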

pandas is quite error tolerant: where a mathematical operation fails, as with the square root of a negative number here, the result simply contains a NaN value at that position instead of an exception being raised. Moreover, as briefly shown already, in a number of cases you can work with such incomplete data sets as if they were complete:

In [32]: np.sqrt(df).sum()
Out[32]:
No1    1.697335
No2    3.531458
No3    6.130617
No4    3.202870
dtype: float64
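The NaN handling behind a sum like this can be made explicit. The following is a minimal sketch on made-up values (the names values, roots, skipped, strict, and valid are assumptions); it contrasts the default behavior of sum with skipna=False and counts the entries that actually enter each sum.

```python
import numpy as np
import pandas as pd

# Hypothetical data with some negative entries.
values = pd.DataFrame({"No1": [-1.0, 4.0, 9.0],
                       "No2": [-4.0, -9.0, 16.0]})

# Negative inputs yield NaN; errstate only silences NumPy's warning.
with np.errstate(invalid="ignore"):
    roots = np.sqrt(values)

skipped = roots.sum()             # default skipna=True ignores NaN
strict = roots.sum(skipna=False)  # NaN propagates into the result
valid = roots.count()             # number of non-NaN values per column

print(skipped)
print(strict)
print(valid)
```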

In such cases, pandas just leaves out the NaN values and only works with the other available values.

Plotting data is generally also only one line of code away (cf. Figure 6-1):

In [33]: %matplotlib inline
         df.cumsum().plot(lw=2.0)
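Outside an IPython session the %matplotlib inline magic is not available. The sketch below shows one way to produce the same kind of plot in a plain script; the Agg backend, the seeded random data, and the name data are assumptions for illustration and do not reproduce the book's numbers.

```python
import matplotlib
matplotlib.use("Agg")  # assumption: non-interactive backend for script use

import numpy as np
import pandas as pd

# Hypothetical stand-in for df: 9 month-end dates, 4 columns of random numbers.
dates = pd.to_datetime(["2015-01-31", "2015-02-28", "2015-03-31",
                        "2015-04-30", "2015-05-31", "2015-06-30",
                        "2015-07-31", "2015-08-31", "2015-09-30"])
rng = np.random.default_rng(0)
data = pd.DataFrame(rng.standard_normal((9, 4)),
                    columns=["No1", "No2", "No3", "No4"], index=dates)

# plot returns a matplotlib Axes object with one line per column;
# lw=2.0 is passed through to matplotlib as the line width.
ax = data.cumsum().plot(lw=2.0)
fig = ax.get_figure()
```

The returned figure object can then be written to disk with its savefig method if a file is needed.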