Python for Finance: Analyze Big Financial Data

No2    0.049462
No3    0.570157
No4   -0.327594
dtype: float64
In [29]: df.cumsum()
Out[29]:
                 No1       No2       No3       No4
2015-01-31 -0.737304  1.065173  0.073406  1.301174
2015-02-28 -1.526122  0.079354  0.477201 -0.452609
2015-03-31 -1.682003 -1.673318  1.514645 -0.853403
2015-04-30 -2.459549  0.056960  1.931759 -0.669323
2015-05-31 -4.223209 -0.318508  2.030438 -2.223147
2015-06-30 -5.357467  1.083313  3.257562 -1.243758
2015-07-31 -4.898629  0.940126  4.823263 -3.329621
2015-08-31 -5.001687  0.573956  4.345227 -3.362430
2015-09-30 -3.961370  0.445156  5.131414 -2.948346

There is also a shortcut to a number of often-used statistics for numerical data sets, the describe method:

In [30]: df.describe()
Out[30]:
            No1       No2       No3       No4
count  9.000000  9.000000  9.000000  9.000000
mean  -0.440152  0.049462  0.570157 -0.327594
std    0.847907  1.141676  0.642904  1.219345
min   -1.763660 -1.752672 -0.478036 -2.085863
25%   -0.788818 -0.375469  0.098678 -1.553824
50%   -0.737304 -0.143187  0.417114 -0.032810
75%   -0.103058  1.065173  1.037444  0.414084
max    1.040318  1.730278  1.565701  1.301174
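
To make the content of the describe output more concrete, the following minimal sketch compares its rows with the corresponding single-statistic methods. The DataFrame demo and its values are made up for illustration; they are not the df object used in the session above.

```python
import numpy as np
import pandas as pd

# Made-up illustration data (an assumption, not the book's df).
dates = pd.to_datetime(["2015-01-31", "2015-02-28", "2015-03-31", "2015-04-30"])
demo = pd.DataFrame(
    {"No1": [1.0, 2.0, 3.0, 4.0], "No2": [-1.0, 0.0, 1.0, 4.0]},
    index=dates,
)

summary = demo.describe()

# Each row of the summary corresponds to a single-statistic method.
print(np.allclose(summary.loc["mean"], demo.mean()))
print(np.allclose(summary.loc["std"], demo.std()))      # sample std (ddof=1)
print(np.allclose(summary.loc["50%"], demo.median()))   # 50% quantile = median
```

The std row is the sample standard deviation, which is the default of the std method as well.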

You can also apply the majority of NumPy universal functions to DataFrame objects:


In [31]: np.sqrt(df)
Out[31]:
                 No1       No2       No3       No4
2015-01-31       NaN  1.032072  0.270935  1.140690
2015-02-28       NaN       NaN  0.635449       NaN
2015-03-31       NaN       NaN  1.018550       NaN
2015-04-30       NaN  1.315400  0.645844  0.429045
2015-05-31       NaN       NaN  0.314131       NaN
2015-06-30       NaN  1.183985  1.107756  0.989641
2015-07-31  0.677376       NaN  1.251280       NaN
2015-08-31       NaN       NaN       NaN       NaN
2015-09-30  1.019960       NaN  0.886672  0.643494

NUMPY UNIVERSAL FUNCTIONS

In general, you can apply NumPy universal functions to pandas DataFrame objects whenever they could be applied to an ndarray object containing the same data.
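
As a small illustration of this note, the sketch below applies np.sqrt once to a DataFrame and once to the ndarray holding the same data. The object demo and its values are made up for this example.

```python
import numpy as np
import pandas as pd

# Made-up data with perfect squares so the results are easy to check by eye.
demo = pd.DataFrame({"No1": [1.0, 4.0, 9.0], "No2": [16.0, 25.0, 36.0]})

as_frame = np.sqrt(demo)             # result is a DataFrame; labels are kept
as_array = np.sqrt(demo.to_numpy())  # result is a plain ndarray

print(type(as_frame).__name__, type(as_array).__name__)
print(np.array_equal(as_frame.to_numpy(), as_array))
```

The numerical results agree; the DataFrame version additionally carries the index and column labels through the operation.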

pandas is quite error tolerant: where a mathematical operation fails for an element, such as the square root of a negative number, it does not raise an exception but puts a NaN value in that position. Moreover, as briefly shown already, in a number of cases you can work with such incomplete data sets as if they were complete:


In [32]: np.sqrt(df).sum()
Out[32]:
No1    1.697335
No2    3.531458
No3    6.130617
No4    3.202870
dtype: float64
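
The same behavior can be reproduced on a small scale. The following sketch uses made-up values (not the book's df) and also shows the skipna parameter of the sum method, which controls whether NaN values are left out; np.errstate is used here only to silence the NumPy warning about invalid values.

```python
import numpy as np
import pandas as pd

# Made-up data: negative entries have no real square root.
demo = pd.DataFrame({"No1": [4.0, -1.0, 9.0], "No2": [-4.0, -9.0, 16.0]})

with np.errstate(invalid="ignore"):  # suppress the RuntimeWarning from NumPy
    roots = np.sqrt(demo)            # NaN where the input is negative

totals = roots.sum()                 # skipna=True is the default
strict = roots.sum(skipna=False)     # NaN values propagate into the result

print(roots.isna().sum())
print(totals)
print(strict)
```

With the default setting, No1 sums the two valid roots (2 and 3) and No2 only the single valid root (4); with skipna=False, both column sums are NaN.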

In such cases, pandas just leaves out the NaN values and only works with the other available values. Plotting of data is also only one line of code away in general (cf. Figure 6-1):

In [33]: %matplotlib inline
         df.cumsum().plot(lw=2.0)
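
The %matplotlib inline line is an IPython magic command and is not valid in a plain Python script. A minimal sketch of the same plot outside a notebook might look as follows; the random data, the month-start dates, the Agg backend, and the in-memory PNG buffer are all assumptions made for this illustration.

```python
import io

import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so no display is required

import numpy as np
import pandas as pd

# Made-up standard normal data in place of the book's df (9 rows, 4 columns).
rng = np.random.default_rng(0)
dates = pd.date_range("2015-01-01", periods=9, freq="MS")  # month starts
demo = pd.DataFrame(
    rng.standard_normal((9, 4)),
    index=dates,
    columns=["No1", "No2", "No3", "No4"],
)

# plot returns a matplotlib Axes object with one line per column.
ax = demo.cumsum().plot(lw=2.0)

# Render the figure to PNG bytes in memory; pass a filename instead to save it.
buf = io.BytesIO()
ax.figure.savefig(buf, format="png")
print(len(ax.get_lines()), "lines plotted")
```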