Python for Finance: Analyze Big Financial Data

2015-03-31   -0.155881
2015-04-30   -0.777546
2015-05-31   -1.763660
2015-06-30   -1.134258
2015-07-31    0.458838
2015-08-31   -0.103058
2015-09-30    1.040318
Freq: M, Name: No1, dtype: float64

In [36]: type(df['No1'])
Out[36]: pandas.core.series.Series

The main DataFrame methods are available for Series objects as well. For instance, we can plot the results as before (cf. Figure 6-2):

In [37]: import matplotlib.pyplot as plt
         df['No1'].cumsum().plot(style='r', lw=2.)
         plt.xlabel('date')
         plt.ylabel('value')

Figure 6-2. Line plot of a Series object
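As a small illustration of the point that Series objects share the main DataFrame methods, the following sketch rebuilds the No1 column as a standalone Series and applies a few of those methods to it. The names `dates`, `no1`, and `running` are chosen here for illustration only, and the values are the rounded No1 figures printed in this section, so results agree with the book's output only up to rounding.

```python
import pandas as pd

# Month-end dates and rounded No1 values as printed in this section
dates = pd.to_datetime(['2015-01-31', '2015-02-28', '2015-03-31',
                        '2015-04-30', '2015-05-31', '2015-06-30',
                        '2015-07-31', '2015-08-31', '2015-09-30'])
no1 = pd.Series([-0.737304, -0.788818, -0.155881,
                 -0.777546, -1.763660, -1.134258,
                 0.458838, -0.103058, 1.040318],
                index=dates, name='No1')

# Methods known from DataFrame work on a Series as well
running = no1.cumsum()      # running total, the quantity plotted in Figure 6-2
print(type(running))        # still a pandas Series
print(no1.mean(), no1.std())
print(no1.describe())
```

The last entry of `running` equals `no1.sum()`, which is a quick way to sanity-check the cumulative sum before plotting it.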

GroupBy Operations

pandas has powerful and flexible grouping capabilities. They work similarly to grouping in SQL and to pivot tables in Microsoft Excel. To have something to group by, we add a column indicating the quarter to which each index date belongs:

In [38]: df['Quarter'] = ['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2', 'Q3', 'Q3', 'Q3']
         df
Out[38]:                  No1       No2       No3       No4 Quarter
         2015-01-31 -0.737304  1.065173  0.073406  1.301174      Q1
         2015-02-28 -0.788818 -0.985819  0.403796 -1.753784      Q1
         2015-03-31 -0.155881 -1.752672  1.037444 -0.400793      Q1
         2015-04-30 -0.777546  1.730278  0.417114  0.184079      Q2
         2015-05-31 -1.763660 -0.375469  0.098678 -1.553824      Q2
         2015-06-30 -1.134258  1.401821  1.227124  0.979389      Q2
         2015-07-31  0.458838 -0.143187  1.565701 -2.085863      Q3
         2015-08-31 -0.103058 -0.366170 -0.478036 -0.032810      Q3
         2015-09-30  1.040318 -0.128799  0.786187  0.414084      Q3
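The quarter labels above are typed in by hand. As an alternative sketch (not the approach used in the text), the same labels can be derived from the index itself, assuming the index is a DatetimeIndex as it is here. Only the No1 column is rebuilt to keep the example short; the names `dates` and `labels` are illustrative.

```python
import pandas as pd

# Rebuild a reduced version of the DataFrame from the printed values
dates = pd.to_datetime(['2015-01-31', '2015-02-28', '2015-03-31',
                        '2015-04-30', '2015-05-31', '2015-06-30',
                        '2015-07-31', '2015-08-31', '2015-09-30'])
df = pd.DataFrame({'No1': [-0.737304, -0.788818, -0.155881,
                           -0.777546, -1.763660, -1.134258,
                           0.458838, -0.103058, 1.040318]},
                  index=dates)

# DatetimeIndex.quarter gives the calendar quarter (1 to 4) of each date
labels = ['Q%d' % q for q in df.index.quarter]
df['Quarter'] = labels
print(df)
```

Deriving the labels this way avoids a length mismatch if rows are added later, at the cost of assuming calendar quarters rather than, say, fiscal ones.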

Now we can group by the “Quarter” column and output statistics for the individual groups:

In [39]: groups = df.groupby('Quarter')
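To make clearer what the object returned by groupby represents, the following sketch iterates over the groups and compares a per-group mean computed by hand with the one pandas computes. It uses only the No1 column, with the rounded values printed above; the names `manual` and `built_in` are illustrative.

```python
import pandas as pd

dates = pd.to_datetime(['2015-01-31', '2015-02-28', '2015-03-31',
                        '2015-04-30', '2015-05-31', '2015-06-30',
                        '2015-07-31', '2015-08-31', '2015-09-30'])
df = pd.DataFrame({'No1': [-0.737304, -0.788818, -0.155881,
                           -0.777546, -1.763660, -1.134258,
                           0.458838, -0.103058, 1.040318],
                   'Quarter': ['Q1'] * 3 + ['Q2'] * 3 + ['Q3'] * 3},
                  index=dates)

groups = df.groupby('Quarter')

# Iterating over a GroupBy object yields (key, sub-DataFrame) pairs
for name, chunk in groups:
    print(name, len(chunk))

# Per-group mean computed by hand ...
manual = {name: chunk['No1'].mean() for name, chunk in groups}
# ... and the same statistic computed by pandas in one call
built_in = groups['No1'].mean()
print(built_in)
print(groups.size())  # number of rows per group
```

Both routes give the same numbers; the built-in aggregation is simply the split, apply, and combine steps carried out in a single call.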

For example, we can easily get the mean, max, and size of every group bucket as follows:

In [40]: groups.mean()
Out[40]:               No1       No2       No3       No4
         Quarter
         Q1      -0.560668 -0.557773  0.504882 -0.284468