2015-03-31 -0.155881
2015-04-30 -0.777546
2015-05-31 -1.763660
2015-06-30 -1.134258
2015-07-31 0.458838
2015-08-31 -0.103058
2015-09-30 1.040318
Freq: M, Name: No1, dtype: float64
In [36]: type(df['No1'])
Out[36]: pandas.core.series.Series
The main DataFrame methods are available for Series objects as well, and we can, for
instance, plot the results as before (cf. Figure 6-2):
In [37]: import matplotlib.pyplot as plt
         df['No1'].cumsum().plot(style='r', lw=2.)
         plt.xlabel('date')
         plt.ylabel('value')
Figure 6-2. Line plot of a Series object
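As a minimal sketch of the point that Series and DataFrame objects share their main methods, the following block applies cumsum to a single column and to the whole frame and compares the results. The name df_demo and the random values are assumptions made for illustration; they are a stand-in for the chapter's df, not its actual data.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the chapter's DataFrame: nine month-end dates
# and four columns of standard normal random numbers (values are arbitrary).
rng = np.random.default_rng(0)
dates = pd.to_datetime([
    "2015-01-31", "2015-02-28", "2015-03-31",
    "2015-04-30", "2015-05-31", "2015-06-30",
    "2015-07-31", "2015-08-31", "2015-09-30",
])
df_demo = pd.DataFrame(rng.standard_normal((9, 4)),
                       index=dates,
                       columns=["No1", "No2", "No3", "No4"])

# Selecting a single column returns a Series, not a DataFrame.
col = df_demo["No1"]
print(type(col).__name__)

# The same method name is available on both object types.
series_cumsum = col.cumsum()      # cumulative sum of one column
frame_cumsum = df_demo.cumsum()   # cumulative sum of every column

# The Series result agrees with the matching column of the DataFrame result.
print(np.allclose(series_cumsum.values, frame_cumsum["No1"].values))
```

Calling .plot() on series_cumsum would then produce a line plot in the same way as the session above.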
GroupBy Operations
pandas has powerful and flexible grouping capabilities. They work similarly to grouping
in SQL as well as pivot tables in Microsoft Excel. To have something to group by, we add a
column indicating the quarter to which each index date belongs:
In [38]: df['Quarter'] = ['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2', 'Q3', 'Q3', 'Q3']
         df
Out[38]:                  No1       No2       No3       No4 Quarter
         2015-01-31 -0.737304  1.065173  0.073406  1.301174      Q1
         2015-02-28 -0.788818 -0.985819  0.403796 -1.753784      Q1
         2015-03-31 -0.155881 -1.752672  1.037444 -0.400793      Q1
         2015-04-30 -0.777546  1.730278  0.417114  0.184079      Q2
         2015-05-31 -1.763660 -0.375469  0.098678 -1.553824      Q2
         2015-06-30 -1.134258  1.401821  1.227124  0.979389      Q2
         2015-07-31  0.458838 -0.143187  1.565701 -2.085863      Q3
         2015-08-31 -0.103058 -0.366170 -0.478036 -0.032810      Q3
         2015-09-30  1.040318 -0.128799  0.786187  0.414084      Q3
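The quarter labels above are typed in by hand. As an alternative sketch, not the approach used in the text, they can also be derived from the dates themselves through the quarter attribute of a DatetimeIndex. The variable names dates and quarters are chosen here for illustration only.

```python
import pandas as pd

# The nine month-end dates that make up the index in the table above.
dates = pd.to_datetime([
    "2015-01-31", "2015-02-28", "2015-03-31",
    "2015-04-30", "2015-05-31", "2015-06-30",
    "2015-07-31", "2015-08-31", "2015-09-30",
])

# DatetimeIndex.quarter gives the calendar quarter (1 to 4) of each date;
# prefixing "Q" reproduces the labels used in the text.
quarters = ["Q" + str(q) for q in dates.quarter]
print(quarters)
# ['Q1', 'Q1', 'Q1', 'Q2', 'Q2', 'Q2', 'Q3', 'Q3', 'Q3']
```

With the chapter's DataFrame, the same idea would read df['Quarter'] = ['Q' + str(q) for q in df.index.quarter], assuming df carries a DatetimeIndex as shown.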
Now, we can group by the “Quarter” column and output statistics for the individual
groups:
In [39]: groups = df.groupby('Quarter')
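To illustrate what such a grouping object represents, the following sketch rebuilds only the No1 and Quarter columns from the table above, iterates over the groups, and computes each group mean by hand. The names demo, demo_groups, and manual_means are introduced here for illustration; the values are copied from the output shown earlier.

```python
import pandas as pd

# No1 values and quarter labels copied from the table above.
demo = pd.DataFrame({
    "No1": [-0.737304, -0.788818, -0.155881,
            -0.777546, -1.763660, -1.134258,
            0.458838, -0.103058, 1.040318],
    "Quarter": ["Q1"] * 3 + ["Q2"] * 3 + ["Q3"] * 3,
})

demo_groups = demo.groupby("Quarter")

# Iterating over the grouping yields (group key, sub-DataFrame) pairs,
# so a group statistic can be computed by hand for each bucket.
manual_means = {}
for key, chunk in demo_groups:
    manual_means[key] = chunk["No1"].sum() / len(chunk)

print(manual_means)

# The built-in aggregation gives the same numbers in a single call.
print(demo_groups["No1"].mean())
```

The hand-computed value for Q1 should agree, up to rounding, with the No1 entry for Q1 in the groups.mean() output.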
For example, we can easily get the mean, max, and size of every group bucket as follows:
In [40]: groups.mean()
Out[40]: No1 No2 No3 No4
Quarter
Q1 -0.560668 -0.557773 0.504882 -0.284468