Python for Finance: Analyze Big Financial Data


pandas Basics


In a sense, pandas is built “on top” of NumPy. So, for example, NumPy universal functions will generally work on pandas objects as well. We therefore import both to begin with:


In [1]: import numpy as np
        import pandas as pd
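
The claim that NumPy universal functions work on pandas objects can be checked with a minimal sketch. The Series `s` and its values are hypothetical sample data chosen for illustration, not taken from the text:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data, used only for illustration
s = pd.Series([1.0, 4.0, 9.0], index=['x', 'y', 'z'])

# A NumPy universal function applied element-wise to a pandas object
roots = np.sqrt(s)

print(type(roots).__name__)  # the result is still a pandas Series
print(roots.index.tolist())  # the index labels are preserved
print(roots.tolist())
```

The point of the sketch is that the result keeps its pandas type and labels rather than falling back to a bare ndarray.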

First Steps with DataFrame Class


On a rather fundamental level, the DataFrame class is designed to manage indexed and labeled data, not too different from a SQL database table or a worksheet in a spreadsheet application. Consider the following creation of a DataFrame object:


In [2]: df = pd.DataFrame([10, 20, 30, 40], columns=['numbers'],
                          index=['a', 'b', 'c', 'd'])
        df
Out[2]:    numbers
        a       10
        b       20
        c       30
        d       40

This simple example already shows some major features of the DataFrame class when it comes to storing data:


Data
Data itself can be provided in different shapes and types (list, tuple, ndarray, and dict objects are candidates).

Labels
Data is organized in columns, which can have custom names.

Index
There is an index that can take on different formats (e.g., numbers, strings, time information).
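
As a rough illustration of these three features, the following sketch builds DataFrame objects from a dict and from an ndarray, and uses a date-based index. All variable names, column names, and values here are hypothetical and chosen only for the example:

```python
import numpy as np
import pandas as pd

# Data from a dict of lists: the dict keys become the column labels
df_dict = pd.DataFrame({'numbers': [10, 20, 30, 40]},
                       index=['a', 'b', 'c', 'd'])

# Data from a two-dimensional ndarray, with custom column labels
df_arr = pd.DataFrame(np.arange(6).reshape(3, 2), columns=['x', 'y'])

# An index built from time information (a DatetimeIndex)
dates = pd.date_range('2015-01-01', periods=3, freq='D')
df_time = pd.DataFrame({'value': [1.5, 2.5, 3.5]}, index=dates)

print(df_dict.shape, df_arr.shape, df_time.shape)
print(type(df_time.index).__name__)
```

When no index is passed, as for `df_arr`, pandas falls back to a default integer index starting at 0.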


Working with such a DataFrame object is in general pretty convenient and efficient, e.g., compared to regular ndarray objects, which are more specialized and more restricted when you want to do something like enlarging an existing object. The following are simple examples showing how typical operations on a DataFrame object work:


In [3]: df.index  # the index values
Out[3]: Index([u'a', u'b', u'c', u'd'], dtype='object')

In [4]: df.columns  # the column names
Out[4]: Index([u'numbers'], dtype='object')

In [5]: df.loc['c']  # selection via index label
Out[5]: numbers    30
        Name: c, dtype: int64

In [6]: df.loc[['a', 'd']]  # selection of multiple index labels
Out[6]:    numbers
        a       10
        d       40

In [7]: df.loc[df.index[1:3]]  # selection via Index object
Out[7]:    numbers
        b       20
        c       30

In [8]: df.sum()  # sum per column
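
The earlier remark about enlarging an existing object can be made concrete with a small sketch. The column name `floats` and its values are assumptions made for illustration, and `np.column_stack` is just one of several ways to build a larger array:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([10, 20, 30, 40], columns=['numbers'],
                  index=['a', 'b', 'c', 'd'])

# Enlarging the DataFrame: assigning to a new label adds a column in place
df['floats'] = [1.5, 2.5, 3.5, 4.5]

# An ndarray has a fixed shape, so "enlarging" it means building a new array
arr = np.array([[10], [20], [30], [40]])
bigger = np.column_stack([arr, [1.5, 2.5, 3.5, 4.5]])

print(df.columns.tolist())
print(arr.shape, bigger.shape)  # the original array is unchanged
```

The DataFrame grows under the same name, while the ndarray operation returns a separate object and leaves `arr` as it was.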