pandas Basics
In a sense, pandas is built “on top” of NumPy. So, for example, NumPy universal functions
will generally work on pandas objects as well. We therefore import both to begin with:
In [1]: import numpy as np
        import pandas as pd
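As a minimal sketch of the point about universal functions, a NumPy function such as np.sqrt can be applied directly to a pandas object and returns a pandas object with the labels preserved. The Series s and its values are illustrative assumptions, not taken from the text:

```python
import numpy as np
import pandas as pd

# Hypothetical example data, chosen so that the square roots are exact.
s = pd.Series([1.0, 4.0, 9.0], index=['x', 'y', 'z'])

# Applying a NumPy universal function yields a pandas Series,
# not a bare ndarray, and the index labels are carried over.
roots = np.sqrt(s)

print(type(roots).__name__)   # Series
print(roots.index.tolist())   # ['x', 'y', 'z']
print(roots.tolist())         # [1.0, 2.0, 3.0]
```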
First Steps with DataFrame Class
On a rather fundamental level, the DataFrame class is designed to manage indexed and
labeled data, not too different from a SQL database table or a worksheet in a spreadsheet
application. Consider the following creation of a DataFrame object:
In [2]: df = pd.DataFrame([10, 20, 30, 40], columns=['numbers'],
                          index=['a', 'b', 'c', 'd'])
        df
Out[2]:    numbers
        a       10
        b       20
        c       30
        d       40
This simple example already shows some major features of the DataFrame class when it
comes to storing data:
Data
Data itself can be provided in different shapes and types (list, tuple, ndarray, and
dict objects are candidates).
Labels
Data is organized in columns, which can have custom names.
Index
There is an index that can take on different formats (e.g., numbers, strings, time
information).
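The three features above can be sketched as follows. This is an illustrative example under stated assumptions: the variable names and the date range are hypothetical, and only list, ndarray, and dict inputs are shown.

```python
import numpy as np
import pandas as pd

labels = ['a', 'b', 'c', 'd']

# Data: the same values supplied as a list, an ndarray, and a dict.
from_list = pd.DataFrame([10, 20, 30, 40], columns=['numbers'], index=labels)
from_array = pd.DataFrame(np.array([10, 20, 30, 40]), columns=['numbers'],
                          index=labels)
from_dict = pd.DataFrame({'numbers': [10, 20, 30, 40]}, index=labels)

# Labels: with a dict, the keys become the column names.
print(from_dict.columns.tolist())   # ['numbers']

# Index: time information instead of strings (hypothetical dates).
dates = pd.date_range('2015-01-01', periods=4, freq='D')
by_date = pd.DataFrame({'numbers': [10, 20, 30, 40]}, index=dates)
print(by_date.loc[pd.Timestamp('2015-01-03'), 'numbers'])   # 30
```

All three constructor calls yield the same column of values; which input type is most convenient depends on where the data comes from.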
Working with such a DataFrame object is in general pretty convenient and efficient, e.g.,
compared to regular ndarray objects, which are more specialized and more restricted
when you want to do something like enlarging an existing object. The following are simple
examples showing how typical operations on a DataFrame object work:
In [3]: df.index  # the index values
Out[3]: Index([u'a', u'b', u'c', u'd'], dtype='object')
In [4]: df.columns  # the column names
Out[4]: Index([u'numbers'], dtype='object')
In [5]: df.loc['c']  # selection via index label
Out[5]: numbers    30
        Name: c, dtype: int64
In [6]: df.loc[['a', 'd']]  # selection of multiple indices
Out[6]:    numbers
        a       10
        d       40
In [7]: df.loc[df.index[1:3]]  # selection via Index object
Out[7]:    numbers
        b       20
        c       30
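The label-based selections above can be contrasted with position-based selection. The following is a minimal sketch, assuming a pandas version that provides the .loc and .iloc indexers; the variable names are illustrative:

```python
import pandas as pd

df = pd.DataFrame([10, 20, 30, 40], columns=['numbers'],
                  index=['a', 'b', 'c', 'd'])

row_c = df.loc['c']               # one row by label, returned as a Series
rows_ad = df.loc[['a', 'd']]      # several rows by label, returned as a DataFrame
rows_bc = df.iloc[1:3]            # rows by integer position, end excluded
rows_bc_label = df.loc['b':'c']   # label slice, both ends included

print(row_c['numbers'])           # 30
print(rows_bc.index.tolist())     # ['b', 'c']
```

Note that a label slice includes its end point, whereas a positional slice follows the usual Python convention of excluding it.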
In [8]: df.sum()  # sum per column