Python for Finance: Analyze Big Financial Data

(Elle) #1

Structured Arrays


The specialization of the numpy.ndarray class obviously brings a number of really


valuable benefits with it. However, a too-narrow specialization might turn out to be too


large a burden to carry for the majority of array-based algorithms and applications.


Therefore, NumPy provides structured arrays that allow us to have different NumPy data


types per column, at least. What does “per column” mean? Consider the following


initialization of a structured array object:


In  [ 116 ]:    dt  =   np.dtype([(‘Name’,  ‘S10’), (‘Age’, ‘i4’),
(‘Height’, ‘f’), (‘Children/Pets’, ‘i4’, 2 )])
s = np.array([(‘Smith’, 45 , 1.83, ( 0 , 1 )),
(‘Jones’, 53 , 1.72, ( 2 , 2 ))], dtype=dt)
s
Out[116]: array([(‘Smith’, 45, 1.8300000429153442, [0, 1]),
(‘Jones’, 53, 1.7200000286102295, [2, 2])],
dtype=[(‘Name’, ‘S10’), (‘Age’, ‘<i4’), (‘Height’, ‘<f4’), (‘Chi
ldren/Pets’, ‘<i4’, (2,))])

In a sense, this construction comes quite close to the operation for initializing tables in a


SQL database. We have column names and column data types, with maybe some additional


information (e.g., maximum number of characters per string object). The single columns


can now be easily accessed by their names:


In  [ 117 ]:    s[‘Name’]
Out[117]: array([‘Smith’, ‘Jones’],
dtype=’|S10’)
In [ 118 ]: s[‘Height’].mean()
Out[118]: 1.7750001

Having selected a specific row and record, respectively, the resulting objects mainly


behave like dict objects, where one can retrieve values via keys:


In  [ 119 ]:    s[ 1 ][‘Age’]
Out[119]: 53

In summary, structured arrays are a generalization of the regular numpy.ndarray object


types in that the data type only has to be the same per column, as one is used to in the


context of tables in SQL databases. One advantage of structured arrays is that a single


element of a column can be another multidimensional object and does not have to conform


to the basic NumPy data types.


STRUCTURED ARRAYS

NumPy provides, in addition to regular arrays, structured arrays that allow the description and handling of rather

complex array-oriented data structures with a variety of different data types and even structures per (named)

column. They bring SQL table-like data structures to Python, with all the benefits of regular numpy.ndarray

objects (syntax, methods, performance).
Free download pdf