Python for Finance: Analyze Big Financial Data

Structured Arrays

The specialization of the numpy.ndarray class brings a number of valuable benefits with it. However, too narrow a specialization might turn out to be too large a burden to carry for the majority of array-based algorithms and applications. Therefore, NumPy provides structured arrays that allow a different NumPy data type for each column. What does “per column” mean? Consider the following initialization of a structured array object:

In [116]: dt = np.dtype([('Name', 'S10'), ('Age', 'i4'),
                         ('Height', 'f'), ('Children/Pets', 'i4', 2)])
          s = np.array([('Smith', 45, 1.83, (0, 1)),
                        ('Jones', 53, 1.72, (2, 2))], dtype=dt)
          s
Out[116]: array([('Smith', 45, 1.8300000429153442, [0, 1]),
                 ('Jones', 53, 1.7200000286102295, [2, 2])],
                dtype=[('Name', 'S10'), ('Age', '<i4'), ('Height', '<f4'), ('Children/Pets', '<i4', (2,))])
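
To make the "column names plus column data types" idea concrete, the following is a minimal sketch that rebuilds the same array and inspects its dtype. It assumes Python 3 and a recent NumPy, which is not necessarily what the session above ran on; under Python 3 the 'S10' field holds bytes, so names display as b'Smith' rather than 'Smith'. The subarray shape is written as (2,) here instead of 2, which should be equivalent.

```python
import numpy as np

# Field layout copied from the text; only the subarray shape is spelled (2,).
dt = np.dtype([('Name', 'S10'), ('Age', 'i4'),
               ('Height', 'f'), ('Children/Pets', 'i4', (2,))])
s = np.array([('Smith', 45, 1.83, (0, 1)),
              ('Jones', 53, 1.72, (2, 2))], dtype=dt)

print(s.shape)           # one-dimensional: one entry per record
print(s.dtype.names)     # the column (field) names
print(s.dtype.itemsize)  # bytes per record: 10 + 4 + 4 + 2 * 4
```

The array itself is one-dimensional; the "columns" live in the dtype, not in a second axis.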

In a sense, this construction comes quite close to the operation for initializing tables in a SQL database. We have column names and column data types, possibly with some additional information (e.g., the maximum number of characters per string object). The individual columns can now be easily accessed by their names:

In [117]: s['Name']
Out[117]: array(['Smith', 'Jones'],
                dtype='|S10')
In [118]: s['Height'].mean()
Out[118]: 1.7750001
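
Because a column selected by name is an ordinary array, it can also drive a selection of whole records, much like a WHERE clause. The sketch below is illustrative only and assumes Python 3 with a recent NumPy; the threshold of 50 and the variable names heights and older are made up for the example.

```python
import numpy as np

dt = np.dtype([('Name', 'S10'), ('Age', 'i4'),
               ('Height', 'f'), ('Children/Pets', 'i4', (2,))])
s = np.array([('Smith', 45, 1.83, (0, 1)),
              ('Jones', 53, 1.72, (2, 2))], dtype=dt)

heights = s['Height']      # one column, stored as float32
older = s[s['Age'] > 50]   # boolean mask from one column selects whole records

print(heights.mean())
print(older['Name'])
```

The mean is computed in single precision, which is why the value shown in the session above is 1.7750001 rather than exactly 1.775.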

Having selected a specific row (i.e., a record), the resulting object behaves mainly like a dict object, where one can retrieve values via keys:

In [119]: s[1]['Age']
Out[119]: 53
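
The comparison with dict objects is only partial: a record is a NumPy scalar whose keys are the field names stored on its dtype. As a hedged sketch (Python 3 and a recent NumPy assumed; rec and as_dict are names introduced here), a record can be turned into a real dict when one is needed:

```python
import numpy as np

dt = np.dtype([('Name', 'S10'), ('Age', 'i4'),
               ('Height', 'f'), ('Children/Pets', 'i4', (2,))])
s = np.array([('Smith', 45, 1.83, (0, 1)),
              ('Jones', 53, 1.72, (2, 2))], dtype=dt)

rec = s[1]                 # a single record
print(rec['Age'])          # key-style lookup by field name

# The field names are available through the record's dtype.
as_dict = {name: rec[name] for name in rec.dtype.names}
print(as_dict)
```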

In summary, structured arrays are a generalization of the regular numpy.ndarray object type in that the data type only has to be the same per column, as one is used to in the context of tables in SQL databases. One advantage of structured arrays is that a single element of a column can be another multidimensional object and does not have to conform to the basic NumPy data types.
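
The 'Children/Pets' field of the example illustrates this: each of its elements is itself a length-2 array. A small sketch, assuming Python 3 and a recent NumPy, shows what selecting that column yields; the name cp is introduced here, and reading the first entry of each pair as children and the second as pets is an assumption based on the field name.

```python
import numpy as np

dt = np.dtype([('Name', 'S10'), ('Age', 'i4'),
               ('Height', 'f'), ('Children/Pets', 'i4', (2,))])
s = np.array([('Smith', 45, 1.83, (0, 1)),
              ('Jones', 53, 1.72, (2, 2))], dtype=dt)

cp = s['Children/Pets']  # the subarray field becomes an extra axis
print(cp.shape)          # (number of records, 2)
print(cp[:, 0])          # first entry of each pair (assumed: children)
print(cp.sum(axis=0))    # totals over all records, per entry
```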


NumPy provides, in addition to regular arrays, structured arrays that allow the description and handling of rather complex array-oriented data structures with a variety of different data types and even structures per (named) column. They bring SQL table-like data structures to Python, with all the benefits of regular numpy.ndarray objects (syntax, methods, performance).
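
As one illustration of the table-like handling, records can be reordered by a column using regular array methods, loosely comparable to ORDER BY in SQL. This is a sketch under the assumption of Python 3 and a recent NumPy; order and by_height are names chosen for the example.

```python
import numpy as np

dt = np.dtype([('Name', 'S10'), ('Age', 'i4'),
               ('Height', 'f'), ('Children/Pets', 'i4', (2,))])
s = np.array([('Smith', 45, 1.83, (0, 1)),
              ('Jones', 53, 1.72, (2, 2))], dtype=dt)

order = np.argsort(s['Height'])  # indices that sort the Height column ascending
by_height = s[order]             # whole records reordered; s itself is unchanged

print(by_height['Name'])
```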