Structured Arrays
The specialization of the numpy.ndarray class obviously brings a number of really
valuable benefits with it. However, a too-narrow specialization might turn out to be too
large a burden to carry for the majority of array-based algorithms and applications.
Therefore, NumPy provides structured arrays that allow us to have different NumPy data
types per column, at least. What does “per column” mean? Consider the following
initialization of a structured array object:
In [ 116 ]: dt = np.dtype([(‘Name’, ‘S10’), (‘Age’, ‘i4’),
(‘Height’, ‘f’), (‘Children/Pets’, ‘i4’, 2 )])
s = np.array([(‘Smith’, 45 , 1.83, ( 0 , 1 )),
(‘Jones’, 53 , 1.72, ( 2 , 2 ))], dtype=dt)
s
Out[116]: array([(‘Smith’, 45, 1.8300000429153442, [0, 1]),
(‘Jones’, 53, 1.7200000286102295, [2, 2])],
dtype=[(‘Name’, ‘S10’), (‘Age’, ‘<i4’), (‘Height’, ‘<f4’), (‘Chi
ldren/Pets’, ‘<i4’, (2,))])
In a sense, this construction comes quite close to the operation for initializing tables in a
SQL database. We have column names and column data types, with maybe some additional
information (e.g., maximum number of characters per string object). The single columns
can now be easily accessed by their names:
In [ 117 ]: s[‘Name’]
Out[117]: array([‘Smith’, ‘Jones’],
dtype=’|S10’)
In [ 118 ]: s[‘Height’].mean()
Out[118]: 1.7750001
Having selected a specific row and record, respectively, the resulting objects mainly
behave like dict objects, where one can retrieve values via keys:
In [ 119 ]: s[ 1 ][‘Age’]
Out[119]: 53
In summary, structured arrays are a generalization of the regular numpy.ndarray object
types in that the data type only has to be the same per column, as one is used to in the
context of tables in SQL databases. One advantage of structured arrays is that a single
element of a column can be another multidimensional object and does not have to conform
to the basic NumPy data types.
STRUCTURED ARRAYS
NumPy provides, in addition to regular arrays, structured arrays that allow the description and handling of rather
complex array-oriented data structures with a variety of different data types and even structures per (named)
column. They bring SQL table-like data structures to Python, with all the benefits of regular numpy.ndarray
objects (syntax, methods, performance).