Chapter 3: Memory Management
<mmzone.h>
typedef struct pglist_data {
	...
	struct zonelist node_zonelists[MAX_ZONELISTS];
	...
} pg_data_t;

#define MAX_ZONES_PER_ZONELIST (MAX_NUMNODES * MAX_NR_ZONES)

struct zonelist {
	...
	struct zone *zones[MAX_ZONES_PER_ZONELIST + 1]; // NULL delimited
};
The node_zonelists array makes a separate entry available for every possible zone type. This entry
contains a fallback list of type zonelist, whose structure is discussed below.
Because the fallback list must include all zones of all nodes, it consists of MAX_NUMNODES * MAX_NR_ZONES
entries, plus a further element for a null pointer to mark the end of the list.
The task of creating a fallback hierarchy is delegated to build_zonelists, which creates the data structures for each NUMA node. It requires as its parameter a pointer to the relevant pg_data_t instance. Before I discuss the code in detail, let us recall one thing mentioned above. Since we have restricted our discussion to UMA systems, why would it be necessary to consider multiple NUMA nodes? Indeed, the kernel replaces the code shown below with a different variant if CONFIG_NUMA is set. However, an architecture may select the discontiguous or sparse memory option even on UMA systems. This can be beneficial if the address space contains large holes. The memory "blocks" created by such holes are best treated using the data structures provided by NUMA. This is why we have to deal with them here.
A large outer loop first iterates over all zone types of the node. Each pass selects the i-th entry of
the node_zonelists array, which holds the fallback list for zone type i.
mm/page_alloc.c
static void __init build_zonelists(pg_data_t *pgdat)
{
	int node, local_node;
	enum zone_type i, j;

	local_node = pgdat->node_id;
	for (i = 0; i < MAX_NR_ZONES; i++) {
		struct zonelist *zonelist;

		/* Select the fallback list for zone type i */
		zonelist = pgdat->node_zonelists + i;

		/* Fill in the local node's zones, starting at position 0 */
		j = build_zonelists_node(pgdat, zonelist, 0, i);
		...
	}
}
The array element of node_zonelists is addressed by means of pointer manipulation
(pgdat->node_zonelists + i is equivalent to &pgdat->node_zonelists[i]), a perfectly legal
practice in C. The actual work is delegated to build_zonelists_node. When invoked, it first generates the
fallback order within the local node.