characteristics but differ perhaps on one or two,
or in the degree of intensity of characteristics.
In the terminology of the SOM, the grid
preserves the ‘topological structure’ of the data
or alternatively may help us uncover such
structure. The Concise Oxford Dictionary (1999)
defines topology as ‘the study of geometrical
properties and spatial relations left unchanged
by the continuous change of shape or size of the
figure’. The topological structure emerges as a
‘feature map’ in which the prototypes are
related and subject to potential overlaps. Topology
preservation implies that input vectors
close together in input space map to close
nodes in the grid. Thus, not only are the
prototypes intended to reflect ‘typical’ values of
the inputs in their respective neighbourhoods,
but their grid positions reflect the relative
positioning and density of the original data. No
such ordering exists for the clusters which
emerge from CA.
We have noted how, once ‘trained’, the
network classifies a data point by identifying the
nearest Kohonen node. As regards the training
process, a similar principle is adopted. As is
common in NN operation, data points are
presented randomly to the network, and at each
stage the nearest Kohonen node is identified.
This is referred to as the ‘winning’ node and the
learning mechanism itself as ‘competitive learning’
or ‘winner-takes-all learning’. The weights of the
winning node are adjusted so as to move towards
the current data point; in this way the training
process allows the weights of each node to come
to reflect or describe the data. The
topological structure of the data is preserved
because not only is the winning node updated,
but also its neighbouring nodes. The shape of
the neighbourhood may take various forms,
such as square or diamond. It is also possible to
model the proximity of nodes by a Gaussian
decay function.
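
To make the neighbourhood idea concrete, the short
Python sketch below (our own illustration, not drawn
from any particular SOM package) shows one common
way of implementing a Gaussian decay over a
rectangular Kohonen grid: the winning node receives
an update weight of one, and nodes further away on
the grid receive progressively smaller weights.

import numpy as np

def gaussian_neighbourhood(grid_rows, grid_cols, winner, sigma):
    # Update weight for every node in a grid_rows x grid_cols Kohonen grid,
    # decaying with squared grid distance from the winning node 'winner'.
    rows, cols = np.indices((grid_rows, grid_cols))
    d2 = (rows - winner[0]) ** 2 + (cols - winner[1]) ** 2
    # Gaussian decay: the winner gets weight 1, neighbours progressively less
    return np.exp(-d2 / (2.0 * sigma ** 2))

# Example: a 5 x 5 grid with the winning node at grid position (2, 3)
h = gaussian_neighbourhood(5, 5, winner=(2, 3), sigma=1.0)
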
More formally, we denote the input data
by an m × n matrix X, each row of which
contains a data point comprising observed
values of the n inputs. Each node k in the SOM
grid is characterized by a 1 × n vector W^{(k)} of
weights. The Euclidean distance between the
kth node and the jth input vector is then given
by:

D = \sqrt{\sum_i \left( W_i^{(k)} - X_{ji} \right)^2}

where the observed values of the attributes of
each data vector are indexed by i.
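
In code, identifying the nearest node amounts to
computing this distance for every row of a weight
array. The sketch below is a minimal illustration,
assuming the N node weight vectors are stacked
row-wise in an N × n NumPy array W (the names are
ours, not the authors'):

import numpy as np

def winning_node(W, x):
    # Euclidean distance from every node's weight vector (a row of W)
    # to the input vector x; the node with the smallest distance wins.
    d = np.sqrt(((W - x) ** 2).sum(axis=1))
    return int(np.argmin(d))
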
During training, the winning node is that
with the smallest distance from the current data
vector. The distance is in fact modified to allow
for the frequency with which nodes have
previously been ‘winners’, a so-called ‘conscience
mechanism’ through which an additional element
of equality is introduced. The adjusted distance
is given by:

D^{*} = D + \gamma \left( N F_k - 1 \right)

where N is the number of Kohonen nodes, F_k is
the relative frequency with which the kth of
these nodes has been the winning node, and γ
is a constant between zero and unity. For nodes
whose frequency is the average for all nodes,
i.e. 1/N, the adjustment is zero. Nodes winning
with higher or lower frequencies than average
have their distances adjusted upwards or
downwards respectively, so that over-used nodes
become less likely to win again and under-used
nodes more likely. The frequency values are
running estimates updated at each iteration.
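
A minimal sketch of this conscience adjustment is
given below, assuming the relative win frequencies
are maintained as a running estimate; the values of
gamma and beta are illustrative, not prescribed by
the text:

import numpy as np

def conscience_winner(W, x, freq, gamma=0.3):
    # Winner selection with frequency-adjusted distances: nodes that have
    # won more often than average (freq > 1/N) have their distance increased,
    # while under-used nodes have theirs reduced.
    N = W.shape[0]
    d = np.sqrt(((W - x) ** 2).sum(axis=1))
    d_adj = d + gamma * (N * freq - 1.0)
    return int(np.argmin(d_adj))

def update_frequencies(freq, winner, beta=0.01):
    # Running estimate of each node's relative win frequency.
    target = np.zeros_like(freq)
    target[winner] = 1.0
    return freq + beta * (target - freq)
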
The weight adjustment process involves
finding the node nearest to the data vector in
terms of adjusted distance D*, and this node, p
say, has its weights updated. The actual adjustment
used is such that the change in each weight of
the prototype is proportional to the difference
between the corresponding element of the current
input vector and the current weight. The constant
of proportionality, α, is referred to as the
learning constant. Hence we have:

W_i^{(p)*} = W_i^{(p)} + \alpha \left( X_{ji} - W_i^{(p)} \right)

where W_i^{(p)*} and W_i^{(p)} respectively denote the new
and old weight values.
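
Combining the pieces, one full training iteration
might be sketched as follows, reusing the
conscience_winner and update_frequencies helpers
from the earlier sketch. The learning constant alpha
and the neighbourhood width sigma are illustrative,
and the way the Gaussian neighbourhood factor scales
the update of non-winning nodes is one common choice
rather than a scheme prescribed by the text:

import numpy as np

def training_step(W, grid_xy, x, freq, alpha=0.1, sigma=1.0, gamma=0.3):
    # One SOM iteration: pick the conscience-adjusted winner, update the
    # win-frequency estimates, then move the winner and (more weakly) its
    # grid neighbours towards the current input vector x.
    p = conscience_winner(W, x, freq, gamma)
    freq = update_frequencies(freq, p)
    # Neighbourhood strength of every node, from its grid distance to the
    # winner; grid_xy holds the (row, column) grid position of each node.
    d2 = ((grid_xy - grid_xy[p]) ** 2).sum(axis=1)
    h = np.exp(-d2 / (2.0 * sigma ** 2))
    # Each weight moves towards the input in proportion to alpha and h.
    W = W + alpha * h[:, None] * (x - W)
    return W, freq
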

An interesting presentation of this learning rule
is given by Kohonen (1995) and Ritter et al.