202 Visual Perception of Objects
the problems resulting from three-dimensionality, at least in
principle. The kinds of features that are included in a shape
representation can refer to intrinsically three-dimensional
qualities and parts as well as two-dimensional ones, and so
can be used to capture the shape of three-dimensional as well
as two-dimensional objects. For instance, the shape of an object
can be described as having the feature spherical rather
than circular and as contains-a-pyramid rather than
contains-a-triangle. Thus, there is nothing intrinsic to the feature-list
approach that limits it to two-dimensional features.
Feature theories have several important weaknesses, however.
One is that it is often unclear how to determine computationally
whether a given object actually has the features
that are proposed to comprise its shape representation. Sim-
ple part-features of two-dimensional images, such as lines,
edges, and blobs, can be computed from an underlying
template system as discussed above, but even these must be
abstracted from the color-, size-, and orientation-specific
peripheral channels that detect lines, edges, and blobs. Un-
fortunately, these simple image-based features are just the tip
of a very large iceberg. They do not cover the plethora of dif-
ferent attributes that feature theorists might (and do) propose
in their representations of shape. Features like contains-a-cylinder
or has-a-nose, for instance, are not easy to compute
from gray-scale images. Until such feature-extraction rou-
tines are available to back up the features proposed for the
representations, feature-based theories are incomplete in a
very important sense.
Another difficult problem is specifying what the proper
features might be for a shape representation system. It is one
thing to propose that some appropriate set of shape features
can, in principle, account for shape perception, but quite an-
other to say exactly what those features are. Computer-based
methods such as multidimensional scaling (Shepard, 1962a,
1962b) and hierarchical clustering can help in limited do-
mains, but they have not yet succeeded in suggesting viable
schemes for the general problem of representing shape in
terms of lists of properties.
Structural Descriptions
Structural descriptions are graph-theoretical representations
that can be considered an elaboration or extension of feature
theories. They generally contain three distinct types of infor-
mation: properties, parts, and relations between parts. They
are usually depicted as hierarchical networks in which nodes
represent the whole object and its various parts and subparts
with labeled links (or arcs) between nodes that represent
structural relations between objects and parts. Because of
this hierarchical network format, structural descriptions are
surely the representational approach that is closest to the
view of perceptual organization that was presented in the first
half of this chapter.
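
As an illustration only, the hierarchical network format described above can be sketched as a small data structure. The class name Node, the helper describe, and the relation labels (has-part, attached-to-side, attached-to-end) are hypothetical choices made for this sketch, not notation from any particular theory.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A whole object, part, or subpart in a structural description."""
    name: str
    # Properties attached to this node (e.g., a coarse shape label).
    properties: dict = field(default_factory=dict)
    # Labeled links (arcs) to parts: (relation_label, child_node) pairs.
    links: list = field(default_factory=list)

    def add_part(self, relation, part):
        """Attach a part under this node with a labeled relation."""
        self.links.append((relation, part))
        return part


def describe(root):
    """List every node, indented by depth, with the relation linking it to its parent."""
    lines = []

    def walk(node, relation, depth):
        label = f"{relation}: " if relation else ""
        lines.append("  " * depth + label + node.name)
        for rel, child in node.links:
            walk(child, rel, depth + 1)

    walk(root, None, 0)
    return lines


# Hypothetical example: a crude description of a person.
person = Node("person")
torso = person.add_part("has-part", Node("torso", {"shape": "cylinder"}))
head = person.add_part("has-part", Node("head", {"shape": "sphere"}))
arm = torso.add_part("attached-to-side", Node("arm"))
hand = arm.add_part("attached-to-end", Node("hand"))

for line in describe(person):
    print(line)
```

The three types of information named in the text map onto the sketch directly: properties are the dictionary on each node, parts are the child nodes, and relations are the labels on the links.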
Another important aspect of perceptual organization that
can be encoded in structural descriptions is information about
the intrinsic reference frame for the object as a whole and for
each of its parts. Each reference frame can be represented
as global features attached to the node corresponding to the
object or part, one each for its position, orientation, size, and
reflection (e.g., Marr, 1982; Palmer, 1975b). The reference
frame for a part can then be represented relative to that of
its superordinate, as evidence from organizational phenom-
ena suggests (see this chapter’s section entitled “Frames of
Reference”).
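
A minimal sketch of this idea, restricted to two dimensions for brevity, is given below. Each frame stores position, orientation, size, and reflection relative to its superordinate, and a point is carried from a part's frame up to the object's frame by applying the frames in turn. The class Frame, the function to_object, and the order of operations (reflect, then scale, then rotate, then translate) are assumptions of this sketch rather than details taken from Marr (1982) or Palmer (1975b).

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Frame:
    """A 2-D reference frame expressed relative to its superordinate frame."""
    x: float = 0.0           # position of the origin in the parent frame
    y: float = 0.0
    angle: float = 0.0       # orientation, radians counterclockwise
    size: float = 1.0        # unit length relative to the parent's unit
    reflected: bool = False  # mirror-reversed about the frame's own y-axis

    def to_parent(self, px, py):
        """Map a point given in this frame's coordinates into parent coordinates."""
        if self.reflected:
            px = -px
        px, py = px * self.size, py * self.size
        c, s = math.cos(self.angle), math.sin(self.angle)
        return (self.x + c * px - s * py, self.y + s * px + c * py)


def to_object(frames, point):
    """Apply frames ordered from the part's own frame outward to the whole object."""
    x, y = point
    for frame in frames:
        x, y = frame.to_parent(x, y)
    return (x, y)


# Hypothetical example: a hand frame defined relative to an arm frame,
# which is in turn defined relative to the body frame.
arm_in_body = Frame(x=1.0, y=0.0, angle=math.pi / 2, size=2.0)
hand_in_arm = Frame(x=1.0, y=0.0, size=0.5)

fingertip = to_object([hand_in_arm, arm_in_body], (1.0, 0.0))
print(fingertip)
```

Because each frame is stored relative to its superordinate, changing arm_in_body alone moves the hand with the arm, while the description of the hand relative to the arm stays the same.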
One serious problem with structural descriptions is how
to represent the global shapes of the components. An attractive
solution is to postulate shape primitives: a set of indivisible
perceptual units into which all other shapes can be
decomposed. For three-dimensional objects, such as people,
houses, trees, and cars, the shape primitives presumably
must be three-dimensional volumes. The best known pro-
posal of this type is Binford’s (1971) suggestion, later popu-
larized by Marr (1982), that complex shapes can be analyzed
into combinations of generalized cylinders. As the name implies,
generalized cylinders are a generalization of standard
geometric cylinders in which several further parameters are
introduced to encompass a larger set of shapes. Variables are
added to allow, for example, a variable base shape (e.g.,
square or trapezoidal in addition to circular), a variable axis
(e.g., curved in addition to straight), a variable sweeping
rule (e.g., the cross-sectional size getting small toward one
end in addition to staying a constant size), and so forth.
Some of the other proposals about shape primitives are very
closely related to generalized cylinders, such as geons (Biederman,
1987), and some are rather different, such as
superquadrics (Pentland, 1986).
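
The three parameters of a generalized cylinder can be made concrete with a short sketch. The function generalized_cylinder and its arguments base, axis, and sweep are hypothetical names chosen for this sketch. As a simplifying assumption, every cross-section is kept parallel to the x-y plane, whereas a fuller treatment would orient each cross-section relative to a curved axis.

```python
def generalized_cylinder(base, axis, sweep, steps=5):
    """Sample cross-sections of a generalized cylinder.

    base  -- list of (u, v) points outlining the base shape
    axis  -- function t -> (x, y, z), the axis position for t in [0, 1]
    sweep -- function t -> scale factor applied to the base at t
    Returns a list of cross-sections, each a list of (x, y, z) points.
    Simplification: cross-sections stay parallel to the x-y plane.
    """
    sections = []
    for i in range(steps):
        t = i / (steps - 1)
        cx, cy, cz = axis(t)
        k = sweep(t)
        sections.append([(cx + k * u, cy + k * v, cz) for u, v in base])
    return sections


# Hypothetical example: a square base, a straight vertical axis, and a
# sweeping rule that shrinks the cross-section linearly to a point,
# which yields a pyramid-like volume.
square = [(1, 1), (-1, 1), (-1, -1), (1, -1)]
sections = generalized_cylinder(
    base=square,
    axis=lambda t: (0, 0, 4 * t),
    sweep=lambda t: 1 - t,
)

for section in sections:
    print(section)
```

Replacing the base with points on a circle, the axis with a curved path, or the sweeping rule with a constant gives other members of the same family, including an ordinary cylinder.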
Structural descriptions with volumetric shape primitives
can overcome many of the difficulties with template and fea-
ture approaches. Like features, they can represent abstract
visual information, such as edges defined by luminance, tex-
ture, and motion. They can account for the effects of spatial
transformations on shape perception by absorbing them
within object-centered reference frames. They deal explicitly
with the problem of part structure by having distinct repre-
sentations of parts and the spatial relations among those
parts. And they are able to represent three-dimensional shape
by using volumetric primitives and three-dimensional spatial
relations in representing three-dimensional objects.
One difficulty with structural descriptions is that the repre-
sentations become quite complex, so that matching two such