May 2019, ScientificAmerican.com 61
[Graphic: four ways to train a neural network]

Training: Input is thousands of cat photographs, broken into pixels. Each layer of the network learns to identify progressively more complex features. Output is an image label (“cat”). Result: the ability to recognize a cat.

Pretraining: Input is sets of different defined groupings, then a few cat photographs. Result: the ability to recognize a cat faster.

Generative adversarial network: Input is random noise and a class (“cat”). The generator produces a fake cat image; the discriminator is randomly given either a real or a fake cat image and judges whether the image is real. If not, in what ways is it not real? Feedback is fed to the generator. Result: the ability to generate a convincing cat image.

Disentanglement: Input is primitive elements with multiple variables. The bottleneck between the compressing and expanding networks is gradually loosened. Result: the ability to isolate and reconstruct elements.
A so-called deep network has tens or hundreds of hidden layers. They might represent midlevel structures such as edges and geometric shapes, although it is not always obvious what they are doing. With thousands of neurons and millions of interconnections, there is no simple logical path through the system. And that is by design. Neural networks are masters at problems not amenable to explicit logical rules, such as pattern recognition.
Crucially, the neuronal connections are not fixed in advance but adapt in a process of trial and error. You feed the network images labeled “dog” or “cat.” For each image, it guesses a label. If it is wrong, you adjust the strength of the connections that contributed to the erroneous result, which is a straightforward exercise in calculus. Starting from complete scratch, without knowing what an image is, let alone an animal, the network does no better than a coin toss. But after perhaps 10,000 examples, it does as well as a human presented with the same images. In other training methods, the network responds to vaguer cues or even discerns the categories entirely on its own.
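The trial-and-error adjustment described above can be sketched in a few lines of Python. This is an illustrative toy, not any real vision system: each “image” is a single number and the network a single connection, with the strength adjusted by ordinary gradient descent whenever a guess is off.

```python
import math
import random

random.seed(0)

# Toy stand-ins for images: one feature per "image".
# "Cats" cluster around +1, "dogs" around -1 (a deliberately easy task).
data = [(random.gauss(1.0, 0.5), 1) for _ in range(200)] + \
       [(random.gauss(-1.0, 0.5), 0) for _ in range(200)]
random.shuffle(data)

w, b = 0.0, 0.0   # connection strengths start from complete scratch
lr = 0.1          # step size for the calculus-based adjustment

def predict(x):
    # Sigmoid output: the network's guess that x is a cat.
    return 1.0 / (1.0 + math.exp(-(w * x + b)))

def accuracy():
    return sum((predict(x) > 0.5) == (y == 1) for x, y in data) / len(data)

acc_before = accuracy()   # no better than a coin toss before training

for epoch in range(20):
    for x, y in data:
        p = predict(x)
        # If the guess is wrong, nudge the connections that contributed
        # to the error: the gradient of the logistic loss.
        w -= lr * (p - y) * x
        b -= lr * (p - y)

acc_after = accuracy()
print(f"accuracy before: {acc_before:.2f}, after: {acc_after:.2f}")
```

With the untrained weights at zero, every guess is a coin toss; after a few passes over the labeled examples, the single connection separates the two clusters almost perfectly.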
Remarkably, a network can sort images it has never seen before. Theorists are still not entirely sure how it does that, but one factor is that the humans using the network must tolerate errors or even deliberately introduce them. A network that classifies its initial batch of cats and dogs perfectly might be fudging: basing its judgment on unreliable cues and variations rather than on essential features.
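One standard way of deliberately introducing errors (the article does not name a specific technique, so this is an illustrative choice) is dropout: during training, a random fraction of a layer’s outputs is zeroed out, so the network cannot lean on any single unreliable cue.

```python
import random

random.seed(1)

def dropout(activations, p=0.5):
    # Zero out a random fraction p of the layer's outputs and rescale the
    # survivors so their expected total is unchanged.  The injected errors
    # discourage the network from relying on any one fragile feature.
    return [0.0 if random.random() < p else a / (1.0 - p)
            for a in activations]

layer = [0.8, 0.3, 0.5, 0.9, 0.1, 0.7]
print(dropout(layer))
```

Each training pass silences a different random subset, so the surviving connections must carry redundant, essential information rather than memorized quirks.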
This ability of networks to sculpt themselves means they can
solve problems that their human designers have no idea how to
solve. And that includes the problem of making the networks
even better at what they do.
GOING META
Teachers often complain that students forget everything over the summer. In lieu of making vacations shorter, they have taken to loading students up with summer homework. But psychologists such as Robert Bjork of the University of California, Los Angeles, have found that forgetting is not inimical to learning but essential to it. That principle applies to machine learning, too.
If a machine learns a task, then forgets it, then learns another task and forgets it, and so on, it can be coached to grasp the common features of those tasks, and it will pick up new variants faster. It won’t have learned anything specific, but it will have learned how to learn—what researchers call meta-learning. When you do want it to retain information, it’ll be ready. “After you’ve learned to do 1,000 tasks, the 1,001st is much easier,” says Sanjeev Arora, a machine-learning theorist at Princeton University. Forgetting is what puts the meta into meta-learning. Without it, the tasks all blur together, and the machine can’t see their overall structure.
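The learn-forget-repeat loop can be sketched concretely. The recipe below is in the spirit of the Reptile meta-learning algorithm, an assumption on my part since the article names no specific method: learn each task briefly, nudge a shared starting point toward what was learned, discard the task itself, and repeat. The tasks here, fitting lines of different slopes, are illustrative stand-ins.

```python
import random

random.seed(2)

def adapt(w, a, steps=5, lr=0.5):
    # A few gradient-descent steps on one task ("fit y = a*x"), starting
    # from weight w; returns the adapted weight.
    xs = [random.uniform(-1, 1) for _ in range(100)]
    for _ in range(steps):
        grad = sum(2 * (w - a) * x * x for x in xs) / len(xs)
        w -= lr * grad
    return w

# Meta-training: learn a task, move the starting point toward the adapted
# weights, forget the task, repeat.  Nothing task-specific is retained --
# only the starting point theta.
theta, meta_lr = 0.0, 0.25
for _ in range(300):
    a = random.uniform(2.0, 4.0)        # a fresh task each round
    phi = adapt(theta, a)
    theta += meta_lr * (phi - theta)

# A brand-new task is now picked up faster from theta than from scratch.
new_tasks = [2.2, 3.0, 3.8]
err_scratch = sum(abs(adapt(0.0, a, steps=3) - a) for a in new_tasks) / 3
err_meta = sum(abs(adapt(theta, a, steps=3) - a) for a in new_tasks) / 3
print(f"3-step error from scratch: {err_scratch:.3f}, "
      f"from meta-learned start: {err_meta:.3f}")
```

The machine never keeps any individual task, yet the shared starting point drifts toward the tasks’ common structure, so three gradient steps from it beat three steps from scratch.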
Meta-learning gives machines some of our mental agility. “It will probably be key to achieving AI that can perform with human-level intelligence,” says Jane Wang, a computational neuroscientist at Google’s DeepMind in London. Conversely, she thinks that computer meta-learning will help scientists figure out what happens inside our own heads.
In nature, the ultimate meta-learning algorithm is Darwinian
Generative Adversarial Networks
A classification network can be run in reverse to generate fresh images—cats that never existed, say, but look as if they could have. Researchers train this “generative” network by coupling it with an ordinary classifier to serve as critic and coach. Random noise is input to the system to ensure that each new cat is unique.
Disentanglement
A machine can learn to pick apart a scene into the objects that constitute it. One network compresses the input data; the other expands them again. By constricting the link between the two, the system is forced to find the most parsimonious description. That is usually the description a human would use, too, thereby making the network more transparent in its operation.
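The compress-then-expand idea can be shown at minimal scale. This sketch fixes the bottleneck at a single number rather than gradually loosening it as the caption describes, and uses two-number “scenes” generated from one underlying factor; both simplifications are mine, chosen to keep the gradients hand-derivable.

```python
import random

random.seed(4)

def sample():
    # Each "scene" is two numbers driven by one underlying factor t.
    t = random.uniform(-1.0, 1.0)
    return t, 2.0 * t

# Encoder squeezes the two inputs into a one-number code (the bottleneck);
# the decoder expands the code back into two numbers.
e1, e2 = 0.1, -0.1
d1, d2 = 0.1, 0.1
lr = 0.05

def loss_at(x1, x2):
    c = e1 * x1 + e2 * x2            # compress
    r1, r2 = d1 * c, d2 * c          # expand
    return (r1 - x1) ** 2 + (r2 - x2) ** 2

for _ in range(4000):
    x1, x2 = sample()
    c = e1 * x1 + e2 * x2
    r1, r2 = d1 * c, d2 * c
    # Hand-derived gradients of the reconstruction error.
    gd1, gd2 = 2 * (r1 - x1) * c, 2 * (r2 - x2) * c
    gc = 2 * (r1 - x1) * d1 + 2 * (r2 - x2) * d2
    d1 -= lr * gd1
    d2 -= lr * gd2
    e1 -= lr * gc * x1
    e2 -= lr * gc * x2

avg_err = sum(loss_at(*sample()) for _ in range(500)) / 500
print(f"average reconstruction error: {avg_err:.4f}")
```

Because the data really have only one degree of freedom, the one-number code suffices for near-perfect reconstruction: the constricted link forces the system onto the parsimonious description.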
© 2019 Scientific American