May 2019, ScientificAmerican.com

[Figure: How machines learn. Training: thousands of cat photographs, broken into pixels, are fed to a network whose layers learn to identify progressively more complex features; the output is an image label (“Cat”), and the result is the ability to recognize a cat. Pretraining: a network first trained on sets of different defined groupings then needs only a few cat photographs; the result is the ability to recognize a cat faster. Generative Adversarial Networks: a generator turns random noise and a class (“Cat”) into a fake cat image; a discriminator is randomly given either a real or a fake cat image and judges whether the image is real and, if not, in what ways it is not; the feedback is fed to the generator, and the result is the ability to generate a convincing cat image. Disentanglement: primitive elements with multiple variables pass through a bottleneck that is gradually loosened; the result is the ability to isolate and reconstruct elements.]

A so-called deep network has tens or hundreds of hidden layers. They might represent midlevel structures such as edges and
geometric shapes, although it is not always obvious what they are
doing. With thousands of neurons and millions of interconnections, there is no simple logical path through the system. And that is by design. Neural networks are masters at problems not amenable to explicit logical rules, such as pattern recognition.
Crucially, the neuronal connections are not fixed in advance
but adapt in a process of trial and error. You feed the network images labeled “dog” or “cat.” For each image, it guesses a label. If it is wrong, you adjust the strength of the connections that contributed to the erroneous result, which is a straightforward exercise
in calculus. Starting from complete scratch, without knowing
what an image is, let alone an animal, the network does no better
than a coin toss. But after perhaps 10,000 examples, it does as well
as a human presented with the same images. In other training
methods, the network responds to vaguer cues or even discerns
the categories entirely on its own.
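The trial-and-error loop described above can be sketched in a few lines. This is a toy stand-in, not the article's actual setup: a single-layer network (logistic regression) on synthetic two-cluster data instead of real cat and dog photographs. The gradient step is the "straightforward exercise in calculus."

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy stand-in for "cat" vs. "dog" images: two clusters of pixel-like features.
X = np.vstack([rng.normal(-1, 1, (100, 5)), rng.normal(1, 1, (100, 5))])
y = np.array([0] * 100 + [1] * 100)  # 0 = "dog", 1 = "cat"

w = np.zeros(5)  # connection strengths start from complete scratch
b = 0.0

def predict(X, w, b):
    # The network's guess: a probability that the image is a "cat."
    return 1 / (1 + np.exp(-(X @ w + b)))

for step in range(500):
    p = predict(X, w, b)
    # Where guesses are wrong, calculus (the gradient) says how much
    # each connection contributed to the error, and in which direction.
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)
    w -= 0.5 * grad_w
    b -= 0.5 * grad_b

accuracy = np.mean((predict(X, w, b) > 0.5) == y)
print(accuracy)  # near coin-toss before training, close to 1.0 after
```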
Remarkably, a network can sort images it has never seen be-
fore. Theorists are still not entirely sure how it does that, but one
factor is that the humans using the network must tolerate errors
or even deliberately introduce them. A network that classifies its
initial batch of cats and dogs perfectly might be fudging: basing
its judgment on unreliable cues and variations rather than on
essential features.
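The article does not name a technique for deliberately introducing errors, but dropout is a standard one: randomly silencing neurons during training so the network cannot base its judgment on any single unreliable cue. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(1)

def dropout(activations, rate=0.5):
    """Randomly zero out a fraction of neurons during training,
    forcing the network to spread its judgment across many features."""
    mask = rng.random(activations.shape) >= rate
    # Scale the survivors so the expected activation is unchanged.
    return activations * mask / (1 - rate)

h = np.ones(1000)          # one layer's activations
h_train = dropout(h, 0.5)  # roughly half are zeroed, the rest doubled
print(h_train.mean())      # stays near 1.0 in expectation
```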
This ability of networks to sculpt themselves means they can
solve problems that their human designers have no idea how to solve. And that includes the problem of making the networks even better at what they do.

GOING META
Teachers often complain that students forget everything over the
summer. In lieu of making vacations shorter, they have taken to
loading them up with summer homework. But psychologists such
as Robert Bjork of the University of California, Los Angeles, have
found that forgetting is not inimical to learning but essential to it.
That principle applies to machine learning, too.
If a machine learns a task, then forgets it, then learns another task and forgets it, and so on, it can be coached to grasp the common features of those tasks, and it will pick up new variants faster. It won’t have learned anything specific, but it will have learned
how to learn—what researchers call meta-learning. When you do
want it to retain information, it’ll be ready. “After you’ve learned
to do 1,000 tasks, the 1,001st is much easier,” says Sanjeev Arora, a
machine-learning theorist at Princeton University. Forgetting is
what puts the meta into meta-learning. Without it, the tasks all
blur together, and the machine can’t see their overall structure.
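The learn-forget-nudge cycle can be sketched in the style of Reptile, one concrete meta-learning algorithm (the article names none). In this toy, each task is a small regression problem sharing a hidden common core; the task-specific weights are discarded after every task, and only a small step toward them survives in the shared starting point:

```python
import numpy as np

rng = np.random.default_rng(2)

def sample_task():
    # Every task shares a common core plus its own quirk.
    return np.array([3.0, -2.0]) + rng.normal(0, 0.3, 2)

def train_on_task(w_init, w_task, steps=20, lr=0.1):
    """Ordinary gradient descent on one task, starting from w_init."""
    w = w_init.copy()
    for _ in range(steps):
        X = rng.normal(size=(10, 2))
        y = X @ w_task
        grad = X.T @ (X @ w - y) / len(y)
        w -= lr * grad
    return w

# Reptile-style meta-learning: each task's adapted weights are
# "forgotten"; only a small nudge toward them survives in w_meta.
w_meta = np.zeros(2)
for _ in range(200):
    w_task = sample_task()
    w_adapted = train_on_task(w_meta, w_task)
    w_meta += 0.1 * (w_adapted - w_meta)  # keep the shared structure, drop the specifics

print(w_meta)  # drifts toward the common core, so the next task starts nearby
```

Because `w_meta` ends up near the structure all tasks share, task 1,001 begins almost solved, which is the sense in which forgetting the specifics is what teaches the machine how to learn.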
Meta-learning gives machines some of our mental agility. “It will probably be key to achieving AI that can perform with human-level intelligence,” says Jane Wang, a computational neuroscientist at Google’s DeepMind in London. Conversely, she thinks
that computer meta-learning will help scientists figure out what
happens inside our own head.
In nature, the ultimate meta-learning algorithm is Darwinian evolution.

Generative Adversarial Networks
A classification network can be run in reverse to generate fresh images—cats that never existed, say, but look as if they could have. Researchers train this “generative” network by coupling it with an ordinary classifier to serve as critic and coach. Random noise is input to the system to ensure that each new cat is unique.

Disentanglement
A machine can learn to pick apart a scene into the objects that constitute it. One network compresses the input data; the other expands them again. By constricting the link between the two, the system is forced to find the most parsimonious description. That is usually the description a human would use, too, thereby making the network more transparent in its operation.
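The adversarial loop in the Generative Adversarial Networks caption can be sketched with one-dimensional stand-ins: the "images" here are just numbers drawn from a Gaussian, generator and discriminator are each a pair of parameters, and the non-saturating GAN loss is a common choice rather than anything the article specifies.

```python
import numpy as np

rng = np.random.default_rng(3)

def sigmoid(t):
    return 1 / (1 + np.exp(-t))

# Real "images" are stand-ins: samples from N(4, 1).
# The generator G(z) = a*z + b tries to mimic them from random noise z.
a, b = 1.0, 0.0   # generator parameters
w, c = 0.0, 0.0   # discriminator: D(x) = sigmoid(w*x + c)
lr = 0.05

for step in range(2000):
    real = rng.normal(4, 1, 64)
    z = rng.normal(0, 1, 64)
    fake = a * z + b  # a batch of fake (generated) "cat images"

    # Discriminator update: judge real vs. fake and learn from mistakes.
    d_real, d_fake = sigmoid(w * real + c), sigmoid(w * fake + c)
    w += lr * (np.mean((1 - d_real) * real) - np.mean(d_fake * fake))
    c += lr * (np.mean(1 - d_real) - np.mean(d_fake))

    # Generator update: the discriminator's feedback flows back
    # into the generator (non-saturating GAN objective).
    d_fake = sigmoid(w * fake + c)
    a += lr * np.mean((1 - d_fake) * w * z)
    b += lr * np.mean((1 - d_fake) * w)

print(b)  # drifts from 0 toward the real data's mean as the fakes become convincing
```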
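The compress-then-expand scheme in the Disentanglement caption can be sketched as a linear autoencoder. In this hypothetical toy, five entangled measurements really derive from two primitive factors, and a two-unit bottleneck forces the system to rediscover that parsimonious description.

```python
import numpy as np

rng = np.random.default_rng(4)

# Scenes built from 2 primitive factors (say, position and size),
# observed as 5 entangled measurements.
latents = rng.normal(size=(200, 2))
mix = rng.normal(size=(2, 5))
X = latents @ mix

# One network compresses 5 -> 2 (the bottleneck); the other expands 2 -> 5.
enc = rng.normal(scale=0.1, size=(5, 2))
dec = rng.normal(scale=0.1, size=(2, 5))
lr = 0.01

for step in range(3000):
    code = X @ enc      # squeezed through the bottleneck
    recon = code @ dec  # expanded back out
    err = recon - X
    # Gradient descent on the mean squared reconstruction error.
    dec -= lr * code.T @ err / len(X)
    enc -= lr * X.T @ (err @ dec.T) / len(X)

mse = np.mean((X @ enc @ dec - X) ** 2)
print(mse)  # small: two numbers suffice to describe each five-number scene
```

Widening or narrowing the bottleneck controls how parsimonious the learned description must be, which is the knob the caption's "gradually loosened" constriction refers to.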