Nature | Vol 577 | 30 January 2020 | 643

As shown in Fig. 2a, a five-layer CNN was constructed on a memristor-
based hardware system to recognize MNIST handwritten-digit images.
The detailed data flow in the CNN and the corresponding memristor
mapping are described in Methods.
Realizing memristor-based convolutional operations requires performing sliding operations with various kernels. Memristor arrays are highly efficient in achieving parallel MACs under shared inputs for different kernels^22. Figure 2b shows a typical convolution example at a particular sliding step, and Fig. 2c reveals the associated events in the
1T1R memristor array. The input value is encoded by the pulse number
according to its quantized bit number (Extended Data Fig. 2). A signed
kernel weight is mapped to the differential conductance of a pair of
memristors. In this manner, all the weights of a kernel are mapped to
two conductance rows: one row for positive weights with positive pulse
inputs and the other for negative weights with equivalent negative
pulse inputs. After inputting the encoded pulses into the bit lines, the
output currents through the two differential source lines are sensed and accumulated. The differential current is the weighted sum corresponding to the input patch and the chosen kernel. Different kernels
with different weights are mapped to different pairs of differential
rows, and the entire memristor array operates MACs in parallel under
the same inputs. All the desired weighted-sum results are obtained
concurrently.
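The differential-pair mapping and pulse-encoded weighted sum described above can be sketched in a few lines (an illustrative toy model, not the authors' code; the conductance scale and function names are assumptions):

```python
# Sketch: a signed 3x3 kernel split into two conductance rows of a 1T1R array,
# with pulse-encoded inputs producing the signed MAC as a differential current.
import numpy as np

G_MAX = 1.0  # arbitrary conductance unit for this sketch (assumption)

def map_kernel_to_rows(kernel):
    """Map signed weights to a positive row (g+) and a negative row (g-)."""
    w = kernel.flatten()
    g_pos = np.where(w > 0, w, 0.0) * G_MAX   # positive weights -> g+
    g_neg = np.where(w < 0, -w, 0.0) * G_MAX  # |negative weights| -> g-
    return g_pos, g_neg

def convolve_patch(patch, kernel, bits=8):
    """Weighted sum of one input patch: the pulse number encodes the
    quantized pixel value, and the differential readout i+ - i- recovers
    the signed multiply-accumulate result."""
    pulses = patch.flatten().astype(int)      # pulse count ~ quantized input
    assert pulses.max() < 2 ** bits
    g_pos, g_neg = map_kernel_to_rows(kernel)
    i_pos = g_pos @ pulses                    # current accumulated on SL+
    i_neg = g_neg @ pulses                    # current accumulated on SL-
    return i_pos - i_neg                      # differential weighted sum

patch = np.arange(9).reshape(3, 3)            # toy 3x3 input patch
kernel = np.array([[1, -1, 0], [2, 0, -2], [0, 1, -1]], dtype=float)
print(convolve_patch(patch, kernel))          # matches (kernel * patch).sum()
```

Mapping each kernel to its own differential row pair means every row pair computes its weighted sum from the same shared bit-line pulses, which is what allows all kernels to operate in parallel.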
In typical CNN training, it is necessary to propagate the objective derivative backwards with respect to the last outputs, to determine all weight updates^10. This task requires highly complex operations to apply encoded read pulses to source lines from back to front and layer by layer. Furthermore, it is challenging to train a complicated memristor DNN, owing to non-ideal device characteristics, such as nonlinearity and asymmetric conductance tuning^6,27. In contrast to the pure in situ training solution, the ex situ training method appears to be a shortcut that takes advantage of existing high-performing parameters. However, inevitable hardware imperfections, such as defective devices and parasitic wire resistance and capacitance, would blur the weights and degrade the system performance.
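The weight-blurring effect of hardware imperfections can be illustrated with a toy numerical model (not the authors' simulation; the Gaussian programming-noise magnitude is an assumption):

```python
# Sketch: perturb each differential conductance with programming noise and
# compare the realized weight matrix to the ex situ trained target.
import numpy as np

rng = np.random.default_rng(0)
w_target = rng.uniform(-1, 1, size=(10, 9))    # toy target weights

g_pos = np.clip(w_target, 0, None)             # ideal differential mapping
g_neg = np.clip(-w_target, 0, None)

sigma = 0.05                                   # assumed device variation level
g_pos_dev = g_pos + rng.normal(0, sigma, g_pos.shape)
g_neg_dev = g_neg + rng.normal(0, sigma, g_neg.shape)

w_realized = g_pos_dev - g_neg_dev             # weights actually on-chip
blur = np.abs(w_realized - w_target).mean()
print(f"mean |weight error| = {blur:.3f}")     # nonzero: weights are blurred
```

Because every device in the differential pair contributes its own deviation, the realized weights drift from the trained values even when the target parameters are perfect, which is the degradation the text refers to.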

[Fig. 2 panel content: a, network pipeline — input image 28 × 28 → C1 feature maps 26 × 26 × 8 (convolutional kernel 1 × 3 × 3 × 8) → S2 feature maps 9 × 9 × 8 (pooling 3 × 3) → C3 feature maps 7 × 7 × 12 (convolutional kernel 8 × 3 × 3 × 12) → S4 feature maps 4 × 4 × 12 (pooling 2 × 2) → vector of 192 → FC output vector of 10 (FC weights 192 × 10); weights mapped to PE1, PE3, PE5, PE7 as C1 kernel 16 × 9, C3 kernel 96 × 9, FC weights 192 × 10, remainder unused. b, input patch x2,2…x4,4 convolved with kernels i and j (weights w1–w9 each). c, 1T1R array schematic — word lines WLi±/WLj±, bit lines BL1–BL9, source lines SLi±/SLj±, with each weight stored as a differential conductance pair, e.g. w1i = g1i+ − g1i−.]
Fig. 2 | Five-layer mCNN with memristor convolver. a, Structure of the five-
layer mCNN used for MNIST image recognition, with alternating convolutional
(C1, C3) and subsampling (S2, S4) layers. The input is a 28 × 28 greyscale (8-bit)
digit image. The mapping relations between the weights of different layers and
the independent PEs are illustrated (see Methods for details). The top labels
give the kernel size (input channel depth × filter height × filter width × filter
batch size) for C1 and C3, the pooling size for S2 and S4 and the weight size for
the FC layer (input neuron number × output neuron number). The bottom
labels provide the feature map dimension (height × width × channel depth) or
the vector dimension. b, Typical convolutional case during the sliding
process. The grey box in the image confines the input patch of this sample case.
xm,n indicates the relevant pixel at the crossing of row m and column n. Kernels i
and j each have a total of 3 × 3 weights. c, The equivalent memristor convolver of
the convolutional operation in b. Throughout the entire parallel computing
process, all word lines (WL) are set to VWL = 4.5 V. The injected bit line (BL) pulses
are 0.2 V, and the source lines (SL) are clamped at 0 V. w represents the element
value in the weight matrix, and g+ and g− represent the device conductance
values for the positive and negative weights in the differential pair,
respectively.
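The feature-map sizes listed in the caption can be reproduced with simple shape arithmetic (a sketch; the ceiling-mode pooling window count is inferred from the stated sizes, e.g. 26/3 → 9, not stated explicitly in the text):

```python
# Sketch: trace the Fig. 2a feature-map dimensions layer by layer.
import math

def conv(h, k):
    """'Valid' convolution output size: shrinks by (k - 1)."""
    return h - k + 1

def pool(h, p):
    """Subsampling output size with ceiling-mode window count (assumption)."""
    return math.ceil(h / p)

h = 28                              # input image 28 x 28
h = conv(h, 3); print("C1:", h)     # 26 -> feature maps 26 x 26 x 8
h = pool(h, 3); print("S2:", h)     # 9  -> feature maps 9 x 9 x 8
h = conv(h, 3); print("C3:", h)     # 7  -> feature maps 7 x 7 x 12
h = pool(h, 2); print("S4:", h)     # 4  -> feature maps 4 x 4 x 12
print("FC input:", 4 * 4 * 12)      # 192, matching the 192 x 10 FC weights
```

The final 4 × 4 × 12 maps flatten to the 192-element vector that feeds the 192 × 10 fully connected layer.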
