Nature | Vol 577 | 30 January 2020 | 643

As shown in Fig. 2a, a five-layer CNN was constructed on a memristor-
based hardware system to recognize MNIST handwritten-digit images.
The detailed data flow in the CNN and the corresponding memristor
mapping are described in Methods.
Realizing memristor-based convolutional operations requires performing sliding operations with various kernels. Memristor arrays are highly efficient in achieving parallel MACs under shared inputs for different kernels^22. Figure 2b shows a typical convolution example at a particular sliding step, and Fig. 2c reveals the associated events in the
1T1R memristor array. The input value is encoded by the pulse number
according to its quantized bit number (Extended Data Fig. 2). A signed
kernel weight is mapped to the differential conductance of a pair of
memristors. In this manner, all the weights of a kernel are mapped to
two conductance rows: one row for positive weights with positive pulse
inputs and the other for negative weights with equivalent negative
pulse inputs. After inputting the encoded pulses into the bit lines, the
output currents through the two differential source lines are sensed and accumulated. The differential current is the weighted sum corresponding to the input patch and the chosen kernel. Different kernels
with different weights are mapped to different pairs of differential
rows, and the entire memristor array operates MACs in parallel under
the same inputs. All the desired weighted-sum results are obtained
concurrently.
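The differential-pair mapping and pulse-encoded weighted sum described above can be sketched in a few lines (an illustrative toy model, not the authors' code; the conductance scale and function names are assumptions):

```python
# Sketch: a signed 3x3 kernel split into two conductance rows of a 1T1R array,
# with pulse-encoded inputs producing the signed MAC as a differential current.
import numpy as np

G_MAX = 1.0  # arbitrary conductance unit for this sketch (assumption)

def map_kernel_to_rows(kernel):
    """Map signed weights to a positive row (g+) and a negative row (g-)."""
    w = kernel.flatten()
    g_pos = np.where(w > 0, w, 0.0) * G_MAX   # positive weights -> g+
    g_neg = np.where(w < 0, -w, 0.0) * G_MAX  # |negative weights| -> g-
    return g_pos, g_neg

def convolve_patch(patch, kernel, bits=8):
    """Weighted sum of one input patch: the pulse number encodes the
    quantized pixel value, and the differential readout i+ - i- recovers
    the signed multiply-accumulate result."""
    pulses = patch.flatten().astype(int)      # pulse count ~ quantized input
    assert pulses.max() < 2 ** bits
    g_pos, g_neg = map_kernel_to_rows(kernel)
    i_pos = g_pos @ pulses                    # current accumulated on SL+
    i_neg = g_neg @ pulses                    # current accumulated on SL-
    return i_pos - i_neg                      # differential weighted sum

patch = np.arange(9).reshape(3, 3)            # toy 3x3 input patch
kernel = np.array([[1, -1, 0], [2, 0, -2], [0, 1, -1]], dtype=float)
print(convolve_patch(patch, kernel))          # matches (kernel * patch).sum()
```

Mapping each kernel to its own differential row pair means every row pair computes its weighted sum from the same shared bit-line pulses, which is what allows all kernels to operate in parallel.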
In typical CNN training, it is necessary to propagate the objective derivative backwards with respect to the last outputs, to determine all weight updates^10. This task requires highly complex operations to apply encoded read pulses to source lines from back to front and layer by layer. Furthermore, it is challenging to train a complicated memristor DNN, owing to non-ideal device characteristics, such as nonlinearity and asymmetric conductance tuning^6,27. In contrast to the pure in situ training solution, the ex situ training method appears to be a shortcut that takes advantage of existing high-performing parameters. However, inevitable hardware imperfections, such as defective devices and parasitic wire resistance and capacitance, would blur the weights and degrade the system performance.
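The weight-blurring effect of hardware imperfections can be illustrated with a toy numerical model (not the authors' simulation; the Gaussian programming-noise magnitude is an assumption):

```python
# Sketch: perturb each differential conductance with programming noise and
# compare the realized weight matrix to the ex situ trained target.
import numpy as np

rng = np.random.default_rng(0)
w_target = rng.uniform(-1, 1, size=(10, 9))    # toy target weights

g_pos = np.clip(w_target, 0, None)             # ideal differential mapping
g_neg = np.clip(-w_target, 0, None)

sigma = 0.05                                   # assumed device variation level
g_pos_dev = g_pos + rng.normal(0, sigma, g_pos.shape)
g_neg_dev = g_neg + rng.normal(0, sigma, g_neg.shape)

w_realized = g_pos_dev - g_neg_dev             # weights actually on-chip
blur = np.abs(w_realized - w_target).mean()
print(f"mean |weight error| = {blur:.3f}")     # nonzero: weights are blurred
```

Because every device in the differential pair contributes its own deviation, the realized weights drift from the trained values even when the target parameters are perfect, which is the degradation the text refers to.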

[Fig. 2 panel content: a, network pipeline — input image 28 × 28 → C1 feature maps 26 × 26 × 8 (convolutional kernel 1 × 3 × 3 × 8) → S2 feature maps 9 × 9 × 8 (pooling 3 × 3) → C3 feature maps 7 × 7 × 12 (convolutional kernel 8 × 3 × 3 × 12) → S4 feature maps 4 × 4 × 12 (pooling 2 × 2) → vector of 192 → FC output vector of 10 (FC weights 192 × 10); weights mapped to PE1, PE3, PE5, PE7 as C1 kernel 16 × 9, C3 kernel 96 × 9, FC weights 192 × 10, remainder unused. b, input patch x2,2…x4,4 convolved with kernels i and j (weights w1–w9 each). c, 1T1R array schematic — word lines WLi±/WLj±, bit lines BL1–BL9, source lines SLi±/SLj±, with each weight stored as a differential conductance pair, e.g. w1i = g1i+ − g1i−.]
Fig. 2 | Five-layer mCNN with memristor convolver. a, Structure of the five-
layer mCNN used for MNIST image recognition, with alternating convolutional
(C1, C3) and subsampling (S2, S4) layers. The input is a 28 × 28 greyscale (8-bit)
digit image. The mapping relations between the weights of different layers and
the independent PEs are illustrated (see Methods for details). The top labels
give the kernel size (input channel depth × filter height × filter width × filter
batch size) for C1 and C3, the pooling size for S2 and S4 and the weight size for
the FC layer (input neuron number × output neuron number). The bottom
labels provide the feature map dimension (height × width × channel depth) or
the vector dimension. b, Typical convolutional case during the sliding
process. The grey box in the image confines the input patch of this sample case.
xm,n indicates the relevant pixel at the crossing of row m and column n. Kernels i
and j each have a total of 3 × 3 weights. c, The equivalent memristor convolver of
the convolutional operation in b. Throughout the entire parallel computing
process, all word lines (WL) are set to VWL = 4.5 V. The injected bit line (BL) pulses
are 0.2 V, and the source lines (SL) are clamped at 0 V. w represents the element
value in the weight matrix, and g+ and g− represent the device conductance
values for the positive and negative weights in the differential pair,
respectively.
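The feature-map sizes listed in the caption can be reproduced with simple shape arithmetic (a sketch; the ceiling-mode pooling window count is inferred from the stated sizes, e.g. 26/3 → 9, not stated explicitly in the text):

```python
# Sketch: trace the Fig. 2a feature-map dimensions layer by layer.
import math

def conv(h, k):
    """'Valid' convolution output size: shrinks by (k - 1)."""
    return h - k + 1

def pool(h, p):
    """Subsampling output size with ceiling-mode window count (assumption)."""
    return math.ceil(h / p)

h = 28                              # input image 28 x 28
h = conv(h, 3); print("C1:", h)     # 26 -> feature maps 26 x 26 x 8
h = pool(h, 3); print("S2:", h)     # 9  -> feature maps 9 x 9 x 8
h = conv(h, 3); print("C3:", h)     # 7  -> feature maps 7 x 7 x 12
h = pool(h, 2); print("S4:", h)     # 4  -> feature maps 4 x 4 x 12
print("FC input:", 4 * 4 * 12)      # 192, matching the 192 x 10 FC weights
```

The final 4 × 4 × 12 maps flatten to the 192-element vector that feeds the 192 × 10 fully connected layer.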
