
without affecting application programming
interfaces (APIs). Performance engineering
and hardware streamlining that do not re-
quire coordination already occur this way,
and we expect them to continue to do so.
Many of the most valuable future opportu-
nities for improving performance will not
arise locally within a single module, however,
but from broad-based and systematic changes
across many modules affecting major parts of
a system. For example, because so much soft-
ware is reductionist, the larger a system, the
greater the opportunity for huge performance
gains when layer upon layer of reductions are
replaced by a lean and direct implementa-
tion. But large-scale reengineering that affects
many modules requires the coordination of
many engineers, which is costly and risky.
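As a toy illustration of such layered reductions (a hypothetical Python example, not drawn from the article), compare a sum-of-squares routine built by delegating to ever more general layers with a lean, direct loop:

    def elementwise_product(xs, ys):
        # General-purpose layer: materializes a full intermediate list.
        return [x * y for x, y in zip(xs, ys)]

    def squares(xs):
        # Reduced to the more general layer above.
        return elementwise_product(xs, xs)

    def sum_of_squares_layered(xs):
        # Reduced again; builds an intermediate list as long as the input
        # just to add it up.
        return sum(squares(xs))

    def sum_of_squares_direct(xs):
        # Lean, direct implementation: one pass, no intermediate storage.
        total = 0
        for x in xs:
            total += x * x
        return total

Collapsing the layers eliminates the intermediate allocations; in a large system with many such layers, the savings compound.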
Big system components (web browsers, data-
base systems, operating systems, micropro-
cessors, GPUs, compilers, disk-storage systems,
and so on) offer both technical opportunities
for making applications run fast and a con-
text where it is economically viable to take ad-
vantage of them. As an example of the types of
changes that can be made within a big compo-
nent, consider a microprocessor. An instruction-
set architecture (ISA) is the interface by which
software tells the processor what to do. Man-
ufacturers routinely make major internal changes
to improve performance without changing
the ISA so that old software continues to run
correctly. For example, the Intel 8086 released
in 1978 had 29,000 transistors (56), whereas
the 22-core Xeon E5-2699 v4, released in 2016,
has about 248,000 times more transistors (57)
that produce more than a million times better
performance, according to SPECint rate (39).
In that time, the ISA has grown by less than
a factor of four (58), and old programs continue
to work even as the interface adds new
functionality.
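Taken at face value, these figures imply a transistor count for the 2016 part of roughly

    29,000 × 248,000 ≈ 7.2 × 10^9 transistors,

a back-of-the-envelope product of the cited numbers rather than a figure quoted from (57).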
Big components provide good opportunities
for obtaining performance, but opportunity
alone is not enough. To outweigh the costs
and risks of making changes, an economic
incentive must exist. If a commercial enter-
prise or a nonprofit organization owns a big
component, it can justify the investment to
enhance performance because it reaps the
benefit when the job is done. Single owner-
ship also helps with coordination costs. If
many parties must agree to make a change,
but it takes only a few to veto it, then it can
be hard for change to happen. Even the fear
of high coordination costs can block change
in a large codebase if too many parties are
involved. But when a single entity owns a big
component, it has the power to make vast
changes and pool the costs and benefits. It
can choose to reengineer as many modules
as it can justify economically and coordinate
with the outside world only at the big com-
ponent’s interface.


Conclusions
As miniaturization wanes, the silicon-fabrication
improvements at the Bottom will no longer
provide the predictable, broad-based gains in
computer performance that society has enjoyed
for more than 50 years. Performance-engineering
of software, development of algorithms, and
hardware streamlining at the Top can con-
tinue to make computer applications faster in
the post-Moore era, rivaling the gains accrued
over many years by Moore’s law. Unlike the
historical gains at the Bottom, however, the
gains at the Top will be opportunistic, uneven,
sporadic, and subject to diminishing returns
as problems become better explored. But
even where opportunities exist, it may be
hard to exploit them if the necessary mod-
ifications to a component require compatibil-
ity with other components. Big components
can allow their owners to capture the eco-
nomic advantages from performance gains at
the Top while minimizing external disruptions.
In addition to the potential at the Top, nas-
cent technologies—such as 3D stacking, quan-
tum computing, photonics, superconducting
circuits, neuromorphic computing, and graphene
chips—might provide a boost from the Bottom.
At the moment, these technologies are in their
infancy and lack the maturity to compete with
today’s silicon-based semiconductor technol-
ogy in the near future (59). Although we ap-
plaud investment in these technologies at the
Bottom because of their long-term potential,
we view it as far more likely that, at least in the
near term, performance gains for most appli-
cations will originate at the Top.

Methods
Table 1
Each running time is the minimum of five runs
on an Amazon AWS c4.8xlarge spot instance,
a dual-socket Intel Xeon E5-2666 v3 system
with a total of 60 gibibytes of memory. Each
Xeon is a 2.9-GHz 18-core CPU with a shared
25-mebibyte L3-cache. Each processor core
has a 32-kibibyte (KiB) private L1-data-cache
and a 256-KiB private L2-cache. The machine
was running Fedora 22, using version 4.0.4 of
the Linux kernel. The Python version was exe-
cuted using Python 2.7.9. The Java version was
compiled and run using OpenJDK version
1.8.0_51. All other versions were compiled using
GNU Compiler Collection (GCC) 5.2.1 20150826.
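A minimal sketch of this timing protocol in Python, taking the minimum of five runs of a benchmark binary; the command name is a placeholder, not the article’s actual harness:

    import subprocess
    import time

    def min_of_runs(cmd, runs=5):
        # Run `cmd` repeatedly and return the minimum wall-clock time,
        # mirroring the minimum-of-five-runs rule described above.
        times = []
        for _ in range(runs):
            start = time.perf_counter()
            subprocess.run(cmd, check=True)
            times.append(time.perf_counter() - start)
        return min(times)

    # Hypothetical usage; "./matmul_c" stands in for one of the measured versions.
    # print(min_of_runs(["./matmul_c"]))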

Figure 1
Each curve models how hardware improvements
grow the performance of a fixed maximum-
flow algorithm, starting from the year the al-
gorithm was published. The performance of
each algorithm was measured as the number
of graphs of n = 10^12 vertices, m = n^1.1 ≈ 15.8 ×
10^12 edges, and 64-bit integer capacities that
can be solved in a fixed amount of time, nor-
malized such that the first point starts at 1. We

calculated the performance of each algorithm
based on its asymptotic complexity in terms of
big-O notation, assuming that the constant
hidden by the big-O is 1. We excluded approx-
imate algorithms from consideration, because
these algorithms do not necessarily return the
same qualitative answer. We also excluded algo-
rithms whose stated asymptotic bounds ignore
logarithmic factors to simplify the performance
comparisons. We identified four maximum-
flow algorithms developed since 1978 that
each delivered a performance improvement of
more than a factor of four over its predecessor:
Edmonds and Karp’s O(m^2 log U) algorithm (60);
Sleator and Tarjan’s O(mn log n) algorithm (61);
Ahuja, Orlin, and Tarjan’s O(mn log(n √(log U)/m + 2))
algorithm (62); and Goldberg and Rao’s
O(n^(2/3) m log(n^2/m) log U) algorithm (63).
Each curve plots the best performance achieved
by any processor up to that date, on the basis
of scaled MIPS (million instructions per sec-
ond) estimates and SPECint scores recorded in
Stanford’s CPU database (56). These processor
performance measurements are normalized
using the same method as in Fig. 2.
Figure 1 ignores the constant factors in the
performance of maximum-flow algorithms
because comparably engineered implemen-
tations of these algorithms are not available.
Nevertheless, the effects from changes in con-
stant factors between algorithms can be in-
ferred from the plot. For example, if a constant
factor were 10 times larger between one al-
gorithm and a later one, then the curve for
the later algorithm would be one order of
magnitude lower (i.e., one division less on
the y axis).
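The performance calculation described above can be sketched in Python as follows; the choice of log base 2 and the helper names are assumptions of this sketch, since the article specifies only the asymptotic bounds with the hidden constant taken to be 1:

    import math

    # Problem size used in Fig. 1: n = 10^12 vertices, m = n^1.1 edges,
    # and 64-bit integer capacities, so U = 2^64.
    n = 10**12
    m = n**1.1
    U = 2**64

    # Operation counts from the stated asymptotic bounds, hidden constant 1.
    def edmonds_karp(n, m, U):        # O(m^2 log U)
        return m**2 * math.log2(U)

    def sleator_tarjan(n, m, U):      # O(mn log n)
        return m * n * math.log2(n)

    def ahuja_orlin_tarjan(n, m, U):  # O(mn log(n sqrt(log U)/m + 2))
        return m * n * math.log2(n * math.sqrt(math.log2(U)) / m + 2)

    def goldberg_rao(n, m, U):        # O(n^(2/3) m log(n^2/m) log U)
        return n**(2/3) * m * math.log2(n**2 / m) * math.log2(U)

    # A processor executing ops_per_sec operations per second solves
    # ops_per_sec / work(n, m, U) graphs per unit time; each curve in
    # Fig. 1 is then normalized so that its first point equals 1.
    def graphs_per_unit_time(work_fn, ops_per_sec):
        return ops_per_sec / work_fn(n, m, U)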

Figure 2
Performance measurements for different pro-
cessors were gathered from Stanford’s CPU data-
base (56). For a processor released before 1992,
its performance is measured as the recorded
estimate of how many MIPS that processor can
perform. For subsequent processors, each per-
formance measurement corresponds to the
best recorded score for that processor on either
the SPECint1992, SPECint1995, SPECint2000,
or SPECint2006 benchmarks. Performance for
multicore processors is measured using both
standard SPECint scores (gray diamonds) and
in terms of throughput (multicolored squares),
using SPECint2006 rate scores. All SPECint
performance scores were obtained from (39).
The date for an Intel processor released after
2004 is plotted as the first day in the quarter
when that processor was released, unless a
more precise release date was found. Scores
for different SPECint benchmarks were nor-
malized by scaling factors, which were com-
puted from the geometric-mean ratios of scores
for consecutive versions of SPECint for pro-
cessors that were evaluated on both versions.
The MIPS estimates were normalized with

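A minimal Python sketch of the cross-version SPECint normalization described above; the dictionary-based data layout and function name are assumptions of this sketch, not the article’s actual pipeline:

    import math

    def scaling_factor(scores_old, scores_new):
        # Geometric mean of new/old score ratios over the processors that
        # were evaluated on both of two consecutive SPECint versions
        # (each argument maps processor name to its score).
        common = scores_old.keys() & scores_new.keys()
        ratios = [scores_new[p] / scores_old[p] for p in common]
        return math.prod(ratios) ** (1.0 / len(ratios))

    # Hypothetical usage: chain the factors between consecutive versions to
    # bring SPECint1992, SPECint1995, SPECint2000, and SPECint2006 scores
    # onto a single common scale before plotting.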