
without affecting application programming
interfaces (APIs). Performance engineering
and hardware streamlining that do not re-
quire coordination already occur this way,
and we expect them to continue to do so.
Many of the most valuable future opportu-
nities for improving performance will not
arise locally within a single module, however,
but from broad-based and systematic changes
across many modules affecting major parts of
a system. For example, because so much soft-
ware is reductionist, the larger a system, the
greater the opportunity for huge performance
gains when layer upon layer of reductions are
replaced by a lean and direct implementa-
tion. But large-scale reengineering that affects
many modules requires the coordination of
many engineers, which is costly and risky.
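As a toy illustration of such layered reductions (a hypothetical Python example, not drawn from the article), compare a sum-of-squares routine built by delegating to ever more general layers with a lean, direct loop:

    def elementwise_product(xs, ys):
        # General-purpose layer: materializes a full intermediate list.
        return [x * y for x, y in zip(xs, ys)]

    def squares(xs):
        # Reduced to the more general layer above.
        return elementwise_product(xs, xs)

    def sum_of_squares_layered(xs):
        # Reduced again; builds an intermediate list as long as the input
        # just to add it up.
        return sum(squares(xs))

    def sum_of_squares_direct(xs):
        # Lean, direct implementation: one pass, no intermediate storage.
        total = 0
        for x in xs:
            total += x * x
        return total

Collapsing the layers eliminates the intermediate allocations; in a large system with many such layers, the savings compound.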
Big system components (web browsers, data-
base systems, operating systems, micropro-
cessors, GPUs, compilers, disk-storage systems,
and so on) offer both technical opportunities
for making applications run fast and a con-
text where it is economically viable to take ad-
vantage of them. As an example of the types of
changes that can be made within a big compo-
nent, consider a microprocessor. An instruction-
set architecture (ISA) is the interface by which
software tells the processor what to do. Man-
ufacturers routinely make major internal changes
to improve performance without changing
the ISA so that old software continues to run
correctly. For example, the Intel 8086 released
in 1978 had 29,000 transistors (56), whereas
the 22-core Xeon E5-2699 v4, released in 2016,
has about 248,000 times more transistors (57)
that produce more than a million times better
performance, according to SPECint rate (39).
In that time, the ISA has grown by less than
a factor of four (58), and old programs continue
to work even as the interface adds new
functionality.
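Taken at face value, these figures imply a transistor count for the 2016 part of roughly

    29,000 × 248,000 ≈ 7.2 × 10^9 transistors,

a back-of-the-envelope product of the cited numbers rather than a figure quoted from (57).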
Big components provide good opportunities
for obtaining performance, but opportunity
alone is not enough. To outweigh the costs
and risks of making changes, an economic
incentive must exist. If a commercial enter-
prise or a nonprofit organization owns a big
component, it can justify the investment to
enhance performance because it reaps the
benefit when the job is done. Single owner-
ship also helps with coordination costs. If
many parties must agree to make a change,
but it takes only a few to veto it, then it can
be hard for change to happen. Even the fear
of high coordination costs can block change
in a large codebase if too many parties are
involved. But when a single entity owns a big
component, it has the power to make vast
changes and pool the costs and benefits. It
can choose to reengineer as many modules
as it can justify economically and coordinate
with the outside world only at the big com-
ponent’s interface.


Conclusions
As miniaturization wanes, the silicon-fabrication
improvements at the Bottom will no longer
provide the predictable, broad-based gains in
computer performance that society has enjoyed
for more than 50 years. Performance-engineering
of software, development of algorithms, and
hardware streamlining at the Top can con-
tinue to make computer applications faster in
the post-Moore era, rivaling the gains accrued
over many years by Moore’s law. Unlike the
historical gains at the Bottom, however, the
gains at the Top will be opportunistic, uneven,
sporadic, and subject to diminishing returns
as problems become better explored. But
even where opportunities exist, it may be
hard to exploit them if the necessary mod-
ifications to a component require compatibil-
ity with other components. Big components
can allow their owners to capture the eco-
nomic advantages from performance gains at
the Top while minimizing external disruptions.
In addition to the potential at the Top, nas-
cent technologies—such as 3D stacking, quan-
tum computing, photonics, superconducting
circuits, neuromorphic computing, and graphene
chips—might provide a boost from the Bottom.
At the moment, these technologies are in their
infancy and lack the maturity to compete with
today’s silicon-based semiconductor technol-
ogy in the near future (59). Although we ap-
plaud investment in these technologies at the
Bottom because of their long-term potential,
we view it as far more likely that, at least in the
near term, performance gains for most appli-
cations will originate at the Top.

Methods
Table 1
Each running time is the minimum of five runs
on an Amazon AWS c4.8xlarge spot instance,
a dual-socket Intel Xeon E5-2666 v3 system
with a total of 60 gibibytes of memory. Each
Xeon is a 2.9-GHz 18-core CPU with a shared
25-mebibyte L3-cache. Each processor core
has a 32-kibibyte (KiB) private L1-data-cache
and a 256-KiB private L2-cache. The machine
was running Fedora 22, using version 4.0.4 of
the Linux kernel. The Python version was exe-
cuted using Python 2.7.9. The Java version was
compiled and run using OpenJDK version
1.8.0_51. All other versions were compiled using
GNU Compiler Collection (GCC) 5.2.1 20150826.
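A minimal sketch of this timing protocol in Python, taking the minimum of five runs of a benchmark binary; the command name is a placeholder, not the article’s actual harness:

    import subprocess
    import time

    def min_of_runs(cmd, runs=5):
        # Run `cmd` repeatedly and return the minimum wall-clock time,
        # mirroring the minimum-of-five-runs rule described above.
        times = []
        for _ in range(runs):
            start = time.perf_counter()
            subprocess.run(cmd, check=True)
            times.append(time.perf_counter() - start)
        return min(times)

    # Hypothetical usage; "./matmul_c" stands in for one of the measured versions.
    # print(min_of_runs(["./matmul_c"]))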

Figure 1
Each curve models how hardware improvements
grow the performance of a fixed maximum-
flow algorithm, starting from the year the al-
gorithm was published. The performance of
each algorithm was measured as the number
of graphs of n = 10^12 vertices, m = n^1.1 ≈ 15.8 ×
10^12 edges, and 64-bit integer capacities that
can be solved in a fixed amount of time, nor-
malized such that the first point starts at 1. We

calculated the performance of each algorithm
based on its asymptotic complexity in terms of
big-O notation, assuming that the constant
hidden by the big-O is 1. We excluded approx-
imate algorithms from consideration, because
these algorithms do not necessarily return the
same qualitative answer. We also excluded algo-
rithms whose stated asymptotic bounds ignore
logarithmic factors to simplify the performance
comparisons. We identified four maximum-
flow algorithms developed since 1978 that
each delivered a performance improvement of
more than a factor of four over its predecessor:
Edmonds and Karp’s O(m^2 log U) algorithm (60);
Sleator and Tarjan’s O(mn log n) algorithm (61);
Ahuja, Orlin, and Tarjan’s O(mn log(n √(log U)/m + 2))
algorithm (62); and Goldberg and Rao’s
O(n^(2/3) m log(n^2/m) log U) algorithm (63).
Each curve plots the best performance achieved
by any processor up to that date, on the basis
of scaled MIPS (million instructions per sec-
ond) estimates and SPECint scores recorded in
Stanford’s CPU database (56). These processor
performance measurements are normalized
using the same method as in Fig. 2.
Figure 1 ignores the constant factors in the
performance of maximum-flow algorithms
because comparably engineered implemen-
tations of these algorithms are not available.
Nevertheless, the effects from changes in con-
stant factors between algorithms can be in-
ferred from the plot. For example, if a constant
factor were 10 times larger between one al-
gorithm and a later one, then the curve for
the later algorithm would be one order of
magnitude lower (i.e., one division less on
the y axis).
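The performance calculation described above can be sketched in Python as follows; the choice of log base 2 and the helper names are assumptions of this sketch, since the article specifies only the asymptotic bounds with the hidden constant taken to be 1:

    import math

    # Problem size used in Fig. 1: n = 10^12 vertices, m = n^1.1 edges,
    # and 64-bit integer capacities, so U = 2^64.
    n = 10**12
    m = n**1.1
    U = 2**64

    # Operation counts from the stated asymptotic bounds, hidden constant 1.
    def edmonds_karp(n, m, U):        # O(m^2 log U)
        return m**2 * math.log2(U)

    def sleator_tarjan(n, m, U):      # O(mn log n)
        return m * n * math.log2(n)

    def ahuja_orlin_tarjan(n, m, U):  # O(mn log(n sqrt(log U)/m + 2))
        return m * n * math.log2(n * math.sqrt(math.log2(U)) / m + 2)

    def goldberg_rao(n, m, U):        # O(n^(2/3) m log(n^2/m) log U)
        return n**(2/3) * m * math.log2(n**2 / m) * math.log2(U)

    # A processor executing ops_per_sec operations per second solves
    # ops_per_sec / work(n, m, U) graphs per unit time; each curve in
    # Fig. 1 is then normalized so that its first point equals 1.
    def graphs_per_unit_time(work_fn, ops_per_sec):
        return ops_per_sec / work_fn(n, m, U)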

Figure 2
Performance measurements for different pro-
cessors were gathered from Stanford’s CPU data-
base (56). For a processor released before 1992,
its performance is measured as the recorded
estimate of how many MIPS that processor can
perform. For subsequent processors, each per-
formance measurement corresponds to the
best recorded score for that processor on either
the SPECint1992, SPECint1995, SPECint2000,
or SPECint2006 benchmarks. Performance for
multicore processors is measured using both
standard SPECint scores (gray diamonds) and
in terms of throughput (multicolored squares),
using SPECint2006 rate scores. All SPECint
performance scores were obtained from (39).
The date for an Intel processor released after
2004 is plotted as the first day in the quarter
when that processor was released, unless a
more precise release date was found. Scores
for different SPECint benchmarks were nor-
malized by scaling factors, which were com-
puted from the geometric-mean ratios of scores
for consecutive versions of SPECint for pro-
cessors that were evaluated on both versions.
The MIPS estimates were normalized with

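A minimal Python sketch of the cross-version SPECint normalization described above; the dictionary-based data layout and function name are assumptions of this sketch, not the article’s actual pipeline:

    import math

    def scaling_factor(scores_old, scores_new):
        # Geometric mean of new/old score ratios over the processors that
        # were evaluated on both of two consecutive SPECint versions
        # (each argument maps processor name to its score).
        common = scores_old.keys() & scores_new.keys()
        ratios = [scores_new[p] / scores_old[p] for p in common]
        return math.prod(ratios) ** (1.0 / len(ratios))

    # Hypothetical usage: chain the factors between consecutive versions to
    # bring SPECint1992, SPECint1995, SPECint2000, and SPECint2006 scores
    # onto a single common scale before plotting.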