MaximumPC 2004 03


Prescott Arrives




slowest worker on the line. The same goes
for CPU pipelines.
Over at the Pentium 4 factory, there
are 20 stages, or 20 workers, crunching
the instructions at hand. And because
there are more workers on the line, each
P4 worker has fewer tasks to perform at
his particular station. Thus the P4
assembly line can be clocked much, much
faster than the Athlon XP assembly line
with its modest staff of just 10 workers.
To continue our analogy, for its Prescott
architecture, Intel has hired 31 hard hats
to build the car. So instead of each worker
having to perform two tasks (say, putting
on windshield wipers and installing an
airbag), each worker simply screws on a
single lug nut before sending the car along.
Finished cars roll off the line more often
because the production line is moving at a
breakneck pace.
At 3.4GHz, Prescott is simply idling;
at this speed it’s not really using the
extended pipeline. Hence the lackluster
performance we’re currently seeing. As
higher clock speeds begin to take
advantage of the 31-stage pipeline, we’ll
see massive speed boosts.
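
For readers who like numbers with their analogies, here is a minimal Python sketch of the assembly-line idea. It assumes a perfectly balanced pipeline whose clock period is the per-stage share of the work plus a fixed hand-off overhead. The function name, the 5ns of total work, and the 0.05ns overhead are all made-up illustrative values, not Intel or AMD figures.

```python
# Toy model: the clock period is set by how much work each stage does,
# plus a fixed overhead for handing work to the next stage.
# All numbers are illustrative assumptions, not measured chip data.

def max_clock_ghz(total_logic_ns: float, stages: int,
                  latch_overhead_ns: float = 0.05) -> float:
    """Highest clock (GHz) a perfectly balanced pipeline could reach."""
    if stages < 1:
        raise ValueError("a pipeline needs at least one stage")
    stage_time_ns = total_logic_ns / stages + latch_overhead_ns
    return 1.0 / stage_time_ns

TOTAL_LOGIC_NS = 5.0  # hypothetical work needed to finish one instruction

clocks = {stages: max_clock_ghz(TOTAL_LOGIC_NS, stages)
          for stages in (10, 20, 31)}

for stages, ghz in clocks.items():
    print(f"{stages:2d} stages -> about {ghz:.2f} GHz")
```

The trend is the point, not the exact figures: more workers means less work per station and a faster line, though the fixed hand-off overhead means the gains shrink with every extra stage.
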

Q: So why don’t they build 50-stage
pipeline CPUs all the time?
Easy, turbo. You’re getting a little ahead of
the curve. A long pipeline is only beneficial
if you can run it at high speeds. If you can’t
increase the speed of the assembly line,
a long pipeline can actually hurt
performance. That’s because a long pipeline
is designed to move data rapidly without any
hitches, and a branch misprediction is
exactly such a hitch. Here’s how that works,
or doesn’t work, as the case may be:
A branch is like a fork in the road: a
point at which a question must be answered
and then acted upon. Will the sales office
sell more red sports cars with turbo engines
or more cars with naturally aspirated
engines? By asking this question before a
decision has to be made, and by (hopefully)
answering it correctly, the factory saves
valuable time and resources. In terms of CPU
operation, at each branch a CPU can guess
which path the program will take and start
working on those instructions before the
outcome is known. It simply requires highly
sophisticated branch prediction algorithms.
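
To make the guessing game concrete, here is a minimal Python sketch of a textbook two-bit saturating-counter predictor. This is a classic classroom scheme chosen for illustration, not a description of Prescott’s actual prediction logic, and the class and function names are hypothetical.

```python
# Textbook 2-bit saturating-counter branch predictor (illustrative only;
# not Intel's algorithm). It tracks a single branch with one counter.

class TwoBitPredictor:
    def __init__(self) -> None:
        self.counter = 2  # 0-1 predict "not taken", 2-3 predict "taken"

    def predict(self) -> bool:
        return self.counter >= 2

    def update(self, taken: bool) -> None:
        # Nudge the counter toward the real outcome, saturating at 0 and 3.
        if taken:
            self.counter = min(3, self.counter + 1)
        else:
            self.counter = max(0, self.counter - 1)


def count_mispredictions(outcomes: list[bool]) -> int:
    predictor = TwoBitPredictor()
    misses = 0
    for taken in outcomes:
        if predictor.predict() != taken:
            misses += 1
        predictor.update(taken)
    return misses


# A loop branch: taken nine times, then not taken once, repeated three times.
loop_branch = ([True] * 9 + [False]) * 3
misses = count_mispredictions(loop_branch)
print(f"{misses} mispredictions out of {len(loop_branch)} branches")
```

Because the counter needs two wrong guesses in a row to change its mind, the single loop exit each time around costs one miss but does not spoil the next pass through the loop.
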
But problems arise when the CPU makes an
incorrect prediction and, in turn, performs
unnecessary operations. In terms of our
assembly line analogy, if the factory
discovers that it should have been building
more turbo cars all along, all the cars with
regular engines currently on the assembly
line need to be flushed from the system and
sent back for turbo installation. In our
short pipeline with only 10 stages, it’s
fairly easy to clear the old engines out and
commence building cars with turbos.
On the flip side, in the Pentium 4 Prescott
factory, you have to clear far more parts
and spend more time refilling the pipeline
before work can begin again. At low clock
speeds, this process moves so slowly, the
CPU foreman might as well blow a whistle
and cry “yabba-dabba-do” before sliding
off his dinosaur for a lunch break. At higher
speeds, each misprediction costs less time
because the entire line is moving so much
faster. More sophisticated branch
predictors also help by making
mispredictions rarer in the first place.
In addition to smarter branch prediction
logic, Intel has doubled the L1 data cache
(to 16KB) and the L2 cache (to 1MB), and
has enlarged internal buffers, to help offset
the cost of branch mispredictions.
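
The trade-off described in this answer can be put into a rough back-of-the-envelope model. The Python sketch below assumes one instruction per clock when predictions are right and a flush costing about one cycle per pipeline stage when they are wrong. The branch fraction and misprediction rate are invented for illustration, and the function name is hypothetical.

```python
# Toy cost model for branch mispredictions. Every constant is an assumption.

def time_per_instruction_ns(stages: int, clock_ghz: float,
                            branch_fraction: float = 0.2,
                            mispredict_rate: float = 0.05) -> float:
    """Average time per instruction under the simplifications above."""
    refill_cycles = stages  # assumption: a flush costs about one cycle per stage
    cycles_per_instruction = (
        1.0 + branch_fraction * mispredict_rate * refill_cycles
    )
    return cycles_per_instruction / clock_ghz


# Same clock, different pipeline lengths: the long pipeline loses.
same_clock = {stages: time_per_instruction_ns(stages, clock_ghz=2.0)
              for stages in (10, 31)}
# Long pipeline at a higher clock: the extra speed pays for the flushes.
fast_long = time_per_instruction_ns(31, clock_ghz=3.4)

for stages, ns in same_clock.items():
    print(f"{stages} stages at 2.0 GHz: {ns:.3f} ns per instruction")
print(f"31 stages at 3.4 GHz: {fast_long:.3f} ns per instruction")
```

Lowering `mispredict_rate` in the call shows the other lever in play: a better predictor shrinks the long pipeline’s handicap even before the clock goes up.
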

Q: Is there an optimal speed at which
the Prescott should be running? How
far will it scale?
The CPU nerds we talked to were surprised
to hear that a 31-stage CPU would be
introduced at just 3.4GHz instead of a
fantastical-sounding 5GHz. It’s hard to say
at exactly what speed the Prescott will
really begin to offer the kind of dramatic
performance we’ve been drooling over. To
give you an idea, the 20-stage Pentium 4
Northwood didn’t really exhibit substantial
performance increases over the Athlon
until it was well into the 2.5GHz range. Our
guess is that the Prescott won’t realize its
full performance potential until it reaches
clock speeds of 4GHz. Intel designed the
Prescott core, like earlier Pentium 4 cores,
to scale up to 5GHz; other iterations of its
architecture (in subsequent CPU releases)
will likely be capable of reaching 10GHz.
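
As a rough illustration of why a longer pipeline needs extra clock speed just to keep pace, the Python sketch below estimates a break-even clock. It assumes one instruction per cycle plus a flush penalty of one cycle per stage, and it ignores Prescott’s larger caches and improved predictor entirely. The rates and function names are hypothetical, so treat the result as a thought experiment rather than a forecast.

```python
# Break-even clock for a longer pipeline, under made-up branch statistics.

def cycles_per_instruction(stages: int, branch_fraction: float = 0.2,
                           mispredict_rate: float = 0.05) -> float:
    # Assumption: each misprediction costs about one cycle per stage.
    return 1.0 + branch_fraction * mispredict_rate * stages


def break_even_clock_ghz(short_stages: int, short_clock_ghz: float,
                         long_stages: int) -> float:
    """Clock at which the longer pipeline matches the shorter one's
    average time per instruction."""
    ratio = (cycles_per_instruction(long_stages)
             / cycles_per_instruction(short_stages))
    return short_clock_ghz * ratio


needed = break_even_clock_ghz(short_stages=20, short_clock_ghz=3.4,
                              long_stages=31)
print(f"A 31-stage core needs about {needed:.2f} GHz to match "
      f"a 20-stage core at 3.4 GHz")
```

With these invented rates the break-even point lands a few hundred megahertz above 3.4GHz; different assumptions would move it in either direction.
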
In addition to the 31-stage pipeline,
the Prescott features greatly improved
clock distribution in its core. If you think
of the CPU die as a small silicon city
where data must flow through grid-locked
streets, Prescott’s general plan for roads
is designed for electrons to travel more
efficiently. This translates into even greater
clock speeds.
Prescott is also Intel’s first attempt to use
strained silicon in a mass-market CPU.
Strained silicon is a method of stretching
the silicon crystal lattice so that its atoms
sit slightly farther apart, which lets
electrons move through the transistors faster.
With more internal buffers (32 versus
24 in Northwood), Prescott has more
write-combining mechanisms that help

Prescott's longer pipeline = higher clock speed


A CPU’s pipeline generally determines the clock speed its micro-architecture can hit,
process technologies notwithstanding. The more stages in a pipeline, the less each
stage has to do, and the faster you can push data through. Intel’s new Prescott core
features a shockingly long 31-stage pipeline. That’s 11 more stages than the original
P4, and eight more than Apple’s G5! To illustrate the pipeline/clock speed
relationship, we compared the Pentium III, P4 Northwood core, and the P4 Prescott core.

10 STAGES
ARCHITECTURE: P6
CPUS: Pentium Pro, Pentium II, Pentium III, and Pentium III for Servers
CLOCK SPEED AND RELEASE DATE: 200MHz Pentium Pro on November 1, 1995
HIGHEST CLOCK AND RELEASE DATE: 1.4GHz Pentium III-S on January 8, 2002

20 STAGES
ARCHITECTURE: NetBurst
CPUS: Pentium 4
CLOCK SPEED AND RELEASE DATE: 1.5GHz Pentium 4 Willamette on November 20, 2000
HIGHEST CLOCK AND RELEASE DATE: 3.4GHz Pentium 4 Northwood on February 2, 2004

31 STAGES
ARCHITECTURE: NetBurst
CPUS: Pentium 4 Prescott core
CLOCK SPEED AND RELEASE DATE: 3.4GHz Pentium 4E on February 2, 2004
HIGHEST CLOCK AND RELEASE DATE: Projected to reach 10GHz

Pipeline Architecture

