times, so that each iteration actually performs the work of three iterations
instead of one. The counter incrementing code has been corrected to increment
by 3 instead of 1 in each iteration. This is more efficient because the loop’s
overhead is greatly reduced—instead of executing the CMPand JLinstructions
0x3e7(999) times, they will only be executed 0x14d(333) times.
A more aggressive type of loop unrolling is to simply eliminate the loop
altogether and actually duplicate its body as many times as needed. Depend-
ing on the number of iterations (and assuming that number is known in
advance), this may or may not be a practical approach.
Branchless Logic
Some optimizing compilers have special optimization techniques for generat-
ing branchless logic. The main goal behind all of these techniques is to eliminate
or at least reduce the number of conditional jumps required for implementing
a given logical statement. The reasons for wanting to reduce the number of
jumps in the code to the absolute minimum is explained in the section titled
“Hardware Execution Environments in Modern Processors” in Chapter 2.
Briefly, the use of a processor pipeline dictates that when the processor
encounters a conditional jump, it must guess or predict whether the jump will
take place or not, and based on that guess decide which instructions to add to
the end of the pipeline—the ones that immediately follow the branch or the
ones at the jump’s target address. If it guesses wrong, the entire pipeline is
emptied and must be refilled. The amount of time wasted in these situations
heavily depends on the processor’s internal design and primarily on its
pipeline length, but in most pipelined CPUs refilling the pipeline is a highly
expensive operation.
Some compilers implement special optimizations that use sophisticated
arithmetic and conditional instructions to eliminate or reduce the number of
jumps required in order to implement logic. These optimizations are usually
applied to code that conditionally performs one or more arithmetic or assign-
ment operations on operands. The idea is to convert the two or more condi-
tional execution paths into a single sequence of arithmetic operations that
result in the same data, but without the need for conditional jumps.
There are two major types of branchless logic code emitted by popular com-
pilers. One is based on converting logic into a purely arithmetic sequence that
provides the same end result as the original high-level language logic. This
technique is very limited and can only be applied to relatively simple
sequences. For slightly more involved logical statements, compilers some-
times employ special conditional instructions (when available on the target
CPU). The two primary approaches for implementing branchless logic are dis-
cussed in the following sections.
Deciphering Code Structures 509
21_574817 appa.qxd 3/16/05 8:54 PM Page 509