Assembly Language for Beginners

1.22. MANIPULATING SPECIFIC BIT(S)
pop rbp
ret

Optimizing GCC 4.8.2

Listing 1.291: Optimizing GCC 4.8.2
1 f:
2 xor eax, eax ; rt variable will be in EAX register
3 xor ecx, ecx ; i variable will be in ECX register
4 .L3:
5 mov rsi, rdi ; load input value
6 lea edx, [rax+1] ; EDX=EAX+1
7 ; EDX here is a new version of rt,
8 ; which will be written into rt variable, if the last bit is 1
9 shr rsi, cl ; RSI=RSI>>CL
10 and esi, 1 ; ESI=ESI&1
11 ; the last bit is 1? If so, write new version of rt into EAX
12 cmovne eax, edx
13 add rcx, 1 ; RCX++
14 cmp rcx, 64
15 jne .L3
16 rep ret ; AKA fatret


This code is terser, but has a quirk.

In all the examples we have seen so far, we incremented the “rt” value after comparing a specific bit, but
the code here increments “rt” before (line 6), writing the new value into register EDX. Thus, if the last bit is
1, the CMOVNE^152 instruction (which is a synonym for CMOVNZ^153) commits the new value of “rt” by moving
EDX (“proposed rt value”) into EAX (“current rt” to be returned at the end).

Hence, the increment is performed at each iteration of the loop, i.e., 64 times, regardless of the input
value.
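
The "propose first, commit later" pattern described above can be sketched in C. This is only an illustration: the function name and the `proposed` variable are hypothetical, not taken from the original source, and a compiler is not obliged to emit CMOVNE for the conditional expression.

```c
#include <stdint.h>

/* Hypothetical C rendering of the optimizing GCC listing:
   the incremented value is computed on every iteration,
   but it replaces rt only when the tested bit is set. */
int count_bits_branchless(uint64_t a)
{
    int rt = 0;                           /* kept in EAX in the listing */
    unsigned i;                           /* kept in ECX/RCX in the listing */

    for (i = 0; i < 64; i++)
    {
        int proposed = rt + 1;            /* lea edx, [rax+1] */
        uint64_t bit = (a >> i) & 1;      /* shr rsi, cl / and esi, 1 */
        rt = bit ? proposed : rt;         /* cmovne eax, edx */
    }
    return rt;
}
```

The addition `rt + 1` runs 64 times no matter what `a` is; only the commit step depends on the input.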

The advantage of this code is that it contains only one conditional jump (at the end of the loop) instead of
two jumps (one skipping the “rt” increment and one at the end of the loop). And that might work faster on
modern CPUs with branch predictors: 2.10.1 on page 466.
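
For contrast, the two-jump shape that this paragraph compares against can be sketched in C as follows. The function name is hypothetical, and an optimizing compiler may still turn the `if` into a conditional move; the sketch only shows where the second conditional jump would come from in a straightforward translation.

```c
#include <stdint.h>

/* Hypothetical "two jumps" form: one conditional jump skips the
   increment, another one closes the loop. */
int count_bits_two_jumps(uint64_t a)
{
    int rt = 0;
    unsigned i;

    for (i = 0; i < 64; i++)      /* conditional jump at the end of the loop */
    {
        if ((a >> i) & 1)         /* conditional jump that skips rt++ */
            rt++;
    }
    return rt;
}
```

Whether the bit is set depends entirely on the input, so the jump that skips the increment is the one a branch predictor may struggle with.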

The last instruction is REP RET (opcode F3 C3), which is also called FATRET by MSVC. This is a somewhat
optimized version of RET, which AMD recommends placing at the end of a function if RET goes
right after a conditional jump: [Software Optimization Guide for AMD Family 16h Processors, (2013), p.15]^154.

Optimizing MSVC 2010

Listing 1.292: Optimizing MSVC 2010
a$ = 8
f PROC
; RCX = input value
xor eax, eax
mov edx, 1
lea r8d, QWORD PTR [rax+64]
; R8D=64
npad 5
$LL4@f:
test rdx, rcx
; is there no such bit in the input value?
; then skip the next INC instruction.
je SHORT $LN3@f
inc eax ; rt++
$LN3@f:
rol rdx, 1 ; rotate left by 1; here it works as RDX=RDX<<1
(^152) Conditional MOVe if Not Equal
(^153) Conditional MOVe if Not Zero
(^154) More information on it: http://go.yurichev.com/17328
