CHAPTER 19. MANIPULATING SPECIFIC BIT(S) CHAPTER 19. MANIPULATING SPECIFIC BIT(S)
Optimizing GCC 4.8.2
Listing 19.30: Optimizing GCC 4.8.2
1 f:
2 xor eax, eax ; rt variable will be in EAX register
3 xor ecx, ecx ; i variable will be in ECX register
4 .L3:
5 mov rsi, rdi ; load input value
6 lea edx, [rax+1] ; EDX=EAX+1
7 ; EDX here is a "new version of rt", which will be written into rt variable, if the last bit is 1
8 shr rsi, cl ; RSI=RSI>>CL
9 and esi, 1 ; ESI=ESI&1
10 ; the last bit is 1? If so, write "new version of rt" into EAX
11 cmovne eax, edx
12 add rcx, 1 ; RCX++
13 cmp rcx, 64
14 jne .L3
15 rep ret ; AKA fatret
This code is terser, but has a quirk. In all examples that we see so far, we were incrementing the “rt” value after comparing
a specific bit, but the code here increments “rt” before (line 6), writing the new value into registerEDX. Thus, if the last bit
is 1, theCMOVNE^8 instruction (which is a synonym forCMOVNZ^9 )commitsthe new value of “rt” by movingEDX(“proposed rt
value”) intoEAX(“current rt” to be returned at the end). Hence, the incrementing is done at each step of loop, i.e., 64 times,
without any relation to the input value.
The advantage of this code is that it contain only one conditional jump (at the end of the loop) instead of two jumps (skipping
the “rt” value increment and at the end of loop). And that might work faster on the modern CPUs with branch predictors:33.1
on page 436.
The last instruction isREP RET(opcodeF3 C3) which is also calledFATRETby MSVC. This is somewhat optimized version
ofRET, which is recommended by AMD to be placed at the end of function, ifRETgoes right after conditional jump: [AMD13b,
p.15]^10.
Optimizing MSVC 2010
Listing 19.31: MSVC 2010
a$ = 8
f PROC
; RCX = input value
xor eax, eax
mov edx, 1
lea r8d, QWORD PTR [rax+64]
; R8D=64
npad 5
$LL4@f:
test rdx, rcx
; there are no such bit in input value?
; skip the next INC instruction then.
je SHORT $LN3@f
inc eax ; rt++
$LN3@f:
rol rdx, 1 ; RDX=RDX<<1
dec r8 ; R8--
jne SHORT $LL4@f
fatret 0
f ENDP
Here theROLinstruction is used instead ofSHL, which is in fact “rotate left” instead of “shift left”, but in this example it
works just asSHL.
You can read more about the rotate instruction here:A.6.3 on page 893.
R8here is counting from 64 to 0. It’s just like an invertedi.
Here is a table of some registers during the execution:
(^8) Conditional MOVe if Not Equal
(^9) Conditional MOVe if Not Zero
(^10) More information on it:http://go.yurichev.com/17328