CHAPTER 19. MANIPULATING SPECIFIC BIT(S)
Optimizing GCC
Listing 19.13: Optimizing GCC
public f
f proc near
arg_0 = dword ptr 8
push ebp
mov ebp, esp
mov eax, [ebp+arg_0]
pop ebp
or ah, 40h
and ah, 0FDh
retn
f endp
That’s shorter. It is worth noting that the compiler works with part of the EAX register via the AH register, that is, the part of EAX
spanning bits 8 through 15 inclusive.
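Register parts by byte number (byte 7 is the most significant, byte 0 the least significant):
RAX (x64 only): bytes 7 through 0
EAX: bytes 3 through 0
AX: bytes 1 and 0
AH: byte 1
AL: byte 0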
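Because AH is bits 8 through 15 of EAX, an 8-bit mask applied to AH is the same as that mask shifted left by 8 and applied to EAX: 40h becomes 4000h, and clearing bit 1 of AH (the 0FDh mask) clears bit 9 of EAX (200h). The following C sketch checks this by emulating the two AH instructions on a 32-bit value standing in for EAX; the helper names via_ah and via_eax are hypothetical.

```c
#include <stdint.h>

/* Emulate "or ah, 40h" then "and ah, 0FDh" on a value held in EAX.
   AH is bits 8..15 of EAX. (Hypothetical helper for illustration.) */
uint32_t via_ah(uint32_t eax)
{
    uint32_t ah = (eax >> 8) & 0xFFu;          /* extract AH */
    ah |= 0x40u;                               /* or  ah, 40h  */
    ah &= 0xFDu;                               /* and ah, 0FDh */
    return (eax & ~0x0000FF00u) | (ah << 8);   /* write AH back; other bits untouched */
}

/* The same operation expressed on the whole 32-bit register. */
uint32_t via_eax(uint32_t eax)
{
    eax |= 0x40u << 8;       /* 0x4000: set bit 14 */
    eax &= ~(0x02u << 8);    /* 0FDh clears bit 1 of AH, i.e. 0x200 (bit 9) of EAX */
    return eax;
}
```

For example, both helpers map 0x12345678 to 0x12345478.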
N.B. The accumulator of the 16-bit 8086 CPU was named AX and consisted of two 8-bit halves: AL (lower byte) and AH (higher byte).
In the 80386, almost all registers were extended to 32 bits and the accumulator was renamed EAX, but for the sake of compatibility
its older parts may still be accessed as AX/AH/AL.
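The aliasing between RAX, EAX, AX, AH and AL can be modelled with shifts and masks. In this minimal sketch a uint64_t stands in for RAX, and the accessor names (reg_eax, reg_ax, reg_ah, reg_al, write_ah) are hypothetical; only a write to AH is modelled, which leaves all other bits of the register unchanged.

```c
#include <stdint.h>

/* Hypothetical read accessors: each narrower name views part of RAX. */
uint32_t reg_eax(uint64_t rax) { return (uint32_t)rax; }          /* bits 0..31 */
uint16_t reg_ax(uint64_t rax)  { return (uint16_t)rax; }          /* bits 0..15 */
uint8_t  reg_ah(uint64_t rax)  { return (uint8_t)(rax >> 8); }    /* bits 8..15 */
uint8_t  reg_al(uint64_t rax)  { return (uint8_t)rax; }           /* bits 0..7  */

/* Writing AH replaces bits 8..15 only; every other bit is preserved. */
uint64_t write_ah(uint64_t rax, uint8_t value)
{
    return (rax & ~(uint64_t)0xFF00u) | ((uint64_t)value << 8);
}
```

With RAX = 0x1122334455667788, the sketch yields EAX = 0x55667788, AX = 0x7788, AH = 0x77 and AL = 0x88, and write_ah(rax, 0xAB) gives 0x112233445566AB88.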
Since all x86 CPUs are successors of the 16-bit 8086 CPU, these older opcodes working on the 8-bit register halves are shorter than the newer 32-bit
ones. That’s why the or ah, 40h instruction occupies only 3 bytes (opcode, ModR/M byte and a one-byte immediate). It would be more logical to emit
or eax, 04000h here, but that takes 5 bytes, or even 6 if the register in the first operand is not EAX.
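The byte counts can be checked against the machine code. The sketch below lists the encodings as we understand them from the standard x86 opcode tables (80 /1 ib for the 8-bit form, 0D id for the EAX-only short form, 81 /1 id for other 32-bit registers); ECX is an arbitrary, hypothetical choice of a non-EAX register, and a disassembler remains the authoritative check. It also decodes the little-endian immediate to show that both 32-bit forms carry the same 4000h mask that 40h in AH represents.

```c
#include <stdint.h>

/* Machine-code bytes for the three alternatives (32-bit mode). */
const uint8_t or_ah_40h[]    = { 0x80, 0xCC, 0x40 };                   /* or ah, 40h:    80 /1, ModR/M, imm8 */
const uint8_t or_eax_4000h[] = { 0x0D, 0x00, 0x40, 0x00, 0x00 };       /* or eax, 4000h: short EAX form, imm32 */
const uint8_t or_ecx_4000h[] = { 0x81, 0xC9, 0x00, 0x40, 0x00, 0x00 }; /* or ecx, 4000h: 81 /1, ModR/M, imm32 */

/* Read a little-endian 32-bit immediate starting at p. */
uint32_t imm32_le(const uint8_t *p)
{
    return (uint32_t)p[0] | ((uint32_t)p[1] << 8) |
           ((uint32_t)p[2] << 16) | ((uint32_t)p[3] << 24);
}
```

The one-byte immediate 40h of the first form, shifted into the AH position, equals the 4000h carried by the two longer forms.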
Optimizing GCC and regparm
It is even shorter if we turn on the -O3 optimization flag and also set regparm=3.
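The C source for f is not reproduced in this excerpt; the following is an assumed equivalent, written so that it compiles everywhere but only requests the regparm(3) convention on 32-bit x86 with a GCC-compatible compiler. Under that convention the first three integer arguments are passed in EAX, EDX and ECX, which is why the argument is already in EAX in Listing 19.14. The macro name REGPARM3 is hypothetical; GCC also offers the -mregparm=3 command-line option to apply the convention to all functions.

```c
#include <stdint.h>

/* regparm is a GCC attribute for 32-bit x86 only; elsewhere the macro
   expands to nothing so the sketch still compiles. */
#if defined(__GNUC__) && defined(__i386__)
#define REGPARM3 __attribute__((regparm(3)))
#else
#define REGPARM3
#endif

/* With regparm(3), 'a' arrives in EAX instead of on the stack. */
REGPARM3 uint32_t f(uint32_t a)
{
    a |= 0x4000u;    /* set bit 14   */
    a &= ~0x200u;    /* clear bit 9  */
    return a;
}
```

To get 32-bit code comparable to the listing, this would be compiled for i386, for example with gcc -m32 -O3.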
Listing 19.14: Optimizing GCC
public f
f proc near
push ebp
or ah, 40h
mov ebp, esp
and ah, 0FDh
pop ebp
retn
f endp
Indeed, the first argument is already loaded in EAX, so it is possible to work with it in place. It is worth noting that both the
function prologue (push ebp / mov ebp,esp) and epilogue (pop ebp) could easily be omitted here, but GCC is probably not good
enough at such code size optimizations. However, such short functions are better off as inlined functions (43
on page 481).
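As a sketch of the inlining alternative, here is the same bit manipulation as a static inline helper (the names set14_clear9 and apply_all are hypothetical). An optimizing compiler is free to substitute the body into the caller, in which case no call, prologue or epilogue remains; whether it actually does so is the compiler's decision.

```c
#include <stddef.h>
#include <stdint.h>

/* Short helper: a good candidate for inlining into its callers. */
static inline uint32_t set14_clear9(uint32_t a)
{
    return (a | 0x4000u) & ~0x200u;   /* set bit 14, clear bit 9 */
}

/* Hypothetical caller: applies the helper to every element of an array. */
void apply_all(uint32_t *v, size_t n)
{
    for (size_t i = 0; i < n; i++)
        v[i] = set14_clear9(v[i]);
}
```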
19.2.2 ARM + Optimizing Keil 6/2013 (ARM mode)
Listing 19.15: Optimizing Keil 6/2013 (ARM mode)
02 0C C0 E3 BIC R0, R0, #0x200
01 09 80 E3 ORR R0, R0, #0x4000
1E FF 2F E1 BX LR
BIC (BItwise bit Clear) is an instruction for clearing specific bits. It is just like the AND instruction, but with an inverted
operand, i.e., it is analogous to a NOT+AND instruction pair.
ORR is “logical or”, analogous to OR in x86.
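Keil emits the operations in the opposite order to GCC: it clears bit 9 first and then sets bit 14. The following C sketch models BIC and ORR (the function names are hypothetical) and shows that both orders give the same result, because the two masks 0x200 and 0x4000 share no bits.

```c
#include <stdint.h>

/* Model of ARM BIC: Rd = Rn AND NOT operand2. */
uint32_t arm_bic(uint32_t rn, uint32_t op2) { return rn & ~op2; }

/* Model of ARM ORR: Rd = Rn OR operand2. */
uint32_t arm_orr(uint32_t rn, uint32_t op2) { return rn | op2; }

/* Keil's order: clear bit 9 first, then set bit 14. */
uint32_t f_keil(uint32_t r0)
{
    r0 = arm_bic(r0, 0x200u);
    r0 = arm_orr(r0, 0x4000u);
    return r0;
}

/* GCC's x86 order: set first, then clear. */
uint32_t f_gcc(uint32_t eax)
{
    return (eax | 0x4000u) & ~0x200u;
}
```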
So far it’s easy.
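The machine code in Listing 19.15 also shows how the constants fit into a single 32-bit ARM instruction: in ARM mode, a data-processing immediate is an 8-bit value rotated right by twice a 4-bit rotate field. The sketch below, based on our reading of the ARM-mode encoding and using hypothetical helper names, decodes the two instructions from the listing: 02h rotated right by 24 gives 200h, and 01h rotated right by 18 gives 4000h. Bits 21 through 24 hold the operation code (0xE for BIC, 0xC for ORR).

```c
#include <stdint.h>

/* Assemble an ARM instruction word from its bytes as listed (little-endian). */
uint32_t arm_word(const uint8_t b[4])
{
    return (uint32_t)b[0] | ((uint32_t)b[1] << 8) |
           ((uint32_t)b[2] << 16) | ((uint32_t)b[3] << 24);
}

/* Immediate operand of an ARM-mode data-processing instruction:
   imm8 (bits 0..7) rotated right by 2 * rotate (bits 8..11). */
uint32_t arm_dp_imm(uint32_t insn)
{
    uint32_t imm8 = insn & 0xFFu;
    uint32_t rot  = ((insn >> 8) & 0xFu) * 2u;
    return rot == 0 ? imm8 : (imm8 >> rot) | (imm8 << (32u - rot));
}

/* Operation code field (bits 21..24). */
uint32_t arm_dp_opcode(uint32_t insn)
{
    return (insn >> 21) & 0xFu;
}
```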