1.22. MANIPULATING SPECIFIC BIT(S)
leave
retn
f endp
There is some redundant code here, but it is still shorter than the MSVC version without optimization.
Now let’s try GCC with optimization turned on (-O3):
Optimizing GCC
Listing 1.274: Optimizing GCC
                public f
f               proc near
arg_0           = dword ptr  8
                push    ebp
                mov     ebp, esp
                mov     eax, [ebp+arg_0]   ; load the argument into EAX
                pop     ebp
                or      ah, 40h            ; set bit 14 of EAX (4000h)
                and     ah, 0FDh           ; clear bit 9 of EAX (200h)
                retn
f               endp
That’s shorter. It is worth noting that the compiler works on part of the EAX register via the AH register,
that is, bits 8 through 15 of EAX inclusive.
Byte number:  7th  6th  5th  4th  3rd  2nd  1st  0th
RAX (x64)     [------------------------------------]
EAX                               [----------------]
AX                                          [------]
                                            AH   AL
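The layout above can be checked with a small C model. This is only a sketch: the names f_whole, f_via_ah and forms_agree are hypothetical, and the AH register is emulated by extracting byte 1 of a 32-bit value rather than by touching a real register.

```c
#include <assert.h>
#include <stdint.h>

/* Whole-register form: set bit 14 (0x4000), clear bit 9 (0x200). */
uint32_t f_whole(uint32_t a)
{
    return (a | 0x4000u) & ~0x200u;
}

/* AH form: touch only byte 1 (bits 8..15), as the optimized listing does. */
uint32_t f_via_ah(uint32_t a)
{
    uint8_t ah = (uint8_t)(a >> 8);      /* emulate reading AH   */
    ah |= 0x40;                          /* or  ah, 40h          */
    ah &= 0xFD;                          /* and ah, 0FDh         */
    return (a & 0xFFFF00FFu) | ((uint32_t)ah << 8);
}

/* Returns 1 if both forms agree on a few sample inputs. */
int forms_agree(void)
{
    const uint32_t samples[] = { 0u, 0x200u, 0x4000u, 0x12345678u, 0xFFFFFFFFu };
    for (unsigned i = 0; i < sizeof samples / sizeof samples[0]; i++) {
        assert(f_whole(samples[i]) == f_via_ah(samples[i]));
    }
    return 1;
}
```

Both constants fit inside bits 8..15, which is why a byte-sized operation on AH is enough: 40h in AH corresponds to 4000h in EAX, and the mask 0FDh clears the bit that corresponds to 200h.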
N.B. The 16-bit CPU 8086 accumulator was named AX and consisted of two 8-bit halves: AL (lower byte)
and AH (higher byte). In 80386 almost all registers were extended to 32-bit, the accumulator was named
EAX, but for the sake of compatibility, its older parts may be still accessed as AX/AH/AL.
Since all x86 CPUs are successors of the 16-bit 8086 CPU, these older opcodes, which work on the 8-bit
and 16-bit register parts, are shorter than the newer 32-bit ones. That’s why the or ah, 40h instruction
occupies only 3 bytes. It would be more logical to emit or eax, 04000h here, but that is 5 bytes, or
even 6 (in case the register in the first operand is not EAX).
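The byte counts quoted above can be made concrete by writing out the encodings. The sketch below lists what are, to the best of my knowledge, the standard 32-bit mode encodings of each instruction; the array names are hypothetical, and ECX is picked only as an example of a register other than EAX.

```c
#include <assert.h>

/* or ah, 40h: opcode 80 /1 with an 8-bit immediate */
const unsigned char or_ah_imm8[]   = { 0x80, 0xCC, 0x40 };

/* or eax, 4000h: short accumulator form 0D with a 32-bit immediate */
const unsigned char or_eax_imm32[] = { 0x0D, 0x00, 0x40, 0x00, 0x00 };

/* or ecx, 4000h: general form 81 /1 with a 32-bit immediate
   (ECX is an arbitrary non-EAX register chosen for illustration) */
const unsigned char or_ecx_imm32[] = { 0x81, 0xC9, 0x00, 0x40, 0x00, 0x00 };

/* Compile-time checks of the sizes mentioned in the text. */
static_assert(sizeof or_ah_imm8   == 3, "or ah, imm8 is 3 bytes");
static_assert(sizeof or_eax_imm32 == 5, "or eax, imm32 is 5 bytes");
static_assert(sizeof or_ecx_imm32 == 6, "or ecx, imm32 is 6 bytes");
```

The saving comes from the immediate: the AH form carries one byte of constant, while the 32-bit forms carry four.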
Optimizing GCC and regparm
The code gets even shorter if we turn on the -O3 optimization flag and also set regparm=3.
Listing 1.275: Optimizing GCC
                public f
f               proc near
                push    ebp
                or      ah, 40h            ; the argument is already in EAX: set bit 14
                mov     ebp, esp
                and     ah, 0FDh           ; clear bit 9
                pop     ebp
                retn
f               endp
Indeed, with regparm the first argument is already loaded in EAX, so it is possible to work with it in place.
It is worth noting that both the function prologue (push ebp / mov ebp,esp) and epilogue (pop ebp) can
easily be omitted here, but GCC probably is not good enough to do such code size optimizations. However,
such short functions are better off being inlined functions (3.11 on page 507).
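For reference, here is a minimal sketch of how such a function might look in C. The macro names SET_BIT and REMOVE_BIT, the REGPARM3 wrapper, and the helpers f_inline and check_f are assumptions made for illustration, since the C source is not shown in this part of the text. The regparm attribute is GCC's per-function counterpart of the -mregparm option and only has meaning on 32-bit x86, so it is guarded by a preprocessor check.

```c
#include <assert.h>

#define SET_BIT(var, bit)    ((var) |= (bit))
#define REMOVE_BIT(var, bit) ((var) &= ~(bit))

/* Pass up to three integer arguments in registers on 32-bit x86 GCC;
   expands to nothing on other targets or compilers. */
#if defined(__GNUC__) && defined(__i386__)
#define REGPARM3 __attribute__((regparm(3)))
#else
#define REGPARM3
#endif

/* Set bit 14 (0x4000) and clear bit 9 (0x200) of the argument. */
REGPARM3 int f(int a)
{
    int rt = a;
    SET_BIT(rt, 0x4000);
    REMOVE_BIT(rt, 0x200);
    return rt;
}

/* Inline variant: lets the compiler drop the call, prologue and epilogue. */
static inline int f_inline(int a)
{
    return (a | 0x4000) & ~0x200;
}

/* Small self-check that both variants agree. */
void check_f(void)
{
    assert(f(0) == 0x4000);
    assert(f(0x200) == f_inline(0x200));
}
```

With the inline variant, the whole body typically reduces to the two bit operations at the call site, which is the outcome the paragraph above recommends for functions this short.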