1.28. 64-BIT VALUES IN 32-BIT ENVIRONMENT
var_10 = -0x10
var_4 = -4
lui $gp, (__gnu_local_gp >> 16)
addiu $sp, -0x20
la $gp, (__gnu_local_gp & 0xFFFF)
sw $ra, 0x20+var_4($sp)
sw $gp, 0x20+var_10($sp)
lw $t9, (__umoddi3 & 0xFFFF)($gp)
or $at, $zero
jalr $t9
or $at, $zero
lw $ra, 0x20+var_4($sp)
or $at, $zero
jr $ra
addiu $sp, 0x20
There are a lot of NOPs, probably delay slots filled after the multiplication instruction (it’s slower than
other instructions, after all).
1.28.4 Shifting right
#include <stdint.h>
uint64_t f (uint64_t a)
{
    return a >> 7;
}
x86
Listing 1.382: Optimizing MSVC 2012 /Ob1
_a$ = 8 ; size = 8
_f PROC
mov eax, DWORD PTR _a$[esp-4]
mov edx, DWORD PTR _a$[esp]
shrd eax, edx, 7
shr edx, 7
ret 0
_f ENDP
Listing 1.383: Optimizing GCC 4.8.1 -fno-inline
_f:
mov edx, DWORD PTR [esp+8]
mov eax, DWORD PTR [esp+4]
shrd eax, edx, 7
shr edx, 7
ret
Shifting also occurs in two passes: first the lower part is shifted, then the higher part. The lower part
is shifted with the SHRD instruction: it shifts the value of EAX by 7 bits, but pulls the new bits from
EDX, i.e., from the higher part. In other words, the 64-bit value in the EDX:EAX register pair is shifted
by 7 bits as a whole, and the lowest 32 bits of the result are placed into EAX. The higher part is shifted
using the much more popular SHR instruction: indeed, the freed bits in the higher part must be filled with zeros.
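The two passes can be modelled in plain C using only 32-bit operations. This is a minimal sketch, not compiler output: the helper names shrd32() and shr7_by_halves() are made up for illustration, and shrd32() only imitates SHRD for shift counts from 1 to 31.

```c
#include <stdint.h>

/* Hypothetical helper imitating SHRD lo, hi, n (valid for 0 < n < 32):
   shift lo right by n and fill the freed top bits from the low bits of hi. */
static uint32_t shrd32(uint32_t lo, uint32_t hi, unsigned n)
{
    return (lo >> n) | (hi << (32 - n));
}

/* Shift a 64-bit value right by 7 the way the listings above do it. */
uint64_t shr7_by_halves(uint64_t a)
{
    uint32_t lo = (uint32_t)a;          /* plays the role of EAX */
    uint32_t hi = (uint32_t)(a >> 32);  /* plays the role of EDX */

    lo = shrd32(lo, hi, 7);             /* SHRD EAX, EDX, 7 */
    hi >>= 7;                           /* SHR EDX, 7: freed bits become zero */

    return ((uint64_t)hi << 32) | lo;
}
```

The order matters: the lower part must be shifted first, while the higher part still holds the 7 bits that are to be carried down into it.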
ARM
ARM doesn’t have an instruction like SHRD in x86, so the Keil compiler has to do this using simple shifts
and OR operations: