CHAPTER 24. 64-BIT VALUES IN 32-BIT ENVIRONMENT
jalr $t9
or $at, $zero
lw $ra, 0x20+var_4($sp)
or $at, $zero
jr $ra
addiu $sp, 0x20
There are a lot of NOPs, probably delay slots filled after the multiplication instruction (it’s slower than other instructions,
after all).
24.4 Shifting right
#include <stdint.h>

uint64_t f (uint64_t a)
{
        return a>>7;
};
24.4.1 x86
Listing 24.14: Optimizing MSVC 2012 /Ob1
_a$ = 8 ; size = 8
_f PROC
mov eax, DWORD PTR _a$[esp-4]
mov edx, DWORD PTR _a$[esp]
shrd eax, edx, 7
shr edx, 7
ret 0
_f ENDP
Listing 24.15: Optimizing GCC 4.8.1 -fno-inline
_f:
mov edx, DWORD PTR [esp+8]
mov eax, DWORD PTR [esp+4]
shrd eax, edx, 7
shr edx, 7
ret
Shifting also occurs in two passes: first the lower part is shifted, then the higher part. The lower part is shifted with the
help of the SHRD instruction: it shifts the value of EAX right by 7 bits, but pulls the new high bits from EDX, i.e., from the
higher part. The higher part is shifted using the more common SHR instruction: indeed, the freed bits in the higher part must
be filled with zeroes.
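The two-pass shift can be modelled in C to make the data flow between the halves explicit. This is only a sketch: the names shrd32 and shr64_by_7 are hypothetical, and shrd32 covers only shift counts from 1 to 31, which is a narrower range than the real instruction accepts.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of x86 SHRD: shift dst right by count bits and fill
   the vacated top bits with the low bits of src.
   Assumption: only 0 < count < 32 is handled here. */
uint32_t shrd32(uint32_t dst, uint32_t src, unsigned count)
{
    assert(count > 0 && count < 32);
    return (dst >> count) | (src << (32 - count));
}

/* Mirrors the compiler output: eax holds the low half, edx the high half. */
uint64_t shr64_by_7(uint32_t eax, uint32_t edx)
{
    eax = shrd32(eax, edx, 7);   /* shrd eax, edx, 7 */
    edx >>= 7;                   /* shr  edx, 7      */
    return ((uint64_t)edx << 32) | eax;
}
```

For any 64-bit a, calling shr64_by_7 with the low and high 32-bit halves of a should give the same result as a>>7.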
24.4.2 ARM
ARM has no instruction like SHRD in x86, so the Keil compiler has to do this using simple shifts and OR operations:
Listing 24.16: Optimizing Keil 6/2013 (ARM mode)
||f|| PROC
LSR r0,r0,#7
ORR r0,r0,r1,LSL #25
LSR r1,r1,#7
BX lr
ENDP
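The three ARM-mode instructions above can be traced one by one in C. The constant 25 is 32-7: shifting the higher part left by 25 bits leaves its 7 lowest bits in the top 7 positions of a 32-bit word, exactly where the shifted lower part has zeroes. The function names below are hypothetical and the code is a sketch for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* r0 = lower half, r1 = higher half, as in Listing 24.16. */
uint64_t shr7_arm_style(uint32_t r0, uint32_t r1)
{
    r0 = r0 >> 7;           /* LSR r0,r0,#7         */
    r0 = r0 | (r1 << 25);   /* ORR r0,r0,r1,LSL #25 */
    r1 = r1 >> 7;           /* LSR r1,r1,#7         */
    return ((uint64_t)r1 << 32) | r0;
}

/* Compare the traced sequence against the native 64-bit shift. */
void check_shr7(uint64_t a)
{
    assert(shr7_arm_style((uint32_t)a, (uint32_t)(a >> 32)) == (a >> 7));
}
```

The order matters: r1 must be read by the ORR before the final LSR overwrites it, otherwise the 7 bits that cross the halves would already be lost.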
Listing 24.17: Optimizing Keil 6/2013 (Thumb mode)
||f|| PROC
LSLS r2,r1,#25
LSRS r0,r0,#7