CHAPTER 8. ACCESSING PASSED ARGUMENTS
The very first MOV R3, R0 instruction is apparently redundant (a single MLA instruction could be used here instead). The
compiler has not optimized it away, since this is a non-optimizing compilation.
The BX instruction returns control to the address stored in the LR register and, if necessary, switches the processor mode
from Thumb to ARM or vice versa. This can be necessary since, as we can see, function f() does not know what kind of
code it may be called from, ARM or Thumb. Thus, if it gets called from Thumb code, BX not only returns control to the calling
function, but also switches the processor mode to Thumb. It does not switch if the function was called from ARM code [ARM12,
A2.3.2].
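The mode switch performed by BX can be modelled in a few lines of C. This is only a sketch: the struct cpu type, the cpu_mode enum and the bx() function are hypothetical names introduced for illustration, not part of any toolchain. It relies on the architectural rule that bit 0 of the branch target selects the instruction set (1 for Thumb, 0 for ARM) and is not part of the address itself.

```c
#include <stdint.h>

/* Hypothetical model of the CPU state that BX touches. */
enum cpu_mode { MODE_ARM = 0, MODE_THUMB = 1 };

struct cpu {
    uint32_t pc;         /* program counter */
    uint32_t lr;         /* link register: holds the return address */
    enum cpu_mode mode;  /* instruction set currently being executed */
};

/* Model of "BX Rm": bit 0 of the target picks the mode,
   the remaining bits become the new program counter. */
void bx(struct cpu *c, uint32_t target)
{
    c->mode = (target & 1u) ? MODE_THUMB : MODE_ARM;
    c->pc = target & ~1u;
}
```

Calling bx(&c, c.lr) models BX LR. A Thumb caller leaves a return address with bit 0 set in LR, so the same return instruction brings the processor back to whichever mode the caller was running in.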
8.3.2 Optimizing Keil 6/2013 (ARM mode)
.text:00000098 f
.text:00000098 91 20 20 E0 MLA R0, R1, R0, R2
.text:0000009C 1E FF 2F E1 BX LR
And here is the f() function compiled by the Keil compiler in full optimization mode (-O3). The MOV instruction was
optimized out (or reduced) and now MLA uses all input registers and also places the result right into R0, exactly where the
calling function will read and use it.
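What the single MLA instruction computes can be written down as a C sketch. The function names mla() and f_arm() are hypothetical, and unsigned 32-bit arithmetic is used here as an assumption so that the wraparound of the low 32 bits is well defined in C.

```c
#include <stdint.h>

/* Model of "MLA Rd, Rm, Rs, Rn": Rd = Rm * Rs + Rn, low 32 bits only. */
uint32_t mla(uint32_t rm, uint32_t rs, uint32_t rn)
{
    return rm * rs + rn;  /* unsigned arithmetic wraps modulo 2^32 */
}

/* f() as the optimized ARM-mode code computes it:
   MLA R0, R1, R0, R2  ->  R0 = R1 * R0 + R2 */
uint32_t f_arm(uint32_t r0, uint32_t r1, uint32_t r2)
{
    return mla(r1, r0, r2);
}
```

With the arguments 1, 2 and 3 in R0, R1 and R2, f_arm() returns 5, and the value is already in the register the caller reads.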
8.3.3 Optimizing Keil 6/2013 (Thumb mode)
.text:0000005E 48 43 MULS R0, R1
.text:00000060 80 18 ADDS R0, R0, R2
.text:00000062 70 47 BX LR
The MLA instruction is not available in Thumb mode, so the compiler generates code that performs these two operations
(multiplication and addition) separately. First the MULS instruction multiplies R0 by R1, leaving the result in register R0. The
second instruction (ADDS) adds R2 to that result, leaving the sum in register R0.
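The two-instruction Thumb sequence can be traced register by register with a small C sketch. The struct regs type and the f_thumb() function are hypothetical and model only the three argument registers; the condition flags that the S-suffixed instructions also update are not modelled.

```c
#include <stdint.h>

/* Hypothetical register file holding just the three argument registers. */
struct regs {
    uint32_t r0, r1, r2;
};

/* f() as the optimized Thumb-mode code computes it. */
void f_thumb(struct regs *r)
{
    r->r0 = r->r0 * r->r1;  /* MULS R0, R1     : R0 = R0 * R1 */
    r->r0 = r->r0 + r->r2;  /* ADDS R0, R0, R2 : R0 = R0 + R2 */
}
```

Starting from R0=1, R1=2 and R2=3, the sequence leaves 5 in R0 and does not modify R1 or R2.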
8.3.4 ARM64
Optimizing GCC (Linaro) 4.9
Everything here is simple. MADD is just an instruction doing fused multiply/add (similar to the MLA we already saw). All 3
arguments are passed in the 32-bit parts of the X-registers. Indeed, the argument types are 32-bit ints. The result is returned
in W0.
Listing 8.9: Optimizing GCC (Linaro) 4.9
f:
        madd    w0, w0, w1, w2
        ret
main:
; save FP and LR to stack frame:
        stp     x29, x30, [sp, -16]!
        mov     w2, 3
        mov     w1, 2
        add     x29, sp, 0
        mov     w0, 1
        bl      f
        mov     w1, w0
        adrp    x0, .LC7
        add     x0, x0, :lo12:.LC7
        bl      printf
; return 0
        mov     w0, 0
; restore FP and LR
        ldp     x29, x30, [sp], 16
        ret
.LC7:
        .string "%d\n"
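The relationship between the W- and X-registers used in the listing above can be sketched in C. The names xreg, read_w(), write_w() and madd_w() are hypothetical. The sketch assumes the usual AArch64 behaviour: reading a W-register yields the low 32 bits of the corresponding X-register, and writing a W-register clears the upper 32 bits.

```c
#include <stdint.h>

/* Hypothetical model: a 64-bit X register whose low half is the W register. */
typedef uint64_t xreg;

/* Reading Wn: only the low 32 bits of Xn are visible. */
uint32_t read_w(xreg x)
{
    return (uint32_t)x;
}

/* Writing Wn: the value is zero-extended into Xn. */
xreg write_w(uint32_t w)
{
    return (xreg)w;
}

/* Model of "MADD Wd, Wn, Wm, Wa": Wd = Wa + Wn * Wm, low 32 bits only. */
xreg madd_w(xreg xn, xreg xm, xreg xa)
{
    return write_w(read_w(xa) + read_w(xn) * read_w(xm));
}
```

With 1, 2 and 3 as the inputs, as loaded by main above, madd_w() yields 5. Whatever happens to be in the upper halves of the input registers has no effect on the 32-bit result.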
Let’s also extend all data types to the 64-bit uint64_t and test: