1.10. ACCESSING PASSED ARGUMENTS
Performing multiplication and addition in a single instruction (fused multiply–add) is a very useful operation. By the way,
x86 had no such instruction until the FMA instructions appeared in its SIMD^84 extensions.
The very first MOV R3, R0 instruction is apparently redundant (a single MLA instruction could be used
here instead). The compiler has not optimized it away, since this is a non-optimizing compilation.
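The listings in this section are consistent with a C function that multiplies its first two arguments and adds the third. The sketch below restates it, assuming the usual `int f(int a, int b, int c)` signature (the original source is not part of this excerpt). The helper `mla32()` is hypothetical: it models what a single 32-bit multiply-accumulate computes, using unsigned arithmetic so that the product and sum wrap modulo 2^32 the way a hardware register does.

```c
#include <assert.h>
#include <stdint.h>

/* The function under discussion: multiply the first two arguments, add the third. */
int f(int a, int b, int c)
{
    return a * b + c;
}

/* Hypothetical model of a 32-bit multiply-accumulate (MLA-style):
   unsigned arithmetic wraps modulo 2^32, just like a 32-bit register,
   and avoids C's undefined behaviour on signed overflow. */
uint32_t mla32(uint32_t x, uint32_t y, uint32_t acc)
{
    return x * y + acc;
}

void check_f(void)
{
    assert(f(1, 2, 3) == 5);
    /* 0xFFFFFFFF * 2 = 0x1FFFFFFFE -> low 32 bits 0xFFFFFFFE; + 3 wraps to 1 */
    assert(mla32(0xFFFFFFFFu, 2u, 3u) == 1u);
}
```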
The BX instruction returns control to the address stored in the LR register and, if necessary, switches
the processor mode from Thumb to ARM or vice versa. This can be necessary since, as we can see,
function f() does not know what kind of code it may be called from, ARM or Thumb. Thus, if it is called
from Thumb code, BX not only returns control to the calling function, but also switches the processor
mode to Thumb. If the function has been called from ARM code, the mode is left unchanged [ARM(R) Architecture
Reference Manual, ARMv7-A and ARMv7-R edition, (2012), A2.3.2].
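How does BX know which mode to switch to? In ARMv7, bit 0 of the target register selects the instruction set (1 = Thumb, 0 = ARM), and the remaining bits form the branch address. A BL executed in Thumb state writes the return address to LR with bit 0 set, so a later BX LR lands back in Thumb state. The sketch below models only this interworking rule; `struct bx_result`, `bx()` and the sample addresses are hypothetical, and the alignment rules the manual imposes on ARM-state targets are ignored.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of the state change performed by BX <reg>. */
struct bx_result {
    uint32_t pc;   /* address execution continues at */
    int thumb;     /* 1 = Thumb state, 0 = ARM state */
};

struct bx_result bx(uint32_t target)
{
    struct bx_result r;
    r.thumb = (int)(target & 1u);  /* bit 0 selects the instruction set */
    r.pc = target & ~1u;           /* bit 0 is not part of the address */
    return r;
}

void check_bx(void)
{
    /* Called from Thumb code: LR has bit 0 set, so BX LR returns to Thumb. */
    struct bx_result from_thumb = bx(0x00000061u);
    assert(from_thumb.thumb == 1 && from_thumb.pc == 0x00000060u);

    /* Called from ARM code: bit 0 is clear, so the mode stays ARM. */
    struct bx_result from_arm = bx(0x000000A0u);
    assert(from_arm.thumb == 0 && from_arm.pc == 0x000000A0u);
}
```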
Optimizing Keil 6/2013 (ARM mode)
.text:00000098 f
.text:00000098 91 20 20 E0 MLA R0, R1, R0, R2
.text:0000009C 1E FF 2F E1 BX LR
And here is the f() function compiled by the Keil compiler in full optimization mode (-O3).
The MOV instruction was optimized out (or reduced), and now MLA uses all input registers directly and also places
the result right into R0, exactly where the calling function will read and use it.
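To see why no extra moves are needed, it helps to follow the registers: under the ARM calling convention the first three arguments arrive in R0, R1 and R2, and the return value is expected in R0. The sketch below is a tiny register-file model of the optimized listing; `regs_t`, `mla()` and `call_f_arm()` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical register file: only R0..R3 matter for this example. */
typedef struct { uint32_t r[4]; } regs_t;

/* MLA Rd, Rn, Rm, Ra: Rd = Rn * Rm + Ra (low 32 bits kept).
   The right-hand side is evaluated before Rd is written. */
static void mla(regs_t *s, int rd, int rn, int rm, int ra)
{
    s->r[rd] = s->r[rn] * s->r[rm] + s->r[ra];
}

/* Optimized f(): arguments in R0, R1, R2; result left in R0. */
uint32_t call_f_arm(uint32_t a, uint32_t b, uint32_t c)
{
    regs_t s = { { a, b, c, 0 } };
    mla(&s, 0, 1, 0, 2);   /* MLA R0, R1, R0, R2 */
    return s.r[0];         /* BX LR: the caller reads R0 */
}

void check_arm(void)
{
    assert(call_f_arm(1, 2, 3) == 5);
}
```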
Optimizing Keil 6/2013 (Thumb mode)
.text:0000005E 48 43 MULS R0, R1
.text:00000060 80 18 ADDS R0, R0, R2
.text:00000062 70 47 BX LR
The MLA instruction is not available in 16-bit Thumb mode, so the compiler generates code that performs these two
operations (multiplication and addition) separately.
First the MULS instruction multiplies R0 by R1, leaving the result in register R0. The second instruction
(ADDS) adds R2 to that result, again leaving the sum in register R0.
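Splitting the operation costs nothing in precision here: for integers, multiplying and then adding in two steps yields the same 32-bit result as a single multiply-accumulate, because integer arithmetic has no intermediate rounding. The sketch below checks that on a few edge values, including ones where the product wraps; `call_f_thumb()` and `mla_once()` are hypothetical helpers modelling the two code sequences.

```c
#include <assert.h>
#include <stdint.h>

/* Thumb version of f(): no MLA available, so two instructions are used. */
uint32_t call_f_thumb(uint32_t r0, uint32_t r1, uint32_t r2)
{
    r0 = r0 * r1;   /* MULS R0, R1     : R0 = R0 * R1 */
    r0 = r0 + r2;   /* ADDS R0, R0, R2 : R0 = R0 + R2 */
    return r0;      /* BX LR           : result stays in R0 */
}

/* One-instruction multiply-accumulate, for comparison. */
uint32_t mla_once(uint32_t a, uint32_t b, uint32_t c)
{
    return a * b + c;
}

void check_thumb(void)
{
    /* Both sequences agree modulo 2^32, even when the product overflows. */
    const uint32_t samples[] = { 0u, 1u, 2u, 3u, 0x7FFFFFFFu, 0xFFFFFFFFu };
    const int n = (int)(sizeof samples / sizeof samples[0]);
    for (int i = 0; i < n; i++)
        for (int j = 0; j < n; j++)
            for (int k = 0; k < n; k++)
                assert(call_f_thumb(samples[i], samples[j], samples[k]) ==
                       mla_once(samples[i], samples[j], samples[k]));
}
```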
ARM64
Optimizing GCC (Linaro) 4.9
Everything here is simple. MADD is just an instruction doing a fused multiply/add (similar to the MLA we
already saw). All 3 arguments are passed in the 32-bit parts of X-registers. Indeed, the argument types
are 32-bit ints. The result is returned in W0.
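Using the 32-bit W views has two consequences worth spelling out: only the low halves of the source X registers are read (as far as we know, the calling convention does not require the caller to clear the upper halves of 32-bit arguments), and writing a W register zero-extends the result into the full X register. The sketch below models `madd w0, w0, w1, w2` under those assumptions; `madd_w()` and the sample register contents are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of MADD Wd, Wn, Wm, Wa acting on 64-bit X registers. */
uint64_t madd_w(uint64_t xn, uint64_t xm, uint64_t xa)
{
    uint32_t wn = (uint32_t)xn;   /* Wn = low 32 bits of Xn */
    uint32_t wm = (uint32_t)xm;
    uint32_t wa = (uint32_t)xa;
    uint32_t wd = wn * wm + wa;   /* 32-bit multiply-add, wraps modulo 2^32 */
    return (uint64_t)wd;          /* upper 32 bits of Xd become zero */
}

void check_madd(void)
{
    /* f(1, 2, 3): W0 = 1 * 2 + 3 */
    assert(madd_w(1, 2, 3) == 5);
    /* Garbage in the upper halves of the sources is ignored. */
    assert(madd_w(0xDEADBEEF00000001ull, 0x1234567800000002ull, 3) == 5);
}
```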
Listing 1.94: Optimizing GCC (Linaro) 4.9
f:
madd w0, w0, w1, w2
ret
main:
; save FP and LR to stack frame:
stp x29, x30, [sp, -16]!
mov w2, 3
mov w1, 2
add x29, sp, 0
mov w0, 1
bl f
mov w1, w0
adrp x0, .LC7
add x0, x0, :lo12:.LC7
bl printf
; return 0
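Read back into C, the register setup in main() amounts to calling f(1, 2, 3) and printing the result: W0, W1 and W2 receive the arguments before `bl f`, the result comes back in W0, and `mov w1, w0` moves it into printf()'s second argument while X0 is pointed at the format string. The sketch below assumes that string is `"%d\n"` (the contents of .LC7 are not shown here); `main_equivalent()` is a hypothetical name.

```c
#include <assert.h>
#include <stdio.h>

int f(int a, int b, int c)
{
    return a * b + c;              /* madd w0, w0, w1, w2 */
}

/* Hypothetical C counterpart of the listed main(). */
int main_equivalent(void)
{
    int result = f(1, 2, 3);       /* W0 = 1, W1 = 2, W2 = 3; result in W0 */
    printf("%d\n", result);        /* W1 = result, X0 = format string (assumed) */
    return 0;                      /* "; return 0" */
}

void check_main(void)
{
    assert(f(1, 2, 3) == 5);
}
```

Since 1 * 2 + 3 = 5, the program prints 5.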
(^84) Wikipedia