1.19. FLOATING-POINT UNIT
So, we see some new registers used here, with the D prefix.
These are 64-bit registers; there are 32 of them, and they can be used both for floating-point numbers
(double) and for SIMD (which is called NEON on ARM).
There are also 32 32-bit S-registers, intended to be used for single-precision floating-point numbers
(float).
It is easy to memorize: D-registers are for double-precision numbers, while S-registers are for single-precision
numbers. More about it: .2.3 on page 1040.
Both constants (3.14 and 4.1) are stored in memory in IEEE 754 format.
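To make the IEEE 754 storage concrete, here is a minimal C sketch that exposes the raw bit pattern of a double. The helper names double_bits, high_word and low_word are hypothetical and used only for illustration; the sketch assumes that double is a 64-bit IEEE 754 type, which holds on common platforms but is not guaranteed by the C standard. The two 32-bit halves it produces correspond to the pairs of DCD words in the Keil listing in this section.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: return the raw bit pattern of a double.
   Assumes double is a 64-bit IEEE 754 value. */
uint64_t double_bits(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits); /* well-defined way to reinterpret the bytes */
    return bits;
}

/* Upper 32 bits: sign, exponent and the top of the mantissa. */
uint32_t high_word(double d)
{
    return (uint32_t)(double_bits(d) >> 32);
}

/* Lower 32 bits: the rest of the mantissa. */
uint32_t low_word(double d)
{
    return (uint32_t)(double_bits(d) & 0xFFFFFFFFu);
}
```

For 4.1, the halves come out as 0x40106666 and 0x66666666; for 3.14, as 0x40091EB8 and 0x51EB851F.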
VLDR and VMOV, as can be easily deduced, are analogous to the LDR and MOV instructions, but they work
with D-registers.
It has to be noted that these instructions, just like the D-registers, are intended not only for floating-point
numbers, but can also be used for SIMD (NEON) operations, and this will also be shown soon.
The arguments are passed to the function in the usual way, via the R-registers; however, each
double-precision number is 64 bits wide, so two R-registers are needed to pass each one.
VMOV D17, R0, R1 at the start composes two 32-bit values from R0 and R1 into one 64-bit value and
saves it to D17.
VMOV R0, R1, D16 is the inverse operation: what has been in D16 is split into two registers, R0 and R1,
because a double-precision number, which needs 64 bits for storage, is returned in R0 and R1.
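The two VMOV forms can be modeled in C as follows. This is only a sketch: compose_double and split_double are hypothetical names, and the code assumes a little-endian register layout in which R0 carries the low 32 bits and R1 the high 32 bits, which agrees with the constants loaded into R2 and R3 in the Keil listing in this section.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical model of VMOV Dn, R0, R1:
   glue two 32-bit values into one 64-bit double.
   Assumption: r0 holds the low word, r1 the high word. */
double compose_double(uint32_t r0, uint32_t r1)
{
    uint64_t bits = ((uint64_t)r1 << 32) | r0;
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}

/* Hypothetical model of VMOV R0, R1, Dn:
   split a 64-bit double back into two 32-bit halves. */
void split_double(double d, uint32_t *r0, uint32_t *r1)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    *r0 = (uint32_t)bits;         /* low word  */
    *r1 = (uint32_t)(bits >> 32); /* high word */
}
```

Composing 0x66666666 and 0x40106666 gives back 4.1, and splitting a value and composing it again returns the original number unchanged.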
VDIV, VMUL and VADD are instructions for processing floating-point numbers that compute the quotient, product
and sum, respectively.
The code for Thumb-2 is the same.
ARM: Optimizing Keil 6/2013 (Thumb mode)
f
PUSH {R3-R7,LR}
MOVS R7, R2
MOVS R4, R3
MOVS R5, R0
MOVS R6, R1
LDR R2, =0x66666666 ; 4.1
LDR R3, =0x40106666
MOVS R0, R7
MOVS R1, R4
BL __aeabi_dmul
MOVS R7, R0
MOVS R4, R1
LDR R2, =0x51EB851F ; 3.14
LDR R3, =0x40091EB8
MOVS R0, R5
MOVS R1, R6
BL __aeabi_ddiv
MOVS R2, R7
MOVS R3, R4
BL __aeabi_dadd
POP {R3-R7,PC}
; 4.1 in IEEE 754 form:
dword_364 DCD 0x66666666 ; DATA XREF: f+A
dword_368 DCD 0x40106666 ; DATA XREF: f+C
; 3.14 in IEEE 754 form:
dword_36C DCD 0x51EB851F ; DATA XREF: f+1A
dword_370 DCD 0x40091EB8 ; DATA XREF: f+1C
Keil generated code for a processor without FPU or NEON support.
The double-precision floating-point numbers are passed via generic R-registers, and instead of FPU instructions,
service library functions are called
(like __aeabi_dmul, __aeabi_ddiv, __aeabi_dadd) which emulate multiplication, division and addition
for floating-point numbers.
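The order of the three calls in the Keil listing can be written out in C roughly as below. This is a sketch, not the source the listing was compiled from: soft_dmul, soft_ddiv and soft_dadd are hypothetical stand-ins for the runtime helpers, implemented here with the native operators so that the example runs anywhere, whereas a real emulation library works on the bit patterns with integer instructions. The name f_softfloat is also an assumption.

```c
/* Hypothetical stand-ins for the library helpers called via BL.
   Real soft-float routines do this with integer arithmetic only. */
static double soft_dmul(double x, double y) { return x * y; }
static double soft_ddiv(double x, double y) { return x / y; }
static double soft_dadd(double x, double y) { return x + y; }

/* Mirrors the call order in the listing:
   the product is computed first and saved,
   then the quotient, then both are added. */
double f_softfloat(double a, double b)
{
    double product  = soft_dmul(b, 4.1);  /* first BL:  b * 4.1  */
    double quotient = soft_ddiv(a, 3.14); /* second BL: a / 3.14 */
    return soft_dadd(quotient, product);  /* third BL:  sum      */
}
```

Because each helper takes two doubles and returns one, every call needs four R-registers for its arguments and two for its result, which is why the listing shuffles values between R0-R3 and R4-R7 around each BL.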