Assembly Language for Beginners


1.19. FLOATING-POINT UNIT


Of course, that is slower than an FPU coprocessor, but it’s still better than nothing.


By the way, similar FPU-emulating libraries were very popular in the x86 world when coprocessors were
rare and expensive, and were installed only on expensive computers.


The FPU-coprocessor emulation is called soft float or armel (emulation) in the ARM world, while using the
coprocessor’s FPU instructions is called hard float or armhf.
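
Which of the two variants a 32-bit ARM build targets can be checked at compile time. The following is only a minimal sketch, assuming GCC or Clang: the macros __ARM_PCS_VFP and __SOFTFP__ are predefined by those compilers and are not mentioned in the text above, and the helper name float_abi_name is hypothetical.

```c
/* Hypothetical helper: names the floating-point ABI this file was
   compiled for. It relies on macros predefined by GCC/Clang (an
   assumption; other compilers may not define them). */
const char *float_abi_name(void)
{
#if defined(__ARM_PCS_VFP)
    /* FP values are passed in FPU registers: hard float (armhf). */
    return "hard float";
#elif defined(__SOFTFP__)
    /* FP arithmetic is done by library routines: soft float (armel). */
    return "soft float";
#elif defined(__arm__)
    /* FPU instructions are used, but with the soft-float calling convention. */
    return "softfp";
#else
    /* Any other target, e.g. x86 or ARM64. */
    return "not a 32-bit ARM target";
#endif
}
```

Printing the returned string from a build made with -mfloat-abi=soft and again with -mfloat-abi=hard should show the difference, provided the toolchain supports both.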


ARM64: Optimizing GCC (Linaro) 4.9


Very compact code:


Listing 1.204: Optimizing GCC (Linaro) 4.9

f:
; D0 = a, D1 = b
ldr d2, .LC25 ; 3.14
; D2 = 3.14
fdiv d0, d0, d2
; D0 = D0/D2 = a/3.14
ldr d2, .LC26 ; 4.1
; D2 = 4.1
fmadd d0, d1, d2, d0
; D0 = D1*D2+D0 = b*4.1+a/3.14
ret


; constants in IEEE 754 format:
.LC25:
.word 1374389535 ; 3.14
.word 1074339512
.LC26:
.word 1717986918 ; 4.1
.word 1074816614
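
Read back into C, the listing above amounts to roughly the following. This is a sketch reconstructed from the comments in the listing, not necessarily the exact source that was compiled. The helper double_from_words is hypothetical and only shows how each pair of .word values forms one 64-bit constant; it assumes the host stores double as IEEE 754 binary64 with the same byte order for integers and doubles.

```c
#include <stdint.h>
#include <string.h>

/* The computation described by the listing's comments: a/3.14 + b*4.1.
   FMADD does the multiplication and the addition in one instruction. */
double f(double a, double b)
{
    return a / 3.14 + b * 4.1;
}

/* Hypothetical helper: rebuild a double from the two 32-bit .word values
   of a constant. The first .word is the low half, the second the high half. */
double double_from_words(uint32_t low, uint32_t high)
{
    uint64_t bits = ((uint64_t)high << 32) | low;
    double value;
    memcpy(&value, &bits, sizeof value); /* copy the bits, no conversion */
    return value;
}
```

For example, double_from_words(1374389535u, 1074339512u) rebuilds the bit pattern 0x40091EB851EB851F, which is 3.14, and double_from_words(1717986918u, 1074816614u) rebuilds 0x4010666666666666, which is 4.1.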


ARM64: Non-optimizing GCC (Linaro) 4.9


Listing 1.205: Non-optimizing GCC (Linaro) 4.9

f:
sub sp, sp, #16
str d0, [sp,8] ; save "a" in Register Save Area
str d1, [sp] ; save "b" in Register Save Area
ldr x1, [sp,8]
; X1 = a
ldr x0, .LC25
; X0 = 3.14
fmov d0, x1
fmov d1, x0
; D0 = a, D1 = 3.14
fdiv d0, d0, d1
; D0 = D0/D1 = a/3.14


fmov x1, d0
; X1 = a/3.14
ldr x2, [sp]
; X2 = b
ldr x0, .LC26
; X0 = 4.1
fmov d0, x2
; D0 = b
fmov d1, x0
; D1 = 4.1
fmul d0, d0, d1
; D0 = D0*D1 = b*4.1


fmov x0, d0
; X0 = D0 = b*4.1
fmov d0, x1
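
The FMOV instructions in the listing above move values between X-registers and D-registers without changing a single bit; nothing is converted from floating point to integer or back. A minimal C sketch of the same idea follows; the function names are hypothetical, and double is assumed to be a 64-bit IEEE 754 value.

```c
#include <stdint.h>
#include <string.h>

/* Like "fmov x1, d0": copy the 64 bits of a double into an integer as is. */
uint64_t bits_from_double(double d)
{
    uint64_t bits;
    memcpy(&bits, &d, sizeof bits);
    return bits;
}

/* Like "fmov d0, x1": copy 64 integer bits into a double as is. */
double double_from_bits(uint64_t bits)
{
    double d;
    memcpy(&d, &bits, sizeof d);
    return d;
}
```

By contrast, a cast such as (uint64_t)3.14 converts the value and yields 3, whereas bits_from_double(3.14) returns the raw pattern 0x40091EB851EB851F.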
