Assembly Language for Beginners


3.20. PACKING 12-BIT VALUES INTO ARRAY


TST r0,#1
LDR r12,|array|
; R12 = address of array
LSR r0,r0,#1
; R0 = R0>>1 = idx>>1
ADD r0,r0,r0,LSL #1
; R0 = R0+R0<<1 = R0+R0*2 = R0*3 = (idx>>1)*3 = idx/2*3 = idx*1.5
ADD r3,r2,r2,LSL #1
; R3 = R2+R2<<1 = R2+R2*2 = R2*3 = (idx>>1)*3 = idx/2*3 = idx*1.5
ADD r0,r0,r12
; R0 = R0+R12 = idx*1.5 + array
; jump if idx is even:
BEQ |L0.56|
; idx is odd, go on:
; load middle byte at R0+1:
LDRB r3,[r0,#1]
; R3 = middle byte
AND r3,r3,#0xf0
; R3 = R3&0xF0 = middle_byte&0xF0
ORR r2,r3,r1,LSR #8
; R2 = R3 | R1>>8 = middle_byte&0xF0 | val>>8
; store middle_byte&0xF0 | val>>8 at R0+1 (at the place of middle byte):
STRB r2,[r0,#1]
; store low 8 bits of val (or val&0xFF) at R0+2 (at the place of right byte):
STRB r1,[r0,#2]
BX lr
|L0.56|
; idx is even, go on:
LSR r2,r1,#4
; R2 = R1>>4 = val>>4
; store val>>4 at R12+R3 or array + idx*1.5 (place of left byte):
STRB r2,[r12,r3]
; load byte at R0+1 (middle byte):
LDRB r2,[r0,#1]
; R2 = middle_byte
; drop high 4 bits of middle byte:
AND r2,r2,#0xf
; R2 = R2&0xF = middle_byte&0xF
; update middle byte:
ORR r1,r2,r1,LSL #4
; R1 = R2 | R1<<4 = middle_byte&0xF | val<<4
; store updated middle byte at R0+1:
STRB r1,[r0,#1]
BX lr
ENDP
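
The two ADD instructions with "LSL #1" in the listing above multiply by 3 using one shift and one add, so that idx>>1 becomes the byte offset of a 3-byte group. The following is a minimal C sketch of that arithmetic only; the helper names group_offset and check_group_offset are hypothetical and do not appear in the original code.

```c
#include <assert.h>
#include <stddef.h>

/* Byte offset of the 3-byte group that holds element idx.
   Hypothetical helper, mirroring the address arithmetic of the listing. */
size_t group_offset(unsigned idx)
{
    size_t half = idx >> 1;        /* LSR r0,r0,#1        : idx>>1          */
    return half + (half << 1);     /* ADD r0,r0,r0,LSL #1 : half + half*2   */
}

/* Both members of a pair (even idx and the following odd idx)
   land in the same 3-byte group. */
void check_group_offset(void)
{
    assert(group_offset(0) == 0 && group_offset(1) == 0);
    assert(group_offset(2) == 3 && group_offset(3) == 3);
    assert(group_offset(7) == 9);  /* 7>>1 = 3, 3*3 = 9 */
}
```

Because the low bit of idx is shifted out before the multiplication, the result is idx*1.5 rounded down to a multiple of 3, which is why the even and odd branches then address the left, middle and right bytes relative to the same base.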


The value of idx*1.5 is calculated twice; this redundancy, produced by the Keil compiler, can be eliminated. You can
also rework the assembly function to make it shorter. Do not forget about tests!
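
As one possible starting point for such a rework, here is a C sketch of the setter that computes the group offset only once, together with a round-trip test. It is an illustration under assumptions, not the original source: the names put12, get12, round_trip_failures and N_VALUES are hypothetical, the array size is chosen arbitrarily, and the getter is inferred from the byte layout the setter writes (left byte, shared middle byte, right byte).

```c
#include <assert.h>
#include <stdint.h>

#define N_VALUES 32u                       /* assumed capacity, for illustration */
static uint8_t array[N_VALUES / 2 * 3];    /* two 12-bit values per 3 bytes */

/* Setter sketch: idx*1.5 (rounded down to a group start) is computed once. */
void put12(unsigned idx, unsigned val)
{
    assert(idx < N_VALUES);
    uint8_t *p = array + (idx >> 1) * 3;   /* start of the 3-byte group */
    val &= 0xFFF;

    if (idx & 1) {
        /* odd: keep high nibble of middle byte, put val>>8 in its low nibble */
        p[1] = (uint8_t)((p[1] & 0xF0) | (val >> 8));
        p[2] = (uint8_t)(val & 0xFF);      /* right byte = low 8 bits */
    } else {
        p[0] = (uint8_t)(val >> 4);        /* left byte = high 8 bits */
        /* even: keep low nibble of middle byte, put val<<4 in its high nibble */
        p[1] = (uint8_t)((p[1] & 0x0F) | ((val << 4) & 0xF0));
    }
}

/* Getter inferred from the layout above (hypothetical counterpart). */
unsigned get12(unsigned idx)
{
    assert(idx < N_VALUES);
    const uint8_t *p = array + (idx >> 1) * 3;
    if (idx & 1)
        return ((unsigned)(p[1] & 0x0F) << 8) | p[2];
    return ((unsigned)p[0] << 4) | (unsigned)(p[1] >> 4);
}

/* Writes a pattern to every slot, reads it back, returns the mismatch count. */
int round_trip_failures(void)
{
    int failures = 0;
    for (unsigned i = 0; i < N_VALUES; i++)
        put12(i, (i * 0x135u) & 0xFFF);
    for (unsigned i = 0; i < N_VALUES; i++)
        if (get12(i) != ((i * 0x135u) & 0xFFF))
            failures++;
    return failures;
}
```

Writing all slots before reading any of them back matters here: neighbouring values share the middle byte, so a test that writes and immediately reads a single slot would not notice a setter that clobbers its neighbour's nibble.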


3.20.9 (32-bit ARM) Comparison of code density in Thumb and ARM modes


Thumb mode was introduced in ARM CPUs to make instructions shorter: 16 bits instead of the 32-bit instructions
of ARM mode. But as we can see, it's hard to say whether it was worth it: the code in ARM mode is always
shorter in instruction count (although each instruction is longer).


3.20.10 Optimizing GCC 4.9.3 for ARM64.


Getter


:
; W0 = idx
0: lsr w2, w0, #1
; W2 = W0>>1 = idx>>1
4: lsl w1, w2, #2
; W1 = W2<<2 = (W0>>1)<<2 = (idx&(~1))<<1