3.20. PACKING 12-BIT VALUES INTO ARRAY
; W3 = W2+1 = idx*1.5+1, i.e., offset of middle byte
6c: add x0, x0, offset of array within page
; X0 = address of array
70: lsr w4, w1, #4
; W4 = W1>>4 = val>>4
74: sxtw x3, w3
; X3 = sign-extended 32-bit W3 (idx*1.5+1) to 64-bit
; sign-extension is needed here because the value will be used as offset within array,
; and negative offsets are possible in standard C/C++
78: ubfiz w1, w1, #4, #4
; W1 = (W1&0xF)<<4 = (val&0xF)<<4, i.e., lowest 4 bits of val moved to bits 4..7, all other bits zeroed
; store W4 (val>>4) at X0+W2 = array + idx*1.5, i.e., address of left byte:
7c: strb w4, [x0,w2,sxtw]
; load middle byte at X0+X3 = array+idx*1.5+1
80: ldrb w2, [x0,x3]
; W2 = middle byte
84: and w2, w2, #0xf
; W2 = W2&0xF = middle_byte&0xF (high 4 bits in middle byte are dropped)
; merge parts of new version of middle byte:
88: orr w1, w2, w1
; W1 = W2|W1 = middle_byte&0xF | (val&0xF)<<4
; store W1 (new middle byte) at X0+X3 = array+idx*1.5+1
8c: strb w1, [x0,x3]
90: ret
; idx is odd, go on:
94: add w4, w2, #0x1
; W4 = W2+1 = idx*1.5+1, i.e., offset of middle byte
98: adrp x0, page of array
9c: add x0, x0, offset of array within page
; X0 = address of array
a0: add w2, w2, #0x2
; W2 = W2+2 = idx*1.5+2, i.e., offset of right byte
a4: sxtw x4, w4
; X4 = sign-extended 64-bit version of 32-bit W4
; load at X0+X4 = array+idx*1.5+1:
a8: ldrb w3, [x0,x4]
; W3 = middle byte
ac: and w3, w3, #0xfffffff0
; W3 = W3&0xFFFFFFF0 = middle_byte&0xFFFFFFF0, i.e., clear lowest 4 bits
b0: orr w3, w3, w1, lsr #8
; W3 = W3|W1>>8 = middle_byte&0xFFFFFFF0 | val>>8
; store new version of middle byte at X0+X4=array+idx*1.5+1:
b4: strb w3, [x0,x4]
; now store lowest 8 bits of val (in W1) at X0+W2=array+idx*1.5+2, i.e., place of right byte
; SXTW suffix means W2 will be sign-extended to 64-bit value before summing with X0
b8: strb w1, [x0,w2,sxtw]
bc: ret
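For orientation, the byte layout that both branches of the listing above maintain can be sketched in C. This is only a minimal sketch reconstructed from the comments in the listing: the names pack12_array, pack12_put and pack12_get, as well as the array size, are hypothetical and are not taken from the compiled code. Unlike the listing, the sketch masks val explicitly instead of relying on val fitting in 12 bits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical names and size: the listing only shows the compiled "put" code. */
#define PACK12_BYTES (0x1000 / 2 * 3)
uint8_t pack12_array[PACK12_BYTES];

/* Store a 12-bit value. Two values share one 3-byte triple:
   left byte | middle byte (two nibbles) | right byte */
void pack12_put(unsigned idx, unsigned val)
{
    unsigned base = (idx >> 1) * 3;   /* offset of the left byte of the triple */

    if (idx & 1) {
        /* odd element: low nibble of middle byte + whole right byte */
        pack12_array[base + 1] =
            (uint8_t)((pack12_array[base + 1] & 0xF0) | ((val >> 8) & 0x0F));
        pack12_array[base + 2] = (uint8_t)(val & 0xFF);
    } else {
        /* even element: whole left byte + high nibble of middle byte */
        pack12_array[base] = (uint8_t)(val >> 4);
        pack12_array[base + 1] =
            (uint8_t)((pack12_array[base + 1] & 0x0F) | ((val & 0x0F) << 4));
    }
}

/* Read a 12-bit value back using the same layout. */
unsigned pack12_get(unsigned idx)
{
    unsigned base = (idx >> 1) * 3;

    if (idx & 1)
        return ((unsigned)(pack12_array[base + 1] & 0x0F) << 8) | pack12_array[base + 2];
    return ((unsigned)pack12_array[base] << 4) | (pack12_array[base + 1] >> 4);
}
```

Storing 0xABC at index 0 and 0xDEF at index 1 should leave the first triple as the bytes 0xAB, 0xCD, 0xEF: the middle byte holds the low nibble of the even element and the high nibble of the odd one.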
3.20.11 Optimizing GCC 4.4.5 for MIPS
Keep in mind that the instruction following each jump/branch instruction is executed before the
jump/branch takes effect. This is called the branch delay slot in RISC CPU lingo. To make things simpler,
just (mentally) swap the two instructions in each pair that starts with a branch or jump instruction.
MIPS has no flags (apparently, to simplify data dependencies), so branch instructions (like BNE) do both
the comparison and the branching.
There is also GP (Global Pointer) setup code in the function prologue, which can be ignored for now.
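The delay-slot rule and the flagless compare-and-branch described above can be modeled with a toy interpreter in C. This is a sketch under loud assumptions: it is not a MIPS emulator, and every name in it (toy_op, toy_insn, toy_run, toy_demo_prog) is hypothetical. It only shows that the instruction placed right after a taken branch still runs before control reaches the branch target.

```c
#include <assert.h>

/* Toy model only; all names are hypothetical. */
enum toy_op { TOY_ADDI, TOY_BNE, TOY_HALT };

struct toy_insn {
    enum toy_op op;
    int a, b;   /* register numbers */
    int imm;    /* ADDI: reg[a] = reg[b] + imm; BNE: index of the branch target */
};

/* Run a program; the instruction after a taken BNE still executes (delay slot). */
void toy_run(const struct toy_insn *prog, int *reg)
{
    int pc = 0;
    int pending = -1;   /* branch target waiting for the delay slot, or -1 */

    for (;;) {
        const struct toy_insn *i = &prog[pc];
        /* if the previous instruction was a taken branch, jump after this one */
        int next = (pending >= 0) ? pending : pc + 1;
        pending = -1;

        switch (i->op) {
        case TOY_ADDI:
            reg[i->a] = reg[i->b] + i->imm;
            break;
        case TOY_BNE:
            /* comparison and branch in one instruction, no flags involved */
            if (reg[i->a] != reg[i->b])
                pending = i->imm;
            break;
        case TOY_HALT:
            return;
        }
        pc = next;
    }
}

/* Demo program: the branch at index 1 is taken, yet index 2 still runs. */
const struct toy_insn toy_demo_prog[] = {
    { TOY_ADDI, 1, 0, 1 },    /* 0: r1 = r0 + 1                         */
    { TOY_BNE,  1, 0, 4 },    /* 1: if (r1 != r0) goto 4                */
    { TOY_ADDI, 2, 2, 10 },   /* 2: delay slot, executed before the jump */
    { TOY_ADDI, 3, 3, 100 },  /* 3: skipped because the branch is taken  */
    { TOY_HALT, 0, 0, 0 },    /* 4: stop                                 */
};
```

Running toy_demo_prog on four zeroed registers leaves r2 modified and r3 untouched, which matches the mental swap of the branch with the instruction after it.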
Getter
get_from_array:
; $4 = idx
srl $2,$4,1
; $2 = $4>>1 = idx>>1
