3.20. PACKING 12-BIT VALUES INTO ARRAY
; sign-extend address of middle byte in EAX to RAX:
cdqe
; prepare left byte in ECX by shifting it:
shr ecx, 4
; prepare 4 bits for middle byte:
sal esi, 4
; put left byte:
mov BYTE PTR array[rdx], cl
; load middle byte (its address still in RAX):
movzx edx, BYTE PTR array[rax]
; drop high 4 bits:
and edx, 15 ; 15=0xF
; merge our data with the low 4 bits that were there before:
or esi, edx
; put middle byte back:
mov BYTE PTR array[rax], sil
ret
.L5:
; this is an odd element, go on:
; calculate address of middle byte and put it into ECX:
lea ecx, [rax+1]
; copy val from ESI to EDI:
mov edi, esi
; calculate address of right byte:
add eax, 2
; get high 4 bits of input value by shifting it 8 bits right:
shr edi, 8
; sign-extend address in EAX into RAX:
cdqe
; sign-extend address of middle byte in ECX to RCX:
movsx rcx, ecx
; load middle byte into EDX:
movzx edx, BYTE PTR array[rcx]
; drop low 4 bits in middle byte:
and edx, -16 ; -16=0xF0
; merge data from input val with what was in middle byte before:
or edx, edi
; store middle byte:
mov BYTE PTR array[rcx], dl
; store right byte. val is still in ESI, and SIL is the part of ESI that holds its lowest 8 bits:
mov BYTE PTR array[rax], sil
ret
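Read back into C, the two branches of the listing above store a value as follows. This is a minimal sketch reconstructed from the listing, not the author's original source; ARRAY_SIZE is hypothetical, and val is assumed never to exceed 0xFFF (the listing does not mask it either). Each pair of 12-bit values occupies three bytes: an even element takes the left byte and the high nibble of the middle byte, an odd element takes the low nibble of the middle byte and the whole right byte.

```c
#include <stdint.h>

/* Hypothetical size: room for 4096 12-bit values, 3 bytes per pair. */
#define ARRAY_SIZE (4096 / 2 * 3)

uint8_t array[ARRAY_SIZE];

/* Store a 12-bit val (assumed <= 0xFFF) as element idx of the packed array. */
void put_to_array(int idx, unsigned int val)
{
    /* first byte of the 3-byte triple holding this element */
    int array_idx = (idx >> 1) * 3;

    if (idx & 1) {
        /* odd element: keep high nibble of middle byte (AND 0xF0),
           put val's top 4 bits (val >> 8) into its low nibble */
        array[array_idx + 1] = (uint8_t)((array[array_idx + 1] & 0xF0) | (val >> 8));
        /* right byte gets val's lowest 8 bits (SIL in the listing) */
        array[array_idx + 2] = (uint8_t)(val & 0xFF);
    } else {
        /* even element: left byte gets val's top 8 bits (SHR 4) */
        array[array_idx] = (uint8_t)(val >> 4);
        /* keep low nibble of middle byte (AND 15),
           put val's low 4 bits into its high nibble (SAL 4) */
        array[array_idx + 1] = (uint8_t)((array[array_idx + 1] & 0x0F) | ((val & 0xF) << 4));
    }
}
```

For example, put_to_array(0, 0xABC) followed by put_to_array(1, 0xDEF) leaves the bytes AB CD EF in array[0], array[1] and array[2].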
Other comments
All addresses in Linux x64 are 64-bit, so all values used in pointer arithmetic must also be 64-bit.
The code calculating offsets inside the array operates on 32-bit values (the input idx argument has type
int, which is 32 bits wide), so these values must be converted to 64-bit addresses before the actual
memory load/store. That is why there are many sign-extending instructions (like CDQE and MOVSX) used for the conversion.
Why extend the sign? By the C/C++ standards, pointer arithmetic can operate on negative values (it’s possible
to access an array using a negative index like array[-123], see: 3.19 on page 593). Since GCC cannot
be sure that all indices are always non-negative, it adds sign-extending instructions.
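The difference between sign-extension and zero-extension is easy to see in C. In this minimal sketch (the function names are hypothetical), casting int32_t to int64_t does what CDQE/MOVSX do, while going through uint32_t models zero-extension, which is what a plain 32-bit MOV gives on x64, since writing a 32-bit register clears the upper half of the 64-bit one.

```c
#include <stdint.h>

/* Widen a 32-bit index the way CDQE/MOVSX do:
   the sign bit is copied into the upper 32 bits. */
int64_t widen_signed(int32_t idx)
{
    return (int64_t)idx;
}

/* Widen by zero-extension: the upper 32 bits are always cleared,
   so a negative index turns into a huge positive offset. */
uint64_t widen_unsigned(int32_t idx)
{
    return (uint64_t)(uint32_t)idx;
}
```

For idx = -123, widen_signed() returns -123, so array[idx] lands 123 bytes before the start of the array, while widen_unsigned() returns 0xFFFFFF85, an offset almost 4 GiB past it.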
3.20.7 Optimizing Keil 5.05 (Thumb mode)
Getter
The following code has the final OR operation in the function epilogue. Indeed, that OR is executed at the end
of both branches, so placing it there only once saves some space.
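In C terms, the trick amounts to letting each branch compute only the two halves of the result and doing the OR once at the end. A minimal sketch, assuming the same packed layout as the GCC listing above (array and its size are hypothetical here); the compiler performs this transformation by itself, and the sketch only makes it explicit in the source.

```c
#include <stdint.h>

/* Hypothetical packed array: 3 bytes per pair of 12-bit values. */
uint8_t array[4096 / 2 * 3];

/* Both branches only compute hi and lo; a single OR combines them. */
unsigned int get_from_array(int idx)
{
    int array_idx = (idx >> 1) * 3;
    unsigned int hi, lo;

    if (idx & 1) {
        hi = (array[array_idx + 1] & 0x0F) << 8;  /* top 4 bits from low nibble of middle byte */
        lo = array[array_idx + 2];                 /* low 8 bits from right byte */
    } else {
        hi = array[array_idx] << 4;                /* top 8 bits from left byte */
        lo = array[array_idx + 1] >> 4;            /* low 4 bits from high nibble of middle byte */
    }
    return hi | lo;  /* the one OR both paths share */
}
```

With the bytes AB CD EF at the start of array, element 0 reads back as 0xABC and element 1 as 0xDEF.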
get_from_array PROC
; R0 = idx
PUSH {r4,r5,lr}
LSRS r1,r0,#1