3.20. PACKING 12-BIT VALUES INTO ARRAY
0 0000 even
1 0001 odd
2 0010 even
3 0011 odd
4 0100 even
5 0101 odd
6 0110 even
7 0111 odd
8 1000 even
9 1001 odd
10 1010 even
11 1011 odd
12 1100 even
13 1101 odd
14 1110 even
15 1111 odd
Zero is also even number, it’s so in two’s complement system, where it’s located between two odd num-
bers (-1 and 1).
For math geeks, I could also say that even or odd sign is also remainder of division by 2. Division of a
number by 2 is merely dropping the last bit, which is remainder of division. Well, we do not have to do
shifts here, just isolate lowest bit.
Iftheelementisodd, wetakemiddleandrightbytes(array[array_idx+1]andarray[array_idx+2]). Lowest
4 bits of the middle byte is isolated. Right byte is taken as a whole. Now we have to combine these parts
into 12-bit value. To do so, shift 4 bits from the middle byte by 8 bits left, so these 4 bits will be allocated
right behind highest 8th bit of byte. Then, using OR operation, we just add these parts.
If the element is even, high 8 bits of 12-bit value is located in left byte, while lowest 4 bits are located in
the high 4 bits of middle byte. We isolate highest 4 bits in the middle byte by shifting it 4 bits right and
then applying AND operation, just to be sure that nothing is left there. We also take left byte and shift
its value 4 bits left, because it has bits from 11th to 4th (inclusive, starting at 0), Using OR operation, we
combine these two parts.
Setter
Setter calculates triplet’s address in the same way. It also operates on left/right bytes in the same way.
But it’s not correct just to write to the middle byte, because write operation will destroy the information
related to the other element. So the common way is to load byte, drop bits where you’ll write, write it
there, but leave other part intact. Using AND operation (&in C/C++), we drop everything exceptourpart.
Using OR operation (|in C/C++), we then update the middle byte.
3.20.6 Optimizing GCC 4.8.2 for x86-64.
Let’s see what optimizing GCC 4.8.2 for Linux x64 will do. I added comments. Sometimes readers are
confused because instructions order is not logical. It’s OK, because optimizing compilers take CPU out-of-
order-executionmechanismsintoconsiderations, andsometimes, swappedinstructionsperformingbetter.
Getter
get_from_array:
; EDI=idx
; make a copy:
mov eax, edi
; calculate idx>>1:
shr eax
; determine if the element even or odd by isolation of the lowest bit:
and edi, 1
; calculate (idx>>1)3.
; multiplication is slow in geenral, and can be replaced by one shifting and addition operation
; LEA is capable to do both:
lea edx, [rax+rax2]