1.15. SWITCH()/CASE/DEFAULT
There we get an index for the second table of code pointers and we jump to it (line 14).
What is also worth noting is that there is no case for input value 0.
That’s why we see the DEC instruction at line 10, and the table starts at a = 1, because there is no need to
allocate a table element for a = 0.
This is a very widespread pattern.
So why is this economical? Why not do it as before (1.15.2 on page 172), with just one
table consisting of block pointers? The reason is that the elements of the index table are 8-bit, so the index
table plus a short table of pointers is more compact than one table holding a full-size pointer for every input value.
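The saving can be illustrated with a small C sketch of the two-table scheme. Everything in it is hypothetical: the handler names, the dispatch() function and the grouping of the values 1..9 into three blocks are assumptions made for illustration, not the compiler's actual output. One byte per input value selects one of a few pointers, instead of a full pointer being stored for every input value.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef const char *(*handler_t)(void);

/* Hypothetical case bodies; they return a string instead of printing it. */
static const char *block_a(void)       { return "1, 2, 7"; }
static const char *block_b(void)       { return "3, 4, 5, 6"; }
static const char *block_c(void)       { return "8, 9"; }
static const char *block_default(void) { return "default"; }

/* Second table: one full-size pointer per distinct block. */
static const handler_t blocks[] = { block_a, block_b, block_c };

/* First table: one byte per input value, starting at a = 1. */
static const uint8_t index_table[] = {
    0, 0,        /* a = 1, 2  -> block_a */
    1, 1, 1, 1,  /* a = 3..6  -> block_b */
    0,           /* a = 7     -> block_a */
    2, 2         /* a = 8, 9  -> block_c */
};

#define COUNT_OF(t) (sizeof(t) / sizeof((t)[0]))

const char *dispatch(int a)
{
    /* Subtract 1 (the role of DEC), so that a = 1 maps to element 0.
       For a <= 0 the unsigned value wraps to a huge number and fails
       the bounds check below. */
    unsigned idx = (unsigned)a - 1u;
    if (idx >= COUNT_OF(index_table))
        return block_default();
    assert(index_table[idx] < COUNT_OF(blocks));
    return blocks[index_table[idx]]();
}

/* Size of the two-table scheme versus one pointer per input value. */
size_t two_table_bytes(void) { return sizeof index_table + sizeof blocks; }
size_t one_table_bytes(void) { return COUNT_OF(index_table) * sizeof(handler_t); }
```

Assuming 8-byte pointers, the two tables in this sketch take 9 + 3*8 = 33 bytes, while a single table of nine pointers would take 72 bytes. The gap grows with the number of input values that share a block.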
GCC
GCC does the job in the way we already discussed (1.15.2 on page 172), using just one table of pointers.
ARM64: Optimizing GCC 4.9.1
No code is triggered when the input value is 0, so GCC makes the jump table more compact
by starting it at input value 1.
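Starting the table at 1 also lets one unsigned comparison do the whole range check. The following C sketch shows the idea behind the SUB/CMP/BLS sequence in the listing; the function names are hypothetical, and the bounds 1..22 are taken from the "cmp w0, 21" instruction.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* The obvious form: two signed comparisons. */
bool in_range_two_checks(int32_t a)
{
    return a >= 1 && a <= 22;
}

/* The compiler's form: subtract 1, then compare once as unsigned.
   For a <= 0 the subtraction wraps around to a very large unsigned
   value, so the single "lower or same" test rejects it as well. */
bool in_range_one_check(int32_t a)
{
    uint32_t idx = (uint32_t)a - 1u;  /* sub w0, w0, #1 */
    return idx <= 21u;                /* cmp w0, 21 ; bls ... */
}

/* Self-check: both forms agree on a span of values around the range. */
void check_agreement(void)
{
    for (int32_t a = -1000; a <= 1000; a++)
        assert(in_range_one_check(a) == in_range_two_checks(a));
}
```

The value left after the subtraction is also exactly the table index, so no extra instruction is needed to rebase it.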
GCC 4.9.1 for ARM64 uses an even cleverer trick: it encodes all offsets as 8-bit bytes.
Let’s recall that all ARM64 instructions have a size of 4 bytes, so every offset between two code labels is a
multiple of 4 and can be stored divided by 4.
GCC also uses the fact that all offsets in my tiny example are in close proximity to each other. So the jump
table consists of single bytes.
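The arithmetic behind such a table can be sketched in C. This is only a model under stated assumptions: addresses are plain integers, and the names encode_entry() and decode_target() are hypothetical. Encoding mirrors the "(label - base) / 4" expressions in the table, and decoding mirrors sign-extending the byte and shifting it left by 2 bits before adding it to the base address.

```c
#include <assert.h>
#include <stdint.h>

/* What the assembler computes for each table byte: (target - base) / 4.
   The result must fit into a signed 8-bit value. */
uint8_t encode_entry(uint64_t base, uint64_t target)
{
    int64_t delta = (target >= base) ? (int64_t)(target - base)
                                     : -(int64_t)(base - target);
    assert(delta % 4 == 0);            /* instructions are 4 bytes long */
    int64_t units = delta / 4;
    assert(units >= -128 && units <= 127);
    return (uint8_t)units;             /* negative values wrap modulo 256 */
}

/* What the code computes at run time: base + sign_extend(entry) * 4. */
uint64_t decode_target(uint64_t base, uint8_t entry)
{
    /* Sign-extend the byte by hand, the job of "sxtb" in the listing. */
    int64_t units = (entry >= 128) ? (int64_t)entry - 256 : (int64_t)entry;
    /* Multiply by 4, the job of "#2" (shift left by 2 bits), and add. */
    return base + (uint64_t)(units * 4);
}
```

A signed byte scaled by 4 reaches from -512 to +508 bytes around the base label, which is why the trick only works when all case blocks lie close to it.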
Listing 1.159: Optimizing GCC 4.9.1 ARM64
f14:
; input value in W0
sub w0, w0, #1
cmp w0, 21
; branch if less or equal (unsigned):
bls .L9
.L2:
; print "default":
adrp x0, .LC4
add x0, x0, :lo12:.LC4
b puts
.L9:
; load jumptable address to X1:
adrp x1, .L4
add x1, x1, :lo12:.L4
; W0=input_value-1
; load byte from the table:
ldrb w0, [x1,w0,uxtw]
; load address of the Lrtx label:
adr x1, .Lrtx4
; multiply table element by 4 (by shifting 2 bits left) and add (or subtract) to the address of Lrtx:
add x0, x1, w0, sxtb #2
; jump to the calculated address:
br x0
; this label points into the code (text) segment:
.Lrtx4:
.section .rodata
; everything after the ".section" statement is allocated in the read-only data (rodata) segment:
.L4:
.byte (.L3 - .Lrtx4) / 4 ; case 1
.byte (.L3 - .Lrtx4) / 4 ; case 2
.byte (.L5 - .Lrtx4) / 4 ; case 3
.byte (.L5 - .Lrtx4) / 4 ; case 4
.byte (.L5 - .Lrtx4) / 4 ; case 5
.byte (.L5 - .Lrtx4) / 4 ; case 6
.byte (.L3 - .Lrtx4) / 4 ; case 7
.byte (.L6 - .Lrtx4) / 4 ; case 8
.byte (.L6 - .Lrtx4) / 4 ; case 9