3.16. TOUPPER() FUNCTION
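The tmp variable must be unsigned. A character below 'a' then makes the subtraction wrap around to a large value, so a single comparison against 25 rejects it as well.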
When a conversion happens, this takes two subtraction operations plus one comparison.
In contrast, the original algorithm uses two comparison operations plus one subtraction.
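As a rough illustration, here is a minimal C sketch of both variants. The function names toupper_two_cmp() and toupper_one_cmp() are hypothetical, the sketch assumes 8-bit chars and ASCII, and it is not the book's own source code. The one-comparison variant only works because the comparison against 25 is done on an unsigned value: characters below 'a' make the subtraction wrap around to a large number, which the same comparison rejects.

```c
/* Straightforward version: two comparisons plus one subtraction. */
static char toupper_two_cmp(char c)
{
    if (c >= 'a' && c <= 'z')
        return c - 32;   /* 'a' - 'A' == 32 in ASCII */
    return c;
}

/* Single-comparison version: subtract the lower bound first, then compare
   once. tmp is unsigned, so values below 'a' wrap around to large numbers
   and fail the "tmp > 25" test just like values above 'z'. */
static char toupper_one_cmp(char c)
{
    unsigned tmp = (unsigned char)c - 'a';
    if (tmp > 25)
        return c;        /* not a lowercase letter: return it untouched */
    return c - 32;       /* second subtraction, only on the conversion path */
}
```

Running both functions over all 256 byte values should give identical results. With a signed tmp, a character such as '0' would instead pass the test and be wrongly shifted down by 32.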
Optimizing GCC does even better: it gets rid of the jumps (which is good: 2.10.1 on page 466) by using the
CMOVcc instruction:
Listing 3.73: Optimizing GCC 4.9 (x64)
1 toupper:
2 lea edx, [rdi-97] ; 0x61
3 lea eax, [rdi-32] ; 0x20
4 cmp dl, 25
5 cmova eax, edi
6 ret
At line 3 the code prepares the subtracted value in advance, as if the conversion would always happen.
At line 5 the subtracted value in EAX is replaced by the untouched input value if a conversion is not needed.
In that case the subtracted value (which would be incorrect) is simply discarded.
Subtracting in advance is the price the compiler pays for avoiding conditional jumps.
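The listing can be transcribed back into C roughly as follows. This is only a sketch: toupper_cmov_style() is a hypothetical name, and whether a given compiler turns the final if statement into CMOVA (rather than a jump) depends on the compiler and its options. Both candidate results exist before the comparison, which is what allows a conditional move to pick one of them.

```c
/* Step-by-step C transcription of Listing 3.73 (illustrative only). */
static char toupper_cmov_style(char c)
{
    /* lea edx, [rdi-97]: only the low byte (DL) is compared later */
    unsigned char dl = (unsigned char)(c - 'a');
    /* lea eax, [rdi-32]: the conversion is computed in advance */
    int eax = c - 32;
    /* cmp dl, 25 / cmova eax, edi: if DL is (unsigned) above 25,
       replace the precomputed value with the untouched input */
    if (dl > 25)
        eax = c;
    /* ret: EAX now holds either c - 32 or the original input */
    return (char)eax;
}
```

For example, toupper_cmov_style('q') yields 'Q', while 'Q' itself makes DL wrap around to 240, so the input is returned unchanged.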
3.16.2 ARM
Optimizing Keil for ARM mode also generates only one comparison:
Listing 3.74: Optimizing Keil 6/2013 (ARM mode)
toupper PROC
SUB r1,r0,#0x61
CMP r1,#0x19
SUBLS r0,r0,#0x20
ANDLS r0,r0,#0xff
BX lr
ENDP
The SUBLS and ANDLS instructions are executed only if the value in R1 is less than or equal to 0x19 (LS is the
unsigned "lower or same" condition). These two instructions perform the actual conversion.
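A C rendering of Listing 3.74 might look like this. It is a sketch under stated assumptions: toupper_arm_style() is a hypothetical name, registers are modelled as uint32_t, and the LS suffix is treated as the unsigned "lower or same" condition set by the preceding CMP. Unlike the x64 version, the comparison here uses the whole 32-bit register, which gives the same result for inputs in the 0..255 range.

```c
#include <stdint.h>

/* C rendering of Listing 3.74 (hypothetical name, illustrative only). */
static uint32_t toupper_arm_style(uint32_t r0)
{
    uint32_t r1 = r0 - 0x61;   /* SUB r1,r0,#0x61: wraps around if r0 < 'a' */
    if (r1 <= 0x19) {          /* CMP r1,#0x19 + LS (unsigned lower or same) */
        r0 = r0 - 0x20;        /* SUBLS r0,r0,#0x20 */
        r0 = r0 & 0xff;        /* ANDLS r0,r0,#0xff */
    }
    return r0;                 /* BX lr: the result is returned in R0 */
}
```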
Optimizing Keil for Thumb mode generates only one comparison operation as well:
Listing 3.75: Optimizing Keil 6/2013 (Thumb mode)
toupper PROC
MOVS r1,r0
SUBS r1,r1,#0x61
CMP r1,#0x19
BHI |L0.14|
SUBS r0,r0,#0x20
LSLS r0,r0,#24
LSRS r0,r0,#24
|L0.14|
BX lr
ENDP
The last two instructions, LSLS and LSRS, work like AND reg, 0xFF: they are equivalent to the C/C++
expression (i << 24) >> 24, where i is a 32-bit unsigned value.
It seems that Keil for Thumb mode determined that two 2-byte instructions are shorter than code that
loads the 0xFF constant into a register and then applies an AND instruction.
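One way to check the equivalence of the shift pair and the AND is to compare both forms on a few values. In this sketch the function names are hypothetical and i is a 32-bit unsigned value. LSRS is a logical shift that fills with zeros, so the C counterpart needs an unsigned operand; with a signed int, >> is typically an arithmetic shift and could copy the sign bit into the upper bits instead.

```c
#include <stdint.h>

/* LSLS r0,r0,#24 then LSRS r0,r0,#24: move the low byte to the top of the
   32-bit register, then shift it back down, filling with zeros.
   The cast keeps the intermediate result at 32 bits, as in the register. */
static uint32_t low_byte_via_shifts(uint32_t i)
{
    return (uint32_t)(i << 24) >> 24;
}

/* The same operation written as AND reg, 0xFF. */
static uint32_t low_byte_via_and(uint32_t i)
{
    return i & 0xFF;
}
```

For example, both functions reduce 0x12345678 to 0x78.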
GCC for ARM64
Listing 3.76: Non-optimizing GCC 4.9 (ARM64)
toupper:
sub sp, sp, #16