3.16. TOUPPER() FUNCTION
10 movsx eax, BYTE PTR c$[rsp]
11 sub eax, 32
12 jmp SHORT $LN3@toupper
13 jmp SHORT $LN1@toupper ; compiler artefact
14 $LN2@toupper:
15 movzx eax, BYTE PTR c$[rsp] ; unnecessary casting
16 $LN1@toupper:
17 $LN3@toupper: ; compiler artefact
18 ret 0
19 toupper ENDP
It’s important to notice that the input byte is loaded into a 64-bit local stack slot at line 3.
All the remaining bits ([8..63]) are untouched, i.e., they contain random noise (you’ll see it in a debugger).
All instructions operate only at the byte level, so this is fine.
The last MOVZX instruction at line 15 takes the byte from the local stack slot and zero-extends it to the
32-bit int data type.
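The difference between the MOVSX at line 10 and the MOVZX at line 15 can be sketched in C. This is an illustration only: the helper names are hypothetical, and the cast to a signed 8-bit type is assumed to wrap around, as it does with mainstream compilers on x86-64.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: what MOVZX does, the upper 24 bits become zero. */
int32_t zero_extend(uint8_t b)
{
    return (int32_t)b;
}

/* Hypothetical helper: what MOVSX does, bit 7 is copied into the upper 24 bits.
   Assumes the uint8_t -> int8_t conversion wraps (implementation-defined in C). */
int32_t sign_extend(uint8_t b)
{
    return (int32_t)(int8_t)b;
}

/* Bytes below 0x80 extend identically; bytes from 0x80 upward do not. */
void extension_demo(void)
{
    assert(zero_extend(0x61) == 0x61);  /* 'a' */
    assert(sign_extend(0x61) == 0x61);
    assert(zero_extend(0xE9) == 233);   /* upper bits cleared */
    assert(sign_extend(0xE9) == -23);   /* upper bits filled with ones */
}
```

For the lowercase Latin letters (97..122) both extensions give the same result, which is why the mix of MOVSX and MOVZX in the listing is harmless.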
Non-optimizing GCC does mostly the same:
Listing 3.71: Non-optimizing GCC 4.9 (x64)
toupper:
push rbp
mov rbp, rsp
mov eax, edi
mov BYTE PTR [rbp-4], al
cmp BYTE PTR [rbp-4], 96
jle .L2
cmp BYTE PTR [rbp-4], 122
jg .L2
movzx eax, BYTE PTR [rbp-4]
sub eax, 32
jmp .L3
.L2:
movzx eax, BYTE PTR [rbp-4]
.L3:
pop rbp
ret
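Read back into C, the GCC listing amounts to two signed byte comparisons followed by a subtraction. The following is a hedged, line-by-line reading and not the original source: the function names are hypothetical, ASCII is assumed, and the cast to signed char is assumed to wrap, as it does with GCC on x86-64.

```c
#include <assert.h>

/* Hypothetical name; mirrors Listing 3.71 step by step. */
int toupper_two_cmp(int edi)
{
    signed char c = (signed char)edi;  /* mov eax, edi / mov BYTE PTR [rbp-4], al */

    if (c <= 96)                       /* cmp ..., 96  / jle .L2 */
        return (unsigned char)c;       /* .L2: movzx eax, BYTE PTR [rbp-4] */
    if (c > 122)                       /* cmp ..., 122 / jg .L2 */
        return (unsigned char)c;
    return (unsigned char)c - 32;      /* movzx eax, ... / sub eax, 32 */
}

/* Boundary cases around the 'a'..'z' range. */
void toupper_two_cmp_demo(void)
{
    assert(toupper_two_cmp('a') == 'A');
    assert(toupper_two_cmp('z') == 'Z');
    assert(toupper_two_cmp('`') == '`');  /* 96: just below the range */
    assert(toupper_two_cmp('{') == '{');  /* 123: just above the range */
}
```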
One comparison operation
Optimizing MSVC does a better job and generates only one comparison operation:
Listing 3.72: Optimizing MSVC 2013 (x64)
toupper PROC
lea eax, DWORD PTR [rcx-97]
cmp al, 25
ja SHORT $LN2@toupper
movsx eax, cl
sub eax, 32
ret 0
$LN2@toupper:
movzx eax, cl
ret 0
toupper ENDP
It was explained earlier how to replace the two comparison operations with a single one: 3.10.2 on
page 505.
We will now rewrite this in C/C++:
unsigned char tmp=c-97; // unsigned byte, as in "cmp al, 25" / "ja"
if (tmp>25)
        return c;
else
        return c-32;
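The equivalence of the single-comparison form and the original two-comparison form can be checked exhaustively over all 256 byte values. The sketch below is illustrative: the function names are hypothetical, ASCII is assumed, and tmp is declared as an unsigned byte to mirror the "cmp al, 25" / "ja" pair in Listing 3.72. The unsigned comparison is what makes the trick work: for inputs below 97 the subtraction wraps around to a value greater than 25.

```c
#include <assert.h>

/* Reference: the two-comparison form. */
int toupper_ref(unsigned char c)
{
    if (c >= 'a' && c <= 'z')
        return c - 32;
    return c;
}

/* Single-comparison form. tmp is an unsigned byte (an assumption that
   mirrors "cmp al, 25" / "ja"), so inputs below 97 wrap around to a
   value greater than 25 and are returned unchanged. */
int toupper_one_cmp(unsigned char c)
{
    unsigned char tmp = (unsigned char)(c - 97);

    if (tmp > 25)
        return c;
    return c - 32;
}

/* Compare both forms on every possible byte value. */
void check_all_bytes(void)
{
    for (int i = 0; i < 256; i++)
        assert(toupper_one_cmp((unsigned char)i) == toupper_ref((unsigned char)i));
}
```

Calling check_all_bytes() from a test harness passes silently when both forms agree on every input.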