1.17. MORE ABOUT STRINGS
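int my_strlen (const char *str)
{
    const char *eos = str;
    // advance eos until it has stepped past the terminating zero byte
    while( *eos++ ) ;
    // eos now points one past the terminator, hence the extra -1
    return( eos - str - 1 );
}
int main()
{
    // test
    return my_strlen("hello!");
}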
x86
Non-optimizing MSVC
Let’s compile:
_eos$ = -4 ; size = 4
_str$ = 8 ; size = 4
_strlen PROC
push ebp
mov ebp, esp
push ecx
mov eax, DWORD PTR _str$[ebp] ; place pointer to string from "str"
mov DWORD PTR _eos$[ebp], eax ; place it to local variable "eos"
$LN2@strlen:
mov ecx, DWORD PTR _eos$[ebp] ; ECX=eos
; take 8-bit byte from address in ECX and place it as 32-bit value to EDX with sign extension
movsx edx, BYTE PTR [ecx]
mov eax, DWORD PTR _eos$[ebp] ; EAX=eos
add eax, 1 ; increment EAX
mov DWORD PTR _eos$[ebp], eax ; place EAX back to "eos"
test edx, edx ; EDX is zero?
je SHORT $LN1@strlen ; yes, then finish loop
jmp SHORT $LN2@strlen ; continue loop
$LN1@strlen:
; here we calculate the difference between two pointers
mov eax, DWORD PTR _eos$[ebp]
sub eax, DWORD PTR _str$[ebp]
sub eax, 1 ; subtract 1 and return result
mov esp, ebp
pop ebp
ret 0
_strlen ENDP
We get two new instructions here: MOVSX and TEST.
The first one, MOVSX, takes a byte from an address in memory and stores the value in a 32-bit register.
MOVSX stands for MOV with Sign-Extend. MOVSX sets the rest of the bits, from the 8th to the 31st, to 1 if
the source byte is negative or to 0 if it is non-negative.
And here is why.
By default, the char type is signed in MSVC and GCC on x86. Suppose we have two values, one of type char and the
other of type int (int is signed too). If the first value contains -2 (coded as 0xFE) and we just copy this byte
into the int container, we get 0x000000FE, which, viewed as a signed int, is 254, not -2.
In a signed int, -2 is coded as 0xFFFFFFFE. So if we have to transfer 0xFE from a variable of type char to int,
we have to identify its sign and extend it. That is what MOVSX does.