Assembly Language for Beginners


3.11. INLINE FUNCTIONS


Listing 3.39: Optimizing GCC 4.9.1 x64

f:
mov QWORD PTR [rdi], 0
mov QWORD PTR [rdi+8], 0
mov QWORD PTR [rdi+16], 0
mov QWORD PTR [rdi+24], 0
ret


By the way, that reminds us of unrolled loops: 1.16.1 on page 192.
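To make the connection to unrolled loops concrete, here is a minimal sketch in C. It assumes, judging only by the offsets 0, 8, 16 and 24 in the listing, that 32 bytes are being zeroed. The names zero32_loop() and zero32_unrolled() are made up for this illustration, and memcpy() of 8 bytes is used merely as a portable way to express one unaligned 64-bit store.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: loop form, one 8-byte store per iteration. */
void zero32_loop(char *out)
{
    const uint64_t zero = 0;
    assert(out != NULL);
    for (int i = 0; i < 4; i++)
        memcpy(out + 8 * i, &zero, sizeof zero);
}

/* Hypothetical helper: the same work unrolled by hand,
   one statement per MOV of the listing. */
void zero32_unrolled(char *out)
{
    const uint64_t zero = 0;
    assert(out != NULL);
    memcpy(out,      &zero, sizeof zero);   /* mov QWORD PTR [rdi], 0    */
    memcpy(out + 8,  &zero, sizeof zero);   /* mov QWORD PTR [rdi+8], 0  */
    memcpy(out + 16, &zero, sizeof zero);   /* mov QWORD PTR [rdi+16], 0 */
    memcpy(out + 24, &zero, sizeof zero);   /* mov QWORD PTR [rdi+24], 0 */
}
```

Both functions leave the buffer in the same state; the unrolled one simply has no counter and no jump.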


Example #2


Listing 3.40: 67 bytes

#include <string.h>

void f(char *out)
{
    memset(out, 0, 67);
}


When the block size is not a multiple of 4 or 8, compilers can behave differently.


For instance, MSVC 2012 continues to insert MOVs:


Listing 3.41: Optimizing MSVC 2012 x64

out$ = 8
f PROC
xor eax, eax
mov QWORD PTR [rcx], rax
mov QWORD PTR [rcx+8], rax
mov QWORD PTR [rcx+16], rax
mov QWORD PTR [rcx+24], rax
mov QWORD PTR [rcx+32], rax
mov QWORD PTR [rcx+40], rax
mov QWORD PTR [rcx+48], rax
mov QWORD PTR [rcx+56], rax
mov WORD PTR [rcx+64], ax
mov BYTE PTR [rcx+66], al
ret 0
f ENDP


...while GCC uses REP STOSQ, concluding that this would be shorter than a pack of MOVs:


Listing 3.42: Optimizing GCC 4.9.1 x64

f:
mov QWORD PTR [rdi], 0
mov QWORD PTR [rdi+59], 0
mov rcx, rdi
lea rdi, [rdi+8]
xor eax, eax
and rdi, -8
sub rcx, rdi
add ecx, 67
shr ecx, 3
rep stosq
ret
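The arithmetic behind both listings is easier to follow in C. The sketch below is a hand-written emulation, not compiler output. zero67_like_msvc() and zero67_like_gcc() are hypothetical names, and memcpy() of a fixed size stands in for a single store of that width. The MSVC variant splits 67 as 8*8 + 2 + 1. The GCC variant first zeroes bytes 0..7 and 59..66 with two possibly unaligned stores, then rounds the pointer up to the next 8-byte boundary and fills the middle with 7 or 8 aligned quadwords, as REP STOSQ would.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper mirroring Listing 3.41: 8 QWORDs + 1 WORD + 1 BYTE. */
void zero67_like_msvc(char *out)
{
    const uint64_t zero = 0;
    assert(out != NULL);
    for (int i = 0; i < 8; i++)
        memcpy(out + 8 * i, &zero, 8);      /* offsets 0..63  */
    memcpy(out + 64, &zero, 2);             /* offsets 64..65 */
    out[66] = 0;                            /* offset 66      */
}

/* Hypothetical helper mirroring Listing 3.42.
   Returns the quadword count that REP STOSQ would receive in ECX. */
unsigned zero67_like_gcc(char *out)
{
    const uint64_t zero = 0;
    assert(out != NULL);

    memcpy(out, &zero, 8);                  /* mov QWORD PTR [rdi], 0    */
    memcpy(out + 59, &zero, 8);             /* mov QWORD PTR [rdi+59], 0 */

    uintptr_t start = (uintptr_t)out;                   /* mov rcx, rdi        */
    uintptr_t aligned = (start + 8) & ~(uintptr_t)7;    /* lea / and rdi, -8   */

    /* sub rcx, rdi / add ecx, 67 / shr ecx, 3:
       the difference is -1..-8, so the sum is 59..66 and the count is 7 or 8 */
    uint32_t count = ((uint32_t)(start - aligned) + 67) >> 3;

    char *p = (char *)aligned;
    for (uint32_t i = 0; i < count; i++, p += 8)        /* rep stosq */
        memcpy(p, &zero, 8);

    return count;
}
```

The two stores at the edges overlap the aligned run in the middle, which is harmless: zeroing the same byte twice costs nothing, and no byte outside the 67 is touched whatever the alignment of out.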


memcpy()


Short blocks


The routine to copy short blocks is often implemented as a sequence of MOV instructions.
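As a rough illustration of the idea, here is what such a sequence looks like when written by hand in C. The block size of 7 bytes and its split into 4 + 2 + 1 are assumptions chosen for this sketch, and copy7_as_moves() is a hypothetical name. What a particular compiler actually emits depends on its version and options.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copy exactly 7 bytes as one 32-bit,
   one 16-bit and one 8-bit load/store pair. */
void copy7_as_moves(char *dst, const char *src)
{
    uint32_t d;
    uint16_t w;
    assert(dst != NULL && src != NULL);

    memcpy(&d, src, 4);                     /* load  DWORD, offsets 0..3 */
    memcpy(dst, &d, 4);                     /* store DWORD               */

    memcpy(&w, src + 4, 2);                 /* load  WORD, offsets 4..5  */
    memcpy(dst + 4, &w, 2);                 /* store WORD                */

    dst[6] = src[6];                        /* BYTE, offset 6            */
}
```

Each load/store pair corresponds to two MOV instructions, so no loop and no call to memcpy() is needed for a block this small.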
