3.11. INLINE FUNCTIONS
Listing 3.39: Optimizing GCC 4.9.1 x64
f:
mov QWORD PTR [rdi], 0
mov QWORD PTR [rdi+8], 0
mov QWORD PTR [rdi+16], 0
mov QWORD PTR [rdi+24], 0
ret
By the way, this reminds us of unrolled loops: 1.16.1 on page 192.
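For illustration, here is a minimal C sketch of what the four `MOV QWORD PTR` instructions in Listing 3.39 amount to: four 8-byte zero stores at offsets 0, 8, 16 and 24, i.e. a fully unrolled loop. The function name `zero32_unrolled` is hypothetical. `memcpy()` from a `uint64_t` is used instead of a pointer cast to avoid strict-aliasing problems; optimizing compilers usually turn each such fixed-size call into a single store.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical illustration: zero 32 bytes with four 8-byte stores,
   the same unrolled pattern as the four MOV QWORD PTR instructions. */
void zero32_unrolled(char *out)
{
    const uint64_t zero = 0;
    memcpy(out + 0,  &zero, 8);   /* mov QWORD PTR [rdi], 0    */
    memcpy(out + 8,  &zero, 8);   /* mov QWORD PTR [rdi+8], 0  */
    memcpy(out + 16, &zero, 8);   /* mov QWORD PTR [rdi+16], 0 */
    memcpy(out + 24, &zero, 8);   /* mov QWORD PTR [rdi+24], 0 */
}
```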
Example #2
Listing 3.40: 67 bytes
#include <string.h>
void f(char *out)
{
    memset(out, 0, 67); // 67 is not a multiple of 4 or 8
}
When the block size is not a multiple of 4 or 8, the compilers can behave differently.
For instance, MSVC 2012 continues to insert MOVs:
Listing 3.41: Optimizing MSVC 2012 x64
out$ = 8
f PROC
xor eax, eax
mov QWORD PTR [rcx], rax
mov QWORD PTR [rcx+8], rax
mov QWORD PTR [rcx+16], rax
mov QWORD PTR [rcx+24], rax
mov QWORD PTR [rcx+32], rax
mov QWORD PTR [rcx+40], rax
mov QWORD PTR [rcx+48], rax
mov QWORD PTR [rcx+56], rax
mov WORD PTR [rcx+64], ax
mov BYTE PTR [rcx+66], al
ret 0
f ENDP
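The offsets in Listing 3.41 follow simple arithmetic: 67 = 8·8 + 2 + 1, so there are eight 8-byte stores (offsets 0 to 56), one 2-byte store at offset 64 and one 1-byte store at offset 66. The C sketch below generalizes that greedy split to any size n, always using the largest store that still fits. The function name `zero_greedy` and the extra 4-byte step (not needed for 67, since the 3-byte remainder is below 4) are our assumptions, not something taken from MSVC itself.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: zero n bytes using the largest store that still
   fits (8, then 4, 2, 1 bytes), like the MOV sequence in Listing 3.41. */
void zero_greedy(char *out, size_t n)
{
    const uint64_t zero = 0;   /* plays the role of RAX after XOR EAX, EAX */
    size_t off = 0;
    while (n - off >= 8) { memcpy(out + off, &zero, 8); off += 8; } /* QWORD */
    if (n - off >= 4)    { memcpy(out + off, &zero, 4); off += 4; } /* DWORD */
    if (n - off >= 2)    { memcpy(out + off, &zero, 2); off += 2; } /* WORD  */
    if (n - off >= 1)    { memcpy(out + off, &zero, 1); off += 1; } /* BYTE  */
}
```

For n = 67 this performs exactly the stores of Listing 3.41: eight QWORDs, then a WORD at offset 64 and a BYTE at offset 66.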
GCC, in contrast, uses REP STOSQ, concluding that this would be shorter than a pack of MOVs:
Listing 3.42: Optimizing GCC 4.9.1 x64
f:
mov QWORD PTR [rdi], 0
mov QWORD PTR [rdi+59], 0
mov rcx, rdi
lea rdi, [rdi+8]
xor eax, eax
and rdi, -8
sub rcx, rdi
add ecx, 67
shr ecx, 3
rep stosq
ret
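Listing 3.42 can be read step by step. GCC first zeroes the first 8 bytes and the last 8 bytes (offset 59 = 67 - 8) with two possibly unaligned stores. It then rounds out+8 down to an 8-byte boundary and lets REP STOSQ fill the whole aligned qwords from there; the count in ECX is (67 - (aligned - out)) >> 3. Whatever REP STOSQ does not reach at either end is already covered by the head and tail stores, so some bytes are simply written twice. Below is our own C re-expression of that reading, not GCC's actual source; `zero_like_gcc` is a hypothetical name, the loop stands in for REP STOSQ, and the sketch assumes n >= 8.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C rendering of Listing 3.42 for a block of n >= 8 bytes. */
void zero_like_gcc(char *out, size_t n)
{
    const uint64_t zero = 0;

    /* Two possibly unaligned stores: head [out, out+8) and tail [out+n-8, out+n). */
    memcpy(out, &zero, 8);              /* mov QWORD PTR [rdi], 0    */
    memcpy(out + n - 8, &zero, 8);      /* mov QWORD PTR [rdi+59], 0 */

    /* lea rdi, [rdi+8] / and rdi, -8: first 8-byte boundary after out. */
    uintptr_t aligned = ((uintptr_t)out + 8) & ~(uintptr_t)7;
    size_t skipped = (size_t)(aligned - (uintptr_t)out);  /* 1..8 bytes, inside the head store */

    /* sub rcx, rdi / add ecx, 67 / shr ecx, 3: whole qwords left from aligned. */
    size_t count = (n - skipped) >> 3;

    /* rep stosq: store zero count times, advancing 8 bytes each time. */
    char *p = out + skipped;
    for (size_t i = 0; i < count; i++)
        memcpy(p + 8 * i, &zero, 8);
    /* The fewer than 8 bytes left after p + 8*count lie inside the tail store. */
}
```

The design trades a couple of redundant writes for never having to handle the unaligned ends byte by byte.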
memcpy()
Short blocks
The routine to copy short blocks is often implemented as a sequence of MOV instructions.
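As an illustration of our own (not taken from any particular compiler's output), copying a fixed 16-byte block can be written as two 8-byte loads followed by two 8-byte stores. When the size is a small constant, optimizing compilers commonly expand such a `memcpy()` into a few MOVs like these rather than calling the library routine. `copy16` is a hypothetical name.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical sketch: copy exactly 16 bytes as two 8-byte load/store pairs,
   the kind of MOV sequence a compiler may emit for memcpy(out, in, 16). */
void copy16(char *out, const char *in)
{
    uint64_t lo, hi;
    memcpy(&lo, in, 8);        /* load  bytes 0..7  */
    memcpy(&hi, in + 8, 8);    /* load  bytes 8..15 */
    memcpy(out, &lo, 8);       /* store bytes 0..7  */
    memcpy(out + 8, &hi, 8);   /* store bytes 8..15 */
}
```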