CHAPTER 43. INLINE FUNCTIONS
Listing 43.10: 32 bytes
#include <string.h> /* memset() is declared here, not in <stdio.h> */

void f(char *out)
{
    memset(out, 0, 32);
}
Many compilers don’t generate a call to memset() for short blocks, but rather insert a pack of MOVs:
Listing 43.11: Optimizing GCC 4.9.1 x64
f:
mov QWORD PTR [rdi], 0
mov QWORD PTR [rdi+8], 0
mov QWORD PTR [rdi+16], 0
mov QWORD PTR [rdi+24], 0
ret
By the way, this is reminiscent of unrolled loops: 14.1.4 on page 180.
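As a rough C-level picture of listing 43.11, the four MOVs amount to four 8-byte stores at offsets 0, 8, 16 and 24. The sketch below uses a hypothetical helper name, zero32_by_qwords(), and fixed-size memcpy() calls as a portable way to express unaligned 64-bit stores. It illustrates the idea only and is not the compiler's actual output.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper (not from the book): zero 32 bytes with four
   8-byte stores, mirroring the four MOVs in the GCC listing. */
void zero32_by_qwords(char *out)
{
    const uint64_t zero = 0;

    for (int i = 0; i < 4; i++) {
        /* A fixed 8-byte memcpy() expresses an unaligned 64-bit store
           without violating alignment or aliasing rules. */
        memcpy(out + i * 8, &zero, sizeof zero);
    }
}
```

Calling zero32_by_qwords(buf) on any buffer of at least 32 bytes should leave the same bytes zeroed as memset(buf, 0, 32) would.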
Example #2
Listing 43.12: 67 bytes
#include <string.h> /* memset() is declared here, not in <stdio.h> */

void f(char *out)
{
    memset(out, 0, 67);
}
When the block size is not a multiple of 4 or 8, compilers can behave differently.
For instance, MSVC 2012 continues to insert MOVs:
Listing 43.13: Optimizing MSVC 2012 x64
out$ = 8
f PROC
xor eax, eax
mov QWORD PTR [rcx], rax
mov QWORD PTR [rcx+8], rax
mov QWORD PTR [rcx+16], rax
mov QWORD PTR [rcx+24], rax
mov QWORD PTR [rcx+32], rax
mov QWORD PTR [rcx+40], rax
mov QWORD PTR [rcx+48], rax
mov QWORD PTR [rcx+56], rax
mov WORD PTR [rcx+64], ax
mov BYTE PTR [rcx+66], al
ret 0
f ENDP
...while GCC uses REP STOSQ, concluding that this would be shorter than a pack of MOVs:
Listing 43.14: Optimizing GCC 4.9.1 x64
f:
mov QWORD PTR [rdi], 0
mov QWORD PTR [rdi+59], 0
mov rcx, rdi
lea rdi, [rdi+8]
xor eax, eax
and rdi, -8
sub rcx, rdi
add ecx, 67
shr ecx, 3
rep stosq
ret
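The two strategies can be restated in C to make the arithmetic visible. MSVC splits 67 as 8*8 + 2 + 1. GCC writes one 8-byte store at offset 0 and another at offset 59 (covering bytes 59..66), then fills the middle with aligned 8-byte stores starting at the first 8-byte boundary after out. The function names below are hypothetical, and the loops stand in for the instruction sequences. This is a sketch of the logic, not a claim about how either compiler is implemented.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper names; neither function comes from the book. */

/* MSVC-like: 67 = 8*8 + 2 + 1, i.e. eight 8-byte stores,
   one 2-byte store and one 1-byte store. */
void zero67_msvc_style(char *out)
{
    const uint64_t z8 = 0;
    const uint16_t z2 = 0;

    for (int i = 0; i < 8; i++)
        memcpy(out + i * 8, &z8, 8);   /* offsets 0..63  */
    memcpy(out + 64, &z2, 2);          /* offsets 64..65 */
    out[66] = 0;                       /* offset 66      */
}

/* GCC-like: two possibly unaligned 8-byte stores cover the head and
   the tail, then aligned 8-byte stores fill the middle. */
void zero67_gcc_style(char *out)
{
    const uint64_t z8 = 0;

    memcpy(out, &z8, 8);               /* mov QWORD PTR [rdi], 0    */
    memcpy(out + 59, &z8, 8);          /* mov QWORD PTR [rdi+59], 0 */

    /* lea rdi, [rdi+8] / and rdi, -8:
       first 8-byte boundary strictly after out */
    uintptr_t start = (uintptr_t)out;
    uintptr_t aligned = (start + 8) & ~(uintptr_t)7;

    /* sub rcx, rdi / add ecx, 67 / shr ecx, 3:
       whole qwords between the boundary and the end of the block */
    size_t count = (size_t)(start + 67 - aligned) >> 3;

    char *p = out + (aligned - start);
    for (size_t i = 0; i < count; i++) /* rep stosq */
        memcpy(p + i * 8, &z8, 8);
}
```

Depending on the alignment of out, count comes out as 7 or 8. Any bytes the aligned stores miss at either end are already covered by the two overlapping stores at offsets 0 and 59, so exactly bytes 0..66 end up zeroed.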