3.11. INLINE FUNCTIONS
Listing 3.43: memcpy() example
#include <string.h>

/* copy 7 bytes from inbuf to outbuf+10 */
void memcpy_7(char *inbuf, char *outbuf)
{
    memcpy(outbuf+10, inbuf, 7);
}
Listing 3.44: Optimizing MSVC 2010
_inbuf$ = 8 ; size = 4
_outbuf$ = 12 ; size = 4
_memcpy_7 PROC
mov ecx, DWORD PTR _inbuf$[esp-4]
mov edx, DWORD PTR [ecx]
mov eax, DWORD PTR _outbuf$[esp-4]
mov DWORD PTR [eax+10], edx
mov dx, WORD PTR [ecx+4]
mov WORD PTR [eax+14], dx
mov cl, BYTE PTR [ecx+6]
mov BYTE PTR [eax+16], cl
ret 0
_memcpy_7 ENDP
Listing 3.45: Optimizing GCC 4.8.1
memcpy_7:
push ebx
mov eax, DWORD PTR [esp+8]
mov ecx, DWORD PTR [esp+12]
mov ebx, DWORD PTR [eax]
lea edx, [ecx+10]
mov DWORD PTR [ecx+10], ebx
movzx ecx, WORD PTR [eax+4]
mov WORD PTR [edx+4], cx
movzx eax, BYTE PTR [eax+6]
mov BYTE PTR [edx+6], al
pop ebx
ret
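Both compilers split the 7-byte copy the same way: one 4-byte block, one 16-bit word and one byte. In general, short blocks are copied as follows: 4-byte blocks first, then a 16-bit word (if needed), then the last byte (if needed).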
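The rule can be written out in C. This is a minimal sketch of the strategy, not what either compiler literally emits: the helper copy_blocks is a hypothetical name, and the fixed-size memcpy() calls stand in for single 32-bit and 16-bit MOVs (optimizing compilers usually lower such calls to one MOV each, and using them avoids undefined behavior from unaligned pointer casts).

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: copies n bytes using the "4-byte blocks first,
   then an optional 16-bit word, then an optional byte" rule. */
void copy_blocks(unsigned char *dst, const unsigned char *src, size_t n)
{
    size_t i = 0;

    /* whole 32-bit blocks */
    for (; i + 4 <= n; i += 4) {
        uint32_t d;
        memcpy(&d, src + i, 4);   /* stands in for one 32-bit MOV load */
        memcpy(dst + i, &d, 4);   /* and one 32-bit MOV store */
    }

    /* one 16-bit word, if 2 or 3 bytes remain */
    if (i + 2 <= n) {
        uint16_t w;
        memcpy(&w, src + i, 2);
        memcpy(dst + i, &w, 2);
        i += 2;
    }

    /* the last byte, if one remains */
    if (i < n)
        dst[i] = src[i];
}
```

For n = 7 this performs one 4-byte copy, one 2-byte copy and one 1-byte copy at offsets 0, 4 and 6, the same offsets (relative to outbuf+10) that appear in both listings above.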
Structures are also copied using MOV (see 1.24.4 on page 361).
Long blocks
The compilers behave differently in this case.
Listing 3.46: memcpy() example
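#include <string.h>

/* copy 128 bytes: a multiple of 4 */
void memcpy_128(char *inbuf, char *outbuf)
{
    memcpy(outbuf+10, inbuf, 128);
}

/* copy 123 bytes: not a multiple of 4 */
void memcpy_123(char *inbuf, char *outbuf)
{
    memcpy(outbuf+10, inbuf, 123);
}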
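The two sizes in Listing 3.46 can be checked against the same 4-2-1 rule used for short blocks. The sketch below only does that arithmetic under the assumption that the rule is applied; plan_copy and struct copy_plan are hypothetical names, and the code does not model what any particular compiler emits for long blocks (as noted above, compilers differ here).

```c
#include <stddef.h>

/* Hypothetical: how many moves of each width the
   "dwords first, then a word, then a byte" rule needs. */
struct copy_plan {
    size_t dwords;  /* number of 32-bit moves */
    int word;       /* 1 if a 16-bit move is needed */
    int byte;       /* 1 if a final 8-bit move is needed */
};

struct copy_plan plan_copy(size_t n)
{
    struct copy_plan p;
    p.dwords = n / 4;        /* whole 4-byte blocks */
    p.word = (n % 4) >= 2;   /* 2 or 3 bytes left over */
    p.byte = (n % 2) != 0;   /* odd size: one byte left over */
    return p;
}
```

plan_copy(128) gives 32 dword moves and no tail, since 128 = 32 * 4. plan_copy(123) gives 30 dword moves plus one word and one byte (120 + 2 + 1 = 123).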
For copying 128 bytes, MSVC uses a single MOVSD instruction with the REP prefix (because 128 divides evenly by 4, there is no tail left to copy):
Listing 3.47: Optimizing MSVC 2010
_inbuf$ = 8 ; size = 4
_outbuf$ = 12 ; size = 4
_memcpy_128 PROC
push esi
mov esi, DWORD PTR _inbuf$[esp]