Reverse Engineering for Beginners

CHAPTER 25. SIMD


exception is raised in case of overflow and no flags are set; only the low 32 bits of each result are stored. If
one of PADDD's operands is the address of a value in memory, then the address must be aligned on a 16-byte boundary.
If it is not aligned, an exception will be triggered^5.


  • MOVDQA (Move Aligned Double Quadword) is the same as MOVDQU, but requires the address of the value in memory to
    be aligned on a 16-byte boundary. If it is not aligned, an exception will be raised. MOVDQA works faster than MOVDQU,
    but requires this alignment.
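The PADDD behaviour described in the list above can be modelled in portable C. This is only a sketch: the type xmm_model and the functions paddd_model and paddd_model_example are hypothetical names introduced here for illustration. It treats a register as four 32-bit lanes and relies on unsigned arithmetic wrapping modulo 2^32, so that a carry out of a lane is simply dropped.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of an XMM register holding four 32-bit lanes. */
typedef struct { uint32_t lane[4]; } xmm_model;

/* Model of PADDD: four independent 32-bit additions.
   Unsigned addition wraps modulo 2^32, so only the low 32 bits
   of each sum are kept and nothing is signalled on overflow. */
static xmm_model paddd_model(xmm_model a, xmm_model b)
{
    xmm_model r;
    for (int i = 0; i < 4; i++)
        r.lane[i] = a.lane[i] + b.lane[i]; /* carry out of bit 31 is discarded */
    return r;
}

/* Small self-check showing the wraparound in lanes 1 and 2. */
static void paddd_model_example(void)
{
    xmm_model a = {{ 1, 0xFFFFFFFFu, 0x7FFFFFFFu, 100 }};
    xmm_model b = {{ 2, 1,           1,           200 }};
    xmm_model r = paddd_model(a, b);

    assert(r.lane[0] == 3);
    assert(r.lane[1] == 0);            /* 0xFFFFFFFF + 1 wraps to 0 */
    assert(r.lane[2] == 0x80000000u);  /* no exception, bits are just stored */
    assert(r.lane[3] == 300);
}
```

Each lane is computed separately, so an overflow in one lane never affects its neighbour.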


So, these SSE2 instructions are to be executed only if there are more than 4 pairs to work on and the pointer ar3 is
aligned on a 16-byte boundary.
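The entry condition just described can be written as a small predicate in C. This is a hedged sketch rather than what the compiler emits: is_aligned16 and can_use_sse2_path are hypothetical helper names, and the test assumes that converting a pointer to uintptr_t yields its numeric address, which holds on mainstream x86 platforms.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if p lies on a 16-byte boundary,
   i.e. the low four bits of its address are zero. */
static int is_aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical helper mirroring the condition in the text:
   more than 4 pairs to process and an aligned destination pointer. */
static int can_use_sse2_path(size_t sz, const int *ar3)
{
    return sz > 4 && is_aligned16(ar3);
}
```

Because an int is 4 bytes here, advancing an aligned int pointer by 4 elements lands on the next 16-byte boundary, while advancing it by 1 does not.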


Also, if ar2 is aligned on a 16-byte boundary as well, this fragment of code is to be executed:


movdqu xmm0, xmmword ptr [ebx+edi*4] ; ar1+i*4
paddd xmm0, xmmword ptr [esi+edi*4] ; ar2+i*4
movdqa xmmword ptr [eax+edi*4], xmm0 ; ar3+i*4


Otherwise, the value from ar2 is to be loaded into XMM0 using MOVDQU, which does not require an aligned pointer, but may
work slower:


movdqu xmm1, xmmword ptr [ebx+edi*4] ; ar1+i*4
movdqu xmm0, xmmword ptr [esi+edi*4] ; ar2+i*4 is not 16-byte aligned, so load it to XMM0
paddd xmm1, xmm0
movdqa xmmword ptr [eax+edi*4], xmm1 ; ar3+i*4


In all other cases, non-SSE2 code is to be executed.
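The three cases above can be sketched with SSE2 intrinsics in C. This is an illustration under stated assumptions, not the compiler's own output: add_arrays and aligned16 are hypothetical names, the element-wise sum ar3[i] = ar1[i] + ar2[i] is inferred from the listing comments, and the instruction named in each comment is what the intrinsic typically compiles to. The SSE2 part is only compiled when the compiler reports SSE2 support; otherwise the scalar loop handles everything.

```c
#include <stddef.h>
#include <stdint.h>

#if defined(__SSE2__) || defined(_M_X64) || (defined(_M_IX86_FP) && _M_IX86_FP >= 2)
#include <emmintrin.h> /* SSE2 intrinsics */
#define HAVE_SSE2 1
#endif

/* Nonzero if p lies on a 16-byte boundary. */
static int aligned16(const void *p)
{
    return ((uintptr_t)p & 15u) == 0;
}

/* Hypothetical function: ar3[i] = ar1[i] + ar2[i] for i in [0, sz). */
void add_arrays(size_t sz, const int *ar1, const int *ar2, int *ar3)
{
    size_t i = 0;

#ifdef HAVE_SSE2
    if (sz > 4 && aligned16(ar3)) {
        if (aligned16(ar2)) {
            /* ar2 aligned: its load may be folded into PADDD's memory operand */
            for (; i + 4 <= sz; i += 4) {
                __m128i a = _mm_loadu_si128((const __m128i *)(ar1 + i)); /* MOVDQU */
                __m128i b = _mm_load_si128((const __m128i *)(ar2 + i));  /* aligned load */
                _mm_store_si128((__m128i *)(ar3 + i),
                                _mm_add_epi32(a, b));                    /* PADDD, MOVDQA */
            }
        } else {
            /* ar2 not aligned: load it with MOVDQU first, then add registers */
            for (; i + 4 <= sz; i += 4) {
                __m128i a = _mm_loadu_si128((const __m128i *)(ar1 + i)); /* MOVDQU */
                __m128i b = _mm_loadu_si128((const __m128i *)(ar2 + i)); /* MOVDQU */
                _mm_store_si128((__m128i *)(ar3 + i),
                                _mm_add_epi32(a, b));                    /* PADDD, MOVDQA */
            }
        }
    }
#endif

    /* Non-SSE2 path, also used for the leftover elements.
       The unsigned cast keeps wraparound behaviour like PADDD. */
    for (; i < sz; i++)
        ar3[i] = (int)((unsigned)ar1[i] + (unsigned)ar2[i]);
}
```

The loop index advances by 4 elements (16 bytes) per step, so a pointer that is aligned on entry stays aligned for every vector access.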


GCC


GCC may also vectorize in simple cases^6, if the -O3 option is used and SSE2 support is turned on: -msse2.


What we get (GCC 4.4.1):


; f(int, int *, int *, int *)
public _Z1fiPiS_S_
_Z1fiPiS_S_ proc near


var_18 = dword ptr -18h
var_14 = dword ptr -14h
var_10 = dword ptr -10h
arg_0 = dword ptr 8
arg_4 = dword ptr 0Ch
arg_8 = dword ptr 10h
arg_C = dword ptr 14h


push ebp
mov ebp, esp
push edi
push esi
push ebx
sub esp, 0Ch
mov ecx, [ebp+arg_0]
mov esi, [ebp+arg_4]
mov edi, [ebp+arg_8]
mov ebx, [ebp+arg_C]
test ecx, ecx
jle short loc_80484D8
cmp ecx, 6
lea eax, [ebx+10h]
ja short loc_80484E8

loc_80484C1: ; CODE XREF: f(int,int *,int *,int *)+4B
; f(int,int *,int *,int *)+61 ...
xor eax, eax
nop
lea esi, [esi+0]


loc_80484C8: ; CODE XREF: f(int,int *,int *,int *)+36


(^5) More about data alignment: Wikipedia: Data structure alignment
(^6) More about GCC vectorization support: http://go.yurichev.com/17083
