Reverse Engineering for Beginners


CHAPTER 92. OPENMP


found (thread 3): [hello, world!_1611446522]. seconds spent=3
min[0]=0x00000000 max[0]=0x1fffffff
min[1]=0x20000000 max[1]=0x3fffffff
min[2]=0x40000000 max[2]=0x5fffffff
min[3]=0x60000000 max[3]=0x7ffffffe


Yes, the result is correct, the first 3 bytes are zeroes:


C:...\sha512sum test
000000f4a8fac5a4ed38794da4c1e39f54279ad5d9bb3c5465cdf57adaf60403
df6e3fe6019f5764fc9975e505a7395fed780fee50eb38dd4c0279cb114672e2 *test


The running time is ≈ 2..3 seconds on a 4-core Intel Xeon E3-1220 3.10 GHz. In the task manager we see 5 threads: 1 main
thread + 4 more. No further optimizations are done, to keep this example as small and clear as possible, but it can probably
be made much faster. My CPU has 4 cores, which is why OpenMP started exactly 4 threads.


By looking at the statistics table we can clearly see how the loop was sliced into 4 even parts. Well, almost even: the
last part ends at 0x7ffffffe, so it is one iteration shorter than the others.


There are also pragmas for atomic operations.


Let’s see how this code is compiled:


#pragma omp atomic
checked++;

#pragma omp critical
if ((checked % 100000)==0)
    printf ("checked=%d\n", checked);

Listing 92.3: MSVC 2012
push edi
push OFFSET _checked
call __vcomp_atomic_add_i4
; Line 55
push OFFSET $vcomp$critsect$
call __vcomp_enter_critsect
add esp, 12 ; 0000000cH
; Line 56
mov ecx, DWORD PTR _checked
mov eax, ecx
cdq
mov esi, 100000 ; 000186a0H
idiv esi
test edx, edx
jne SHORT $LN1@check_nonc
; Line 57
push ecx
push OFFSET ??_C@_0M@NPNHLIOO@checked?$DN?$CFd?6?$AA@
call _printf
pop ecx
pop ecx
$LN1@check_nonc:
push DWORD PTR $vcomp$critsect$
call __vcomp_leave_critsect
pop ecx


As it turns out, the vcomp_atomic_add_i4() function in the vcomp*.dll is just a tiny function with the LOCK XADD
instruction^4 in it.


vcomp_enter_critsect() eventually calls the win32 API function EnterCriticalSection()^5.


92.2 GCC


GCC 4.8.1 produces a program which shows exactly the same statistics table, so GCC’s implementation divides the loop into
parts in the same fashion.


(^4) Read more about the LOCK prefix: A.6.1 on page 885
(^5) You can read more about critical sections here: 68.4 on page 699
