Assembly Language for Beginners


3.27. OPENMP


add esp, 12
; Line 56
mov ecx, DWORD PTR _checked
mov eax, ecx
cdq
mov esi, 100000 ; 000186a0H
idiv esi
test edx, edx
jne SHORT $LN1@check_nonc
; Line 57
push ecx
push OFFSET ??_C@_0M@NPNHLIOO@checked?$DN?$CFd?6?$AA@
call _printf
pop ecx
pop ecx
$LN1@check_nonc:
push DWORD PTR $vcomp$critsect$
call __vcomp_leave_critsect
pop ecx


As it turns out, the vcomp_atomic_add_i4() function in the vcomp*.dll is just a tiny function with the LOCK
XADD instruction^59 in it.
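
To make the LOCK XADD remark concrete, here is a minimal C11 sketch of an atomic fetch-and-add. It is not the source of the vcomp function; the names counter and counter_add are hypothetical. On x86, compilers typically translate atomic_fetch_add() into LOCK XADD when the returned value is used, but the exact instruction chosen is up to the compiler.

```c
#include <stdatomic.h>

/* Hypothetical shared counter, standing in for a variable
   that several OpenMP threads update atomically. */
static atomic_int counter;

/* Atomically adds delta to the counter and returns the value
   the counter held BEFORE the addition. XADD has the same
   shape: it leaves the old memory value in its source register. */
static int counter_add(int delta)
{
    return atomic_fetch_add(&counter, delta);
}
```

Because the read, the addition, and the write happen as one indivisible step, two threads calling counter_add(1) at the same moment can never lose an update.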


vcomp_enter_critsect() eventually calls the Win32 API function
EnterCriticalSection()^60.


3.27.2 GCC


GCC 4.8.1 produces a program that prints exactly the same statistics table,
so GCC’s implementation divides the loop into parts in the same fashion.


Listing 3.125: GCC 4.8.1
mov edi, OFFSET FLAT:main._omp_fn.0
call GOMP_parallel_start
mov edi, 0
call main._omp_fn.0
call GOMP_parallel_end

Unlike MSVC’s implementation, the GCC code starts 3 threads and runs the fourth in the current
thread, so there are 4 threads instead of the 5 in MSVC.
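
The "start N-1 threads, then do one share of the work yourself" pattern can be sketched with POSIX threads. This is only an illustration of the calling shape seen in the listing (start, direct call, end), not libgomp's actual code; run_team, body, calls and TEAM_SIZE are hypothetical names, and the team size of 4 is taken from the text.

```c
#include <pthread.h>
#include <stdatomic.h>

#define TEAM_SIZE 4  /* assumption: 4 threads in total, as in the text */

/* Hypothetical stand-in for the outlined loop body (main._omp_fn.0):
   it only counts how many threads have executed it. */
static atomic_int calls;

static void *body(void *data)
{
    (void)data;
    atomic_fetch_add(&calls, 1);
    return NULL;
}

/* Runs fn on `team` threads in total: team-1 newly created threads
   plus the calling thread itself.
   Returns 0 on success, -1 on a bad team size or a failed create. */
static int run_team(void *(*fn)(void *), void *data, int team)
{
    pthread_t workers[TEAM_SIZE - 1];
    int started = 0;
    int rc = 0;

    if (team < 1 || team > TEAM_SIZE)
        return -1;

    /* Role played by GOMP_parallel_start in the listing. */
    for (; started < team - 1; started++) {
        if (pthread_create(&workers[started], NULL, fn, data) != 0) {
            rc = -1;
            break;
        }
    }

    /* The current thread takes the last share: the direct call. */
    fn(data);

    /* Role played by GOMP_parallel_end: wait for the others. */
    for (int i = 0; i < started; i++)
        pthread_join(workers[i], NULL);

    return rc;
}
```

Calling run_team(body, NULL, 4) creates three threads and executes body four times in total. On older toolchains the program may need to be built with -pthread.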


Here is the main._omp_fn.0 function:


Listing 3.126: GCC 4.8.1

main._omp_fn.0:
push rbp
mov rbp, rsp
push rbx
sub rsp, 40
mov QWORD PTR [rbp-40], rdi
call omp_get_num_threads
mov ebx, eax
call omp_get_thread_num
mov esi, eax
mov eax, 2147483647 ; 0x7FFFFFFF
cdq
idiv ebx
mov ecx, eax
mov eax, 2147483647 ; 0x7FFFFFFF
cdq
idiv ebx
mov eax, edx
cmp esi, eax
jl .L15
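
The visible part of the listing divides 0x7FFFFFFF by the thread count twice, keeping the quotient in ECX and the remainder in EAX, and then compares the thread number with the remainder. The listing breaks off at that point, so the following C sketch is an assumption about where the code is heading: it shows the usual way a static schedule splits an iteration range so that the first "remainder" threads each get one extra iteration. The names struct range and static_chunk are hypothetical.

```c
#include <assert.h>

/* Half-open iteration range [start, end). */
struct range {
    int start;
    int end;
};

/* Splits `total` iterations among `nthreads` threads and returns
   the part that belongs to thread number `tid`. */
static struct range static_chunk(int total, int nthreads, int tid)
{
    struct range out;
    int q, r;

    assert(total >= 0 && nthreads > 0 && tid >= 0 && tid < nthreads);

    q = total / nthreads;  /* quotient: result of the first IDIV   */
    r = total % nthreads;  /* remainder: result of the second IDIV */

    /* Same comparison as "cmp esi, eax / jl": threads whose number
       is below the remainder take one extra iteration (assumed). */
    if (tid < r) {
        q++;
        r = 0;
    }

    out.start = tid * q + r;
    out.end = out.start + q;
    return out;
}
```

With 10 iterations and 4 threads, for example, the four ranges are [0, 3), [3, 6), [6, 8) and [8, 10): the two leftover iterations go to threads 0 and 1, and the ranges cover the loop without gaps or overlaps.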


(^59) Read more about the LOCK prefix: .1.6 on page 1026
(^60) You can read more about critical sections here: 6.5.4 on page 787
