1.16. LOOPS
push ebp
mov ebp, esp
and esp, 0FFFFFFF0h
sub esp, 10h
mov [esp+10h+var_10], 2
call printing_function
mov [esp+10h+var_10], 3
call printing_function
mov [esp+10h+var_10], 4
call printing_function
mov [esp+10h+var_10], 5
call printing_function
mov [esp+10h+var_10], 6
call printing_function
mov [esp+10h+var_10], 7
call printing_function
mov [esp+10h+var_10], 8
call printing_function
mov [esp+10h+var_10], 9
call printing_function
xor eax, eax
leave
retn
main endp
Huh, GCC just unwound our loop.
Loop unwinding (also called loop unrolling) has an advantage when there are only a few iterations: we can cut some
execution time by removing all the loop support instructions (the counter increment, the comparison and the conditional
jump). On the other hand, the resulting code is obviously larger.
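To make the trade-off concrete, here is a minimal C sketch of the two shapes. It assumes the source loop was `for (i=2; i<10; i++) printing_function(i);`, which is what the constants 2 to 9 in the listing suggest. It also replaces printing_function() with a hypothetical version that records its argument into a `trace` array instead of printing, so the two versions can be compared; `trace`, `MAX_CALLS`, `rolled()` and `unrolled()` are illustrative names, not from the original.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for printing_function(): instead of printing,
   it records each argument so the two versions can be compared. */
#define MAX_CALLS 128
static int trace[MAX_CALLS];
static size_t trace_len = 0;

static void printing_function(int i)
{
    assert(trace_len < MAX_CALLS);   /* guard against overflowing the buffer */
    trace[trace_len++] = i;
}

/* The loop as (presumably) written: increment, compare and jump
   are executed on every iteration. */
void rolled(void)
{
    for (int i = 2; i < 10; i++)
        printing_function(i);
}

/* What GCC emitted above: eight calls with constant arguments,
   no counter, no comparison, no jump. */
void unrolled(void)
{
    printing_function(2);
    printing_function(3);
    printing_function(4);
    printing_function(5);
    printing_function(6);
    printing_function(7);
    printing_function(8);
    printing_function(9);
}
```

Calling rolled() and then unrolled() (resetting `trace_len` in between) leaves the same sequence 2, 3, ..., 9 in `trace` both times; only the unrolled version avoids the per-iteration bookkeeping, at the cost of repeating the call eight times.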
Big unrolled loops are not recommended nowadays, because bigger functions may require a bigger
cache footprint^100.
OK, let’s increase the upper bound of the i variable to 100 and try again. GCC does:
Listing 1.166: GCC
public main
main proc near
var_20 = dword ptr -20h
push ebp
mov ebp, esp
and esp, 0FFFFFFF0h
push ebx
mov ebx, 2 ; i=2
sub esp, 1Ch
; aligning label loc_80484D0 (loop body begin) by 16-byte border:
nop
loc_80484D0:
; pass (i) as first argument to printing_function():
mov [esp+20h+var_20], ebx
add ebx, 1 ; i++
call printing_function
cmp ebx, 64h ; i==100?
jnz short loc_80484D0 ; if not, continue
add esp, 1Ch
xor eax, eax ; return 0
pop ebx
mov esp, ebp
pop ebp
retn
main endp
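This listing does not look like a literal `for` loop: there is no check before the first call, and the exit test is `!=` rather than `<`. Below is a minimal C sketch of what the code does, assuming the source was `for (i=2; i<100; i++) printing_function(i);` and again using a hypothetical recording printing_function() (the names `trace`, `loop_as_written()` and `loop_as_compiled()` are illustrative). Since GCC knows at compile time that 2 < 100, it can enter the body unconditionally, and since i grows by exactly 1 starting from 2, `i != 100` stops the loop at the same point as `i < 100`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical recording stand-in for printing_function(). */
#define MAX_CALLS 128
static int trace[MAX_CALLS];
static size_t trace_len = 0;

static void printing_function(int i)
{
    assert(trace_len < MAX_CALLS);   /* do not overflow the trace buffer */
    trace[trace_len++] = i;
}

/* Assumed source-level loop. */
void loop_as_written(void)
{
    for (int i = 2; i < 100; i++)
        printing_function(i);
}

/* Shape of Listing 1.166, with EBX playing the role of i. */
void loop_as_compiled(void)
{
    int i = 2;                    /* mov ebx, 2 */
    do {
        int arg = i;              /* mov [esp+20h+var_20], ebx */
        i = i + 1;                /* add ebx, 1 (before the call) */
        printing_function(arg);   /* call printing_function */
    } while (i != 100);           /* cmp ebx, 64h / jnz short loc_80484D0 */
}
```

Both functions leave `trace` holding 2, 3, ..., 99 (98 calls). Incrementing i before the call is safe because the argument has already been stored on the stack. Keeping i in EBX across the call works because EBX is callee-saved in the usual x86 calling conventions, so printing_function() must preserve it; that is also why main() itself pushes and pops EBX.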
(^100) A very good article about it: [Ulrich Drepper, What Every Programmer Should Know About Memory, (2007)] (^101). More
recommendations about loop unrolling from Intel are here: [Intel® 64 and IA-32 Architectures Optimization Reference Manual,
(2014)], section 3.4.1.7.