CHAPTER 14. LOOPS
What happens here is that space for the i variable is no longer allocated in the local stack; a dedicated register, ESI,
is used for it instead. This is possible in such small functions, where there aren’t many local variables.
One very important thing is that the f() function must not change the value in ESI. Our compiler is sure of that here. And if the
compiler decides to use the ESI register in f() too, its value would have to be saved at the function’s prologue and restored
at the function’s epilogue, almost like in our listing: please note PUSH ESI/POP ESI at the function start and end.
Let’s try GCC 4.4.1 with maximal optimization turned on (-O3 option):
Listing 14.4: Optimizing GCC 4.4.1
main proc near
var_10 = dword ptr -10h
push ebp
mov ebp, esp
and esp, 0FFFFFFF0h
sub esp, 10h
mov [esp+10h+var_10], 2
call printing_function
mov [esp+10h+var_10], 3
call printing_function
mov [esp+10h+var_10], 4
call printing_function
mov [esp+10h+var_10], 5
call printing_function
mov [esp+10h+var_10], 6
call printing_function
mov [esp+10h+var_10], 7
call printing_function
mov [esp+10h+var_10], 8
call printing_function
mov [esp+10h+var_10], 9
call printing_function
xor eax, eax
leave
retn
main endp
Huh, GCC just unwound our loop.
Loop unwinding has an advantage in cases when there aren’t many iterations and we could cut some execution time by
removing all loop support instructions. On the other hand, the resulting code is obviously larger.
Big unrolled loops are not recommended in modern times, because bigger functions may require a bigger cache footprint^1.
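To make the trade-off concrete, here is a minimal C sketch of the same loop in both forms. The bounds (2 to 9) are taken from the listing above. The body of printing_function() is not shown in this section, so record() below is only a hypothetical stand-in that appends its text to a buffer instead of printing it.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for printing_function(): appends "f(i)\n" to a
   caller-supplied buffer so that both variants can be compared. */
static void record(char *out, size_t cap, int i)
{
    size_t len = strlen(out);
    if (len < cap)
        snprintf(out + len, cap - len, "f(%d)\n", i);
}

/* The loop as written: a single call site, plus the loop support code
   (counter initialization, increment, comparison, conditional jump). */
void rolled(char *out, size_t cap)
{
    int i;
    out[0] = '\0';
    for (i = 2; i < 10; i++)
        record(out, cap, i);
}

/* Roughly what the optimizer emitted: eight call sites with constant
   arguments and no counter at all. Faster per iteration, but larger. */
void unrolled(char *out, size_t cap)
{
    out[0] = '\0';
    record(out, cap, 2);
    record(out, cap, 3);
    record(out, cap, 4);
    record(out, cap, 5);
    record(out, cap, 6);
    record(out, cap, 7);
    record(out, cap, 8);
    record(out, cap, 9);
}
```

Calling rolled() and unrolled() on two separate buffers should leave identical text in both. The only difference is in the shape of the generated code: the second form grows linearly with the iteration count, which is why it stops being attractive once the count gets large.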
OK, let’s increase the maximum value of the i variable to 100 and try again. GCC does:
Listing 14.5: GCC
public main
main proc near
var_20 = dword ptr -20h
push ebp
mov ebp, esp
and esp, 0FFFFFFF0h
push ebx
mov ebx, 2 ; i=2
sub esp, 1Ch
; align label loc_80484D0 (start of the loop body) on a 16-byte boundary:
nop
loc_80484D0:
; pass (i) as first argument to printing_function():
mov [esp+20h+var_20], ebx
add ebx, 1 ; i++
call printing_function
(^1) A very good article about it: [Dre07]. Further recommendations on loop unrolling from Intel are here: [Int14, p. 3.4.1.7].