Reverse Engineering for Beginners

CHAPTER 39. LOOPS: SEVERAL ITERATORS

GCC (Linaro) 4.9 for ARM64 does the same, but it precalculates the last index of a1 instead of a2, which, of course, has the
same effect:

Listing 39.3: Optimizing GCC (Linaro) 4.9 ARM64

; X0=a1
; X1=a2
; X2=cnt
f:
cbz x2, .L1 ; cnt==0? exit then
; calculate last element of "a1" array
add x2, x2, x2, lsl 1
; X2=X2+X2<<1=X2+X2*2=X2*3
mov x3, 0
lsl x2, x2, 2
; X2=X2<<2=X2*4=X2*3*4=X2*12
.L3:
ldr w4, [x1],28 ; load at X1, add 28 to X1 (post-increment)
str w4, [x0,x3] ; store at X0+X3=a1+X3
add x3, x3, 12 ; shift X3
cmp x3, x2 ; end?
bne .L3
.L1:
ret
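To make the arithmetic easier to follow, here is a rough C rendering of the listing above. It is a sketch, not compiler output: the function name f_arm64_style and the variable names are made up for illustration, and 4-byte array elements are assumed. The strides of 28 and 12 bytes correspond to 7 and 3 such elements, which is consistent with a source loop of the form a1[i*3]=a2[i*7] (inferred from the listing).

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical C equivalent of Listing 39.3; assumes 4-byte elements. */
void f_arm64_style(int32_t *a1, const int32_t *a2, size_t cnt)
{
    if (cnt == 0)                              /* cbz x2, .L1 */
        return;

    /* end offset in bytes: (cnt + cnt*2) * 4 = cnt*12 */
    size_t end = (cnt + (cnt << 1)) << 2;      /* add ... lsl 1 ; lsl x2, x2, 2 */
    size_t off = 0;                            /* X3: byte offset into a1 */
    const unsigned char *src = (const unsigned char *)a2;  /* X1 */
    unsigned char *dst = (unsigned char *)a1;               /* X0 */

    do {
        int32_t w;
        memcpy(&w, src, sizeof w);             /* ldr w4, [x1],28 (load...) */
        src += 28;                             /* ...then post-increment X1 */
        memcpy(dst + off, &w, sizeof w);       /* str w4, [x0,x3] */
        off += 12;                             /* add x3, x3, 12 */
    } while (off != end);                      /* cmp x3, x2 ; bne .L3 */
}
```

Note that there is no element counter at all: the loop stops when the byte offset into a1 reaches cnt*12.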

GCC 4.4.5 for MIPS does the same:

Listing 39.4: Optimizing GCC 4.4.5 for MIPS (IDA)

; $a0=a1
; $a1=a2
; $a2=cnt
f:
; jump to loop check code:
beqz $a2, locret_24
; initialize counter (i) at 0:
move $v0, $zero ; branch delay slot

loc_8:
; load 32-bit word at $a1
lw $a3, 0($a1)
; increment counter (i):
addiu $v0, 1
; check for finish (compare "i" in $v0 and "cnt" in $a2):
sltu $v1, $v0, $a2
; store 32-bit word at $a0:
sw $a3, 0($a0)
; add 0x1C (28) to $a1 at each iteration:
addiu $a1, 0x1C
; jump to loop body if i<cnt:
bnez $v1, loc_8
; add 0xC (12) to $a0 at each iteration:
addiu $a0, 0xC ; branch delay slot

locret_24:
jr $ra
or $at, $zero ; branch delay slot, NOP
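The MIPS version keeps a real counter and moves both pointers on each iteration. A rough C rendering might look like the following. This is a sketch only: the name f_mips_style and the local variables are invented for illustration, and a 4-byte int is assumed, so that 0x1C bytes equal 7 elements and 0xC bytes equal 3.

```c
#include <stddef.h>

/* Hypothetical C equivalent of Listing 39.4; assumes sizeof(int) == 4. */
void f_mips_style(int *a1, const int *a2, size_t cnt)
{
    if (cnt == 0)            /* beqz $a2, locret_24 */
        return;

    size_t i = 0;            /* move $v0, $zero */
    int more;

    do {
        int w = *a2;         /* lw $a3, 0($a1) */
        i++;                 /* addiu $v0, 1 */
        more = (i < cnt);    /* sltu $v1, $v0, $a2 */
        *a1 = w;             /* sw $a3, 0($a0) */
        a2 += 7;             /* addiu $a1, 0x1C (28 bytes) */
        a1 += 3;             /* addiu $a0, 0xC (12 bytes, delay slot) */
    } while (more);          /* bnez $v1, loc_8 */
}
```

The order of statements follows the listing: the comparison is done before the store, and the last pointer increment sits in the branch delay slot, so it is executed on every iteration, including the final one.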

39.3 Intel C++ 2011 case.

Compiler optimizations can look weird and still be correct. Here is what the Intel C++ 2011 compiler does:

Listing 39.5: Optimizing Intel C++ 2011 x64

f PROC
; parameter 1: rcx = a1
; parameter 2: rdx = a2
; parameter 3: r8 = cnt
.B1.1:: ; Preds .B1.0
