Reverse Engineering for Beginners

CHAPTER 14. LOOPS


MOV r3,#0
|L0.4|
; all bytes copied?
CMP r3,r2
; the following block is executed only if the "less than" condition holds,
; i.e., if R3<R2 or i<size.
; load byte at R1+i:
LDRBCC r12,[r1,r3]
; store byte at R0+i:
STRBCC r12,[r0,r3]
; i++
ADDCC r3,r3,#1
; the last instruction of the "conditional block".
; jump to loop begin if i<size
; do nothing otherwise (i.e., if i>=size)
BCC |L0.4|
; return
BX lr
ENDP


That’s why there is only one branch instruction instead of 2.
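
The control flow of the ARM listing can be restated in C as a rough sketch. This is not the original source of the function; the names my_memcpy_sketch and my_memcpy_sketch_check are hypothetical and exist only for illustration. The point is that CMP sets the flags once per pass, and every instruction with the CC suffix (including the final BCC) reuses that single result:

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C rendering of the ARM listing: the flags are set once per
   pass by CMP, and every -CC instruction after it reuses that same result. */
void my_memcpy_sketch(unsigned char *dst, const unsigned char *src, size_t cnt)
{
    size_t i = 0;                      /* MOV r3,#0 */
    for (;;) {
        int lower = (i < cnt);         /* CMP r3,r2 ("CC" = unsigned lower) */
        if (lower) {
            unsigned char b = src[i];  /* LDRBCC r12,[r1,r3] */
            dst[i] = b;                /* STRBCC r12,[r0,r3] */
            i++;                       /* ADDCC r3,r3,#1 (flags untouched) */
        }
        if (!lower)                    /* BCC |L0.4| not taken */
            break;
    }                                  /* BX lr */
}

/* Small self-check (hypothetical helper): copy 5 bytes and compare. */
int my_memcpy_sketch_check(void)
{
    const unsigned char src[5] = { 1, 2, 3, 4, 5 };
    unsigned char dst[5] = { 0 };
    my_memcpy_sketch(dst, src, sizeof src);
    assert(memcmp(dst, src, sizeof src) == 0);
    return 1;
}
```

Since ADDCC has no S suffix, it does not modify the flags, so the BCC at the end still sees the outcome of the CMP at the top of the pass. That is what the single `lower` variable models here.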


14.2.3 MIPS


Listing 14.14: GCC 4.4.5 optimized for size (-Os) (IDA)

my_memcpy:
; jump to loop check part:
b loc_14
; initialize counter (i) at 0
; it will always reside in $v0:
move $v0, $zero ; branch delay slot


loc_8: # CODE XREF: my_memcpy+1C
; load byte as unsigned at address in $t0 to $v1:
lbu $v1, 0($t0)
; increment counter (i):
addiu $v0, 1
; store byte at address in $a3:
sb $v1, 0($a3)


loc_14: # CODE XREF: my_memcpy
; check if counter (i) in $v0 is still less than 3rd function argument ("cnt" in $a2):
sltu $v1, $v0, $a2
; form address of byte in source block:
addu $t0, $a1, $v0
; $t0 = $a1+$v0 = src+i
; jump to loop body if counter is still less than "cnt":
bnez $v1, loc_8
; form address of byte in destination block ($a3 = $a0+$v0 = dst+i):
addu $a3, $a0, $v0 ; branch delay slot
; finish if BNEZ wasn't triggered:
jr $ra
or $at, $zero ; branch delay slot, NOP


Here we have two new instructions: LBU (“Load Byte Unsigned”) and SB (“Store Byte”). Just like in ARM, all MIPS registers are
32 bits wide; there are no byte-wide parts like in x86. So when dealing with single bytes, we have to allocate whole 32-bit
registers for them. LBU loads a byte and clears all the other bits (“Unsigned”). The LB (“Load Byte”) instruction, on the other
hand, sign-extends the loaded byte to a 32-bit value. SB just writes the lowest 8 bits of a register to memory as a byte.
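
The difference between the two loads and the behaviour of the store can be modelled in C. This is only a sketch: the function names load_byte_unsigned, load_byte_signed and store_byte are hypothetical, and a uint32_t stands in for a 32-bit MIPS register.

```c
#include <assert.h>
#include <stdint.h>

/* Model of LBU: the byte lands in bits 0..7, bits 8..31 are cleared. */
uint32_t load_byte_unsigned(const uint8_t *p)
{
    return (uint32_t)*p;
}

/* Model of LB: bit 7 of the byte is copied into bits 8..31.
   The XOR/subtract pair sign-extends using unsigned arithmetic only. */
uint32_t load_byte_signed(const uint8_t *p)
{
    uint32_t v = *p;
    return (v ^ 0x80u) - 0x80u;
}

/* Model of SB: only the lowest 8 bits of the register reach memory. */
void store_byte(uint8_t *p, uint32_t reg)
{
    *p = (uint8_t)(reg & 0xFFu);
}
```

For the byte 0xFE, the LBU model yields 0x000000FE, while the LB model yields 0xFFFFFFFE. Storing 0x12345678 with the SB model writes just 0x78.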


14.2.4 Vectorization


Optimizing GCC can do much more on this example: 25.1.2 on page 396.
