Reverse Engineering for Beginners

(avery) #1

CHAPTER 40. DUFF’S DEVICE CHAPTER 40. DUFF’S DEVICE


Chapter 40


Duff’s device


Duff’s device^1 is an unrolled loop with the possibility to jump inside it. The unrolled loop is implemented using a
fallthrough switch() statement.


We would use here a slightly simplified version of Tom Duff’s original code.


Let’s say, we need to write a function that clears a region in memory. One can come with a simple loop, clearing byte by byte.
It’s obviously slow, since all modern computers have much wider memory bus. So the better way is to clear the memory
region using 4 or 8 byte blocks. Since we are going to work with a 64-bit example here, we are going to clear the memory
in 8 byte blocks. So far so good. But what about the tail? Memory clearing routine can also be called for regions of size
that’s not a multiple of 8.


So here is the algorithm:



  • calculate the number of 8-byte blocks, clear them using 8-byte (64-bit) memory accesses;

  • calculate the size of the tail, clear it using 1-byte memory accesses.


The second step can be implemented using a simple loop. But let’s implement it as an unrolled loop:


#include <stdint.h>
#include <stdio.h>


void bzero(uint8_t* dst, size_t count)
{
int i;


if (count&(~7))
// work out 8-byte blocks
for (i=0; i<count>>3; i++)
{
*(uint64_t*)dst=0;
dst=dst+8;
};

// work out the tail
switch(count & 7)
{
case 7: dst++ = 0;
case 6:
dst++ = 0;
case 5: dst++ = 0;
case 4:
dst++ = 0;
case 3: dst++ = 0;
case 2:
dst++ = 0;
case 1: *dst++ = 0;
case 0: // do nothing
break;
}
}


Let’s first understand how the calculation is done. The memory region size comes as a 64-bit value. And this value can be
divided in two parts:


(^1) wikipedia

Free download pdf