Reverse Engineering for Beginners

CHAPTER 40. DUFF’S DEVICE CHAPTER 40. DUFF’S DEVICE

Chapter 40

Duff’s device

Duff’s device^1 is an unrolled loop with the possibility to jump inside it. The unrolled loop is implemented using a
fallthrough switch() statement.

We would use here a slightly simplified version of Tom Duff’s original code.

Let’s say, we need to write a function that clears a region in memory. One can come with a simple loop, clearing byte by byte.
It’s obviously slow, since all modern computers have much wider memory bus. So the better way is to clear the memory
region using 4 or 8 byte blocks. Since we are going to work with a 64-bit example here, we are going to clear the memory
in 8 byte blocks. So far so good. But what about the tail? Memory clearing routine can also be called for regions of size
that’s not a multiple of 8.

So here is the algorithm:

calculate the number of 8-byte blocks, clear them using 8-byte (64-bit) memory accesses;

calculate the size of the tail, clear it using 1-byte memory accesses.

The second step can be implemented using a simple loop. But let’s implement it as an unrolled loop:

#include <stdint.h>
#include <stdio.h>

void bzero(uint8_t* dst, size_t count)
{
int i;

if (count&(~7)) // work out 8-byte blocks for (i=0; i<count>>3; i++) { *(uint64_t*)dst=0; dst=dst+8; };

// work out the tail
switch(count & 7)
{
case 7: dst++ = 0;
case 6: dst++ = 0;
case 5: dst++ = 0;
case 4: dst++ = 0;
case 3: dst++ = 0;
case 2: dst++ = 0;
case 1: *dst++ = 0;
case 0: // do nothing
break;
}
}

Let’s first understand how the calculation is done. The memory region size comes as a 64-bit value. And this value can be
divided in two parts:

(^1) wikipedia

Reverse Engineering for Beginners

Chapter 40

Duff’s device

Get our desktop app

Company

Features

Documentation

Resources