CHAPTER 40. DUFF’S DEVICE CHAPTER 40. DUFF’S DEVICE
Chapter 40
Duff’s device
Duff’s device^1 is an unrolled loop with the possibility to jump inside it. The unrolled loop is implemented using a
fallthrough switch() statement.
We would use here a slightly simplified version of Tom Duff’s original code.
Let’s say, we need to write a function that clears a region in memory. One can come with a simple loop, clearing byte by byte.
It’s obviously slow, since all modern computers have much wider memory bus. So the better way is to clear the memory
region using 4 or 8 byte blocks. Since we are going to work with a 64-bit example here, we are going to clear the memory
in 8 byte blocks. So far so good. But what about the tail? Memory clearing routine can also be called for regions of size
that’s not a multiple of 8.
So here is the algorithm:
- calculate the number of 8-byte blocks, clear them using 8-byte (64-bit) memory accesses;
- calculate the size of the tail, clear it using 1-byte memory accesses.
The second step can be implemented using a simple loop. But let’s implement it as an unrolled loop:
#include <stdint.h>
#include <stdio.h>
void bzero(uint8_t* dst, size_t count)
{
int i;
if (count&(~7))
// work out 8-byte blocks
for (i=0; i<count>>3; i++)
{
*(uint64_t*)dst=0;
dst=dst+8;
};
// work out the tail
switch(count & 7)
{
case 7: dst++ = 0;
case 6: dst++ = 0;
case 5: dst++ = 0;
case 4: dst++ = 0;
case 3: dst++ = 0;
case 2: dst++ = 0;
case 1: *dst++ = 0;
case 0: // do nothing
break;
}
}
Let’s first understand how the calculation is done. The memory region size comes as a 64-bit value. And this value can be
divided in two parts:
(^1) wikipedia