CHAPTER 39. LOOPS: SEVERAL ITERATORS
	for (i=0; i<cnt; i++)
	{
		a1[idx1]=a2[idx2];
		idx1+=3;
		idx2+=7;
	};
};
So, at the cost of updating 3 iterators at each iteration instead of one, we can remove two multiplication operations.
39.2 Two iterators
GCC 4.9 does even more, leaving only 2 iterators:
Listing 39.2: Optimizing GCC 4.9 x64
; RDI=a1
; RSI=a2
; RDX=cnt
f:
test rdx, rdx ; cnt==0? exit then
je .L1
; calculate last element address in "a2" and leave it in RDX
lea rax, [0+rdx*4]
; RAX=RDX*4=cnt*4
sal rdx, 5
; RDX=RDX<<5=cnt*32
sub rdx, rax
; RDX=RDX-RAX=cnt*32-cnt*4=cnt*28
add rdx, rsi
; RDX=RDX+RSI=a2+cnt*28
.L3:
mov eax, DWORD PTR [rsi]
add rsi, 28
add rdi, 12
mov DWORD PTR [rdi-12], eax
cmp rsi, rdx
jne .L3
.L1:
rep ret
There is no counter variable any more: GCC concluded that it is not needed. The end position in the a2 array is calculated
before the loop begins (which is easy: cnt*7), and that is how the loop is stopped: it keeps iterating until the second index
reaches this precalculated value.
You can read more about multiplication using shifts/additions/subtractions here: 16.1.3 on page 200.
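The shift-and-subtract sequence from the listing can be sketched in C as follows. This is only an illustration: the function names mul28_shift_sub and mul28_matches are hypothetical, and the factor 28 assumes 4-byte int elements (7 elements * 4 bytes), as on the x64 target shown above.

```c
#include <stddef.h>

/* Compute cnt*28 the way the listing does:
   cnt*28 = cnt*32 - cnt*4 = (cnt<<5) - (cnt<<2). */
size_t mul28_shift_sub(size_t cnt)
{
	size_t by4 = cnt << 2;   /* lea rax, [0+rdx*4] */
	size_t by32 = cnt << 5;  /* sal rdx, 5 */
	return by32 - by4;       /* sub rdx, rax */
}

/* Return 1 if the shift/subtract form agrees with plain multiplication
   for every cnt below limit, 0 otherwise. */
int mul28_matches(size_t limit)
{
	size_t cnt;

	for (cnt = 0; cnt < limit; cnt++)
		if (mul28_shift_sub(cnt) != cnt * 28)
			return 0;
	return 1;
}
```

For example, mul28_shift_sub(10) gives 320-40=280, the same as 10*28.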
The optimized assembly code above can be rewritten into C/C++ like this:
#include <stdio.h>

void f(int *a1, int *a2, size_t cnt)
{
	size_t idx1=0, idx2=0;
	size_t last_idx2=cnt*7;

	// like the compiled code, exit early if there is nothing to copy
	if (cnt==0)
		return;

	// copy from one array to another in some weird scheme
	for (;;)
	{
		a1[idx1]=a2[idx2];
		idx1+=3;
		idx2+=7;
		if (idx2==last_idx2)
			break;
	};
};
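One way to gain some confidence in the two-iterator form is to compare it with a straightforward version that multiplies the counter. The sketch below does that under a few assumptions: the baseline is taken to be a loop doing a1[i*3]=a2[i*7] (inferred from the strides 3 and 7 used above), and the names copy_by_index, copy_two_iterators, same_result and CNT are hypothetical. The second function uses pointers instead of indices, which is closer to what the GCC output does.

```c
#include <stddef.h>
#include <string.h>

enum { CNT = 5 };  /* hypothetical element count for the check */

/* Assumed baseline: index both arrays by multiplying the counter. */
void copy_by_index(int *a1, const int *a2, size_t cnt)
{
	size_t i;

	for (i = 0; i < cnt; i++)
		a1[i * 3] = a2[i * 7];
}

/* Two-iterator form: both pointers advance by a fixed stride and the
   loop stops when the source pointer reaches a2 + cnt*7.
   The caller must provide at least cnt*7 source elements and cnt*3
   destination elements, so that the final pointer values stay valid. */
void copy_two_iterators(int *a1, const int *a2, size_t cnt)
{
	const int *end;

	if (cnt == 0)
		return;             /* test rdx, rdx / je .L1 */
	end = a2 + cnt * 7;     /* precalculated stop address */
	do {
		*a1 = *a2;
		a1 += 3;            /* add rdi, 12 */
		a2 += 7;            /* add rsi, 28 */
	} while (a2 != end);    /* cmp rsi, rdx / jne .L3 */
}

/* Return 1 if both versions fill the destination identically. */
int same_result(void)
{
	int src[CNT * 7], dst_a[CNT * 3], dst_b[CNT * 3];
	int i;

	for (i = 0; i < CNT * 7; i++)
		src[i] = i * 10 + 1;
	memset(dst_a, 0, sizeof dst_a);
	memset(dst_b, 0, sizeof dst_b);
	copy_by_index(dst_a, src, CNT);
	copy_two_iterators(dst_b, src, CNT);
	return memcmp(dst_a, dst_b, sizeof dst_a) == 0;
}
```

Note that the stop test uses "not equal" rather than "less than", just like the compiled code, so the early exit for cnt==0 is essential: without it the source pointer would step past the stop address on the first iteration and never match it.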