3.12. C99 RESTRICT
jne .L6
.L1:
pop rbx rsi rdi rbp r12 r13 r14 r15
ret
Listing 3.54: GCC x64: f2()
f2:
push r13 r12 rbp rdi rsi rbx
mov r13, QWORD PTR 104[rsp]
mov rbp, QWORD PTR 88[rsp]
mov r12, QWORD PTR 96[rsp]
test r13, r13
je .L7
add r13, 1
xor r10d, r10d
mov edi, 1
xor eax, eax
jmp .L10
.L11:
mov rax, rdi
mov rdi, r11
.L10:
mov esi, DWORD PTR [rcx+rax*4]
mov r11d, DWORD PTR [rdx+rax*4]
mov DWORD PTR [r12+rax*4], r10d ; store to update_me[]
add r10d, 123
lea ebx, [rsi+r11]
imul r11d, esi
mov DWORD PTR [r8+rax*4], ebx ; store to sum[]
mov DWORD PTR [r9+rax*4], r11d ; store to product[]
add r11d, ebx
mov DWORD PTR 0[rbp+rax*4], r11d ; store to sum_product[]
lea r11, 1[rdi]
cmp r11, r13
jne .L11
.L7:
pop rbx rsi rdi rbp r12 r13
ret
The difference between the compiled f1() and f2() functions is as follows: in f1(), sum[i] and product[i]
are reloaded in the middle of the loop, while in f2() there is no such reload. The already calculated values
are used, since we “promised” the compiler that nothing will change the values in sum[i] and
product[i] during the execution of the loop’s body, so it is “sure” that there is no need to load the values
from memory again.
Obviously, the second example works faster.
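The C source of f1() and f2() is not reproduced in this part of the section. As a rough sketch, the loop body can be inferred from the store comments in the listing above. The parameter names x and y, the parameter order and the size_t type of s are assumptions made for illustration. The only intended difference between the two functions is the restrict qualifier.

```c
#include <stddef.h>

/* Sketch inferred from the listing; x, y, s and the parameter order
   are assumptions, not taken from the text. */

/* Without restrict: a store to update_me[i] might modify sum[i] or
   product[i], so the compiler has to reload both before adding them. */
void f1(int *x, int *y, int *sum, int *product,
        int *sum_product, int *update_me, size_t s)
{
    for (size_t i = 0; i < s; i++) {
        sum[i] = x[i] + y[i];
        product[i] = x[i] * y[i];
        update_me[i] = (int)i * 123;   /* matches "add r10d, 123" */
        sum_product[i] = sum[i] + product[i];
    }
}

/* With restrict: the caller promises that the arrays do not overlap,
   so the values already held in registers may be reused. */
void f2(int *restrict x, int *restrict y, int *restrict sum,
        int *restrict product, int *restrict sum_product,
        int *restrict update_me, size_t s)
{
    for (size_t i = 0; i < s; i++) {
        sum[i] = x[i] + y[i];
        product[i] = x[i] * y[i];
        update_me[i] = (int)i * 123;
        sum_product[i] = sum[i] + product[i];
    }
}
```

When all six arrays are distinct, both functions produce the same results. Only the generated code differs.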
But what if the pointers in the function’s arguments overlap somehow?
This is on the programmer’s conscience: the behavior is undefined, and the results may be incorrect.
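To make the hazard concrete, here is a hypothetical sketch. The names f1_like and overlap_demo and the array contents are made up for illustration. Without restrict, a call in which update_me and sum point to the same array is well defined: the store to update_me[i] overwrites sum[i], and the reload picks up the new value. Making the same call through a restrict-qualified version would be undefined behavior, and a compiler that keeps the old sum in a register could then produce a different sum_product[].

```c
#include <stddef.h>

/* Hypothetical non-restrict function, modeled on the loop discussed above. */
void f1_like(const int *x, const int *y, int *sum, int *product,
             int *sum_product, int *update_me, size_t s)
{
    for (size_t i = 0; i < s; i++) {
        sum[i] = x[i] + y[i];
        product[i] = x[i] * y[i];
        update_me[i] = (int)i * 123;          /* overwrites sum[i] if the arrays overlap */
        sum_product[i] = sum[i] + product[i]; /* sum[i] has to be reloaded here */
    }
}

/* Calls f1_like() with update_me aliasing sum and returns sum_product[1]. */
int overlap_demo(void)
{
    int x[2] = {1, 2}, y[2] = {3, 4};
    int shared[2], product[2], sum_product[2];

    f1_like(x, y, shared, product, sum_product, shared, 2);

    /* i = 1: sum[1] = 6 is overwritten by 123, product[1] = 8 */
    return sum_product[1];   /* 123 + 8 = 131, not 6 + 8 = 14 */
}
```

This is why the compiler cannot drop the reload in the non-restrict case: a legal call exists where the reloaded value differs from the one computed earlier.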
Let’s go back to Fortran.
Compilers of this programming language treat all pointers this way (as non-overlapping), so back when it was
not possible to use restrict in C, Fortran could generate faster code in these cases.
How practical is it?
It helps in cases when a function works with several big blocks in memory.
There are a lot of such functions in linear algebra, for instance.
Supercomputers/HPC^14 are very busy with linear algebra, so that is probably why Fortran is traditionally
still used there [Eugene Loh, The Ideal HPC Programming Language, (2010)].
But when the number of iterations is not very big, the speed boost may not be significant.
(^14) High-Performance Computing