Chapter 93
Itanium
Although it almost failed, Intel Itanium (IA64) is a very interesting architecture. While OOE (out-of-order execution) CPUs
decide at run time how to rearrange their instructions and execute them in parallel, EPIC^1 was an attempt to shift these
decisions to the compiler: to let it group the instructions at the compile stage.
This resulted in notoriously complex compilers.
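To make the idea of compile-time grouping more concrete, here is a minimal sketch in C that splits one 16-byte IA64 bundle into its parts. It assumes the usual bundle layout: a 5-bit template in the lowest bits, followed by three 41-bit instruction slots, all stored little-endian. The names ia64_bundle, extract_bits, decode_bundle and template_name are made up for this illustration, and the template table is deliberately incomplete.

```c
#include <stdint.h>

/* One decoded bundle: a 5-bit template plus three 41-bit slots.
 * The layout is an assumption stated in the text above. */
struct ia64_bundle {
    uint8_t  template_code;  /* bits 0..4 */
    uint64_t slot[3];        /* bits 5..45, 46..86, 87..127 */
};

/* Extract `count` bits (at most 64) starting at bit `start`
 * from a 16-byte little-endian buffer. */
static uint64_t extract_bits(const uint8_t raw[16], unsigned start, unsigned count)
{
    uint64_t value = 0;
    for (unsigned i = 0; i < count; i++) {
        unsigned bit = start + i;
        uint64_t b = (uint64_t)((raw[bit / 8] >> (bit % 8)) & 1u);
        value |= b << i;
    }
    return value;
}

/* Split the raw 128 bits into the template and the three slots. */
static struct ia64_bundle decode_bundle(const uint8_t raw[16])
{
    struct ia64_bundle b;
    b.template_code = (uint8_t)extract_bits(raw, 0, 5);
    b.slot[0] = extract_bits(raw, 5, 41);
    b.slot[1] = extract_bits(raw, 46, 41);
    b.slot[2] = extract_bits(raw, 87, 41);
    return b;
}

/* Hypothetical helper: names only a handful of templates
 * (M = memory unit, I = integer unit, B = branch unit). */
static const char *template_name(uint8_t t)
{
    switch (t) {
    case 0x00: return "MII";
    case 0x01: return "MII, stop at end";
    case 0x08: return "MMI";
    case 0x09: return "MMI, stop at end";
    case 0x10: return "MIB";
    case 0x11: return "MIB, stop at end";
    default:   return "other";
    }
}
```

Fed the first sixteen bytes of Listing 93.2 (08 80 80 41 00 21 80 C0 82 00 42 00 00 00 04 00), decode_bundle() yields template 0x08, which this sketch names MMI. That agrees with the two adds instructions followed by nop.i in the listing. The template is chosen by the compiler, not by the CPU, which is the shift of responsibility described above.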
Here is one sample of IA64 code: a simple cryptographic algorithm from the Linux kernel:
Listing 93.1: Linux kernel 3.2.0.4
#define TEA_ROUNDS 32
#define TEA_DELTA 0x9e3779b9
static void tea_encrypt(struct crypto_tfm *tfm, u8 *dst, const u8 *src)
{
        u32 y, z, n, sum = 0;
        u32 k0, k1, k2, k3;
        struct tea_ctx *ctx = crypto_tfm_ctx(tfm);
        const __le32 *in = (const __le32 *)src;
        __le32 *out = (__le32 *)dst;

        y = le32_to_cpu(in[0]);
        z = le32_to_cpu(in[1]);

        k0 = ctx->KEY[0];
        k1 = ctx->KEY[1];
        k2 = ctx->KEY[2];
        k3 = ctx->KEY[3];

        n = TEA_ROUNDS;

        while (n-- > 0) {
                sum += TEA_DELTA;
                y += ((z << 4) + k0) ^ (z + sum) ^ ((z >> 5) + k1);
                z += ((y << 4) + k2) ^ (y + sum) ^ ((y >> 5) + k3);
        }

        out[0] = cpu_to_le32(y);
        out[1] = cpu_to_le32(z);
}
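For experimenting outside the kernel, the same round function can be sketched as a standalone C fragment. This is not the kernel's code: the names tea_encrypt_block and tea_decrypt_block are made up here, the kernel types are swapped for <stdint.h> ones, and the little-endian conversions are dropped, so the block is taken as two 32-bit words in host byte order. The decryption routine is added only so that the result can be checked by a round trip.

```c
#include <stdint.h>

#define TEA_ROUNDS 32
#define TEA_DELTA  0x9e3779b9u

/* Encrypt one 64-bit block v[0..1] in place with the 128-bit key k[0..3].
 * Same arithmetic as the loop in Listing 93.1. */
static void tea_encrypt_block(uint32_t v[2], const uint32_t k[4])
{
    uint32_t y = v[0], z = v[1], sum = 0;

    for (unsigned n = 0; n < TEA_ROUNDS; n++) {
        sum += TEA_DELTA;
        y += ((z << 4) + k[0]) ^ (z + sum) ^ ((z >> 5) + k[1]);
        z += ((y << 4) + k[2]) ^ (y + sum) ^ ((y >> 5) + k[3]);
    }
    v[0] = y;
    v[1] = z;
}

/* Undo tea_encrypt_block: run the rounds backwards, starting from
 * the final value of sum (TEA_DELTA * TEA_ROUNDS modulo 2^32). */
static void tea_decrypt_block(uint32_t v[2], const uint32_t k[4])
{
    uint32_t y = v[0], z = v[1];
    uint32_t sum = (uint32_t)(TEA_DELTA * TEA_ROUNDS);

    for (unsigned n = 0; n < TEA_ROUNDS; n++) {
        z -= ((y << 4) + k[2]) ^ (y + sum) ^ ((y >> 5) + k[3]);
        y -= ((z << 4) + k[0]) ^ (z + sum) ^ ((z >> 5) + k[1]);
        sum -= TEA_DELTA;
    }
    v[0] = y;
    v[1] = z;
}
```

All arithmetic is on unsigned 32-bit values, so overflow simply wraps around. Each round consists of only shifts, additions and XORs, and several of them do not depend on each other, which is what gives a compiler something to group.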
Here is how it was compiled:
Listing 93.2: Linux Kernel 3.2.0.4 for Itanium 2 (McKinley)
0090|                   tea_encrypt:
0090|08 80 80 41 00 21  adds r16 = 96, r32   // ptr to ctx->KEY[2]
0096|80 C0 82 00 42 00  adds r8 = 88, r32    // ptr to ctx->KEY[0]
009C|00 00 04 00        nop.i 0
00A0|09 18 70 41 00 21  adds r3 = 92, r32    // ptr to ctx->KEY[1]
(^1) Explicitly parallel instruction computing