166 6 Linear Algebra
a[i][j]+=b[i][k]*c[k][j]
}
}
}
The fact that we have the constrainti≤jleads to the requirement for the computation ofai j
of 2 (j−i+ 1 )flops. The total number of flops is then
n
∑
i= 1
n
∑
j= 1
2 (j−i+ 1 ) =
n
∑
i= 1
n−i+ 1
∑
j= 1
2 j≈
n
∑
i= 1
2 (n−i+ 1 )^2
2
,
where we used that∑nj= 1 j=n(n+ 1 )/ 2 ≈n^2 / 2 for largenvalues. Using in addition that∑nj= 1 j^2 ≈
n^3 / 3 for largenvalues, we end up with approximatelyn^3 / 3 flops for the multiplication of two
upper triangular matrices. This means that if we deal with matrix multiplication of upper
triangular matrices, we reduce the number of flops by a factorsix if we code our matrix
multiplication in an efficient way.
It is also important to keep in mind that computers are finite,we can thus not store in-
finitely large matrices. To calculate the space needed in memory for ann×nmatrix with dou-
ble precision, 64 bits or 8 bytes for every matrix element, one needs simply computen×n× 8
bytes. Thus, ifn= 10000 , we will need close to 1GB of storage. Decreasing the precision to
single precision, only halves our needs.
A further point we would like to stress, is that one should in general avoid fixed (at com-
pilation time) dimensions of matrices. That is, one could always specify that a given matrixA
should have sizeA[ 100 ][ 100 ], while in the actual execution one may use onlyA[ 10 ][ 10 ]. If one
has several such matrices, one may run out of memory, while the actual processing of the
program does not imply that. Thus, we will always recommend that you use dynamic memory
allocation, and deallocation of arrays when they are no longer needed. In Fortran one uses
the intrisic functionsALLOCATEandDEALLOCATE, while C++ employs the functionsnew
anddelete.
6.3.3.1 Strassen’s algorithm
As we have seen, the straightforward algorithm for matrix-matrix multiplication will requirep
multiplications andp− 1 additions for each of them×nelements. The total number of floating-
point operations is thenmn( 2 p− 1 )∼O(mn p). When the matricesAandBcan be divided into
four equally sized blocks, [
C 11 C 12
C 21 C 22
]
=
[
A 11 A 12
A 21 A 22
][
B 11 B 12
B 21 B 22
]
, (6.7)
we get eight multiplications of smaller blocks,
[
C 11 C 12
C 21 C 22
]
=
[
A 11 B 11 +A 12 B 21 A 11 B 12 +A 12 B 22
A 21 B 11 +A 22 B 21 A 21 B 12 +A 22 B 22
]
. (6.8)
Strassen discovered in 1968 how the number of multiplications could be reduced from
eight to seven [28]. Following Strassen’s approach we definesome intermediates,
S 1 =A 21 +A 22 ,T 1 =B 12 −B 11 ,
S 2 =S 1 −A 11 , T 2 =B 22 −T 1 ,
S 3 =A 11 −A 21 ,T 3 =B 22 −B 12 ,
S 4 =A 12 −S 2 , T 4 =B 21 −T 2 ,
(6.9)
and need seven multiplications,