t0 = t.time()
a = get_randoms(i, 1)  # NumPy-based generation on the CPU
t1 = t.time()
cpu_times.append(t1 - t0)
t2 = t.time()
a = get_cuda_randoms(i, 1)  # CUDA-based generation on the GPU
t3 = t.time()
cuda_times.append(t3 - t2)
print "Bytes of largest array %i" % a.nbytes
return cuda_times, cpu_times
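The timing loop above assumes import time as t and calls the two generator functions defined earlier in the chapter. As a rough sketch of what those helpers look like (the GPU variant relies on NumbaPro's cuRAND wrapper; the XORWOW generator choice and the exact call signatures below are assumptions for illustration):

import time as t
import numpy as np
from numbapro.cudalib import curand   # NumbaPro's cuRAND wrapper (assumed)

def get_randoms(x, y):
    # standard normally distributed numbers generated on the CPU via NumPy
    return np.random.standard_normal((x, y))

def get_cuda_randoms(x, y):
    rand = np.empty(x * y, np.float64)  # container to be filled on the GPU
    prng = curand.PRNG(rndtype=curand.PRNG.XORWOW)  # assumed generator type
    prng.normal(rand, 0, 1)  # standard normal draws: mean 0, sigma 1
    return rand.reshape((x, y))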
And a helper function to visualize performance results:
In [105]: def plot_results(cpu_times, cuda_times, factor):
              plt.plot(x * factor, cpu_times, 'b', label='NUMPY')
              plt.plot(x * factor, cuda_times, 'r', label='CUDA')
              plt.legend(loc=0)
              plt.grid(True)
              plt.xlabel('size of random number array')
              plt.ylabel('time')
              plt.axis('tight')
Let’s take a look at the first test series with a medium workload:
In [106]: factor = 100
          cuda_times, cpu_times = time_comparsion(factor)
Out[106]: Bytes of largest array 8000800
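As a quick plausibility check of this number: 8,000,800 bytes correspond to 1,000,100 float64 values at 8 bytes each, i.e., the largest array generated holds 10,001 × 100 elements for factor = 100.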
Calculation time for the random numbers on the GPU is almost independent of the number of random numbers to be generated, presumably because the fixed overhead of invoking the GPU dominates. By contrast, time on the CPU rises sharply with increasing size of the random number array to be generated. Both statements can be verified in Figure 8-5:
In [107]: x = np.arange(1, 10002, step)
In [108]: plot_results(cpu_times, cuda_times, factor)
Figure 8-5. Random number generation on GPU and CPU (factor = 100)
Now let’s look at the second test series, with a pretty low workload:
In [109]: factor = 10
          cuda_times, cpu_times = time_comparsion(factor)
Out[109]: Bytes of largest array 800080
For such low workloads, the overhead of using the GPU is simply too large, something quite obvious from inspecting Figure 8-6:
In [110]: plot_results(cpu_times, cuda_times, factor)