Advanced Rails - Building Industrial-Strength Web Apps in Record Time


This gives us predictable results:


samples.sum # => 53.0
samples.length # => 5
samples.mean # => 10.6

Everyone is familiar with the mean, but the problem is that by itself, the mean is
nearly worthless for describing a data set. Consider these two sets of samples:


samples1 = %w(10 11 12 10 10 9 12 10 9 9).map{|x|x.to_f}
samples2 = %w( 2 11 6 14 20 21 3 4 8 13).map{|x|x.to_f}

These two data sets in fact have the same mean, 10.2. But they clearly represent
wildly different performance profiles, as can be seen from their graph (see
Figure 6-1).
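The claim is easy to check numerically. The following is a minimal sketch, assuming Ruby 2.4 or later (for the built-in sum) and assuming a mean helper on Enumerable along the lines used above; its definition is not part of this excerpt, so the one here is only a stand-in.

```ruby
# Assumed helper: this excerpt calls `mean` but does not show its definition.
module Enumerable
  def mean
    sum / count.to_f
  end
end

samples1 = %w(10 11 12 10 10 9 12 10 9 9).map { |x| x.to_f }
samples2 = %w( 2 11 6 14 20 21 3 4 8 13).map { |x| x.to_f }

puts samples1.mean            # => 10.2
puts samples2.mean            # => 10.2

# The means agree, but the range of each set does not.
puts samples1.minmax.inspect  # => [9.0, 12.0]
puts samples2.minmax.inspect  # => [2.0, 21.0]
```

The minimum and maximum already hint at the difference in spread that the mean hides.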


We need a new statistic to measure how much the data varies from the mean. That
statistic is the standard deviation. The standard deviation of a sample is calculated by
taking the root mean square deviation from the sample mean. In Ruby, it looks like this:


module Enumerable
  def population_stdev
    # Compute the mean once rather than once per element.
    m = mean
    Math.sqrt( map{|x| (x - m) ** 2}.mean )
  end
end

This code maps over the collection, taking the square of the deviation of each
element from the mean. It then takes the mean of those squared values, and takes the
square root of the mean, yielding the standard deviation.
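Applied to the two sample sets from earlier, this statistic separates what the mean could not. This is a small sketch under the same assumptions as before: Ruby 2.4 or later, and a stand-in mean helper, since its definition falls outside this excerpt.

```ruby
module Enumerable
  # Assumed helper; the original definition is not shown in this excerpt.
  def mean
    sum / count.to_f
  end

  # Root mean square deviation from the mean, as described above.
  def population_stdev
    m = mean
    Math.sqrt(map { |x| (x - m) ** 2 }.mean)
  end
end

samples1 = %w(10 11 12 10 10 9 12 10 9 9).map { |x| x.to_f }
samples2 = %w( 2 11 6 14 20 21 3 4 8 13).map { |x| x.to_f }

puts samples1.population_stdev.round(3)  # => 1.077
puts samples2.population_stdev.round(3)  # => 6.447
```

Both sets have a mean of 10.2, yet the second varies roughly six times as much around it.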


However, this is only half the story. What has been introduced so far is the
population standard deviation, while what we really want is the sample standard
deviation. Without completely diving into the relevant mathematics, the basic
difference between the two is whether the data represent an entire population or only a
portion of it.
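In the standard statistical definition, the sample standard deviation divides the sum of squared deviations by n - 1 rather than n (Bessel's correction). The sketch below illustrates that definition only; the method name sample_stdev and the mean helper are assumptions made for this example, not code taken from the text, and Ruby 2.4 or later is assumed for the built-in sum.

```ruby
module Enumerable
  # Assumed helper; the original definition is not shown in this excerpt.
  def mean
    sum / count.to_f
  end

  # Hypothetical name: divides by (n - 1) instead of n.
  # Only meaningful for collections with at least two elements.
  def sample_stdev
    m = mean
    sum_of_squares = map { |x| (x - m) ** 2 }.sum
    Math.sqrt(sum_of_squares / (count - 1))
  end
end

samples1 = %w(10 11 12 10 10 9 12 10 9 9).map { |x| x.to_f }
samples2 = %w( 2 11 6 14 20 21 3 4 8 13).map { |x| x.to_f }

puts samples1.sample_stdev.round(3)  # => 1.135
puts samples2.sample_stdev.round(3)  # => 6.795
```

The smaller divisor makes the sample figure slightly larger than the population figure for the same data, and the gap shrinks as the number of samples grows.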


Figure 6-1. Two vastly different response-time profiles with the same mean


[Graph of samples1 and samples2]