Advanced Rails - Building Industrial-Strength Web Apps in Record Time

150 | Chapter 6: Performance


As our data set represents application response times, from which we want to infer a
mean and confidence interval applicable to data points we have not sampled, we want
to use the sample standard deviation. Using the population standard deviation on our
sample would underestimate our population’s actual standard deviation. Here is the
Ruby code for the sample standard deviation, which we will use from here on out:


module Enumerable
  def stdev
    Math.sqrt( map{|x| (x - mean) ** 2}.sum / (length-1) )
  end
end
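To make the difference between the two estimators concrete, here is a minimal standalone sketch. The helper names (`mean_of`, `population_stdev`, `sample_stdev`) and the timings are made up for illustration and are not part of the code above; the block form of `sum` assumes Ruby 2.4 or later.

```ruby
# Hypothetical standalone helpers; these names are illustrative only.
def mean_of(values)
  values.sum.to_f / values.length
end

# Divides by n: appropriate only when the values are the entire population.
def population_stdev(values)
  m = mean_of(values)
  Math.sqrt(values.sum { |x| (x - m)**2 } / values.length)
end

# Divides by n - 1: appropriate when the values are a sample.
def sample_stdev(values)
  m = mean_of(values)
  Math.sqrt(values.sum { |x| (x - m)**2 } / (values.length - 1))
end

timings = [2, 4, 4, 4, 5, 5, 7, 9] # made-up response times, in ms

population_stdev(timings) # => 2.0
sample_stdev(timings)     # roughly 2.138, larger than the population figure
```

The sample figure is always the larger of the two, and the gap narrows as the number of samples grows.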

The standard deviation is a very useful way to get a feel for the amount of variation
in a data set. We see that the second set of samples from above has a much larger
standard deviation than the first:


samples1.stdev # => 1.13529242439509
samples2.stdev # => 6.7954232964384

The standard deviation has the same units as the sample data; if the original data were
in milliseconds, then the samples have standard deviations of 1.1 ms and 6.8 ms,
respectively.


We can use the standard deviations to estimate a confidence interval. The confidence
interval and mean will give us a good idea of the limits of the data. Assuming a
normal distribution,* the following guidelines apply:



  • Approximately 68% of the data points lie within one standard deviation (σ) of
    the mean.

  • Approximately 95% of the data points lie within 2σ of the mean.

  • Approximately 99.7% of the data points lie within 3σ of the mean.
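These percentages follow from the normal distribution itself: the fraction of values within k standard deviations of the mean is erf(k/√2). As a quick check, the sketch below evaluates that expression with Ruby's built-in `Math.erf`; the method name `normal_coverage` is an illustrative choice, not something defined elsewhere in this chapter.

```ruby
# Fraction of a normal distribution lying within k standard deviations
# of the mean: P(|X - mu| < k * sigma) = erf(k / sqrt(2)).
def normal_coverage(k)
  Math.erf(k / Math.sqrt(2))
end

(1..3).each do |k|
  puts format("within %d sigma: %.2f%%", k, normal_coverage(k) * 100)
end
# prints:
#   within 1 sigma: 68.27%
#   within 2 sigma: 95.45%
#   within 3 sigma: 99.73%
```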


Using the second rule, we will generate a 95% confidence interval from the statistics
we have generated. This Ruby code uses the mean and standard deviation to return a
range in which 95% of the data should lie:


module Enumerable
  def confidence_interval
    (mean - 2*stdev) .. (mean + 2*stdev)
  end
end

samples1.confidence_interval # => 7.92941515120981..12.4705848487902
samples2.confidence_interval # => -3.39084659287681..23.7908465928768
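Note that the lower bound for the second set is negative, which cannot occur for a response time. One way to put such a range to work, sketched here under the assumption that clamping the lower bound at zero is acceptable, is to flag timings that fall outside it. The `outliers` helper and the timings passed to it are hypothetical.

```ruby
# Hypothetical helper: returns the timings that fall outside a
# mean +/- 2 * stdev range, with the lower bound clamped at zero.
def outliers(timings, range)
  lower = [range.begin, 0.0].max
  clamped = lower..range.end
  timings.reject { |t| clamped.cover?(t) }
end

# The range reported for samples2 above.
range = -3.39084659287681..23.7908465928768

outliers([0.5, 12.0, 23.0, 31.2], range) # => [31.2]
```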


* It is reasonable to assume a normal distribution here. We can safely treat series of server response times as
  i.i.d. random variables; therefore, by the central limit theorem, the distribution will converge to normal given
  enough samples.
