Excel 2019 Bible

(singke) #1

Chapter 16: Using Formulas for Statistical Analysis


16


The quart argument in our QUARTILE function uses the expression 5-ROW(A1). This allows
the quart argument to decrease by 1 as the formula is copied down. For cell G3, the expres-
sion returns 4 for the maximum value in the range. When the formula is copied down to G4,
the A1 reference changes to A2, and the expression returns 5-2 or 3 for the 75th percentile.

The QUARTILE function applies a percentage to one less than the count of values to find the two values that sur-
round the demarcation line. Then it interpolates between those two values to find the result.


For the 75th percentile, QUARTILE computes .75(30-1) for the 30 values in Figure 16.14 to get 21.75. Then it
sorts the data from lowest to highest and counts down 21 rows from the lowest value. Since the result of the first
calculation is not a whole number, it interpolates between the two values. In this case, counting down 21 rows is the
value 43, and the next value is 45. The interpolation uses the decimal portion of 21.75 to find the value that is 75%
of the way between 43 and 45 or 43+((45-43)
.75), which is 44.5.


Similarly, for the 50th percentile, QUARTILE computes .5*(30-1) to get 14.5. Counting down from the lowest values,
the 50th percentile falls between Alex Cox’s 31 and Katelyn Howard’s 31. Since both values are the same, the inter-
polation is easy and returns 31. Figure 16.14 shows the same employee and defect data sorted with the demarcation
lines identified.


To find into which quartile each value falls, the MATCH function is used against the range
of QUARTILE calculations. Because our quartile data is in descending order, the last argu-
ment of MATCH is -1— Greater than. MATCH returns the position in the list where the value
is found but stops when the next value is less than the lookup value. When attempting to
match the value 47, MATCH sees that the second value (44.5) is less than the lookup value
and stops at the first position.

Identifying Statistical Outliers with an Interquartile


Range
In the preceding formula, we used the QUARTILE function to group data into buckets.
While the QUARTILE function is handy when you require relatively symmetrical quartiles,
this function shrinks the distance between the upper and lower quartiles (making it more
difficult to identify true statistical outliers).

Excel offers another function called QUARTILE.EXC. QUARTILE.EXC excludes the median
(middle number) from the population. This function results in quartiles that are further
from the center of the whole. This gives us a better estimate of actual population and a
potentially more accurate view of what values should be considered outliers.

Figure 16.15 shows a set of employees and the number of defects per 1,000 products. This
data set has a wider spread of values. We want to determine which employees are outside a
reasonable range (outliers) for further investigation.
Free download pdf