We have a pretty good intuitive sense of what an outlier is: it’s a value far removed from the others.
There is no rigorous mathematical formula for determining whether or not something is an outlier, but
there are a few conventions that people seem to agree on. Not surprisingly, some of them are based on the
mean and some are based on the median!
A commonly agreed-upon way to think of outliers based on the mean is to consider how many
standard deviations away from the mean a term is. Some texts identify a potential outlier as a datapoint
that is more than two or three standard deviations from the mean.
In a mound-shaped, symmetric (approximately normal) distribution, this is a value that has only about a
5% chance (for two standard deviations) or a 0.3% chance (for three standard deviations) of being as far
removed from the center of the distribution as it is. Think of it as a value that is way out in one of the
tails of the distribution.
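The mean-based convention can be sketched in a few lines of Python. This is only an illustration: the function names sd_outliers and normal_two_tail and the sample data are our own, not from the text, and the tail probabilities assume the distribution is exactly normal.

```python
import math
import statistics

def sd_outliers(data, k=2):
    """Return the values lying more than k sample standard deviations from the mean."""
    mean = statistics.mean(data)
    s = statistics.stdev(data)  # sample standard deviation (divides by n - 1)
    return [x for x in data if abs(x - mean) > k * s]

def normal_two_tail(k):
    """P(|Z| > k) for a standard normal Z: the chance of landing at least k SDs from the center."""
    return math.erfc(k / math.sqrt(2))

# Illustrative data (hypothetical, not from the text): one value sits far from the rest.
sample = [10, 11, 12, 12, 13, 13, 14, 15, 16, 40]

print(sd_outliers(sample, k=2))      # values more than 2 SDs from the mean
print(round(normal_two_tail(2), 4))  # roughly the "about 5%" figure
print(round(normal_two_tail(3), 4))  # roughly the "0.3%" figure
```

With this sample, the value 40 is flagged at two standard deviations but not at three. The outlier itself inflates the standard deviation, which is one reason a quartile-based rule is often preferred.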
Most texts now use a median-based measure and identify potential outliers in terms of how far a
datapoint is above or below the quartiles in a distribution. To find if a distribution has any outliers, do the
following (this is known as the “1.5(IQR) rule”):
• Find the IQR (IQR = Q3 – Q1).
• Multiply the IQR by 1.5.
• Find Q1 – 1.5(IQR) and Q3 + 1.5(IQR).
• Any value below Q1 – 1.5(IQR) or above Q3 + 1.5(IQR) is a potential outlier.
Some texts call an outlier identified by the 1.5(IQR) rule a mild outlier. An extreme outlier would then be
one that lies more than 3 IQRs below Q1 or above Q3.
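The 1.5(IQR) rule and the mild/extreme distinction can be sketched as follows. The helper names quartiles and classify_outliers and the sample data are hypothetical. The quartiles are taken as the medians of the lower and upper halves of the sorted data (leaving out the middle value when n is odd), which is one common convention; other software may compute quartiles slightly differently.

```python
import statistics

def quartiles(data):
    """Return (Q1, Q3) as the medians of the lower and upper halves of the sorted data."""
    xs = sorted(data)
    if len(xs) < 2:
        raise ValueError("need at least two data values")
    half = len(xs) // 2            # the middle value is left out when n is odd
    lower = xs[:half]
    upper = xs[len(xs) - half:]
    return statistics.median(lower), statistics.median(upper)

def classify_outliers(data):
    """Split potential outliers into (mild, extreme) lists using the 1.5(IQR) and 3(IQR) fences."""
    q1, q3 = quartiles(data)
    iqr = q3 - q1
    mild, extreme = [], []
    for x in data:
        if x < q1 - 3 * iqr or x > q3 + 3 * iqr:
            extreme.append(x)      # more than 3 IQRs beyond a quartile
        elif x < q1 - 1.5 * iqr or x > q3 + 1.5 * iqr:
            mild.append(x)         # beyond 1.5 IQRs but within 3 IQRs
    return mild, extreme

# Illustrative data (hypothetical): Q1 = 11, Q3 = 17, IQR = 6.
sample = [1, 10, 11, 12, 13, 14, 15, 16, 17, 22, 40]
mild, extreme = classify_outliers(sample)
print(mild, extreme)
```

Here the fences are 2 and 26 for mild outliers and –7 and 35 for extreme ones, so 1 is a mild outlier, 40 is an extreme outlier, and 22 is not flagged at all.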
example: The following data represent the amount of money, in British pounds, spent weekly on
tobacco for 11 regions in Britain: 4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51,
4.56. Do any of the regions seem to be spending a lot more or less than the other regions? That
is, are there any outliers in the data?
solution: Using a calculator, we find x̄ = 3.62, Sx = s = 0.59, Q1 = 3.2, Q3 = 4.03.
• Using means: 3.62 ± 2(0.59) = (2.44, 4.8). There are no values in the dataset less than 2.44 or greater
than 4.8, so there are no outliers by this method. We don’t need to check ± 3s since there were no
outliers using ± 2s.
• Using the 1.5(IQR) rule: Q1 – 1.5(IQR) = 3.2 – 1.5(4.03 – 3.2) = 1.96, Q3 + 1.5(IQR) = 4.03 +
1.5(4.03 – 3.2) = 5.28. Because there are no values in the data less than 1.96 or greater than 5.28, there
are no outliers by this method either.
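For readers who want to check the arithmetic without a calculator, here is a short sketch using Python's standard statistics module. The variable names are our own. We assume the default quartile method of statistics.quantiles, which happens to reproduce the calculator's Q1 and Q3 for this dataset but need not agree with a calculator on every dataset.

```python
import statistics

# Weekly tobacco spending (British pounds) for the 11 regions in the example.
spend = [4.03, 3.76, 3.77, 3.34, 3.47, 2.92, 3.20, 2.71, 3.53, 4.51, 4.56]

mean = statistics.mean(spend)
s = statistics.stdev(spend)                   # sample standard deviation
q1, _, q3 = statistics.quantiles(spend, n=4)  # default method; an assumption, see above
iqr = q3 - q1

# Cutoffs for each method (unrounded, so they differ slightly from the hand computation).
sd_low, sd_high = mean - 2 * s, mean + 2 * s
iqr_low, iqr_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr

sd_flagged = [x for x in spend if x < sd_low or x > sd_high]
iqr_flagged = [x for x in spend if x < iqr_low or x > iqr_high]

print(round(mean, 2), round(s, 2), q1, q3)
print(sd_flagged, iqr_flagged)
```

Both lists come out empty, in agreement with the solution. The unrounded 1.5(IQR) cutoffs are about 1.955 and 5.275, which the solution reports as 1.96 and 5.28.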
Outliers are important because they will often tell us that something unusual or unexpected is going on
with the data that we need to know about. A manufacturing process that produces products so far out of
spec that they are outliers often indicates that something is wrong with the process. Sometimes outliers
are just a natural, but rare, variation. Often, however, an outlier can indicate that the process generating
the data is out of control in some fashion.
Position of a Term in a Distribution
Up until now, we have concentrated on the nature of a distribution as a whole. We have been concerned
with the shape, center, and spread of the entire distribution. Now we look briefly at individual cases in
the distribution.
