18.08333 ( 29) 24.25 ( 112)
18.16667 ( 72) 27.16667 ( 108)
Output in each of the sections headed moments, quantiles, and extremes is explained
below:
Moments
N is the number of observations with non-missing values for the variable being
summarized, here AGEY.
Mean is the arithmetic average. Here the mean age is 19.1769 years. This value is a case of
spurious accuracy: computing the mean has introduced more apparent accuracy
than there was in the original data. When reporting this result, the age should be rounded or
cut to one decimal place, i.e., 19.2 years. You should always be consistent when
rounding data for reporting: either round up to the next decimal place or
round down, but do not round up on one occasion and down on another. The mean is not
the best summary statistic of central location here because the distribution is positively
skewed (see the positive values of skewness and kurtosis). The ordering mode (18.8) < median
(18.9) < mean (19.2) points the same way.
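The two points above, consistent rounding and the ordering mode < median < mean under positive skew, can be sketched in a few lines of Python. This is only an illustration. The list `ages` is hypothetical and is not the AGEY data, and the helper name `report` is an assumption made for this sketch.

```python
from decimal import Decimal, ROUND_HALF_UP
from statistics import mean, median, mode

# Hypothetical ages in years (NOT the AGEY data discussed in the text).
ages = [18.1, 18.3, 18.8, 18.8, 18.9, 19.2, 19.4, 20.6, 22.5]


def report(value, places="0.1"):
    """Round half up to one decimal place, applying the same rule every time."""
    return Decimal(str(value)).quantize(Decimal(places), rounding=ROUND_HALF_UP)


# The mean quoted in the text, reported to one decimal place.
print(report(19.1769))  # 19.2

# In a positively skewed sample the mean is pulled towards the long upper tail.
summary = {
    "mode": report(mode(ages)),
    "median": report(median(ages)),
    "mean": report(mean(ages)),
}
print(summary)
```

The `decimal` module is used here because Python's built-in `round` rounds halves to the nearest even digit, which would break the "always the same direction" rule for values ending in 5.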
Std Dev is the standard deviation. It indicates the amount of dispersion about the mean and is
easier to interpret than the variance (also a measure of dispersion about the mean) since
the units of measurement for the standard deviation are the same as those for the data. In
this example the standard deviation is 1.2 years. If the data had been approximately
normally distributed, we could have used the standard deviation to estimate that the
middle 68 per cent of observations would fall between 18.0 years and 20.4 years, that is, within
plus or minus 1 standard deviation of the mean. This is an example of the use of one of
the inferential properties of the normal distribution. We will look at this in more detail in
a later chapter.
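The arithmetic behind the 18.0 to 20.4 range can be checked with a short Python sketch. It uses the rounded mean (19.2) and standard deviation (1.2) quoted in the text. The function name `sd_interval` is an assumption made for this illustration, and the 68 per cent figure holds only under the normality assumption stated above.

```python
from statistics import NormalDist


def sd_interval(mean, sd, k=1.0):
    """Return (lower, upper) limits lying k standard deviations either side of the mean."""
    return mean - k * sd, mean + k * sd


# Rounded summary values taken from the text.
lower, upper = sd_interval(19.2, 1.2)
print(f"{lower:.1f} to {upper:.1f} years")

# Proportion of a normal distribution lying within 1 standard deviation of its mean.
std_normal = NormalDist()
coverage = std_normal.cdf(1) - std_normal.cdf(-1)
print(f"{coverage:.1%}")
```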
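USS is the uncorrected sum of squares and is given by:
$$\mathrm{USS} = \sum_{i=1}^{N} w_i x_i^2$$
where x_i is the ith observation and w_i is its weight. Unless specified otherwise the
weight w_i is 1. We will mention sums of squares in later chapters.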
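As a rough illustration, and assuming the usual definition of the uncorrected sum of squares as the sum of each weight times the squared observation, the calculation can be sketched in Python. The function name `uncorrected_ss` and the sample values are hypothetical and chosen only for this example.

```python
def uncorrected_ss(values, weights=None):
    """Sum of w * x**2 over all observations; every weight defaults to 1."""
    if weights is None:
        weights = [1] * len(values)
    if len(weights) != len(values):
        raise ValueError("values and weights must have the same length")
    return sum(w * x * x for w, x in zip(weights, values))


# Hypothetical ages in whole years.
ages = [18, 19, 20]
print(uncorrected_ss(ages))             # 324 + 361 + 400 = 1085
print(uncorrected_ss(ages, [2, 1, 1]))  # 648 + 361 + 400 = 1409
```

"Uncorrected" here means that the observations are squared as they stand, without first subtracting the mean.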
CV is the coefficient of variation. This is a less common descriptive statistic which compares
the dispersion of observations with their magnitude. It is calculated as S/mean*100 (S is
the sample standard deviation, see Table 3.5). Unlike the standard deviation, it is a
unitless measure of relative variability which is sometimes useful when variables with
different dimensions are being compared, for example, weight in kilograms and weight
in pounds.
As with the standard deviation, a low value of this coefficient indicates greater
precision or less variability in observations. However, this judgment is only helpful
when there is another value of the coefficient with which to compare it. If all sample
values are multiplied by a constant, CV remains unchanged.
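The invariance of CV under a change of units can be illustrated with a minimal Python sketch. The weights below are hypothetical, the function name `coefficient_of_variation` is an assumption for this example, and 2.20462 is the approximate number of pounds in a kilogram.

```python
from statistics import mean, stdev


def coefficient_of_variation(values):
    """Sample standard deviation as a percentage of the mean (S / mean * 100)."""
    return stdev(values) / mean(values) * 100


# Hypothetical body weights in kilograms.
weights_kg = [60.0, 72.5, 81.0, 55.5, 90.0]
# The same weights in pounds: every value multiplied by one constant.
weights_lb = [kg * 2.20462 for kg in weights_kg]

cv_kg = coefficient_of_variation(weights_kg)
cv_lb = coefficient_of_variation(weights_lb)
print(f"CV (kg) = {cv_kg:.2f}%, CV (lb) = {cv_lb:.2f}%")
```

Both the standard deviation and the mean are scaled by the same constant, so the ratio, and therefore CV, is the same in either unit.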
T:Mean=0 is the value of the t test statistic for the null hypothesis that μ (the population mean) = 0 years (in
this example clearly a silly hypothesis to entertain). It is calculated as:
$$t = \frac{\bar{x} - \mu_0}{S/\sqrt{N}}$$
which is the difference between the sample mean and the hypothesized population mean (here μ_0 = 0)
divided by the estimated standard error of the mean. You can base a decision rule on the
t statistic:
if t < −1.98, decide on H1 (alternative hypothesis: μ < 0)
if −1.98 ≤ t ≤ 1.98, reserve judgment