other, for example, overall performance in final examinations could be one response
variable and a second might be salary in the first year of employment. Number of A-levels
(categorical) could be an independent variable. A multivariate analysis of variance
(MANOVA) may be an appropriate way to analyse data structured in this way. Response
variables do not have to be different in kind; they may be substantively similar but
measured on a number of different occasions, such as exam performance at the end of
years one, two and three. This is a special case of multivariate analysis called repeated
measures analysis. In summary, whenever several dependent variables are analysed together
in a single analysis (instead of a separate univariate analysis for each dependent
variable), a multivariate analysis should be used.
The above examples of multivariate analysis are concerned more with statistical
inference than with data exploration. Other forms of multivariate analysis which are useful if
a researcher is interested in the wider perspective of how observations on several
variables may be related include principal component analysis, cluster analysis, and
correspondence analysis. Multiple regression analysis, although usually presented in
textbooks as a way of evaluating the effect that independent variables have on the prediction
of a response variable, can be used in an exploratory way to identify which independent
variables are important (i.e. have explanatory power).
3.4 Descriptive Statistics
An important part of IDA and data description is the use of summary statistics to
characterize important features of a distribution. Three essential kinds of descriptive
statistic which help describe a data distribution are measures of central tendency or
position, measures of shape, and measures of dispersion (spread).
Measures of Central Tendency
Common statistics which identify the centre of a distribution include the mode, the
median, and the arithmetic mean. Less common measures of centrality are the
weighted mean, the trimmed mean, and the geometric mean.
The mode is the most frequently occurring value in a distribution. In the following
distribution of 10 values,
2 15 9 2 18 14 0 6 11 3,
the mode is 2, the only value that occurs more than once. In a grouped frequency
distribution the class interval which has the largest frequency (largest number of values)
is called the modal interval. In Table 3.4 the modal class interval is 19–20.
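As a quick check of the ungrouped example above, the mode can be found by counting how often each value occurs. The following is a minimal Python sketch, not part of the original text; the variable names are illustrative and only the standard library is assumed.

```python
from collections import Counter
from statistics import mode

# The ten values from the ungrouped example in the text.
values = [2, 15, 9, 2, 18, 14, 0, 6, 11, 3]

# Count how many times each distinct value occurs.
counts = Counter(values)

# Keep every value whose count equals the largest count,
# so that ties (more than one mode) would also be reported.
highest = max(counts.values())
modes = [value for value, count in counts.items() if count == highest]

print(modes)         # [2]
print(mode(values))  # 2
```

Collecting every value that shares the highest count, rather than only the first, makes it visible when a distribution has more than one mode.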
The median, you may recall, is the 50th percentile or the middle value in a set of
observations ordered in magnitude. In an ordered series which has an odd number of
values the median is the middle value. In an ordered series which has an even number of
values,
0 2 2 3 6 9 11 14 15 18,
the median is the average of the middle two values. In this example the median lies
midway between the 5th and the 6th values, i.e. (6+9)/2=7.5.
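The odd and even cases described above can be sketched in a few lines of Python. This is an illustrative sketch only: the helper name median_by_hand is hypothetical, and the result is compared with the standard library function statistics.median.

```python
from statistics import median


def median_by_hand(values):
    """Return the middle value, or the average of the middle two values."""
    ordered = sorted(values)  # order the observations by magnitude
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: the median is the single middle value.
        return ordered[mid]
    # Even number of values: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


# The ten values used in the text (unordered).
values = [2, 15, 9, 2, 18, 14, 0, 6, 11, 3]

print(median_by_hand(values))  # 7.5
print(median(values))          # 7.5
```

With ten values the two middle observations are the 5th and 6th in the ordered series (6 and 9), which gives the 7.5 reported in the text.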
Statistical analysis for education and psychology researchers 64