The SAS code that produced the histogram in Figure 3.15 is:
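proc format;
  /* Class intervals of width 2 years. The '-<' range operator excludes */
  /* the upper limit, so adjacent ranges do not overlap at 18.5, 20.5.. */
  value clasfmt
    16.5-<18.5='17-18'
    18.5-<20.5='19-20'
    20.5-<22.5='21-22'
    22.5-<24.5='23-24'
    24.5-<26.5='25-26'
    26.5-28.5='27-28';
run;
proc chart;
  /* One vertical bar per class, centred on the class mid-point */
  vbar agey/midpoints=17.5 19.5 21.5 23.5 25.5 27.5;
  format agey clasfmt.;
run;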
The class intervals are set up using PROC FORMAT, and the MIDPOINTS= option of the
VBAR statement is used to specify the mid-point of each bar. For example, the mid-point
of the interval 27–28, whose real limits are 26.5 and 28.5, is (28.5−26.5)/2+26.5=27.5.
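The same calculation applies to every class. As a worked sketch, write L_k and U_k for the lower and upper real limits of the k-th class (k = 1 for 17–18 up to k = 6 for 27–28; this indexing is introduced here for illustration). Because each class is 2 years wide, the mid-points are exactly the values listed in the MIDPOINTS= option:

```latex
m_k = L_k + \frac{U_k - L_k}{2} = \frac{L_k + U_k}{2},
\qquad L_k = 14.5 + 2k, \quad U_k = 16.5 + 2k, \quad k = 1, \dots, 6

\Rightarrow \quad m_k = 15.5 + 2k
\quad \Rightarrow \quad (m_1, \dots, m_6) = (17.5,\ 19.5,\ 21.5,\ 23.5,\ 25.5,\ 27.5)
```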
The horizontal axis shows the class intervals, which are centred on the mid-point
values of each class. The vertical axis shows the frequency, or number of observations,
in each class interval. Clearly the majority of subjects were in the age range 19–20 years.
Univariate and Multivariate Analyses
When the data distribution of a single response variable of interest is displayed, this is
called a univariate distribution; univariate statistics, such as the mean and the standard
deviation, describe essential features of the distribution. These summary statistics are
introduced in a later section.
Univariate statistical analysis does not imply an analysis involving only one variable;
there may be one or more independent variables as well as the response variable of
interest. For example, a researcher may want to investigate differences in final
examination performance among different groups of candidates. The response variable,
performance in final examinations (a continuous dependent variable), may be explained by
a candidate’s age (classified as mature or not mature) and gender. Analysis of variance
(ANOVA), a classical example of a univariate statistical analysis, may well be an
appropriate statistical procedure to use. This would still be a univariate analysis
because the research question concerns whether there are any differences between groups
with respect to a single response variable.
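One way to see why this example is univariate is to write the ANOVA as a linear model. The sketch below assumes a two-way model with an interaction term; the symbols are introduced here for illustration. y_{ijk} is the examination score of the k-th candidate in age group i (mature, not mature) and gender j. However many grouping factors are added on the right-hand side, there is still only one response on the left-hand side:

```latex
y_{ijk} = \mu + \alpha_i + \beta_j + (\alpha\beta)_{ij} + \varepsilon_{ijk},
\qquad i = 1, 2, \quad j = 1, 2, \quad k = 1, \dots, n_{ij},
\qquad \varepsilon_{ijk} \sim N(0, \sigma^2)
```

Here \mu is the overall mean, \alpha_i and \beta_j are the age-group and gender effects, and (\alpha\beta)_{ij} allows the gender difference to depend on age group.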
When the joint distribution of two continuous variables is shown, for example in a
scatterplot, this is called a bivariate distribution. Calculation of statistics to assess the
degree of relationship between two variables, neither of which is deemed to be a response
(outcome) variable, would be an example of a bivariate statistical analysis. Univariate
analysis is, in fact, a special case of a more general statistical model. The ideas
underpinning univariate analysis can be extended to the analysis of two or more
variables. Analysis of multiple, possibly related, response variables is called
multivariate analysis. The response variables may be substantively different from each
Initial data analysis 63