http://www.ck12.org Chapter 1. An Introduction to Analyzing Statistical Data
Experiments
The other widely used method for conducting research is called an experiment. In an experiment, the researcher
imposes a treatment on a group of subjects in an effort to determine a “cause and effect” relationship between
variables. While observational studies could appear to show a relationship between diet and heart disease, for
example, there could be another factor that is actually causing an individual’s heart condition. An experiment
designed to investigate this relationship might take two groups of similar subjects, impose different diets on each
group of those subjects, and then record any differences in the condition of their hearts. What makes this difficult,
and in some instances impossible, is that the researcher would then need to make sure that anything else that might
have an influence on a subject’s heart health (e.g., exercise, genetics, stress level) is controlled, or exactly the same
for each individual in the study. One of the ways that statisticians ensure this control is by randomly assigning
subjects to treatments, thereby using the laws of probability to help guarantee the validity of the results. Designing
experiments can be difficult and costly, but experiments are the only way to establish meaningful and reliable
cause-and-effect relationships. We will study the elements of designing experiments in more detail in later chapters.
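As a rough illustration of random assignment, the following Python sketch shuffles a list of subjects and deals them out to two diet groups. The subject labels, the treatment names "diet_A" and "diet_B", and the function name randomly_assign are all hypothetical and chosen only for this example; a real study would involve far more careful design.

```python
import random


def randomly_assign(subjects, treatments, seed=None):
    """Shuffle the subjects, then deal them out to the treatments in turn.

    This is only a sketch of simple random assignment; the names used here
    are hypothetical and do not come from the text.
    """
    rng = random.Random(seed)          # a seed makes the shuffle repeatable
    shuffled = list(subjects)          # copy so the caller's list is untouched
    rng.shuffle(shuffled)
    groups = {t: [] for t in treatments}
    for i, subject in enumerate(shuffled):
        groups[treatments[i % len(treatments)]].append(subject)
    return groups


# Hypothetical subjects: 20 anonymous labels, not real data.
subjects = [f"subject_{n}" for n in range(1, 21)]
groups = randomly_assign(subjects, ["diet_A", "diet_B"], seed=1)

for name, members in groups.items():
    print(name, len(members), members)
```

Because chance alone decides who receives which diet, other influences such as exercise or genetics tend to balance out across the groups on average, rather than being matched exactly for each individual.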
Measures of Center and Spread
Let us assume that you have collected some data at one of the various levels of measurement (nominal, ordinal,
interval, or ratio) using a statistically valid procedure (observational study or experiment). How do you summarize
this information? One of the most important tools for summarizing data is to display it visually, and the various
methods for doing so will be covered in later chapters. If we want to use one number or value to summarize the
data, we can look at where the data is centered. Data measured at different levels can be characterized by different
summaries. Look back at the Tortoise data. This data was collected through an observational study. The variable
“Climate Type” is a categorical variable that has been measured at the nominal level. The easiest way to summarize
this variable is to identify the most common value (mode), which is “humid.” For variables that are measured at the
ratio level, like “population density,” we might find the average (mean) or the middle number (median) of the data
to summarize it.
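A minimal sketch of these three summaries, using Python's standard statistics module, is shown here. The values in both lists are made up for illustration and are not the actual Tortoise data; only the idea that "humid" is the most common climate type is taken from the text.

```python
import statistics

# Hypothetical nominal data (NOT the real table): climate type per island.
climate_types = ["humid", "arid", "humid", "semi-arid", "humid"]

# Hypothetical ratio-level data (NOT the real table): population densities.
densities = [2, 4, 4, 10, 30]

# Mode: the most common category; it works even for nominal data.
print(statistics.mode(climate_types))   # humid

# Mean: the sum of the values divided by how many there are (50 / 5).
print(statistics.mean(densities))       # 10

# Median: the middle value once the data is sorted.
print(statistics.median(densities))     # 4
```

In this invented example the mean (10) is well above the median (4) because the single large value pulls the average upward, which is one reason to look at both.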
Another important element of a data set is how it is spread. In the tortoise population estimate data, the numbers per
species range from 6,320 down to 1, a spread of approximately 6,000 tortoises. However, the population of the
Alcedo tortoises is much larger than the other species, so this number might not give a true indication of how most
of the other populations vary. We have other measures that might help shed some light on the spread of the typical
tortoise species, such as the interquartile range and the standard deviation, which we will cover in detail in the
following lessons.
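The sketch below shows one way these measures of spread might be computed in Python. Only the smallest value (1) and the largest value (6320) come from the text; every value in between is invented for illustration and is not the real tortoise data.

```python
import statistics

# Population estimates per species. Only the minimum (1) and maximum (6320)
# are taken from the text; the values in between are hypothetical.
populations = [1, 80, 150, 300, 450, 600, 800, 1000, 6320]

# Range: largest value minus smallest value.
data_range = max(populations) - min(populations)
print("range:", data_range)   # range: 6319

# Interquartile range: the spread of the middle half of the data.
# statistics.quantiles with n=4 returns the three quartile cut points.
q1, q2, q3 = statistics.quantiles(populations, n=4)
iqr = q3 - q1
print("interquartile range:", iqr)

# Sample standard deviation: a typical distance of values from the mean.
spread = statistics.stdev(populations)
print("standard deviation:", spread)
```

With one very large value in the list, the range is dominated by that single species, while the interquartile range describes how the more typical populations vary.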
Lesson Summary
Data can be measured at different levels depending on the type of variable and amount of detail that is collected.
A widely used method for categorizing the different types of measurement breaks them down into four groups.
Nominal data is measured by classification or categories. Ordinal data uses numerical categories that convey a
meaningful order. Interval measurements show order, and the spaces between the values also have significant
meaning. In ratio measurement, the ratio between any two values has meaning because the data includes an absolute
zero value.
Statisticians and researchers use two main techniques to form important conclusions about the relationships between
variables. An observational study is when a researcher observes the subjects in the real world without manipulating
them. An experiment is the way to establish true cause-and-effect relationships. It involves the researcher imposing