CK-12 Probability and Statistics - Advanced


1.1. Definitions of Statistical Terminology http://www.ck12.org


habitat to be sure that you counted every tortoise. In an example closer to home, it is very expensive (and maybe
even impossible!) to get accurate and complete information about all the residents of the United States to help
effectively address the needs of a changing population. This is why a complete count (a census) is only attempted
every ten years.


Because of these problems, it is common to use a smaller, representative group from the population called a sample.


You may recall the tortoise data included a variable for the estimate of the population size. This number was found
using a sample and is actually just an approximation of the true number of tortoises. When a researcher wanted to
find an estimate for the population of a species of tortoise, she would go into the field and locate and mark a number
of tortoises. She would then use statistical techniques that we will discover later in this text to obtain an estimate for
the total number of tortoises in the population. In statistics, we call the actual number of tortoises a parameter. The
number of tortoises in the sample, or any other number that describes the individuals in the sample (like their length,
or weight, or age), is called a statistic. In general, each statistic is an estimate of a parameter, whose value is not
known exactly.
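
The distinction can be illustrated with a minimal simulation sketch. Everything in it is an assumption made for
illustration: the population of 1000 shell lengths is simulated rather than measured, and the variable names are
hypothetical. Only the sample size of 160 is borrowed from the tortoise data.

```python
import random

random.seed(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: shell lengths (cm) of 1000 tortoises.
# These values are simulated, not real measurements.
population = [random.uniform(50.0, 150.0) for _ in range(1000)]

# Parameter: describes the whole population (normally unknown).
parameter = sum(population) / len(population)

# Statistic: describes only the sample that was actually measured.
sample = random.sample(population, 160)
statistic = sum(sample) / len(sample)

print(f"parameter (population mean): {parameter:.1f} cm")
print(f"statistic (sample mean):     {statistic:.1f} cm")
```

The two printed means will usually be close but not identical, which is the sense in which a statistic estimates a
parameter.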


Table 1.3 shows the actual data for the species of tortoise found on the Volcano Darwin, on Isabela Island.
(Note: the word “data” is the plural of the word “datum”, which means the result of a single measurement.) The
number of captured individuals is a statistic because it describes the sample. The actual population size is a
parameter that we are trying to estimate.


TABLE 1.3: Tortoise Data for Darwin Volcano, Isabela Island.


Number of Individuals Captured | Population Estimate | Population Estimate Interval
160                            | 818                 | 561–1075
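
As a quick arithmetic check on Table 1.3, the short sketch below (variable names are illustrative) shows that the
reported estimate sits at the midpoint of the interval, so the interval can also be read as 818 plus or minus 257.

```python
# Values copied from Table 1.3.
estimate = 818
low, high = 561, 1075

midpoint = (low + high) / 2    # centre of the interval
half_width = (high - low) / 2  # distance from the centre to either end

print(midpoint, half_width)    # prints 818.0 257.0
```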

Errors in Sampling


Unfortunately, there is a downside to sampling. We have to accept that estimates based on a sample have a chance
of being inaccurate or even downright wrong! This cannot be avoided unless we sample the entire population. You
can see this in Table 1.3: the actual data include not only an estimate, but also an interval of likely true
values for the population parameter. The researcher has to accept that chance variation in the sample could
lead to changes in the population estimate. A statistician would not say that the parameter is a specific
number like 915, but would most likely report something like the following:


“I am fairly confident that the true number of tortoises is actually between 561 and 1075.”


This range of values is the unavoidable result of using a sample, not the result of some mistake made in
the process of collecting and analyzing the sample. In general, the potential difference between the true parameter
and the statistic obtained from a sample is called sampling error. It is also possible that the researchers
made mistakes in their sampling methods that led to a sample that does not accurately represent the true
population. For example, they could have picked an area to search for tortoises where a large number tend to
congregate (near a food or water source, perhaps). If this sample were used to estimate the number of tortoises in all
locations, it might lead to a population estimate that is too high. This type of systematic error in sampling is called
bias. Statisticians go to great lengths to avoid the many potential sources of bias. We will investigate this in more
detail in a later chapter.
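
The difference between sampling error and bias can be sketched with a small simulation. This is only an
illustration under stated assumptions: the habitat of 100 plots, the simulated counts, and the helper
estimate_total are all hypothetical, and the simple scale-up used here is not the estimation technique the
researchers used.

```python
import random

random.seed(1)  # fixed seed so the sketch is repeatable

# Hypothetical habitat: 100 equal-sized plots with simulated tortoise counts.
plots = [random.randint(0, 20) for _ in range(100)]
true_total = sum(plots)  # the parameter (unknown in practice)

def estimate_total(sampled_counts, n_plots):
    """Scale the mean count per sampled plot up to the whole habitat."""
    return sum(sampled_counts) / len(sampled_counts) * n_plots

# Random sampling: each estimate misses the truth by chance (sampling error).
random_estimates = [
    estimate_total(random.sample(plots, 10), len(plots)) for _ in range(5)
]

# Biased sampling: search only the 10 most crowded plots,
# as if the researchers looked only near food or water.
crowded = sorted(plots, reverse=True)[:10]
biased_estimate = estimate_total(crowded, len(plots))

print("true total:      ", true_total)
print("random estimates:", [round(e) for e in random_estimates])
print("biased estimate: ", round(biased_estimate))
```

The random estimates scatter on both sides of the true total, while the estimate built from the crowded plots
can only err on the high side. Taking more random samples does not remove that systematic overshoot.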
