other small projects, the absolute min-
imum is 100 and preferably 250 when sub-
groups (male and female; young and old)
are being analysed. The aim should be a
focused questionnaire to a lot of people,
rather than a long questionnaire to few, or
a recourse to secondary data.
. Stratified sampling groups the population
into strata so as to maximize similarity
within a stratum and maximize between-
strata differences. This can considerably
increase the sample’s efficiency if stratifica-
tion is based on a variable strongly related
to the estimate. If income is strongly
related to region, then regions could be
used for stratifying and reducing the stand-
ard error of the mean income estimate. We
can also sample disproportionately from
particular strata when important groups of
the population are numerically small and
so would yield only small numbers if SRS
were used within strata. Ethnic groups
(see ethnicity) are an example, with the
non-indigenous groups over-sampled to
get more precise estimates. Such a
strategy requires detailed knowledge of the
sample frame in terms of an ethnic classi-
fication, and the analysis should be
weighted to get correct estimates.
. Multi-stage designs involve sampling in
stages. For example, a sample of constitu-
encies may be selected at random (the so-
called primary sampling units), then wards
within them, then households within
wards and individuals within households.
This design is often used for major scien-
tific surveys, as it only requires a sampling
frame at each stage; thus at stage one only
a list of constituencies is required, while at
stage two, only ward names are required
for those constituencies already selected.
Another advantage is the cost reduction
resulting from basing a team of interview-
ers in the higher-level units. A variant is the
cluster design, when at some stage all the
lower level units are sampled – everybody
in a ward is selected, for example. A prob-
lem with these designs is that there is a
tendency for people living in the same
place to be somewhat similar so that the
resultant sample is more alike than a ran-
dom sample and standard statistical theory
gives overly precise results. Clustered data
lead to inefficiency and it is not unknown
for an SRS a third of the size to achieve
the same standard error. It is clearly vital
to measure this dependency (the intra-class
correlation) and correct for it. The
development of multi-level models
allows this even when the sample is unbal-
anced with a different number of units
in each higher level unit. Consequently,
multi-stage designs are recommended
for studying variation simultaneously at a
number of different scales, with the popu-
lation itself seen as having a hierarchical
structure, which is itself of substantive
interest (Jones, 1997). Indeed highly clus-
tered designs are needed if survey informa-
tion is to be gathered on individuals as
well as their peers. With such designs, it is
necessary to specify the number of units at
each level; Raudenbush and Xiaofeng
(2000) provide the necessary background,
which is put into practice by Stoker
and Bowers (2002) in their geographically
sensitive designs for surveying American
voting behaviour.
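
The loss of precision from clustering described above can be illustrated with the standard design-effect approximation for equal-sized clusters, deff = 1 + (m - 1) * rho, where m is the number of respondents per cluster and rho is the intra-class correlation. The sketch below is an illustration only, not part of the original entry: the function names and the figures (21 respondents per ward, rho = 0.1, 3,000 respondents) are hypothetical, chosen so that the clustered sample is worth an SRS a third of its size.

```python
def design_effect(cluster_size: float, icc: float) -> float:
    """Approximate design effect for equal-sized clusters: 1 + (m - 1) * rho."""
    return 1.0 + (cluster_size - 1.0) * icc


def effective_sample_size(n: int, cluster_size: float, icc: float) -> float:
    """Size of the simple random sample that would give the same standard error."""
    return n / design_effect(cluster_size, icc)


# Hypothetical survey: 3,000 respondents, 21 per ward, intra-class correlation 0.1.
deff = design_effect(cluster_size=21, icc=0.1)
n_eff = effective_sample_size(n=3000, cluster_size=21, icc=0.1)

print(f"design effect: {deff:.2f}")
print(f"effective sample size: {n_eff:.0f}")
```

Standard errors computed as if the data were an SRS would need to be inflated by the square root of the design effect. A multi-level model estimates the dependency directly rather than relying on this approximation.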
These three designs can be used in combin-
ation; the UK Millennium Cohort study,
unlike previous birth cohorts, is spatially clus-
tered specifically to study neighbourhood
effects. Wards are disproportionately strati-
fied to ensure adequate representation of all
four UK countries, deprived areas and areas
with high concentrations of particular ethnic
groups, and then all babies aged 9 months in
selected wards over a 12-month period are included. The
resultant sample includes 19,000 infants who
are being followed longitudinally.
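
The weighting that disproportionate stratification requires can be sketched as follows. This is a minimal illustration under stated assumptions, not a description of any survey's actual procedure: the two strata, their sizes, the income figures and all function names are hypothetical. Each sampled unit is weighted by N_h / n_h, the number of population units it represents in its stratum.

```python
import random


def stratified_sample(frame, allocation, seed=0):
    """Draw a simple random sample within each stratum.

    frame: dict mapping stratum name -> list of values, one per population unit
    allocation: dict mapping stratum name -> number of units to sample
    Returns a list of (stratum, value, weight) with weight = N_h / n_h.
    """
    rng = random.Random(seed)
    sample = []
    for stratum, units in frame.items():
        n_h = allocation[stratum]
        weight = len(units) / n_h
        for value in rng.sample(units, n_h):
            sample.append((stratum, value, weight))
    return sample


def weighted_mean(sample):
    """Estimate the population mean using the design weights."""
    total_weight = sum(w for _, _, w in sample)
    return sum(v * w for _, v, w in sample) / total_weight


def unweighted_mean(sample):
    """Naive mean that ignores the unequal sampling fractions."""
    return sum(v for _, v, _ in sample) / len(sample)


# Hypothetical population: a large stratum and a small one with lower mean income.
pop_rng = random.Random(42)
frame = {
    "majority": [pop_rng.gauss(30000, 5000) for _ in range(9000)],
    "minority": [pop_rng.gauss(20000, 5000) for _ in range(1000)],
}
allocation = {"majority": 300, "minority": 300}  # the small stratum is over-sampled

sample = stratified_sample(frame, allocation, seed=1)
population_mean = sum(sum(units) for units in frame.values()) / 10000

print(f"population mean: {population_mean:.0f}")
print(f"weighted estimate: {weighted_mean(sample):.0f}")
print(f"unweighted estimate: {unweighted_mean(sample):.0f}")
```

Because the small stratum makes up half of the sample but only a tenth of the population, the unweighted mean is pulled towards its income level, while the weighted estimate is not.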
Other probabilistic designs may be used
for different circumstances; they include
capture-recapture methods to estimate
population size with mobile populations, and
response-based sampling (see extensive
designs) when a numerically small but
important outcome is over-sampled. In geo-
graphical studies, the standard procedures
may be modified to ensure spatial coverage.
Methods of random, systematic and stratified
sampling of points on a map have been devised
using coordinate systems, for example, as have
methods of selecting transects (line samples)
across an area (Berry and Baker, 1968).
Increasingly, these designs are being used
adaptively (Thompson and Seber, 2002), so
that the degree of spatial autocorrelation
is being assessed as the survey proceeds and
there is increased sampling in areas where the
outcome variable is most varied and least spa-
tially dependent.
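
The three coordinate-based point designs mentioned above can be sketched for a rectangular study area. This is an illustrative sketch only: the function names, the 10 by 10 map and the 4 by 4 grid are hypothetical, and the adaptive step (adding points where the outcome is most varied) is not shown.

```python
import random


def random_points(n, width, height, rng):
    """Random design: n points placed independently and uniformly."""
    return [(rng.uniform(0, width), rng.uniform(0, height)) for _ in range(n)]


def systematic_points(nx, ny, width, height, rng):
    """Systematic design: a regular grid shifted by one random origin."""
    dx, dy = width / nx, height / ny
    ox, oy = rng.uniform(0, dx), rng.uniform(0, dy)
    return [(ox + i * dx, oy + j * dy) for i in range(nx) for j in range(ny)]


def stratified_points(nx, ny, width, height, rng):
    """Stratified design: one random point in every cell of an nx-by-ny grid."""
    dx, dy = width / nx, height / ny
    return [
        (rng.uniform(i * dx, (i + 1) * dx), rng.uniform(j * dy, (j + 1) * dy))
        for i in range(nx)
        for j in range(ny)
    ]


rng = random.Random(7)
width, height, nx, ny = 10.0, 10.0, 4, 4

pts_random = random_points(nx * ny, width, height, rng)
pts_systematic = systematic_points(nx, ny, width, height, rng)
pts_stratified = stratified_points(nx, ny, width, height, rng)

# Count how many grid cells each design actually reaches.
def cells_covered(points):
    return len({(int(x // (width / nx)), int(y // (height / ny))) for x, y in points})

print("cells covered (random):", cells_covered(pts_random))
print("cells covered (systematic):", cells_covered(pts_systematic))
print("cells covered (stratified):", cells_covered(pts_stratified))
```

The systematic and stratified designs reach every grid cell by construction, which is the sense in which they ensure spatial coverage; a purely random design may leave some cells empty.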
When testing a hypothesis it is crucial to
assess and control for two types of error in a
probabilistic design. Type I errors, finding an
effect when there really is none, are controlled
Gregory / The Dictionary of Human Geography 9781405132879_4_S Final Proof page 663 1.4.2009 3:23pm
SAMPLING