The Dictionary of Human Geography



Many surveys have a complex sampling
design involving clustering, stratification and
disproportionate sampling. These attributes
should be taken into account during analysis
so as to avoid biased estimates of standard
errors and increased likelihood of Type I
errors. Clustering, or multi-stage selection of
sample units, typically generates dependency
so that there is less information than appears.
Multi-level models estimate and correct for
the degree of dependency even when there are
more than two sampling stages. Their results
can also be substantively interesting, finding
for example that members of the same household tend to vote together (Johnston, Jones, Sarker, Burgess, Propper and Bolster, 2005).
Stratification involves grouping the sampling frame into groups believed to be homogeneous, thereby reducing standard errors. Often strata are sampled disproportionately: having grouped primary sampling units into strata by percentage ethnic-minority population, for example, areas with large minority populations are over-sampled. At the analysis stage, these over-sampled areas must then be down-weighted to their correct population proportion. Sturgis (2004), using data from
the 2000 UK Time Use Survey, shows how
these factors should be incorporated into the
estimates and illustrates the threats to inference if ignored. Longitudinal data analysis faces its own particular analytical problems, arising from the analysis of repeated measures over time.
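As a hedged illustration (my own sketch, not from the source), the loss of information due to clustering can be approximated with the standard design-effect formula, deff = 1 + (m − 1)ρ, where m is the average cluster size and ρ the intra-class correlation; the effective sample size is then n / deff:

```python
def design_effect(avg_cluster_size: float, icc: float) -> float:
    """Approximate design effect for a clustered sample:
    deff = 1 + (m - 1) * rho, where rho is the intra-class
    correlation among units in the same cluster."""
    return 1.0 + (avg_cluster_size - 1.0) * icc

def effective_sample_size(n: int, avg_cluster_size: float, icc: float) -> float:
    """Nominal n deflated by the design effect: the amount of
    independent information the clustered sample actually carries."""
    return n / design_effect(avg_cluster_size, icc)

# 2,000 respondents in clusters of 20 with modest within-cluster
# dependency (rho = 0.05) carry the information of roughly 1,026
# independent respondents: "less information than appears".
print(round(effective_sample_size(2000, 20, 0.05)))
```

The numbers here are illustrative only; in practice ρ is itself estimated, which is one thing multi-level models do.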
Non-response is a growing problem with
surveys. The best approach is to ensure
detailed follow-up so that the issue is minim-
ized at the design and collection phase; this is a
problem where doing nothing is doing some-
thing, and where prevention is better than any
cure. We can distinguish between full (or unit) non-response and item non-response, the latter occurring when a respondent has answered only some questions. For the former, differential weighting can be used to reduce bias by boosting the effective size of subgroups (such as young men) that are under-represented in the survey but whose relative size is known from other large-scale surveys or censuses. There is
a danger of increasing standard errors, how-
ever, when the variance of the weights is large.
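A minimal sketch (my own illustration; the subgroup and numbers are made up): differential weights can be formed as population share divided by sample share, and the variance cost of unequal weights can be gauged with Kish's effective sample size, (Σw)² / Σw²:

```python
def poststratification_weights(sample_counts, population_shares):
    """Weight each subgroup by population share / sample share,
    so under-represented groups (e.g. young men) are boosted."""
    n = sum(sample_counts.values())
    return {g: population_shares[g] / (sample_counts[g] / n)
            for g in sample_counts}

def kish_effective_n(weights):
    """Kish's approximation: unequal weights inflate standard errors,
    shrinking the effective sample size to (sum w)^2 / sum w^2."""
    total = sum(weights)
    return total * total / sum(w * w for w in weights)

# Young men are 10% of the population but only 5% of a sample of 1,000.
counts = {"young_men": 50, "others": 950}
shares = {"young_men": 0.10, "others": 0.90}
w = poststratification_weights(counts, shares)
print(w["young_men"])  # each under-represented respondent counts double
```

The more the weights vary, the further the Kish effective n falls below the nominal n, which is the standard-error danger noted above.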
Most software for quantitative analysis
automatically excludes the entire respondent
when any values are missing and this is known
as complete case analysis. If the data are missing completely at random (MCAR: Rubin, 1976), excluding the incomplete observations does not bias the results, but it does reduce the effective sample size. If the data are missing
at random (MAR) but the ‘missingness’
depends on recorded information, then com-
plete case analysis can be used, but the deter-
minants of the ‘missingness’ must be included
in the analysis to avoid non-response bias.
This suggests that, at the collection phase, variables that are easy to collect should be obtained alongside those thought to be difficult (e.g. income may be difficult, so also collect information about property value).
If the ‘missingness’ depends on unobserved
predictors, even after accounting for informa-
tion in the observed data, then the data are
said to be not missing at random (NMAR). In
this case, complete case analysis is likely to
produce biased results.
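As a hedged sketch (the field names and data are illustrative, not from the source), complete case analysis simply drops any record with a missing field; under MAR, the recorded variable driving the 'missingness' (here, age group) must remain in any model fitted to the retained cases:

```python
def complete_cases(records, fields):
    """Keep only records with no missing (None) value in `fields` --
    the 'complete case analysis' most software applies by default."""
    return [r for r in records if all(r.get(f) is not None for f in fields)]

# Income is missing more often for one age group: this is MAR, because
# the 'missingness' depends on a recorded variable (age_group), which
# must therefore be included in the analysis to avoid non-response bias.
records = [
    {"age_group": "young", "income": None},
    {"age_group": "young", "income": 21000},
    {"age_group": "old",   "income": 35000},
    {"age_group": "old",   "income": 38000},
]
kept = complete_cases(records, ["age_group", "income"])
print(len(kept))  # the sample shrinks from 4 to 3
```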
There are two main approaches to ‘missing-
ness’, either explicit modelling of the under-
lying mechanism generating the missing data
or some form of imputation (or ‘guessing’) to
replace the missing values. A number of ad hoc
procedures can be used for the latter, such as
carrying the last observation forward, creating
an extra category for the missing observation,
or replacing missing observations by the mean of the variable. All of these can give unpredictable
results. Consequently, the only practical, generally applicable approach for substantial datasets is multiple imputation, whereby each missing value is replaced by several (typically fewer than five) imputed values drawn from an imputation model that also reflects sampling variability. A sensitivity analysis can then be undertaken to investigate the robustness of the estimates to differential 'missingness'.
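A minimal sketch of the idea (my own illustration, with made-up numbers and a deliberately crude imputation model): each missing value is imputed several times with random draws, the analysis is run on each completed dataset, and the results are combined with Rubin's pooling rules (mean of the estimates; within- plus between-imputation variance):

```python
import random
import statistics

def impute_once(data, model_mean, model_sd, rng):
    """Fill each missing (None) value with a random draw from a
    (hypothetical) imputation model, so imputations reflect
    sampling variability rather than a single 'best guess'."""
    return [x if x is not None else rng.gauss(model_mean, model_sd)
            for x in data]

def pool_estimates(estimates, variances):
    """Rubin's rules: pooled estimate = mean of per-imputation
    estimates; total variance = mean within-imputation variance
    + (1 + 1/m) * between-imputation variance."""
    m = len(estimates)
    qbar = statistics.mean(estimates)
    within = statistics.mean(variances)
    between = statistics.variance(estimates)
    return qbar, within + (1 + 1 / m) * between

rng = random.Random(42)
data = [9.8, None, 10.4, None, 10.1]      # two missing values
ests, vrs = [], []
for _ in range(5):                        # five imputed datasets
    completed = impute_once(data, model_mean=10.0, model_sd=0.5, rng=rng)
    ests.append(statistics.mean(completed))         # analysis of interest
    vrs.append(statistics.variance(completed) / len(completed))
qbar, total_var = pool_estimates(ests, vrs)
```

The between-imputation term is what propagates the uncertainty about the missing values into the final standard errors; a single imputation would understate it.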
Survey analysts are aware of criticisms that
see quantitative approaches as imposing
meaning on people’s attitudes and behaviours.
Consequently, as Marsh (1982) argued in her
defence of the survey method, researchers
have been highly attentive to just such issues,
and developments continue to be made. One
set of issues relates to whether respondents,
particularly from different cultures, under-
stand questions in different ways, or if
researchers mean one thing and respondents
think they mean something else. Analytical
approaches to this treat survey questions as a
function of the actual quantity being measured
along with an element of interpersonal incom-
parability that is potentially different for each
respondent. A newer idea is to use anchoring vignettes as a common reference point (King, Murray, Salomon and Tandon, 2004; King and Wand, 2007), measuring the incomparable portion directly so that it can be 'subtracted off'.
Respondents are asked for their own response


SURVEY ANALYSIS