Advances in Corpus-based Contrastive Linguistics - Studies in honour of Stig Johansson


Quantity approximation in English and French


An inductive corpus-driven approach (Tognini-Bonelli 2001; Biber 2009)
was adopted to uncover the linguistic devices that are used around numbers
to approximate quantities. Numbers are used as a starting point as they can be
identified relatively easily in the two languages under study using part-of-speech
tagged versions of the corpora. BENews and FRENews were part-of-speech tagged
using TreeTagger (Schmid 1994), which features a tag for cardinal numbers
in English (‘CD’) and in French (‘NUM’). A separate recall study showed that
number nouns like hundreds and dizaine are not tagged as ‘CD’ or ‘NUM’ by
TreeTagger, although they are considered numbers within the framework
of the project. Numbers that are directly preceded or followed by a symbol (e.g.
$20m) are also tagged differently by TreeTagger, mostly as ‘JJ’ (‘adjective’) in
English (e.g. 99p) and ‘ABR’ (‘abbreviation’) in French (e.g. +4%). As a result,
‘CD’-, ‘JJ’-, ‘NUM’- and ‘ABR’-tagged items, as well as number nouns such as
hundreds, were retrieved from the corpora using Concord in WordSmith Tools (Scott 2008).
The concordances were manually scanned in order to discard the numbers that
do not denote quantities (e.g. dates: January 14 2008 (BENews); years: Christmas
2006 (BENews); or numbers which are part of a name: BBC2 (BENews)) and to
uncover the various approximators used around the numbers. An ‘approximator’
was defined as any linguistic device (i.e. word, part of a word or punctuation mark)
that is used around a number to denote an approximate quantity (e.g. some £100
billion, a 10-year-plus transformation, 30–40 euros). The approximators uncovered
in the concordances from the POS-tagged data under study were then retrieved
from the raw data to make sure all items were recorded, including approximators
occurring around numbers that had been tagged incorrectly (e.g. $700m was
tagged as a noun in about $700m).
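
The retrieval step described above can be illustrated with a minimal Python sketch. It is not the procedure used in the study, which relied on Concord in WordSmith Tools. It only assumes tab-separated tagger output (token, tag, lemma), as TreeTagger can produce. The number-noun lists, the digit filter applied to ‘JJ’ and ‘ABR’ items, the function names and the sample sentence are all hypothetical.

```python
import re

# Tags named in the text: 'CD' and 'JJ' for English, 'NUM' and 'ABR' for French.
NUMBER_TAGS = {"en": {"CD", "JJ"}, "fr": {"NUM", "ABR"}}

# Hypothetical and deliberately incomplete lists of number nouns,
# which the tagger does not label as cardinal numbers.
NUMBER_NOUNS = {
    "en": {"hundreds", "thousands", "millions"},
    "fr": {"dizaine", "centaine", "milliers"},
}

HAS_DIGIT = re.compile(r"\d")


def parse_tagged(text):
    """Parse tagger output with one 'token<TAB>tag<TAB>lemma' entry per line."""
    rows = []
    for line in text.splitlines():
        parts = line.split("\t")
        if len(parts) >= 2:
            rows.append((parts[0], parts[1]))
    return rows


def is_number_candidate(token, tag, lang):
    """Return True if the token should be retrieved for manual inspection."""
    if token.lower() in NUMBER_NOUNS[lang]:
        return True
    if tag not in NUMBER_TAGS[lang]:
        return False
    # Assumption: 'JJ' and 'ABR' are broad tags, so only tokens that
    # contain a digit (e.g. 99p, +4%) are kept for those two tags.
    if tag in {"JJ", "ABR"}:
        return bool(HAS_DIGIT.search(token))
    return True


def concordance(rows, lang, window=3):
    """Return (left context, number, right context) for each candidate."""
    hits = []
    for i, (token, tag) in enumerate(rows):
        if is_number_candidate(token, tag, lang):
            left = " ".join(t for t, _ in rows[max(0, i - window):i])
            right = " ".join(t for t, _ in rows[i + 1:i + 1 + window])
            hits.append((left, token, right))
    return hits


# Invented sample, not taken from BENews.
SAMPLE_EN = (
    "It\tPP\tit\ncost\tVVD\tcost\nabout\tRB\tabout\n"
    "99p\tJJ\t99p\nin\tIN\tin\n2008\tCD\t@card@\n.\tSENT\t."
)

hits = concordance(parse_tagged(SAMPLE_EN), "en")
for left, number, right in hits:
    print(f"{left:>20} [{number}] {right}")
```

In this sample both 99p and 2008 are retrieved. Deciding that 2008 is a year rather than a quantity, and that about is an approximator, remains a manual step, as in the study.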



  1. Do English and French tend to use imprecision around numbers
    denoting quantities to the same extent?


As can be seen from Table 1, although BENews contains fewer numbers denoting
quantities than FRENews (the difference is statistically highly significant at
p < 0.0001), the number (tokens) and proportion of numbers denoting quantities
that occur together with an approximator are higher in BENews than in FRENews
(the difference is statistically significant at p < 0.0001). In other words,
combinations of approximators and numbers denoting quantities appear to be more
typical of BENews.
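
The text does not name the significance test behind these p-values. As one plausible illustration only, the comparison of proportions could be run as a Pearson chi-square test on a 2x2 table (numbers with versus without an approximator, per corpus). The function name and the counts below are invented placeholders, not the figures from Table 1.

```python
import math


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square (no continuity correction) for [[a, b], [c, d]].

    Returns the statistic and its p-value for 1 degree of freedom.
    """
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column needs a non-zero total")
    chi2 = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the upper tail of the chi-square
    # distribution equals erfc(sqrt(chi2 / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return chi2, p_value


# Invented placeholder counts, NOT the values reported in Table 1.
en_with, en_without = 300, 1700   # hypothetical English counts
fr_with, fr_without = 200, 2300   # hypothetical French counts

chi2, p_value = chi_square_2x2(en_with, en_without, fr_with, fr_without)
print(f"chi2 = {chi2:.2f}, p = {p_value:.6f}")
```

A log-likelihood ratio test is another common choice for corpus frequency comparisons, and either test would need the actual counts from Table 1 to reproduce the reported result.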
