Advances in Corpus-based Contrastive Linguistics - Studies in honour of Stig Johansson


Recurrent word-combinations in contrast 181


Table 1. Examples of three-word combinations produced by the n-gram lists^5

Three-word combination            EO    ET    NO    NT
a long time                       23   103
the same time                     18    76
for a moment                      37    79
the first time                    44    70
all the way                       17    49
all the same                      13    37
i det hele                                    34    51
  (gloss: on the whole < 10)
i det minste                                  20    27
  (gloss: in the least < 10)

EO, ET = no. of occurrences in English original and translated texts
NO, NT = no. of occurrences in Norwegian original and translated texts
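The caption refers to n-gram lists. As a rough illustration of how such a list of three-word combinations can be produced, the sketch below assumes a deliberately simple procedure: lowercase the text, treat maximal runs of word characters as orthographic words, and slide a three-word window over them. It is not the tool used to produce Table 1; the function names `trigram_counts` and `ngram_list` and the `min_freq` threshold are hypothetical.

```python
import re
from collections import Counter


def trigram_counts(text: str) -> Counter:
    """Count three-word combinations (trigrams of orthographic words).

    Hypothetical helper: tokens are maximal runs of word characters after
    lowercasing, so a compound such as "mobiltelefon" is one orthographic
    word, while "mobile phone" is two.
    """
    words = re.findall(r"\w+", text.lower())
    # Slide a window of three over the token list. This simple version
    # also forms trigrams across sentence boundaries.
    return Counter(zip(words, words[1:], words[2:]))


def ngram_list(text: str, min_freq: int = 2) -> list[tuple[str, int]]:
    """Return trigrams occurring at least min_freq times, most frequent first."""
    counts = trigram_counts(text)
    return [(" ".join(gram), n) for gram, n in counts.most_common() if n >= min_freq]


if __name__ == "__main__":
    # Toy input for illustration only, not corpus data.
    toy = "For a moment she hesitated. Then, for a moment, all the same, she smiled."
    for gram, n in ngram_list(toy):
        print(f"{gram}\t{n}")  # prints "for a moment" with count 2
```

Because this tokenizer splits only at non-word characters, a Norwegian compound or suffixed definite form counts as a single word, so a three-word window may capture different amounts of content in Norwegian and English.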

The word combinations listed in Table 1 drew our attention in particular: all of
them are lexically constrained; all are (potentially) meaningful units; some have
fully transparent meaning, some can have either fully or semi-transparent meaning,
and some are fully opaque; and all are more frequent in translated than in
original text. These groups of three-word combinations therefore form the basis
for three separate case studies.
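Since the footnote to Table 1 states that the subcorpora are of the same size, the raw counts can be compared directly. The sketch below transcribes the figures from Table 1 and checks the claim that every item is more frequent in translated than in original text. The translated/original ratio is an illustrative measure assumed here, not one reported in the chapter, and the names `TABLE_1`, `translation_ratio` and `overused_in_translation` are hypothetical.

```python
# Raw counts transcribed from Table 1 as (original, translated).
# English items use the English subcorpora, Norwegian items the Norwegian
# ones. Equal subcorpus sizes mean no normalisation is applied here.
TABLE_1 = {
    "a long time": (23, 103),
    "the same time": (18, 76),
    "for a moment": (37, 79),
    "the first time": (44, 70),
    "all the way": (17, 49),
    "all the same": (13, 37),
    "i det hele": (34, 51),
    "i det minste": (20, 27),
}


def translation_ratio(original: int, translated: int) -> float:
    """Occurrences in translated text per occurrence in original text."""
    return translated / original


def overused_in_translation(table: dict[str, tuple[int, int]]) -> list[tuple[str, float]]:
    """Items more frequent in translations than in originals, highest ratio first."""
    ratios = [(item, translation_ratio(o, t)) for item, (o, t) in table.items()]
    return sorted((r for r in ratios if r[1] > 1), key=lambda r: r[1], reverse=True)


if __name__ == "__main__":
    for item, ratio in overused_in_translation(TABLE_1):
        print(f"{item:<16}{ratio:5.2f}")
```

All eight items pass the filter, ranging from about 4.48 for a long time down to 1.35 for i det minste.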



  1. Methodological issues


Even though n-grams of three (orthographic) words have been shown to be frequent
enough in small corpora to yield interesting data for English, this may not hold
for other languages. Norwegian, for instance, like German, tends to form compounds
by joining word stems together, whereas English spelling keeps them apart, as with
mobiltelefon versus mobile phone. Furthermore, definiteness is typically encoded
by a suffix in Norwegian (telefonen ‘the telephone’) rather than by a separate
word such as the, which may also influence the choice of n. In both cases,
Norwegian may spread the same content over fewer orthographic words than English.
Moreover, the effect of variant spellings of words and expressions in the
languages compared should not be underestimated. In Norwegian, for instance,
i hvert fall (‘at least’) can also be written



5. Because the subcorpora in the ENPC are of the same size, raw figures are
   directly comparable.
