Advances in Corpus-based Contrastive Linguistics - Studies in honour of Stig Johansson


72 Rosa Rabadán and Marlén Izquierdo



  1. Control data: The CREA corpus


In selecting the query items for CREA, in order to establish whether or not the
differences between translated and original data are significant, two important
considerations need to be borne in mind: (i) this study aims to analyse the
distribution of formally diverse resources (e.g. prepositional phrases, clausal
negatives), not particular lexical items; (ii) when searching for parts of items or
combinations of items, some (apparently obvious) searches are inefficient, because
the querying capabilities of CREA do not match those of P-ACTRES exactly.
Hence, depending on the nature of the input resource, one of two different
strategies was employed: (i) to search the CREA 10,000-item frequency list for
the ten most frequent occurrences of one particular resource in non-translated
Spanish,^5 e.g. affixed negative items, and use them as query inputs; (ii) to use
P-ACTRES findings as query input in CREA. The second strategy is employed
when the first is either not possible or simply inefficient. For example,
searching for the negative pattern sin + N in CREA is out of the question, but using
the top ten sin + N combinations yielded by P-ACTRES and running them against the
CREA frequency list results in a far more robust set of query items. However,
for some types of search, owing to the degree of lexicalization and/or
grammaticalization in Spanish, it is advisable to confine the search in CREA to the
ten most frequent P-ACTRES findings, as is the case with No + (positive) lexical item.
The frequency list strategy has been applied to affixal, lexical and clausal
negation; the strategy based on the top ten P-ACTRES diagnostic findings (in one
of its variants) has been used for the rest.
Affixal negation. The CREA 10,000-item frequency list was searched for the ten
most frequent affixal negative items in Spanish (see Table 8). The search yielded
a population (N) of 5,388 occurrences, which constitute the raw figures of our
control data (see Table 9).
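
The frequency-list strategy can be illustrated with a minimal sketch. It assumes that the list has already been reduced to (form, absolute frequency) pairs and that the analyst supplies a manually vetted set of affixal negative forms; the names `top_items` and `vetted_forms` are hypothetical and do not come from the study. The sample rows are the three rows of Table 8 visible here, so the example asks for the top two items rather than the top ten.

```python
from typing import Iterable

# (form, absolute frequency) pairs copied from the three visible rows of Table 8.
TABLE_8_ROWS = [
    ("imposible", 14178),
    ("desconocido", 4399),
    ("imprescindible", 4418),
]


def top_items(
    entries: Iterable[tuple[str, int]],
    vetted_forms: set[str],
    n: int = 10,
) -> list[tuple[str, int]]:
    """Return the n most frequent entries whose form is in vetted_forms.

    Membership in a hand-checked set is used instead of a prefix test,
    on the assumption that a bare prefix match would over-generate.
    """
    matches = [(form, freq) for form, freq in entries if form in vetted_forms]
    # Highest absolute frequency first.
    matches.sort(key=lambda pair: pair[1], reverse=True)
    return matches[:n]


vetted = {"imposible", "desconocido", "imprescindible"}
top_two = top_items(TABLE_8_ROWS, vetted, n=2)
print(top_two)
```

With the full frequency list and a complete vetted set, calling the function with the default `n=10` would give one possible way of arriving at ten query items of the kind listed in Table 8.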

Table 8. CREA querying items for affixal negation^6

CREA order            Absolute freq.   Relative freq.
1. imposible          14,178           92.93
2. desconocido         4,399           28.83
3. imprescindible      4,418           28.95
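
The relation between the two frequency columns can be checked with a short calculation. The sketch below assumes that the relative frequency is a normalised figure per million words, which the text does not state explicitly; under that assumption, each row implies a corpus size, and the three rows should agree. The function name `implied_corpus_size` is hypothetical.

```python
# (absolute frequency, relative frequency) for the three visible rows of Table 8.
TABLE_8 = {
    "imposible": (14178, 92.93),
    "desconocido": (4399, 28.83),
    "imprescindible": (4418, 28.95),
}


def implied_corpus_size(absolute: int, per_million: float) -> float:
    """Corpus size in words implied by a count and its per-million rate.

    Assumes relative frequency = absolute frequency / corpus size * 1,000,000.
    """
    return absolute / per_million * 1_000_000


sizes = {
    form: implied_corpus_size(absolute, relative)
    for form, (absolute, relative) in TABLE_8.items()
}

for form, size in sizes.items():
    # Report the implied size in millions of words, rounded to two decimals.
    print(f"{form}: {size / 1_000_000:.2f} million words")
```

If the per-million reading is right, all three rows point to roughly the same underlying corpus size, which supports treating the relative figures as directly comparable across items.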

5. http://corpus.rae.es/frec/10000_formas.TXT

6. The standard English equivalents, listed in the same order, are: impossible,
   unknown, imperative, useless, unable, essential, illegal, incredible, unconscious,
   invisible.
