test-takers are asked to summarize this paragraph in their own words while listening
to it, the category word “situation” could be omitted in the original input. Then,
test-takers need to reach a hypernym from hyponyms like “party” and “meeting”. In
order to test test-takers’ summarizing ability, no exact general word should be given
in the audio input, so that test-takers naturally resort to their summarizing skills,
i.e., deleting, generating, or reconstructing the relevant information instead of merely
depending upon their selection of words.
Finally, there must be a pretest for large-scale high-stakes tests. Especially for a
gap-filling task, the given information must not serve as direct clues to the answers
to the gaps. It is acknowledged that designing gaps calibrated to test
inferences, i.e., designing indirect items, is the most challenging. In the case
of gap-filling on summaries, gaps can test general linguistic knowledge, discourse
knowledge, or inferencing ability, depending on the nature of the information to be
filled in. Buck (2001: 147) distinguished two types of inferences in test tasks: inferring
what is intended by the speaker, which is construct-relevant, and
guessing what the test-developer expects and what the task-specific test-taking
strategy is, which is construct-irrelevant. For gap-filling tasks, clues in the given
information might betray the answer. Therefore, in order to minimize such
construct-irrelevant variance, certain measures should be adopted. First, pre-test the
items on a small sample similar to the target population. Afterward, test-developers
need to readjust the difficulty level and discrimination index according to the pretest
results and the sample’s feedback on the test, especially regarding the construct-irrelevant
variance; thereafter, all items are subject to modification, and thus
low-quality items can be eliminated. In addition, experienced teachers can
be hired to spot the construct-irrelevant items. Such double-checking helps
guarantee the quality of test items.
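The pretest adjustment described above can be sketched in classical test theory terms: item difficulty is the proportion of test-takers answering a gap correctly, and a discrimination index can be estimated as the point-biserial correlation between an item score and the rest-of-test score. The function and pretest data below are hypothetical illustrations, not from the source:

```python
# Illustrative sketch (hypothetical data): classical item statistics a
# test-developer might compute after a small-sample pretest.
# Rows = test-takers, columns = gap items; 1 = correct, 0 = incorrect.

def item_statistics(responses):
    n = len(responses)            # number of test-takers
    k = len(responses[0])         # number of gap items
    totals = [sum(row) for row in responses]
    stats = []
    for j in range(k):
        scores = [row[j] for row in responses]
        # Difficulty (facility): proportion answering this item correctly.
        p = sum(scores) / n
        # Discrimination: point-biserial correlation between the item score
        # and the rest-of-test score (total minus this item).
        rest = [totals[i] - scores[i] for i in range(n)]
        mean_s = sum(scores) / n
        mean_r = sum(rest) / n
        cov = sum((scores[i] - mean_s) * (rest[i] - mean_r)
                  for i in range(n)) / n
        var_s = sum((s - mean_s) ** 2 for s in scores) / n
        var_r = sum((r - mean_r) ** 2 for r in rest) / n
        disc = cov / (var_s * var_r) ** 0.5 if var_s > 0 and var_r > 0 else 0.0
        stats.append((p, disc))
    return stats

# Hypothetical pretest: 6 test-takers, 4 gap items.
pretest = [
    [1, 1, 1, 0],
    [1, 1, 0, 0],
    [1, 0, 1, 1],
    [1, 1, 1, 1],
    [0, 0, 0, 1],
    [1, 0, 0, 0],
]
for j, (p, disc) in enumerate(item_statistics(pretest), start=1):
    print(f"item {j}: difficulty={p:.2f}, discrimination={disc:.2f}")
```

Items whose difficulty falls outside the intended range, or whose discrimination is near zero or negative, would be the candidates for modification or elimination after the pretest.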
9.5 Integration of Task Types
Field (2013) has argued that item writers target information at several different
levels of cognitive processing and that different test formats may target different
cognitive processes. We have emphasized that even within one task type, such as the
gap-filling task, different targets are realized via the type of information the gaps
are calibrated to. Other task types, such as summarizing or multiple matching, are
adapted to more global processing (Field 2013: 142).
In the current project, the verbal protocols across the test-taking phase and the
retelling phase, covering all four gap types, consistently reveal test-takers’
different cognitive operations involved in decoding, selective attention, and meaning
and discourse construction. Therefore, retelling can be treated as a way to assess the
academic listening construct. If a written format is needed, test-takers can write
down a summary of the mini-lecture after hearing it. Since a whole summary is a
complete discourse, test-takers’ discourse construction process is thus fully
elicited.