and automatic document classification, once software is adapted
to cope.^25
The lack of even an elementary semantics also bedevils those software systems which attempt to evaluate the content of Web sites
(censorware), replacing parts of words by X’s, filtering out pages, or
blocking access to sites which contain ‘dangerous’ words.^26 Thus,
in one report, a student was barred from accessing his high school's Web site from the school library because the software objected to his word
'high'. A woman named Hilary Anne was not allowed to register the username 'hilaryanne' with a particular e-mail company because it contained the word 'aryan'. Sites which fight against pornography can be banned because they contain words like 'pornography'.
In 2000, Digital Freedom Network held a ‘Foil the Filters’ con-
test to demonstrate the unreliability of censorware. Their Silicon
Eye Award (‘for finding objectionable content where only a com-
puter would look’) was given to someone who wanted to register
an account with a site which would not accept the name Heather because it contained the phrase 'eat her'! An honourable mention was given to another enquirer who could not access a biotechnology site because its name (accessexcellence.org) contained the word 'sex'.
Doubtless residents of Essex and Sussex, people called Cockburn and Babcock, or anyone who uses Dick as their first name, encounter such problems routinely. Other examples of words which have been banned include 'cucumbers' (because it contains 'cum'), 'Matsushita' ('shit'), 'analysis' ('anal'), 'class' ('ass'), and 'speech' ('pee'). More puzzlingly,
among the words which some censorware systems have blocked are 'golden', 'mate', and 'scoop'. The linguistic naivety which lies behind
such decision-making beggars belief.
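The failure mode behind all these anecdotes is easy to reproduce: a filter that looks for banned strings anywhere inside a word will trip over Essex, hilaryanne, or accessexcellence.org just as readily as over genuinely offensive text. The sketch below is purely illustrative (it is not the code of any actual filtering product, and the banned-word list is invented for the example); it contrasts that naive substring test with a match restricted to whole words.

```python
import re

# Hypothetical blocklist, invented for illustration only.
BANNED = {"sex", "cum", "aryan", "anal", "ass", "pee"}

def naive_block(text: str) -> bool:
    """Flag text if a banned string occurs anywhere, even inside another word."""
    lowered = text.lower()
    return any(bad in lowered for bad in BANNED)

def whole_word_block(text: str) -> bool:
    """Flag text only when a banned string occurs as a whole word."""
    return any(re.search(rf"\b{re.escape(bad)}\b", text, re.IGNORECASE)
               for bad in BANNED)

for sample in ["Essex", "hilaryanne", "accessexcellence.org", "cucumbers", "class"]:
    print(f"{sample:22} naive={naive_block(sample)} whole_word={whole_word_block(sample)}")
# The naive test flags every one of these harmless strings;
# the whole-word test flags none of them.
```

Even the whole-word version would still block the anti-pornography sites mentioned above, since they use the word 'pornography' quite legitimately; distinguishing mention from offence is precisely the kind of semantic judgement these systems lack.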
(^25) This is no small problem either. My initial semantic investigation of the keywords corresponding to the encyclopedia classes for the ALFIE project (fn. 23) produced over a hundred keywords for each class. This proved impossible for search-engine software to implement, so the number of keywords was reduced to an arbitrary 40. Even then, some applications could not cope with such a number. But working with large clusters of lexical items is a sine qua non of any semantic approach to Web-searching. A large question-mark therefore hangs over the semantic approach to the Web envisioned by Berners-Lee, at least until such time as much more powerful processing options become available.
(^26) The examples in this paragraph all come from the Digital Freedom Network site: <http://dfn.org/focus/censor/contest.htm>.