Expert C Programming

It comes as no surprise that the terminal above is actually a computer program. It's operating just as
Eliza was; it analyzes the syntax and keywords in the text from the interrogator, and selects something
with a matching topic from its huge database of canned phrases. It avoids the "doctor's dilemma" by
not parroting back part of the interrogator's remark, instead keeping the talk flowing by continually
raising new (though related) topics.


It's also no surprise that the program represented above deluded five of the ten interrogators, who
marked it down as human after this and more lengthy interchanges with it. Third time unlucky for the
Turing test, and it's out for the count.


Conclusions


The above program's inability to directly answer a straightforward question ("[do you mean]
something like a hunch?") is a dead giveaway to a computer scientist, and highlights the central
weakness in the Turing test: simply exchanging semi-appropriate phrases doesn't indicate thought—
we have to look at the content of what is communicated.


The Turing test has repeatedly been shown to be inadequate. It relies on surface appearances, and
people are too easily deceived by surface appearance. Quite apart from the significant philosophical
question of whether mimicking the outward signs of an activity is evidence of the inner human
processes which accompany that activity, human interrogators have usually proven incapable of
accurately making the necessary distinctions. Since the only entities in everyday experience that
converse are people, it's natural to assume that any conversation (no matter how stilted) is with
another person.


Despite the empirical failures, the artificial intelligence community is very unwilling to let the test go.
There are many defenses of it in the literature. Its theoretical simplicity has a compelling charm; but if
something does not work in practice, it must be revised or scrapped.


The original Turing test was phrased in terms of the interrogator being able to distinguish a woman
from a man masquerading as a woman, over a teletype. Turing did not directly address it in his paper,
but the test would probably be inadequate for that task, too.


One might think that all that is necessary is to reemphasize this aspect of the conversation; that is,
require the interrogator to debate the teletype on whether it is human or not. I doubt that this would
be any more fruitful. For simplicity, the 1991 Computer Museum tests restricted the conversation to a
single domain for each teletype. Different programs had different knowledge bases, covering topics as
diverse as shopping, the weather, whimsy, and so on. All that would be needed is to give the program
a set of likely remarks and clever responses on the human condition. Turing wrote that five minutes
would be adequate time for the trial; that doesn't seem nearly enough these days.


One way to fix the Turing test is to repair the weak link: the element of human gullibility. Just as we
require doctors to pass several years of study before they can conduct medical examinations, so we
must add the condition that the Turing interrogators should not be representatives of the average
person in the street. The interrogators should instead be well versed in computer science, perhaps
graduate students familiar with the capabilities and weaknesses of computer systems. Then they won't
be thrown off by witty remarks extracted from a large database in lieu of real answers.


Another interesting idea is to explore the sense of humor displayed by the terminal. Ask it to
distinguish whether a particular story qualifies as a joke or not, and explain why it is funny. I think
such a test is too severe—too many people would fail it.
