The Internet Encyclopedia (Volume 3)



from every continent. Because log-in instructions were
included, researchers could for the first time reach
libraries across time and space and examine their holdings.
The next major development in directories was the
creation of Yahoo!. Yahoo! was conceived as a personal map
of the Internet, highlighting sites useful to David Filo
and Jerry Yang’s Ph.D. research, as well as Web sites of
personal interest. The magnitude of content eventually
necessitated a division into categories, and subsequently
subcategories. Yahoo! celebrated its first million-hit day
in the fall of 1994, and a new era of locating content on
the WWW had begun.
The number of Internet directories has continued to
multiply, even as their scope has narrowed. Directories
with a vertical subject emphasis have emerged, often hosted
by academic societies, libraries, universities, and
research facilities. These directories are extremely
valuable for research purposes. As directories
proliferated, metadirectories, or directories of
directories, appeared. These are exceedingly useful tools
for locating directories that index narrow and obscure
subject areas. A search for a subject combined with the
phrase “Internet directory” (for example, herpetology and
“Internet directory”) in a good search engine will often
yield valuable results.
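
The search technique just described can be sketched in a few lines of Python. This is only an illustration: the function name build_directory_query is hypothetical, and the use of double quotes to force an exact-phrase match is an assumption that holds for many search engines but not necessarily all of them.

```python
from urllib.parse import quote_plus


def build_directory_query(subject: str, phrase: str = "Internet directory") -> str:
    """Combine a subject with an exact phrase, quoted so it is matched as a unit."""
    return f'{subject.strip()} "{phrase}"'


# Build the example query from the text.
query = build_directory_query("herpetology")

# URL-encode the query so it can be placed in a search engine's query string.
encoded = quote_plus(query)
```

The encoded form is what would typically follow a parameter such as q= in a search URL; the exact parameter name depends on the search engine and is not assumed here.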
Although most large, general directories such as Yahoo!
or The Open Directory Project are compiled by numerous
contributors, there are a great number of personal
directories that are exceedingly useful for researchers.
These vary in size from a short list of related links to an
extensive directory such as Alan Liu’s Voice of the
Shuttle, a humanities directory hosted by the University of
California, Santa Barbara.

Rings
Rings, although a different entity, share some
characteristics with Internet directories. Rings exploit
the interconnectivity of the Web by linking sites within a
particular specialization. Rings consist of member sites
that all have a stake in the authority and credibility of
the entire Ring.
While the creation and maintenance of early rings, such
as EUROPa, were left to the participants, commercial
software and servers now offer ring creators and
participants the luxury of HTML templates, scripting
languages, and automatic updates. Some of these commercial
services, such as Yahoo! Rings and WebRing, offer
directories of the thousands of Rings using their software
or servers. Although rings are particularly popular in the
areas of sports, recreation, entertainment, and other areas
of popular culture, the serious researcher should not
neglect this resource, since rings often contain sites that
are not readily found elsewhere. Some Ring sites require
registration to use, while others allow casual users to
browse freely.

Weblogs or Blogs
Weblogs, or blogs, are a recent hybridization of personal
directories, electronic newsletters, clipping services, and
alert services. Blogs are collections of links to Internet
information in specific areas that the author feels are
useful and important. Typically authored by one person,
blogs vary widely in content, scope, and purpose. They are
often idiosyncratic and represent strong opinions and
passions. They are updated frequently and often feature
brief annotations. Some blogs offer search capabilities.
Directories of blogs exist on many of the major Internet
directories.

SEARCH ENGINES
Before a user can effectively use Internet information,
he or she needs to know that the information exists. The
most widely used tools for locating information on the
Internet are commonly called search engines. The name is
something of a misnomer, however. What Internet users
commonly refer to as a search engine is actually an
information retrieval system (Liddy, 2001). A search engine
typically consists of several components: a document
locator, a document processor, a query processor, a search
and match processor, and a relevancy ranking delivery
system.
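
The components listed above can be illustrated with a toy pipeline in Python. This is a minimal sketch under stated assumptions, not a description of any real search engine: the function names, the three sample documents, and the term-frequency scoring are all hypothetical, and a fixed in-memory collection stands in for the document locator.

```python
import re
from collections import Counter, defaultdict


def locate_documents():
    """Document locator: a fixed in-memory collection stands in for a crawler."""
    return {
        "doc1": "Gopher menus listed files by name",
        "doc2": "Search engines index the full text of Web pages",
        "doc3": "Web directories organize pages by subject",
    }


def process_text(text):
    """Document and query processor: lowercase the text and split it into tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(documents):
    """Build an inverted index: term -> {document id: term frequency}."""
    index = defaultdict(dict)
    for doc_id, text in documents.items():
        for term, count in Counter(process_text(text)).items():
            index[term][doc_id] = count
    return index


def search(index, query):
    """Search and match, then rank: score = summed frequency of matched query terms."""
    scores = Counter()
    for term in process_text(query):
        for doc_id, count in index.get(term, {}).items():
            scores[doc_id] += count
    # Highest score first; ties are broken by document id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


index = build_index(locate_documents())
results = search(index, "full text web")
```

Here doc2 matches all three query terms and doc3 matches only one, so doc2 is ranked first. Production systems use far more elaborate ranking than raw term counts; the simple sum is used only to keep the sketch short.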

Pre-WWW Search Engines
The pre-WWW search engines were rudimentary products
based on early database applications, but unlike database
software, they had to deal with a chaotic, often
disconnected collection of data, information, and files in
varied formats. Researchers using these early tools, Archie
(which searched FTP archives), Jughead and Veronica (which
searched Gophers), and wide area information servers
(WAIS), faced three primary problems. First, file names or
Gopher menu headings, the primary search field, often had
nothing to do with content. Second, there were many
discrete networks, and no one tool searched them all.
Third, each tool had different commands and handled
information differently. WAIS deserves special mention
because it was the first full-text Internet search tool.
Gopher was doomed by the fact that its creation was
virtually simultaneous with the creation and release of the
World Wide Web by Tim Berners-Lee at CERN in 1991. The WWW
floundered until the creation of a decent browser, Mosaic,
in 1993; over the next year, however, use of the Web grew
341,634% while use of Gopher grew 997%, and Gopher,
Veronica, and Jughead faded into relative obscurity.

WWW Search Engines
With the advent of the WWW and capable Web browsers,
Internet search engines began to deliver the capabilities
researchers had come to expect from commercial database
services. The evolution of these information retrieval
systems has been extremely vigorous.
The primary characteristic of WWW search engines is
that they typically search the full text of the documents
they have indexed. Full-text searching had existed on the
Internet before these applications, but it suddenly became
the norm. This represented a departure in technique for
many researchers, who were accustomed to authority
language such as descriptors and subject headings.
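
The difference between searching assigned subject headings and searching full text can be sketched as follows. The two sample records, their descriptors, and both function names are hypothetical, chosen only to show why a term that works in one style of search may fail in the other.

```python
import re

# Hypothetical catalog records: each has assigned descriptors and a body text.
records = [
    {
        "title": "Field notes on pond life",
        "descriptors": {"herpetology", "amphibians"},
        "text": "Frogs and newts were counted at dusk near the reed beds.",
    },
    {
        "title": "Reptile care handbook",
        "descriptors": {"herpetology", "reptiles"},
        "text": "Snakes and lizards need a basking spot and fresh water.",
    },
]


def search_descriptors(records, heading):
    """Controlled-vocabulary search: match only the assigned subject headings."""
    return [r["title"] for r in records if heading.lower() in r["descriptors"]]


def search_full_text(records, word):
    """Full-text search: match any word that occurs in the document body."""
    word = word.lower()
    return [
        r["title"]
        for r in records
        if word in re.findall(r"[a-z]+", r["text"].lower())
    ]


by_heading = search_descriptors(records, "herpetology")
by_text = search_full_text(records, "herpetology")
```

In this sketch the heading "herpetology" retrieves both records, while a full-text search for the same word retrieves neither, because the word never appears in the body text. A full-text searcher must instead guess the words the author actually used, such as "frogs".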
Web search engines obtain their index content in several
ways, but the most prominent is by using bots, spiders,
crawlers, or other intelligent agents. These automated
agents use hyperlinks to “travel” the expanse of the Web
and to identify and index Web content. Most
search engine providers have numerous crawlers working