The Internet Encyclopedia (Volume 3)






RESEARCH ON THE INTERNET

in concert. Providers also rely on content submitted by
content creators. Even today, the most sophisticated Web
search engines do not come close to indexing all Web content,
for several reasons that are discussed later in this section.
Once the content
is identified, the document processing component iden-
tifies the elements that require indexing, removes stop-
words, performs stemming (identifies the root of a word
for truncation searching), and stores the indexed terms
in an inverted file. The program then weights the terms,
typically according to each term’s position within the
document’s fields, the number of times it occurs, and other
factors. The weighting scheme differs between search engines
and leads, in part, to different result lists and differing
positions of results when the same search is submitted to
several search engines. Different search
engines also process varying amounts of information from
each site, with some concentrating on the first few pages and
others indexing entire documents. This is why it is
critical for serious researchers to use more than one
search engine. When all these processes have been com-
pleted, the results are stored in an index, or searchable
database.
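The document processing steps described above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular search engine: the stopword list, the crude suffix-stripping `stem` function, the sample documents, and the title and body weights are all assumptions made for the example.

```python
import re
from collections import defaultdict

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"a", "an", "and", "the", "of", "in", "on", "to", "is", "for"}

def stem(word):
    """Crude suffix stripping, a stand-in for a real stemming algorithm."""
    for suffix in ("ing", "es", "ed", "s"):
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

def tokenize(text):
    """Lowercase, split on non-letters, drop stopwords, and stem."""
    words = re.findall(r"[a-z]+", text.lower())
    return [stem(w) for w in words if w not in STOPWORDS]

def build_index(docs, title_weight=3.0, body_weight=1.0):
    """Build an inverted file: term -> {doc_id: weight}.

    A term's weight grows with its number of occurrences, and
    occurrences in the title field count more than those in the body.
    """
    index = defaultdict(lambda: defaultdict(float))
    for doc_id, doc in docs.items():
        for term in tokenize(doc["title"]):
            index[term][doc_id] += title_weight
        for term in tokenize(doc["body"]):
            index[term][doc_id] += body_weight
    return index

# Hypothetical two-document collection.
docs = {
    1: {"title": "Searching the Web", "body": "Search engines index web pages."},
    2: {"title": "Marine Mammals", "body": "Dolphins and whales are mammals."},
}
index = build_index(docs)
print(dict(index["search"]))   # postings for the stemmed term "search"
```

Because "Searching" and "Search" reduce to the same stem, both occurrences are stored under one index entry, which is what makes truncation-style matching possible at query time.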
A successful search depends on the user’s query and on how
that query is constructed and processed. Competent
researchers plan their search by teasing out key words,
synonyms, and
names and exploiting the advanced features of the search
software (fields, truncation, domain, and phrase search-
ing), which differ from engine to engine.
Internet search engines differ in the search options they
offer users and in how they process a search query. Search
options include natural language, Boolean operators,
truncation, and proximity operators, as well as searching by
phrase, field, file type, language, case, URL, or “links to
URL.” They typically do not include
authority vocabulary (descriptors and subject headings),
although some software allows for metatag field searches.
The lack of consistent authority language is still a major
obstacle to increasing relevance, but the range of content
and content providers on the Internet makes authority
control virtually impossible.
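Two of the search options mentioned above, Boolean operators and truncation, can be illustrated as set operations over an inverted file. The sketch below is only an assumption about how such options might be evaluated; the `postings` table and the trailing `*` truncation symbol are hypothetical, and actual engines use their own syntax.

```python
# Hypothetical postings: term -> set of ids of documents containing it.
postings = {
    "dolphin": {1, 2, 4},
    "dolphins": {2, 3},
    "miami": {3},
    "whale": {1, 4},
}

def lookup(term):
    """Return matching doc ids; a trailing '*' requests truncation (prefix match)."""
    if term.endswith("*"):
        prefix = term[:-1]
        hits = set()
        for indexed_term, doc_ids in postings.items():
            if indexed_term.startswith(prefix):
                hits |= doc_ids
        return hits
    return set(postings.get(term, set()))

def boolean_and(a, b):
    """Documents containing both terms."""
    return lookup(a) & lookup(b)

def boolean_or(a, b):
    """Documents containing either term."""
    return lookup(a) | lookup(b)

def boolean_not(a, b):
    """Documents containing the first term but not the second."""
    return lookup(a) - lookup(b)

print(boolean_and("dolphin*", "whale"))   # dolphin or dolphins, AND whale
print(boolean_not("dolphin*", "miami"))   # excludes documents mentioning miami
```

The NOT example shows why such operators matter to a researcher: excluding "miami" removes the football team from a search that is really about marine mammals.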
Search engines also vary considerably in how they cre-
ate relevancy rankings for search results. The most com-
mon methods include how often the term or terms occur;
in what proximity they occur; in what fields they occur;
and how many links exist to a Web page. In addition, it is
becoming more common for search engines to charge for
placement in search results. This practice, called “paid
placement,” guarantees those willing to pay a position within
the top results. Paid placement is seen as detrimental to fair
relevancy ranking, but it typically involves commercial
(dot-com) sites, which often do not figure prominently in
research. The more reputable search engines either segregate
paid results or do not accept paid placement.
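To make the ranking factors concrete, the following sketch combines three of them: how often the query terms occur, in which field they occur, and how many links point to the page. The formula, the `title_bonus` and `link_weight` values, and the sample pages are all assumptions chosen for illustration; no search engine is known to use this exact scoring.

```python
import math

def score(page, query_terms, title_bonus=2.0, link_weight=0.5):
    """Combine term frequency, a field (title) bonus, and inbound links."""
    body = page["body"].lower().split()
    title = page["title"].lower().split()
    text_score = 0.0
    for term in query_terms:
        text_score += body.count(term)      # how often the term occurs
        if term in title:
            text_score += title_bonus       # in what field it occurs
    if text_score == 0:
        return 0.0                          # page does not match the query
    # Links add a damped boost so popularity cannot swamp the text match.
    return text_score + link_weight * math.log1p(page["inlinks"])

def rank(pages, query_terms):
    """Return matching pages ordered from most to least relevant."""
    scored = [(score(p, query_terms), p) for p in pages]
    matches = [(s, p) for s, p in scored if s > 0]
    matches.sort(key=lambda pair: pair[0], reverse=True)
    return [p for _, p in matches]

# Hypothetical pages.
pages = [
    {"name": "A", "title": "dolphin facts", "body": "dolphin dolphin swim", "inlinks": 0},
    {"name": "B", "title": "sea life", "body": "dolphin whale", "inlinks": 10},
    {"name": "C", "title": "football", "body": "scores today", "inlinks": 100},
]
ranked = rank(pages, ["dolphin"])
print([p["name"] for p in ranked])
```

Changing the weights reorders the results, which is one way to see why the same query returns different lists on different engines.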
How results are displayed also varies considerably,
with relevancy-ranked annotated lists the most common
search outcome. Some newer search engines employ
artificial intelligence programs to process requests and
display results as related topics, subcategories, and
alternative topics (for example, a search for “Dolphins” might
return the categories “Marine Mammals,” “Whales and
Dolphins,” and “Miami Dolphins”). Users can then modify
their original search based on the most accurate category.
While search engines of this nature are designed primarily
for novice searchers, the categories and concepts returned
can prove valuable by offering the researcher additional
or related terminology.
General search engines are considered horizontal
search tools, in that they search across all subject areas.
Specialized search engines are vertical search tools and
address a researcher’s need to explore in depth a particu-
lar subject area. They have evolved rapidly and exist for
numerous subjects. Searching a general search engine for a
specialized engine (for example, “search engine” geology)
will generally lead researchers to the tool they need, as
will directories of search engines such as Beaucoup or
Search Engine Colossus.
Meta-search engines are search engines that have no
database or inverted file of their own, but run a query si-
multaneously against a set of independent Internet search
engines. Some meta-search engines allow the user to con-
figure which engines are searched, while others operate
with default settings. These tools can prove extremely use-
ful for obscure topics because they cover so much ground.
Their major drawbacks are that they cannot handle customized
searches well, because individual engines process a search
differently, and that some of them fail to remove duplicates.
Their software continues
to improve, however, and some of the new generation en-
gines are proving extremely useful for a wide range of
needs.
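The basic operation of a meta-search engine, including the duplicate removal that some of them omit, can be sketched as follows. The two engine functions are hypothetical stand-ins that return fixed lists of URLs; a real tool would send the query to independent search services, and the URL normalization shown is a deliberately simple assumption.

```python
from concurrent.futures import ThreadPoolExecutor

# Hypothetical stand-ins for independent search engines.
def engine_a(query):
    return ["http://example.org/geology", "http://example.com/rocks"]

def engine_b(query):
    return ["http://example.com/rocks/", "http://example.net/minerals"]

def normalize(url):
    """Simplistic normalization so near-identical URLs compare equal."""
    return url.lower().rstrip("/")

def meta_search(query, engines):
    """Run the query against every engine at once and merge the results."""
    with ThreadPoolExecutor(max_workers=len(engines)) as pool:
        # map() returns the result lists in the same order as `engines`.
        result_lists = list(pool.map(lambda engine: engine(query), engines))
    merged, seen = [], set()
    for results in result_lists:
        for url in results:
            key = normalize(url)
            if key not in seen:     # skip duplicates across engines
                seen.add(key)
                merged.append(url)
    return merged

results = meta_search("geology", [engine_a, engine_b])
print(results)
```

Note that the query string is passed to every engine unchanged. This is the source of the weakness with customized searches: an operator that one engine understands may be ignored or misread by another.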
In addition to search engines, a number of commercial
products exist which add value to results in terms of or-
ganization, categorization, storage, and note integration.
Virtually all fee-based professional journal indexes and
databases now have Web interfaces. These can often be
readily accessed through local libraries. Many of these
databases index Web sites as well as journal articles. Ver-
sions of government-funded databases such as Medline,
Agricola, and ERIC are free of charge to all users and offer
excellent user interfaces. Other valuable databases, such as
Ingenta and Scirus, charge for document delivery but offer
searching and bibliographic information free of charge.

Ask an Expert Services
These are typically commercial services that are modeled
after a library reference service, but actually have their
precursors in BBSs such as EasyNet. A user is connected
with an “expert” in the chosen field, who then provides
research assistance. While these services are generally free,
some charge a fee. And while they can prove useful, their
primary drawback is that the “experts” are not screened.

Maximizing Search Engine Effectiveness
Search engines differ dramatically in every area of
performance. For this reason, an effective researcher
needs to consistently use and master several search en-
gines, exploiting their differences. It is critical to explore
help pages, use advanced search options, and practice re-
peatedly. Keeping current on search engine developments
is also critical, since this technology changes daily. For