Wisman WL040/Bidgoli-Vol III-Ch-59 August 14, 2003 18:3 Char Count= 0
HOW TO SEARCH: THE SEARCHER’S VIEW

Figure 2: Retrieved versus relevant references when searching the Web. The intersection of retrieved and relevant pages will likely be relatively small, in part because only a fraction of the Web is indexed.

Several basic definitions common in the search literature, “relevancy,” “recall,” and “precision,” are also of
value to searchers. Relevancy measures the usefulness of
the references retrieved and is highly subjective. Figure 2
illustrates three useful points: (1) that relevant references
are generally a subset of all the references available;
(2) that retrieved information generally includes some
irrelevant references; and (3) that the intersection of retrieved and relevant references will often exclude references that are relevant (Belew, 2000). The following definitions of recall and precision nicely complement the intuition illustrated in Figure 2:

$$\text{Recall} = \frac{|\text{Retrieved} \cap \text{Relevant}|}{|\text{Relevant}|}$$

$$\text{Precision} = \frac{|\text{Retrieved} \cap \text{Relevant}|}{|\text{Retrieved}|}$$

Recall is the percentage of relevant references retrieved. A
recall of 50% means that only one-half of the relevant ref-
erences were retrieved; retrieving every possible reference
yields 100% recall but often includes many references that
are irrelevant. Precision is the percentage of retrieved references that are relevant. Of the references retrieved, a precision of 50% means that only one-half of the references were relevant; returning only one relevant reference yields 100% precision but possibly very poor recall.
A “weight loss” query can yield perfect recall by retrieving
every Web page in existence but will likely produce many
that are not relevant, so that precision is poor. Simulta-
neously achieving perfect recall and precision is nearly
impossible even when carefully designed and constructed
sets of known information are searched. Given the scope,
size, disorganization, and diversity of information sources
available on the Web, merely retrieving relevant informa-
tion is daunting.
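
The trade-off between recall and precision can be made concrete with a small calculation. The following Python sketch computes both measures from sets of references, following the definitions above. The page names and the ten-page collection are invented purely for illustration and do not come from any real search engine.

```python
def recall(retrieved, relevant):
    """Fraction of the relevant references that were retrieved."""
    relevant = set(relevant)
    if not relevant:
        raise ValueError("recall is undefined when nothing is relevant")
    return len(set(retrieved) & relevant) / len(relevant)


def precision(retrieved, relevant):
    """Fraction of the retrieved references that are relevant."""
    retrieved = set(retrieved)
    if not retrieved:
        raise ValueError("precision is undefined when nothing is retrieved")
    return len(retrieved & set(relevant)) / len(retrieved)


# Hypothetical collection of ten pages, four of which are relevant.
all_pages = {f"page{i}" for i in range(1, 11)}
relevant = {"page1", "page2", "page3", "page4"}

# A narrow search: two pages retrieved, both of them relevant.
narrow = {"page1", "page2"}
# A broad search: every page in the collection is retrieved.
broad = all_pages

print(recall(narrow, relevant), precision(narrow, relevant))  # 0.5 1.0
print(recall(broad, relevant), precision(broad, relevant))    # 1.0 0.4
```

In this toy collection the broad search reaches perfect recall only by returning six irrelevant pages, while the narrow search is perfectly precise but misses half of the relevant pages.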
Search engines naturally exhibit great recall when find-
ing millions of pages for a broad query, but narrowing the
query can greatly improve precision by eliminating many
pages of questionable relevancy. Fine control of a search
can yield higher quality references, is relatively easy to
use, and is common to most search engines. Table 1 lists
common search controls for popular engines with details
and examples of search refinement provided in the following discussion (Sullivan, 2001).

Adding Search Terms: The simplest and often most effective method of improving the precision of search results is adding search terms with a more precise meaning. The query “weight loss” produces millions of pages
using the Teoma search engine, each page containing
the word “weight,” “loss,” or both. Adding terms to form the query “vegetarian weight loss diet” produces merely 10,000 references, but they are more focused. Each page found still contains at least one of the query words. Because search engines generally rank pages that match more query terms higher, pages that match few of the discriminating terms sink so low in the results that they are effectively ignored.
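
The effect of adding terms can be sketched in a few lines of Python. This is a deliberately simplified model and not how Teoma or any other engine actually ranks pages: it assumes a page matches when it shares at least one word with the query, and that rank depends only on how many distinct query terms the page contains. The function name `rank_by_matched_terms` and the page texts are hypothetical.

```python
def rank_by_matched_terms(query, pages):
    """Return (title, match_count) pairs for pages matching any query term,
    ordered so that pages matching more distinct terms come first."""
    terms = set(query.lower().split())
    scored = []
    for title, text in pages.items():
        words = set(text.lower().split())
        matches = len(terms & words)
        if matches > 0:  # "any" matching: one shared term is enough
            scored.append((title, matches))
    # Sort by match count (descending), then by title for a repeatable order.
    return sorted(scored, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical page texts, invented for illustration only.
pages = {
    "A": "vegetarian diet plans for steady weight loss",
    "B": "hair loss treatments",
    "C": "weight lifting basics",
    "D": "weight loss surgery",
    "E": "garden tools",
}

print(rank_by_matched_terms("weight loss", pages))
# [('A', 2), ('D', 2), ('B', 1), ('C', 1)]
print(rank_by_matched_terms("vegetarian weight loss diet", pages))
# [('A', 4), ('D', 2), ('B', 1), ('C', 1)]
```

With the two-word query, pages A and D are indistinguishable. The two extra terms do not remove any page from the result list in this model, but they lift the vegetarian diet page clearly above the rest, which is the behavior the paragraph above describes.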
Table 1: Common Search Controls for Popular Engines

Query Match     How        Example                    Popular Search Engine Support
Any             Automatic  vegetarian diet            Teoma, Yahoo, Inktomi
                OR         vegetarian OR diet         AltaVista, Yahoo, MSN, Google
All             Automatic  vegetarian diet            AltaVista, Google
                AND        vegetarian AND diet        AltaVista, Yahoo, MSN
Include         +          +vegetarian diet           AltaVista, Yahoo, MSN, Google, Inktomi
Exclude         -          vegetarian -diet           AltaVista, Yahoo, MSN, Google, Inktomi
                AND NOT    vegetarian AND NOT diet    AltaVista, MSN, Inktomi
Phrase          " "        "vegetarian diet"          Google, Yahoo, Inktomi, Teoma
Wildcard        *          veget* diet                AltaVista, Yahoo, MSN
Proximity       NEAR       vegetable NEAR diet        AltaVista
Title           title:     title:"vegetarian diet"    AltaVista, Yahoo, Inktomi
                intitle:   intitle:"vegetarian diet"  Google
Site Only       site:      site:food.com vegetarian   Google, Inktomi
                host:      host:food.com vegetarian   AltaVista
URL Only        url:       url:food.com               AltaVista, Yahoo, Inktomi
                allinurl:  allinurl:food.com/diet/    Google
All References  link:      link:www.food.com          AltaVista

Searching for Phrases: Phrases of specific word groupings can yield more precise searches than the same words when matched independently. Searching on the