P1: IML
Wisman WL040/Bidgoli-Vol III-Ch-59 August 14, 2003 18:3 Char Count= 0
HOW TO BE SEARCHED—VIEWS FROM THE WEB SITE 735
solution is to invisibly redirect visitors requesting each old page to the new page location through the server configuration or an individual page redirection file for each page moved.
Self-Search
Does the Web site need its own search engine? One can easily provide visitors with site search by placing a link on site pages to a Web search engine. But Web search engines are far from perfect at indexing Web sites: they may ignore important pages, and their results are not focused on the one site alone but mixed with results from other sites. Moreover, changes to the site go unnoticed during the long periods between spider visits.
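The link approach can be sketched in a few lines. This minimal Python example builds a search URL that asks a Web search engine to restrict results to one site with the `site:` operator. It assumes the engine reads the query from a `q` parameter and honors `site:`, as Google and Bing do; the function name `site_search_url` and the domain `example.com` are hypothetical.

```python
from urllib.parse import urlencode


def site_search_url(query: str, site: str,
                    engine: str = "https://www.google.com/search") -> str:
    """Return a search-engine URL whose results are restricted to one site.

    Assumes the engine reads the query from a "q" parameter and honors the
    "site:" operator (true of Google and Bing; other engines may differ).
    """
    # urlencode escapes spaces as "+" and ":" as "%3A".
    return engine + "?" + urlencode({"q": f"{query} site:{site}"})


# "example.com" is a placeholder domain used only for illustration.
print(site_search_url("bread recipes", "example.com"))
# https://www.google.com/search?q=bread+recipes+site%3Aexample.com
```

Such a link inherits every weakness described above: the results can only be as complete and as current as the engine's last spider visit.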
Web search engines, which bring most new visitors to a Web site, are often poorly suited to searching one site exclusively. However visitors arrive, one authority (Nielsen, 2000) has found that many of them use search immediately on arrival as their preferred means of locating information and ignore site navigation aids; those visitors need a search engine tailored to the Web site.
What should a designer look for in a search engine? A site search engine is comparable in function to a Web search engine but can limit search to the given site. Beyond the raw power required to index a complete Web site, a key capability is to create and search specified branches of the site. Indexing every word of the entire site is easy to do but ignores the different reasons visitors search the site and the principle that search works best when the information searched is narrow and fairly homogeneous. For example, a university Web site is probably best searched by separating the business and science schools and letting visitors choose whether to search the business school, the science school, or both combined. Separating business from science capitalizes on natural and recognized differences to create a narrower, more homogeneous, and more recognizable information area to search.
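Branch-limited search can be sketched as an inverted index (word to pages) with a filter applied after lookup. This minimal Python version assumes, for illustration only, that a branch is the first directory in a page's URL path, so /business/... and /science/... stand for the two schools in the university example. The class `BranchIndex`, the helpers, and the sample pages are hypothetical; a real engine would add ranking, stemming, and persistent storage.

```python
import re
from collections import defaultdict
from typing import Dict, Iterable, List, Optional, Set
from urllib.parse import urlparse

WORD = re.compile(r"[a-z0-9]+")


def words_in(text: str) -> List[str]:
    """Lowercase the text and split it into alphanumeric words."""
    return WORD.findall(text.lower())


def branch_of(url: str) -> str:
    """Assumed convention: the first path segment names the branch."""
    parts = [p for p in urlparse(url).path.split("/") if p]
    return parts[0] if parts else ""


class BranchIndex:
    """Inverted index whose searches can be limited to chosen site branches."""

    def __init__(self) -> None:
        self.postings: Dict[str, Set[str]] = defaultdict(set)

    def add(self, url: str, text: str) -> None:
        for word in words_in(text):
            self.postings[word].add(url)

    def search(self, query: str,
               branches: Optional[Iterable[str]] = None) -> List[str]:
        terms = words_in(query)
        if not terms:
            return []
        # A page matches only if it contains every query word.
        hits = set.intersection(*(self.postings.get(t, set()) for t in terms))
        if branches is not None:
            allowed = set(branches)
            hits = {url for url in hits if branch_of(url) in allowed}
        return sorted(hits)


index = BranchIndex()
index.add("/business/mba.html", "MBA program in finance")
index.add("/science/physics.html", "Physics degree program")
print(index.search("program"))
# ['/business/mba.html', '/science/physics.html']
print(index.search("program", ["business"]))
# ['/business/mba.html']
```

Filtering after lookup keeps a single index for the whole site; building a separate index per branch is an alternative that trades extra storage for smaller restricted searches.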
Another issue is the sophistication and flexibility of the
search engine. Does it support automatic or manual word
stemming, common misspellings, indexing of the HTML
tags, synonyms, inclusion and exclusion query operators,
and phrase and proximity search?
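Three of those features can be shown in miniature: crude suffix stemming, inclusion and exclusion operators (+word, -word), and quoted phrase search. The Python sketch below is illustrative only; the suffix list, the +/- and quote syntax, and all function names are assumptions rather than the behavior of any particular engine, and misspellings, synonyms, HTML tag indexing, and proximity search are left out.

```python
import re
from typing import List, Tuple

# Deliberately naive stemmer: strip one common suffix. Real engines use
# algorithms such as Porter's; this suffix list is an assumption.
SUFFIXES = ("ing", "ed", "es", "s")


def stem(word: str) -> str:
    for suffix in SUFFIXES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word


def tokens(text: str) -> List[str]:
    """Lowercase, split into words, and stem each one."""
    return [stem(w) for w in re.findall(r"[a-z0-9]+", text.lower())]


def parse_query(query: str) -> Tuple[List[str], List[str], List[List[str]]]:
    """Split a query into required words, excluded (-word) words, and phrases.

    Unmarked words and +words are both treated as required (AND semantics).
    """
    phrases = [tokens(p) for p in re.findall(r'"([^"]+)"', query)]
    rest = re.sub(r'"[^"]+"', " ", query)
    required: List[str] = []
    excluded: List[str] = []
    for term in rest.split():
        if term.startswith("-"):
            excluded.extend(tokens(term[1:]))
        else:
            required.extend(tokens(term.lstrip("+")))
    return required, excluded, phrases


def matches(query: str, page_text: str) -> bool:
    required, excluded, phrases = parse_query(query)
    words = tokens(page_text)
    vocab = set(words)
    if any(w not in vocab for w in required):
        return False
    if any(w in vocab for w in excluded):
        return False
    # A phrase matches only if its stemmed words appear consecutively.
    for phrase in phrases:
        n = len(phrase)
        if not any(words[i:i + n] == phrase for i in range(len(words) - n + 1)):
            return False
    return True


print(matches("baking -cake", "Tips for baked bread"))              # True
print(matches("baking -cake", "Baking a cake"))                     # False
print(matches('"sourdough bread"', "Our sourdough bread recipe"))   # True
```

Because both query and page pass through the same stemmer, "baking" matches "baked"; a naive stemmer like this one will also conflate some unrelated words, which is one reason engines offer stemming as an option.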
Two basic options exist for search dedicated to a site: operating a search engine on the site or retaining an offsite search service. Having someone else handle search is the easier solution but does not necessarily match the owner’s needs; handling search oneself can be more flexible but involves more work and expense.
Search Service
Search service companies will index a few pages or an entire Web site and operate the search engine on their own computers; some are free. The free services generally index a limited number of pages and in return place advertising on each search result page. A good service should index pages located on any Web site, reindex reasonably promptly whenever the owner chooses, and provide regular summaries of search activity for the site. The main advantage is that someone else maintains the search engine.
The main disadvantages of a search service are the possible ongoing cost, the possible limit on pages indexed, the need to schedule occasional reindexing, the lack of control over the search results and the appearance of the result page, and advertising banners that may not impress the site’s visitors. However, the most serious problem arises if the service company drastically changes its policies or technology or goes out of business, forcing the Web site’s search to change.
Reasons for a Private Search Engine
The only compelling reason to operate a private search engine is to benefit site visitors. Fortunately, operating a basic search engine can be relatively easy, and many Web server systems include one. Commercially packaged Linux systems come with the ht://Dig engine installed and ready to index the entire server; the owner need only type the “./rundig” command and add a search form link to the site pages.
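The search form link can be an HTML form or simply a link whose query string the htsearch program reads. The Python sketch below builds such a link. It assumes a default ht://Dig installation in which htsearch answers at /cgi-bin/htsearch and accepts the `words` (query) and `restrict` (URL pattern that results must match) form fields; verify both against the documentation of the installed version. The helper name `htsearch_link` is hypothetical.

```python
from typing import Optional
from urllib.parse import urlencode


def htsearch_link(words: str, restrict: Optional[str] = None,
                  cgi: str = "/cgi-bin/htsearch") -> str:
    """Build a GET link to ht://Dig's htsearch CGI program.

    Assumes the default CGI location and the "words" and "restrict"
    form fields; both are assumptions to check for a given install.
    """
    params = {"words": words}
    if restrict:
        # Optionally limit results to URLs containing this pattern,
        # e.g. one branch of the site.
        params["restrict"] = restrict
    return cgi + "?" + urlencode(params)


print(htsearch_link("admissions"))
# /cgi-bin/htsearch?words=admissions
print(htsearch_link("admissions", restrict="/business/"))
# /cgi-bin/htsearch?words=admissions&restrict=%2Fbusiness%2F
```

Placing such a link, or an equivalent form, on each site page completes the minimal setup once the index has been built.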
The main advantages of a private search engine are control over parameters such as indexing depth, and access to the logs the search engine produces during indexing and while answering visitor queries. The following are common search engine control parameters:
Excerpts: Search results can include text excerpted from a page to help place query keywords in context. Because excerpting generally takes place during indexing, the searcher needs a little luck for the excerpt to include at least some of his or her query terms. Controlling the size of the excerpt makes it possible to improve the likelihood that the excerpt contains some query keywords, or even to excerpt the entire page.
Indexing: It is important to control the number of words indexed per page and how often the site is indexed. Although most pages contain fewer than 2000 words, or about 7 pages of typed text, a spider with too low a word limit can routinely miss important parts of large pages. Also important to sites with news, pricing, and other frequently updated information is how often indexing occurs and whether it occurs whenever the information changes.
Stop Words: Stop words are words that should not be indexed; they often include numbers and common words such as “the,” “computer,” “system,” and “HTML.” Words that occur on many of the site’s own pages should also be included in the stop word list. For example, on a bread Web site, the word “baking” would have little value in discriminating one page from another.
Measuring Success: Successful searches mean that the visitor follows, or “clicks through” to, a suggested page. The Web server access log will list the page the visitor goes to, but only the search engine can log the query words entered to find that page; connecting the query words with the referred page measures the effectiveness of search on a Web site. Other measures of search success are whether visitors actually sought the information they were given or immediately searched again, and whether they followed the referred page further to accomplish some task such as buying a