SEO: Search Engine Optimization Bible

User-agent: CrawlerName
Disallow: /tmp/
Disallow: /links/listing.html

This bit of text first tells all crawlers to ignore the temporary directories, so every crawler
reading that file will automatically ignore the temporary files. But you’ve also told a specific
crawler (indicated by CrawlerName) to disallow both the temporary directories and the links on
the Listing page. The problem is that the specified crawler will never get that message, because
it has already read that all crawlers should ignore the temporary directories.

If you want to command multiple crawlers, begin by naming the crawlers you want to control.
Only after they’ve been named should you leave your instructions for all crawlers. Written
properly, the text from the preceding code should look like this:

User-agent: CrawlerName
Disallow: /tmp/
Disallow: /links/listing.html

User-agent: *
Disallow: /tmp/
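
One way to sanity-check a file like the corrected one above is to feed it to a robots.txt parser and ask what each crawler may fetch. The following is a minimal sketch using Python's standard-library urllib.robotparser. The crawler name OtherBot and the file name /tmp/scratch.html are assumptions made up for illustration, and the result shows only how this one parser reads the rules, not how any particular search engine behaves.

```python
from urllib.robotparser import RobotFileParser

# The corrected robots.txt text from above: named crawler first, then all crawlers.
ROBOTS_TXT = """\
User-agent: CrawlerName
Disallow: /tmp/
Disallow: /links/listing.html

User-agent: *
Disallow: /tmp/
"""

def build_parser(text):
    """Parse robots.txt text held in a string (no network access needed)."""
    parser = RobotFileParser()
    parser.parse(text.splitlines())
    return parser

parser = build_parser(ROBOTS_TXT)

# "OtherBot" is a hypothetical crawler name standing in for any unnamed crawler.
# "/tmp/scratch.html" is a hypothetical file inside the temporary directory.
for agent in ("CrawlerName", "OtherBot"):
    for path in ("/tmp/scratch.html", "/links/listing.html"):
        verdict = "allowed" if parser.can_fetch(agent, path) else "blocked"
        print(f"{agent:12} {path:22} {verdict}")
```

With this parser, both crawlers are kept out of /tmp/, and only CrawlerName is kept off the Listing page.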

NOTE: If you have certain pages or links that you want the crawler to ignore, you can accomplish
this without causing the crawler to ignore a whole site or a whole directory, and without
having to put a specific meta tag on each page.

Each search engine crawler goes by a different name, and if you look at your web server log, you’ll
probably see that name. Here’s a quick list of some of the crawler names that you’re likely to see in
that web server log:

 Google: Googlebot
 MSN: MSNbot
 Yahoo! Web Search: Yahoo SLURP or just SLURP
 Ask: Teoma
 AltaVista: Scooter
 LookSmart: MantraAgent
 WebCrawler: WebCrawler
 SearchHippo: Fluffy the Spider
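
To spot these names in a web server log, a short script can scan each line for the crawler names listed above. The sketch below assumes that the crawler name appears somewhere in the log line (typically in the user-agent field) and matches it case-insensitively. The sample log lines are invented for illustration and are not taken from a real server.

```python
from collections import Counter

# Crawler name fragments from the list above, mapped to their search engines.
CRAWLER_NAMES = {
    "Googlebot": "Google",
    "MSNbot": "MSN",
    "Slurp": "Yahoo! Web Search",
    "Teoma": "Ask",
    "Scooter": "AltaVista",
    "MantraAgent": "LookSmart",
    "WebCrawler": "WebCrawler",
    "Fluffy": "SearchHippo",
}

def count_crawler_hits(log_lines):
    """Count log lines that mention a known crawler name (case-insensitive)."""
    hits = Counter()
    for line in log_lines:
        lowered = line.lower()
        for name, engine in CRAWLER_NAMES.items():
            if name.lower() in lowered:
                hits[engine] += 1
                break  # count each log line at most once
    return hits

# Hypothetical log lines, made up for this example only.
sample_log = [
    '192.0.2.1 - - "GET /robots.txt HTTP/1.1" 200 68 "-" "Googlebot/2.1"',
    '192.0.2.2 - - "GET /index.html HTTP/1.1" 200 512 "-" "msnbot/1.0"',
    '192.0.2.3 - - "GET /index.html HTTP/1.1" 200 512 "-" "Mozilla/5.0"',
]

hits = count_crawler_hits(sample_log)
for engine, count in sorted(hits.items()):
    print(f"{engine}: {count}")
```

In practice you would pass the lines of your own log file to count_crawler_hits instead of the sample list.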

These are just a few of the search engine crawlers that might crawl across your site. You can find a
complete list along with the text of the Robots Exclusion Standard document on the Web Robots
Pages (www.robotstxt.org). Take the time to read the Robots Exclusion Standard document.
It’s not terribly long, and reading it will help you understand how search crawlers interact with
your web site. That understanding can also help you learn how to control crawlers better when
they come to visit.

Robots, Spiders, and Crawlers 16