SEO: Search Engine Optimization Bible

It pays to know which crawler belongs to which search engine, because some spambots and other malicious crawlers are interested in crawling your site for less than ethical reasons. If you know the names of these crawlers, you can keep them off your site and keep your users’ information safe. Spambots in particular are troublesome, because they crawl the Web searching out and collecting anything that looks like an e-mail address. Those addresses are then sold to marketers, or even to people who are not interested in legitimate business opportunities. Be aware, though, that most spambots ignore your robots.txt file.
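For crawlers that do respect robots.txt, blocking one by name takes a User-agent group followed by a Disallow rule. The sketch below checks such a file with Python's standard urllib.robotparser module. The crawler name BadBot and the rules are hypothetical examples, and a spambot that ignores robots.txt would not be stopped by them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: block one named crawler, allow everyone else.
# Only crawlers that choose to obey robots.txt are affected by these rules.
ROBOTS_TXT = """\
User-agent: BadBot
Disallow: /

User-agent: *
Disallow:
"""


def build_parser(robots_txt: str) -> RobotFileParser:
    """Parse robots.txt text that is already in memory (no network access)."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)
for agent in ("BadBot", "Googlebot"):
    allowed = parser.can_fetch(agent, "http://www.sampleaddress.com/contact.html")
    print(f"{agent}: {'allowed' if allowed else 'blocked'}")
```

Running it prints "BadBot: blocked" and then "Googlebot: allowed", because only the first group names BadBot and the catch-all group has an empty Disallow rule.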

You can view the robots.txt file for any web site that has one by adding /robots.txt to the site’s base URL. For example, http://www.sampleaddress.com/robots.txt displays the text file that guides robots on that site. If that address doesn’t pull up a robots.txt file, the site does not have one.
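Because robots.txt always sits at the root of the host, its address can be derived from any page URL on the site. This is a minimal Python sketch; the function names robots_url and fetch_robots are hypothetical, and it assumes that a 404 response means the site has no robots.txt file.

```python
from typing import Optional
from urllib.error import HTTPError
from urllib.parse import urlsplit, urlunsplit
from urllib.request import urlopen


def robots_url(site_url: str) -> str:
    """Return the robots.txt address for the host that serves site_url."""
    parts = urlsplit(site_url)
    # robots.txt lives at the root path of the host, so any path,
    # query string, or fragment on the input URL is dropped.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


def fetch_robots(site_url: str, timeout: float = 10.0) -> Optional[str]:
    """Download robots.txt, or return None if the server answers 404."""
    try:
        with urlopen(robots_url(site_url), timeout=timeout) as response:
            return response.read().decode("utf-8", errors="replace")
    except HTTPError as err:
        if err.code == 404:
            return None  # assumption: 404 means the site has no robots.txt
        raise


print(robots_url("http://www.sampleaddress.com/products/page.html?id=3"))
```

The print call shows http://www.sampleaddress.com/robots.txt; calling fetch_robots requires network access.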

If you don’t have a robots.txt file, you can create one in any text editor. Keep in mind that not everyone wants or needs a robots.txt file. If you don’t care who crawls your site, you don’t need to create one. Don’t count on a blank robots.txt file to keep crawlers out, though: under the Robots Exclusion Protocol, a file that contains no rules places no restrictions at all, so compliant crawlers treat a blank file the same as a missing one and crawl the whole site. To block all compliant crawlers, the file needs an explicit rule, such as the line User-agent: * followed by the line Disallow: /.
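The difference between a blank file and a blocking file can be checked with Python's standard urllib.robotparser module, which follows the protocol's reading that a file with no rules restricts nothing. The helper name can_crawl and the sample URL are hypothetical.

```python
from urllib.robotparser import RobotFileParser


def can_crawl(robots_txt: str, agent: str, url: str) -> bool:
    """Report whether a compliant crawler may fetch url under robots_txt."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())  # a blank file yields no rules
    return parser.can_fetch(agent, url)


url = "http://www.sampleaddress.com/index.html"
print(can_crawl("", "Googlebot", url))                              # True
print(can_crawl("User-agent: *\nDisallow: /\n", "Googlebot", url))  # False
```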

Robots Meta Tag


Not everyone has access to their web server, but many of those people still want control over how crawlers behave on their web site. If you’re one of them, you can still control the crawlers that come to your site. Instead of using the robots.txt file, you use a robots meta tag to make your preferences known to the crawlers.

The robots meta tag is a small piece of HTML code that goes inside the <HEAD> section of a web page. It works in much the same way as the robots.txt file, except that it applies only to the page that contains it. You put your instructions for crawlers in the tag’s content attribute. The following example shows how a page with a robots meta tag might look:

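<html>
<head>
<!-- Tell crawlers not to index this page and not to follow its links -->
<meta name="robots" content="noindex, nofollow">
<meta name="description" content="page description.">
<title>Web Site Title</title>
</head>
<body>
<!-- page content goes here -->
</body>
</html>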
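A crawler reads these directives by parsing the page’s HTML. The sketch below is one way to collect them with Python's standard html.parser module; the names RobotsMetaParser and robots_directives are hypothetical, and it looks only at meta tags named "robots".

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="robots" content="..."> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: List[str] = []

    def handle_starttag(self, tag: str, attrs: List[Tuple[str, Optional[str]]]) -> None:
        # html.parser lowercases tag and attribute names, but not attribute values.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").strip().lower() != "robots":
            return
        content = attributes.get("content") or ""
        # Directives are comma-separated; normalize case and whitespace.
        for part in content.split(","):
            if part.strip():
                self.directives.append(part.strip().lower())


def robots_directives(html: str) -> List[str]:
    """Return the robots meta directives found in an HTML document."""
    parser = RobotsMetaParser()
    parser.feed(html)
    parser.close()
    return parser.directives


PAGE = """<html>
<head>
<meta name="robots" content="noindex, nofollow">
<meta name="description" content="page description.">
<title>Web Site Title</title>
</head>
<body></body>
</html>"""

print(robots_directives(PAGE))  # ['noindex', 'nofollow']
```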

This bit of HTML tells crawlers not to index the content of the page and not to follow the links on it. Of course, that might not be exactly what you had in mind. You can also use other combinations of indexing or not indexing and following or not following:

<meta name="robots" content="index,follow">
<meta name="robots" content="noindex,follow">
<meta name="robots" content="index,nofollow">
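Each of these combinations boils down to two yes/no answers: may the page be indexed, and may its links be followed. The sketch below shows that reading; the names interpret_robots_content and RobotsPolicy are hypothetical, and it assumes both are allowed unless a noindex or nofollow directive turns them off.

```python
from typing import NamedTuple


class RobotsPolicy(NamedTuple):
    index: bool   # may the page appear in search results?
    follow: bool  # may the crawler follow links on the page?


def interpret_robots_content(content: str) -> RobotsPolicy:
    """Turn a robots meta content value such as "noindex,follow" into flags."""
    directives = {part.strip().lower() for part in content.split(",") if part.strip()}
    # Indexing and following are assumed allowed unless explicitly switched off.
    return RobotsPolicy(
        index="noindex" not in directives,
        follow="nofollow" not in directives,
    )


for value in ("index,follow", "noindex,follow", "index,nofollow", "noindex, nofollow"):
    print(value, interpret_robots_content(value))
```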



Part III Optimizing Search Strategies

