Sams Teach Yourself HTML, CSS & JavaScript Web Publishing in One Hour a Day


Tools for Tracking and Managing SEO 697



An XML sitemap is the best method. All major search engines accept it, and it is
extremely easy for them to parse. You can create an XML sitemap yourself with any text
editor, but it’s better to use a sitemap generator.
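
As a rough illustration of what a sitemap generator does, here is a minimal sketch in Python. The function name build_sitemap and the example.com URLs are assumptions made for this example. The urlset, url, loc, and lastmod elements and the namespace come from the Sitemaps.org protocol, where only loc is required for each URL.

```python
import xml.etree.ElementTree as ET

# Namespace defined by the Sitemaps.org protocol.
SITEMAP_NS = "http://www.sitemaps.org/schemas/sitemap/0.9"


def build_sitemap(pages):
    """Return sitemap XML as a string.

    pages is an iterable of (url, lastmod) pairs; lastmod may be None.
    """
    urlset = ET.Element("urlset", xmlns=SITEMAP_NS)
    for loc, lastmod in pages:
        url = ET.SubElement(urlset, "url")
        ET.SubElement(url, "loc").text = loc  # special characters are escaped for us
        if lastmod:
            ET.SubElement(url, "lastmod").text = lastmod
    body = ET.tostring(urlset, encoding="unicode")
    return '<?xml version="1.0" encoding="UTF-8"?>\n' + body


# Hypothetical pages, for illustration only.
print(build_sitemap([
    ("https://www.example.com/", "2016-01-15"),
    ("https://www.example.com/about.html", None),
]))
```

Saving the result as sitemap.xml in the root of the site is a common choice, although the file can live elsewhere as long as search engines are told where to find it.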


RSS sitemaps are easy to maintain because site tools such as blogs often build and
update them automatically. However, they can be large and harder to manage because they
provide more information than the plain XML format does.


Text sitemaps are the easiest to maintain. They typically list one URL per line, with up
to 50,000 lines per file. But these text files provide no information beyond the URLs
themselves.
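
The one-URL-per-line layout is simple enough to produce with a few lines of code. The sketch below, again in Python, splits a list of URLs into files of at most 50,000 lines each. The function name split_text_sitemaps and the sample URLs are assumptions for illustration.

```python
MAX_URLS = 50_000  # the per-file limit mentioned above


def split_text_sitemaps(urls, limit=MAX_URLS):
    """Return a list of text-sitemap file contents, one URL per line."""
    # Drop blank entries and stray whitespace before counting lines.
    cleaned = [u.strip() for u in urls if u.strip()]
    return [
        "\n".join(cleaned[i:i + limit]) + "\n"
        for i in range(0, len(cleaned), limit)
    ]


# Hypothetical URLs; a tiny limit is used here only to show the split.
files = split_text_sitemaps(
    [
        "https://www.example.com/",
        "https://www.example.com/about.html",
        "https://www.example.com/contact.html",
    ],
    limit=2,
)
print(len(files))  # 2
```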


If you want to learn more about sitemaps, visit Sitemaps.org, or you can build your own
sitemap at XML-Sitemaps.com.


The robots.txt File


The robots.txt file is a file of that exact name stored in the root of your web server. It
provides instructions to spiders and other web page crawlers about what pages of your
site they should and should not visit. When you use a robots.txt file, you can indicate
to search engines where your sitemap is, directories and pages you do not want them to
visit, and even how often they should return to your site to find new content.


A sample robots.txt file looks like this:


User-agent: *
Disallow: /includes/
Disallow: /misc/


User-agent: googlebot
Disallow: /nosearch/


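The first section tells all robots not to visit the /includes/ and /misc/ directories.
The second section tells just Googlebot not to visit the /nosearch/ directory. A robot
that finds a section naming it follows that section instead of the general one, so repeat
in the specific section any general rules that should still apply to that robot.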
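
One way to check how a well-behaved crawler might read these rules is to feed them to a parser. The sketch below uses the RobotFileParser class from Python's standard library on the sample file above. The robot name SomeBot and the page paths are made up for the example, and other crawlers may interpret edge cases differently.

```python
from urllib.robotparser import RobotFileParser

SAMPLE = """\
User-agent: *
Disallow: /includes/
Disallow: /misc/

User-agent: googlebot
Disallow: /nosearch/
"""

parser = RobotFileParser()
parser.parse(SAMPLE.splitlines())

# A robot with no section of its own falls back to the "*" rules.
print(parser.can_fetch("SomeBot", "/includes/header.html"))    # False
print(parser.can_fetch("SomeBot", "/index.html"))              # True

# Googlebot has its own section, and this parser applies only that section.
print(parser.can_fetch("Googlebot", "/nosearch/page.html"))    # False
print(parser.can_fetch("Googlebot", "/includes/header.html"))  # True
```

A check like this shows only what a cooperative robot would do. It does nothing to stop a robot that ignores the file.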


CAUTION

Not all web robots read or follow the robots.txt file. Some are not well written and
don’t check the file, and others have nefarious purposes and will deliberately seek out
directories you’ve marked private. If the directories you’ve disallowed are critical,
use some other form of protection, such as password protection in an .htaccess file, to
keep robots out of them.