Rim Thoughts And Ideas


Robots.txt File


 

Robots.txt Can Improve Your Listing

 

Most of you know what a robots.txt file is and what it does, but just in case: a robots.txt file tells search engines which files and directories should not be indexed. However, a robots.txt file only works with search engines that look for the file and obey it. Bots looking to glean e-mail addresses or graphics will usually ignore the robots.txt file, and it’s a waste of time to try excluding them.

Note:
Some search engines also look at the "robots" Meta tag, and if it says INDEX, FOLLOW, that is what is done even if the robots.txt file says otherwise.

Why Use A Robots.txt File?
The use of a robots.txt file can improve your site by not allowing useless or non-informative pages to be indexed by search engines. You should exclude any pages that could harm your site ranking because they contain very little content. E-mail and feedback forms are not usually content rich and are examples of pages that should be excluded. Even if the links on these content-poor pages don’t dilute your page rankings, it’s still a good idea to exclude them.

Also, if you have a framed site or are using SSI or some other type of include-page feature for navigation, these navigation-only pages should also be excluded. While navigation-only pages can be indexed, they provide no real information for your visitors or about your site.

By excluding content-poor pages you will create a cleaner search engine listing and reduce the number of useless, uninformative pages that show up for your site. Not only will search engines appreciate this, but knowledgeable web users will too.

How To Exclude Pages:
The easiest way to exclude pages is to place those pages in a separate directory and then exclude the entire directory. For example, place all forms in a forms directory and all navigation-only pages in a navigation or include directory. This method keeps the robots.txt file simple and is much easier than listing 10 or 12 individual pages.
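
If you keep your excluded pages grouped by directory, the robots.txt file can be generated from a short list. The following is only a minimal sketch in Python; the function name build_robots_txt and the directory names passed to it are made up for illustration and are not part of any standard tool.

```python
def build_robots_txt(excluded_paths, user_agent="*"):
    """Return robots.txt text that disallows each path for one user agent."""
    lines = [f"User-agent: {user_agent}"]
    for path in excluded_paths:
        # Rule paths are written relative to the site root, so add the
        # leading slash if the caller left it off.
        if not path.startswith("/"):
            path = "/" + path
        lines.append(f"Disallow: {path}")
    return "\n".join(lines) + "\n"


# Hypothetical layout: forms in one directory, navigation-only pages in another.
robots_text = build_robots_txt(["/forms/", "includes/"])
print(robots_text)
```

Two directory entries cover every page placed in those directories, which is the point of grouping the pages in the first place.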

You should also exclude any custom error pages you create for your site, such as a custom 404 page. While these custom pages provide a more professional look than the standard browser or server fed page, they are not useful as part of your search engine listing.

Where Not To Use A Robots.txt File:
If you have sensitive information, such as form results, excluding it in robots.txt does not protect it. The robots.txt file is publicly readable, so listing a directory there only tells anyone who looks where that directory is. Use the .htaccess file or a server configuration file to password protect sensitive data directories, or use some other password protection technique. However, before attempting to use either file, check with your host to see what they allow. Also, if you use FrontPage, it has the ability to protect files and directories via its extensions.

A Robots.txt File Example:
To create a robots.txt file, open any text editor; even Notepad will work. Do not use a word processor because any formatting will corrupt the file.

# File comments can go here
# More comments can go here
User-agent: *
Disallow: /includes/
Disallow: /legal/
Disallow: /forms/
Disallow: /404.html

Note: The "*" indicates that the rules apply to all search engine bots. The # symbol indicates that the line is a comment and will be ignored by search engine bots.

In the above example, three directories and one file are being excluded from all search engine bots. You can exclude all bots, allow all bots, or only allow selected bots; the choice is yours.
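
If you want to check how a well-behaved bot would read the example file before you upload it, Python's standard urllib.robotparser module can parse the rules directly. This is a sketch only: the host www.example.com, the robot name AnyBot, and the page names being tested are assumptions chosen for illustration, and real search engines may interpret rules somewhat differently than this parser does.

```python
from urllib.robotparser import RobotFileParser

# The example rules from above, as they would appear in robots.txt.
EXAMPLE_RULES = """\
# File comments can go here
User-agent: *
Disallow: /includes/
Disallow: /legal/
Disallow: /forms/
Disallow: /404.html
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_RULES.splitlines())


def is_allowed(path, robot="AnyBot"):
    """Return True if the named robot may fetch the path on the example host."""
    return parser.can_fetch(robot, "http://www.example.com" + path)


# Hypothetical pages used to exercise each rule.
test_paths = ["/index.html", "/forms/contact.html", "/legal/terms.html", "/404.html"]
results = {path: is_allowed(path) for path in test_paths}

for path, allowed in results.items():
    print(path, "allowed" if allowed else "excluded")
```

Only /index.html is reported as allowed; the three other paths fall under a Disallow line.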

If you wish to exclude (disallow) a particular robot or spider:

User-agent: grub
Disallow: /

Add the above two lines for any other robot you want to exclude, changing the name after the User-agent: tag.
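
A robot-specific record can sit in the same file as the general "*" record, separated by a blank line. The sketch below checks such a combined file with Python's standard urllib.robotparser module. The combined file, the host www.example.com, and the robot name SomeOtherBot are assumptions made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Illustrative file: shut out grub entirely, keep everyone else out of /forms/.
RULES = """\
User-agent: grub
Disallow: /

User-agent: *
Disallow: /forms/
"""

parser = RobotFileParser()
parser.parse(RULES.splitlines())

SITE = "http://www.example.com"  # placeholder host

grub_home = parser.can_fetch("grub", SITE + "/index.html")
other_home = parser.can_fetch("SomeOtherBot", SITE + "/index.html")
other_form = parser.can_fetch("SomeOtherBot", SITE + "/forms/contact.html")

print("grub may fetch the home page:", grub_home)
print("SomeOtherBot may fetch the home page:", other_home)
print("SomeOtherBot may fetch a form page:", other_form)
```

Here grub is refused everywhere, while other robots are only kept out of the forms directory.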

Where To Place Your Robots.txt File:
Once you have created your robots.txt file, upload it to the root directory of your site. This is the directory where your site's default or index page is located. This directory may be called your "public" directory or some other name, depending on the server type. If you are in doubt, contact your host for more information.
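
Because the file lives in the root directory, its address is always the site's host name followed by /robots.txt, no matter how deep the page being visited is. A small sketch of that, where the function name robots_txt_url and the example address are made up for illustration:

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url):
    """Return the address where a robot would look for robots.txt for this page."""
    parts = urlsplit(page_url)
    # Keep the scheme and host, replace the path, drop any query or fragment.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# Hypothetical page several directories below the root.
location = robots_txt_url("http://www.example.com/articles/design/page.html")
print(location)  # http://www.example.com/robots.txt
```

A robots.txt file uploaded into a subdirectory such as /articles/ would not be found at that address.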

Helpful Links:
http://www.robotstxt.org/wc/exclusion-admin.html
http://www.robotstxt.org/wc/norobots.html
http://www.webmasterworld.com/forum23/2200.htm
http://www.searchengineworld.com/cgi-bin/robotcheck.cgi

Copyright 2002 - 2008 By Rim Thoughts Site Owner
Site Problems or Suggestions: Contact: Webmaster