The Search Engine Yearbook: A 10,000-foot view of the entire search engine world

THE CURRENT VERSION IS SEY 2003


Robots.txt and the Robots Meta Tag
by André le Roux
Dec. 2001

NOTE: This page is NOT maintained. For an updated discussion on the Robots.txt file and the Robots Meta Tag, please refer to the current version of the Search Engine Yearbook.

The robots.txt file

What does the "robots" text file do?

Most sites contain pages that the search engines have no reason to index. Administrative pages are a typical example, such as Pandecta Magazine's "contact" page, contact.html. There's no need to have it indexed, so we use the robots.txt file to tell the search engine spider (robot) to ignore it.

Very important:
The robots.txt file must be in your root directory.
Like this: www.pandecta.com/robots.txt
Not like this: www.pandecta.com/admin/robots.txt
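Because the file always lives at the root, its address can be worked out from any page address on the same site. The following is a minimal Python sketch of that idea; the function name robots_txt_url and the sample page address are assumptions made for illustration only.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Return the root-level robots.txt address for the site serving page_url.

    Hypothetical helper for illustration; it is not part of any library.
    """
    parts = urlsplit(page_url)
    # Keep only the scheme and host. The path is always /robots.txt,
    # no matter how deep the original page sits in the site.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


# A page inside /admin/ still maps to the robots.txt file in the root.
print(robots_txt_url("http://www.pandecta.com/admin/index.html"))
```

A robot looks only at this root address, which is why a copy placed in /admin/ has no effect.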

The syntax of the robots.txt file

User-agent: *
Disallow: /images/
Disallow: /contact.html
Disallow: /privacy/privacy.html

The first line specifies which robots should ignore /images/, /contact.html and /privacy/privacy.html. The asterisk * is a wildcard, so all robots should ignore the directories and files listed below it. If I only wanted Googlebot to ignore those directories and files, I'd type "User-agent: Googlebot" instead.

The second line refers to an entire directory. Nothing in that directory will be indexed by robots that honor the file.

The third line refers to a specific page in the root directory - in this case the contact.html file.

The fourth line refers to a specific file in a specific directory.
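One way to check how a well-behaved robot would read these four lines is to feed them to the robots.txt parser in Python's standard library (urllib.robotparser). This is only a sketch: the robot names "AnyBot" and "OtherBot", the test paths such as /images/logo.gif and /index.html, and the Googlebot-only variant are assumptions added for illustration, and real robots may interpret the file differently.

```python
from urllib.robotparser import RobotFileParser

# The example file from the article.
RULES = """\
User-agent: *
Disallow: /images/
Disallow: /contact.html
Disallow: /privacy/privacy.html
"""

# Hypothetical variant: the same kind of rule, but for Googlebot only.
GOOGLEBOT_ONLY = """\
User-agent: Googlebot
Disallow: /contact.html
"""


def make_parser(rules: str) -> RobotFileParser:
    """Build a parser from robots.txt text instead of fetching it."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser


base = "http://www.pandecta.com"
parser = make_parser(RULES)

# Paths under a disallowed directory or matching a disallowed file are
# blocked; everything else stays open to the robot.
for path in ["/images/logo.gif", "/contact.html",
             "/privacy/privacy.html", "/index.html"]:
    allowed = parser.can_fetch("AnyBot", base + path)
    print(path, "allowed" if allowed else "blocked")

# With a named User-agent, only that robot is affected.
google_parser = make_parser(GOOGLEBOT_ONLY)
print(google_parser.can_fetch("Googlebot", base + "/contact.html"))
print(google_parser.can_fetch("OtherBot", base + "/contact.html"))
```

Under the wildcard file, only /index.html is reported as allowed. Under the Googlebot-only file, the contact page is blocked for Googlebot and left open to every other robot.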

The robots meta tag

The Robots META tag serves the same purpose as the robots.txt file, but it works one page at a time and is not as reliable. Not all robots honor the robots meta tag.

Use it if your site is in a subdirectory like www.freewebspace.com/users/mycoolhomepage/ and you can't get the server administrator to add (or add changes to) a robots.txt file.

If you have access to your root directory, forget about the robots meta tag. Use the robots.txt file. No need to have both.

The syntax of the robots meta tag is:

<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">

Type that between the <HEAD> and </HEAD> tags on each page you do not want to be indexed.
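To confirm that a page really carries the tag, you can scan its HTML for a META element named ROBOTS. The sketch below uses Python's standard html.parser module; the class name RobotsMetaParser, the helper robots_directives and the sample page are assumptions made for illustration, not part of any search engine's tooling.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives found in <META NAME="ROBOTS"> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names, but not values.
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() != "robots":
            return
        content = attr.get("content") or ""
        # "NOINDEX, NOFOLLOW" becomes {"noindex", "nofollow"}.
        for part in content.split(","):
            part = part.strip().lower()
            if part:
                self.directives.add(part)


def robots_directives(html: str) -> set:
    """Return the set of robots meta directives found in the page."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


# Hypothetical sample page using the tag from the article.
PAGE = """<HTML><HEAD>
<TITLE>Contact</TITLE>
<META NAME="ROBOTS" CONTENT="NOINDEX, NOFOLLOW">
</HEAD><BODY>Contact us</BODY></HTML>"""

directives = robots_directives(PAGE)
print("noindex" in directives, "nofollow" in directives)
```

A page without the tag yields an empty set, which is a quick way to spot pages where the tag was forgotten.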

More robots.txt resources

robots.txt Syntax Checker
http://www.tardis.ed.ac.uk/~sxw/robots/check/

Robot Names
Robot Names by the Search Engine Dictionary

CNN's robots.txt file:
http://www.cnn.com/robots.txt

This page is based on information contained in the Search Engine Yearbook 2003. For more detailed search engine information & help, please refer to the current version of the book.

© Copyright 2003, Pandecta Magazine. All Rights Reserved.
The information published on this site may not be reproduced without prior written permission from Pandecta Magazine.
Pandecta Magazine is not affiliated to any of the search engines or companies mentioned on this site. "Pandecta", "Search Engine
Yearbook" and "EnginePaper" are trademarks of Pandecta Magazine. All other trademarks are the property of their respective owners.