How to Prevent Duplicate Content with Effective Use of the Robots.txt and Robots Meta Tag

Duplicate content is one of the problems we regularly come across in the search engine optimization services we offer. If the search engines determine that your site contains duplicate or near-duplicate content, this may result in penalties and even exclusion from the search engines. Fortunately, it's a problem that is easily rectified.

Your primary weapon of choice against duplicate content can be found within the "Robots Exclusion Protocol", which has now been adopted by all the major search engines.

There are two ways to control how the search engine spiders index your site.

1. The Robots Exclusion File, or "robots.txt", and

2. The Robots <meta> Tag

The Robots Exclusion File (Robots.txt)
This is a simple text file that can be created in Notepad. Once created, you must upload the file to the root directory of your website, e.g. www.yourwebsite.com/robots.txt. Before search engine spiders index your website, they look for this file, which tells them exactly how to index your site's content.

The robots.txt file is best suited to static HTML sites or to excluding certain files in dynamic sites. If the majority of your site is dynamically created, then consider using the Robots meta tag.

Creating your robots.txt file

Example 1 Scenario
If you wanted to make the robots.txt file applicable to all search engine spiders and make the entire site available for indexing, the robots.txt file would look like this:

User-agent: *
Disallow:

Explanation
The use of the asterisk with "User-agent" means this robots.txt file applies to all search engine spiders. By leaving "Disallow" blank, all parts of the site are made available for indexing.

Example 2 Scenario
If you wanted to make the robots.txt file applicable to all search engine spiders and to stop the spiders from indexing the faq, cgi-bin and images directories and a specific page called faqs.html contained within the root directory, the robots.txt file would look like this:

User-agent: *
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html

Explanation
The use of the asterisk with the "User-agent" means this robots.txt file applies to all search engine spiders. Preventing access to the directories is achieved by naming them, and the specific page is referenced directly. The named files & directories will now not be indexed by any search engine spiders.
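
One way to sanity-check rules like those in Examples 1 and 2 before uploading them is to run them through a robots.txt parser. The following is a minimal sketch using Python's standard urllib.robotparser module. The rule text is copied from the two examples; the is_allowed helper, the "examplebot" spider name and the www.yourwebsite.com URLs are placeholders made up for illustration, and real spiders may interpret edge cases differently from this parser.

```python
from urllib.robotparser import RobotFileParser

# Rule text copied from Example 1 (everything allowed).
EXAMPLE_1 = """\
User-agent: *
Disallow:
"""

# Rule text copied from Example 2 (three directories and one page blocked).
EXAMPLE_2 = """\
User-agent: *
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html
"""

SITE = "http://www.yourwebsite.com"  # placeholder domain


def is_allowed(rules: str, spider: str, url: str) -> bool:
    """Return True if `spider` may fetch `url` under the given robots.txt text."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return parser.can_fetch(spider, url)


if __name__ == "__main__":
    paths = ("/index.html", "/faq/pricing.html", "/images/logo.gif", "/faqs.html")
    for path in paths:
        # "examplebot" is a hypothetical spider name; any name matches "*".
        print(path,
              "example 1:", is_allowed(EXAMPLE_1, "examplebot", SITE + path),
              "example 2:", is_allowed(EXAMPLE_2, "examplebot", SITE + path))
```

Under the Example 1 rules every path is reported as allowed, while under the Example 2 rules only /index.html is.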

Example 3 Scenario
If you wanted to make the robots.txt file applicable only to the Google spider, googlebot, and stop it from indexing the faq, cgi-bin and images directories and a specific HTML page called faqs.html contained within the root directory, the robots.txt file would look like this:

User-agent: googlebot
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html

Explanation

By naming the particular search spider in the "User-agent" you prevent it from indexing the content you specify. Preventing access to the directories is achieved by simply naming them, and the specific page is referenced directly. The named files & directories will not be indexed by Google.
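
The effect of naming a single spider can be checked in the same way. This sketch, again assuming Python's standard urllib.robotparser module, feeds in the Example 3 rules and asks whether two different spiders may fetch the same page. "examplebot" and the www.yourwebsite.com URL are hypothetical placeholders, and the blocked_spiders helper is made up for illustration.

```python
from urllib.robotparser import RobotFileParser

# Rule text copied from Example 3 (applies to googlebot only).
EXAMPLE_3 = """\
User-agent: googlebot
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html
"""


def blocked_spiders(rules: str, spiders: list[str], url: str) -> list[str]:
    """Return the subset of `spiders` that the rules forbid from fetching `url`."""
    parser = RobotFileParser()
    parser.parse(rules.splitlines())
    return [name for name in spiders if not parser.can_fetch(name, url)]


if __name__ == "__main__":
    url = "http://www.yourwebsite.com/faqs.html"  # placeholder URL
    # "examplebot" is a hypothetical spider that the rules do not mention.
    print(blocked_spiders(EXAMPLE_3, ["googlebot", "examplebot"], url))
```

Only googlebot is listed as blocked, because the file contains no rules for any other spider and unnamed spiders are therefore unrestricted.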

That's all there is to it!

As mentioned earlier, the robots.txt file can be difficult to implement on dynamic sites, in which case it's probably necessary to use a combination of the robots.txt file and the Robots meta tag.

The Robots Tag
This alternative way of telling the search engines what to do with site content appears in the <head> section of a web page. A simple example would be as follows:

<meta name="robots" content="noindex,nofollow">

In this example we are telling all search engines not to index the page or to follow any of the links contained within the page.

In this second example I don't want Google to cache the page, because the site contains time-sensitive information. This can be achieved simply by adding the "noarchive" directive:

<meta name="googlebot" content="noarchive">
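
To confirm which robots directives a page actually serves, the meta tags can be read back from its HTML. The following is a minimal sketch using Python's standard html.parser module. The RobotsMetaParser class and the sample page are hypothetical, written for illustration only, and the sketch looks only at meta tags named "robots" or "googlebot".

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect the directives from robots-related <meta> tags."""

    WATCHED_NAMES = ("robots", "googlebot")

    def __init__(self):
        super().__init__()
        self.directives = {}  # e.g. {"robots": ["noindex", "nofollow"]}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attributes = dict(attrs)
        name = (attributes.get("name") or "").lower()
        if name in self.WATCHED_NAMES:
            content = attributes.get("content") or ""
            # Split "noindex, nofollow" into individual lowercase directives.
            self.directives[name] = [
                part.strip().lower() for part in content.split(",") if part.strip()
            ]


# Hypothetical sample page used only to exercise the parser.
SAMPLE_PAGE = """\
<html>
<head>
<title>Special offers</title>
<meta name="robots" content="noindex, nofollow">
<meta name="googlebot" content="noarchive">
</head>
<body><p>Time-sensitive content.</p></body>
</html>
"""

if __name__ == "__main__":
    parser = RobotsMetaParser()
    parser.feed(SAMPLE_PAGE)
    print(parser.directives)
```

Running a check like this against pages generated by a dynamic site is a quick way to spot templates where the tag was left out.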

What could be simpler!

Although there are other ways of preventing duplicate content from appearing in the search engines, this is the simplest to implement, and all websites should use a robots.txt file, a Robots meta tag, or a combination of the two.

Should you require further information about our search engine marketing or optimization services, please visit us at http://www.e-prominence.co.uk, the search marketing company.
