How to Prevent Duplicate Content with Effective Use of the Robots.txt and Robots Meta Tag

Duplicate content is one of the problems we regularly come across as part of the search engine optimization services we offer. If the search engines determine that your site contains duplicate content, this may result in ranking penalties or even exclusion from their indexes. Fortunately, it's a problem that is easily rectified.

Your primary weapon of choice against duplicate content is the Robots Exclusion Protocol, which has now been adopted by all the major search engines.

There are two ways to control how search engine spiders index your site:

1. The Robots Exclusion File, or "robots.txt"

2. The Robots <meta> Tag

The Robots Exclusion File (robots.txt)
This is a simple text file that can be created in any plain text editor, such as Notepad. Once created, you must upload the file to the root directory of your website, e.g. www.yourwebsite.com/robots.txt. Before a search engine spider crawls your website, it looks for this file, which tells it which parts of your site's content it may access.
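Because spiders only look for the file at the root of a host, its location can be derived from any page URL on the site. The minimal Python sketch below shows that rule; the function name `robots_txt_url` and the sample page path are hypothetical, and only the standard `urllib.parse` module is used.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_txt_url(page_url: str) -> str:
    """Hypothetical helper: where a spider looks for robots.txt for this page.

    The scheme, host and port are kept; the page's path, query string and
    fragment are discarded, because the file must sit in the root directory.
    """
    parts = urlsplit(page_url)
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


print(robots_txt_url("http://www.yourwebsite.com/products/widgets.html?id=7"))
# prints http://www.yourwebsite.com/robots.txt
```

A robots.txt file placed in a subdirectory, such as /products/robots.txt, would therefore never be consulted.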

The robots.txt file is best suited to static HTML sites, or to excluding certain files in dynamic sites. If most of your site is dynamically generated, consider using the robots meta tag instead.

Creating your robots.txt file

Example 1 Scenario
To make the robots.txt file apply to all search engine spiders and make the entire site available for indexing, the file would look like this:

User-agent: *
Disallow:

Explanation
The asterisk in the "User-agent" line means this robots.txt file applies to all search engine spiders. Leaving the "Disallow" value blank makes every part of the site available for indexing.
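Python's standard library includes a robots.txt parser, `urllib.robotparser`, which offers a quick way to sanity-check a file before uploading it. The sketch below feeds it Example 1 and asks whether a spider may fetch two URLs on the placeholder site www.yourwebsite.com; the agent name and paths are illustrative only, and Python's parser is just an approximation of how any particular search engine reads the file.

```python
from urllib.robotparser import RobotFileParser

# Example 1 from above: every spider may crawl everything.
EXAMPLE_1 = """\
User-agent: *
Disallow:
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_1.splitlines())  # parse lines directly instead of fetching

for url in ("http://www.yourwebsite.com/", "http://www.yourwebsite.com/faq/index.html"):
    # can_fetch(user_agent, url) returns True when the rules allow the URL.
    print(url, parser.can_fetch("googlebot", url))
```

Both URLs are reported as allowed (True), because an empty Disallow value blocks nothing.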

Example 2 Scenario
To make the file apply to all search engine spiders while stopping them from indexing the faq, cgi-bin and images directories, as well as a specific page called faqs.html in the root directory, the robots.txt file would look like this:

User-agent: *
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html

Explanation
The asterisk in the "User-agent" line means these rules apply to all search engine spiders. Each directory is blocked by naming it, and the specific page is referenced directly by its path. The named files and directories will no longer be crawled by any spider that obeys the protocol.
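The same standard-library parser can confirm which paths Example 2 blocks. This is a sketch under the same assumptions as before; the sample paths such as /faq/shipping.html and /about.html are hypothetical pages on the placeholder site.

```python
from urllib.robotparser import RobotFileParser

# Example 2 from above: block three directories and one page for all spiders.
EXAMPLE_2 = """\
User-agent: *
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_2.splitlines())

SITE = "http://www.yourwebsite.com"
for path in ("/faq/shipping.html", "/images/logo.gif", "/faqs.html", "/about.html"):
    # Each Disallow value is matched as a prefix of the requested path.
    allowed = parser.can_fetch("googlebot", SITE + path)
    print(f"{path}: {'allowed' if allowed else 'blocked'}")
```

Only /about.html is reported as allowed. Because the rules are prefix matches, "Disallow: /faq/" covers everything inside that directory, while faqs.html needs its own line.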

Example 3 Scenario
To make the file apply only to Google's spider, Googlebot, and stop it from indexing the faq, cgi-bin and images directories and a specific HTML page called faqs.html in the root directory, the robots.txt file would look like this:

User-agent: googlebot
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html

Explanation

Naming a particular spider in the "User-agent" line means the rules that follow apply only to that spider. As before, the directories are blocked by naming them, and the specific page is referenced directly. The named files and directories will not be crawled by Google's spider, while other search engines remain unaffected.
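To see that Example 3 affects only Google, the sketch below asks the standard-library parser about two spiders. The agent strings "Googlebot/2.1" and "Bingbot" are illustrative; Python's parser matches the User-agent value case-insensitively against the part of the name before any "/", which real search engines may do differently.

```python
from urllib.robotparser import RobotFileParser

# Example 3 from above: the same rules, addressed to Googlebot only.
EXAMPLE_3 = """\
User-agent: googlebot
Disallow: /faq/
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /faqs.html
"""

parser = RobotFileParser()
parser.parse(EXAMPLE_3.splitlines())

url = "http://www.yourwebsite.com/faqs.html"
for agent in ("Googlebot/2.1", "Bingbot"):
    # A spider with no matching User-agent record is allowed by default.
    print(agent, "allowed" if parser.can_fetch(agent, url) else "blocked")
```

Googlebot is reported as blocked and Bingbot as allowed, since no record in the file names Bingbot and there is no "User-agent: *" record to fall back on.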

That's all there is to it!

As mentioned earlier, the robots.txt file can be difficult to implement on dynamic sites, and in that case it's probably necessary to use a combination of the robots.txt file and the robots meta tag.

The Robots Meta Tag
This alternative way of telling the search engines what to do with a page's content appears in the <head> section of the web page. A simple example would be as follows:

<meta name="robots" content="noindex, nofollow">

In this example, we are telling all search engines not to index the page and not to follow any of the links it contains.
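A quick way to check which directives a page actually carries is to read its robots meta tags back out of the HTML. The sketch below uses Python's standard `html.parser` module and assumes the tag is written as <meta name="robots" content="noindex, nofollow">; the class name `RobotsMetaParser` and the sample page are hypothetical.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Hypothetical helper: collect directives from <meta name="robots"> tags."""

    def __init__(self) -> None:
        super().__init__()
        self.directives: set[str] = set()

    def handle_starttag(self, tag, attrs):
        # HTMLParser lower-cases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == "robots":
            # Directives are comma-separated and treated case-insensitively.
            for directive in (attributes.get("content") or "").split(","):
                if directive.strip():
                    self.directives.add(directive.strip().lower())


PAGE = """<html>
<head>
<title>Example page</title>
<meta name="robots" content="noindex, nofollow">
</head>
<body>...</body>
</html>"""

parser = RobotsMetaParser()
parser.feed(PAGE)
print(sorted(parser.directives))  # ['nofollow', 'noindex']
```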

In this second example, I don't want Google to cache the page because the site contains time-sensitive information. This can be achieved simply by adding the "noarchive" directive:

<meta name="googlebot" content="noarchive">

What could be simpler!
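When a page carries both a generic "robots" tag and a tag aimed at one crawler (assumed here to be a tag named "googlebot" holding the "noarchive" directive), a crawler reads the generic tag plus the one addressed to it. The function `effective_directives` below is a hypothetical, simplified model that unions the two sets of directives; it is not any search engine's actual logic.

```python
def effective_directives(meta_tags: dict[str, str], crawler: str) -> set[str]:
    """Hypothetical helper: directives a given crawler would act on.

    meta_tags maps a meta name ("robots", "googlebot", ...) to its content
    attribute. Simplified model: the generic "robots" directives and the
    crawler-specific directives are combined by set union.
    """
    directives: set[str] = set()
    for name in ("robots", crawler.lower()):
        for directive in meta_tags.get(name, "").split(","):
            if directive.strip():
                directives.add(directive.strip().lower())
    return directives


tags = {"googlebot": "noarchive"}
print(effective_directives(tags, "googlebot"))  # {'noarchive'}
print(effective_directives(tags, "bingbot"))    # set()
```

Other search engines ignore a tag that names only Google's crawler, so the page stays cacheable for them.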

Although there are other ways of preventing duplicate content from appearing in the search engines, this is the simplest to implement, and every website should use a robots.txt file, a robots meta tag, or a combination of the two.
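When combining the two, keep in mind that a compliant spider reads robots.txt before it downloads a page, so a page blocked there is never fetched and any robots meta tag on it goes unseen. The hypothetical helper `crawler_outcome` below models that order of evaluation under simplifying assumptions: only the noindex directive is considered, Python's `urllib.robotparser` stands in for a real search engine's parser, and the URLs are placeholders.

```python
from urllib.robotparser import RobotFileParser


def crawler_outcome(robots_txt: str, agent: str, url: str, meta_content: str) -> str:
    """Hypothetical helper: what a compliant spider can act on for one URL.

    robots.txt is consulted first. If the URL is disallowed, the spider never
    downloads the page, so the page's robots meta tag is never read.
    """
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    if not parser.can_fetch(agent, url):
        return "not crawled (robots.txt)"
    directives = {d.strip().lower() for d in meta_content.split(",") if d.strip()}
    if "noindex" in directives:
        return "crawled, not indexed (meta tag)"
    return "crawled and indexable"


RULES = "User-agent: *\nDisallow: /cgi-bin/\n"
SITE = "http://www.yourwebsite.com"
print(crawler_outcome(RULES, "googlebot", SITE + "/cgi-bin/search?q=a", "noindex"))
print(crawler_outcome(RULES, "googlebot", SITE + "/list.php?sort=price", "noindex, follow"))
print(crawler_outcome(RULES, "googlebot", SITE + "/list.php", ""))
```

The first URL is stopped by robots.txt even though it also carries noindex; the second is fetched and then kept out of the index by its tag. For that reason, a page you want removed with noindex should not also be disallowed in robots.txt.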

Should you require further information about our search engine marketing or optimization services, please visit us at http://www.e-prominence.co.uk, the search marketing company.
