Playing in Googlebot's Sandbox with Slurp, Teoma, & MSNbot - Spiders Display Differing Personalities

There has been endless webmaster speculation and worry about the so-called "Google Sandbox" - the indexing time delay for new domain names - rumored to last for at least 45 days from the date of first "discovery" by Googlebot. This recognized listing delay came to be called the "Google Sandbox effect."

Speculation about the algorithmic elements behind this sandbox delay has ranged widely since the indexing delay was first noticed in spring of 2004. Some believe it comes down to a single element of search engine optimization, such as linking campaigns. Link building has drawn most of the discussion, but others point to the size of a new site, its internal linking structure, or a simple fixed time delay as the most relevant factors.

Rather than contribute to this speculation and further muddy the Sandbox, we'll look at a case study of a site on a new domain name, established May 11, 2005, covering its specific site structure, submission activity, and external and internal linking. We'll see how this plays out in search engine spider activity vs. indexing dates at the top four search engines.

Ready? We'll give dates and crawler action in daily lists and see how this all plays out on this single new site over time.

* May 11, 2005 - Basic text for a large site posted on newly purchased domain name, going live by day's end. Search-friendly structure implemented with text linking, making full discovery of all content possible by robots. Home page updated, with 10 new text content pages added daily. Submitted site at Google's "Add URL" submission page.

* May 12 - 14 - No visits by Slurp, MSNbot, Teoma or Google. (Slurp is Yahoo's spider and Teoma is from Ask Jeeves.) Posted link on WebSite101 to new domain at Publish101.com.

* May 15 - Googlebot arrives and eagerly crawls 245 pages on new domain after looking for, but not finding, the robots.txt file. Oooops! Gotta add that robots.txt file!

* May 16 - Googlebot returns for 5 more pages and stops. Slurp greedily gobbles 1480 pages and 1892 bad links! Those bad links were caused by our email masking, meant to keep out bad bots. How ironic that Slurp likes these.

* May 17 - Slurp finds 1409 more masking links & only 209 new content pages. MSNbot visits for the first time and asks for robots.txt 75 times during the day, but leaves when it finds that file missing! Finally get around to adding robots.txt by day's end to stop Slurp crawling email masking links and let MSNbot know it's safe to come in!

* May 23 - Teoma spider shows up for the first time and crawls 93 pages. Site gets slammed by BecomeBot, a spider that hits a page every 5 to 7 seconds and strains our resources with 2409 rapid fire requests for pages. Added BecomeBot to robots.txt exclusion list to keep 'em out.

* May 24 - MSNbot has stopped showing up for a week since finding the robots.txt file missing. Slurp is showing up every few hours looking at robots.txt and leaving again without crawling anything now that it is excluded from the email masking links. BecomeBot appears to be honoring the robots.txt exclusion but asks for that file 109 times during the day. Teoma crawls 139 more pages.

* May 25 - We realize that we need to re-allocate server resources and rework database design. This requires changes to URLs, which means all previously crawled pages are now bad links! Implement subdomains and wonder what now? Slurp shows up and finds thousands of new email masking links, as the robots.txt was not moved to the new directory structures. Spiders are getting error pages upon new visits. Scampering to put out fires after wide-ranging changes to site, we miss this for a week. Spider action is spotty for 10 days until we fix robots.txt.

* June 4 - Teoma returns and crawls 590 pages! No others.

* June 5 - Teoma returns and crawls 1902 pages! No others.

* June 6 - Teoma returns and crawls 290 pages. No others.

* June 7 - Teoma returns and crawls 471 pages. No others.

* June 8-14 - Odd spider behavior, looking at robots.txt only.

* June 15 - Slurp gets thirsty, gulps 1396 pages! No others.

* June 16 - Slurp still thirsty, gulps 1379 pages! No others.
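For readers who want to see what the robots.txt fixes in the log above might look like, here is a minimal sketch. It assumes the email masking links live under a /mailmask/ path (the real path is not given in this log), and it uses Python's standard urllib.robotparser to check the rules before they go live. The rule set is illustrative only, not the actual file used on Publish101.com.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt. "/mailmask/" is an assumed stand-in for the
# email masking links; the article does not give the real path.
ROBOTS_TXT = """\
User-agent: BecomeBot
Disallow: /

User-agent: *
Disallow: /mailmask/
"""


def build_parser(robots_text):
    """Parse robots.txt text without fetching anything over the network."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


parser = build_parser(ROBOTS_TXT)


def allowed(agent, path):
    """Return True if the named crawler may fetch the given path."""
    return parser.can_fetch(agent, "http://publish101.com" + path)


if __name__ == "__main__":
    for agent in ("Googlebot", "Slurp", "msnbot", "Teoma", "BecomeBot"):
        print(agent, allowed(agent, "/articles/page1.html"),
              allowed(agent, "/mailmask/123"))
```

Because crawlers request robots.txt from the root of each host, a move to subdomains means placing a copy of the file at the root of every subdomain, not just the original domain.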

So we'll take a break here at the five-week point and take note of the very different behavior of the top crawlers. Googlebot visits on two consecutive days and looks at a substantial number of pages, but doesn't return for over a month. Slurp finds bad links and seems addicted to them, as it stops crawling good pages until it is told to lay off the bad liquor (er, that is, the bad links) by a robots.txt that slaps Slurp back to its senses. MSNbot visits looking for that robots.txt and won't crawl any pages until told what NOT to do by the robots.txt file. Teoma just crawls like crazy, takes breaks, then comes back for more.

This behavior may imitate the differing personalities of the software engineers who designed them. Teoma is tenacious and hard working. MSNbot is timid, needs instruction and some reassurance it is doing the right thing, and picks up pages slowly and carefully. Slurp has an addictive personality and performs erratically on a random schedule. Googlebot takes a good long look and leaves. Who knows whether it will be back, and when.

Now let's look at indexing by each engine. As of this writing on July 7, each engine shows differing indexing behavior as well. Google shows no pages indexed although it crawled 250 pages nearly two months ago. Yahoo has three pages indexed in a clear aging routine that doesn't list any of the nearly 8,000 pages it has crawled to date (not all itemized above). MSN has 187 pages indexed while crawling fewer pages than any of the others. Ask Jeeves has crawled more pages to date than any search engine, yet has not indexed a single page.

Each of the engines will show the number of pages indexed if you use the query operator "site:publish101.com" without the quotes. MSN 187 pages, Ask none, Yahoo 3 pages, Google none.

Daily activity in the three weeks since June 16, not listed above, has not varied dramatically: Teoma crawls a bit more than the other engines, Slurp is erratically up and down, and MSN slowly gathers 30 to 50 pages daily. Google is absent.

The linking campaign has been minimal, with posts to discussion lists, a couple of articles and some blog activity. Looking back over this time, it is apparent that a listing delay is actually quite sensible from the view of the search engines. Our site restructuring and bobbled robots.txt implementation seem to have abruptly stalled crawling, but the indexing behavior of each engine displays a distinctly different policy at each major player.

The sandbox is apparently not just Google's playground, but it is certainly tiresome after nearly two months. I think I'd like to leave for home, have some lunch and take a nap now.

Back to class before we leave for the day, kiddies. What did we learn today? Watch early crawler activity and be certain to implement robots.txt early and adjust often for bad bots. Oh yes, and the sandbox belongs to all search engines.
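Watching crawler activity mostly means counting requests in the server access log. Here is a rough sketch of how daily counts like the ones in this case study could be tallied. It assumes the log is in Apache Combined Log Format and that each spider can be spotted by a substring of its User-Agent header; the sample lines and IP addresses are made up for illustration, and the function name tally is our own.

```python
import re
from collections import Counter

# Substrings assumed to identify each crawler in the User-Agent field.
BOTS = ["Googlebot", "Slurp", "msnbot", "Teoma", "BecomeBot"]

# Combined Log Format:
# host ident user [day:time zone] "method path proto" status bytes "referer" "agent"
LINE_RE = re.compile(
    r'\S+ \S+ \S+ \[(?P<day>[^:]+):[^\]]+\] "(?P<method>\S+) (?P<path>\S+)[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"'
)


def tally(lines):
    """Count requests per (day, bot, kind), where kind is 'robots.txt' or 'page'."""
    counts = Counter()
    for line in lines:
        match = LINE_RE.match(line)
        if not match:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        for bot in BOTS:
            if bot.lower() in agent:
                kind = "robots.txt" if match.group("path") == "/robots.txt" else "page"
                counts[(match.group("day"), bot, kind)] += 1
                break
    return counts


# Made-up sample lines (documentation IP range), for illustration only.
SAMPLE_LOG = [
    '192.0.2.1 - - [15/May/2005:10:00:00 -0700] "GET /robots.txt HTTP/1.1" 404 208 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.1 - - [15/May/2005:10:00:05 -0700] "GET /index.html HTTP/1.1" 200 5120 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '192.0.2.2 - - [16/May/2005:03:12:44 -0700] "GET /articles/1.html HTTP/1.0" 200 4096 "-" "Mozilla/5.0 (compatible; Yahoo! Slurp; http://help.yahoo.com/help/us/ysearch/slurp)"',
    '192.0.2.3 - - [16/May/2005:03:13:00 -0700] "GET /index.html HTTP/1.1" 200 5120 "-" "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)"',
]

if __name__ == "__main__":
    for key, count in sorted(tally(SAMPLE_LOG).items()):
        print(key, count)
```

Splitting robots.txt requests from page requests makes it easy to spot a spider that keeps asking for the file and then leaves without crawling, as MSNbot and Slurp did here.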

Mike Banks Valentine is a search engine optimization specialist who operates http://WebSite101.com and will continue to report on this case study chronicling search indexing of http://Publish101.com
