Search Engine Robots - How They Work, What They Do (Part I)

Automated search engine robots, sometimes called "spiders" or "crawlers", are the seekers of web pages. How do they work? What is it they really do? Why are they important?

You'd think, with all the fuss about indexing web pages to add to search engine databases, that robots would be great and powerful beings. Wrong. Search engine robots have only basic functionality like that of early browsers in terms of what they can understand in a web page. Like early browsers, robots just can't do certain things. Robots don't understand frames, Flash movies, images or JavaScript. They can't enter password-protected areas and they can't click all those buttons you have on your website. They can be stopped cold while indexing a dynamically generated URL and slowed to a stop with JavaScript navigation.

How Do Search Engine Robots Work?

Think of search engine robots as automated data retrieval programs, traveling the web to find information and links.

When you submit a web page to a search engine at the "Submit a URL" page, the new URL is added to the robot's queue of websites to visit on its next foray out onto the web. Even if you don't directly submit a page, many robots will find your site because of links from other sites that point back to yours. This is one of the reasons why it is important to build your link popularity and to get links from other topical sites back to yours.
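
As a rough illustration of the queue described above, here is a minimal Python sketch. The class name CrawlQueue and the example.com URLs are made up for this example; real search engines use far more elaborate scheduling, so treat this as a picture of the idea rather than how any particular robot is built.

```python
from collections import deque


class CrawlQueue:
    """A first-in, first-out queue of URLs that ignores duplicates."""

    def __init__(self):
        self._pending = deque()  # URLs waiting to be visited, oldest first
        self._seen = set()       # every URL that has ever been queued

    def add(self, url):
        """Queue a URL unless it was queued before; return True if added."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._pending.append(url)
        return True

    def next_url(self):
        """Return the next URL to visit, or None when the queue is empty."""
        return self._pending.popleft() if self._pending else None


queue = CrawlQueue()
queue.add("http://www.example.com/")       # e.g. sent through a "Submit a URL" page
queue.add("http://www.example.com/about")  # e.g. found through a link on another site
queue.add("http://www.example.com/")       # duplicate, so it is ignored
```

Whether a URL arrives through a submission form or through a link found elsewhere, it ends up in the same queue and is visited once.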

When arriving at your website, the automated robots first check to see if you have a robots.txt file. This file is used to tell robots which areas of your site are off-limits to them. Typically these may be directories containing only binaries or other files the robot doesn't need to concern itself with.
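
To see how a robot might apply these rules, here is a small sketch using Python's standard urllib.robotparser module. The robots.txt contents, the directory names and the "ExampleBot" user agent are assumptions invented for this example, and the file is parsed from a string so nothing is fetched over the network.

```python
from urllib.robotparser import RobotFileParser

# A made-up robots.txt that keeps all robots out of two directories.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /downloads/binaries/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())  # parse the rules without any network access


def is_allowed(user_agent, url):
    """Return True if the parsed rules let this user agent fetch the URL."""
    return parser.can_fetch(user_agent, url)


home_ok = is_allowed("ExampleBot", "http://www.example.com/index.html")
cgi_ok = is_allowed("ExampleBot", "http://www.example.com/cgi-bin/search")
```

Here the home page is allowed and the /cgi-bin/ URL is not. Keep in mind that robots.txt is a request, not a lock: well-behaved robots honor it, but nothing forces a robot to do so.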

Robots collect links from each page they visit and later follow those links through to other pages. The entire World Wide Web is made up of links, the original idea being that you could follow them from one place to another. This is how robots get around.
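
Collecting links can be sketched with Python's standard html.parser module. The LinkCollector class, the collect_links helper and the sample page below are invented for illustration; a real robot would also have to cope with broken markup, redirects and much more.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkCollector(HTMLParser):
    """Collect the href of every <a> tag, resolved against the page URL."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            if name == "href" and value:
                # Turn relative links such as "/products.html" into full URLs.
                self.links.append(urljoin(self.base_url, value))


def collect_links(base_url, html):
    """Return the absolute URLs of all links found in the HTML."""
    collector = LinkCollector(base_url)
    collector.feed(html)
    return collector.links


page = (
    '<p><a href="/products.html">Products</a> '
    '<a href="http://www.example.org/">Partner</a></p>'
)
links = collect_links("http://www.example.com/index.html", page)
```

Each collected URL would then be added to the robot's list of pages to visit, which is how one page leads to the next.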

The "smarts" about indexing pages online come from the search engine engineers, who devise the methods used to evaluate the information the search engine robots retrieve. Once that information is in the search engine database, it is available to searchers querying the search engine. When a user enters a query, the search engine runs a number of quick calculations to present the set of results most relevant to that query.

You can see which pages on your site the search engine robots have visited by looking at your server logs or the results from your log statistics program. Identifying the robots will show you when they visited your website, which pages they visited and how often they visit. Some robots are readily identifiable by their user agent names, like Google's "Googlebot"; others are a bit more obscure, like Inktomi's "Slurp". Still other robots may be listed in your logs that you cannot readily identify; some of them may even appear to be human-powered browsers.
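
If you would rather check the raw logs yourself, a short script can do the counting. The sketch below assumes your server writes the common "combined" log format; the sample lines, the IP addresses and the simplified user agent strings are made up for this example and are not the exact strings real robots send.

```python
import re
from collections import Counter

# Substrings to look for in the user agent field (illustrative, not exhaustive).
KNOWN_ROBOTS = ("Googlebot", "Slurp")

# Combined log format: ip - - [time] "request" status bytes "referer" "user agent"
LOG_LINE = re.compile(
    r'^(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+)[^"]*" \d{3} \S+ '
    r'"[^"]*" "(?P<agent>[^"]*)"$'
)


def count_robot_visits(lines):
    """Count visits per (robot name, requested path) for recognized robots."""
    visits = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if not match:
            continue  # skip lines that are not in the expected format
        agent = match.group("agent").lower()
        for robot in KNOWN_ROBOTS:
            if robot.lower() in agent:
                visits[(robot, match.group("path"))] += 1
                break
    return visits


# Made-up sample lines with placeholder user agent strings.
sample_log = [
    '192.0.2.10 - - [10/Oct/2005:13:55:36 -0700] "GET /index.html HTTP/1.0" 200 2326 "-" "Googlebot/2.1"',
    '192.0.2.11 - - [10/Oct/2005:14:02:11 -0700] "GET /about.html HTTP/1.0" 200 1534 "-" "Slurp/1.0"',
    '192.0.2.12 - - [10/Oct/2005:14:05:42 -0700] "GET /index.html HTTP/1.1" 200 2326 "http://www.example.com/" "Mozilla/4.0 (compatible; MSIE 6.0)"',
]
visits = count_robot_visits(sample_log)
```

Because a user agent name is easy to fake, a match here only tells you what the visitor claims to be. Comparing the IP address against a published list of robot addresses is a useful second check.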

Along with identifying individual robots and counting the number of their visits, the statistics can also show you aggressive bandwidth-grabbing robots or robots you may not want visiting your website. In the resources section at the end of this article, you will find sites that list names and IP addresses of search engine robots to help you identify them.

How Do They Read The Pages On Your Website?

When the search engine robot visits your page, it looks at the visible text on the page, the content of the various tags in your page's source code (title tag, meta tags, etc.), and the hyperlinks on your page. From the words and the links that the robot finds, the search engine decides what your page is about. Many factors are used to figure out what "matters", and each search engine has its own algorithm for evaluating and processing the information. Depending on how the robot is set up through the search engine, the information is indexed and then delivered to the search engine's database.
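
To get a feel for what a robot "sees", here is a simplified page reader built on Python's standard html.parser module. The PageReader class and the sample page are assumptions made up for this example; it pulls out the title, named meta tags, hyperlinks and visible text, and it skips script and style content, which matches the point that robots don't understand JavaScript.

```python
from html.parser import HTMLParser


class PageReader(HTMLParser):
    """Extract the title, named meta tags, links and visible text of a page."""

    SKIPPED = {"script", "style"}  # content a simple robot cannot use

    def __init__(self):
        super().__init__()
        self.title = ""
        self.meta = {}
        self.links = []
        self.text_parts = []
        self._open_tags = []  # tracks whether we are inside title/script/style

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        name = attrs.get("name")
        if tag == "meta" and name:
            self.meta[name.lower()] = attrs.get("content") or ""
        elif tag == "a" and attrs.get("href"):
            self.links.append(attrs["href"])
        if tag in self.SKIPPED or tag == "title":
            self._open_tags.append(tag)

    def handle_endtag(self, tag):
        if self._open_tags and self._open_tags[-1] == tag:
            self._open_tags.pop()

    def handle_data(self, data):
        current = self._open_tags[-1] if self._open_tags else None
        if current == "title":
            self.title += data.strip()
        elif current is None and data.strip():
            self.text_parts.append(data.strip())  # visible text only


sample_page = """\
<html><head>
<title>Acme Widgets</title>
<meta name="description" content="Hand-made widgets.">
<script>var hidden = "not visible text";</script>
</head><body>
<h1>Welcome</h1>
<p>We sell <a href="/widgets.html">widgets</a>.</p>
</body></html>
"""
reader = PageReader()
reader.feed(sample_page)
```

After feeding the sample page, reader.title, reader.meta, reader.links and reader.text_parts hold roughly the raw material the paragraph above describes. Deciding what the page is "about" is a separate step that each search engine handles in its own way.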

The information delivered to the databases then becomes part of the search engine and directory ranking process. When the search engine visitor submits their query, the search engine digs through its database to give the final listing that is displayed on the results page.
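
One common way to make such a database quick to search is an inverted index, which maps each word to the pages that contain it. The sketch below is an assumption about the general technique, not a description of any particular search engine; the function names and sample pages are made up, and it only finds matching pages without ranking them.

```python
import re
from collections import defaultdict


def tokenize(text):
    """Split text into lowercase words made of letters and digits."""
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(pages):
    """Map each word to the set of URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in tokenize(text):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted by URL."""
    words = tokenize(query)
    if not words:
        return []
    matches = set.intersection(*(index.get(word, set()) for word in words))
    return sorted(matches)


# Made-up page text, standing in for what the robots retrieved.
pages = {
    "http://www.example.com/": "Hand-made widgets and gadgets",
    "http://www.example.com/about": "About our gadgets workshop",
}
index = build_index(pages)
results = search(index, "hand-made widgets")
```

A real search engine would then order the matching pages by relevance, which is where each engine's own ranking algorithm comes in.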

The search engine databases update at varying times. Once you are in the search engine databases, the robots keep visiting you periodically, to pick up any changes to your pages, and to make sure they have the latest info. The number of times you are visited depends on how the search engine sets up its visits, which can vary per search engine.

Sometimes visiting robots are unable to access the website they are visiting. If your site is down, or you are experiencing huge amounts of traffic, the robot may not be able to access your site. When this happens, the website may not be re-indexed, depending on the frequency of the robot visits to your website. In most cases, robots that cannot access your pages will try again later, hoping that your site will be accessible then.
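
The "try again later" behavior can be pictured with a short sketch. Everything here is hypothetical: the crawl_with_retries function, the limit of three attempts and the fake_fetch stand-in (which simulates a busy site and a site that is down) are invented for illustration, and real robots typically wait hours or days between attempts rather than retrying straight away.

```python
from collections import deque


def crawl_with_retries(urls, fetch, max_attempts=3):
    """Fetch each URL, moving failures to the back of the queue to retry later."""
    pending = deque((url, 1) for url in urls)
    fetched, gave_up = {}, []
    while pending:
        url, attempt = pending.popleft()
        try:
            fetched[url] = fetch(url)
        except OSError:  # covers connection errors and timeouts
            if attempt < max_attempts:
                pending.append((url, attempt + 1))  # try again after the others
            else:
                gave_up.append(url)  # still unreachable; page is not re-indexed
    return fetched, gave_up


# Simulated outages: how many more times each URL will fail.
failures_left = {
    "http://www.example.com/busy": 2,
    "http://www.example.com/down": 99,
}


def fake_fetch(url):
    """Stand-in for a real HTTP request, so the example needs no network."""
    if failures_left.get(url, 0) > 0:
        failures_left[url] -= 1
        raise ConnectionError("site unavailable")
    return "<html>ok</html>"


fetched, gave_up = crawl_with_retries(
    [
        "http://www.example.com/",
        "http://www.example.com/busy",
        "http://www.example.com/down",
    ],
    fake_fetch,
)
```

In this run the busy page is picked up on its third attempt, while the page that stays down is skipped until the robot's next visit.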

Resources

* SpiderSpotting - Search Engine Watch: http://searchenginewatch.com/webmasters/spiders.html

* Robotstxt.org - List of robots and protocols for setting up a robots.txt file: http://www.robotstxt.org/

* Spider-Food - Tutorials, forums and articles about search engine spiders and search engine marketing: http://spider-food.net/

* Spiderhunter.com - Articles and resources about tracking search engine spiders: http://www.spiderhunter.com/

* Sim Spider Search Engine Robot Simulator - Search Engine World has a spider that simulates what the search engine robots read from your website: http://www.searchengineworld.com/cgi-bin/sim_spider.cgi

Daria Goetsch is the founder and Search Engine Marketing Consultant for Search Innovation Marketing, a Search Engine Optimization company serving small businesses. She has specialized in Search Engine Promotion since 1998, including three years as the Search Engine Specialist for O'Reilly Media, Inc., a technical book publishing company.

Copyright © 2002-2005 Search Innovation Marketing. http://www.searchinnovation.com All Rights Reserved.

Permission to reprint this article is granted if the article is reproduced in its entirety, without editing, including the bio information. Please include a hyperlink to http://www.searchinnovation.com when using this article in newsletters or online.
