Search Engine Spiders Lost Without Guidance - Post This Sign!

The robots.txt file is an exclusion standard that web crawlers/robots consult to learn which files and directories you want them to stay OUT of on your site. Not all crawlers/bots follow the exclusion standard, and some will continue crawling your site anyway. I like to call them "Bad Bots" or trespassers. We block them by IP exclusion, which is another story entirely.

This is a very simple overview of robots.txt basics for webmasters. For a complete and thorough lesson, visit http://www.robotstxt.org/

To see the proper format for a somewhat standard robots.txt file, look directly below. That file should be at the root of the domain because that is where the crawlers expect it to be, not in some secondary directory.

Below is the proper format for a robots.txt file ----->

User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /group/

User-agent: msnbot
Crawl-delay: 10

User-agent: Teoma
Crawl-delay: 10

User-agent: Slurp
Crawl-delay: 10

User-agent: aipbot
Disallow: /

User-agent: BecomeBot
Disallow: /

User-agent: psbot
Disallow: /

--------> End of robots.txt file

This tiny text file is saved as a plain text document and ALWAYS with the name "robots.txt" in the root of your domain.

A quick review of the listed information from the robots.txt file above follows. The "User-agent: msnbot" is from MSN, Slurp is from Yahoo and Teoma is from AskJeeves. The others listed are "Bad" bots that crawl very fast and to nobody's benefit but their own, so we ask them to stay out entirely. The * asterisk is a wild card that means "All" crawlers/spiders/bots should stay out of that group of files or directories listed.

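The bots given the instruction "Disallow: /" should stay out entirely, and those with "Crawl-delay: 10" are the ones that crawled our site too quickly, causing it to bog down and overuse the server resources. Google crawls more slowly than the others and doesn't require that instruction, so it is not specifically listed in the above robots.txt file. The Crawl-delay instruction is only needed on very large sites with hundreds or thousands of pages. The wildcard asterisk * applies to every crawler, bot and spider that is not named in a group of its own, which includes Googlebot. (Under the standard, a crawler that finds a group naming it follows that group instead of the * group, so the Disallow lines must be repeated under msnbot, Teoma and Slurp if those directories are meant to be off limits to them as well.)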
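One way to check how a well-behaved crawler would read these rules is Python's standard urllib.robotparser module. The following is a minimal sketch, not the method any search engine actually uses: the example.com URLs are placeholders, and only part of the file above is reproduced to keep it short. Note that Python's parser applies only the group that names the crawler when one exists, so msnbot here is not bound by the Disallow lines in the * group.

```python
from urllib.robotparser import RobotFileParser

# A shortened copy of the robots.txt shown in the article.
ROBOTS_TXT = """\
User-agent: *
Disallow: /cgi-bin/
Disallow: /images/
Disallow: /group/

User-agent: msnbot
Crawl-delay: 10

User-agent: aipbot
Disallow: /
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def report(agent, url):
    """Return (allowed, crawl_delay) for one crawler and one URL."""
    return parser.can_fetch(agent, url), parser.crawl_delay(agent)


if __name__ == "__main__":
    # example.com is a placeholder domain, not the author's site.
    for agent in ("Googlebot", "msnbot", "aipbot"):
        for path in ("/index.html", "/images/logo.gif"):
            allowed, delay = report(agent, "http://example.com" + path)
            print(agent, path, "allowed:", allowed, "crawl-delay:", delay)
```

Running it prints one line per crawler and path, which makes it easy to spot a group that is more permissive than intended.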

The crawlers we gave that "Crawl-delay: 10" instruction to were requesting as many as 7 pages every second, so we asked them to slow down. The number you see is in seconds, and you can change it to suit your server capacity, based on their crawling rate. Ten seconds between page requests is far more leisurely and stops them from asking for more pages than your server can dish up.
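To put those two rates side by side, here is a small back-of-the-envelope calculation. It assumes, purely for illustration, that a crawler keeps up its rate for a full day; the function name is made up for this sketch.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def max_requests_per_day(crawl_delay_seconds):
    """Upper bound on page requests per day for a crawler honoring the delay."""
    return SECONDS_PER_DAY // crawl_delay_seconds


# 7 pages per second, sustained for a whole day (an assumption).
unthrottled = 7 * SECONDS_PER_DAY
# One page every 10 seconds, as requested by "Crawl-delay: 10".
throttled = max_requests_per_day(10)

if __name__ == "__main__":
    print("7 pages/second for a day:", unthrottled)
    print("Crawl-delay 10 for a day:", throttled)
```

Under that assumption the delay cuts the daily ceiling from 604,800 requests to 8,640, a factor of 70.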

(You can discover how fast robots and spiders are crawling by looking at your raw server logs, which show pages requested by precise times to within a hundredth of a second. They are available from your web host, or ask your web or IT person. If you have server access, your server logs can be found in the root directory, and you can usually download compressed server log files by calendar day right off your server. You'll need a utility that can expand compressed files to open and read those plain text raw server log files.)
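As a rough illustration of reading crawl rates out of a raw log, the sketch below counts how many requests each user agent made in its busiest second. It assumes the Apache "combined" log format, which records times to the second only; your host's format may differ. The sample lines are invented for the example and are not taken from any real log.

```python
import re
from collections import Counter

# Assumed Apache "combined" format: ip, identity, user, [time], "request",
# status, size, "referer", "user agent".
LOG_LINE = re.compile(
    r'^\S+ \S+ \S+ \[(?P<time>[^\]]+)\] "[^"]*" '
    r'\d{3} \S+ "[^"]*" "(?P<agent>[^"]*)"$'
)

# Invented sample data for illustration only.
SAMPLE_LINES = [
    '192.0.2.1 - - [11/May/2005:10:00:01 -0700] "GET /a.html HTTP/1.1" 200 512 "-" "psbot/0.1"',
    '192.0.2.1 - - [11/May/2005:10:00:01 -0700] "GET /b.html HTTP/1.1" 200 512 "-" "psbot/0.1"',
    '192.0.2.1 - - [11/May/2005:10:00:01 -0700] "GET /c.html HTTP/1.1" 200 512 "-" "psbot/0.1"',
    '192.0.2.1 - - [11/May/2005:10:00:02 -0700] "GET /d.html HTTP/1.1" 200 512 "-" "psbot/0.1"',
    '192.0.2.7 - - [11/May/2005:10:00:02 -0700] "GET /a.html HTTP/1.1" 200 512 "-" "Googlebot/2.1"',
]


def peak_requests_per_second(lines):
    """Return {user_agent: most requests seen within any single second}."""
    per_second = Counter()
    for line in lines:
        match = LOG_LINE.match(line.strip())
        if match is None:
            continue  # skip lines that do not fit the assumed format
        per_second[(match["agent"], match["time"])] += 1
    peaks = {}
    for (agent, _second), count in per_second.items():
        peaks[agent] = max(peaks.get(agent, 0), count)
    return peaks


if __name__ == "__main__":
    for agent, peak in peak_requests_per_second(SAMPLE_LINES).items():
        print(agent, "peak requests in one second:", peak)
```

For a compressed daily log, the same function can be fed an open file, for example one opened with gzip.open(path, "rt") from the standard library.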

To see the contents of any robots.txt file, just type /robots.txt after any domain name. If the site has that file up, you will see it displayed as a text file in your web browser. The link below shows that file for Amazon.com:

http://www.Amazon.com/robots.txt

You can see the contents of any website's robots.txt file that way.
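The same rule, that the file always lives at the root of the domain, can be written as a small helper. This is a sketch using Python's standard urllib.parse module; the function name robots_url is made up for the example.

```python
from urllib.parse import urlsplit, urlunsplit


def robots_url(site_url):
    """Build the robots.txt address for the domain of any page URL."""
    parts = urlsplit(site_url)
    # Keep the scheme and domain, replace everything else with /robots.txt.
    return urlunsplit((parts.scheme, parts.netloc, "/robots.txt", "", ""))


if __name__ == "__main__":
    print(robots_url("http://www.Amazon.com/"))
    print(robots_url("http://publish101.com/Sandbox2"))
```

Whatever page you start from, the result points at the root of that domain, which is the only place crawlers look for the file.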

The robots.txt shown above is what we currently use at Publish101 Web Content Distributor, just launched in May of 2005. We did an extensive case study and published a series of articles on crawler behavior and indexing delays known as the Google Sandbox. That Google Sandbox Case Study is highly instructive on many levels for webmasters everywhere about the importance of this often ignored little text file.

One thing we didn't expect to glean from our research into indexing delays (known as the Google Sandbox) was the importance of robots.txt files to quick and efficient crawling by the spiders from the major search engines. Another was the number of heavy crawls from bots that do no earthly good for the site owner, yet crawl most sites extensively and heavily, straining servers to the breaking point with requests for pages coming as fast as 7 pages per second.

We discovered in our launch of the new site that Google and Yahoo will crawl the site whether or not you use a robots.txt file, but MSN seems to REQUIRE it before they will begin crawling at all. All of the search engine robots seem to request the file on a regular basis to verify that it hasn't changed.

Then when you DO change it, they will stop crawling for brief periods and repeatedly ask for that robots.txt file during that time without crawling any additional pages. (Perhaps they had a list of pages to visit that included the directory or files you have instructed them to stay out of and must now adjust their crawling schedule to eliminate those files from their list.)

Most webmasters instruct the bots to stay out of "image" directories and the "cgi-bin" directory, as well as any directories containing private or proprietary files intended only for users of an intranet or password protected sections of the site. Clearly, you should direct the bots to stay out of any private areas that you don't want indexed by the search engines.

The importance of robots.txt is rarely discussed by average webmasters, and I've even had some of my business clients' webmasters ask me what it is and how to implement it when I tell them how important it is to both site security and efficient crawling by the search engines. This should be standard knowledge for webmasters at substantial companies, but it illustrates how little attention is paid to the use of robots.txt.

The search engine spiders really do want your guidance and this tiny text file is the best way to provide crawlers and bots a clear signpost to warn off trespassers and protect private property - and to warmly welcome invited guests, such as the big three search engines, while asking them nicely to stay out of private areas.

Google Sandbox Case Study http://publish101.com/Sandbox2
Mike Banks Valentine operates http://Publish101.com Free Web Content Distribution for Article Marketers and provides content aggregation, press release optimization and custom web content for Search Engine Positioning http://www.seoptimism.com/SEO_Contact.htm
