Does your website need a robots.txt file? Do you already have one in place with exclusions? For the uninitiated, robots.txt is a handy little file that allows you to communicate with your friendly search engine visitors (sometimes called “spiders” or “crawlers”) and guide them through your website. After all, you want to be a good host, so it’s important to direct crawlers to the pages of your site that you specifically intend for them to see, rather than leave them wandering around, wondering where they’re intended to go.
As we’ve previously stated in our post on SEO myths, you don’t need to have a robots.txt file if you prefer your website to be an entirely open book. If you wish to have every page crawled and indexed on Google, then don’t worry about implementing this file – you simply don’t need it!
However, you may have pages within certain directories that contain no SEO value at all. Worse, their lack of unique content could even hurt your rankings, so it's worth looking into whether or not a robots.txt file is right for you.
Understanding Robots.txt
Think of your website as your home and the search engine crawlers that visit your website as your house guests. Whenever you have company visiting your house, you’re likely to have them stay in rooms designed for leisure, such as your living room or den. On the other hand, you probably wouldn’t show them your messy garage or filthy attic.
These rooms certainly serve a purpose, but they're just not designed to welcome or accommodate your guests. In a similar fashion, you may have a blog or e-commerce website loaded with products or posts that can be sorted by tags, dates and other search filters designed to help visitors (of the human variety) find exactly what they're looking for. However, the multiple URLs created by these added directories offer no SEO value and only result in search engines indexing the same content multiple times.
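For example, a single blog post could end up reachable at several different addresses once tag and date archives come into play (these URLs are hypothetical, for illustration only):

www.YourWebsite.com/my-blog-post/
www.YourWebsite.com/tags/seo/my-blog-post/
www.YourWebsite.com/2012/06/my-blog-post/

To a search engine, each of those looks like a separate page serving identical content.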
First, you should check to see whether a robots.txt file has already been implemented on your website. To do so, type the following into your browser's address bar:
www.YourWebsite.com/robots.txt
If you receive a 404 error, then there’s no robots.txt file in place. Alternatively, you can use the robots.txt checker we’ve recommended in our previous SEO tools entry.
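If you're comfortable with the command line, you can also check for the file with curl, which will show you the HTTP status code directly:

curl -I www.YourWebsite.com/robots.txt

A "200 OK" response means a robots.txt file exists; a "404 Not Found" confirms there isn't one.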
Creating Exclusions
If your search comes up negative, then it's time to set up your own robots.txt file. Let's say you have an e-commerce website, and the many filters and sorting options featured within your shopping cart platform create multiple URLs for the same product page, each indexed within Google.
First, decide which directories are causing these problems. Before creating exclusions, make sure these directories have zero SEO value and are only hurting your website's rankings by remaining open to search engine crawlers (most likely by generating duplicate content).
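One quick way to gauge the damage is a Google "site:" search, which lists the pages Google has indexed within a given directory (again, /tags/ is just an example path):

site:www.YourWebsite.com/tags/

If the results are page after page of the same products or posts, you've likely found a good candidate for exclusion.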
Once you've identified the offending directories, open Notepad or any text editor and create a file called "robots.txt." Using /tags/ as an example directory, you can tell search engines to keep their crawlers from going where they're not welcome:
User-agent: *
Disallow: /tags/
The asterisk applies the rule to all crawlers, but you can target specific ones by name (e.g. "User-agent: Googlebot").
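For instance, a sketch of a file that combines a crawler-specific rule with a catch-all rule might look like this (the /filters/ directory is a hypothetical example):

# Rules for Google's crawler only
User-agent: Googlebot
Disallow: /tags/

# Rules for every other crawler
User-agent: *
Disallow: /tags/
Disallow: /filters/

A crawler follows the group that matches its user agent most specifically, so Googlebot would obey the first set of rules while all other crawlers fall back to the second.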
Save when you’re done and upload the robots.txt file to your website’s root directory. Now every page that falls within www.YourWebsite.com/tags/ will not be crawled, and if you’ve created this exclusion for all the right reasons, then you may very well see your search engine rankings improve and your duplicate content issue disappear.
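Putting it all together, a finished file might look something like this (the Sitemap line is optional, and both paths are placeholders for whatever directories you've decided to exclude):

User-agent: *
Disallow: /tags/
Disallow: /filters/

Sitemap: http://www.YourWebsite.com/sitemap.xml

The Sitemap line isn't an exclusion at all; it simply points crawlers toward a map of the pages you do want crawled and indexed.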
*Photo "MSR-H01 Hexapod robot" appears courtesy of Flickr user masochismtango under the Attribution-ShareAlike 2.0 Generic license.