Rel Canonical
Why do you have a website? Generally, it’s because you want to share something with others. What you share is your business: it can be information, as in the case of a blog, or a product. If you have something you want to share with the world, you build a website.
Usually, the more people who visit the website, the wider the owner smiles (an exception would be if your website was hacked and now hosts a bunch of embarrassing photos of you and the infamous pot plant).
This all means that you want Google, Bing, Yahoo, DuckDuckGo and the like to find you, so that, in turn, search engine users can find you as well. They must find you easily and accurately. But, as in real life, you do not want everyone to be able to see everything, pot plant notwithstanding. Some web pages are simply not meant for the public eye, and if you don’t tell the search engines so, they may be indexed for everyone to see.
There are different ways to tell the search engines that not all pages are equal. Two of the most commonly used methods are rel canonical and the “disallow” directive in the robots.txt file.
What is rel canonical?
This HTML tag, a <link> element with rel="canonical" placed in a page’s head section, talks to the search engine bots, telling them which page, or which variation of the content, should be considered the original and which should be seen as copies or duplicates of it.
The primary purpose of the rel="canonical" tag is to help search engines keep their indexes accurate and up to date. Because the tag sits in the head section and is never displayed, how you use rel="canonical" will not directly influence the user experience of your website.
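To make the idea concrete, here is a minimal Python sketch, using only the standard library’s html.parser module, of how a program might read the canonical URL out of a page’s head section. The class and function names are hypothetical, and real search engine bots weigh many other signals; Google, for instance, treats the tag as a strong hint rather than a binding command.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collects the href of the first <link rel="canonical"> element."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        # The default handle_startendtag calls this method too, so
        # self-closing <link ... /> tags are covered as well.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values; a valueless
        # attribute comes through as None, hence the `or ""`.
        rels = (attributes.get("rel") or "").lower().split()
        if "canonical" in rels:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in html, or None if there is none."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.canonical


page = """<html><head>
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />
</head><body>...</body></html>"""

print(find_canonical(page))
```

A page without the tag simply yields None, which is the situation in which a search engine has to pick the original on its own.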
When should rel canonical be used?
If two or more pages are identical or extremely similar, it is best practice to always use the rel canonical tag.
Borrowing an example from Google Webmaster Central, we see that the same page can often be accessed through different URLs. From the Google example we get the following URLs that all point to the same page:
Notice how each of these URLs is a variation of the original URL, with a few dynamic parameters added at the end after the “&” character. If each of these URLs returns the same content, you’re going to want to take care of that with rel canonical.
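The original list of URLs is not reproduced here, so the Python sketch below uses made-up variants of the example product URL. It assumes, purely for illustration, that the category, trackingid and sessionid parameters do not change the page content, strips them, and groups the URLs that end up identical. Each group is a set of duplicates that should share one rel canonical URL.

```python
from collections import defaultdict
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical: parameters assumed NOT to change what the page shows.
IGNORED_PARAMS = {"category", "trackingid", "sessionid"}


def normalize(url: str) -> str:
    """Drop the ignored parameters (and any #fragment) from a URL."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in IGNORED_PARAMS
    ]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def group_duplicates(urls):
    """Map each normalized URL to the list of variants that collapse into it."""
    groups = defaultdict(list)
    for url in urls:
        groups[normalize(url)].append(url)
    return dict(groups)


# Made-up variants of the example product URL.
variants = [
    "http://www.example.com/product.php?item=swedish-fish",
    "http://www.example.com/product.php?item=swedish-fish&category=gummy-candy",
    "http://www.example.com/product.php?item=swedish-fish&trackingid=1234&sessionid=5678",
]

for canonical, urls in group_duplicates(variants).items():
    print(canonical, "covers", len(urls), "URLs")
```

Which parameters are safe to ignore is a judgment about your own site; a parameter that really changes the content (say, a different item) must stay, or distinct pages would be lumped together.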
In this case, you would include the following code in the head section of each page:
<link rel="canonical" href="http://www.example.com/product.php?item=swedish-fish" />
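One practical trap: the tag only works with straight quotes, and word processors and some blog editors silently turn them into curly ones. A small Python helper (the name canonical_tag is hypothetical) can generate the tag instead; html.escape also turns any “&” in the URL into “&amp;”, which is how an ampersand should be written inside an HTML attribute.

```python
from html import escape


def canonical_tag(url: str) -> str:
    """Return a <link rel="canonical"> tag written with straight quotes."""
    # escape() always turns & into &amp;; quote=True also escapes " and ',
    # so the URL cannot break out of the href attribute.
    return f'<link rel="canonical" href="{escape(url, quote=True)}" />'


print(canonical_tag("http://www.example.com/product.php?item=swedish-fish"))
print(canonical_tag("http://www.example.com/product.php?item=swedish-fish&color=red"))
```

The first call reproduces the tag for the example product page exactly; the second shows the escaping for a hypothetical URL that contains an ampersand.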
Search engine bots will now take the hint that the URL in the href attribute is the primary page of the lot, the one to return in search results. Without this hint, Google may index each of these pages separately, and the duplicate content could count against you.
How does disallow work?
Disallow is not the same as canonical; it is not even close. Yet the two do get confused, probably because both are used to address bots.
The disallow directive tells bots which directories and/or pages are off limits. Note that this is not a lock on a gate that keeps them out; it is merely a “No Entry” sign.
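The “No Entry” sign works only because well-behaved crawlers choose to check it. The Python sketch below, built on the standard library’s urllib.robotparser, shows that check from the crawler’s side; the robots.txt contents, the /admin/ path and the ExampleBot user agent are hypothetical. A crawler that never calls a function like this is not stopped by anything.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt contents, for illustration only.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
"""


def polite_crawler_may_fetch(url: str, user_agent: str = "ExampleBot") -> bool:
    """Return True if robots.txt permits user_agent to fetch url.

    A polite crawler calls this before every request. Nothing forces it to:
    the server never sees this check, so a bot that skips it can still
    request the page.
    """
    parser = RobotFileParser()
    parser.parse(ROBOTS_TXT.splitlines())
    return parser.can_fetch(user_agent, url)


for url in ["https://www.example.com/admin/tools.php", "https://www.example.com/blog/"]:
    print(url, "->", "fetch" if polite_crawler_may_fetch(url) else "skip")
```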
You would not want Google to index a directory that holds purely administrative tools, or a temporary folder. Or perhaps you are busy adding a new page or set of pages to your website, and you do not want them indexed while they are still under development.
Take a few minutes to search your site in Google, and you may be surprised by what you find. Here’s how you do that: type site: followed by your domain name (for example, site:example.com) into the Google search box.
That query will return the pages Google has indexed on your website. If you feel that some of those pages are not meant for human visitors or do not add value to their experience, you may want to consider disallowing them via robots.txt.
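If you would rather open that search from a script, this Python sketch builds the search URL for a site: query, Google’s operator for restricting results to one domain. It assumes the standard https://www.google.com/search endpoint with the query in the q parameter; example.com is a placeholder for your own domain, and the function name is hypothetical.

```python
from urllib.parse import urlencode


def site_search_url(domain: str) -> str:
    """Build a Google search URL for a site: query on one domain."""
    # urlencode percent-encodes the colon, giving q=site%3Aexample.com.
    return "https://www.google.com/search?" + urlencode({"q": f"site:{domain}"})


print(site_search_url("example.com"))
```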
Using the disallow command enables you to make them off limits to the bots.
(Note: Malicious bots do not respect the robots.txt file or its instructions.)
All you need to do is add a line like this in your robots.txt file:
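Google will no longer crawl the pages or directories listed above, so their content will not count toward or against you. (Strictly speaking, a disallowed URL can still appear in search results, without its content, if other pages link to it.)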
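Since the exact directives depend on your own site, here is a hypothetical robots.txt for the kinds of pages mentioned earlier (administrative tools, a temporary folder and a section still under development), checked with Python’s urllib.robotparser. The paths /admin/, /tmp/ and /new-section/ are made up; in a real file, each Disallow line sits under a User-agent line naming the bots it applies to, and * means all of them.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt: the paths stand in for admin tools, a temporary
# folder, and a section that is still under development.
ROBOTS_TXT = """\
User-agent: *
Disallow: /admin/
Disallow: /tmp/
Disallow: /new-section/
"""

parser = RobotFileParser()
parser.parse(ROBOTS_TXT.splitlines())


def is_blocked(path: str, user_agent: str = "Googlebot") -> bool:
    """True if the rules above tell user_agent not to crawl this path."""
    return not parser.can_fetch(user_agent, "https://www.example.com" + path)


for path in ["/admin/users.php", "/tmp/export.csv", "/new-section/draft.html", "/products/"]:
    print(path, "blocked" if is_blocked(path) else "allowed")
```

Each Disallow value is matched as a path prefix, so a single /admin/ line covers everything beneath that directory, while pages outside the listed paths, such as /products/, stay crawlable.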
Which one then?
If you have read carefully, you will know that rel canonical helps the search engines index your website more accurately, whereas disallow keeps the search engines from crawling the pages or directories you list.
Put another way, rel canonical is like referring someone to the most up-to-date telephone book (do those still exist, by the way?), whereas disallow is like opting for an unlisted number.
The bottom line is this: if you don’t think a page’s content should be seen by the public, disallow it. If there are multiple URLs for the same content, slap a rel="canonical" tag on each of those pages, each pointing to the same core URL.