You will often come across a recommendation to use canonical on pages with “duplicate” content, or even to add a self-referential canonical tag. What is rarely mentioned is that this is not a requirement. In this article we will look at what canonical is and why it is still better to use it.
What is canonical?
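The canonical attribute tells a search engine that the current page is a full or partial copy of another page that has higher priority.
Using this tag helps prevent problems with duplicate content.
In practice it looks like this (site.com is used as an example address): <link rel="canonical" href="https://site.com/article">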
Why is it necessary?
Many factors can cause duplication of content: pagination, GET parameters, the site being reachable on both the www and non-www versions, etc.
If you don't put rel="canonical" on pages with similar content, search engines will not understand which one has priority. In that case the search engine either picks one page to index on its own, or indexes both and neither of them ranks well. Specifying the canonical page prevents such problems.
Some say that if the crawler spends a long time scanning pages with the same content, it may not reach the important, unique pages of the site, and the site will rank worse.
A 301 redirect can also solve the duplication problem. The advantage of canonical is that the user can still see the non-canonical page and interact with it, whereas a redirect sends the user to the priority page immediately. That makes a redirect a poor fit for pagination or sorting pages, for example.
Problems that can be solved using rel="canonical"
1. Pages with GET parameters
- Filters
When a filter on the site generates a lot of new pages, you need to indicate the main ones so that you don't waste crawl budget and search engines focus their “attention” only on the landing pages.
There are several ways to avoid duplication on such pages, and canonical is one of them.
- UTM tags
UTM tags can also generate almost identical pages with GET parameters, and the canonical tag solves this problem as well.
- Sorting
Sorting can also create pages with GET parameters, on which a canonical can be specified.
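To illustrate the GET-parameter cases above, here is a minimal Python sketch that derives a canonical URL by dropping tracking and sorting parameters. The parameter names in `NON_CANONICAL_PARAMS` and the helper names `canonical_url` and `canonical_tag` are assumptions made for this example; adjust them to the parameters your own site actually uses.

```python
from html import escape
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical parameter names that only change tracking or ordering.
# They are assumptions for this sketch, not a standard list.
NON_CANONICAL_PARAMS = {"sort", "order"}
NON_CANONICAL_PREFIXES = ("utm_",)


def canonical_url(url: str) -> str:
    """Return the URL without tracking and sorting parameters."""
    parts = urlsplit(url)
    kept = [
        (key, value)
        for key, value in parse_qsl(parts.query, keep_blank_values=True)
        if key.lower() not in NON_CANONICAL_PARAMS
        and not key.lower().startswith(NON_CANONICAL_PREFIXES)
    ]
    # The fragment is dropped as well: it is never sent to the server.
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(kept), ""))


def canonical_tag(url: str) -> str:
    """Build the <link> element to place in the <head> of the page."""
    return f'<link rel="canonical" href="{escape(canonical_url(url), quote=True)}">'


print(canonical_tag("https://site.com/catalog?utm_source=news&sort=price"))
```

Every variant of the page produced by UTM tags or sorting would then carry the same tag, pointing at the clean address.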
2. Duplicates across different domains
Canonical can also help if you're moving your site to a new domain. Just set the canonical on the old pages to point to the new ones.
3. AMP and non-AMP versions
AMP is a fairly new technology that lets you create “lighter” versions of site pages that load faster. But even here we face the problem of duplicate content. In this case, you can set canonicals from the AMP pages to the corresponding regular pages of the site.
4. Mobile version
If you choose to make a mobile-friendly copy of your website on separate URLs, canonical helps prevent duplication between pages like https://m.site.com/article and https://site.com/article.
5. One page in different categories
Sometimes the same page needs to be available in several subfolders at once, so several pages appear with the same content but different URLs. For example, the same product is available both in the catalog and on the sale page: site.com/catalog/product and site.com/sale/product. Canonical tells search engines which URL has priority.
6. Same content on different sites
If you own several sites that often publish the same content, you can indicate which one is more important using canonical tags.
7. www, non-www
It is recommended to always choose the main mirror of the site: www or non-www. You can specify the main version using canonical tags, although a 301 redirect is the better solution here.
8. index.php, index.html, index.htm, index.aspx, index.asp
Sometimes the same page is available under different file names. For example, the address https://site.com/index.php returns 200 OK and is available to the user alongside https://site.com. One solution is to use canonical, although here, as in the previous case, a 301 redirect is preferable.
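For the www and index-file cases, the choice of redirect target can be sketched in Python as follows. This is only an illustration under stated assumptions: the non-www host is taken as the main mirror, and `preferred_url` and `redirect_target` are hypothetical helper names, not part of any framework.

```python
import re
from typing import Optional
from urllib.parse import urlsplit, urlunsplit

# Matches a trailing index file such as /index.php or /blog/index.html.
INDEX_FILE = re.compile(r"/index\.(php|html?|aspx?)$", re.IGNORECASE)


def preferred_url(url: str) -> str:
    """Map a URL to the main mirror (assumed here to be the non-www host)."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[len("www."):]
    # Replace a trailing index file with the directory itself.
    path = INDEX_FILE.sub("/", parts.path) or "/"
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


def redirect_target(url: str) -> Optional[str]:
    """Return where to send a 301 redirect, or None if the URL is already preferred."""
    target = preferred_url(url)
    return None if target == url else target


print(redirect_target("https://www.site.com/index.php"))
```

The same `preferred_url` value could be used as the canonical address when a redirect is not possible.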
Things to remember
1. You can use a relative path (<link rel="canonical" href="/article-1">) in the href of the rel="canonical" element, but it is strongly recommended to specify an absolute URL (<link rel="canonical" href="https://site.com/article-1">).
2. Although self-referential canonicals are optional, it is recommended to use them so that in ambiguous situations Google knows which pages you'd like to be indexed.
“I recommend self-referential canonical because it really makes it clear to us which page you want to have indexed, or what the URL should be when it is indexed.
Even if you have one page, sometimes there are different variations of the URL that can pull that page up. For example, with parameters in the end, perhaps with upper lower case or www and non-www. All of these things can be kind of cleaned up with a rel canonical tag.”
John Mueller, Webmaster Trends Analyst, Google
3. Only one canonical can be specified per page.
4. Avoid situations where canonical pages point to each other: page A specifies B as its canonical, and page B specifies A.
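The points in this list can be checked automatically. Below is a minimal Python sketch using only the standard library; `CanonicalCollector`, `canonical_problems`, and `has_canonical_loop` are hypothetical names introduced for this example, and the checks cover only the rules listed above.

```python
from html.parser import HTMLParser
from typing import Dict, List
from urllib.parse import urlsplit


class CanonicalCollector(HTMLParser):
    """Collect the href of every <link rel="canonical"> in a document."""

    def __init__(self) -> None:
        super().__init__()
        self.hrefs: List[str] = []

    def handle_starttag(self, tag, attrs):
        if tag != "link":
            return
        attributes = dict(attrs)
        rel = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel:
            self.hrefs.append(attributes.get("href") or "")


def canonical_problems(html: str) -> List[str]:
    """Report a missing canonical, multiple canonicals, or relative hrefs."""
    collector = CanonicalCollector()
    collector.feed(html)
    problems = []
    if not collector.hrefs:
        problems.append("no canonical tag")
    if len(collector.hrefs) > 1:
        problems.append("more than one canonical tag")
    for href in collector.hrefs:
        parts = urlsplit(href)
        if not (parts.scheme and parts.netloc):
            problems.append(f"relative canonical: {href}")
    return problems


def has_canonical_loop(canonicals: Dict[str, str], start: str) -> bool:
    """Follow page -> canonical links from start; True if a page repeats."""
    seen = set()
    current = start
    # A self-referential canonical ends the walk and is not a loop.
    while current in canonicals and canonicals[current] != current:
        if current in seen:
            return True
        seen.add(current)
        current = canonicals[current]
    return False


print(canonical_problems('<head><link rel="canonical" href="/article-1"></head>'))
print(has_canonical_loop({"A": "B", "B": "A"}, "A"))
```

The `canonicals` mapping is assumed to come from a crawl of your own site, with each page URL mapped to the canonical it declares.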
Ways of implementing
You can specify canonical URLs in these ways:
- HTML tag (rel=canonical)
- HTTP header
For more information on each of them, see the official Google documentation.
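The HTTP header option is handy for files that have no HTML head, such as PDFs. As a rough sketch, the header value can be built in Python like this; `canonical_link_header` is a hypothetical helper, the PDF address is an example, and how you attach the header depends on your server or framework.

```python
from typing import Tuple


def canonical_link_header(canonical_url: str) -> Tuple[str, str]:
    """Return the (name, value) pair of a rel="canonical" Link header."""
    # Reject characters that would break the header syntax.
    if any(ch in canonical_url for ch in "<>\r\n"):
        raise ValueError("URL contains characters that are unsafe in a header")
    return ("Link", f'<{canonical_url}>; rel="canonical"')


name, value = canonical_link_header("https://site.com/files/report.pdf")
print(f"{name}: {value}")
```

The server would send this header in the response for every duplicate address of the file.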
Common mistakes
- Blocking a canonicalized page via robots.txt or noindex
- Setting the canonical to another language version
- Using hreflang with a non-canonical page
- Setting a 3xx redirect from the canonical page
- No internal links to the canonical page
- Non-canonical pages in XML sitemaps
- Canonical from HTTP to HTTPS and vice versa
Note: Be careful! If the content on the pages is not the same or there are many links to the non-canonical page, Google may ignore the canonical.