Mastering Canonical Issues: Fixing Duplicate Content Problems

What is a canonical tag?

The link rel=canonical tag tells search engines, “Hey, the page you’re crawling has a preferred version, listed as the canonical link; that is the original copy of this page.” It stops search engines from passing credit to the duplicate page, consolidates that credit on the preferred version instead, and helps the preferred version, rather than the duplicate, appear in search results.

The canonical tag can be self-referential (pointing to the page itself), point to another page on the same domain, or point to a page on a different domain, in which case it is called a cross-domain canonical tag.
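To make the three cases concrete, here is a minimal Python sketch that renders the tag and labels a canonical as self-referential, same-domain or cross-domain. The function names and example.com URLs are hypothetical, and the comparison is deliberately strict (no URL normalization), so treat it as an illustration rather than an audit tool.

```python
from html import escape
from urllib.parse import urlsplit


def canonical_link_tag(preferred_url: str) -> str:
    """Render the <link> element to place inside the page's <head>."""
    return f'<link rel="canonical" href="{escape(preferred_url, quote=True)}">'


def canonical_kind(page_url: str, canonical_url: str) -> str:
    """Classify a canonical as self-referential, same-domain or cross-domain."""
    if page_url == canonical_url:  # exact string match only; no normalization
        return "self-referential"
    # .hostname is lowercased; www.example.com and example.com count as different hosts here.
    if urlsplit(page_url).hostname == urlsplit(canonical_url).hostname:
        return "same-domain"
    return "cross-domain"


print(canonical_link_tag("https://www.example.com/shoes/"))
# <link rel="canonical" href="https://www.example.com/shoes/">
print(canonical_kind("https://www.example.com/shoes/?color=red", "https://www.example.com/shoes/"))
# same-domain
```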

There are various scenarios where multiple URLs have 100% identical content, which is considered duplicate content, such as:

  • Multiple URLs pointing to the same content (e.g., www and non-www versions, HTTP and HTTPS versions, trailing-slash and non-trailing-slash versions)
  • Parameterised URLs, such as those carrying session IDs or tracking parameters
  • The same content syndicated across different websites
  • Print-only versions of a URL, etc.
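Most of these variants can be collapsed with a few URL rules. The Python sketch below assumes a hypothetical site whose preferred form is HTTPS on www.example.com with trailing slashes on directory-style paths, and it treats a few utm_ parameters, sessionid and gclid as tracking noise; your own site’s rules may well differ. A helper like this can help decide which URL each duplicate should declare as its canonical.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit

# Hypothetical site policy; every value here is an assumption.
PREFERRED_SCHEME = "https"
PREFERRED_HOST = "www.example.com"
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "sessionid", "gclid"}


def normalize_url(url: str) -> str:
    """Collapse common duplicate variants of a URL onto one preferred form."""
    parts = urlsplit(url)
    host = (parts.hostname or "").lower()  # .hostname also drops any explicit port
    if host == PREFERRED_HOST.removeprefix("www."):
        host = PREFERRED_HOST  # map the non-www host onto the www host
    path = parts.path or "/"
    last_segment = path.rsplit("/", 1)[-1]
    if "." not in last_segment and not path.endswith("/"):
        path += "/"  # directory-style paths get a trailing slash; files like a.pdf do not
    # Drop tracking/session parameters, keeping the others in their original order.
    kept = [(k, v) for k, v in parse_qsl(parts.query, keep_blank_values=True)
            if k.lower() not in TRACKING_PARAMS]
    return urlunsplit((PREFERRED_SCHEME, host, path, urlencode(kept), ""))


print(normalize_url("http://example.com/shoes?utm_source=news&color=red"))
# https://www.example.com/shoes/?color=red
```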

In all of these duplicate content scenarios, adding a canonical tag is not the only solution; there are several other ways.

Other methods for handling duplicate content:

  1. Using 301 redirects – Often the best way to handle duplicate pages, since a permanent redirect passes almost all of the source URL’s link equity to the destination URL.
  2. Using the meta robots noindex, nofollow tag – A page marked noindex stays accessible to users, but search engines keep it out of search results; nofollow additionally tells them not to follow, or pass link equity through, the links on the page.
  3. Deleting or preventing duplicate pages – Adjust your CMS settings so it stops auto-generating duplicate pages, such as tag pages. Another simple solution is to delete the content and return HTTP status 410 (Gone) to signal that it has been permanently removed.
  4. Using robots.txt – robots.txt cannot fully control duplicate content, but it is helpful for stopping bots from crawling query-parameter URLs.
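As an illustration of the first method, here is a minimal WSGI sketch (standard-library Python only) that answers any request for a non-preferred scheme or host with a 301 pointing at the same path on the preferred origin. PREFERRED_ORIGIN is a hypothetical value, and in production this rule usually lives in the web server or CDN configuration rather than in application code.

```python
from collections.abc import Callable, Iterable

# Hypothetical preferred scheme + host for this example site.
PREFERRED_ORIGIN = "https://www.example.com"


def redirect_to_preferred(environ: dict, start_response: Callable) -> Iterable[bytes]:
    """Minimal WSGI app: 301-redirect requests for a non-preferred scheme or host."""
    scheme = environ.get("wsgi.url_scheme", "http")
    host = environ.get("HTTP_HOST", "").lower()
    # PATH_INFO arrives decoded; a production version should re-quote it (urllib.parse.quote).
    path = environ.get("PATH_INFO") or "/"
    query = environ.get("QUERY_STRING", "")
    if f"{scheme}://{host}" != PREFERRED_ORIGIN:
        # Keep the same path and query so each duplicate maps to its own preferred URL.
        location = PREFERRED_ORIGIN + path + (f"?{query}" if query else "")
        # 301 Moved Permanently marks the move as permanent for browsers and crawlers.
        start_response("301 Moved Permanently", [("Location", location)])
        return [b""]
    start_response("200 OK", [("Content-Type", "text/plain; charset=utf-8")])
    return [b"preferred version"]
```

For a quick local test you could serve it with wsgiref.simple_server.make_server("", 8000, redirect_to_preferred).serve_forever(). Behind a proxy that terminates TLS, wsgi.url_scheme may report http even for HTTPS requests, so the scheme check would need adjusting.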

Canonical tags are just one of these methods for preventing duplicate content. Each scenario needs a different approach, so consider seeking a professional SEO consultant to find the right solution for your site.

Also, Google may not always follow the canonical tag on a page. If the page is only partially duplicate, a canonical pointing to the preferred version may not work: because the page has some unique content, Google may still index it and show it in search results.
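Google does not publish how much overlap it needs before respecting a canonical, but during an audit it can help to measure roughly how similar two pages are. This Python sketch compares the pages’ visible words with difflib.SequenceMatcher; the 0.9 cut-off is an arbitrary assumption for flagging pages, not a Google threshold, and extracting the visible text from HTML is left out.

```python
from difflib import SequenceMatcher

# Arbitrary audit cut-off chosen for illustration; Google publishes no such number.
NEAR_DUPLICATE_RATIO = 0.9


def duplicate_ratio(text_a: str, text_b: str) -> float:
    """Similarity of two pages' visible text, compared word by word (0.0 to 1.0)."""
    words_a, words_b = text_a.lower().split(), text_b.lower().split()
    return SequenceMatcher(None, words_a, words_b, autojunk=False).ratio()


def is_near_duplicate(text_a: str, text_b: str) -> bool:
    """True when overlap reaches NEAR_DUPLICATE_RATIO; lower scores deserve a closer look."""
    return duplicate_ratio(text_a, text_b) >= NEAR_DUPLICATE_RATIO
```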

Canonical URL external signals
External signals that influence which URL Google chooses as canonical

Google also weighs other signals when deciding the preferred version, such as which URLs are listed in the sitemap, redirects pointing to the preferred version, and internal links referring to it. Make sure every signal points to the preferred version so that Google does not pick the duplicate version on its own.
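One way to check these signals is to compare your sitemap and internal links against the canonicals your pages declare. In this Python sketch, declared_canonicals is a hypothetical mapping of page URL to canonical href (for example, built from a crawl), and any URL missing from it is assumed to be self-canonical. Redirect checks are not covered.

```python
import xml.etree.ElementTree as ET

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}


def sitemap_urls(sitemap_xml: str) -> list[str]:
    """Return every <loc> value from a standard XML sitemap."""
    root = ET.fromstring(sitemap_xml)
    return [loc.text.strip() for loc in root.findall("sm:url/sm:loc", SITEMAP_NS) if loc.text]


def conflicting_signals(sitemap_xml: str, declared_canonicals: dict[str, str],
                        internal_links: list[str]) -> list[str]:
    """Describe sitemap entries and internal links that point at non-canonical URLs."""
    problems = []
    for url in sitemap_urls(sitemap_xml):
        canonical = declared_canonicals.get(url, url)  # unknown URLs: assume self-canonical
        if canonical != url:
            problems.append(f"sitemap lists {url} but its canonical is {canonical}")
    for link in internal_links:
        canonical = declared_canonicals.get(link, link)
        if canonical != link:
            problems.append(f"internal link to {link} instead of {canonical}")
    return problems
```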

More about canonical tag errors flagged in Google Search Console:

Google Search Console may flag the following canonical-related statuses, for example when Google chooses a different canonical than the one you selected:

“Alternate page with proper canonical tag”

“Duplicate, Google chose different canonical than user” 

“Duplicate, submitted URL not selected as canonical”

“Duplicate without user-selected canonical”

In all of the above scenarios, inspect the URL to see which canonical Google selected. In most cases, the Google-selected canonical is the actual original/preferred version. However, Google may choose duplicate or non-preferred URLs that you don’t want to appear in the index. In that case, update the canonical tag and make sure other signals, such as internal links and sitemaps, point to the preferred URLs. There is no one-size-fits-all solution; each scenario is different. Checking the Google-selected canonical helps you understand how Google treats the duplicate pages on your website.

Let’s dig deeper into the canonical tag errors that may come up while auditing websites. Some of them are listed below:

  • Canonicalized URL is noindex, nofollow
  • Canonical URL is relative
  • Canonical points to a noindex URL
  • Canonical points to a URL that returns a server error (5XX)
  • Canonical points to a URL that returns 404 Not Found
  • Canonical tag mismatch between the raw HTML and the rendered HTML
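Several of these checks can be automated once you have crawl data. The Python sketch below assumes a hypothetical CrawledPage record holding each URL’s status code, meta robots value, and the canonical found in the raw and rendered HTML; real crawlers name and store these fields differently. Relative canonicals are resolved with urljoin before the target is looked up.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import urljoin, urlsplit


@dataclass
class CrawledPage:
    """Hypothetical crawl record; field names are illustrative only."""
    url: str
    status: int                               # HTTP status code for this URL
    robots: str = ""                          # meta robots value, e.g. "noindex, nofollow"
    raw_canonical: Optional[str] = None       # canonical href in the served HTML
    rendered_canonical: Optional[str] = None  # canonical href after JavaScript rendering


def audit_canonical(page: CrawledPage, crawl: dict[str, CrawledPage]) -> list[str]:
    """Return the canonical issues from the audit list that apply to one page."""
    issues = []
    if page.rendered_canonical is not None and page.rendered_canonical != page.raw_canonical:
        issues.append("mismatch between raw HTML and rendered HTML canonical")
    href = page.raw_canonical
    if not href:
        return issues
    if not urlsplit(href).scheme:
        issues.append("canonical URL is relative")
    target_url = urljoin(page.url, href)  # resolve relative hrefs against the page URL
    if target_url != page.url and "noindex" in page.robots.lower():
        issues.append("canonicalized URL is noindex")
    target = crawl.get(target_url)
    if target is not None:
        if target.status == 404:
            issues.append("canonical points to a 404 Not Found URL")
        elif 500 <= target.status <= 599:
            issues.append("canonical points to a 5XX error URL")
        if "noindex" in target.robots.lower():
            issues.append("canonical points to a noindex URL")
    return issues
```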

How to resolve canonical issues

The following points, summarized from Google’s canonical tag guidelines, will help you fix canonical tag related errors.

1. The rel=canonical link should be in the head section

2. The canonical link should be an absolute URL

3. The canonical URL should not point to a page marked nofollow or noindex

4. The canonical URL should return 200 OK, not a redirect (3XX) or an error (4XX, 5XX)

5. The rendered HTML page should have the same canonical tag as the raw (non-rendered) HTML

6. Use self-referential canonical tags on paginated URLs

7. For non-HTML files such as PDFs, the canonical link can also be sent in the HTTP header
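For point 7, Google’s canonicalization documentation describes sending the canonical in a Link response header, in the form Link: <https://www.example.com/downloads/white-paper.pdf>; rel="canonical". The Python helpers below build that header and pull canonical targets back out of a Link header when auditing. The parser is a simplified regular expression, not a full RFC 8288 implementation, and it assumes rel is the first parameter after the URL.

```python
import re

# Simplified: matches <URL>; rel="canonical" only when rel directly follows the URL.
LINK_CANONICAL = re.compile(r'<([^>]+)>\s*;\s*rel="?canonical"?', re.IGNORECASE)


def canonical_link_header(canonical_url: str) -> tuple[str, str]:
    """Header to send with a non-HTML response such as a PDF."""
    return ("Link", f'<{canonical_url}>; rel="canonical"')


def canonical_from_link_header(value: str) -> list[str]:
    """Extract canonical targets from a (possibly multi-valued) Link header value."""
    return LINK_CANONICAL.findall(value)
```

In a WSGI app, the returned tuple can simply be appended to the headers list passed to start_response alongside the PDF’s Content-Type header.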

How to check the canonical tag

You can verify the canonical tag directly by viewing the page’s source code and searching the head section for “canonical”. Alternatively, several SEO Chrome extensions show whether a URL is canonicalised with one click.
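If you would rather script the source-code check, Python’s built-in html.parser can list every rel=canonical link on a page and note whether it sits inside the head section. This is a sketch under simple assumptions: it reads the raw HTML you pass in (not the JavaScript-rendered page), and it relies on the page using explicit head and body tags. More than one result, or a result outside the head, points to the problems described in the guidelines above and the FAQ below.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Collect rel=canonical <link> hrefs and note whether each sits inside <head>."""

    def __init__(self) -> None:
        super().__init__()
        self.in_head = False
        self.found: list[tuple[str, bool]] = []  # (href, inside_head)

    def handle_starttag(self, tag, attrs):
        if tag == "head":
            self.in_head = True
        elif tag == "body":
            self.in_head = False
        elif tag == "link":
            attributes = dict(attrs)
            # rel can hold several space-separated values; valueless attributes give None.
            rel_values = (attributes.get("rel") or "").lower().split()
            if "canonical" in rel_values:
                self.found.append((attributes.get("href") or "", self.in_head))

    def handle_endtag(self, tag):
        if tag == "head":
            self.in_head = False


def find_canonicals(html: str) -> list[tuple[str, bool]]:
    """Return (href, inside_head) for every canonical link in the given HTML."""
    finder = CanonicalFinder()
    finder.feed(html)
    finder.close()
    return finder.found
```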

Example from the Detailed SEO Chrome extension:

You can also check the canonical tags of HTML pages in bulk using Screaming Frog. Run the crawl, go to the Canonicals tab, find the Canonical Link Element column, and export the file to review the canonical tags of many URLs at once.
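Once you have the export, a few lines of Python can list the URLs whose canonical points somewhere else or is missing. The column names Address and Canonical Link Element 1 are assumptions about the export format; check them against the header row of your own file.

```python
import csv
import io

# Assumed column names; confirm them against the header row of your export.
ADDRESS_COL = "Address"
CANONICAL_COL = "Canonical Link Element 1"


def non_self_canonicals(csv_text: str) -> list[tuple[str, str]]:
    """Return (address, canonical) pairs where the canonical points elsewhere or is missing."""
    result = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        address = (row.get(ADDRESS_COL) or "").strip()
        canonical = (row.get(CANONICAL_COL) or "").strip()
        if canonical != address:
            result.append((address, canonical))
    return result
```

To read a saved export, pass open(path, newline="", encoding="utf-8-sig").read() to the function; the utf-8-sig codec also works when the file has no byte-order mark.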

Benefits of URL canonicalization

By implementing URL canonicalization, you can:

  • Improve search engine rankings by consolidating the link equity of multiple URLs into a single, preferred URL
  • Avoid wasting crawl budget on duplicate pages, especially on enterprise-level eCommerce websites
  • Manage syndicated content effectively, passing link equity to the original page

Some common canonical tag FAQs:

What happens when your webpage has more than one canonical tag?

If a web page has more than one canonical link, Google may ignore all of the rel=canonical links. Make sure to add only one canonical tag per page.

Should every page have a self-referential canonical tag?

It is not critical for every page to have a self-referential canonical tag, since Google may automatically pick the preferred version. However, adding them explicitly helps the process and guards against scenarios where unwanted duplicate URLs end up in Google’s index.

Have you faced any canonical tag issues, or has Google picked a duplicate URL as the original? Let me know about your experience or questions in the comment section.
