Duplicate Content Decision Matrix

Resolving duplicate content is hard. The trickiest part is knowing which arrows in the canonicalization and de-duplication quiver to use in which situations to accomplish the twin goals of A) consolidating link popularity and B) deindexing duplicate content. I just finished a series on finding and fixing duplicate content, but I think this decision matrix will help clarify the situation. Before you can begin the process, however, you need to identify the duplicate content.

[Figure: Duplicate Content Decision Matrix]

The first decision as you look at the list of duplicate content URLs is: “Is the content at these URLs an exact duplicate?” Answering “Yes” means that different URLs load the exact same content in the exact same order. Answering “No” means that the URL generates some filtering, sorting, pagination, breadcrumb or other variation in the content.
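One rough way to answer the exact-duplicate question at scale is to hash each page body and group the URLs that share a hash. The sketch below assumes the pages have already been fetched into a dictionary; the function name, the example.com URLs and the HTML snippets are all made up for illustration. A byte-for-byte hash is a strict test, so pages that differ only by a timestamp or session token would need normalizing first.

```python
import hashlib
from collections import defaultdict


def group_exact_duplicates(pages):
    """Group URLs whose bodies are byte-for-byte identical.

    `pages` maps URL -> already-fetched HTML body (a str).
    Returns a list of URL groups, each with two or more URLs.
    """
    groups = defaultdict(list)
    for url, body in pages.items():
        digest = hashlib.sha256(body.encode("utf-8")).hexdigest()
        groups[digest].append(url)
    # Keep only hashes shared by more than one URL.
    return [sorted(urls) for urls in groups.values() if len(urls) > 1]


# Hypothetical sample data: the first two URLs load identical content,
# the third loads the same items in a different order (not an exact duplicate).
pages = {
    "https://example.com/widgets": "<h1>Widgets</h1><p>A</p><p>B</p>",
    "https://example.com/widgets?ref=nav": "<h1>Widgets</h1><p>A</p><p>B</p>",
    "https://example.com/widgets?sort=price": "<h1>Widgets</h1><p>B</p><p>A</p>",
}
duplicate_groups = group_exact_duplicates(pages)
```

Here the tracking-parameter URL lands in the same group as the clean URL, while the sorted variation does not, which matches the “Yes” and “No” answers described above.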

Let’s take the “Yes” path first: all of our URLs load content that is exactly the same as the others identified. The issue is a lack of canonicalization. The next question is: “Are these URL variations required to load the page or track data for analytics?” Answering “No” means that the duplicate URLs can be canonicalized in the SEO ideal manner with 301 redirects. Excellent! Answering “Yes” means that the URLs must continue to exist for some reason, at least in the short term. Canonicalizing in this case will mean applying canonical tags to the head of the file specifying the canonical URL.
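To make the two canonicalization tactics concrete, here is a minimal sketch of what each one produces. The helper names and example.com URLs are hypothetical, and the redirect map is only a data structure; how it gets wired into a server or CMS depends on the platform.

```python
from html import escape


def canonical_link_tag(canonical_url):
    """Build the canonical tag to place in the <head> of a duplicate page."""
    return f'<link rel="canonical" href="{escape(canonical_url, quote=True)}">'


def redirect_map(duplicate_urls, canonical_url):
    """Map each duplicate URL to a (301, target) pair; skip the canonical itself."""
    return {
        url: (301, canonical_url)
        for url in duplicate_urls
        if url != canonical_url
    }


# Hypothetical URLs for illustration only.
canonical = "https://example.com/widgets"
duplicates = [
    "https://example.com/widgets",
    "https://example.com/widgets/index.html",
    "https://example.com/Widgets",
]

tag = canonical_link_tag(canonical)
redirects = redirect_map(duplicates, canonical)
```

The tag is for the case where the duplicate URLs must stay live; the redirect map is for the case where they can go away.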

Now let’s assume that the answer to the first question about exact duplicates is “No.” This issue is called cannibalization: more than one page of content targeting the same keyword theme.

The next question to ask is: “Can the content at this URL be differentiated sufficiently with content optimization to send its own valuable keyword signal?” Answering “Yes” means that the content does have SEO potential and should be optimized to target a unique, relevant keyword theme. Answering “No” means that the page is of low or no value to natural search and has no real chance to rank or drive traffic. Proceed to the next question.

The next question to ask is: “Does the content at this URL serve usability needs?” Answering “Yes” means that the content needs to remain live and accessible to provide functionality that humans enjoy or need (like sorting results by price). Canonicalizing in this case will mean applying canonical tags to the head of the file specifying the canonical URL. Answering “No” means that the page is of low or no value to both natural search and usability, and has no real chance to rank or drive traffic. Proceed to the next question.

The next question to ask is: “Is this URL string required to drive site functionality differently than the others, or to provide tracking data?” Answering “Yes” means that the URL needs to remain live and accessible to provide functionality. Canonicalizing in this case will mean applying canonical tags to the head of the file specifying the canonical URL. Answering “No” means that the page is of low or no value to natural search, usability or business needs and has no real chance to rank or drive traffic. The duplicate URLs can be canonicalized in the SEO ideal manner with 301 redirects. Excellent!
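The whole question sequence can be written down as a small function, which may help when triaging a long list of URLs. This is only a sketch of the matrix as described in this post: the enum, the function name and the parameter names are my own labels, and the yes/no answers still have to come from a human looking at each URL.

```python
from enum import Enum


class Tactic(Enum):
    REDIRECT_301 = "301 redirect to the canonical URL"
    CANONICAL_TAG = "canonical tag pointing to the canonical URL"
    OPTIMIZE = "optimize the content for a unique keyword target"


def choose_tactic(exact_duplicate, url_required=False,
                  can_differentiate=False, serves_usability=False):
    """Walk the decision matrix for one duplicate URL.

    exact_duplicate:   the URL loads the same content in the same order.
    url_required:      the URL is needed for functionality or tracking data.
    can_differentiate: optimization could give the page its own keyword signal.
    serves_usability:  the page provides something visitors need (e.g. sorting).
    """
    if exact_duplicate:
        # Lack of canonicalization: redirect unless the URL must stay live.
        return Tactic.CANONICAL_TAG if url_required else Tactic.REDIRECT_301
    # Not an exact duplicate: cannibalization.
    if can_differentiate:
        return Tactic.OPTIMIZE
    if serves_usability or url_required:
        return Tactic.CANONICAL_TAG
    return Tactic.REDIRECT_301


# Example: a sort-by-price variation that shoppers rely on.
sorted_page_tactic = choose_tactic(exact_duplicate=False, serves_usability=True)
```

Note that 301 redirects are both the first choice for exact duplicates and the fallback when a variation has no search, usability or business value.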

This duplicate content decision matrix identifies the ideal tactics to use to A) consolidate link popularity and B) deindex duplicate content. In some cases, there will be barriers to 301 redirects, canonical tags or content differentiation via optimization. In those cases, there are other options to deindex content, but they do not consolidate link popularity. This is a critical point to understand. The link popularity accumulated by URLs that are meta noindexed, disallowed or 404’d is wasted. Try, try and try again to remove internal barriers to 301s and canonical tags before resorting to Plan B deindexation tactics like meta noindexes, disallows or 404s.



Originally posted on Web PieRat.

2 Replies to “Duplicate Content Decision Matrix”

  1. Hello everybody! I have a question about a Google ban. Can I use content from other websites (news, e.g.) within my blog if it is free to use, share and modify? I understand that I should stick to the topic of my website, and I will. But I don’t want to be penalised by Google for duplicating content. Should I link to the source or avoid posting such news at all? Thanks!!!
