Duplicate content on the Internet is as old as the web itself. The sheer ease of copying (or outright stealing) web content, combined with suboptimal technical setups such as tracking parameters, and with plain human error, creates billions of duplicate pages on top of the originals. Managing them is therefore a priority task for search engines. And, as usual, what matters to Google inevitably shapes the work of SEO managers. In this two-part article, we will review the different types of duplicate content, the algorithms Google uses to detect it and the specifics of how it processes it, along with the methods and tools for identifying and, of course, correcting it.
What is duplicate content?
Let’s start with the definition of duplicate content, and for this we take the official explanation from Google:
“Duplicate content generally refers to substantive blocks of content within or across domains that either completely match other content or are appreciably similar. Mostly, this is not deceptive in origin.”
Based on this definition, we can easily come up with some duplicate content typologies.
Depending on where the duplicate content appears, we may have:
- Internal duplicates (the duplicate page is on the same site).
- External duplicates (the duplicate page is on another site, under another domain name).
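Internal duplicates often arise mechanically, for instance from the tracking parameters mentioned in the introduction: the same page becomes reachable under many URL variants. As a simplified sketch (the list of tracking parameters here is illustrative, not exhaustive), such variants can be collapsed to one canonical URL:

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Illustrative list of common tracking parameters; adapt it to your own site
TRACKING_PARAMS = {"utm_source", "utm_medium", "utm_campaign", "gclid", "fbclid"}

def canonicalize(url: str) -> str:
    # Strip tracking parameters so URL variants collapse to one canonical form
    parts = urlsplit(url)
    query = [(k, v) for k, v in parse_qsl(parts.query) if k not in TRACKING_PARAMS]
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), ""))

print(canonicalize("https://example.com/product?id=42&utm_source=newsletter&gclid=abc"))
# → https://example.com/product?id=42
```

Without this kind of normalization (or a `rel="canonical"` tag), every newsletter or ad campaign silently multiplies the number of internal duplicates a crawler can find.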
Depending on the degree of similarity, we distinguish:
- Complete duplicates (“exact duplicates”).
- Partial duplicates (“near duplicates”).
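The difference between the two categories shows up directly in how they are detected: exact duplicates can be caught by comparing a cryptographic hash of the content, while near duplicates require a similarity measure. A deliberately simplified sketch (the example texts are invented, and word-set Jaccard similarity is just one of many possible measures):

```python
import hashlib

def is_exact_duplicate(a: str, b: str) -> bool:
    # Exact duplicates: byte-identical content produces identical hashes
    return hashlib.sha256(a.encode()).hexdigest() == hashlib.sha256(b.encode()).hexdigest()

def jaccard_similarity(a: str, b: str) -> float:
    # Near duplicates: overlap of word sets; 1.0 means identical vocabulary
    wa, wb = set(a.lower().split()), set(b.lower().split())
    return len(wa & wb) / len(wa | wb)

page1 = "Buy cheap red shoes online with free shipping"
page2 = "Buy cheap red shoes online with free delivery"
print(is_exact_duplicate(page1, page2))            # → False
print(round(jaccard_similarity(page1, page2), 2))  # → 0.78
```

The two pages fail the exact-duplicate test on a single changed word, yet a similarity score of 0.78 makes it obvious they are near duplicates, which is why both kinds of checks are needed.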
Depending on the nature of the duplication:
- Deliberate, deceptive duplication.
- Unintentional or accidental duplication.
To these three axes of classification we can add a fourth:
- Technical duplicates.
- Semantic duplicates (pages that use different words and expressions but ultimately say exactly the same thing, with no added value).
Depending on the type of duplication, the severity, reaction and correction methods will not be the same. This is what we will see later in this article.
How does Google identify duplicate content?
For search engines, comparing web documents to identify duplicates is always a compromise between accuracy and the machine resources spent.
Many algorithms that are readily available to us, and easy to use on our own projects, quickly prove ineffective at the scale of the web, where comparisons must be made across millions or even billions of pages.
To determine if a site contains duplicate content, Google uses several levels, methods, and analysis algorithms.
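Google does not publish its exact pipeline, but fingerprinting techniques such as simhash have been described by its engineers in the literature on near-duplicate detection at crawl scale: each document is reduced to a short fixed-size fingerprint, and documents are compared by fingerprint rather than by full text. As a rough illustration only (the MD5 token hashing and 64-bit width are my own choices, not Google's), a minimal simhash sketch:

```python
import hashlib

def simhash(text: str, bits: int = 64) -> int:
    """Build a compact fingerprint: similar texts yield similar bit patterns."""
    votes = [0] * bits
    for token in text.lower().split():
        h = int(hashlib.md5(token.encode()).hexdigest(), 16) & ((1 << bits) - 1)
        for i in range(bits):
            votes[i] += 1 if (h >> i) & 1 else -1
    # Each bit of the fingerprint is the majority vote across all token hashes
    fingerprint = 0
    for i in range(bits):
        if votes[i] > 0:
            fingerprint |= 1 << i
    return fingerprint

def hamming_distance(a: int, b: int) -> int:
    """Number of differing bits; near-duplicates have a small distance."""
    return bin(a ^ b).count("1")

page_a = "this page sells cheap red running shoes online with fast free shipping to all european countries"
page_b = "this page sells cheap red running shoes online with fast free delivery to all european countries"
page_c = "recipe for traditional french onion soup with caramelized onions beef stock and gruyere cheese"

print(hamming_distance(simhash(page_a), simhash(page_b)))  # small distance: near-duplicates
print(hamming_distance(simhash(page_a), simhash(page_c)))  # large distance: unrelated pages
```

The payoff is scale: instead of comparing full documents pairwise, the engine compares 64-bit integers, and near-duplicates can be found by looking for fingerprints within a small Hamming distance of each other.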
[This article is available in its full version to subscribers of the Réacteur site. To learn more: https://www.reacteur.com/2022/06/contenu-duplique-et-seo.html]
Article written by Alexis Rylko, senior SEO consultant at iProspect (https://www.iprospect.com/ & https://alekseo.com/).