In this article, we’ll cover what duplicate content is, why you don’t want too much of it on your site, and how it affects your SEO. Ready to get into it? Let’s get started.
What Is Duplicate Content?
Duplicate content is content that appears at more than one URL, either within a single domain or across different domains. When this happens, it can be tough for search engines to decide which version to show in the results. There are two types of duplicate content: internal and external.
Internal Duplicate Content
So, what is internal duplicate content? It is when many URLs on your own site point to the same material, which can cause problems with search engines.
A search engine builds its index using a program called a “web crawler.” You want the crawler to find your page so that its URL can be added to the search engine’s growing index of web pages. Some websites intentionally stop crawlers from visiting them. For many others, duplicate content confuses the crawler, and as a result the version of the page you care about may not show up in the index.
Duplicate content can have many causes, and not all of it is editorially created. In most cases, website owners don’t deliberately create duplicate content. Even so, the web is estimated to be 30% duplicate content!
The main causes of internal duplicate content
- Session IDs. A session ID is the unique identifier of a session. A session is a history of what the visitor did on your site, and it needs to be stored somewhere. The most common solution is to store it in a cookie, yet search engines don’t usually store cookies. At that point, some systems fall back to putting the session ID in the URL. Every internal link on the website then gets that session ID added to its URL, and because the ID is unique to that session, each visit produces new URLs for the same pages and therefore duplicate content.
- URL parameters. Parameters used for click tracking and some analytics code can cause duplicate content issues, because they change the URL without changing the page. Where possible, avoid adding URL parameters or alternate versions of URLs.
- WWW and HTTP/HTTPS variants. If your site answers at both “www.mysite.com” and “mysite.com” (with and without the “www” prefix), the same content lives at both versions. The same applies to sites that serve both http:// and https:// versions. If both versions of a page are up and running, you can run into a duplicate content issue.
- Parameters and faceted navigation
- Trailing slashes
- Index pages
- Alternate page versions such as m. or AMP pages or print
- Dev/hosting environments
- Country/language versions
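To make the list above concrete, here is a minimal Python sketch of how several of these URL variants could be collapsed into one representative form when auditing your own site. The parameter names in IGNORED_PARAMS, the index file names, and the preference for https without “www” are assumptions chosen for illustration, not rules from any search engine.

```python
from urllib.parse import urlsplit, urlunsplit, parse_qsl, urlencode

# Hypothetical parameter names; replace them with whatever your site uses.
IGNORED_PARAMS = {"sessionid", "sid", "phpsessid",
                  "utm_source", "utm_medium", "utm_campaign"}
# Hypothetical index file names.
INDEX_PAGES = ("index.html", "index.php")


def normalize_url(url: str) -> str:
    """Collapse common duplicate-URL variants into one representative form."""
    parts = urlsplit(url)

    # Treat www and non-www hosts as the same site (assumed preference: no www).
    host = parts.netloc.lower()
    if host.startswith("www."):
        host = host[4:]

    # Drop index-page file names and trailing slashes from the path.
    path = parts.path
    for index_page in INDEX_PAGES:
        if path.endswith("/" + index_page):
            path = path[: -len(index_page)]
            break
    path = path.rstrip("/") or "/"

    # Remove session and tracking parameters, then sort what is left so that
    # parameter order alone does not produce a different URL.
    kept = [(key, value)
            for key, value in parse_qsl(parts.query, keep_blank_values=True)
            if key.lower() not in IGNORED_PARAMS]
    query = urlencode(sorted(kept))

    # Assumed preference: always https, and no fragment.
    return urlunsplit(("https", host, path, query, ""))


variants = [
    "http://www.mysite.com/shoes/",
    "https://mysite.com/shoes/index.html",
    "https://www.mysite.com/shoes?sessionid=abc123",
    "http://mysite.com/shoes/?utm_source=newsletter",
]
for variant in variants:
    print(variant, "->", normalize_url(variant))
```

Running a list of crawled URLs through a function like this and grouping by the result is one way to spot pages that are reachable under several addresses. It only compares URLs, so it will not catch duplicates that live at genuinely different paths.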
External Duplicate Content
If you have been blogging for a while, you have probably heard of content scrapers. Content scrapers are websites that take your content for their own blogs without your permission. Some copy the material from your blog by hand, but the majority use programs that automatically pull posts from your RSS feed and republish them on their site.
How you can stop content scrapers
- Create Google Alerts. Create a Google Alert using your post’s title, putting the title in quotation marks. You will then receive regular emails with any matching results.
- Watch your trackbacks. If you use WordPress, trackbacks can tip you off when someone steals your content. Trackbacks are WordPress’ way of letting you know that another website has linked to a post on your blog. Scrapers that copy a post wholesale tend to copy its links too, so if your posts link to each other, a scraped copy can trigger a trackback.
- Check your inbound links. If you use Google Webmaster Tools (since renamed Google Search Console), look under “Traffic” for the page called “Links to your site”; your scrapers will probably be shown there.
- File a DMCA (Digital Millennium Copyright Act) takedown notice with the scraper’s host.
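Before taking action against a suspected scraper, it can help to check how closely their page actually matches your post. The following is a rough Python sketch, assuming you have already saved the plain text of both pages. It compares overlapping five-word sequences (“shingles”) and reports the share they have in common. The function names and the shingle size are illustrative choices, and this is not how Google or any particular tool measures duplication.

```python
import re


def shingles(text: str, size: int = 5) -> set:
    """Return the set of overlapping word sequences ("shingles") in text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < size:
        # Very short texts become a single shingle (or none if empty).
        return {tuple(words)} if words else set()
    return {tuple(words[i:i + size]) for i in range(len(words) - size + 1)}


def similarity(original: str, suspect: str, size: int = 5) -> float:
    """Jaccard similarity of the two texts' shingle sets, from 0.0 to 1.0."""
    a, b = shingles(original, size), shingles(suspect, size)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


my_post = "Duplicate content is content that appears at more than one URL on the web."
their_page = my_post + " Read more great articles on our totally original blog."
print(round(similarity(my_post, their_page), 2))
```

A score close to 1.0 suggests a near-verbatim copy, while a low score suggests the page merely covers the same topic. Where to draw the line is a judgment call.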
You won’t be able to keep up with the number of scrapers on the web; it would take too long to fight them all. Just chill, have fun, and focus on creating quality content.
Non-Malicious Duplicate Content
Some duplicate content will not hurt or jeopardize your rankings at all. According to Google, examples include:
- Discussion forums that can generate both regular and stripped-down versions of the pages targeting mobile devices
- Store items that are shown or linked via multiple distinct URLs
- Printer-only versions of web pages
How to Avoid Internal Duplicate Content
Use a Tool to Assess Your Site
I recommend using Siteliner (www.siteliner.com/). Use this tool to receive a site report containing information on duplicate content, broken links, and more.
Use Canonical Tags
A canonical tag (aka “rel canonical”) is a way of informing search engines that a particular URL represents the master copy of a page. Using the canonical tag prevents problems caused by identical or “duplicate” content appearing on various URLs.
So, for example, say the canonical URL on your site is mysite.com/duplicate-content.
Then you have a duplicate of that page at another URL, mysite.com/duplicated-contents. It could be there on purpose, it could be a complication in the site structure, or it could exist for tracking or testing purposes. Left alone, Google may pick any of these versions to rank highest. If you want a specific page to be the one that ranks, you need to add this line to your code.
<link rel="canonical" href="https://mysite.com/duplicate-content" />
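The href attribute tells Google which page is the canonical one. Put the tag in the <head> section of each duplicate page, and use the full absolute URL so it is not read as a relative path. There are many different ways to canonicalize multiple URLs, but we’ll go over that another time.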
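If you want to confirm that a page really carries the canonical tag you expect, a small script can read it back out of the HTML. Below is a minimal sketch using only Python’s standard library. The CanonicalFinder class and find_canonical function are names made up for this example, and the sample page reuses the mysite.com/duplicate-content URL from above with an assumed https:// scheme.

```python
from html.parser import HTMLParser
from typing import Optional


class CanonicalFinder(HTMLParser):
    """Records the href of the first <link rel="canonical"> tag it sees."""

    def __init__(self) -> None:
        super().__init__()
        self.canonical: Optional[str] = None

    def handle_starttag(self, tag, attrs):
        # Only <link> tags matter, and only the first canonical one is kept.
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str) -> Optional[str]:
    """Return the canonical URL declared in the HTML, or None if absent."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


sample_page = """
<html>
  <head>
    <title>Duplicate Content</title>
    <link rel="canonical" href="https://mysite.com/duplicate-content" />
  </head>
  <body>...</body>
</html>
"""
print(find_canonical(sample_page))
```

Running this over each duplicate URL on your site is a quick way to check that they all point at the same master copy.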
Create Original, High-Quality Content
Want to know one of the most effective ways to avoid internal duplicate content? It’s quite simple. Website owners need to focus on publishing high-quality, expert content. Create original content, and you will be fine. The more pages on your site that contain original content, the better their chances of ranking well in Google; consequently, they may appear for a wider range of search queries.
Now that you’re a duplicate content whiz, check out the tool mentioned above to see how much of your site is duplicated, and try to find sites that have scraped your content. Otherwise, just focus on producing more quality content and you will be fine. In the end, Google is usually able to work out who the original author of a post is.