Canonicalization is a big word that you may or may not have heard of before. It means picking one standard (“canonical”) form for something that can be represented in multiple ways. Because your Web site probably has canonicalization issues (most do), you could be losing position in the search engines and not even realize it.
What is Canonicalization?
Google and the other search engines do their best to read and understand your Web site. They do this by following links, both internal (on your site) and external (coming into your site). But Google is, after all, just a really complex computer program, and to it your Web site is a series of index entries. Depending on how it finds your data and Web pages, it will put different information into its index.
So for instance, if your Home page can be found at:
http://www.xyz.com/ and http://www.xyz.com/index.html
Then Google may actually see this as two separate and distinct pages, and put both into its index. That’s a problem, because now Google sees two site pages with exactly the same content, and you’ll get penalized for duplicate content.
What’s even worse, if your site also resolves to the domain without the “www”, you’ve now got four possible pages that all point to the same thing, because these two join the list:
http://xyz.com/ and http://xyz.com/index.html
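To make the four-variant problem concrete, here is a minimal Python sketch that collapses those URLs into one form. It assumes the “www” host with no index file name is the version you prefer; the function name canonical_url and the list of default document names are made up for this illustration, and this is not how any search engine actually does it.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumption: these file names are served when a directory URL is requested.
DEFAULT_DOCUMENTS = ("index.html", "index.htm")


def canonical_url(url: str) -> str:
    """Map the common variants of a page URL to one preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    # Assumption: the "www" host is the preferred one.
    if not host.startswith("www."):
        host = "www." + host
    path = parts.path or "/"
    # Drop a trailing default document so /index.html becomes /.
    for name in DEFAULT_DOCUMENTS:
        if path.endswith("/" + name):
            path = path[: -len(name)]
            break
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


variants = [
    "http://www.xyz.com/",
    "http://www.xyz.com/index.html",
    "http://xyz.com/",
    "http://xyz.com/index.html",
]
unique = {canonical_url(u) for u in variants}
print(unique)
```

All four variants reduce to a single entry, which is the outcome you want the search engine to reach on its own.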
Aren’t More Domain Names Better?
Many times my clients will tell me they bought several domain names, thinking that would help them with search engine positioning, so now they’ve compounded the problem even more:
http://www.abc.com points to http://www.xyz.com (which already has four possible canonical pages as shown above)
Now the new domain name has the same problem but you’ve made it even worse with four new pages that have the same duplicate content problem as the first example. Google is going to look at all this as just bad practice because you’ve got up to eight pages of duplicate content, and you’re going to be penalized for it.
How Do I Fix Canonicalization Issues?
Matt Cutts from Google wrote a very good blog entry a couple of years ago, SEO Advice: URL Canonicalization, in which he answers many of these types of questions. It takes some work to get this resolved, but it’s a process I go through with every client who wants search engine optimization (SEO) on their site.
More recently, the major search engines, including Google, Yahoo! and MSN, have come together on a new standard that helps us out even more. In his blog, Barry Schwartz describes how the search engines have agreed on a canonical tag to reduce duplicate content clutter. Basically, you want to include a new tag between the <head> and </head> tags of each page that looks something like this:
<link rel="canonical" href="http://www.yourdomain.com/xyz.html" />
This instructs the search engines that the part after “href” is what you want them to have in their index as the “real” page, and to ignore all other versions that they might see.
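If you want to check that a page really carries the tag, a short script can read it back out of the HTML. The following is a rough Python sketch using only the standard library; the class name CanonicalFinder, the helper find_canonical, and the sample page are invented for this example, and a real crawler is certainly more involved.

```python
from html.parser import HTMLParser


class CanonicalFinder(HTMLParser):
    """Remember the href of the first <link rel="canonical"> tag seen."""

    def __init__(self):
        super().__init__()
        self.canonical = None

    def handle_starttag(self, tag, attrs):
        if tag != "link" or self.canonical is not None:
            return
        attributes = dict(attrs)
        # rel may hold several space-separated values.
        rel_values = (attributes.get("rel") or "").lower().split()
        if "canonical" in rel_values:
            self.canonical = attributes.get("href")


def find_canonical(html: str):
    """Return the canonical URL declared in the page, or None."""
    finder = CanonicalFinder()
    finder.feed(html)
    return finder.canonical


# Hypothetical page using the tag shown above.
page = (
    "<html><head><title>XYZ</title>"
    '<link rel="canonical" href="http://www.yourdomain.com/xyz.html" />'
    "</head><body>Hello</body></html>"
)
print(find_canonical(page))
```

Running it against each URL variant of a page and confirming they all report the same href is a quick sanity check after you add the tag.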
This is fantastic news! It’s been a real pain in the keister to try to use 301-redirects, edit the .htaccess file, embed PHP code, and so on to get the search engines to just put one version of the page in their index. This really simplifies things for all of us, but you have to take the time to do it.
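For comparison, here is roughly what the 301-redirect approach has to do on the server side, sketched as a small Python WSGI wrapper rather than .htaccess or PHP. The host name in CANONICAL_HOST and the names redirect_to_canonical and site are assumptions made for this illustration, not code from any of the tools mentioned here.

```python
# Assumption: this is the one host name you want in the index.
CANONICAL_HOST = "www.yourdomain.com"


def redirect_to_canonical(app):
    """Wrap a WSGI app so requests for other host names get a 301."""

    def middleware(environ, start_response):
        host = environ.get("HTTP_HOST", "").lower()
        if host != CANONICAL_HOST:
            scheme = environ.get("wsgi.url_scheme", "http")
            path = environ.get("PATH_INFO", "/")
            query = environ.get("QUERY_STRING", "")
            location = scheme + "://" + CANONICAL_HOST + path
            if query:
                location += "?" + query
            # 301 tells browsers and crawlers the move is permanent.
            start_response(
                "301 Moved Permanently",
                [("Location", location), ("Content-Length", "0")],
            )
            return [b""]
        return app(environ, start_response)

    return middleware


def site(environ, start_response):
    """Stand-in for the real Web site."""
    start_response("200 OK", [("Content-Type", "text/plain")])
    return [b"home page"]


wrapped = redirect_to_canonical(site)
```

Every non-canonical host needs a rule like this, which is why a single tag in the page header is so much less work.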
DotNetNuke Canonicalization
I’m a big fan of DotNetNuke as a Web platform. It makes it really easy for me to get a Web site launched and running within an hour or two. Unfortunately, the out-of-the-box platform creates two URLs for every page you generate. You can now set the URLs to be “human friendly” in the web.config file, but you still end up with the old page names being active. Furthermore, you can’t edit what’s in the <head> section, so you can’t add the canonical tag.
It appears that iFinity has created a DotNetNuke module that gives you an automatic way of generating the canonical tag for your DNN site. I’ve yet to try it out, but it looks like it’ll do the trick for a reasonable price.
No code examples? 🙂
Actually, DNN does allow you to modify the <head> section to add the canonical tag.
Tom Kraak has a great tutorial on how to do this in his blog at
http://seablick.com/blog/140/using-the-canonical-link-tag-in-dnn.aspx
In a nutshell, you go to page settings/advanced settings and put the tag in the ‘page header’ section.
Bruce at Ifinity has some great products, and just about everyone recommends his URL Master and Pageblaster for all but the simplest DNN sites, but you can implement the canonical tag pretty easily without using a third-party app.
Excellent. Thanks, Steve, I appreciate the pointer! I was unaware of that ability, and I learn new stuff about DNN all the time. 🙂
This site is one of many that I manage, but it is illustrative of the canonicalization issue… The home page of this site (the only page I’m interested in being indexed) could be accessed by: http://confectionarydesigns.com, http://www.confectionarydesigns.com, confectionarydesigns.com, at the whim of the person doing the browsing. Google should know this and not consider these “multiples” as duplicate content. It’s their own lazy selves that cause this problem, by not coding the search crawlers to take this into account. However, variations on the name (e.g. confectdesign.com, not a real domain) that resolve to the real domain ARE a canonicalization issue, and web designers should take care either not to have multiple names for a single domain, or to employ the canonical tag.
Thanks for the advice, and for keeping us up to date…
Vince G.
Matt Cutts from Google has said that “35% of all content on the web is duplicate content” so I think they understand this. But it’s good to get your ducks in a row anyway, because I’ve seen negative effects from canonicalization issues.