Canonicalization is a big word that you may or may not have heard before. It means picking one preferred (“canonical”) form for something that can be represented in multiple ways; for a Web site, that something is the URL of each page. Because your Web site probably has canonicalization issues (most do), you could be losing position on the search engines and not even realize it.

What is Canonicalization?

Google and the other search engines do their best to read and understand your Web site. They do this by following links, both internal links (within your site) and external links coming into your site. But Google is, after all, just a really complex computer program, and to it your Web site is a series of index entries. Depending on how it finds your pages, it may store the same content under different URLs in its index.

So for instance, if your Home page can be found at:

http://www.xyz.com/ and http://www.xyz.com/index.html

Then Google may actually see this as two separate and distinct pages, and put both into its index. That’s a problem, because now Google sees two site pages with exactly the same content, and you’ll get penalized for duplicate content.

Worse still, if your site also resolves at the domain without the “www”, there are two more, for a total of four possible URLs that all serve the same page:

http://xyz.com/ and http://xyz.com/index.html
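To make the idea concrete, here is a minimal Python sketch of what canonicalizing those four URLs means. It assumes, purely for illustration, that the “www” host is the preferred one and that index.html is the default document; the function and constant names are hypothetical.

```python
from urllib.parse import urlsplit, urlunsplit

# Assumptions for illustration only: "www.xyz.com" is the preferred host
# and "index.html" is the default document served for a directory.
PREFERRED_HOST = "www.xyz.com"
BARE_HOST = "xyz.com"
DEFAULT_DOCUMENT = "index.html"


def canonicalize(url: str) -> str:
    """Map any of the equivalent URL variants to one preferred form."""
    parts = urlsplit(url)
    host = parts.netloc.lower()
    if host == BARE_HOST:
        host = PREFERRED_HOST
    # An empty path and "/" refer to the same page.
    path = parts.path or "/"
    # Drop the default document name so ".../index.html" becomes ".../".
    if path.endswith("/" + DEFAULT_DOCUMENT):
        path = path[: -len(DEFAULT_DOCUMENT)]
    return urlunsplit((parts.scheme, host, path, parts.query, ""))


variants = [
    "http://www.xyz.com/",
    "http://www.xyz.com/index.html",
    "http://xyz.com/",
    "http://xyz.com/index.html",
]
unique = {canonicalize(u) for u in variants}
print(unique)  # all four variants collapse to a single URL
```

A search engine has to guess at this mapping unless you tell it which form you prefer, which is what the fixes described later are for.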

Aren’t More Domain Names Better?

Many times my clients will tell me they bought several domain names, thinking that would help them with search engine positioning. Instead, they’ve compounded the problem:

http://www.abc.com points to http://www.xyz.com (which already has four possible URLs for the same page, as shown above)

Now the new domain name has the same problem, and you’ve made things worse by adding four new URLs with the same duplicate content as in the first example. Google is going to look at all this as bad practice because you’ve got up to eight copies of the same page, and you’re going to be penalized for it.

How Do I Fix Canonicalization Issues?

Matt Cutts from Google wrote a very good blog entry a couple of years ago, SEO Advice: URL Canonicalization, in which he answers many of these types of questions. It takes some work to get this resolved, but it’s a process I go through with every client who wants search engine optimization (SEO) on their site.

More recently, the major search engines, including Google, Yahoo! and MSN, have come together on a new standard that helps us out even more. In his blog, Barry Schwartz describes how the search engines have agreed on a canonical tag to reduce duplicate content clutter. Basically, you want to include a new tag between the <head> and </head> tags of each page on your Web site that looks something like this:

<link rel="canonical" href="http://www.yourdomain.com/xyz.html" />

This tells the search engines that the URL in the “href” attribute is the one you want them to have in their index as the “real” page, and to ignore all other versions that they might see.
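If your pages are generated by a script rather than written by hand, the tag can be produced from the preferred URL. The following is a small Python sketch, not tied to any particular platform; the function name is hypothetical, and the only thing it adds beyond the tag shown above is HTML-escaping of the URL.

```python
from html import escape


def canonical_link_tag(preferred_url: str) -> str:
    """Build the <link rel="canonical"> tag for a page's preferred URL."""
    # Escape characters such as & and " so the attribute stays valid HTML.
    href = escape(preferred_url, quote=True)
    return f'<link rel="canonical" href="{href}" />'


print(canonical_link_tag("http://www.yourdomain.com/xyz.html"))
```

Every variant of a page should emit the same tag, pointing at the one URL you want indexed.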

This is fantastic news! It’s been a real pain in the keister to try to use 301-redirects, edit the .htaccess file, embed PHP code, and so on to get the search engines to just put one version of the page in their index. This really simplifies things for all of us, but you have to take the time to do it.
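For comparison, the redirect approach boils down to logic like the Python sketch below: look at the requested host and path, and if they are not the preferred form, answer with a 301 pointing at the preferred URL. The host names come from the examples earlier in this article; the alias list, the function name, and the choice of the “www” host as preferred are assumptions for illustration, and a real site would do this in its server configuration or application code.

```python
from typing import Optional

# Assumptions for illustration only.
PREFERRED_HOST = "www.xyz.com"
ALIAS_HOSTS = {"xyz.com", "abc.com", "www.abc.com"}
DEFAULT_DOCUMENT = "index.html"


def redirect_target(host: str, path: str) -> Optional[str]:
    """Return the URL to 301-redirect to, or None if no redirect is needed.

    Query strings are ignored here to keep the sketch short.
    """
    host = host.lower()
    clean_path = path or "/"
    # Treat ".../index.html" as the same page as ".../".
    if clean_path.endswith("/" + DEFAULT_DOCUMENT):
        clean_path = clean_path[: -len(DEFAULT_DOCUMENT)]
    if host == PREFERRED_HOST and clean_path == path:
        return None  # already the preferred form
    if host != PREFERRED_HOST and host not in ALIAS_HOSTS:
        return None  # not one of our domains: leave it alone
    return f"http://{PREFERRED_HOST}{clean_path}"


print(redirect_target("abc.com", "/index.html"))
```

The canonical tag saves you from wiring this up on the server, though it only helps with search engines that honor the tag.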

DotNetNuke Canonicalization

I’m a big fan of DotNetNuke as a Web platform. It makes it really easy for me to get a Web site launched and running within an hour or two. Unfortunately, the out-of-the-box platform creates two URLs for every page you generate. You can now set the URLs to be “human friendly” in the web.config file, but you still end up with the old page names being active. Furthermore, you can’t edit what’s in the <head> section, so you can’t add the canonical tag.

It appears that iFinity has created a DotNetNuke module that gives you an automatic way of generating the canonical tag for your DNN site. I’ve yet to try it out, but it looks like it’ll do the trick for a reasonable price.