How to Avoid Duplicate Content in osCommerce
If you’ve got an osCommerce store set up, you are probably fighting a duplicate content problem with the search engines. The problem is that osCommerce (OSC) can serve any one product page at several different URIs: one for each category that the product is in, plus one without any category information. The cPath variable in the query string is the culprit.
Go to your osCommerce store and look at the URIs that it builds for your product pages. If you click on one from the list of new products or one of the sidebar boxes you will see a URI like /product_info.php?products_id=123. If you browse through the categories to the same product you may see a URL like /product_info.php?products_id=123&cPath=1_2. Products in multiple categories are an even bigger problem since they may have different cPath variables passed. The problem is that to Google and most other search engines these are all distinct URIs, but all have the exact same content.
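To make the overlap concrete, here is a minimal sketch that reduces the URI variants described above to the cPath-free form. The helper name canonical_product_url is hypothetical (it is not part of osCommerce), and it assumes products_id is the only parameter that identifies a product.

```php
<?php
// Hypothetical helper, not part of osCommerce: reduce a product URI to the
// form that carries only products_id (assumed to be the identifying parameter).
function canonical_product_url(string $url): string
{
    $parts = parse_url($url) ?: [];
    $path  = $parts['path'] ?? '';

    $query = [];
    parse_str($parts['query'] ?? '', $query);

    // Drop cPath and anything else that does not identify the product.
    $kept = array_intersect_key($query, ['products_id' => true]);

    return $kept ? $path . '?' . http_build_query($kept) : $path;
}

$variants = [
    '/product_info.php?products_id=123',
    '/product_info.php?products_id=123&cPath=1_2',
    '/product_info.php?cPath=3_7&products_id=123',
];

// Every variant collapses to the same canonical URI.
$unique = array_unique(array_map('canonical_product_url', $variants));
echo implode("\n", $unique), "\n";
```

A search engine sees the three entries in $variants as three pages, while the store serves the same product for all of them.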
Here’s how I got around it. Since making the change, my search engine traffic, small as it is, has been growing.
I wanted the product_info.php file without any cPath variable to show up as the default. It’s what’s shown in my Sitemap, so that’s what Google is looking for anyway. Plus, the no-category version is what is shown for the recent additions and a few other pages that escape me right now.
What I did is edit the header_tags.php file from the aptly named Header Tags Generator 2.0 by WebMakers.com. At the bottom of the file, just above “echo ‘‘;” you’re going to add this code.
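if ($_SERVER['PHP_SELF'] == '/product_info.php' && isset($_GET['cPath']) && $_GET['cPath'] != '')
{
  // A cPath is present, so this is a duplicate view of the product page:
  // ask robots to follow its links but not to index it.
  echo '<meta name="robots" content="noindex,follow">' . "\n";
}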
This will add a robots meta tag telling the robot to follow any links, but not to index this page. Ideally, this keeps the cPath versions of a product page out of the search engine’s index, so they are no longer picked up as duplicate content.
Questions, Comments...
Do you have more questions? Please either leave a comment below or join us in our new forum.
Great post. There is another easy way of removing the cPath from the osCommerce source. http://forums.oscommerce.com/index.php?s=&showtopic=130080&view=findpost&p=523682
Try changing this in catalog/includes/modules/product_listing.php
CODE
case 'PRODUCT_LIST_NAME':
  $lc_align = '';
  if (isset($HTTP_GET_VARS['manufacturers_id'])) {
    $lc_text = '<a href="' . tep_href_link(FILENAME_PRODUCT_INFO, 'manufacturers_id=' . $HTTP_GET_VARS['manufacturers_id'] . '&products_id=' . $listing['products_id']) . '" rel="nofollow">' . $listing['products_name'] . '</a>';
  } else {
    $lc_text = ' <a href="' . tep_href_link(FILENAME_PRODUCT_INFO, ($cPath ? 'cPath=' . $cPath . '&' : '') . 'products_id=' . $listing['products_id']) . '" rel="nofollow">' . $listing['products_name'] . '</a> ';
  }
  break;
Remove cPath from this line.
CODE
$lc_text = ' <a href="' . tep_href_link(FILENAME_PRODUCT_INFO, 'products_id=' . $listing['products_id']) . '" rel="nofollow">' . $listing['products_name'] . '</a> ';
You will also need to make the same change to the PRODUCT_LIST_IMAGE case in the same file.
Good luck.
Thanks Myles. Just goes to prove that there’s always about 50 different ways to do something online and we’ve got plenty of options to find one that works for us.
And I noticed that WordPress thought what you entered was a bunch of links and parsed them so I went back and edited your comment a little to make them show right.
Myles,
Way to go! Just got rid of the cPath! Much neater now.
I also use robots.txt to avoid duplicate content and get Googlebot to focus on important pages:
Disallow: /*osCsid=*
# Disallow all URLs containing the session parameter. It is better to just set osCommerce to “force cookies” so those parameters don’t appear at all. But if you already have those URLs indexed, this will help clear them from Google’s index.
Disallow: /*action=*
# Those links go to the login page anyway. Nothing valuable content-wise.
Disallow: /eshop/*sort=*
# Another way to get rid of some duplicate content.
Disallow: /*manufacturers_id=*products_id=*
# Having sorted out the cPath, there are still the manufacturer links that can give you duplicate product pages. What I do is just disallow URLs with that pattern.
Disallow: /eshop/popup_image.php
# Nothing to index there either…
You are still left with the problem of duplicate content from language, currency, etc., though.
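Before deploying rules like these, it can help to check which URLs they would actually block. The following is a rough sketch, assuming Googlebot-style matching where a pattern is anchored at the start of the path, * matches any run of characters, and a trailing $ anchors the end. The function names and the sample URLs are made up for illustration.

```php
<?php
// Hypothetical helper: translate a Disallow pattern into a regular expression,
// assuming Googlebot-style wildcards ('*' = any characters, trailing '$' = end).
function robots_pattern_to_regex(string $pattern): string
{
    $anchored = substr($pattern, -1) === '$';
    if ($anchored) {
        $pattern = substr($pattern, 0, -1);
    }
    // Escape everything, then turn the escaped '*' back into a wildcard.
    $regex = str_replace('\*', '.*', preg_quote($pattern, '#'));

    return '#^' . $regex . ($anchored ? '$' : '') . '#';
}

// Returns true when any of the Disallow patterns matches the URL.
function is_disallowed(string $url, array $patterns): bool
{
    foreach ($patterns as $pattern) {
        if (preg_match(robots_pattern_to_regex($pattern), $url) === 1) {
            return true;
        }
    }
    return false;
}

// The Disallow patterns from the comment above.
$patterns = [
    '/*osCsid=*',
    '/*action=*',
    '/eshop/*sort=*',
    '/*manufacturers_id=*products_id=*',
    '/eshop/popup_image.php',
];

// Sample URLs (illustrative only).
var_dump(is_disallowed('/product_info.php?products_id=123&osCsid=abc123', $patterns));
var_dump(is_disallowed('/product_info.php?products_id=123', $patterns));
```

This only approximates how a crawler evaluates the file (there are no Allow rules or user-agent groups here), so treat it as a quick sanity check rather than a full robots.txt parser.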
On one of my relatively default osCommerce installations, Google has the site indexed at nearly 6000 pages when there are ~1000 products. This is despite disabling currency, language, reviews, etc. I make it about 5 or 6 ways of viewing one product even with these turned off :(
True. The site I had set up was only in English, so language and currency weren’t problems. Maybe you could do something similar though. How does osCommerce keep track of what currency or language the visitor is using? Is it part of the query string? Session variable maybe? If you can figure out the pattern then you’ll be able to just add to the if statement to match more cases.
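As a sketch of what “adding to the if statement” might look like, the check can be driven by a list of parameter names. Only cPath comes from the post; manufacturers_id, language and currency are assumptions about which query parameters your store exposes, so check your own URIs first. The function name robots_meta_tag is hypothetical.

```php
<?php
// Query parameters assumed to produce duplicate views of a product page.
// Only 'cPath' is taken from the post; the others are guesses to verify
// against the URIs your own store generates.
const DUPLICATE_PARAMS = ['cPath', 'manufacturers_id', 'language', 'currency'];

// Hypothetical helper: returns the robots meta tag for duplicate views of
// product_info.php, or an empty string for the plain product URI.
function robots_meta_tag(string $script, array $query): string
{
    if ($script !== '/product_info.php') {
        return '';
    }
    foreach (DUPLICATE_PARAMS as $name) {
        if (isset($query[$name]) && $query[$name] !== '') {
            return '<meta name="robots" content="noindex,follow">' . "\n";
        }
    }
    return '';
}

// In header_tags.php the call would look something like:
// echo robots_meta_tag($_SERVER['PHP_SELF'], $_GET);
```

This only covers values passed in the query string. If the language or currency is kept in the session instead, the URI does not change and there is nothing for this check to match.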