Using canonical links in Shopify

A few Shopify shop owners, waiting for their sales to pick up, have elevated their concern with search engine optimization to an obsession. Meanwhile, the word is out that we can use the link element to tell search engines about duplicate content. The idea is that we don't want search engines to dilute the ranking of a product page across the many different URLs by which that page can be reached. The dilution gets worse when a product belongs to many collections: the same content we see at /products/my-lovely-boyfriend-is-for-sale will also be seen at /collections/SOME_COLLECTION/products/my-lovely-boyfriend-is-for-sale and /collections/SOME_OTHER_COLLECTION/products/my-lovely-boyfriend-is-for-sale. You get these last two URLs if you use the within filter in your collection and index templates. You can learn how to navigate within a collection by visiting Shopify's wiki.
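For illustration, here is a minimal sketch of how the within filter produces those collection-scoped URLs. The loop and markup are assumptions, not taken from any particular theme. Only the within filter itself comes from the text above.

```liquid
{% comment %} Hypothetical product loop in a collection template {% endcomment %}
{% for product in collection.products %}
  {% comment %} within turns /products/HANDLE into /collections/COLLECTION/products/HANDLE {% endcomment %}
  <a href="{{ product.url | within: collection }}">{{ product.title }}</a>
{% endfor %}
```

Without the filter, the same link would point straight at /products/HANDLE, which is the URL we want search engines to treat as the original.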

We add a canonical link to a web page when we don't want it indexed. We don't want it indexed because we know the same content can be found somewhere else, and we use the canonical link to tell the spider where to find the one 'true' page, that is, the page we do want indexed, cherished and beheld. For the sake of simplicity, the one true page for a Shopify product is the one sitting under /products/MY-PRODUCT. At this URL you'll find the collection-agnostic landing page for your product.

In Shopify, adding a canonical link for product pages is very simple. Open theme.liquid and locate your head element. Anywhere between the opening and closing tag of the head element, paste this code:

{% comment %} Product page reached through a collection: point spiders at /products/HANDLE {% endcomment %}
{% if template == 'product' and collection %}
<link rel="canonical" href="{{ shop.url }}{{ product.url }}" />
{% endif %}

The noindex instruction

As a bonus question: what if you want to ask a spider to not index a webpage, yet have nothing else to offer in its place?

Shopify keeps the legs of well-behaved spiders off some of the content of your website by supplying a nice instruction booklet under SHOPNAME.myshopify.com/robots.txt. The content of that file is:

# robots.txt file for www.shopify.com e-commerce engine
 
User-agent: *
Disallow: /admin
Disallow: /carts
Disallow: /orders
Sitemap: http://SHOPNAME.myshopify.com/sitemap.xml
 
User-agent: Nutch
Disallow: /

Shopify also tells the well-behaved spiders what to index in SHOPNAME.myshopify.com/sitemap.xml. You have nothing to do here. It's all taken care of for you. Also, you can't edit robots.txt, and you cannot easily provide an alternate sitemap. Not sure why you would want to do that, but it needed to be said.

What if there are pages that you do not want spiders to get their legs on? For example, what if you don't want product-type collections to get indexed?

Answer: Open theme.liquid and locate your head element. Anywhere between the opening and closing tag of the head element, paste this code:

{% if template == 'collection' %}{% unless collection.handle %}
<meta name="robots" content="noindex,follow" />
<!-- instructs search engines not to index this page, but to follow links from the page -->
{% endunless %}{% endif %}

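How does that work? The collections that live under /collections/types?q=PRODUCT TYPE have no collection handle. On those pages collection.handle is empty, so the meta tag gets written. Regular collections do have a handle, so they are left alone.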
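If you are wondering where those product-type URLs come from in the first place, a theme typically generates them with Shopify's link_to_type filter. The snippet below is only a sketch, assuming it sits in a product template where a product object is available.

```liquid
{% comment %} Hypothetical product template fragment {% endcomment %}
{% comment %} link_to_type renders a link to /collections/types?q=PRODUCT TYPE {% endcomment %}
Type: {{ product.type | link_to_type }}
```

Any page reached through such a link is a handle-less collection, which is exactly what the noindex snippet targets.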