While setting the canonical tag, I found out that I am not getting all the juice out of it that I could...
GIVEN
Currently, ugly URLs like website.org/juice?ln=de are made nice via Apache, so they are reachable in a more user-friendly way, like website.org/de/juice. Now, in this multilingual website, I want consistency: all pages should have their language as a folder. I want search engines to remember and prefer /language/page as opposed to the ugly counterpart /page?ln=language.
Question 1: Am I on the right track so far in how I want to use canonical to communicate this to search engines?
CURRENTLY the code removes the unnecessary query string so that canonical URLs are short:
when URL = http://website.org/de/juice?ln=whatever
canonical URL = http://website.org/de/juice
So far so good, BUT it does not cover the old URLs still roaming the net and sitting in search engine caches, so the following situation goes wrong:
when URL = http://website.org/juice?ln=xyz (missing language folder)
then canonical becomes = http://website.org/juice (whereas it should be http://website.org/xyz/juice)
Question 2: What should I add to my code to improve/foolproof my canonical so that it recognises situations where no language folder is set?
<?php
$domain = $_SERVER['HTTP_HOST']; # domain, like website.org
$extensions = $_SERVER['REQUEST_URI']; # assumption: path plus query string, e.g. /de/juice?ln=xyz
$qsIndex = strpos($extensions, '?'); # position of the query string (?ln=xyz), if any
$pageclean = $qsIndex !== FALSE ? substr($extensions, 0, $qsIndex) : $extensions; # strip the query string
$canonical = "http://" . $domain . $pageclean;
?>
<html><head><link rel="canonical" href="<?=$canonical?>"></head>...
Note: languages can be things like {de, nl, es, it, en, la, ...} but also zh-CN, zh-TW, so whatever comes after ?ln=
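
To make the desired behaviour concrete, here is a minimal sketch of what the canonical logic might look like. It rests on several assumptions: the function name `buildCanonical` and its parameters are hypothetical, the request is passed in as path plus query string, and the language list only contains the codes named above (the real list is longer). Checking against an explicit list, rather than accepting whatever follows `?ln=`, keeps arbitrary values out of the canonical URL.

```php
<?php
// Hypothetical helper, not part of the original code: builds the canonical
// URL and prepends the language folder when the path does not start with one.
function buildCanonical(string $host, string $requestUri, array $languages): string
{
    // Split "/juice?ln=de" into the path "/juice" and the query "ln=de".
    [$path, $query] = array_pad(explode('?', $requestUri, 2), 2, '');
    parse_str($query, $params);

    // The first path segment is a language folder only if it is a known code.
    $firstSegment = explode('/', ltrim($path, '/'))[0];
    $hasFolder = in_array($firstSegment, $languages, true);

    // No folder present: fall back to ?ln=, but only for a known code.
    $ln = $params['ln'] ?? null;
    if (!$hasFolder && in_array($ln, $languages, true)) {
        $path = '/' . $ln . $path;
    }

    return 'http://' . $host . $path;
}

// Sample list only; the site's full set of languages is an assumption here.
$languages = ['de', 'nl', 'es', 'it', 'en', 'la', 'zh-CN', 'zh-TW'];

echo buildCanonical('website.org', '/juice?ln=de', $languages), "\n";          // http://website.org/de/juice
echo buildCanonical('website.org', '/de/juice?ln=whatever', $languages), "\n"; // http://website.org/de/juice
```

With this sketch an unknown value such as `?ln=xyz` is ignored and the canonical stays at `/juice`; whether such a request should instead fall back to a default language is a design choice left open here.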