I am writing PHP code that generates HTML containing links to documents via their DOI. The links should point to https://doi.org/
followed by the DOI of the document.
As the result is a URL, I thought I could simply use WordPress's esc_url()
function, as in
echo '<a href="' . esc_url('https://doi.org/' . $doi) . '">' . esc_url('https://doi.org/' . $doi) . '</a>';
as this is what one is supposed to use in text nodes, attribute nodes, or anywhere else. Unfortunately, things apparently aren't that easy...
The problem is that DOIs can contain all sorts of special characters that are apparently not handled correctly by esc_url()
. A nice example of such a DOI is
10.1002/(SICI)1521-3978(199806)46:4/5<493::AID-PROP493>3.0.CO;2-P
which is supposed to link to
https://doi.org/10.1002/(SICI)1521-3978(199806)46:4/5<493::AID-PROP493>3.0.CO;2-P
With $doi
equal to this DOI, however, the above code produces a link with the < and > stripped: its text and its target are both https://doi.org/10.1002/(SICI)1521-3978(199806)46:4/5493::AID-PROP4933.0.CO;2-P
.
This leads me to the question: If esc_url()
is evidently not a one-size-fits-all, no-brainer solution for escaping URLs, then what should I use? For this case I can get the result I want with
esc_url(htmlspecialchars('https://doi.org/' . $doi))
but is this really the right way™ of doing it? Does this have any other unwanted side effects? If not, then why does esc_url()
not also escape <
and >
? Would esc_html()
be better than htmlspecialchars()
? If so, should I nest it inside a call to esc_url()
?
I am aware that there are many articles on escaping URLs in PHP on Stack Overflow, but I couldn't find one that addresses the issue of <
and >
signs.
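
For reference, here is a minimal sketch of one alternative I could imagine: percent-encode the DOI before building the URL, then escape separately for each HTML context. It uses only plain PHP functions (rawurlencode(), str_replace(), htmlspecialchars()). The helper name doi_to_url() is made up for illustration, and I am assuming that the resolver accepts percent-encoded DOIs with the slashes left unencoded. In WordPress one would presumably pass the result through esc_url() for the href and esc_html() for the link text instead of htmlspecialchars(), but I have not verified that here.

```php
<?php
// Hypothetical helper (not part of PHP or WordPress): builds a doi.org URL
// with the DOI percent-encoded so that characters such as < and > survive.
function doi_to_url(string $doi): string
{
    // rawurlencode() percent-encodes everything except letters, digits
    // and -_.~ (RFC 3986), so "<" becomes %3C and ">" becomes %3E.
    $encoded = rawurlencode($doi);
    // Restore the slashes; assumption: the resolver expects them unencoded.
    $encoded = str_replace('%2F', '/', $encoded);
    return 'https://doi.org/' . $encoded;
}

$doi = '10.1002/(SICI)1521-3978(199806)46:4/5<493::AID-PROP493>3.0.CO;2-P';
$url = doi_to_url($doi);

// Escape separately for each context: the encoded URL goes into the
// attribute, the human-readable form goes into the text node.
$href = htmlspecialchars($url, ENT_QUOTES, 'UTF-8');
$text = htmlspecialchars('https://doi.org/' . $doi, ENT_QUOTES, 'UTF-8');

echo '<a href="' . $href . '">' . $text . '</a>' . "\n";
```

With this, the href no longer contains any raw < or > for a later escaping step to strip, while the visible link text still shows the DOI as written.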