dotn30471 2017-08-17 10:43
浏览 87

在php中转义DOI链接 - 当esc_url()不够时

I am writing php code that generates html that contains links to documents via their DOI. The links should point to https://doi.org/ followed by the DOI of the document.

As the results is a url, I thought I could simply use php's esc_url() function like in

echo '<a href="' . esc_url('https://doi.org/' . $doi)) . '">' . esc_url('https://doi.org/' . $doi)) . '</a>';

as this is what one is supposed to use in text nodes, attribute nodes or anywhere else. Unfortunately things apparenty aren't that easy...

The problem is that DOIs can contain all sorts of special characters that are apparently not handled correctly by esc_url(). A nice example of such a DOI is

10.1002/(SICI)1521-3978(199806)46:4/5<493::AID-PROP493>3.0.CO;2-P

which is supposed to link to

https://doi.org/10.1002/(SICI)1521-3978(199806)46:4/5<493::AID-PROP493>3.0.CO;2-P

With $doi equal to this DOI the above code however produces a link that is displayed and links to https://doi.org/10.1002/​(SICI)1521-3978(199806)46:4/​5493::AID-PROP4933.0.CO;2-P.

This leads me to the question: If esc_url() is obviously not one-size-fits-all no-brained solution to escaping urls, then what should I use? For this case I can get the result I want with

esc_url(htmlspecialchars('https://doi.org/' . $doi))

but is this really the right way™ of doing it? Does this have any other unwanted side effects? If not, then why does esc_url() not also escape < and >? Would esc_html() be better than htmlspecialchars()? If so, should I nest it into a esc_url()?

I am aware that there are many articles on escaping urls in php on stackoverflow, but I couldn't find one that addresses the issues of < and > signs.

  • 写回答

1条回答 默认 最新

  • dssqq82402 2017-08-17 11:23
    关注

    I'm no PHP expert, but I do know about DOIs and SICIs can be really annoying.

    URL-encoding and HTML encoding are separate things, so it makes sense to think about them separately. You must escape the angle-brackets to make correct HTML. As for the URL-escaping, you should also do this because there are other characters that might break URLs (such as the # character, which also pops up from time to time).

    So I would recommend:

    'https://doi.org/' . htmlspecialcharacters(urlencode($doi))
    

    Which will give you:

    <a href="https://doi.org/10.1002%2F%28SICI%291521-3978%28199806%2946%3A4%2F5%3C493%3A%3AAID-PROP493%3E3.0.CO%3B2-P">Click here</a>
    

    Note the order of function application, and the fact that you don't want to encode the https://doi.org resolver!

    To the above "dipshit decision" comment... it's certainly inconvenient. But SICIs were around before DOIs and it's one of those annoying things we've had to live with ever since!

    评论

报告相同问题?

悬赏问题

  • ¥15 矩阵加法的规则是两个矩阵中对应位置的数的绝对值进行加和
  • ¥15 活动选择题。最多可以参加几个项目?
  • ¥15 飞机曲面部件如机翼,壁板等具体的孔位模型
  • ¥15 vs2019中数据导出问题
  • ¥20 云服务Linux系统TCP-MSS值修改?
  • ¥20 关于#单片机#的问题:项目:使用模拟iic与ov2640通讯环境:F407问题:读取的ID号总是0xff,自己调了调发现在读从机数据时,SDA线上并未有信号变化(语言-c语言)
  • ¥20 怎么在stm32门禁成品上增加查询记录功能
  • ¥15 Source insight编写代码后使用CCS5.2版本import之后,代码跳到注释行里面
  • ¥50 NT4.0系统 STOP:0X0000007B
  • ¥15 想问一下stata17中这段代码哪里有问题呀