dqs13465424392 2015-06-14 14:29

Converting a Unicode URL to ASCII

I'm writing a PHP application that accepts a URL from the user and then processes it by making some calls to binaries with system()*. To avoid the many complications that arise from this, I'm trying to convert the URL, which may contain Unicode characters, into ASCII characters only.

Let's say I have the following URL:

https://täst.de:8118/news/zh-cn/新闻动态/2015/

Here two parts need to be dealt with: the hostname and the path.

  • For the hostname, I can simply call idn_to_ascii().
  • However, I can't simply call urlencode() on the path, because characters that must remain unmodified, such as the / separators, would also be encoded (e.g. news/zh-cn/新闻动态/2015/ -> news%2Fzh-cn%2F%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81%2F2015%2F instead of the desired news/zh-cn/%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81/2015/).
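
For illustration, here is a minimal sketch of the difference described above. It assumes the source file is saved as UTF-8; `encodePathSegments` is a hypothetical helper name, not a PHP built-in and not part of the original post. It splits the path on "/" and encodes each segment on its own with rawurlencode().

```php
<?php
// Hypothetical helper: encode each path segment separately so that the
// "/" separators between segments are left untouched.
function encodePathSegments(string $path): string
{
    $segments = explode('/', $path);
    return implode('/', array_map('rawurlencode', $segments));
}

$path = 'news/zh-cn/新闻动态/2015/';

// urlencode() on the whole path also escapes the slashes.
echo urlencode($path), "\n";
// news%2Fzh-cn%2F%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81%2F2015%2F

// Encoding segment by segment keeps the path structure intact.
echo encodePathSegments($path), "\n";
// news/zh-cn/%E6%96%B0%E9%97%BB%E5%8A%A8%E6%80%81/2015/
```

Note that this sketch would encode a path that is already percent-encoded a second time (% becomes %25), so it only suits input that has not been escaped yet.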

How should I approach this problem?


*I'd rather not deal with system() calls and the resulting complexity, but given that the functionality is only available by calling binaries, I unfortunately have no choice.

3 answers

  • dtrz99313 2015-06-16 14:51

    The following can be used for this transformation:

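    // Percent-encode every byte of $path that is not already URL-safe.
    // The loop works byte by byte, so each byte of a multi-byte UTF-8
    // character becomes its own %XX escape.
    function convertpath ($path) {
      $path1 = '';
      $len = strlen ($path);
      for ($i = 0; $i < $len; $i++) {
         // Keep unreserved characters, separators and existing %XX escapes.
         if (preg_match ('/^[A-Za-z0-9\/?=+%_.~-]$/', $path[$i])) {
           $path1 .= $path[$i];
         }
         else {
           // rawurlencode() turns a space into %20; urlencode() would emit
           // "+", which is a literal plus sign inside a path.
           $path1 .= rawurlencode ($path[$i]);
         }
      }
      return $path1;
    }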
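
To show how the path conversion could be combined with the hostname conversion from the question, here is a minimal sketch. The helper names `encodePathBytes` and `urlToAscii` are hypothetical. `encodePathBytes` is a compact variant of the loop above that uses the same allow-list. The sketch assumes the intl extension is loaded (for idn_to_ascii()), that parse_url() splits the UTF-8 URL as expected on the target system, and that user info, query string and fragment can be ignored.

```php
<?php
// Hypothetical helper: percent-encode every byte outside the allow-list.
// Without the /u modifier the pattern matches single bytes, so each byte
// of a multi-byte UTF-8 character becomes its own %XX escape.
function encodePathBytes(string $path): string
{
    return preg_replace_callback(
        '/[^A-Za-z0-9\/?=+%_.~-]/',
        function (array $m): string {
            return rawurlencode($m[0]);
        },
        $path
    );
}

// Hypothetical helper: rebuild a URL with an ASCII host and an ASCII path.
// Only scheme, host, port and path are carried over in this sketch.
function urlToAscii(string $url): string
{
    $parts = parse_url($url);
    if ($parts === false || !isset($parts['scheme'], $parts['host'])) {
        throw new InvalidArgumentException("Cannot parse URL: $url");
    }
    // Convert the internationalized hostname to its Punycode form.
    $host = idn_to_ascii($parts['host'], IDNA_DEFAULT, INTL_IDNA_VARIANT_UTS46);
    if ($host === false) {
        throw new InvalidArgumentException("Cannot convert host: {$parts['host']}");
    }
    $port = isset($parts['port']) ? ':' . $parts['port'] : '';
    $path = encodePathBytes($parts['path'] ?? '');
    return $parts['scheme'] . '://' . $host . $port . $path;
}

// Run the example only when the intl extension provides idn_to_ascii().
if (function_exists('idn_to_ascii')) {
    echo urlToAscii('https://täst.de:8118/news/zh-cn/新闻动态/2015/'), "\n";
}
```

Because the result is meant to be passed to system(), it would still need shell escaping, for example with escapeshellarg(), before being placed on a command line.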
    
    This answer was selected by the asker as the best answer.