doudun8705 2015-07-15 11:42

Parsing a URL that has no path but uses a slash in the query

I have problems parsing a URL that doesn't have a path but has a slash in the query. For example: http://example.com?q=a/b

I'm aware that such a URL is most likely invalid (*): it requires at least a slash as the path, like this: http://example.com/?q=a/b.

Every browser in which I tried such a URL corrects it automatically. That is basically what I want to reproduce: identify and correct such a URL.

Using parse_url, however, produces:

var_dump( parse_url('http://example.com?q=a/b') );

array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(15) "example.com?q=a"
  ["path"]=>
  string(2) "/b"
}

With a URL that has no slash in the query, it works fine:

var_dump( parse_url('http://example.com?q=ab') );

array(3) {
  ["scheme"]=>
  string(4) "http"
  ["host"]=>
  string(11) "example.com"
  ["query"]=>
  string(4) "q=ab"
}

All external libraries I tried (Jwage\Purl, League\Url, Sabre\Uri) basically do the same thing, which surprises me a bit.

Why do (all?) browsers get it "right", while (all?) PHP libraries get it "wrong"?

Other than trying to catch these cases with a regular expression before parsing the URL (which may be unreliable - that's why I want to use a library in the first place), what alternatives do I have?

(*) I consulted three sources (RFC 1738, RFC 3986 and the WHATWG URL Standard), and all three disagree on what is considered valid.
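
One regex-free-at-the-call-site alternative can be sketched from RFC 3986 itself: section 3.2 ends the authority at the first "/", "?" or "#", and Appendix B gives a regular expression that splits a URI along those lines. The sketch below uses that expression and adds "/" as the path when an authority is present but the path is empty. The function name normalize_empty_path is hypothetical, and this is an illustration under those assumptions, not what browsers or any of the libraries above actually implement.

```php
<?php
// Hypothetical helper (not part of PHP or any library named above):
// split a URI with the RFC 3986 Appendix B regex and use "/" as the
// path when an authority is present but the path is empty.
function normalize_empty_path(string $uri): string
{
    // Groups: 1 = "scheme:", 3 = "//authority", 5 = path,
    //         6 = "?query", 8 = "#fragment".
    // The "s" modifier lets "." match newlines so the whole input is consumed.
    $pattern = '~^(([^:/?#]+):)?(//([^/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?~s';
    if (!preg_match($pattern, $uri, $m)) {
        return $uri;
    }

    // preg_match drops trailing unmatched groups, hence the "?? ''" defaults.
    $schemePart    = $m[1] ?? '';
    $authorityPart = $m[3] ?? '';
    $path          = $m[5] ?? '';
    $queryPart     = $m[6] ?? '';
    $fragmentPart  = $m[8] ?? '';

    if ($authorityPart !== '' && $path === '') {
        $path = '/';
    }

    return $schemePart . $authorityPart . $path . $queryPart . $fragmentPart;
}

// The corrected URL can then be handed to parse_url as usual.
$parts = parse_url(normalize_empty_path('http://example.com?q=a/b'));
```

Note that this sketch also turns a bare http://example.com into http://example.com/, which may or may not be wanted.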


2 answers

  • dongpo8250 2015-07-15 12:13

    In case you still want to apply a regular expression, the following should generate the URL you are looking for:

    $url = preg_replace('~^([^/]+://[^/?#]+)\?~', '$1/?', $url);

    This requires the URL to start with a protocol name of at least one character followed by "://" and a domain name of at least one character ("localhost" would be acceptable too). It then inserts a '/' before the '?', but only if no other '/' comes before that '?'.
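
As a minimal sketch of how this replacement might be combined with parse_url, the regular expression can be wrapped in a small helper and applied before parsing. The name ensure_root_path is hypothetical, and the pattern uses "~" as the delimiter so that "/" needs no escaping; treat it as an illustration rather than a complete URL normalizer.

```php
<?php
// Hypothetical helper: insert "/" between the host and "?" when the
// URL has a scheme and host but no path.
function ensure_root_path(string $url): string
{
    // "~" is the pattern delimiter. The host part stops at the first
    // "/", "?" or "#", so only a "?" that directly follows the host matches.
    // preg_replace returns null on error; fall back to the input then.
    return preg_replace('~^([^:/?#]+://[^/?#]+)\?~', '$1/?', $url) ?? $url;
}

// With the slash in place, parse_url sees a host, a path and a query.
$parts = parse_url(ensure_root_path('http://example.com?q=a/b'));
// $parts['host'] is "example.com", $parts['path'] is "/",
// $parts['query'] is "q=a/b"
```

URLs that already have a path, such as http://example.com/?q=a/b, pass through unchanged.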
