douaoren4402 2013-05-07 05:01

PHP regular expression URL parsing problem: preg_replace

I have a custom markup parsing function that has worked very well for many years. I recently discovered a bug I hadn't noticed before, and I haven't been able to fix it. If anyone can help me with this, that'd be awesome. I have a custom-built forum and text-based MMORPG where every input is sanitized and parsed for BBCode-like markup. The parser also picks out URLs and turns them into real links that go to an exit page with a disclaimer that you're leaving the site. The issue is that when a user posts multiple URLs in a text box (say, one per line), only every other URL is converted into a link. Here's the parser for URLs:

$markup = preg_replace("/(^|[^=\"\/])\b((\w+:\/\/|www\.)[^\s<]+)" . "((\W+|\b)([\s<]|$))/ei", '"$1<a href=\"out.php?".shortURL("$2")."\" target=\"_blank\">".shortURL("$2")."</a>$4"', $markup);

As you can see, it calls a PHP function (shortURL()), but that's not the issue here. The entire text block is passed to this preg_replace in a single call, rather than line by line or in any other way.

  1. If there's a simpler way of writing this preg_replace, please let me know.
  2. My ultimate goal is to figure out why it only parses every other URL.

Example INPUT:

http://skylnk.co/tRRTnb
http://skylnk.co/hkIJBT
http://skylnk.co/vUMGQo 
http://skylnk.co/USOLfW 
http://skylnk.co/BPlaJl 
http://skylnk.co/tqcPbL
http://skylnk.co/jJTjRs
http://skylnk.co/itmhJs
http://skylnk.co/llUBAR
http://skylnk.co/XDJZxD

Example OUTPUT:

<a href="out.php?http://skylnk.co/tRRTnb" target="_blank">http://skylnk.co/tRRTnb</a>
<br>http://skylnk.co/hkIJBT
<br><a href="out.php?http://skylnk.co/vUMGQo" target="_blank">http://skylnk.co/vUMGQo</a> 
<br>http://skylnk.co/USOLfW 
<br><a href="out.php?http://skylnk.co/BPlaJl" target="_blank">http://skylnk.co/BPlaJl</a> 
<br>http://skylnk.co/tqcPbL
<br><a href="out.php?http://skylnk.co/jJTjRs" target="_blank">http://skylnk.co/jJTjRs</a>
<br>http://skylnk.co/itmhJs
<br><a href="out.php?http://skylnk.co/llUBAR" target="_blank">http://skylnk.co/llUBAR</a>
<br>http://skylnk.co/XDJZxD
<br>

  • dongnue4923 2013-05-08 04:22

    The e flag in preg_replace is deprecated (as of PHP 5.5, and removed entirely in PHP 7.0). You can use preg_replace_callback to get the same functionality.

    The i flag is mostly unnecessary here, since \w already matches both upper- and lower-case letters. The one exception is the literal www\.: without i, an upper-case "WWW." prefix is no longer matched.
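    To check what dropping i actually changes, this sketch runs the answer's pattern on an upper-case "WWW." prefix. The sample text is made up. The only literal letters in the pattern are the www in www\., so that is the one place the flag makes a difference; \w, \W, \s and the negated classes behave the same either way.

```php
<?php
// The answer's pattern (m flag, no i flag). Its only literal letters are in www\.
$pattern = '/(^|[^="\/])\b((?:\w+:\/\/|www\.)[^\s<]+)((?:\W+|\b)(?:[\s<]|$))/m';
$text    = "WWW.example.com\n";

var_dump(preg_match($pattern, $text));       // int(0): www\. is case-sensitive without i
var_dump(preg_match($pattern . 'i', $text)); // int(1)
```

    An "HTTP://" prefix is unaffected, because the scheme is matched by \w+ rather than a literal.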

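    I set the m flag, which makes ^ and $ match at the beginning and end of each line rather than only at the beginning and end of the entire string. This should fix the every-other-line problem. The trailing ([\s<]|$) part of the pattern consumes the newline after each URL, so the next URL has no unconsumed character left for [^=\"\/] to match, and without m, ^ only matches at the very start of the string. With m, ^ also matches right after that consumed newline.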
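    A quick way to see the every-other-line behavior is to run the answer's pattern over newline-separated URLs with and without m. This is a sketch: shortURL() from the question is left out (its definition isn't shown), the helper name linkAll() is hypothetical, and the sample URLs are made up.

```php
<?php
// Hypothetical helper: the answer's callback without shortURL().
function linkAll(string $pattern, string $text): string
{
    return preg_replace_callback($pattern, function (array $m): string {
        // $m[1] = character before the URL, $m[2] = the URL, $m[3] = trailing delimiter
        return $m[1] . '<a href="out.php?' . $m[2] . '">' . $m[2] . '</a>' . $m[3];
    }, $text);
}

$base  = '/(^|[^="\/])\b((?:\w+:\/\/|www\.)[^\s<]+)((?:\W+|\b)(?:[\s<]|$))/';
$input = "http://a.co/1\nhttp://a.co/2\nhttp://a.co/3\nhttp://a.co/4";

$withoutM = linkAll($base, $input);       // ^ only matches at the start of the string
$withM    = linkAll($base . 'm', $input); // ^ also matches after every newline

echo substr_count($withoutM, '<a '), "\n"; // 2 (URLs 1 and 3)
echo substr_count($withM, '<a '), "\n";    // 4
```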

    I also made some of the groups non-capturing, (?:pattern), since the larger capturing groups around them already capture that text.
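    Making the inner groups non-capturing also renumbers the groups, which is why the callback uses $m[3] where the question's replacement used $4. This sketch compares the two patterns on one of the sample URLs (the question's pattern is shown without /e, which PHP 7+ rejects).

```php
<?php
// The question's pattern (minus /e) versus the answer's pattern.
$original = '/(^|[^="\/])\b((\w+:\/\/|www\.)[^\s<]+)((\W+|\b)([\s<]|$))/i';
$answer   = '/(^|[^="\/])\b((?:\w+:\/\/|www\.)[^\s<]+)((?:\W+|\b)(?:[\s<]|$))/m';
$text     = "http://skylnk.co/tRRTnb\n";

preg_match($original, $text, $a);
preg_match($answer, $text, $b);

// Index 0 is the whole match, so subtract one to get the number of groups.
echo count($a) - 1, "\n"; // 6: the trailing delimiter was $4
echo count($b) - 1, "\n"; // 3: the trailing delimiter is $m[3]
echo $b[2], "\n";         // http://skylnk.co/tRRTnb
```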

    The code below is not tested; I only tested the regex in a regex tester.

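    // Same pattern with the m flag; the /e replacement becomes a callback
    $markup = preg_replace_callback(
        "/(^|[^=\"\/])\b((?:\w+:\/\/|www\.)[^\s<]+)((?:\W+|\b)(?:[\s<]|$))/m",
        function ($m) {
            // $m[1] = char before the URL, $m[2] = the URL, $m[3] = trailing delimiter
            return "$m[1]<a href=\"out.php?".shortURL($m[2])."\" target=\"_blank\">".shortURL($m[2])."</a>$m[3]";
        },
        $markup
    );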
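    If URLs can also be separated by spaces on the same line, the m flag alone is not enough: ^ only helps after a newline, so in "http://a.co/1 http://a.co/2" the second URL is still skipped. One alternative, offered here as an untested suggestion rather than part of the accepted answer, is to turn the leading and trailing groups into lookarounds so nothing outside the URL is consumed. It assumes PHP 7 or later, keeps the question's i flag, uses a hypothetical helper name linkUrls(), and leaves out shortURL() because its definition isn't shown.

```php
<?php
// Hypothetical rewrite: (^|[^="\/]) becomes the negative lookbehind (?<![="\/]) and the
// trailing group becomes a lookahead, so neither the character before a URL nor the
// delimiter after it is consumed by the match.
function linkUrls(string $text): string
{
    $pattern = '/(?<![="\/])\b((?:\w+:\/\/|www\.)[^\s<]+)(?=(?:\W+|\b)(?:[\s<]|$))/i';

    return preg_replace_callback($pattern, function (array $m): string {
        // The question passes the URL through shortURL(); omitted here.
        return '<a href="out.php?' . $m[1] . '" target="_blank">' . $m[1] . '</a>';
    }, $text);
}

// Space- and newline-separated URLs, plus trailing punctuation after the last one.
echo linkUrls("http://a.co/1 http://a.co/2\nhttp://a.co/3\nsee http://a.co/4."), "\n";
```

    Because the lookahead can backtrack into [^\s<]+ just as the original consuming group did, trailing punctuation such as the final "." still ends up outside the link, and a URL inside href="..." is still left alone.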
    
    The asker selected this answer as the best answer.