dsgw3315 2017-04-11 01:48
Accepted

Remove parts from links in a string

How can I change all links in a string from:

...<p><a href="https://www.somesite.com/url?q=http://www.someothersite.se/&amp;q1=xxx&q2=xxx">Some text</a>...

Into:

...<p><a href="http://www.someothersite.se/">Some text</a>...

"..." means that there is a lot of other code around it. The string contains multiple links like this, and all of them follow the same pattern.


2 answers

  • doujiu9307 2017-04-11 13:31

    Working solution:

    // Nowdoc (<<<'EOF') keeps the pattern literal: no escape or variable processing.
    $regex = <<<'EOF'
    %(<a\s[^>]*href=['"])([^"']+url\?q=([a-z]+:\/{2}[^"'&]+)[^"']*)(["'][^>]*>)%i
    EOF;
    
    // Keep the start of the tag, the target URL and the rest of the tag.
    $replacement = '$1$3$4';
    
    $html = <<<'EOF'
    ...<p><a href="https://www.somesite.com/url?q=http://www.secondsite.se/&amp;q1=xxx&q2=xxx">Some text</a>...
    ...<p><a class="lnk" href="https://www.somesite.com/url?q=http://www.thirdsite.se" id="lnk">Some text</a>...
    ...<p><a class="lnk2" href="https://www.somesite.com/">Some text</a>...
    EOF;
    
    // The first two links are rewritten; the third has no url?q= part and is left as it is.
    $new_html = preg_replace($regex, $replacement, $html);
    

    Regex explained:

    (                     - Group 1 (the <a tag from its start up to the opening quote of href)
      <a\s                - Match <a (or <A, because of the i modifier) followed by a whitespace character
      [^>]*               - Match anything except >, which skips over other attributes (class, id, etc.)
      href=['"]           - Match the href attribute, the equals sign and the opening ' or "
    )                     - End group 1
    (                     - Group 2 (the whole value of the href attribute)
        [^"']+            - Everything that is not ' or "
        url\?q=           - The literal text url?q=
        (                 - Group 3 (the URL we are really interested in)
            [a-z]+:\/{2}  - Match the scheme of the URL: http://, https://, ftp://, etc.
            [^"'&]+       - Match anything except ' " or &; these characters mark the end of the URL we are interested in
        )                 - End group 3
        [^"']*            - Anything except " or ', which consumes the rest of the href value
    )                     - End group 2
    (                     - Group 4 (the rest of the tag)
        ["']              - The " or ' that closes the href value
        [^>]*             - Anything except >, so we match the rest of the tag
        >                 - Finally, the closing >
    )                     - End group 4
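
    The breakdown above can also live inside the pattern itself. This is a minimal sketch using the x (free-spacing) modifier, where whitespace in the pattern is ignored and # starts a comment. It writes the scheme as [a-z] with the i modifier, because a literal [A-z] range also matches the characters [ \ ] ^ _ and the backtick. The sample $html line is taken from the test data above; the comments must not contain the % delimiter.

```php
<?php
// Nowdoc keeps the pattern literal, so quotes and backslashes need no extra escaping.
$regex = <<<'EOF'
%
(                       # group 1: the tag from <a up to the opening quote of href
    <a\s [^>]* href=['"]
)
(                       # group 2: the whole href value
    [^"']+ url\?q=
    (                   # group 3: the target URL, up to the next quote or &
        [a-z]+:\/{2} [^"'&]+
    )
    [^"']*              # whatever else is left in the href value
)
(                       # group 4: the closing quote and the rest of the tag
    ["'] [^>]* >
)
%ix
EOF;

$html = '<p><a class="lnk" href="https://www.somesite.com/url?q=http://www.thirdsite.se" id="lnk">Some text</a>';

// Keep groups 1, 3 and 4; group 2 (the redirect wrapper) is dropped.
$new_html = preg_replace($regex, '$1$3$4', $html);
echo $new_html, PHP_EOL;
```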
    

    Then we replace the whole match with groups 1, 3 and 4, which drops everything in the href value except the target URL.
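
    One limitation: the pattern only works when the target URL appears literally after url?q=. If the q value were percent-encoded (http%3A%2F%2F...), which is an assumption and not something shown in the question, group 3 would not match. A possible variant, sketched here for PHP 7+, captures the href value with preg_replace_callback and lets parse_url and parse_str read the q parameter. The function name unwrap_redirect_links and the check that the path ends in /url are hypothetical choices for illustration.

```php
<?php
// Hypothetical helper (not from the answer): rewrites .../url?q=<target> links to <target>.
function unwrap_redirect_links(string $html): string
{
    // Group 1: tag up to href=, group 2: the quote character, group 3: the href value.
    $regex = '%(<a\s[^>]*href=)(["\'])(.*?)\2%is';

    return preg_replace_callback($regex, function (array $m): string {
        // Attribute values are HTML-escaped, so turn &amp; back into & before parsing.
        $href = html_entity_decode($m[3], ENT_QUOTES);
        $path = (string) parse_url($href, PHP_URL_PATH);
        // parse_str splits the query string and percent-decodes each value.
        parse_str((string) parse_url($href, PHP_URL_QUERY), $params);

        $target = $params['q'] ?? null;
        $isRedirect = substr($path, -4) === '/url'
            && is_string($target)
            && preg_match('%^[a-z][a-z0-9+.-]*://%i', $target) === 1;

        // Leave every other link exactly as it was.
        if (!$isRedirect) {
            return $m[0];
        }
        // Re-escape the target before putting it back into the attribute.
        return $m[1] . $m[2] . htmlspecialchars($target, ENT_QUOTES) . $m[2];
    }, $html) ?? $html; // fall back to the input if the regex engine reports an error
}

echo unwrap_redirect_links(
    '<a href="https://www.somesite.com/url?q=http%3A%2F%2Fwww.someothersite.se%2F&amp;q1=xxx">Some text</a>'
), PHP_EOL;
```

    The trade-off is a few more lines in exchange for not having to describe the query string in the regex at all; links without a q parameter are returned unchanged.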
