dongzhi4470 2018-07-10 03:55
Views: 197

Regex: catastrophic backtracking when processing large strings

I need help optimizing my regex for processing the URL BBCode tag. The regex checks that a URL tag has a valid pattern and does NOT use a whitelisted protocol.

#(\[url=(?:&quot;|"|\'|)(((((?!https|http|ftp|mailto).)*):(//)?)([^\[\]]*))(?:&quot;|"|\'|)\])(.*)(\[/url\])#siU

The regex should ignore:

  • [url="www.example.com"]example[/url]
  • [url="https://example.com"]example[/url]
  • [url="http://example.com"]example[/url]
  • [url="ftp://example.com"]example[/url]
  • [url="mailto:mail@example.com"]example[/url]

And it should match:

  • [url="ymsgr://example.com"]example[/url]
  • [url="anyprotocol://example.com"]example[/url]

It runs well and has no issues until a user submits a string longer than 10,000 characters, which causes catastrophic backtracking.

Regex101 Reference Link

1 answer

  • dongxie3701 2018-07-10 06:53

    Here is a slightly optimized version:

    (?:\[url=(?:&quot;|"|\'|)(?:(?:(?:(?:(?!https?|ftp|mailto).)*):(?://)?)(?:(?!&quot;|"|&quote;).)++)(?:&quot;|"|\'|)\])(?:(?!\[/url\]).)++(?:\[/url\])
    

    The main optimizations here are:

    • changed most of the capture groups into non-capture groups (?:)
    • changed .* expressions into tempered greedy tokens/exclusions (?:(?!).)
    • added some possessive quantifiers ++
    • (switching from protocol blacklist to a whitelist would also help a lot)
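
As a rough check of the pattern above, the following PHP sketch runs it against the sample tags from the question. The `#` delimiters, the `si` flags, and writing the first quote alternative as `&quot;` are assumptions (the U flag from the question is left out here); `$samples` and `$results` are illustrative names, not part of the original post.

```php
<?php
// The optimized pattern from the answer, wrapped in `#` delimiters with the
// s (dot matches newline) and i (case-insensitive) flags. A nowdoc is used so
// that quotes and backslashes need no extra PHP escaping.
// Assumption: the first quote alternative is the HTML entity &quot;.
$pattern = <<<'REGEX'
#(?:\[url=(?:&quot;|"|'|)(?:(?:(?:(?:(?!https?|ftp|mailto).)*):(?://)?)(?:(?!&quot;|"|&quote;).)++)(?:&quot;|"|'|)\])(?:(?!\[/url\]).)++(?:\[/url\])#si
REGEX;

// Sample tags taken from the question.
$samples = [
    '[url="www.example.com"]example[/url]',
    '[url="https://example.com"]example[/url]',
    '[url="http://example.com"]example[/url]',
    '[url="ftp://example.com"]example[/url]',
    '[url="mailto:mail@example.com"]example[/url]',
    '[url="ymsgr://example.com"]example[/url]',
    '[url="anyprotocol://example.com"]example[/url]',
];

$results = [];
foreach ($samples as $sample) {
    // preg_match() returns 1 on a match, 0 on no match, false on an engine error.
    $results[$sample] = preg_match($pattern, $sample);
}

foreach ($results as $sample => $result) {
    echo ($result === 1 ? 'match   ' : 'ignore  '), $sample, PHP_EOL;
}
```

With these inputs, only the `ymsgr://` and `anyprotocol://` tags should be reported as matches. Each sample is tested on its own; a string containing several tags is a different case and is not covered by this check.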

    Demo

    If you are going to use this pattern often, the S (study) PHP regex flag might be worth mentioning. Judging from its description, it probably will not help here, but it may still be worth a try. I have not tested it.

    Sample Code


    Regarding your updated sample: it is probably best to do this in a two-step process. First, extract the URL tags with a much simpler regex, e.g.

    \[url=.*\[/url\]

    then, use your original regex or the one above to verify the input format.
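
The two-step idea could look roughly like this in PHP. This is a sketch under assumptions: the function name `findDisallowedUrlTags` is hypothetical, the lazy `.*?` stands in for the U flag so that each `[url=...]...[/url]` pair is extracted separately, and the verification pattern is the optimized one from this answer with `&quot;` assumed as the first quote alternative.

```php
<?php
/**
 * Hypothetical helper: returns every [url=...]...[/url] tag in $text whose
 * protocol is not one of http, https, ftp or mailto.
 */
function findDisallowedUrlTags(string $text): array
{
    // Step 1: a cheap pattern that only locates candidate tags.
    // Assumption: lazy `.*?` replaces the U (ungreedy) flag.
    $extract = '#\[url=.*?\[/url\]#si';

    // Step 2: the stricter pattern, applied to one short tag at a time.
    $verify = <<<'REGEX'
#(?:\[url=(?:&quot;|"|'|)(?:(?:(?:(?:(?!https?|ftp|mailto).)*):(?://)?)(?:(?!&quot;|"|&quote;).)++)(?:&quot;|"|'|)\])(?:(?!\[/url\]).)++(?:\[/url\])#si
REGEX;

    // preg_match_all() returns false if the engine gives up (for example
    // when the backtrack limit is hit), so that case is checked explicitly.
    if (preg_match_all($extract, $text, $found) === false) {
        throw new RuntimeException('Extraction failed, PCRE error code ' . preg_last_error());
    }

    $disallowed = [];
    foreach ($found[0] as $tag) {
        $result = preg_match($verify, $tag);
        if ($result === false) {
            throw new RuntimeException('Verification failed, PCRE error code ' . preg_last_error());
        }
        if ($result === 1) {
            $disallowed[] = $tag;
        }
    }

    return $disallowed;
}

// Usage: a long post containing one allowed and one disallowed tag.
$post = str_repeat('x', 20000)
    . '[url="https://a.example"]a[/url] '
    . str_repeat('lorem ', 2000)
    . '[url="ymsgr://b.example"]b[/url]';

print_r(findDisallowedUrlTags($post));
```

Because the expensive pattern only ever sees one short tag at a time, its backtracking is bounded by the length of that tag rather than by the length of the whole post. Checking for a `false` return value matters because PHP reports a backtrack-limit failure that way instead of raising an error.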
