douzi8548 2017-07-14 17:17
Views: 64
Accepted

RegEx to find and remove event attributes, e.g. onclick, onload, onhover, etc. [duplicate]

I have been at this on and off for a few days, but my RegEx mastery is not great. Yes, I understand that RegEx is not meant for parsing HTML. I am doing server-side "cleaning" of CKEditor input; CKEditor already does this, but only client-side.

After stripping non-whitelisted tags...

First, remove all event attributes whose values are properly quoted with either ' or ":

    $html = preg_replace('/ on\w+=(["\'])[^\1]*?\1/', '', $html);

Second, remove the ones that have no quotes but can still fire, e.g. onclick=blowUpTheBase():

    $html = preg_replace('/ on\w+=\S+/', '', $html);
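To make the behaviour of the two passes concrete, here is a minimal sketch that wraps them in one function. The function name, the `i` and `s` flags, and the sample inputs are assumptions added for illustration. It also swaps two details: `.*?` replaces `[^\1]*?`, because a backreference is not interpreted inside a character class, and `[^\s>]+` replaces `\S+`, because `\S+` would also swallow a closing `>` that directly follows an unquoted value.

```php
<?php
// Hypothetical helper; the name and sample inputs are illustrative only.
function stripHandlersTwoPass(string $html): string
{
    // Pass 1: quoted values. (["\']) captures the opening quote and \1
    // requires the same quote to close the value.
    $html = preg_replace('/ on\w+=(["\']).*?\1/is', '', $html);
    // Pass 2: unquoted values. [^\s>]+ stops before whitespace or a closing >.
    $html = preg_replace('/ on\w+=[^\s>]+/i', '', $html);
    return $html;
}

echo stripHandlersTwoPass('<a href="x" onclick="bad()">t</a>'), "\n";
echo stripHandlersTwoPass('<a href="x" onclick=bad()>t</a>'), "\n";
// Neither pass checks that the match sits inside a tag, so plain text changes too.
echo stripHandlersTwoPass('john onelia=love forever'), "\n";
```

The last call shows that ordinary text containing something like ` onelia=love` is stripped as well, since nothing ties the match to a tag.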

What I would like to do is ensure the onEvent attribute sits between < and >, but I can only get it to work if the onEvent attribute is the first one after the tag name. Everything else I try ends up capturing most of the markup. I just can't get it lazy enough.

For example:

    $html = preg_replace('/<([\s\S]?)( on\w+=\S+) ([\s\S]*?)>/', '<$1 $3>', $html);

EDIT: I am going to accept @colburton's answer because RegEx is what I asked for. I will also use it in my particular situation because it will do the trick (it is an internal application anyhow).

BUT

I want to thank @Casimir et Hippolyte for his answer, because it gives a great example and explanation of how to do this the "right way". I will shortly write up a function using DOMDocument, and it will become my go-to way of handling RTE/WYSIWYG/HTML input.
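As a rough idea of what such a DOMDocument-based function could look like, here is a minimal sketch. It is not the code from the answer mentioned above. The function name and the wrapping `<div>` are assumptions, and character-encoding handling is deliberately left out.

```php
<?php
// Hypothetical helper; a sketch only, not the referenced answer's code.
function stripEventAttributes(string $html): string
{
    // Collect libxml warnings instead of emitting them for fragments.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    // Wrap the fragment in a <div> so its children can be serialized again.
    $doc->loadHTML('<div>' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    // Collect first, then remove, so nothing is mutated while iterating.
    $xpath = new DOMXPath($doc);
    $toRemove = [];
    foreach ($xpath->query('//@*') as $attr) {
        if (strncasecmp($attr->nodeName, 'on', 2) === 0) {
            $toRemove[] = $attr;
        }
    }
    foreach ($toRemove as $attr) {
        $attr->ownerElement->removeAttributeNode($attr);
    }

    // Serialize only the children of the wrapper, i.e. the original fragment.
    $wrapper = $doc->getElementsByTagName('div')->item(0);
    $out = '';
    foreach ($wrapper->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo stripEventAttributes('<a href="something" onclick=bad()>text</a> onclick not in tags'), "\n";
```

Because the parser decides what counts as an attribute, text such as "onclick not in tags" outside a tag is left alone, and the quoting style of the handler does not matter.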


1 answer

  • douhuzhi0907 2017-07-14 17:27

    Maybe I should have mentioned this from the start: this is not how you should filter XSS. The following is purely academic, within the constraints you proposed (i.e. "use RegEx").


    This gets you pretty close:

    // PHP has no "g" modifier; preg_replace() already replaces every match.
    $string = preg_replace('/(<.+?)(?<=\s)on[a-z]+\s*=\s*(?:([\'"])(?!\2).+?\2|(?:\S+?\(.*?\)(?=[\s>])))(.*?>)/i', "$1 $3", $string);
    

    Tested on

    <a href="something" onclick="bad()">text</a> onclick not in tags
    <a href="something" onclick=bad()>text</a>
    <a href="something" onclick="bad()" >text</a>
    <meta name="keywords" content="keyword1, keyword2, keyword3">
    
    <a href="something" onclick= "bad()">text</a> onclick not in tags
    <a href="something" onclick =bad()>text</a>
    <a href="something" onclick=bad('test')>text</a>
    <a href="something" onclick=bad("test")>text</a>
    <a href="something" onclick="bad()" >text</a>
    What if I write john+onelia=love forever?
    

    Play around here: https://regex101.com/r/GMBaQs/9
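One thing worth checking when using the pattern above: each match consumes a whole tag and removes a single handler from it, so a tag carrying two handlers keeps the second one after one pass. A minimal sketch of one way around this is to repeat the replacement until nothing changes. The constant and function names are assumptions for illustration.

```php
<?php
// The pattern from the answer, with the "i" modifier only.
const EVENT_ATTR_PATTERN = '/(<.+?)(?<=\s)on[a-z]+\s*=\s*(?:([\'"])(?!\2).+?\2|(?:\S+?\(.*?\)(?=[\s>])))(.*?>)/i';

// Hypothetical helper: re-run the replacement until no handler is left.
function removeEventAttributes(string $html): string
{
    do {
        $result = preg_replace(EVENT_ATTR_PATTERN, '$1 $3', $html, -1, $count);
        if ($result === null) {
            // preg_replace() returns null on a PCRE error (e.g. backtrack limit).
            throw new RuntimeException('PCRE error code ' . preg_last_error());
        }
        $html = $result;
    } while ($count > 0);
    return $html;
}

echo removeEventAttributes('<a href="something" onclick="bad()" onmouseover="worse()">text</a>'), "\n";
echo removeEventAttributes('What if I write john+onelia=love forever?'), "\n";
```

Every replacement shortens the string, so the loop terminates. The leftover whitespace inside the tag is harmless but could be collapsed in a separate step.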

    The asker selected this answer as the best answer.
