douzi8548 2017-07-14 17:17

RegEx to find and remove event attributes, e.g. onclick, onload, onhover, etc. [duplicate]


I have been at this on and off for a few days, but my RegEx mastery is not great. Yes, I understand that RegEx is not meant for parsing HTML. I am doing server-side "cleaning" of CKEditor input; CKEditor already does this, but only on the client side.

After stripping non-whitelisted tags...

First: $html = preg_replace('/ on\w+=(["\']).*?\1/is', '', $html); removes all event attributes that are properly quoted with either ' or " quotes.

Second: $html = preg_replace('/ on\w+=[^\s>]+/i', '', $html); removes the ones that have no quotes but can still fire, e.g. onclick=blowUpTheBase()
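The two passes can be sketched as one function. This is only an illustration: the function name is hypothetical, the patterns are wrapped in "/" delimiters (assumed, since PCRE needs them), and the unquoted pass is assumed to stop at ">" so the closing bracket of the tag survives.

```php
<?php
// Hypothetical helper name; a sketch of the two passes described above.
function stripEventAttributes(string $html): string
{
    // Pass 1: handlers quoted with ' or ", e.g. onclick="bad()"
    $html = preg_replace('/ on\w+=(["\']).*?\1/is', '', $html);
    // Pass 2: unquoted handlers, e.g. onclick=bad(); stops before whitespace or ">"
    $html = preg_replace('/ on\w+=[^\s>]+/i', '', $html);
    return $html;
}

echo stripEventAttributes('<a href="x" onclick="bad()">text</a>'), "\n"; // <a href="x">text</a>
echo stripEventAttributes('<a href="x" onclick=bad()>text</a>'), "\n";   // <a href="x">text</a>
```

Neither pass checks that the match sits inside a tag, so plain text such as " onfoo=bar" would be removed as well.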

What I would like to do is ensure the onEvent attribute is between < and >, but I can only get it to work if the onEvent attribute is the first attribute in the tag. Everything I try ends up capturing most of the code. I just can't get it lazy enough.

e.g. $html = preg_replace('<([\s\S]?)( on\w+=\S+) ([\s\S]*?)>', '<$1 $3>', $html);

EDIT: I am going to accept @colburton's answer because RegEx is what I asked for. I will also use it in my particular situation because it will do the trick (it is an internal application anyway).

BUT

I want to thank @Casimir et Hippolyte for his answer, because it gives a great example and explanation of how to do this the "right way". I will shortly write a function using DOMDocument, and it will become my go-to way of handling RTE/WYSIWYG/HTML input.
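For readers wondering what such a DOMDocument function might look like, here is a minimal sketch. It is not the code from the answer mentioned above: the function name and the wrapper id are hypothetical, and it assumes that any attribute whose name starts with "on" should be treated as an event handler.

```php
<?php
// Hypothetical helper: strips every attribute whose name starts with "on".
function removeEventAttributes(string $html): string
{
    $previous = libxml_use_internal_errors(true); // silence warnings on sloppy markup
    $doc = new DOMDocument();
    // The XML declaration is a common hint that makes libxml read the input as UTF-8;
    // the wrapper div (hypothetical id) lets us read the fragment back out afterwards.
    $doc->loadHTML('<?xml encoding="UTF-8"><div id="frag-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Copy the matched attribute nodes to an array before mutating the tree.
    $attrs = iterator_to_array($xpath->query('//@*[starts-with(name(), "on")]'));
    foreach ($attrs as $attr) {
        $attr->ownerElement->removeAttributeNode($attr);
    }

    // Serialize only the children of the wrapper, not the html/body scaffolding.
    $root = $xpath->query('//div[@id="frag-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }
    return $out;
}

echo removeEventAttributes('<a href="something" onclick=bad()>text</a> onclick not in tags');
```

Because the parser decides what is an attribute, text outside tags that merely looks like a handler is left alone, which is the guarantee the regex attempts above struggle to give.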


1 answer

  • douhuzhi0907 2017-07-14 17:27

    Maybe I should have mentioned this from the start: this is not how you should filter for XSS. The following is purely academic, within the parameters you proposed (i.e. "use RegEx").


    This gets you pretty close:

    $string = preg_replace('/(<.+?)(?<=\s)on[a-z]+\s*=\s*(?:([\'"])(?!\2).+?\2|(?:\S+?\(.*?\)(?=[\s>])))(.*?>)/i', "$1 $3", $string);
    

    Tested on

    <a href="something" onclick="bad()">text</a> onclick not in tags
    <a href="something" onclick=bad()>text</a>
    <a href="something" onclick="bad()" >text</a>
    <meta name="keywords" content="keyword1, keyword2, keyword3">
    
    <a href="something" onclick= "bad()">text</a> onclick not in tags
    <a href="something" onclick =bad()>text</a>
    <a href="something" onclick=bad('test')>text</a>
    <a href="something" onclick=bad("test")>text</a>
    <a href="something" onclick="bad()" >text</a>
    What if I write john+onelia=love forever?
    

    Play around here: https://regex101.com/r/GMBaQs/9
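One way to try the pattern locally is sketched below. Two things here are assumptions rather than part of the answer: the helper name is hypothetical, and the loop is added because each match consumes a whole tag and removes only one handler from it, so a tag carrying several handlers seems to need more than one pass. The "g" flag is left out because preg_replace already replaces every match.

```php
<?php
$pattern = '/(<.+?)(?<=\s)on[a-z]+\s*=\s*(?:([\'"])(?!\2).+?\2|(?:\S+?\(.*?\)(?=[\s>])))(.*?>)/i';

// Hypothetical helper: re-applies the pattern until the string stops changing.
function stripHandlers(string $pattern, string $html): string
{
    do {
        $before = $html;
        $html = preg_replace($pattern, '$1 $3', $html);
    } while ($html !== null && $html !== $before);
    // preg_replace returns null on a regex error; fall back to the last good string.
    return $html ?? $before;
}

echo stripHandlers($pattern, '<a href="something" onclick="bad()">text</a> onclick not in tags'), "\n";
echo stripHandlers($pattern, '<a href="something" onclick="a()" onload="b()">text</a>'), "\n";
echo stripHandlers($pattern, 'What if I write john+onelia=love forever?'), "\n";
```

The handlers are replaced by a space, so some extra whitespace is left inside the cleaned tags.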

