douzuo5504 2014-05-28 09:38
浏览 29
已采纳

正则表达式匹配某些标记之外的所有新行字符

I need to match all new line characters outside of a particular html tag or pseudotag.

Here is an example. I want to match all " "s ouside of [code] [/code] tags (in order to replace them with <br> tags) in this text fragment:

These concepts are represented by simple Python classes.  
Edit the polls/models.py file so it looks like this: 

[code]  
from django.db import models

class Question(models.Model):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published') 
[/code]

I know that I should use negative lookaheads, but I'm struggling to figure the whole thing out.

Specifically, I need a PCRE expression, I will use it with PHP and perhaps Python.

  • 写回答

2条回答 默认 最新

  • dongmu1951 2014-05-28 09:51
    关注

    To me, this situation seems to be straight out of Match (or replace) a pattern except in situations s1, s2, s3 etc. Please visit that link for full discussion of the solution.

    I will give you answers for both PHP and Python (since the example mentioned django).

    PHP

    (?s)\[code\].*?\[/code\](*SKIP)(*F)|
    
    

    The left side of the alternation matches complete [code]...[/code] tags, then deliberately fails, and skips the part of the string that was just matched. The right side matches newlines, and we know they are the right newlines because they were not matched by the expression on the left.

    This PHP program shows how to use the regex (see the results at the bottom of the online demo):

    <?php
    $regex = '~(?s)\[code\].*?\[/code\](*SKIP)(*F)|
    ~';
    
    $subject = "These concepts are represented by simple Python classes.
    Edit the polls/models.py file so it looks like this:
    
    [code]
    from django.db import models
    
    class Question(models.Model):
    question_text = models.CharField(max_length=200)
    pub_date = models.DateTimeField('date published')
    [/code]";
    
    $replaced = preg_replace($regex,"<br />",$subject);
    echo $replaced."<br />
    ";
    ?>
    

    Python

    For Python, here's our simple regex:

    (?s)\[code\].*?\[/code\]|(
    )
    

    The left side of the alternation matches complete [code]...[/code] tags. We will ignore these matches. The right side matches and captures newlines to Group 1, and we know they are the right newlines because they were not matched by the expression on the left.

    This Python program shows how to use the regex (see the results at the bottom of the online demo):

    import re
    subject = """These concepts are represented by simple Python classes.  
    Edit the polls/models.py file so it looks like this: 
    
    [code]  
    from django.db import models
    
    class Question(models.Model):
        question_text = models.CharField(max_length=200)
        pub_date = models.DateTimeField('date published') 
    [/code]"""
    
    regex = re.compile(r'(?s)\[code\].*?\[/code\]|(
    )')
    def myreplacement(m):
        if m.group(1):
            return "<br />"
        else:
            return m.group(0)
    replaced = regex.sub(myreplacement, subject)
    print(replaced)
    
    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论
查看更多回答(1条)

报告相同问题?

悬赏问题

  • ¥15 企业资源规划ERP沙盘模拟
  • ¥15 前端echarts坐标轴问题
  • ¥15 CMFCPropertyPage
  • ¥15 ad5933的I2C
  • ¥15 请问RTX4060的笔记本电脑可以训练yolov5模型吗?
  • ¥15 数学建模求思路及代码
  • ¥50 silvaco GaN HEMT有栅极场板的击穿电压仿真问题
  • ¥15 谁会P4语言啊,我想请教一下
  • ¥15 这个怎么改成直流激励源给加热电阻提供5a电流呀
  • ¥50 求解vmware的网络模式问题 别拿AI回答