dsegw3424 2010-11-22 15:11

PHP PCRE regex optimization

I'm quite new to regexes and I'm trying to optimize one, or at least find out whether there are better ways to write it.

Here is my input string:

$str = 'Some text
spanned on
several lines
txt_to_grab1 fixed_text1 txt_to_grab2
Full line to grab
txt_to_grab3 fixed_text2 txt_to_grab4
Some text after';

I'm trying to grab the lines from "txt_to_grab1" to "txt_to_grab4", but only the words "txt_to_grabX" and the line "Full line to grab".
I want to leave everything before and after untouched (including line breaks), but remove the line breaks inside the lines I grab, as each line will become a <tr> that goes into an HTML table.

Here are the pattern/replacement pairs I found that match:

$find = "#(?<=\n)(.*?) fixed_text1 (.*?)(\n.*?\n)(.*?) fixed_text2 (.*?)(\n)#i";
$replace = '"$1" && "$2" grabbed.$3"$4" && "$5" grabbed.$6';

$find = "#(.*)(?<=\n)(.*?) fixed_text1 (.*?)(\n)(.*)(?<=\n)(.*?) fixed_text2 (.*?)(\n.*)#is";
$replace = '$1"$2" && "$3" grabbed.$4$5"$6" && "$7" grabbed.$8';

Questions:

All of my questions can be summed up as: are there better, shorter, or faster patterns?

  • How can I make the patterns work with either \n or \r\n? I read somewhere on Stack Overflow that (\r?\n) would be a solution, but I don't know how to use it in lookbehinds. For example, the following patterns work, but I don't like them (they are dirty because only \n is used in the lookbehinds, which may produce unexpected results):

    "#(?<=\n)(.*?) fixed_text1 (.*?)(\r?\n.*?\r?\n)(.*?) fixed_text2 (.*?)(\r?\n)#i"
    "#(.*)(?<=\n)(.*?) fixed_text1 (.*?)(\r?\n)(.*)(?<=\n)(.*?) fixed_text2 (.*?)(\r?\n.*)#is";
    
  • Even better, how can I use the "s" modifier so that the pattern no longer needs explicit line breaks, letting me use (.*?) while still grabbing only what I want? With word boundaries?

  • Is multiline mode (the m modifier) useful here?

I'd really like the regexes to be explained, if you provide some :)

1 answer

  • doukong5394 2010-11-22 19:17

    You don't need lookbehinds for this. Just use the start-of-line anchor at the beginning of your regex and the end-of-line anchor at the end (that's ^ and $ in multiline mode). To match the line separators in the middle you can use (?:\r\n|[\r\n]), a common idiom for the three most common styles of line separator: \r\n, \r, or \n.
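
As a rough illustration of the two ideas above, the sketch below first splits a string on the line-separator idiom and then counts whole-line matches with and without the m modifier. The sample strings and variable names are made up for this example and do not come from the question.

```php
<?php
// Hypothetical sample mixing all three line-separator styles.
$mixed = "one\r\ntwo\rthree\nfour";

// "\r\n" is listed first so a Windows line break counts as one separator, not two.
$lines = preg_split('#\r\n|[\r\n]#', $mixed);
print_r($lines); // four elements: one, two, three, four

// Hypothetical two-line sample for the anchors.
$text = "alpha\nbeta";

// Without m, ^ and $ only anchor to the whole string, so nothing matches.
$countPlain = preg_match_all('#^\w+$#', $text, $plain);

// With m, ^ and $ also match at the start and end of every line.
$countMulti = preg_match_all('#^\w+$#m', $text, $multi);

var_dump($countPlain); // int(0)
var_dump($countMulti); // int(2)
```

Putting \r\n before the character class matters: with the alternatives reversed, a \r\n pair would be treated as two separate line breaks.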

    As for the s modifier (a.k.a. "single-line" or "DOT_ALL"), you don't need that either. All it does is allow the dot metacharacter to match line separators as well as all other characters, which doesn't do you any good. You want it to stop matching when it reaches line breaks, so you can exclude them from your captures.
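
To make the effect of the s modifier concrete, here is a small sketch that runs the same capture with and without it. The two-line sample string is an assumption chosen for illustration only.

```php
<?php
// Hypothetical two-line sample; not taken from the question.
$text = "first line\nsecond line";

// Without s, the dot stops at the line break, so only the first line is captured.
preg_match('#^(.*)#', $text, $without);

// With s, the dot also matches "\n", so the capture runs to the end of the string.
preg_match('#^(.*)#s', $text, $with);

var_dump($without[1] === "first line");            // bool(true)
var_dump($with[1] === "first line\nsecond line");  // bool(true)
```

The first result is the behavior you want here: each (.*) stays inside its own line, so the line breaks never end up in the captures by accident.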

    Here's a demo:

    $pattern='#^(.*?) fixed_text1 (.*)(?:\r\n|[\r\n])(.*)(?:\r\n|[\r\n])(.*?) fixed_text2 (.*)$#im';
    
    preg_match($pattern, $source, $m);
    
    echo "$m[1] && $m[2] grabbed.\n";
    echo "$m[3]\n";
    echo "$m[4] && $m[5] grabbed.\n";
    

    output:

    txt_to_grab1 && txt_to_grab2 grabbed.
    Full line to grab
    txt_to_grab3 && txt_to_grab4 grabbed.
    

    See it in action on ideone.com
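
Since the question asks for a replacement rather than a plain match, the same pattern can probably be adapted for preg_replace. The following is only a sketch under a few assumptions: the line separators are turned into capturing groups so they can be written back unchanged, the replacement string mirrors the quoting used in the question, and the input is the question's string rewritten with explicit "\n" escapes.

```php
<?php
// Input from the question, written with explicit "\n" escapes.
$str = "Some text\nspanned on\nseveral lines\n"
     . "txt_to_grab1 fixed_text1 txt_to_grab2\n"
     . "Full line to grab\n"
     . "txt_to_grab3 fixed_text2 txt_to_grab4\n"
     . "Some text after";

// Same structure as the demo pattern, but the line separators are captured
// (groups 3 and 5) so the replacement can put them back exactly as found.
$pattern = '#^(.*?) fixed_text1 (.*)(\r\n|[\r\n])(.*)(\r\n|[\r\n])(.*?) fixed_text2 (.*)$#im';
$replace = '"$1" && "$2" grabbed.$3$4$5"$6" && "$7" grabbed.';

// Text before and after the three matched lines is not part of the match,
// so preg_replace leaves it untouched.
$result = preg_replace($pattern, $replace, $str);
echo $result, "\n";
```

Because the pattern is anchored with ^ and $ in multiline mode, it does not need to capture the surrounding text at all, which avoids the leading (.*) and trailing (.*) groups used in the question's second pattern.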

    This answer was accepted by the asker as the best answer.
