dongma6326 2019-05-23 13:19

PHP regular expressions: replacing multiple patterns with a callback

I'm trying to run a simple replacement on some input data that could be described as follows:

  • take a regular expression
  • take an input data stream
  • on every match, replace the match through a callback

Unfortunately, preg_replace_callback() doesn't work as I'd expect. The callback receives the match for the entire line, not each capture group on its own. So I would need to put the line back together after replacing, but I don't have the information to do that. Case in point:

<?php
echo replace("/^\d+,(.*),(.*),.*$/", "12,LOWERME,ANDME,ButNotMe") . "\n";
echo replace("/^\d+-\d+-(.*) .* (.*)$/", "13-007-THISLOWER ThisNot THISAGAIN") . "\n";


function replace($pattern, $data) {
    return preg_replace_callback(
        $pattern, 
        function($match) {
            return strtolower($match[0]);
        }, $data
    );
}

https://www.tehplayground.com/hE1ZBuJNtFiHbdHO

gives me 12,lowerme,andme,butnotme, but I want 12,lowerme,andme,ButNotMe.

I know using $match[0] is wrong. It's just to illustrate here. Inside the closure I need to run something like

foreach ($match as $m) { /* do something */ }

But as I said, I have no information about the position of the matches in the input string which makes it impossible to put the string together again.

I've dug through the PHP documentation and run several searches, but couldn't find a solution.


Clarifications:

I know that $match[1], $match[2], etc. contain the captured groups, but only as strings, without their positions. Imagine that in my example the final field is also ANDME instead of ButNotMe: according to the regex it is not captured, so the callback should not be applied to it. That's why I'm using regexes in the first place instead of string replacements.

Also, the reason I'm using capture groups this way is that I need the replacement process to be configurable. So I cannot hardcode something like "replace #1 and #2 but not #3". On a different input file, the positions might be different, or there might be more replacements needed, and only the regex used should change.

So if my input is "15,LOWER,ME,NotThis,AND,ME,AGAIN", I want to be able to change only the regex, not the code, and still get the desired result. Basically, both $pattern and $data are variable.


  • duangua5308 2019-05-23 13:51

    This uses preg_match() with PREG_OFFSET_CAPTURE, which returns each capture group together with the offset in the original string where it was found. substr_replace() is then applied to each capture group, so only that part of the string is changed. Because the replacement is positional, identical text elsewhere in the string that you do not want changed is left alone...

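    function lowerParts (string $input, string $regex ) {
        // With PREG_OFFSET_CAPTURE every entry of $matches is [text, offset]
        preg_match($regex, $input, $matches, PREG_OFFSET_CAPTURE);
        array_shift($matches);  // drop the full match, keep the capture groups
        foreach ( $matches as $match )  {
            // strtolower() keeps the length, so the later offsets stay valid
            $input = substr_replace($input, strtolower($match[0]),
                $match[1], strlen($match[0]));
        }
        return $input;
    }
    echo lowerParts ("12,LOWERME,ANDME,ButNotMe", "/^\d+,(.*),(.*),.*$/");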
    

    gives...

    12,lowerme,andme,ButNotMe
    

    But also with

    echo lowerParts ("12,LOWERME,ANDME,LOWERME", "/^\d+,(.*),(.*),.*$/");
    

    it gives

    12,lowerme,andme,LOWERME
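
To make the offset pairs concrete, here is a minimal sketch (not part of the original answer) that dumps what preg_match() returns for the call above. The variable names are illustrative, and the list-style foreach assumes PHP 7.1 or later.

```php
<?php
// Inspect what preg_match() returns when PREG_OFFSET_CAPTURE is set.
$input = "12,LOWERME,ANDME,LOWERME";
preg_match('/^\d+,(.*),(.*),.*$/', $input, $matches, PREG_OFFSET_CAPTURE);

// Each entry is a pair: [matched text, byte offset into $input].
foreach ($matches as $i => [$text, $offset]) {
    echo "group $i: '$text' at offset $offset\n";
}
// group 0: '12,LOWERME,ANDME,LOWERME' at offset 0
// group 1: 'LOWERME' at offset 3
// group 2: 'ANDME' at offset 11
```

Group 2 ends at offset 15, so the trailing LOWERME, which starts at offset 17, is never touched even though its text is identical to group 1.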
    

    Edit:

    If the replacement can differ in length from the matched text, the string has to be cut into parts and each part replaced. The complication is that every change in length shifts the positions of the groups that follow, so the function keeps a running $offset to correct for that. This version also takes a parameter for the process you want to apply to each group (this example just passes "strtolower") ...

    function processParts (string $input, string $regex, callable $process ) {
        preg_match($regex, $input, $matches, PREG_OFFSET_CAPTURE);
        array_shift($matches);  // drop the full match, keep the capture groups
        $offset = 0;            // net length change caused by earlier replacements
        foreach ( $matches as $match )  {
            if ( $match[1] < 0 )  {
                continue;       // group did not take part in the match
            }
            $replacement = $process($match[0]);
            // Splice the replacement in at the group's shifted position
            $input = substr($input, 0, $match[1]+$offset)
                     .$replacement.
                     substr($input, $match[1]+$offset+strlen($match[0]));
            $offset += strlen($replacement) - strlen($match[0]);
        }
        return $input;
    }
    echo processParts ("12,LOWERME,ANDME,LOWERME", "/^\d+,.*,(.*),(.*)$/", "strtolower");
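
Since PHP 7.4, preg_replace_callback() also accepts a flags argument, and PREG_OFFSET_CAPTURE is one of the flags it takes, so the same [text, offset] pairs are available inside the callback. The sketch below is a variant under stated assumptions rather than the answer's own method: processAllMatches is a hypothetical name, the pattern is assumed to have no nested or named groups, and groups are replaced from right to left so no running offset is needed. Unlike the function above, it handles every match of an unanchored pattern.

```php
<?php
// Hypothetical helper (not from the answer above): apply $process to every
// capture group of every match. Needs PHP 7.4+ for the flags argument.
// Assumes the pattern has no nested or named capture groups.
function processAllMatches(string $input, string $regex, callable $process)
{
    return preg_replace_callback(
        $regex,
        function (array $m) use ($process) {
            // Full match text and its offset in the subject string.
            [$whole, $start] = array_shift($m);
            // Skip groups that did not take part in the match (offset -1).
            $groups = array_filter($m, fn($g) => $g[1] >= 0);
            // Rightmost group first, so the remaining offsets stay valid.
            usort($groups, fn($a, $b) => $b[1] <=> $a[1]);
            foreach ($groups as [$text, $offset]) {
                // Offsets refer to the subject, so shift them into $whole.
                $whole = substr_replace($whole, $process($text), $offset - $start, strlen($text));
            }
            return $whole;
        },
        $input,
        -1,
        $count,
        PREG_OFFSET_CAPTURE
    );
}

echo processAllMatches("15,LOWER,ME,NotThis,AND,ME,AGAIN", '/,([A-Z]+)\b/', "strtolower"), "\n";
// 15,lower,me,NotThis,and,me,again
```

Only the regex changes between inputs. The anchored pattern from the question works with the same function, and a callback that changes the length (for example one that wraps each group in brackets) needs no extra bookkeeping because later positions are rewritten first.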
    
    This answer was accepted by the asker as the best answer.