dongtanliefang8765 2015-02-19 10:21

Is it possible to know the position of a match in the subject string?

I have a file name in which some information has to be replaced. Here is a sample subject:

FileA-2014-11-01_K_1_A2_383.xxx

As many files have to be processed, the filename is first matched against a regex, say:

/[a-zA-Z]*-\d{4}-\d{2}-\d{2}_(\w)_(\d)_A2_(\d*)\.xxx$/

Using preg_match, this regex gives me the values to be replaced, here:

  • K=>A
  • 1=>2
  • 383=>666

My first try was to naively use str_replace, but it fails when the values to replace also occur elsewhere in the string. Here I get:

FileA-2024-22-02_A_2_A2_666.xxx

So the date is also modified by str_replace (as it was told to do).
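The failure described above can be reproduced with a few lines. This is only a sketch: passing the search and replacement values to str_replace as arrays is an assumption about how the naive attempt was written.

```php
<?php
// Sketch of the naive attempt; the array arguments are an assumption.
$filename = 'FileA-2014-11-01_K_1_A2_383.xxx';

// str_replace() replaces every occurrence of each search value,
// wherever it appears, so the "1" digits of the date are hit as well.
$naive = str_replace(array('K', '1', '383'), array('A', '2', '666'), $filename);

echo $naive, "\n"; // FileA-2024-22-02_A_2_A2_666.xxx
```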

So I wonder whether there is a way to know where a given match is located in the string, in order to do a clean replacement. I'm now trying to invert the regex so that it captures the non-replaced blocks, and then insert the replacement data between them. That regex would be:

/([a-zA-Z]*-\d{4}-\d{2}-\d{2}_)\w(_)\d(_A2_)\d*(\.xxx)$/

With that one, I'm able to keep the non-replaced parts. I now have to find some kind of index to know the replacement positions in the string. I guess I can get there this way, but it seems somewhat complicated and error-prone. Given that I only have the initial regex and the from=>to replacement map, is there a better way to do that?

[EDIT: solution]

<?php

$filename = "FileA-2014-11-01_K_1_A2_383.xxx";
$expected = "FileA-2014-11-01_A_2_A2_666.xxx";

$regex = "/[a-zA-Z]*-\d{4}-\d{2}-\d{2}_(\w)_(\d)_A2_(\d*)\.xxx$/";


// Map of captured value => replacement value.
$replacements = array(
    "K"   => "A",
    "1"   => "2",
    "383" => "666",
);


$result = preg_replace_callback($regex, function($matches){
    global $replacements;
    print_r($matches);
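    // Abandoned here: called this way, $matches carries no offsets, so the
    // captures cannot be replaced in place. Nothing is returned and the check below fails.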
}, $filename);


if (strcmp($result, $expected) == 0)
    echo "preg_replace_callback() : Yep\n";
else
    echo "preg_replace_callback() : Nop\n";


preg_match($regex, $filename, $matches, PREG_OFFSET_CAPTURE);

// remove useless global string match
array_shift($matches);

$result = $filename;
// Walk the captures from right to left so that a replacement of a different
// length cannot shift the offsets of the captures still to be processed.
foreach (array_reverse($matches) as $matchInfo) {

    $match    = $matchInfo[0];
    $position = $matchInfo[1];

    $matchLength = strlen($match);

    $beforeReplacementPart = substr($result, 0, $position);
    $afterReplacementPart  = substr($result, $position + $matchLength);
    $result = $beforeReplacementPart . $replacements[$match] . $afterReplacementPart;

}


if (strcmp($result, $expected) == 0)
    echo "preg_match() and substr game : Yep\n";
else
    echo "preg_match() and substr game : Nop\n";

  • drurhg37071 2015-02-19 10:49

    A regex that matches that filename:

    $re  = '/[a-zA-Z]*-\d{4}-\d{2}-\d{2}_(\w)_(\d)_A2_(\d*)\.xxx$/';
    $str = 'FileA-2014-11-01_K_1_A2_383.xxx';
    

    If you pass PREG_OFFSET_CAPTURE as the fourth argument ($flags) of preg_match(), each entry of the third argument ($matches) also contains the offset, in bytes, of the captured string:

    preg_match($re, $str, $matches, PREG_OFFSET_CAPTURE);
    

    A print_r($matches) will reveal:

    Array
    (
        [0] => Array
            (
                [0] => FileA-2014-11-01_K_1_A2_383.xxx
                [1] => 0
            )
        [1] => Array
            (
                [0] => K
                [1] => 17
            )
        [2] => Array
            (
                [0] => 1
                [1] => 19
            )
        [3] => Array
            (
                [0] => 383
                [1] => 24
            )
    )
    

    $matches[0] is the part that matched the entire regex. $matches[1] is the first capturing sub-expression, $matches[2] is the second and so on.

    $matches[1][0] is the fragment from the input string that matched the first regex sub-expression (\w) and $matches[1][1] is the offset in the input string where it was found. The same for $matches[N][0] and $matches[N][1] for the Nth sub-expression.
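With the offsets available, each captured value can be replaced exactly where it was found. The following is a minimal sketch, assuming PHP 7 or later and the from=>to map from the question; replaceCaptures() is a hypothetical helper name, not a PHP function. Since the offsets are byte offsets, they pair naturally with strlen() and substr_replace().

```php
<?php
// Hypothetical helper (not part of PHP): replaces each captured group using a
// from => to map, locating the groups via PREG_OFFSET_CAPTURE.
function replaceCaptures(string $regex, string $subject, array $map): string
{
    if (!preg_match($regex, $subject, $matches, PREG_OFFSET_CAPTURE)) {
        return $subject; // no match: leave the subject untouched
    }

    array_shift($matches); // drop the whole-pattern match, keep the groups

    // Go from the last group to the first so that a replacement of a different
    // length cannot shift the offsets of the groups still to be processed.
    foreach (array_reverse($matches) as list($text, $offset)) {
        // Skip groups that did not take part in the match or have no map entry.
        if ($offset >= 0 && isset($map[$text])) {
            $subject = substr_replace($subject, $map[$text], $offset, strlen($text));
        }
    }

    return $subject;
}

$re  = '/[a-zA-Z]*-\d{4}-\d{2}-\d{2}_(\w)_(\d)_A2_(\d*)\.xxx$/';
$map = array('K' => 'A', '1' => '2', '383' => '666');

echo replaceCaptures($re, 'FileA-2014-11-01_K_1_A2_383.xxx', $map), "\n";
// FileA-2014-11-01_A_2_A2_666.xxx
```

Only the captured positions are touched, so the "1" digits inside the date stay as they are.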

    If you only need a simple replacement, you don't need to bother with offsets: use preg_replace() or, if the replacement expression is complex or dynamic, preg_replace_callback().

    Using preg_replace() you need to capture the parts you want to keep:

    $re  = '/([a-zA-Z]*-\d{4}-\d{2}-\d{2}_)\w_\d_A2_\d*(\.xxx)$/';
    $str = 'FileA-2014-11-01_K_1_A2_383.xxx';
    
    $new = preg_replace($re, '$1A_2_A2_666$2', $str);
    echo $new . "\n";
    

    In the replacement string, $1 and $2 denote the sub-expressions from the regex. We marked them for capturing in order to re-use them in the replacement string.
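For the dynamic case mentioned earlier, preg_replace_callback() can combine the same idea with a lookup table. This is a sketch under assumptions: the $map array mirrors the from=>to pairs of the question, and the regex is a variant of the one above in which the replaced values are captured too, so that the callback can look them up.

```php
<?php
// Assumed from => to map, taken from the question.
$map = array('K' => 'A', '1' => '2', '383' => '666');

// Every part of the name is captured: groups 1, 3, 5 and 7 are kept as they
// are, groups 2, 4 and 6 are the values to look up in the map.
$re  = '/([a-zA-Z]*-\d{4}-\d{2}-\d{2}_)(\w)(_)(\d)(_A2_)(\d*)(\.xxx)$/';
$str = 'FileA-2014-11-01_K_1_A2_383.xxx';

$new = preg_replace_callback($re, function ($m) use ($map) {
    // Fall back to the original value when the map has no entry for it.
    $swap = function ($value) use ($map) {
        return isset($map[$value]) ? $map[$value] : $value;
    };
    return $m[1] . $swap($m[2]) . $m[3] . $swap($m[4]) . $m[5] . $swap($m[6]) . $m[7];
}, $str);

echo $new, "\n"; // FileA-2014-11-01_A_2_A2_666.xxx
```

Passing the map with use ($map) avoids the global variable, and because the callback rebuilds the whole match from its groups, no offsets are needed.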

    This answer was selected by the asker as the best answer.