dr9379 2013-08-26 15:50

Regex with negative lookbehind in PHP

I'm doing some SEO work on the product descriptions of a huge catalog using preg_replace_callback and am having some difficulties with the regex.

I'd like to replace every occurrence of certain words (hat, shirt) except those that follow "men's" with 0-2 words in between, e.g. the "hat" in "men's pretty black hat" and the "shirt" in "men's long shirt" shouldn't be replaced.

Here is some debug code; in the real application I use a callback to pick the proper replacement for each word:

$str = "men's black hat, and orange shirt!";
// $_matches is already taken by reference; call-time "&" is a fatal error since PHP 5.4
preg_match_all('/((\s|\.\s|,\s|\!\s|\?\s)(hat|shirt)(\s|\.|\.\s|,\s|\!|\!\s|\?|\?\s))/i', $str, $_matches);
print_r($_matches);

Thanks


  • doutuichan2681 2013-08-26 16:10

    Lookbehind must be of fixed length, and "men's" followed by 0-2 arbitrary words is not, so this way of attacking the problem won't work.

    IMHO you are trying to make preg_replace_callback do way too much. If you want to perform manipulation that is complex beyond a certain level, it's reasonable to forfeit the convenience of a single function call. Here's another way you can attack the problem:

    1. Use preg_split to split the text into words along with the flag PREG_SPLIT_OFFSET_CAPTURE so that you know where each word appears in the original text.
    2. Iterate over the array of words. It's now very easy to do a "negative lookbehind" on the array and see if a hat or shirt is preceded by any one of the other terms that interest you.
    3. Whenever you find a positive match for hat or shirt, use the offset from preg_split and the (known) length of the positive match to power substr_replace on the original text input.
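
    A minimal sketch of step 1 on its own, to show what preg_split hands back when PREG_SPLIT_OFFSET_CAPTURE is set. The sample string and the split pattern are assumptions: the pattern breaks on anything that is not a word character or an apostrophe, so that "men's" stays a single token.

```php
<?php
// Hypothetical sample input.
$str = "men's black hat";

// Assumed tokenizing rule: split on runs of characters that are neither
// word characters nor apostrophes. PREG_SPLIT_NO_EMPTY drops empty pieces;
// PREG_SPLIT_OFFSET_CAPTURE turns each piece into array(word, byte offset).
$words = preg_split('/[^\w\']+/', $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE);

foreach ($words as $pair) {
    list($word, $offset) = $pair;
    echo $offset, ': ', $word, PHP_EOL;
}
// Prints:
// 0: men's
// 6: black
// 12: hat
```

    The offsets refer to the original string, which is why step 3 can use them directly with substr_replace.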

    For example:

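    $str = "men's black hat, and orange shirt!";
    $targets = array('hat', 'shirt');
    $shield = 'men\'s';
    $bias = 0; // net length change caused by the replacements made so far

    // Step 1: split into words, keeping each word's offset in $str.
    // The split pattern is an assumption: break on anything that is
    // not a word character or an apostrophe.
    $words = preg_split('/[^\w\']+/', $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE);

    for ($i = 0; $i < count($words); ++$i) {
        list ($word, $offset) = $words[$i];

        if (!in_array($word, $targets)) {
            continue;
        }

        // Step 2: skip this word if the shield is one of the previous
        // three words, i.e. there are 0-2 words in between.
        for ($j = max($i - 3, 0); $j < $i; ++$j) {
            if ($words[$j][0] === $shield) {
                continue 2;
            }
        }

        // Step 3: splice in the replacement, shifting the offset by $bias.
        $replacement = 'FOO';
        $str = substr_replace($str, $replacement, $offset + $bias, strlen($word));
        $bias += strlen($replacement) - strlen($word);
    }

    echo $str;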
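
    Since the real application picks a replacement per word through a callback, the same loop could be wrapped in a function that accepts one. This is only one possible shape: the function name replaceUnshielded, the $maxGap and $pick parameters, the tokenizing pattern and the case-insensitive comparison are all assumptions made for illustration.

```php
<?php
/**
 * Replace target words unless the shield word appears at most $maxGap
 * words before them. Hypothetical helper; $targets and $shield are
 * expected in lowercase.
 */
function replaceUnshielded($str, array $targets, $shield, $maxGap, $pick)
{
    // Assumed tokenizing rule: split on anything that is not a word
    // character or an apostrophe, keeping each word's offset.
    $words = preg_split('/[^\w\']+/', $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE);
    $bias = 0; // net length change caused by earlier replacements

    foreach ($words as $i => $pair) {
        list($word, $offset) = $pair;

        if (!in_array(strtolower($word), $targets, true)) {
            continue;
        }

        // Look back over the previous $maxGap + 1 words for the shield.
        for ($j = max($i - $maxGap - 1, 0); $j < $i; ++$j) {
            if (strtolower($words[$j][0]) === $shield) {
                continue 2; // shielded: leave this word untouched
            }
        }

        $replacement = call_user_func($pick, $word);
        $str = substr_replace($str, $replacement, $offset + $bias, strlen($word));
        $bias += strlen($replacement) - strlen($word);
    }

    return $str;
}

echo replaceUnshielded(
    "men's pretty black hat, and orange shirt!",
    array('hat', 'shirt'),
    "men's",
    2,
    function ($word) { return strtoupper($word); }
);
// Prints: men's pretty black hat, and orange SHIRT!
```

    Passing the gap as a parameter keeps the "0-2 words in between" rule in one place, and the callback receives the matched word so it can choose a replacement for each one.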
    


    This answer was selected as the best answer by the asker.