Regular expression in PHP with negative lookbehind

I'm doing some SEO work on a huge catalog of product descriptions using preg_replace_callback, and I'm having some difficulty with the regex.

I'd like to replace every occurrence of certain words (hat, shirt) except those that come after "men's" with 0-2 words in between, e.g. the words in "men's pretty black hat" and "men's long shirt" shouldn't be replaced.

Here is my debugging code; in the real application I use a callback to pick the proper replacement for each word:

$str = "men's black hat, and orange shirt!";
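// Pass $_matches directly: preg_match_all already takes it by reference, and call-time pass-by-reference (&$_matches) is a fatal error since PHP 5.4.
preg_match_all('/((\s|\.\s|,\s|\!\s|\?\s)(hat|shirt)(\s|\.|\.\s|,\s|\!|\!\s|\?|\?\s))/i', $str, $_matches);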
print_r($_matches);

Thanks

doutuichan2681 2013-08-26 16:10
Lookbehind must be of fixed length, so this way of attacking the problem won't work.
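
To illustrate the limitation, here is a minimal sketch (not part of the original answer) of a negative lookbehind that does have a fixed length. It can only protect a word that directly follows "men's ", so it cannot express "0-2 words in between"; the pattern and the sample strings are assumptions made for the illustration.

```php
<?php
// A fixed-length negative lookbehind: the target must not directly follow "men's ".
$pattern = "/(?<!men's )\\b(hat|shirt)\\b/i";

// Adjacent case: the first "hat" is protected, the second one is replaced.
$adjacent = preg_replace($pattern, 'FOO', "men's hat and a hat");

// One word in between: the lookbehind only sees "black ", so the protection is lost.
$withGap = preg_replace($pattern, 'FOO', "men's black hat");

echo $adjacent, PHP_EOL; // men's hat and a FOO
echo $withGap, PHP_EOL;  // men's black FOO
```

Covering a variable number of intermediate words would require a lookbehind whose length varies, which is what the fixed-length rule forbids.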

IMHO you are trying to make preg_replace_callback do way too much. If you want to perform manipulation that is complex beyond a certain level, it's reasonable to forfeit the convenience of a single function call. Here's another way you can attack the problem:

Use preg_split to split the text into words along with the flag PREG_SPLIT_OFFSET_CAPTURE so that you know where each word appears in the original text.

Iterate over the array of words. It's now very easy to do a "negative lookbehind" on the array and see if a hat or shirt is preceded by any one of the other terms that interest you.

Whenever you find a positive match for hat or shirt, use the offset from preg_split and the (known) length of the positive match to power substr_replace on the original text input.
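
As a rough sketch of the first step, this is what the word list looks like when preg_split is called with PREG_SPLIT_OFFSET_CAPTURE. The separator pattern (anything that is neither a word character nor an apostrophe) and the extra PREG_SPLIT_NO_EMPTY flag are assumptions; the answer does not say which pattern to split on.

```php
<?php
$str = "men's black hat, and orange shirt!";

// Split on runs of characters that are neither word characters nor apostrophes,
// so that "men's" stays in one piece. Each entry becomes array(word, byte offset).
$words = preg_split(
    "/[^\\w']+/",
    $str,
    -1,
    PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE
);

foreach ($words as $entry) {
    list($word, $offset) = $entry;
    echo $offset, ': ', $word, PHP_EOL;
}
// 0: men's
// 6: black
// 12: hat
// 17: and
// 21: orange
// 28: shirt
```

Because every word carries its offset into the original string, the later substr_replace call can target the exact position without searching again.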

For example:

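$str = "men's black hat, and orange shirt!";
$targets = array('hat', 'shirt');
$shield = 'men\'s';
$bias = 0; // running length difference caused by earlier replacements

// Split into words and record each word's byte offset in $str.
// The separator pattern is an assumption: anything that is not a word character or an apostrophe.
$words = preg_split('/[^\w\']+/', $str, -1, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE);

for ($i = 0; $i < count($words); ++$i) {
    list ($word, $offset) = $words[$i];
    if (!in_array($word, $targets)) {
        continue;
    }
    // Check the three preceding words, which allows 0-2 words between the shield and the target.
    for ($j = max($i - 3, 0); $j < $i; ++$j) {
        if ($words[$j][0] === $shield) {
            continue 2; // shielded: move on to the next word
        }
    }
    $replacement = 'FOO';
    // Offsets refer to the original text, so shift them by the accumulated bias.
    $str = substr_replace($str, $replacement, $offset + $bias, strlen($word));
    $bias += strlen($replacement) - strlen($word);
}
echo $str;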
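
Since the question mentions choosing a replacement per word through a callback, the same idea can be wrapped in a function. The sketch below is only one possible arrangement: the function name replaceUnshielded, its parameters, the separator pattern, and the case-insensitive comparison are all assumptions rather than part of the original answer.

```php
<?php
/**
 * Hypothetical helper: replaces each target word unless the shield word
 * appears before it with at most $maxGap words in between.
 * $targets and $shield are expected in lower case.
 */
function replaceUnshielded($str, array $targets, $shield, $maxGap, callable $pick)
{
    // Assumed tokenization: split on anything that is not a word character or apostrophe.
    $words = preg_split("/[^\\w']+/", $str, -1,
        PREG_SPLIT_NO_EMPTY | PREG_SPLIT_OFFSET_CAPTURE);
    $bias = 0; // length difference accumulated by earlier replacements

    foreach ($words as $i => $entry) {
        list($word, $offset) = $entry;
        if (!in_array(strtolower($word), $targets, true)) {
            continue;
        }
        // Look back over at most $maxGap + 1 words for the shield.
        for ($j = max($i - $maxGap - 1, 0); $j < $i; ++$j) {
            if (strtolower($words[$j][0]) === $shield) {
                continue 2; // shielded, leave this word alone
            }
        }
        $replacement = $pick($word);
        $str = substr_replace($str, $replacement, $offset + $bias, strlen($word));
        $bias += strlen($replacement) - strlen($word);
    }
    return $str;
}

// Usage: the callback decides the replacement for each unshielded word.
$result = replaceUnshielded(
    "men's pretty black hat, and orange shirt!",
    array('hat', 'shirt'),
    "men's",
    2,
    'strtoupper'
);
echo $result, PHP_EOL; // men's pretty black hat, and orange SHIRT!
```

Passing the gap as a parameter keeps the "0-2 words in between" rule in one visible place instead of hard-coding the look-back distance in the loop.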

This answer was selected by the asker as the best answer.
