I need to match all characters in a group, as long as they don't match a certain word

I'm not sure if this is a simple question, but I have been unable to find an answer to it so far. I am trying to write a regular expression that works through the XML of a .docx file and replaces all <w:tab /> tags with <w:ind /> tags, as the <w:tab /> tags don't seem to preserve tabs correctly when they are translated to HTML. I am working in PHP, and so far I have been unsuccessful at writing a regular expression that does what I need it to do.

The problem is, I can't just run a simple find-and-replace function here. I have to remove the <w:tab /> tag and inject the <w:ind /> tag within the nearest opening and closing <w:rPr></w:rPr> tags.

A sample XML string would look something like this:

    <w:p w14:paraId="2679030C" w14:textId="4E6FFA99" w:rsidR="00ED4314" w:rsidRPr="00254747" w:rsidRDefault="00ED4314" w:rsidP="00322270">
        <w:pPr>
            <w:pStyle w:val="NoSpacing" />
            <w:spacing w:line="480" w:lineRule="auto" />
            <w:jc w:val="both" />
            <w:rPr>
                <w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman" />
                <w:sz w:val="24" />
                <w:szCs w:val="24" />
            </w:rPr>
        </w:pPr>
        <w:r w:rsidRPr="00254747">
            <w:rPr>
                <w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman" />
                <w:sz w:val="24" />
                <w:szCs w:val="24" />
            </w:rPr>
            <w:tab />
            <w:t>SOME text</w:t>
        </w:r>
        <w:r w:rsidR="0003297C">
            <w:rPr>
                <w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman" />
                <w:sz w:val="24" />
                <w:szCs w:val="24" />
            </w:rPr>
            <w:t>SOME more text</w:t>
        </w:r>
        <w:r w:rsidRPr="00254747">
            <w:rPr>
                <w:rFonts w:ascii="Times New Roman" w:hAnsi="Times New Roman" w:cs="Times New Roman" />
                <w:sz w:val="24" />
                <w:szCs w:val="24" />
            </w:rPr>
            <w:t>EVEN more text</w:t>
        </w:r>
    </w:p>

So each instance of <w:tab /> would need to be removed, and then I would need to trace backwards to the previous <w:rPr> tag and inject a <w:ind /> tag inside of it.

Here's what I have so far:

    $content = preg_replace("/<w:rPr>(.*?)<\/w:rPr>(.*?)<w:tab\/>/", "<w:rPr><w:ind w:firstLine=\"720\"/>$1</w:rPr>$2", $content);

This sort of works, but I think the problem is that the search is too global. Even though I'm specifying that it should not be greedy, the matches it returns contain far more content than they should. Can anyone suggest a way to refine this? Thanks in advance!


1 answer

dpka7974 2013-11-05 05:58
I think you're confusing non-greediness with the regular expression "knowing" to stop before it runs into more tags, which it can't. A non-greedy quantifier still expands as far as it must for the overall match to succeed. If you mean to disallow tags between </w:rPr> and <w:tab/>, then this should roughly work:

    /<w:rPr>(.*?)<\/w:rPr>([^<]*?)<w:tab\/>/
                           ^^^^

This is known as a negated character class; it matches any character that isn't <, and therefore won't consume any other tags before finding a <w:tab/>.
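As a rough illustration of that behaviour, here is a minimal sketch that runs the pattern against two made-up fragments (the variable names and sample strings are mine, not from the question). It only shows that `[^<]*?` refuses to step over a tag; it is not meant as a complete solution.

```php
<?php
// The pattern from the answer, with a negated character class in group 2.
$pattern = '/<w:rPr>(.*?)<\/w:rPr>([^<]*?)<w:tab\/>/';

// Only whitespace sits between </w:rPr> and <w:tab/>, so the pattern matches.
$adjacent = '<w:rPr><w:sz w:val="24"/></w:rPr> <w:tab/>';

// Another tag sits in between; [^<] cannot consume '<', so there is no match.
$separated = '<w:rPr><w:sz w:val="24"/></w:rPr><w:t>x</w:t><w:tab/>';

$matchesAdjacent  = preg_match($pattern, $adjacent);   // 1 (match)
$matchesSeparated = preg_match($pattern, $separated);  // 0 (no match)

// Applying the replacement from the question to the matching fragment.
$replaced = preg_replace(
    $pattern,
    '<w:rPr><w:ind w:firstLine="720"/>$1</w:rPr>$2',
    $adjacent
);
```

Note that the sample XML in the question writes the tag as `<w:tab />` with a space, which `<w:tab\/>` would not match as written; allowing optional whitespace there (for example `<w:tab\s*\/>`) is probably needed in practice.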

Edit. In response to your clarification, i.e. allowing all tags except <w:rPr> before finding a <w:tab/>, you'd need to use a negative lookahead assertion, because, as you correctly understood, negated character classes only exclude characters, not strings.

    /<w:rPr>(.*?)<\/w:rPr>((?:(?!<w:rPr>).)*?)<w:tab\/>/
                           ^^^^^^^^^^^^^^^^

Ignore the (?:xyz) if that's confusing; it's merely a way to get parentheses that don't capture, and I need the parentheses for the quantifier, *. The important piece here is the (?!xyz), which is known as a negative lookahead assertion (and incidentally is also non-capturing): it succeeds if it looks ahead and does not find "xyz". So what we're doing above is this: (1) look ahead, and (2) if what follows is not <w:rPr>, then (3) match one character, ., and (4) repeat until a <w:tab/> is found.
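To make the lookahead idea concrete, here is a hedged sketch of how it might be wired into `preg_replace`. It departs from the pattern above in three ways, all of which are my own assumptions rather than part of the answer: the same lookahead trick is applied to the first group too (as written, `(.*?)` can still backtrack across a `</w:rPr>` and attach the indent to an earlier run), `\s*` is added so that `<w:tab />` with a space also matches, and the `s` modifier is used so that `.` matches newlines. The function name `moveTabsToIndent` and the sample string are hypothetical.

```php
<?php
// Hypothetical helper; the name and the pattern tweaks are illustrative assumptions.
function moveTabsToIndent(string $xml): string
{
    // Group 1: run properties, never crossing another <w:rPr> or </w:rPr>.
    // Group 2: whatever follows </w:rPr>, as long as no new <w:rPr> opens.
    // \s* accepts both "<w:tab/>" and "<w:tab />"; the s flag lets . match newlines.
    // "~" is used as the delimiter so that "/" needs no escaping.
    $pattern = '~<w:rPr>((?:(?!</?w:rPr>).)*)</w:rPr>((?:(?!<w:rPr>).)*?)<w:tab\s*/>~s';
    $replacement = '<w:rPr><w:ind w:firstLine="720"/>$1</w:rPr>$2';

    return preg_replace($pattern, $replacement, $xml);
}

// Two runs: only the second one contains a tab.
$xml = '<w:r><w:rPr><w:b/></w:rPr><w:t>one</w:t></w:r>'
     . '<w:r><w:rPr><w:i/></w:rPr><w:tab /><w:t>two</w:t></w:r>';

$result = moveTabsToIndent($xml);
// The indent lands in the second run's <w:rPr>; the first run is untouched:
// <w:r><w:rPr><w:b/></w:rPr><w:t>one</w:t></w:r><w:r><w:rPr><w:ind w:firstLine="720"/><w:i/></w:rPr><w:t>two</w:t></w:r>
```

This still has limits: a run that contains a tab but no `<w:rPr>` of its own, or several tabs inside one run, will not be handled the way you want. For documents like that, walking the XML with a parser such as PHP's `DOMDocument` is likely a sturdier route than a regular expression.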
This answer was accepted by the asker as the best answer.
