dplo59755 2012-11-19 05:48

Regex cannot find a string inside XML tags?

Trying to find a regex for this question:

PHP parsing xml file error

Trying to match "137b" in the following string, using negative lookahead and lookbehind assertions:

<Rate Symbol="EURTRY">
    <Bid>2.29443</Bid>
    <Ask>2.29562</Ask>
    <High>2.29841</High>
    <Low>2.28999</Low>

 137b

 <Direction>1</Direction>
    <Last>23:29:11</Last>
</Rate>

Can anyone please point out why this regex is not working:

(?<!(<\w+>))[a-zA-Z0-9_\.:]+(?!(</\w+>))

Intention: match a run of the characters a-zA-Z0-9_.: that is neither preceded nor followed by an XML tag. It should have matched "137b", but it does not.

Here is a link to the regex: http://regexr.com?32rk4

Whereas the same regex without the negative assertions, (<\w+>)[a-zA-Z0-9_\.:]+(</\w+>), correctly matches all the strings WITHIN XML tags:

http://regexr.com?32rk7
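
A minimal sketch of the two failure modes, using Python's re module as a stand-in for PHP's PCRE-based preg_* functions (an assumption: the engines differ in details, but both refuse a lookbehind that can match text of unbounded length, such as <\w+>). Once the lookbehind is removed, backtracking lets the character class stop one character short of a closing tag, so the negative lookahead passes on text that sits inside a tag. The second pattern drops the lookahead's inner capturing group, which does not change what it matches.

```python
import re

# Sample from the question (indentation reproduced approximately).
XML = """<Rate Symbol="EURTRY">
    <Bid>2.29443</Bid>
    <Ask>2.29562</Ask>
    <High>2.29841</High>
    <Low>2.28999</Low>

 137b

 <Direction>1</Direction>
    <Last>23:29:11</Last>
</Rate>"""

# 1. The original pattern: <\w+> has no fixed length, so the
#    lookbehind is rejected when the pattern is compiled.
try:
    re.compile(r"(?<!(<\w+>))[a-zA-Z0-9_\.:]+(?!(</\w+>))")
    lookbehind_compiles = True
except re.error:
    lookbehind_compiles = False

# 2. Lookahead only: "2.29443" is followed by "</Bid>", so the engine
#    backtracks to "2.2944", which is followed by "3" and passes.
lookahead_only = re.compile(r"[a-zA-Z0-9_.:]+(?!</\w+>)")
matches = [m.group(0) for m in lookahead_only.finditer(XML)]

print(lookbehind_compiles)  # False
print(matches[:5])          # ['Rate', 'Symbol', 'EURTRY', 'Bid', '2.2944']
```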

  • douxian4376 2012-11-19 10:14

    PHP won't let you use a lookbehind for this: PCRE, the engine behind PHP's preg_* functions, rejects a lookbehind that can match text of unbounded length, and <\w+> is one. But lookbehind wouldn't be the best tool for the job anyway. (It almost never is.) You should be able to solve the problem with just a lookahead. It will be a lot easier if you can make certain assumptions about the document structure. For example, can you be sure the enclosing node is always named Rate, and that its child nodes will never have attributes or child elements of their own? In other words, that you'll never see something like this:

    <Rate Symbol="EURUSD">
        <Bid>1.27554</Bid>
            <foo>bar</foo>
        <Ask foo="bar">1.27578</Ask>
    </Rate>
    

    If so, you can use a positive lookahead to require that the match be followed by any number of complete child nodes and then a closing </Rate> tag:

    [a-zA-Z0-9_.:]++(?=\s*(?><(\w+)>[^<]*</\1>\s*)*+</Rate>)
    

    To explain:

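    [a-zA-Z0-9_.:]++  # match the target string possessively (no giving back)
    (?=               # lookahead: require what follows, without consuming it
      \s*
      (?>             # atomic group
        <(\w+)>       # match an opening tag and capture its name
        [^<]*         # consume the content
        </\1>         # match the closing tag
        \s*
      )*+           # do this zero or more times
      </Rate>       # confirm we're inside a <Rate> element
    )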
    

    This could even be expanded to deal with the other junk you mentioned in your original question, but the regex gets so ugly, I don't think it's worth it.
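
A minimal sketch that checks the answer's pattern against the sample from the question, again using Python's re module in verbose mode in place of PHP's PCRE (an assumption). Python's re only supports possessive quantifiers (++, *+) and atomic groups (?>...) from Python 3.11 on, so this sketch swaps in plain greedy quantifiers and a non-capturing group; on these inputs that gives the same match, though it may backtrack more. The helper name loose_text and the second sample, whose child element carries an attribute, are hypothetical, the latter showing what happens when the answer's structural assumption is violated.

```python
import re

# Sample from the question (indentation reproduced approximately).
XML = """<Rate Symbol="EURTRY">
    <Bid>2.29443</Bid>
    <Ask>2.29562</Ask>
    <High>2.29841</High>
    <Low>2.28999</Low>

 137b

 <Direction>1</Direction>
    <Last>23:29:11</Last>
</Rate>"""

# The answer's pattern, with ++, *+ and (?>...) replaced by plain
# equivalents so it also runs on Python versions before 3.11.
PATTERN = re.compile(r"""
    [a-zA-Z0-9_.:]+       # the target string
    (?=                   # lookahead: succeed only if what follows is
      \s*
      (?:
        <(\w+)>           # an opening tag, capturing its name
        [^<]*             # the element's text content
        </\1>             # the matching closing tag
        \s*
      )*                  # zero or more such child elements
      </Rate>             # and then the closing </Rate> tag
    )
""", re.VERBOSE)


def loose_text(xml):
    """Hypothetical helper: text inside <Rate> but outside any child tag."""
    # group(0) is the whole match; findall() would return group 1 instead.
    return [m.group(0) for m in PATTERN.finditer(xml)]


print(loose_text(XML))  # ['137b']

# Hypothetical input: a child with an attribute breaks the assumption,
# because <(\w+)> cannot match '<Ask foo="bar">', so nothing matches.
WITH_ATTRIBUTE = """<Rate Symbol="EURUSD">
 137b
    <Ask foo="bar">1.27578</Ask>
</Rate>"""
print(loose_text(WITH_ATTRIBUTE))  # []
```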

    This answer was selected by the asker as the best answer.