Anonymous user 2014-02-13 18:43

Extracting XML nodes as plain text with a regular expression

I have a large XML file. While troubleshooting, I would like to extract specific nodes from it. I don't want a SimpleXML object; I want to write a new file containing the raw text of the nodes I select (I'm posting this under bash/sed/php).

<?xml version="1.0" encoding="UTF-8"?>
<definition>
    <metadata></metadata>
    <nodeToRegex>
        <nodeImightwant>
            <subnode>
                <subsubnode1></subsubnode1>
                <subsubnodeToCheck>stringCheck</subsubnodeToCheck>
                <subsubnode2></subsubnode2>
            </subnode>
        </nodeImightwant>
        <nodeImightwant></nodeImightwant>
        <nodeImightwant></nodeImightwant>
    </nodeToRegex>
</definition>

From this XML file, I want all lines from every node except nodeToRegex. From nodeToRegex, I only want a nodeImightwant if the value of its subsubnodeToCheck (stringCheck in the sample above) equals "aValidString". Can this be done with a regex, or should I just copy and paste the content out of the file by hand? (My regex skills are subpar.)

1 answer

  • dongyu3967 2014-02-13 18:56

    Don't parse XML with regexes. There is no reason you can't repackage/rearrange the data using SimpleXML, but trying to do it with a regex is a recipe for lots of headaches and, ultimately, broken code.
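A minimal sketch of the parser-based route, for comparison. It uses PHP's DOM extension (DOMDocument and DOMXPath) rather than SimpleXML, because removing nodes is more direct there. The function name filterNodes and the sample document are illustrative assumptions; only the element names come from the question.

```php
<?php
// Hypothetical sample input, modeled on the question's structure.
$xml = <<<'XML'
<?xml version="1.0" encoding="UTF-8"?>
<definition>
    <metadata></metadata>
    <nodeToRegex>
        <nodeImightwant>
            <subnode>
                <subsubnodeToCheck>aValidString</subsubnodeToCheck>
            </subnode>
        </nodeImightwant>
        <nodeImightwant>
            <subnode>
                <subsubnodeToCheck>somethingElse</subsubnodeToCheck>
            </subnode>
        </nodeImightwant>
        <nodeImightwant></nodeImightwant>
    </nodeToRegex>
</definition>
XML;

// Illustrative helper (not from the original answer): keeps a nodeImightwant
// only if one of its subsubnodeToCheck descendants equals $wanted.
function filterNodes(string $xml, string $wanted): string
{
    $doc = new DOMDocument();
    if (!$doc->loadXML($xml)) {
        throw new RuntimeException('Input is not well-formed XML');
    }
    $xpath = new DOMXPath($doc);

    $toRemove = [];
    foreach ($xpath->query('//nodeToRegex/nodeImightwant') as $node) {
        $keep = false;
        foreach ($xpath->query('.//subsubnodeToCheck', $node) as $check) {
            if (trim($check->textContent) === $wanted) {
                $keep = true;
                break;
            }
        }
        if (!$keep) {
            $toRemove[] = $node;
        }
    }

    // Remove after the loop so nothing is detached while iterating.
    foreach ($toRemove as $node) {
        $node->parentNode->removeChild($node);
    }

    return $doc->saveXML();
}

echo filterNodes($xml, 'aValidString');
```

Everything outside nodeToRegex is left in place, and the result is a plain string that can be written to a new file. Re-serializing through a parser may normalize formatting (for example, empty elements can come back as self-closing tags), so the output is not guaranteed to be byte-identical to the input.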

    See this classic example for why parsing XML/HTML/XHTML with regexes is the road to madness.

    If you insist on using a regex, just replace the nodes you don't want, like this:

    // The "s" modifier lets "." match newlines, so a block spanning several lines is matched.
    $myxml = preg_replace('~<nodeToRegex>.*?</nodeToRegex>~s', '', $myxml);
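The one-liner above drops the whole nodeToRegex block. For the second requirement in the question (keep a nodeImightwant only when its check value matches), a regex-only sketch could look like the following. It assumes the tags carry no attributes, that nodeImightwant elements are never nested, and that the tag text never appears inside comments or CDATA; if any of those assumptions fails, the result will be wrong. The helper name keepValidNodes and the sample input are hypothetical.

```php
<?php
// Hypothetical sample input, modeled on the question's structure.
$xml = <<<'XML'
<nodeToRegex>
    <nodeImightwant>
        <subsubnodeToCheck>aValidString</subsubnodeToCheck>
    </nodeImightwant>
    <nodeImightwant>
        <subsubnodeToCheck>other</subsubnodeToCheck>
    </nodeImightwant>
</nodeToRegex>
XML;

// Illustrative helper (not from the original answer).
function keepValidNodes(string $xml, string $wanted): string
{
    $needle = '<subsubnodeToCheck>' . $wanted . '</subsubnodeToCheck>';

    // "s" lets "." match newlines; ".*?" is lazy, so each match stops at the
    // first closing tag. The leading \s* takes the indentation in front of a
    // dropped block with it.
    $result = preg_replace_callback(
        '~\s*<nodeImightwant>.*?</nodeImightwant>~s',
        function (array $m) use ($needle): string {
            // Keep the block unchanged if it contains the wanted value.
            return strpos($m[0], $needle) !== false ? $m[0] : '';
        },
        $xml
    );

    if ($result === null) {
        // preg_replace_callback returns null on a PCRE error.
        throw new RuntimeException('Regex processing failed');
    }
    return $result;
}

echo keepValidNodes($xml, 'aValidString');
```

This stays a text substitution rather than real XML parsing, so it is best treated as a one-off troubleshooting aid and the output checked by eye.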
    

