dongzi1397 2013-10-17 19:13
Answer accepted

XML parsing problem caused by namespaces

Suppose I have this XML:

<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:irp="http://kuleuven-kulak.be/itec/ns/irp/" xml:id="irp-rmg-fr-2013-05-03-00862-src" xml:lang="fr">
  <text xml:id="irp-rmg-fr-2013-05-03-00862-src." xml:lang="fr">
    <body>
      <div>
        <p>
          <irp:PEnrich irp:path="(//section/paragraph)[1]" n="irp-1">
            <irp:PNerd>
              1955 (30 avril) Naissance à 
              <irp:ne ref="http://fr.dbpedia.org/resource/Lille" irp:confidence="1" type="LOC">Lille</irp:ne>.
            </irp:PNerd>
          </irp:PEnrich>
        </p>
      </div>
    </body>
  </text>
</TEI>

How can I use SimpleXML and XPath to parse the irp:PNerd nodes and get a string like:

1955 (30 avril) Naissance à <url="http://fr.dbpedia.org/resource/Lille">Lille</url>.

I tried getting the text by doing:

    $penrich = $xml->xpath("//irp:PEnrich");
    foreach ($penrich as $p) {
        $pnerds = $p->children("irp", true);
        $pnerd = $pnerds->PNerd;
        $ne = $pnerd->ne;
        foreach ($ne as $n) {
            print_r($n->children());
        }
        echo "----\n";
    }

but this only retrieves the type and ref attributes (also, how should I access these values in my code?):

SimpleXMLElement Object
(
    [@attributes] => Array
        (
            [ref] => http://fr.dbpedia.org/resource/Lille
            [type] => LOC
        )
)

But I want to obtain something like:

1955 (30 avril) Naissance à <url="http://fr.dbpedia.org/resource/Lille">Lille</url>.

1 answer

  • dongtanlin0765 2013-10-18 03:32

    Here is some PHP code showing how to access the parts of the XML you asked about, using DOMDocument and DOMXPath rather than SimpleXML:

    <?php
    
    $tei = <<<XML
    <TEI xmlns="http://www.tei-c.org/ns/1.0"
         xmlns:irp="http://kuleuven-kulak.be/itec/ns/irp/"
         xml:id="irp-rmg-fr-2013-05-03-00862-src"
         xml:lang="fr">
      <text xml:id="irp-rmg-fr-2013-05-03-00862-src." xml:lang="fr">
        <body>
          <div>
            <p>
              <irp:PEnrich irp:path="(//section/paragraph)[1]" n="irp-1">
                <irp:PNerd>1955 (30 avril) Naissance à <irp:ne ref="http://fr.dbpedia.org/resource/Lille" irp:confidence="1" type="LOC">Lille</irp:ne>.</irp:PNerd>
              </irp:PEnrich>
            </p>
          </div>
        </body>
      </text>
    </TEI>
    XML;
    
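    // Parse the document and register the irp prefix for use in XPath queries.
    $doc = new DOMDocument();
    $doc->loadXML(mb_convert_encoding($tei, 'utf-8', mb_detect_encoding($tei)));
    $xpath = new DOMXPath($doc);
    $xpath->registerNamespace('irp', 'http://kuleuven-kulak.be/itec/ns/irp/');
    
    // string() on a node-set uses only its first node, so the first echo prints
    // just the text that precedes the irp:ne element.
    echo $xpath->evaluate("string(//irp:PNerd/text())");
    // Wrap the text of irp:ne in a <url> tag that carries its ref attribute.
    echo '<url ref="'. $xpath->evaluate("string(//irp:ne/@ref)") . '">';
    echo $xpath->evaluate("string(//irp:ne/text())");
    echo '</url>';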
    ?>
    

    Yields the following output:

    1955 (30 avril) Naissance ? <url ref="http://fr.dbpedia.org/resource/Lille">Lille</url>
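
The output above stops before the trailing period because XPath's string() only uses the first node of a node-set, and the queries address a single irp:ne. A minimal sketch of one way to keep every text node and every irp:ne is to walk the child nodes of each irp:PNerd in document order. The helper name pnerdToString and the shortened sample XML are assumptions made for illustration and are not part of the original answer.

```php
<?php
// Hypothetical helper (not from the original answer): flattens one irp:PNerd
// element into a string, wrapping each irp:ne child in <url ref="...">.
function pnerdToString(DOMElement $pnerd): string
{
    $irpNs = 'http://kuleuven-kulak.be/itec/ns/irp/';
    $out = '';
    foreach ($pnerd->childNodes as $child) {
        if ($child instanceof DOMElement
            && $child->namespaceURI === $irpNs
            && $child->localName === 'ne') {
            // Named entity: emit its ref attribute and its text content.
            $ref = htmlspecialchars($child->getAttribute('ref'), ENT_QUOTES | ENT_XML1, 'UTF-8');
            $text = htmlspecialchars($child->textContent, ENT_NOQUOTES | ENT_XML1, 'UTF-8');
            $out .= '<url ref="' . $ref . '">' . $text . '</url>';
        } elseif ($child instanceof DOMText || $child instanceof DOMElement) {
            // Text nodes and any other elements contribute their plain text.
            $out .= htmlspecialchars($child->textContent, ENT_NOQUOTES | ENT_XML1, 'UTF-8');
        }
    }
    // Collapse the indentation whitespace left by pretty-printed XML.
    return trim(preg_replace('/\s+/u', ' ', $out));
}

// Shortened sample document (assumption: same namespaces as in the question).
$xml = '<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:irp="http://kuleuven-kulak.be/itec/ns/irp/">'
     . '<irp:PNerd>1955 (30 avril) Naissance à '
     . '<irp:ne ref="http://fr.dbpedia.org/resource/Lille" type="LOC">Lille</irp:ne>.'
     . '</irp:PNerd></TEI>';

$doc = new DOMDocument();
$doc->loadXML($xml);
$xpath = new DOMXPath($doc);
$xpath->registerNamespace('irp', 'http://kuleuven-kulak.be/itec/ns/irp/');

foreach ($xpath->query('//irp:PNerd') as $pnerd) {
    echo pnerdToString($pnerd), "\n";
}
```

Because the loop visits children in document order, text that follows an irp:ne (here the final period) is kept, and a paragraph with several entities needs no extra queries.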
    

    Notes:

    • I assume that <url= in your expected output was a typo: it looks like XML but is not well-formed, so the code above emits <url ref="..."> instead.
    • A character encoding issue may remain: à comes through as ? in the output above.
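
The question also asked how to read the ref and type values with SimpleXML. A hedged sketch, assuming the same namespaces as in the question: attributes() with no argument returns the attributes that have no namespace (ref, type), while passing the namespace URI returns the prefixed ones (irp:confidence). The variable names and the shortened sample XML are illustrative only.

```php
<?php
// Shortened sample document (assumption: same namespaces as in the question).
$xml = '<TEI xmlns="http://www.tei-c.org/ns/1.0" xmlns:irp="http://kuleuven-kulak.be/itec/ns/irp/">'
     . '<irp:PNerd>Naissance à '
     . '<irp:ne ref="http://fr.dbpedia.org/resource/Lille" irp:confidence="1" type="LOC">Lille</irp:ne>.'
     . '</irp:PNerd></TEI>';

$irpNs = 'http://kuleuven-kulak.be/itec/ns/irp/';
$sx = simplexml_load_string($xml);
// Bind the prefix explicitly so the XPath query does not rely on the
// prefix chosen inside the document.
$sx->registerXPathNamespace('irp', $irpNs);

$found = [];
foreach ($sx->xpath('//irp:ne') as $ne) {
    $plain = $ne->attributes();        // attributes without a namespace
    $irp   = $ne->attributes($irpNs);  // attributes in the irp namespace
    $found[] = [
        'text'       => (string) $ne,
        'ref'        => (string) $plain->ref,
        'type'       => (string) $plain->type,
        'confidence' => (string) $irp->confidence,
    ];
}
print_r($found);
```

Casting to string matters here: without it each value is still a SimpleXMLElement object, which is why print_r in the question showed an object rather than plain values.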
    This answer was selected by the asker as the best answer.
