dongneng5383 2016-05-20 14:53
Accepted

How can I use simplexml_load_string in PHP to get the inner text without the embedded tags?

I found a freely available data dump of USPTO patent data in XML format. Part of the XML for most of the patents has the following structure:

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE us-patent-grant SYSTEM "us-patent-grant-v45-2014-04-03.dtd" [ ]>
<us-patent-grant lang="EN" dtd-version="v4.5 2014-04-03" file="US09226443-20160105.XML" status="PRODUCTION" id="us-patent-grant" country="US" date-produced="20151221" date-publ="20160105">
  ...
  <claims>
    ...
    <claim id="CLM-00015" num="00015">
      <claim-text>15. The system of <claim-ref idref="CLM-00013">claim 13</claim-ref>, wherein ...</claim-text>
    </claim>
  </claims>
</us-patent-grant>

When I execute the PHP simplexml_load_string function on the XML, the <claim-ref idref="CLM-00013">claim 13</claim-ref> part goes away and I'm left with the following for the claim text:

15. The system of , wherein ...

I tried executing the simplexml_load_string function as follows:

$xml = simplexml_load_string($xmlTxt, 'SimpleXMLElement', LIBXML_NOCDATA);

But it didn't change anything.
What do I need to do in order to get the text within the claim-ref tags to be retained as part of the CDATA within the claim-text tags? Please note that I don't need to retain the actual claim-ref tags, just the text within them.

1 answer

  • drpkcwwav20524605 2016-05-20 15:55

    There is no CDATA section in your example XML. A CDATA section looks like this in XML:

    <foo><![CDATA[<bar>text</bar>]]></foo>
    

    In this case the CDATA section is a single text node. It is equivalent to:

    <foo>&lt;bar&gt;text&lt;/bar&gt;</foo>
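
To illustrate why LIBXML_NOCDATA makes no difference for the claim text, here is a minimal sketch. It assumes the two `<foo>` snippets above as input; the variable names are made up for the illustration. Both forms produce the same string, because in both cases `<bar>` is plain text rather than a child element.

```php
<?php
// The CDATA form and the entity-escaped form of the same content.
$cdata   = simplexml_load_string('<foo><![CDATA[<bar>text</bar>]]></foo>');
$escaped = simplexml_load_string('<foo>&lt;bar&gt;text&lt;/bar&gt;</foo>');

// Casting to string returns the element's own text, CDATA included.
$fromCdata   = (string) $cdata;
$fromEscaped = (string) $escaped;

var_dump($fromCdata === $fromEscaped);
var_dump($fromCdata);
```

In the patent XML, by contrast, claim-ref is a real child element, so its text belongs to that child and is not part of what the string cast of claim-text returns.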
    

    If you need the text content of a SimpleXMLElement (including its descendants), you can convert it into a DOM node. The DOMNode::$textContent property, which DOMElement inherits, provides it.

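    // $xml holds the raw XML string ($xmlTxt in the question)
    $patentGrant = new SimpleXMLElement($xml);
    // Import the first claim-text element as a DOMElement
    $node = dom_import_simplexml($patentGrant->claims->claim->{'claim-text'});
    
    // textContent concatenates the text of all descendant nodes
    var_dump($node->textContent);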
    

    Output:

    string(39) "15. The system of claim 13, wherein ..."
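
The snippet above reads only the first claim. A possible extension to every claim in the document could look like the sketch below. The sample XML is a trimmed stand-in for the real file: the text of claim 13 is a placeholder invented for the example, and `$claimTexts` is a hypothetical variable name.

```php
<?php
// Trimmed stand-in for the patent XML; the claim 13 text is a placeholder.
$xml = <<<'XML'
<us-patent-grant>
  <claims>
    <claim id="CLM-00013" num="00013">
      <claim-text>13. A system comprising ...</claim-text>
    </claim>
    <claim id="CLM-00015" num="00015">
      <claim-text>15. The system of <claim-ref idref="CLM-00013">claim 13</claim-ref>, wherein ...</claim-text>
    </claim>
  </claims>
</us-patent-grant>
XML;

$patentGrant = new SimpleXMLElement($xml);

$claimTexts = [];
foreach ($patentGrant->claims->claim as $claim) {
    // Import each claim-text element into DOM to read its full text content.
    $node = dom_import_simplexml($claim->{'claim-text'});
    $claimTexts[(string) $claim['id']] = $node->textContent;
}

print_r($claimTexts);
```

Keying the array by the claim's id attribute is just one choice; the num attribute or a plain list would work equally well.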
    

    Alternatively, use DOMXPath::evaluate() and skip SimpleXML entirely:

    $document = new DOMDocument();
    $document->loadXML($xml);
    $xpath = new DOMXPath($document);
    
    // string() yields the text content of the first matching claim-text element
    var_dump($xpath->evaluate('string(/us-patent-grant/claims/claim/claim-text)'));
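
If the claim text spans several lines in the source file, the XPath route can also collapse the whitespace. The following is a rough sketch under that assumption: it evaluates expressions relative to each claim node and uses the XPath function normalize-space(). The sample XML and the `$claims` array are made up for the illustration.

```php
<?php
// Hypothetical sample with line breaks inside claim-text.
$xml = <<<'XML'
<us-patent-grant>
  <claims>
    <claim id="CLM-00015" num="00015">
      <claim-text>
        15. The system of <claim-ref idref="CLM-00013">claim 13</claim-ref>,
        wherein ...
      </claim-text>
    </claim>
  </claims>
</us-patent-grant>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

$claims = [];
foreach ($xpath->query('/us-patent-grant/claims/claim') as $claim) {
    // The second argument makes the expression relative to the current claim.
    $id = $xpath->evaluate('string(@id)', $claim);
    // normalize-space() trims the text and collapses whitespace runs.
    $claims[$id] = $xpath->evaluate('normalize-space(claim-text)', $claim);
}

print_r($claims);
```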
    