douqiu1604 2014-06-04 14:43
Views: 57


I am trying to extract data from a multi-level structured XML file. The input file will be

This is the search result of the query

Output of the query:

<?xml version="1.0" encoding="UTF-8"?>
        <MedlineCitation Status="Publisher" Owner="NLM">
            <PMID Version="1">24874852</PMID>
            <Article PubModel="Print-Electronic">
                    <ISSN IssnType="Electronic">1976-670X</ISSN>
                    <JournalIssue CitedMedium="Internet">
                    <Title>BMB reports</Title>
                    <ISOAbbreviation>BMB Rep</ISOAbbreviation>
                    Human selenium binding protein-1 (hSP56) is a negative regulator of HIF-1α and suppresses the malignant characteristics of prostate cancer cells.
                <ELocationID EIdType="pii">2831</ELocationID>
                    <AbstractText NlmCategory="UNLABELLED">
                        In the present study, we demonstrate that ectopic expression of 56-kDa human selenium binding protein-1 (hSP56) in PC-3 cells that do not normally express hSP56 results in a marked inhibition of cell growth in vitro and in vivo. Down-regulation of hSP56 in LNCaP cells that normally express hSP56 results in enhanced anchorage-independent growth. PC-3 cells expressing hSP56 exhibit a significant reduction of hypoxia inducible protein (HIF)-1α protein levels under hypoxic conditions without altering HIF-1α mRNA (HIF1A) levels. Taken together, our findings strongly suggest that hSP56 plays a critical role in prostate cells by mechanisms including negative regulation of HIF-1α, thus identifying hSP56 as a candidate anti-oncogene product.
                            Laboratory for Cell and Molecular Biology, Division of Hematology and Oncology, Beth Israel Deaconess Medical Center, Harvard Medical School, Boston, MA, USA; Department of Biochemistry and Cancer Research Institute, Kosin University College of Medicine, Busan, South Korea.
                        <ForeName>Arthur J</ForeName>
                    <PublicationType>JOURNAL ARTICLE</PublicationType>
                <ArticleDate DateType="Electronic">
                <MedlineTA>BMB Rep</MedlineTA>
                <PubMedPubDate PubStatus="entrez">
                <PubMedPubDate PubStatus="pubmed">
                <PubMedPubDate PubStatus="medline">
                <ArticleId IdType="pii">2831</ArticleId>
                <ArticleId IdType="pubmed">24874852</ArticleId>

My intention is to reorganise the data in another webpage. I am trying to extract data from every layer of this structure, and I am using regex. E.g., if I want to extract the abstract text from the XML structure, here is the code I am using:

$efetch = "
#echo $efetch;
$handle1 = file_get_contents($efetch,"r");
#echo $handle1s;
foreach ($abstext[1] as $tiab) {
    echo $tiab;
}

I don't get the output I expect. Any idea where it might have gone wrong?

1 answer

  • dongyizhuang0134 2014-06-04 15:10

    If you are going to extract text from XML, the best option is to use an XML parser, such as a DOM parser:

    $document = new DOMDocument(); 
    $document->load( "" ); 

    From there you can use the XPath language to select the data you want to extract: //AbstractText will return a set of all <AbstractText> nodes.

    You can use XPath in PHP on your parsed document:

    $xpath = new DOMXPath($document);
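
    As a rough sketch of how the same kind of XPath object can reach other layers of the record, the snippet below queries a small hand-written sample. The sample string and the variable names are assumptions made for illustration; only the element and attribute names are taken from the output in the question.

```php
<?php
// Hypothetical, trimmed-down sample that reuses element names from the question.
$xml = <<<XML
<?xml version="1.0" encoding="UTF-8"?>
<PubmedArticle>
  <MedlineCitation Status="Publisher" Owner="NLM">
    <PMID Version="1">24874852</PMID>
    <Article PubModel="Print-Electronic">
      <ELocationID EIdType="pii">2831</ELocationID>
    </Article>
  </MedlineCitation>
  <ArticleIdList>
    <ArticleId IdType="pii">2831</ArticleId>
    <ArticleId IdType="pubmed">24874852</ArticleId>
  </ArticleIdList>
</PubmedArticle>
XML;

$document = new DOMDocument();
$document->loadXML($xml);
$xpath = new DOMXPath($document);

// string(...) makes evaluate() return a plain string instead of a node list.
$pmid     = $xpath->evaluate('string(//MedlineCitation/PMID)');
// An attribute is addressed with @name.
$pubModel = $xpath->evaluate('string(//Article/@PubModel)');
// A predicate on an attribute picks one of several sibling elements.
$pubmedId = $xpath->evaluate('string(//ArticleId[@IdType="pubmed"])');

echo $pmid, "\n", $pubModel, "\n", $pubmedId, "\n";
```

    The same pattern should carry over to the other elements in the record; only the expression passed to evaluate() changes.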

    To get all <AbstractText> nodes you use:

    $xpath->evaluate("//AbstractText");

    And to extract the text from each node use nodeValue:

    foreach ($xpath->evaluate("//AbstractText") as $abstractText) {
        echo $abstractText->nodeValue . "\n";
    }
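
    Put together, the steps above might look like the following minimal sketch. It parses an inline string with loadXML() because the URL is not shown in this thread; extractAbstracts() and the sample string are hypothetical names introduced here, not part of the original answer.

```php
<?php
/**
 * Return the text of every <AbstractText> element in an XML string.
 * extractAbstracts() is a hypothetical helper used only for illustration.
 */
function extractAbstracts(string $xml): array
{
    // Collect parse errors internally instead of emitting PHP warnings.
    $previous = libxml_use_internal_errors(true);
    $document = new DOMDocument();
    $loaded = $document->loadXML($xml);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    if (!$loaded) {
        return []; // malformed XML: nothing to extract
    }

    $xpath = new DOMXPath($document);
    $abstracts = [];
    foreach ($xpath->evaluate('//AbstractText') as $node) {
        // nodeValue is the element's text content; trim surrounding whitespace.
        $abstracts[] = trim($node->nodeValue);
    }
    return $abstracts;
}

// Hypothetical sample; the attribute mirrors the one in the question's output.
$sample = '<Article><Abstract>'
    . '<AbstractText NlmCategory="UNLABELLED"> In the present study ... </AbstractText>'
    . '</Abstract></Article>';

foreach (extractAbstracts($sample) as $text) {
    echo $text, "\n";
}
```

    Because XPath matches on the element name, the NlmCategory attribute on the tag does not affect the result, which is not true of a regex written against a literal <AbstractText> tag.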

    See a working example using your data here:

    This answer was accepted by the asker as the best answer.


