duanli8577 2016-12-06 13:00
Views: 179

DOMXPath query with multiple nested classes

I'm parsing an HTML file with the following structure:

<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Get_text 6</li>                                 
                <li class="txtArea emp-un-area">Get_text 7</li>
                <li class="txtToilet emp-un-bath">Get_text 8</li>
                <li class="txtCar emp-un-park">Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a>
        </div>
    </div>
</div>
<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">            
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>             
            </a>
        </div>
        <strong class="imvFse emp-fase">Other Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Other Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Other Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir" >Other Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>                
                <li class="txtBed emp-un-dorms">Other Get_text 6</li>                                   
                <li class="txtArea emp-un-area">Other Get_text 7</li>
                <li class="txtToilet emp-un-bath">Other Get_text 8</li>
                <li class="txtCar emp-un-park">Other Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a>
        </div>
    </div>
</div>

The following block:

<div class="lstImv blackBd12"></div>

wraps all the other tags that hold the text I want to extract, and it repeats several times (I trimmed the example down to 2 occurrences).

Then, with this code:

<?php
$html = "exemplo_parse.html";
libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTMLFile($html); 
$dom->preserveWhiteSpace = false; 
$xpath = new DOMXPath($dom);
$content = $xpath->query('//div[@class="lstImv blackBd12"]');
foreach($content as $span)
{
    echo "<pre>";
        print_r($span);
    echo "</pre>";
}
?>

I get 2 objects with their values:

DOMElement Object
(
    [tagName] => div
    [schemaTypeInfo] => 
    [nodeName] => div
    [nodeValue] => 











        Get_text 1





                Get_text 2
                Get_text 3


                Get_text 4
                Get_text 5




            Get_text 6                                  
                Get_text 7
                Get_text 8
                Get_text 9


            Get_text 10



    [nodeType] => 1
    [parentNode] => (object value omitted)
    [childNodes] => (object value omitted)
    [firstChild] => (object value omitted)
    [lastChild] => (object value omitted)
    [previousSibling] => 
    [nextSibling] => (object value omitted)
    [attributes] => (object value omitted)
    [ownerDocument] => (object value omitted)
    [namespaceURI] => 
    [prefix] => 
    [localName] => div
    [baseURI] => 
    [textContent] => 











        Get_text 1





                Get_text 2
                Get_text 3


                Get_text 4
                Get_text 5




            Get_text 6                                  
                Get_text 7
                Get_text 8
                Get_text 9


            Get_text 10



)
DOMElement Object
(
    [tagName] => div
    [schemaTypeInfo] => 
    [nodeName] => div
    [nodeValue] => 











        Other Get_text 1





                Other Get_text 2
                Other Get_text 3


                Other Get_text 4
                Other Get_text 5




            Other Get_text 6                                
                Other Get_text 7
                Other Get_text 8
                Other Get_text 9


            Other Get_text 10



    [nodeType] => 1
    [parentNode] => (object value omitted)
    [childNodes] => (object value omitted)
    [firstChild] => (object value omitted)
    [lastChild] => (object value omitted)
    [previousSibling] => (object value omitted)
    [attributes] => (object value omitted)
    [ownerDocument] => (object value omitted)
    [namespaceURI] => 
    [prefix] => 
    [localName] => div
    [baseURI] => 
    [textContent] => 











        Other Get_text 1





                Other Get_text 2
                Other Get_text 3


                Other Get_text 4
                Other Get_text 5




            Other Get_text 6                                
                Other Get_text 7
                Other Get_text 8
                Other Get_text 9


            Other Get_text 10



)

So this is how I'm currently extracting the values:

<?php
$html = "exemplo_parse.html";
libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTMLFile($html); 
$dom->preserveWhiteSpace = false; 
$xpath = new DOMXPath($dom);
$content = $xpath->query('//strong[@class="imvFse emp-fase"]');
foreach($content as $span)
{
    echo "Key 1 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//strong[@class="emp-nome infNme colorTxt"]');
foreach($content as $span)
{
    echo "Key 2 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part1 infLoc"]');
foreach($content as $span)
{
    echo "Key 3 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part2 infLoc"]');
foreach($content as $span)
{
    echo "Key 4 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtBed emp-un-dorms"]');
foreach($content as $span)
{
    echo "Key 5 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtArea emp-un-area"]');
foreach($content as $span)
{
    echo "Key 6 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtCar emp-un-park"]');
foreach($content as $span)
{
    echo "Key 7 : ".$span->textContent."<br/>";
}
?>

I get the data in this order:

Key 1 : Get_text 1
Key 1 : Other Get_text 1
Key 2 : 
Key 2 : 
Key 3 : Get_text 2
Key 3 : Other Get_text 2
Key 4 : Get_text 3
Key 4 : Other Get_text 3
Key 5 : Get_text 6
Key 5 : Other Get_text 6
Key 6 : Get_text 7
Key 6 : Other Get_text 7
Key 7 : Get_text 9
Key 7 : Other Get_text 9

In other words, the output is grouped by key, but I would like the keys to come out sequentially for each block (k1, k2, ..., k7, k1, k2, ..., k7) rather than in the current order (k1, k1, k2, k2, ..., k7, k7).

Sorry for my bad English; I'm still improving.


1 answer

  • drba1172 2016-12-06 22:26

    Here's the solution I came up with:

    <?php
    $html = <<<HTML
    <div class="lstImv blackBd12">
        <div class="stCl3 stLeft imvImg">
            <div class="imgBox">            
                <a class="emp-imgs-link">
                    <span class="imgFrm frmBig frmLeft">
                        <img class="emp-img-principal">
                    </span>
                    <span class="imgFrm frmMd frmTop">
                        <img class="emp-img-logo">
                    </span>
                    <span class="imgFrm frmMd frmBot">
                        <img class="emp-img-foto">
                    </span>             
                </a>
            </div>
            <strong class="imvFse emp-fase">Get_text 1</strong>
        </div>
        <div class="imvInf stCl3 stRight">
            <div class="infHd">
                <div class="hdLeft stCl2">
                    <strong class="emp-nome infNme colorTxt"></strong>
                    <span class="emp-loc-part1 infLoc">Get_text 2</span>
                    <span class="emp-loc-part2 infLoc">Get_text 3</span>
                </div>
                <div class="hdRight stCl1">
                    <em class="emp-valor-apartir" >Get_text 4</em>
                    <strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
                </div>
            </div>
            <div class="infTxt">
                <p class="blackTxt60 emp-descritivo"></p>
                <ul>                
                    <li class="txtBed emp-un-dorms">Get_text 6</li>                                 
                    <li class="txtArea emp-un-area">Get_text 7</li>
                    <li class="txtToilet emp-un-bath">Get_text 8</li>
                    <li class="txtCar emp-un-park">Get_text 9</li>
                </ul>
            </div>
            <div class="infBt">
                <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a>
            </div>
        </div>
    </div>
    <div class="lstImv blackBd12">
        <div class="stCl3 stLeft imvImg">
            <div class="imgBox">            
                <a class="emp-imgs-link">
                    <span class="imgFrm frmBig frmLeft">
                        <img class="emp-img-principal">
                    </span>
                    <span class="imgFrm frmMd frmTop">
                        <img class="emp-img-logo">
                    </span>
                    <span class="imgFrm frmMd frmBot">
                        <img class="emp-img-foto">
                    </span>             
                </a>
            </div>
            <strong class="imvFse emp-fase">Other Get_text 1</strong>
        </div>
        <div class="imvInf stCl3 stRight">
            <div class="infHd">
                <div class="hdLeft stCl2">
                    <strong class="emp-nome infNme colorTxt"></strong>
                    <span class="emp-loc-part1 infLoc">Other Get_text 2</span>
                    <span class="emp-loc-part2 infLoc">Other Get_text 3</span>
                </div>
                <div class="hdRight stCl1">
                    <em class="emp-valor-apartir" >Other Get_text 4</em>
                    <strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong>
                </div>
            </div>
            <div class="infTxt">
                <p class="blackTxt60 emp-descritivo"></p>
                <ul>                
                    <li class="txtBed emp-un-dorms">Other Get_text 6</li>                                   
                    <li class="txtArea emp-un-area">Other Get_text 7</li>
                    <li class="txtToilet emp-un-bath">Other Get_text 8</li>
                    <li class="txtCar emp-un-park">Other Get_text 9</li>
                </ul>
            </div>
            <div class="infBt">
                <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a>
            </div>
        </div>
    </div>
    HTML;
    
    libxml_use_internal_errors(true); // collect parser warnings instead of printing them
    $dom = new DOMDocument('1.0', 'utf-8');
    $dom->loadHTML($html);
    $dom->preserveWhiteSpace = false;
    $xpath = new DOMXPath($dom);

    // Run each query once, outside the loop. Pairing the results by index
    // relies on every block containing each of these elements exactly once.
    $items   = $xpath->query('//div[@class="lstImv blackBd12"]');
    $status  = $xpath->query('//strong[@class="imvFse emp-fase"]');
    $titulo  = $xpath->query('//span[@class="emp-loc-part1 infLoc"]');
    $titulo2 = $xpath->query('//span[@class="emp-loc-part2 infLoc"]');
    $valor   = $xpath->query('//em[@class="emp-valor-apartir"]');
    $valor2  = $xpath->query('//strong[@class="emp-valor infVlr colorTxt"]');
    $dorm    = $xpath->query('//li[@class="txtBed emp-un-dorms"]');
    $tam     = $xpath->query('//li[@class="txtArea emp-un-area"]');

    // Walk the blocks in order and print the i-th node from every list.
    for ($i = 0; $i < $items->length; $i++)
    {
        echo "Value     :".$status->item($i)->nodeValue."<br/>";
        echo "Value     :".$titulo->item($i)->nodeValue."<br/>";
        echo "Value     :".$titulo2->item($i)->nodeValue."<br/>";
        echo "Value     :".$valor->item($i)->nodeValue."<br/>";
        echo "Value     :".$valor2->item($i)->nodeValue."<br/>";
        echo "Value     :".$dorm->item($i)->nodeValue."<br/>";
        echo "Value     :".$tam->item($i)->nodeValue."<br/>";
    }
    ?>
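
The code above pairs values by position, which holds only while every block contains every element. A possible alternative, sketched here under assumptions, is to pass each block as the context node to `DOMXPath::query()` and use a relative path (`.//`), so that a missing element shows up as an empty field instead of shifting the indexes. The `hasClass()` helper, the `$fields` labels, and the trimmed-down markup (whose second block deliberately lacks `emp-loc-part1`) are illustrative and not part of the original question or answer.

```php
<?php
// Hypothetical helper (not from the original answer): builds an XPath
// predicate that matches one class token, whatever other classes are present.
function hasClass(string $class): string
{
    return "contains(concat(' ', normalize-space(@class), ' '), ' $class ')";
}

// Trimmed-down stand-in for the listing markup in the question.
$html = <<<HTML
<div class="lstImv blackBd12">
    <strong class="imvFse emp-fase">Get_text 1</strong>
    <span class="emp-loc-part1 infLoc">Get_text 2</span>
    <ul><li class="txtBed emp-un-dorms">Get_text 6</li></ul>
</div>
<div class="lstImv blackBd12">
    <strong class="imvFse emp-fase">Other Get_text 1</strong>
    <ul><li class="txtBed emp-un-dorms">Other Get_text 6</li></ul>
</div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Field label => class token; the labels are made up for this sketch.
$fields = [
    'fase'  => 'emp-fase',
    'loc1'  => 'emp-loc-part1',
    'dorms' => 'emp-un-dorms',
];

$rows = [];
foreach ($xpath->query('//div[' . hasClass('lstImv') . ']') as $item) {
    $row = [];
    foreach ($fields as $label => $class) {
        // The leading "." together with the context node ($item) restricts
        // the search to descendants of the current block.
        $node = $xpath->query('.//*[' . hasClass($class) . ']', $item)->item(0);
        $row[$label] = $node ? trim($node->textContent) : null;
    }
    $rows[] = $row;
}

// Print block by block: k1, k2, k3, then k1, k2, k3 again.
foreach ($rows as $i => $row) {
    foreach ($row as $label => $value) {
        echo "Block $i, $label: " . ($value ?? '(missing)') . "\n";
    }
}
```

Matching on a single class token rather than the full `class` string also keeps the queries working if the site reorders or adds classes.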
    

    See on ideone
