I'm parsing an HTML file with the following structure:
<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>
            </a>
        </div>
        <strong class="imvFse emp-fase">Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir">Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>
                <li class="txtBed emp-un-dorms">Get_text 6</li>
                <li class="txtArea emp-un-area">Get_text 7</li>
                <li class="txtToilet emp-un-bath">Get_text 8</li>
                <li class="txtCar emp-un-park">Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Get_text 10</a>
        </div>
    </div>
</div>
<div class="lstImv blackBd12">
    <div class="stCl3 stLeft imvImg">
        <div class="imgBox">
            <a class="emp-imgs-link">
                <span class="imgFrm frmBig frmLeft">
                    <img class="emp-img-principal">
                </span>
                <span class="imgFrm frmMd frmTop">
                    <img class="emp-img-logo">
                </span>
                <span class="imgFrm frmMd frmBot">
                    <img class="emp-img-foto">
                </span>
            </a>
        </div>
        <strong class="imvFse emp-fase">Other Get_text 1</strong>
    </div>
    <div class="imvInf stCl3 stRight">
        <div class="infHd">
            <div class="hdLeft stCl2">
                <strong class="emp-nome infNme colorTxt"></strong>
                <span class="emp-loc-part1 infLoc">Other Get_text 2</span>
                <span class="emp-loc-part2 infLoc">Other Get_text 3</span>
            </div>
            <div class="hdRight stCl1">
                <em class="emp-valor-apartir">Other Get_text 4</em>
                <strong class="emp-valor infVlr colorTxt">Other Get_text 5</strong>
            </div>
        </div>
        <div class="infTxt">
            <p class="blackTxt60 emp-descritivo"></p>
            <ul>
                <li class="txtBed emp-un-dorms">Other Get_text 6</li>
                <li class="txtArea emp-un-area">Other Get_text 7</li>
                <li class="txtToilet emp-un-bath">Other Get_text 8</li>
                <li class="txtCar emp-un-park">Other Get_text 9</li>
            </ul>
        </div>
        <div class="infBt">
            <a href="/parceiro_cadastro" title="" class="btCadastrese stBt stBtLt colorBg whiteTxt rc9 sh15 emp-btn-cadastre">Other Get_text 10</a>
        </div>
    </div>
</div>
The following block:
<div class="lstImv blackBd12"></div>
wraps the other tags that hold the target text content, and it repeats several times (in the example, after editing, I kept only 2 occurrences).
Then, with this code:
<?php
$html = "exemplo_parse.html";
// Collect HTML parser warnings internally instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTMLFile($html);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXPath($dom);
// Select every repeating wrapper block
$content = $xpath->query('//div[@class="lstImv blackBd12"]');
foreach ($content as $span)
{
    echo "<pre>";
    print_r($span);
    echo "</pre>";
}
?>
I get 2 objects with their values:
DOMElement Object
(
[tagName] => div
[schemaTypeInfo] =>
[nodeName] => div
[nodeValue] =>
Get_text 1
Get_text 2
Get_text 3
Get_text 4
Get_text 5
Get_text 6
Get_text 7
Get_text 8
Get_text 9
Get_text 10
[nodeType] => 1
[parentNode] => (object value omitted)
[childNodes] => (object value omitted)
[firstChild] => (object value omitted)
[lastChild] => (object value omitted)
[previousSibling] =>
[nextSibling] => (object value omitted)
[attributes] => (object value omitted)
[ownerDocument] => (object value omitted)
[namespaceURI] =>
[prefix] =>
[localName] => div
[baseURI] =>
[textContent] =>
Get_text 1
Get_text 2
Get_text 3
Get_text 4
Get_text 5
Get_text 6
Get_text 7
Get_text 8
Get_text 9
Get_text 10
)
DOMElement Object
(
[tagName] => div
[schemaTypeInfo] =>
[nodeName] => div
[nodeValue] =>
Other Get_text 1
Other Get_text 2
Other Get_text 3
Other Get_text 4
Other Get_text 5
Other Get_text 6
Other Get_text 7
Other Get_text 8
Other Get_text 9
Other Get_text 10
[nodeType] => 1
[parentNode] => (object value omitted)
[childNodes] => (object value omitted)
[firstChild] => (object value omitted)
[lastChild] => (object value omitted)
[previousSibling] => (object value omitted)
[attributes] => (object value omitted)
[ownerDocument] => (object value omitted)
[namespaceURI] =>
[prefix] =>
[localName] => div
[baseURI] =>
[textContent] =>
Other Get_text 1
Other Get_text 2
Other Get_text 3
Other Get_text 4
Other Get_text 5
Other Get_text 6
Other Get_text 7
Other Get_text 8
Other Get_text 9
Other Get_text 10
)
So this is how I'm currently doing it:
<?php
$html = "exemplo_parse.html";
// Collect HTML parser warnings internally instead of printing them
libxml_use_internal_errors(true);
$dom = new DOMDocument('1.0', 'utf-8');
$dom->loadHTMLFile($html);
$dom->preserveWhiteSpace = false;
$xpath = new DOMXPath($dom);
// Each query below searches the whole document, one class at a time
$content = $xpath->query('//strong[@class="imvFse emp-fase"]');
foreach ($content as $span)
{
    echo "Key 1 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//strong[@class="emp-nome infNme colorTxt"]');
foreach ($content as $span)
{
    echo "Key 2 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part1 infLoc"]');
foreach ($content as $span)
{
    echo "Key 3 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//span[@class="emp-loc-part2 infLoc"]');
foreach ($content as $span)
{
    echo "Key 4 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtBed emp-un-dorms"]');
foreach ($content as $span)
{
    echo "Key 5 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtArea emp-un-area"]');
foreach ($content as $span)
{
    echo "Key 6 : ".$span->textContent."<br/>";
}
$content = $xpath->query('//li[@class="txtCar emp-un-park"]');
foreach ($content as $span)
{
    echo "Key 7 : ".$span->textContent."<br/>";
}
?>
I get the data this way:
Key 1 : Get_text 1
Key 1 : Other Get_text 1
Key 2 :
Key 2 :
Key 3 : Get_text 2
Key 3 : Other Get_text 2
Key 4 : Get_text 3
Key 4 : Other Get_text 3
Key 5 : Get_text 6
Key 5 : Other Get_text 6
Key 6 : Get_text 7
Key 6 : Other Get_text 7
Key 7 : Get_text 9
Key 7 : Other Get_text 9
In other words, each query runs over the whole document, so the output is grouped by key (k1, k1, k2, k2, ..., k7, k7). I would like the keys to come out sequentially for each block instead (k1, k2, ..., k7, k1, k2, ..., k7).
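To make the desired ordering concrete, here is a minimal sketch of one possible approach: loop over each wrapper div first, then run every field query relative to that div by passing it as the context node to DOMXPath::query. The inline HTML is a trimmed stand-in for exemplo_parse.html, and the $fields map and $rows array are names made up for this illustration, not part of the code above.

```php
<?php
// Trimmed stand-in for exemplo_parse.html (assumption: same class names, fewer tags).
$html = <<<'HTML'
<div class="lstImv blackBd12">
    <strong class="imvFse emp-fase">Get_text 1</strong>
    <strong class="emp-nome infNme colorTxt"></strong>
    <span class="emp-loc-part1 infLoc">Get_text 2</span>
    <span class="emp-loc-part2 infLoc">Get_text 3</span>
    <ul>
        <li class="txtBed emp-un-dorms">Get_text 6</li>
        <li class="txtArea emp-un-area">Get_text 7</li>
        <li class="txtCar emp-un-park">Get_text 9</li>
    </ul>
</div>
<div class="lstImv blackBd12">
    <strong class="imvFse emp-fase">Other Get_text 1</strong>
    <strong class="emp-nome infNme colorTxt"></strong>
    <span class="emp-loc-part1 infLoc">Other Get_text 2</span>
    <span class="emp-loc-part2 infLoc">Other Get_text 3</span>
    <ul>
        <li class="txtBed emp-un-dorms">Other Get_text 6</li>
        <li class="txtArea emp-un-area">Other Get_text 7</li>
        <li class="txtCar emp-un-park">Other Get_text 9</li>
    </ul>
</div>
HTML;

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
$xpath = new DOMXPath($dom);

// Hypothetical map: key label => XPath expression relative to one block.
// The leading "." restricts the search to descendants of the context node.
$fields = [
    'Key 1' => './/strong[@class="imvFse emp-fase"]',
    'Key 2' => './/strong[@class="emp-nome infNme colorTxt"]',
    'Key 3' => './/span[@class="emp-loc-part1 infLoc"]',
    'Key 4' => './/span[@class="emp-loc-part2 infLoc"]',
    'Key 5' => './/li[@class="txtBed emp-un-dorms"]',
    'Key 6' => './/li[@class="txtArea emp-un-area"]',
    'Key 7' => './/li[@class="txtCar emp-un-park"]',
];

$rows = [];
// Outer loop: one iteration per repeating wrapper block.
foreach ($xpath->query('//div[@class="lstImv blackBd12"]') as $block) {
    $row = [];
    // Inner loop: every key is looked up inside the current block only.
    foreach ($fields as $label => $expr) {
        $node = $xpath->query($expr, $block)->item(0);
        $row[$label] = $node ? trim($node->textContent) : '';
    }
    $rows[] = $row;
}

// Print all keys of the first block, then all keys of the second, and so on.
foreach ($rows as $row) {
    foreach ($row as $label => $text) {
        echo $label . " : " . $text . "<br/>\n";
    }
}
```

With this structure, Key 1 through Key 7 are printed for the first block before any key of the second block. The exact-match @class tests are copied from the queries above, so they only match when the class attribute has exactly that value.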
Sorry for my bad English, I'm still improving...