I've been trying to learn how to use XPath queries from this video: https://www.youtube.com/watch?v=632ql93H90g
Now that I'm starting to understand the basics, I wanted to take it a bit further and write a nested loop that extracts nested elements and then categorizes them. I've been using Craigslist as an example because the video uses it, and because the data I want is listed on their "sites" page.
I had to rewrite this because my earlier version had an infinite loop. If anyone knows a better way of writing this I would love the input, but this is what I have.
All I'm trying to do is get my results into the following format:
Country - State - CityNameTEXT - CityNameHREF
where CityNameHREF is the link to the city.
Right now I just print_r the results of the inner query that holds the actual cities, since the markup from Craigslist looks like this:
<h1>CountryName</h1>
<div class="colmask">
    <div>
        <h4>StateName</h4>
        <ul>
            <li>
                <a href="CityNameHREF">CityName</a>
            </li>
            <li>
                <a href="CityNameHREF">CityName</a>
            </li>
            <li>
                <a href="CityNameHREF">CityName</a>
            </li>
            <li>
                <a href="CityNameHREF">CityName</a>
            </li>
        </ul>
    </div>
</div>
As you can see, the nesting is fairly deep. I have been trying for literally 12 hours to get this to work. This is the closest I've gotten: it displays the nodeValue of each UL, which holds the actual city names. But I have no clue how to get these cities to display in the format I listed above.
Now on to the code I have...
$url = 'http://www.craigslist.org/about/sites';
$output = file_get_contents($url);
$doc = new DOMDocument();
libxml_use_internal_errors(true); // Suppress warnings for HTML5 conversion issues
$doc->loadHTML($output);
libxml_use_internal_errors(false); // Start showing errors again
$xpath = new DOMXPath($doc);
foreach ($xpath->query('//h1') as $e)
{
    $country = $e->nodeValue;
    $list = array();
    foreach ($xpath->query('//div[@class="colmask"]/div', $e) as $li)
    {
        $state = $li->nodeValue;
        echo "<pre>";
        $result = $xpath->query('//div[@class="colmask"]/div/ul', $e);
        for ($i = 0; $i <= 10; $i++) // Capped at the first few items so it doesn't lag out
        {
            print_r($result->item($i)); // Displays the UL nodeValue
        }
    }
}
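For comparison, here is a minimal sketch of one possible approach, not a definitive fix. It assumes the markup really follows the structure shown above (an h1, followed by a sibling div with class "colmask", with each h4 directly followed by its ul). The inline sample HTML, the example.com-style URLs, and the $rows variable are placeholders invented for illustration. The main difference from the code above is that the inner queries are relative to the context node: an expression that starts with // searches from the document root even when a context node is passed, which is why every h1 sees the same divs.

```php
<?php
// Placeholder markup mirroring the structure described in the question
// (invented sample data, not real Craigslist output).
$html = <<<HTML
<h1>CountryA</h1>
<div class="colmask">
  <div>
    <h4>StateOne</h4>
    <ul>
      <li><a href="http://cityone.example/">CityOne</a></li>
      <li><a href="http://citytwo.example/">CityTwo</a></li>
    </ul>
  </div>
</div>
<h1>CountryB</h1>
<div class="colmask">
  <div>
    <h4>StateTwo</h4>
    <ul>
      <li><a href="http://citythree.example/">CityThree</a></li>
    </ul>
  </div>
</div>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true); // Collect parser warnings instead of printing them
$doc->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($doc);

$rows = array();
foreach ($xpath->query('//h1') as $h1) {
    $country = trim($h1->nodeValue);

    // Nearest colmask div that follows this h1 as a sibling (relative to $h1).
    $colmask = $xpath->query('following-sibling::div[@class="colmask"][1]', $h1)->item(0);
    if ($colmask === null) {
        continue; // This h1 has no city listing after it
    }

    // './/' keeps the search inside $colmask instead of the whole document.
    foreach ($xpath->query('.//h4', $colmask) as $h4) {
        $state = trim($h4->nodeValue);

        // The first ul after this h4 holds the cities for that state.
        foreach ($xpath->query('following-sibling::ul[1]/li/a', $h4) as $a) {
            $rows[] = array($country, $state, trim($a->nodeValue), $a->getAttribute('href'));
        }
    }
}

// Country - State - CityNameTEXT - CityNameHREF
foreach ($rows as $row) {
    echo implode(' - ', $row), "\n";
}
```

To try this against the live page, $html would be replaced with the result of file_get_contents($url); whether the real page matches the assumed structure closely enough is something to check by hand.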