I'm trying to extract (scrape) the ZIP links and their corresponding dates from the Release tab of the link below:
I am able to extract the ZIP links using the PHP code below:
preg_match_all('/<ul class=\"rpRootGroup\">(.*?)<\/ul/s', $specpage, $zipul);
$specul = new DOMDocument;
@$specul->loadHTML($zipul[0][0]);
$specul->preserveWhiteSpace = true;
$xpathspecul = new DOMXPath($specul);
$rowsUL = $xpathspecul->query('//tr');
$resultul = array();
$zipf = array();
$zipuni = array();
foreach ($rowsUL as $rowul) {
    $colsul = $rowul->getElementsByTagName('td');
    foreach ($colsul as $colul) {
        if ($xpathspecul->evaluate('count(.//a)', $colul) > 0) { // check if an anchor exists
            $slinkul = $xpathspecul->evaluate('string(.//a/@href)', $colul); // if there is, store the href value
        }
        if (isset($slinkul) && $slinkul != null) {
            $resultul[] = $slinkul;
        }
    }
}
// keep only the links that point to .zip files
foreach ($resultul as $ziplink) {
    $chkzip = pathinfo($ziplink, PATHINFO_EXTENSION);
    if ($chkzip == 'zip' && $ziplink !== null) {
        $zipf[] = trim($ziplink);
    }
}
$zipuni = array_values(array_unique($zipf));
Here, $specpage contains the HTML of the page, loaded using cURL.
[Image: sample of the aforementioned ZIP link and date]
However, I am not able to extract the corresponding dates.
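To make the goal concrete, here is a minimal sketch of reading the link and the date from the same table row. The real page markup is not shown here, so the sample HTML, the assumption that the date sits in the second `<td>` of each row, and the names `$html` and `$pairs` are all hypothetical.

```php
<?php
// Hypothetical sample markup: the column order (link first, date second)
// is an assumption, not taken from the real page.
$html = <<<HTML
<table>
  <tr><td><a href="/files/spec_v1.zip">v1</a></td><td>2015-01-10</td></tr>
  <tr><td><a href="/files/spec_v1.zip">v1</a></td><td>2015-03-02</td></tr>
  <tr><td><a href="/files/notes.pdf">notes</a></td><td>2015-03-05</td></tr>
  <tr><td>No link here</td><td>2015-04-01</td></tr>
</table>
HTML;

$doc = new DOMDocument();
libxml_use_internal_errors(true);   // collect HTML parse warnings instead of suppressing with @
$doc->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$pairs = array();

foreach ($xpath->query('//tr') as $row) {
    // href of the first anchor in this row; empty string if the row has none
    $href = trim($xpath->evaluate('string(.//a/@href)', $row));
    if ($href === '' || strtolower(pathinfo($href, PATHINFO_EXTENSION)) !== 'zip') {
        continue; // skip rows without a .zip link
    }
    // Assumed position of the date: text of the second <td> in the same row
    $date = trim($xpath->evaluate('string(./td[2])', $row));
    $pairs[] = array('zip' => $href, 'date' => $date);
}

print_r($pairs);
```

Because the link and the date are read relative to the same `$row`, they stay paired even when some rows have no anchor at all.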
Further, I am having a problem with 'array_unique': the same ZIP link can appear more than once with a different corresponding date, so deduplicating on the link alone would lose those entries. However, without 'array_unique' I get many duplicate links.
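One likely source of the repeats is that `$slinkul` is never reset inside the inner loop, so a `<td>` without an anchor re-appends the link found in an earlier cell. Apart from that, deduplicating on the (link, date) pair rather than on the link alone would keep entries that share a link but differ in date. The following is only a sketch of that idea; the `$pairs` data and the `zip`/`date` keys are hypothetical.

```php
<?php
// Hypothetical (link, date) pairs, as they might come out of the row loop
$pairs = array(
    array('zip' => '/files/spec_v1.zip', 'date' => '2015-01-10'),
    array('zip' => '/files/spec_v1.zip', 'date' => '2015-01-10'), // exact repeat
    array('zip' => '/files/spec_v1.zip', 'date' => '2015-03-02'), // same link, different date
    array('zip' => '/files/spec_v2.zip', 'date' => '2015-03-02'),
);

$seen = array();    // $seen[link][date] marks pairs already emitted
$unique = array();

foreach ($pairs as $pair) {
    // A pair counts as a duplicate only if both the link and the date match
    if (!isset($seen[$pair['zip']][$pair['date']])) {
        $seen[$pair['zip']][$pair['date']] = true;
        $unique[] = $pair;
    }
}

print_r($unique);
```

With this sample input the exact repeat is dropped, while the second entry for `spec_v1.zip` survives because its date differs.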
Any help is appreciated.