Code for scraping a train list from a site:
<?php
require_once("db.php");
$url = 'http://www.indianrail.gov.in/mail_express_trn_list.html';
$ch = curl_init($url);
set_time_limit(600);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
$result = curl_exec($ch);
//echo $result;
error_reporting(E_ERROR | E_PARSE);
$dom = new DOMDocument('1.0', 'UTF-8');
$dom->loadHTML($result);
if($dom)
{
    $xpath = new DOMXPath($dom);
    $q = '///*[contains(concat(" ", normalize-space(@class), " "), " table_border ")]/tr';
    $nodes = $xpath->query($q);
    foreach($nodes as $tr){ // DOMNodeList implements Traversable
        echo "<br>";
        $tds = $tr->childNodes;
        $i = 0;
        foreach($tds as $td){
            $arr[$i]=$td->nodeValue;
            $i++;
        }
        var_dump($arr);
        //echo"<br><=====><br>";
        //echo ""
        mysqli_query($con,"INSERT INTO `irl`.`trains` (`TrainNo`, `TrainName`, `Origin`, `DepartureTime`, `Destination`, `ArrivalTime`)
            VALUES ('$arr[0]', '$arr[1]', '$arr[2]', '$arr[3]', '$arr[4]', '$arr[5]');") or die(mysqli_error($con));
    }
}
else
    echo "invalid DOMDocument <br>";
This inserts 0 in the first column (TrainNo) of the database, even though $arr[0] contains the right values (I checked the var_dump($arr) output). What is going on? $arr is populated with the right values in each iteration. The database field is an INT with size 8.
Changing the field to VARCHAR does fix it, but if the datatype is the problem, why does the first row (train number 2696) get inserted correctly?
Example:
arr[0]=> 09705
arr[1]=>JP DEE AC EXP
arr[2]=>JAIPUR
arr[3]=>07:55
arr[4]=>DELHI S ROHILLA
arr[5]=>13:20
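Note the leading space before 09705 in arr[0] above. One check that might narrow this down, as a minimal sketch only: dump the raw bytes of the value to see whether that "space" is really an invisible character such as a UTF-8 non-breaking space (bytes C2 A0). That is an assumption, not something confirmed by the output above. The helper name cleanTrainNo and the sample value below are hypothetical and not part of the script.

```php
<?php
// Hypothetical helper (not in the original script): keeps only ASCII digits
// so the value can be cast to an integer regardless of surrounding bytes.
function cleanTrainNo(string $raw): int
{
    // Without the /u modifier, \D matches any single non-digit byte,
    // which includes both bytes of a UTF-8 non-breaking space (C2 A0).
    $digits = preg_replace('/\D/', '', $raw);
    return (int) $digits;
}

// Assumed sample value: a non-breaking space followed by the train number.
$sample = "\xC2\xA009705";

// bin2hex exposes bytes that var_dump renders as an ordinary-looking space.
echo bin2hex($sample), "\n";      // c2a03039373035
echo cleanTrainNo($sample), "\n"; // 9705
```

If bin2hex($arr[0]) shows anything other than the digit bytes 30 to 39, the INT column is probably receiving a string that does not start with a digit, which would be consistent with VARCHAR accepting the value while INT stores 0.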
Ignore the first row: it contains the table headers, so it is understandable that it holds 0:0 for the times and 0 for the train number. The rows below it should be fine, though. In the highlighted row the first column should have been 9705, but it holds 0:
EDIT: including screenshots for convenience:
Screenshots of the var_dump of $arr in the loop, followed by the DB rows in phpMyAdmin: