I have searched the existing questions here and on Google and could not find an answer.
My PHP script stops running partway through, at a different point each time I run it. I don't think the cURL call is the problem, because the script sometimes stops before the call and sometimes after it. I also don't think it's a logic error, because the code produces correct results up to the point where it stops. My guess is a timeout imposed by the shared hosting.
The script does web scraping with the simple_html_dom library and cURL. I'm running it on shared web hosting (HostGator). I also tried running it as a cron job, and that didn't work either.
I've already set the following at the beginning of the script (and changed the same settings in php.ini), but it didn't help:
ignore_user_abort(true);
set_time_limit(0);
ini_set('max_execution_time', 0);
ini_set('memory_limit',-1);
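To check whether these settings are really applied on the host, and whether PHP itself reports an error when the script dies, something like the following could be placed right after them. This is only a minimal sketch: the log file name `scrape_debug.log` and the helper `log_line()` are hypothetical names, and it assumes the script's directory is writable.

```php
<?php
// Hypothetical log location (an assumption); any writable path would do.
$logFile = __DIR__ . '/scrape_debug.log';

// Hypothetical helper: append one timestamped line to the log file.
function log_line($file, $message) {
    file_put_contents($file, date('c') . ' ' . $message . PHP_EOL, FILE_APPEND);
}

// Record the values PHP is actually using after the ini_set() calls.
log_line($logFile, 'sapi=' . PHP_SAPI
    . ' max_execution_time=' . ini_get('max_execution_time')
    . ' memory_limit=' . ini_get('memory_limit'));

// Runs on normal exit and on fatal errors (for example a time or memory limit),
// but not when the process is killed from outside with SIGKILL.
register_shutdown_function(function () use ($logFile) {
    $error = error_get_last();
    log_line($logFile, 'shutdown: '
        . ($error === null ? 'no PHP error recorded' : json_encode($error)));
});
```

If the log shows the expected values and no "shutdown" line is written when the script stops, the process is probably being terminated from outside PHP rather than by one of PHP's own limits.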
The full code (shortened a bit; in the original I use several different dates and call the 'scrap' function many times):
require('simple_html_dom.php');
//get today's date
$today = date('Y-m-d');
if (date('H') < '9') {
$date_period = "today";
$date_period_date = date('Y-m-d');
$puDay = date('j');
$puMonth = date('n');
$puYear = date('Y');
$doDay = date('j', strtotime(' + 1 days'));
$doMonth = date('n', strtotime(' + 1 days'));
$doYear = date('Y', strtotime(' + 1 days'));
scrap($puDay,$puMonth,$puYear,$doDay,$doMonth,$doYear,$date_period, $today, $date_period_date, $location_id,$location,$city);
unset($date_period,$date_period_date,$puDay,$puMonth,$puYear,$doDay,$doMonth,$doYear);
}
//functions
function scrap($puDay_aux, $puMonth_aux, $puYear_aux, $doDay_aux, $doMonth_aux, $doYear_aux, $period_id_aux, $curDate_aux, $periodDate_aux, $location_id_aux,$location_aux,$city_aux){
$bad_proxy = "";
$check = 1;
do{
$link = "my link";
$best_proxy = get_best_proxy($link, $bad_proxy);
$scraped_page = curl($link, $best_proxy);
$html = new simple_html_dom();
$html->load($scraped_page);
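$check_end = strpos($scraped_page,'</html>'); // look for the closing tag in the raw response string
if(!empty($scraped_page)) { // an object is never empty(), so test the raw response instead of $html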
if ($check_end===FALSE) {
$check = $check + 1;
$bad_proxy = $best_proxy;
} else {
foreach($html->find('table[class=ResultRow]') as $element)
{
$supplier = $element->find('h4',0);
unset($supplier,$supplier_aux,$car,$car_aux,$price,$price_aux,$priceBRL);
}
$html->clear();
unset($link,$html,$best_proxy,$stream,$context);
$check = 5;
}
} else {
$check = $check + 1;
}
} while ($check<5);
}
function get_best_proxy($link, $bad_proxy){
$proxy_array = array(
'177.184.144.130:8080',
'177.6.147.202:8080',
'187.44.1.167:8080',
'170.82.228.42:8080',
'177.72.1.102:8080',
'138.185.101.20:8080',
'187.102.149.178:8080',
'177.32.12.127:8080',
'189.38.3.9:8080',
'138.185.101.21:8080'
);
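$proxy_speed_result = array();
// Key the results by the proxy's own index so the lookup below stays aligned even when $bad_proxy is skipped
foreach ($proxy_array as $index => $proxy){
if ($proxy != $bad_proxy) {
$proxy_speed = proxy_speed($proxy, $link);
$proxy_speed_result[$index] = $proxy_speed;
if ($proxy_speed<9999999){break;} // stop at the first proxy that responds
}
}
$min = array_keys($proxy_speed_result, min($proxy_speed_result));
$min_aux = $min[0];
$proxy_output = $proxy_array[$min_aux];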
return($proxy_output);
}
function proxy_speed($proxy, $link) {
$link = "my link here";
$loadingtime = time();
$theHeader = curl_init($link);
curl_setopt($theHeader, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($theHeader, CURLOPT_FOLLOWLOCATION, 1);
curl_setopt($theHeader, CURLOPT_AUTOREFERER, 1);
curl_setopt($theHeader, CURLOPT_MAXREDIRS, 10);
curl_setopt($theHeader, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8");
curl_setopt($theHeader, CURLOPT_HTTPPROXYTUNNEL, 1);
curl_setopt($theHeader, CURLOPT_SSL_VERIFYPEER, 0);
curl_setopt($theHeader, CURLOPT_SSL_VERIFYHOST, 0);
curl_setopt($theHeader, CURLOPT_CONNECTTIMEOUT, 10);
curl_setopt($theHeader, CURLOPT_TIMEOUT, 10);
curl_setopt($theHeader, CURLOPT_PROXY, $proxy);
$curlResponse = curl_exec($theHeader);
if ($curlResponse === false)
{
return 9999999;
}
else
{
return (time() - $loadingtime);
}
}
function curl($url, $proxy) {
$options = Array(
CURLOPT_RETURNTRANSFER => TRUE, // Setting cURL's option to return the webpage data
CURLOPT_FOLLOWLOCATION => TRUE, // Setting cURL to follow 'location' HTTP headers
CURLOPT_AUTOREFERER => TRUE, // Automatically set the referer where following 'location' HTTP headers
CURLOPT_CONNECTTIMEOUT => 300, // Maximum time (in seconds) to wait while establishing the connection
CURLOPT_TIMEOUT => 300, // Maximum time (in seconds) the whole cURL transfer is allowed to take
CURLOPT_MAXREDIRS => 10, // Setting the maximum number of redirections to follow
CURLOPT_USERAGENT => "Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1a2pre) Gecko/2008073000 Shredder/3.0a2pre ThunderBrowse/3.2.1.8", // Setting the useragent
CURLOPT_URL => $url, // Setting cURL's URL option with the $url variable passed into the function
CURLOPT_HTTPPROXYTUNNEL => 1,
CURLOPT_SSL_VERIFYPEER => false,
CURLOPT_SSL_VERIFYHOST => false,
CURLOPT_PROXY => $proxy
);
$ch = curl_init(); // Initialising cURL
curl_setopt_array($ch, $options); // Setting cURL's options using the previously assigned array data in $options
$data = curl_exec($ch); // Executing the cURL request and assigning the returned data to the $data variable
$httpCode = curl_getinfo($ch , CURLINFO_HTTP_CODE); // The status code is only available after curl_exec()
if ($data === false) $data = curl_error($ch);
curl_close($ch); // Close the handle before returning; a statement after return never runs
return stripslashes($data);
}
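For reference, here is a rough upper bound on how long a single scrap() call can block, using only the timeouts and loop limits from the code above. It is a back-of-the-envelope sketch, assuming every proxy probe and every page fetch runs until its CURLOPT_TIMEOUT expires; the variable names are made up for the illustration.

```php
<?php
// Values copied from the code above; the variable names are illustrative only.
$proxyCount   = 10;  // entries in $proxy_array
$probeTimeout = 10;  // CURLOPT_TIMEOUT in proxy_speed(), in seconds
$fetchTimeout = 300; // CURLOPT_TIMEOUT in curl(), in seconds
$maxAttempts  = 4;   // the do/while starts at $check = 1 and stops at 5

// One attempt = probing every proxy until timeout, then one page fetch.
$perAttempt = $proxyCount * $probeTimeout + $fetchTimeout;
$worstCase  = $maxAttempts * $perAttempt;

echo "Worst case per attempt: {$perAttempt} seconds\n";     // 400
echo "Worst case per scrap() call: {$worstCase} seconds\n"; // 1600
```

So one call can take up to about 27 minutes in the worst case, and the original script calls scrap() many times.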
Does anyone know what's happening here? Is my web host timing out the script? Thanks!