I'm crawling a few websites and everything is working fine,
but there is one specific website that makes a few "redirects" before landing on the page I want.
So it's something like:
http://www.example.com/?day=01/01/2016&action=search_prices
This goes to http://www.example.com/search/default.aspx, takes a few seconds to run the search, and then shows the results on that same page.
Is there an easy way to do this? Any hint, clue, etc. would be awesome.
Simple code right now (almost all the sites I was crawling returned JSON):
function get_web_page( $url ){
    $options = array(
        CURLOPT_RETURNTRANSFER => true,     // return web page
        CURLOPT_HEADER         => false,    // don't return headers
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_ENCODING       => "",       // handle all encodings
        CURLOPT_USERAGENT      => "spider", // who am i
        CURLOPT_AUTOREFERER    => true,     // set referer on redirect
        CURLOPT_CONNECTTIMEOUT => 120,      // timeout on connect
        CURLOPT_HTTPHEADER     => array('HeaderName: HeaderValue'),
        CURLOPT_TIMEOUT        => 120,      // timeout on response
        CURLOPT_MAXREDIRS      => 10,       // stop after 10 redirects
        CURLOPT_SSL_VERIFYPEER => false     // Disabled SSL Cert checks
    );

    $ch = curl_init( $url );
    curl_setopt_array( $ch, $options );
    $content = curl_exec( $ch );
    $err     = curl_errno( $ch );
    $errmsg  = curl_error( $ch );
    $header  = curl_getinfo( $ch );
    curl_close( $ch );

    $header['errno']   = $err;
    $header['errmsg']  = $errmsg;
    $header['content'] = $content;
    return $header;
}
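
One guess on my side: if the intermediate page redirects with a meta refresh tag rather than an HTTP 3xx response (this is an assumption, I have not confirmed what the site actually does), then CURLOPT_FOLLOWLOCATION would not follow it. Below is a minimal sketch of a hypothetical helper, find_meta_refresh_url, that pulls the target out of the returned HTML so I could request it myself. The sample HTML is made up for illustration.

```php
<?php
// Hypothetical helper, not part of the code above.
// Looks for <meta http-equiv="refresh" content="N; url=..."> and returns the
// target URL, or null if no such tag is found.
// Assumption: http-equiv appears before content inside the tag.
function find_meta_refresh_url(string $html): ?string {
    $pattern = '/<meta[^>]+http-equiv\s*=\s*["\']?refresh["\']?[^>]*'
             . 'content\s*=\s*["\']\s*\d+\s*;\s*url\s*=\s*["\']?([^"\'>]+)/i';
    if (preg_match($pattern, $html, $m)) {
        // Decode entities such as &amp; that may appear inside the attribute.
        return html_entity_decode(trim($m[1]));
    }
    return null;
}

// Made-up sample page, only to show the call.
$sample = '<html><head>'
        . '<meta http-equiv="refresh" content="3; url=/search/default.aspx">'
        . '</head><body>Searching...</body></html>';

$next = find_meta_refresh_url($sample);
echo $next === null ? "no meta refresh found\n" : "next URL: $next\n";
```

The idea would be to call get_web_page(), run the helper on $header['content'], and request the returned URL again if it is not null (resolving it against the current URL when it is relative). If the site redirects through JavaScript instead, this would not help.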