I am pretty new to cURL.
I want to get the content of a page using cURL (file_get_contents() doesn't work). Unfortunately, the site in question has bot protection: when you first arrive, it checks whether you are a bot. If you are not, it redirects you to the real site using an absolute path (I guess).
Whenever I load this site with cURL, the redirect path ends up appended to my own server's address.
For example, my server's address is http://examplepage.com/, and after the redirect I end up at something like http://examplepage.com/absolute/path?with=parameters.
On the original site this works, because that path exists there, but it does not exist on my server. (All I want is some of the HTML content of their site.)
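A likely cause (an assumption, since the site's actual response isn't shown here): the redirect target is a root-relative path such as /absolute/path?with=parameters. When the fetched HTML is echoed from http://examplepage.com/, the browser resolves that path against examplepage.com rather than the original site. A minimal sketch of resolving such a target against the URL that was originally requested; the function name resolve_against is hypothetical, and it does not handle "../" segments or query-only/fragment-only references:

```php
<?php
// Hypothetical helper: turn a redirect target into a full URL on the ORIGINAL site.
// Handles absolute, protocol-relative ("//host/..."), root-relative ("/path")
// and simple relative ("page") targets; no "../" normalisation.
function resolve_against(string $base, string $target): string
{
    // Already absolute (has a scheme such as https://): use as-is.
    if (parse_url($target, PHP_URL_SCHEME) !== null) {
        return $target;
    }
    $parts = parse_url($base);
    // Protocol-relative: reuse only the scheme of the base URL.
    if (substr($target, 0, 2) === '//') {
        return $parts['scheme'] . ':' . $target;
    }
    // scheme://host[:port] of the site that was originally requested.
    $origin = $parts['scheme'] . '://' . $parts['host']
        . (isset($parts['port']) ? ':' . $parts['port'] : '');
    // Root-relative: attach to the original site's origin, not your server.
    if (substr($target, 0, 1) === '/') {
        return $origin . $target;
    }
    // Simple relative path: resolve against the directory of the base path.
    $path = $parts['path'] ?? '/';
    $dir = substr($path, 0, strrpos($path, '/') + 1);
    return $origin . $dir . $target;
}
```

For example, resolve_against("https://originalsite.com/?some=parameters", "/absolute/path?with=parameters") gives https://originalsite.com/absolute/path?with=parameters, which could then be passed to curl_download().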
Here is my code so far:
<?php
/* getting site */
$website = "https://originalsite.com/?some=parameters";

function curl_download($url) {
    // Initialize the cURL handle
    $c = curl_init();
    // Include the response headers in the result? (true = yes, false = no)
    curl_setopt($c, CURLOPT_HEADER, true);
    // Set the URL to download
    curl_setopt($c, CURLOPT_URL, $url);
    // Follow HTTP redirects (Location headers)
    curl_setopt($c, CURLOPT_FOLLOWLOCATION, true);
    // Set the referer
    curl_setopt($c, CURLOPT_REFERER, "https://originalsite.com/");
    // User agent
    curl_setopt($c, CURLOPT_USERAGENT, "MozillaXYZ/1.0");
    // Should cURL return or print out the data? (true = return, false = print)
    curl_setopt($c, CURLOPT_RETURNTRANSFER, true);
    // Timeout in seconds
    curl_setopt($c, CURLOPT_TIMEOUT, 10);
    // Download the given URL; curl_exec() returns false on failure
    $output = curl_exec($c);
    if ($output === false) {
        trigger_error("cURL error: " . curl_error($c), E_USER_WARNING);
    }
    // Close the cURL handle and free its resources
    curl_close($c);
    return $output;
}

$content = curl_download($website);
echo $content;
?>
So it reaches the page that checks whether I am a bot, and after that it redirects me to the real site (or at least it tries to).
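CURLOPT_FOLLOWLOCATION only follows HTTP redirects sent via a Location header. If the bot check instead redirects with a meta refresh tag or with JavaScript, cURL stays on the check page; cURL does not execute JavaScript at all. Calling curl_getinfo($c, CURLINFO_EFFECTIVE_URL) after curl_exec() returns the last URL cURL actually fetched, which shows whether a redirect was followed. For the meta-refresh case, a sketch that pulls out the target URL; the function name find_meta_refresh_url is hypothetical, it assumes a standard <meta http-equiv="refresh"> tag, and it needs PHP's DOM extension:

```php
<?php
// Hypothetical helper: return the target of <meta http-equiv="refresh" content="N; url=...">
// in the given HTML, or null if there is no such tag. A JavaScript redirect is NOT detected.
function find_meta_refresh_url(string $html): ?string
{
    if ($html === '') {
        return null; // DOMDocument::loadHTML() rejects empty input
    }
    $doc = new DOMDocument();
    // Real-world HTML is rarely valid; collect libxml warnings instead of printing them.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    foreach ($doc->getElementsByTagName('meta') as $meta) {
        if (strcasecmp($meta->getAttribute('http-equiv'), 'refresh') !== 0) {
            continue;
        }
        // content looks like "0; url=/path" and the url= value may be quoted.
        if (preg_match('/url\s*=\s*[\'"]?([^\'"]+)/i', $meta->getAttribute('content'), $m)) {
            return trim($m[1]);
        }
    }
    return null;
}
```

If this returns a root-relative path such as /absolute/path?with=parameters, it still has to be resolved against https://originalsite.com/ before being requested with curl_download().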
I have searched the internet and Stack Overflow but couldn't find an answer to my problem.