dougou2937 2017-02-13 14:21
Accepted

cURL puts the redirected URL into the browser's address bar

I am pretty new to cURL and have only been using it for a short time. I want to get the content of a page with cURL (file_get_contents() doesn't work). Unfortunately, the site in question has bot protection: when you first arrive, it checks whether you are a bot. If you are not, it redirects you to the real site using an absolute path (I guess). Whenever I load this site with cURL, that path is appended to my own server address.

For example: my server has the address http://examplepage.com/ and the redirected path ends up appended to my URL, so the result is something like http://examplepage.com/absolute/path?with=parameters

On the original site, which I am trying to get the content from, this works because that path exists there. On my server it does not (I only want some HTML content from their site).

Here is my code so far:

    <?php

    /* getting site */
    $website = "https://originalsite.com/?some=parameters";
    $redirectURL = null; // declared but not used yet

    function curl_download($url) {
        // Initialize the cURL handle
        $c = curl_init();

        // Include response headers in the result? (1 = yes, 0 = no)
        curl_setopt($c, CURLOPT_HEADER, 1);

        // Set the URL to download
        curl_setopt($c, CURLOPT_URL, $url);

        // Follow HTTP redirects (Location headers)
        curl_setopt($c, CURLOPT_FOLLOWLOCATION, 1);

        // Set the referer
        curl_setopt($c, CURLOPT_REFERER, "https://originalsite.com/");

        // User agent
        curl_setopt($c, CURLOPT_USERAGENT, "MozillaXYZ/1.0");

        // Should cURL return or print out the data? (true = return, false = print)
        curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);

        // Timeout in seconds
        curl_setopt($c, CURLOPT_TIMEOUT, 10);

        // Download the given URL and return the output
        $output = curl_exec($c);

        // Close the cURL handle and free system resources
        curl_close($c);

        return $output;
    }

    $content = curl_download($website);

    echo $content;

    ?>

So it enters the page that checks whether I am a bot, and after that it redirects me to the real site (or at least it tries to).

I have searched the internet and StackOverflow but I couldn't find an answer to my problem.

1 answer

  • drvxclagw656708070 2017-02-13 14:43

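    What is most likely happening is that the page contains JavaScript that issues the redirect once the page is rendered. cURL does not execute JavaScript, so it only hands you the HTML with the script in it. When you echo that HTML, your browser runs the script and resolves the redirect path against your own server. Try disabling JavaScript in your browser for a quick test.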
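
    If the test confirms a client-side redirect, one option is to look for the redirect target in the fetched HTML and request it from the original site yourself instead of echoing the page. The following is only a rough sketch: find_client_redirect() and resolve_against() are hypothetical helper names, the two patterns cover only simple meta-refresh and location assignments, and a real bot-protection page may obfuscate its script or require cookies.

```php
<?php
// Hypothetical helper: look for a simple client-side redirect in fetched HTML.
// Returns the raw target (possibly a bare path) or null if nothing is found.
function find_client_redirect(string $html): ?string
{
    // <meta http-equiv="refresh" content="0; url=/path">
    if (preg_match('/<meta[^>]+http-equiv=["\']?refresh["\']?[^>]*content=["\']\s*\d+\s*;\s*url=([^"\'>]+)/i', $html, $m)) {
        return trim($m[1]);
    }
    // window.location = "/path" or location.href = '/path'
    if (preg_match('/location(?:\.href)?\s*=\s*["\']([^"\']+)["\']/i', $html, $m)) {
        return $m[1];
    }
    // location.replace("/path") or location.assign("/path")
    if (preg_match('/location\.(?:replace|assign)\(\s*["\']([^"\']+)["\']/i', $html, $m)) {
        return $m[1];
    }
    return null;
}

// Hypothetical helper: resolve a redirect target against the URL it came from,
// so a bare path points at the original site rather than at your own server.
// Simplified: does not normalize "../" segments.
function resolve_against(string $baseUrl, string $target): string
{
    // Already absolute (has a scheme): use as is.
    if (preg_match('#^[a-z][a-z0-9+.-]*://#i', $target)) {
        return $target;
    }
    $p = parse_url($baseUrl);
    $origin = $p['scheme'] . '://' . $p['host'] . (isset($p['port']) ? ':' . $p['port'] : '');
    // Protocol-relative: //host/path
    if (strpos($target, '//') === 0) {
        return $p['scheme'] . ':' . $target;
    }
    // Root-relative: /path
    if (strpos($target, '/') === 0) {
        return $origin . $target;
    }
    // Relative to the directory of the base path
    $dir = isset($p['path']) ? substr($p['path'], 0, strrpos($p['path'], '/') + 1) : '/';
    return $origin . $dir . $target;
}

// Example with a made-up page body; in practice $html would be the cURL output.
$html = '<script>window.location = "/absolute/path?with=parameters";</script>';
$target = find_client_redirect($html);
if ($target !== null) {
    echo resolve_against('https://originalsite.com/?some=parameters', $target), PHP_EOL;
}
```

    The resolved URL could then be passed to curl_download() from the question for a second request. Whether the site accepts that request depends on how its bot check works, which this sketch does not attempt to handle.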

    The asker selected this answer as the best answer.