dougou2937 2017-02-13 14:21

cURL puts the redirected URL into the browser's address bar

I am fairly new to cURL and have only been using it for a short time. I want to fetch the content of a page with cURL (file_get_contents() doesn't work). Unfortunately, the site in question has bot protection: when you first arrive, it checks whether you are a bot. If you are not, it redirects you to the real site using an absolute path (I guess). Whenever I load this site through cURL, that path gets appended to my own server's address.

For example, my server has the address http://examplepage.com/. The redirected path is appended to my URL, so the browser ends up at something like http://examplepage.com/absolute/path?with=parameters

On the original site this works because that path exists there, but it does not exist on my server (I only want some HTML content from their site).
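To make the symptom concrete, here is a minimal sketch of how a root-relative path ends up on whichever host served the page. The helper name `resolve_against_origin` is hypothetical and only handles root-relative paths; it is an illustration of the resolution rule, not a full URL resolver.

```php
<?php
// Hypothetical helper: resolve a root-relative path the way a browser would,
// against the origin (scheme, host, optional port) of the page being displayed.
function resolve_against_origin(string $pageUrl, string $path): string
{
    $parts = parse_url($pageUrl);
    $origin = $parts['scheme'] . '://' . $parts['host'];
    if (isset($parts['port'])) {
        $origin .= ':' . $parts['port'];
    }
    return $origin . '/' . ltrim($path, '/');
}

$path = '/absolute/path?with=parameters';

// Page echoed by my own script: the path lands on my server.
echo resolve_against_origin('http://examplepage.com/', $path), PHP_EOL;

// Same path resolved against the site the HTML came from.
echo resolve_against_origin('https://originalsite.com/?some=parameters', $path), PHP_EOL;
```

The first call yields the broken address from the example above; the second shows where the path was meant to point.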

Here is my code so far:

    <?php

    /* URL of the page to fetch */
    $website = "https://originalsite.com/?some=parameters";
    $redirectURL = null; // declared but never used below

    function curl_download($url) {
        // Initialize the cURL handle
        $c = curl_init();

        // Include the response headers in the result? (1 = yes, 0 = no)
        curl_setopt($c, CURLOPT_HEADER, 1);

        // Set the URL to download
        curl_setopt($c, CURLOPT_URL, $url);

        // Follow HTTP redirects (Location headers)
        curl_setopt($c, CURLOPT_FOLLOWLOCATION, 1);

        // Set the referer
        curl_setopt($c, CURLOPT_REFERER, "https://originalsite.com/");

        // User agent
        curl_setopt($c, CURLOPT_USERAGENT, "MozillaXYZ/1.0");

        // Should cURL return or print the data? (true = return, false = print)
        curl_setopt($c, CURLOPT_RETURNTRANSFER, 1);

        // Timeout in seconds
        curl_setopt($c, CURLOPT_TIMEOUT, 10);

        // Download the given URL and keep the output (false on failure)
        $output = curl_exec($c);

        // Close the cURL handle and free system resources
        curl_close($c);

        return $output;
    }

    $content = curl_download($website);

    echo $content;

    ?>

So it enters the page that checks whether I am a bot, and after that it redirects me to the real site (or at least it tries to).
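One way to check whether cURL itself ever saw an HTTP redirect: with both CURLOPT_HEADER and CURLOPT_FOLLOWLOCATION enabled, the returned string contains the header block of every response in the chain, so any `Location:` headers can be listed. The sketch below assumes that setup; `extract_locations` is a hypothetical helper and the sample response is made up for illustration.

```php
<?php
// Hypothetical helper: list every Location header found in raw cURL output
// that was fetched with CURLOPT_HEADER enabled.
// Caveat: a body line that happens to start with "Location:" would also match.
function extract_locations(string $raw): array
{
    preg_match_all('/^Location:\s*(\S+)/mi', $raw, $matches);
    return $matches[1];
}

// Made-up sample: one HTTP redirect followed by the final response.
$raw = "HTTP/1.1 302 Found\r\nLocation: https://originalsite.com/check\r\n\r\n"
     . "HTTP/1.1 200 OK\r\nContent-Type: text/html\r\n\r\n<html>...</html>";

print_r(extract_locations($raw));
```

If the list comes back empty for the real page, the redirect is probably not happening at the HTTP level, which would explain why CURLOPT_FOLLOWLOCATION has no effect on it.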

I have searched the internet and StackOverflow but I couldn't find an answer to my problem.

1 answer

  • drvxclagw656708070 2017-02-13 14:43

    What's happening is that there is some JavaScript code issuing a redirect once you render the page. Try disabling JavaScript in your browser for a quick test.

    This answer was selected by the asker as the best answer.
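Following up on the JavaScript explanation: cURL does not execute scripts, so a client-side redirect only fires once the fetched HTML is echoed into a browser, where relative targets resolve against the script's own host. A rough way to confirm this is to scan the fetched HTML for common redirect patterns. The sketch below is a heuristic under that assumption; `find_client_redirect` is a hypothetical helper, it only recognizes a few literal patterns, and real bot-protection pages may build the target dynamically.

```php
<?php
// Hypothetical helper: return the target of a simple client-side redirect
// found in an HTML document, or null when none of the patterns match.
function find_client_redirect(string $html): ?string
{
    $patterns = [
        // window.location = "..." or location.href = '...'
        '/\blocation(?:\.href)?\s*=\s*["\']([^"\']+)["\']/i',
        // location.replace("...") or location.assign("...")
        '/\blocation\.(?:replace|assign)\(\s*["\']([^"\']+)["\']/i',
        // <meta http-equiv="refresh" content="0; url=...">
        '/<meta[^>]+http-equiv=["\']refresh["\'][^>]*content=["\'][^"\']*url=([^"\'>]+)/i',
    ];
    foreach ($patterns as $pattern) {
        if (preg_match($pattern, $html, $m)) {
            return trim($m[1]);
        }
    }
    return null;
}

// Made-up page body resembling the situation in the question.
$html = '<script>window.location = "/absolute/path?with=parameters";</script>';
var_dump(find_client_redirect($html));
```

If a target is found, requesting it on the original host with a second cURL call (rather than echoing the intermediate page) may get further, although the protection page could also set cookies or tokens that this sketch ignores.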
