dth42345 2016-04-21 11:09
浏览 75
已采纳

什么相当于SCRAPY中的CURL

I want to scrape a website by SCRAPY with AJAX PAGINATION, i scraped this web site by PHP by using CURL, i monitored the network by Firebug, with firebug we have a option "Copy for CURL" for POST REQUEST. My question is how can i do the same for SCRAPY.

my function in PHP:

   function forCurl($url,$refer, $jsessionid){
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:34.0) Gecko/20100101 Firefox/34.0');
    $header[0] = "Accept: text/xml,application/xml,application/xhtml+xml,";
    $header[0] .= "text/html;q=0.9,text/plain;q=0.8,image/png,*/*;q=0.5";
    $header[] = "Cache-Control: no-cache' --data 't%3Azoneid=forceAjax";
    $header[] = "Connection: keep-alive";
    $header[] = "Accept-Language: fr,fr-fr;q=0.8,en-us;q=0.5,en;q=0.3";
    $header[] = "Pragma: no-cache";
      $header[] = "X-Requested-With: XMLHttpRequest";

  $header[] = "Keep-Alive: 700";
  $cookie = "JSESSIONID=" . $jsessionid. '; langueFront=fr; tc_cj_v2=%5Ecl_%5Dny%5B%5D%5D_mmZZZZZZKNLLMQOMROKJRZZZ%5D777_rn_lh%5BfyfcheZZZ%7B%7E%28%24%29H/*+%7E-%241%20H%21-ZZZKNLLMQOSQMMRNZZZ%5D777%5Ecl_%5Dny%5B%5D%5D_mmZZZZZZKNLLNNJJKNRRMZZZ%5D777_rn_lh%5BfyfcheZZZ%7B%7E%28%24%29H/*+%7E-%241%20H%21-ZZZKNLLNNKNJOJSKZZZ%5D777%5Ecl_%5Dny%5B%5D%5D_mmZZZZZZKNLLNNMLSNSKLZZZ%5D777_rn_lh%5BfyfcheZZZ222H%7B0%7D%23%7B%29H%21-ZZZKNLLNNMMLMJNJZZZ%5D777%5Ecl_%5Dny%5B%5D%5D_mmZZZZZZKNLLNOOJSKRKMZZZ%5D777_rn_lh%5BfyfcheZZZ%7B%7E%28%24%29H/*+%7E-%241%20H%21-ZZZKNLLNOOLSOMPNZZZ%5D777%5Ecl_%5Dny%5B%5D%5D_mmZZZZZZKNLLNOPJMROQLZZZ%5D777_rn_lh%5BfyfcheZZZ%7B%7E%28%24%29H/*+%7E-%241%20H%21-ZZZKNLLNOPMQSKNOZZZ%5D; _ga=GA1.2.487921595.1421941922; aurol=GA1.2.865695137.1421941922; __utma=239562643.487921595.1421941922.1422452658.1422454606.14; __utmz=239562643.1422443324.10.2.utmcsr=Sphere_myWebSite|utmccn=myWebSitefr_logo|utmcmd=Interne; kameleoonVisitIdentifier=rj1hnzh5ux1n2gxr/4; myWebSiteCook=\"869|\"; revelationDriveWin=2; myWebSite.hamon=1; __utmv=239562643.|1=visite_myWebSitedrive=239562643.487921595.1421941922.1422452658.1422454606.14=1; tosend=%7B%22p%22%3A%7B%22tracker%22%3A%22myWebSitedrive%22%2C%20%22url%22%3A%22rayon%22%2C%20%22mtime%22%3A1422455760000%2C%20%22ref%22%3A%22http%3A%2F%2Fwww.myWebSitedrive.fr%2Fdrive%2Frecherche%2Fbio%22%2C%20%22dest%22%3A%22http%3A%2F%2Fwww.myWebSitedrive.fr%2Fdrive%2FNice-Cote-dAzur-869%2FSurgeles-R41355%2FViandes-Volailles-41478%2F%22%7D%2C%22d%22%3A%7B%22dv%22%3A%22NA%22%7D%2C%20%22t%22%3A%7B%22iplobserverstart%22%3A%221422455762613%22%2C%22jsinit%22%3A%221422455763871%22%2C%22domload%22%3A%221422455764728%22%2C%22clicklink%22%3A%221422455817128%22%2C%22unload%22%3A%221422455817521%22%7D%7D; kameleoonExperiment-14570=86018/1422452656881/false; __utmc=239562643; rdmvalidation=1; layerDrivePromos=2; __utmb=239562643.19.10.1422454606; _gat=1; _gat_myWebSiteRollup=1; __utmt=1; __utmt_secondTracker=1; __utmli=toPage_14b30fac8d4_0';
  curl_setopt($ch, CURLOPT_HTTPHEADER, $header);
  curl_setopt($ch, CURLOPT_REFERER, $refer);
  curl_setopt($ch, CURLOPT_COOKIE, $cookie);
  $content = curl_exec($ch);
  curl_close($ch);
  return $content ;

i want to know how can i post the same parametres with SCRAPY, is that a good idea for scraping a website with ajax pagination?

i tried this:

yield Request(sousUrl, headers={'Referer':'%s' % url},  callback=self.parse_page)
  • 写回答

1条回答 默认 最新

  • dsf23223 2016-05-04 11:07
    关注

    In Python you can use PyCurl

    PycURL is a Python interface to libcurl.

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥18 深度学习tensorflow1,ssdv1,coco数据集训练一个模型
  • ¥100 关于注册表摄像头和麦克风的问题
  • ¥30 代码本地运行正常,但是TOMCAT部署时闪退
  • ¥15 关于#python#的问题
  • ¥15 主机可以ping通路由器但是连不上网怎么办
  • ¥15 数据库一张以时间排好序的表中,找出多次相邻的那些行
  • ¥50 关于DynamoRIO处理多线程程序时候的问题
  • ¥15 kubeadm部署k8s出错
  • ¥15 Abaqus打不开cae文件怎么办?
  • ¥15 小程序准备上线,软件开发公司需要提供哪些资料给甲方