duanbarong4617 2011-02-24 19:15

Scraping book prices

I'm trying to write a scraping app, and I'm running into problems. My PHP cURL code isn't pulling up the pages with the book prices. Instead, it returns the web root of the domain.

I'm trying to search the site by ISBN.

I've been bashing my head against the wall for days. Any help will be most appreciated!

Code:

<form method="post" for="new-search" name="SearchTerm" class='form-validate' id="SearchTerm" action="index.php">
    <textarea rows="3" name="SearchTerm" id="SearchTerm" cols="40" class="validate-required error"></textarea><div class="error" id="SearchTerm-error">
    <br>                        
    <button class="search primary" type="submit">continue</button>

</form>


<?php

/*
echo("<pre>");print_r($_GET);echo("</pre>");
echo("<pre>");print_r($_POST);echo("</pre>");
*/

$isbn = $_POST['SearchTerm'];


$userAgent = 'User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16';

$fields = array(
    'url' => ("http://www.bookleberry.com/Search/SearchKeyword"),
    'qurl' => ("http://www.bookleberry.com/Search/SearchKeyword/" . $_POST['SearchTerm']),
    'SearchTerm' => ($_POST['SearchTerm']),
    'Page' => ('1'),
    'class' => ('textfield validate-required'),
    'for' => ('new-search'),
    'result-count' => ('1'),
    'status' => 'success',
);

$SearchTerm = ($fields['SearchTerm']);
$url = ($fields['url']);
$Page = ($fields['Page']);


echo("<pre>");
print_r($fields);
echo("</pre>");

if ($isbn != NULL){

    //open connection
    $ch = curl_init($url);
    //set the url, number of POST vars, POST data
    curl_setopt($ch, CURLOPT_HEADER, $userAgent);
    curl_setopt($ch, CURLOPT_URL, $url);
        echo "before curl_exec:<br>";
        echo "curl_errno=". curl_errno($ch) ."<br>";
        echo "curl_error=". curl_error($ch) ."<br>";
    curl_setopt($ch,CURLOPT_POST,count($fields));
    curl_setopt($ch, CURLOPT_POST, 1);
    curl_setopt($ch, CURLOPT_POSTFIELDS, "?SearchTerm=$SearchTerm");
    curl_setopt($ch, CURLOPT_HTTPGET, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_TIMEOUT, 9999999);
     curl_setopt($ch,CURLOPT_HTTPHEADER,array (
        "Accept: application/json"
    ));




    $info = curl_getinfo($ch);

    //execute post
    $result = curl_exec($ch);
    print $result;


print "<pre>
";
print_r(curl_getinfo($ch));  // get error info

?>

1 answer

  • dongzhang7961 2011-02-24 19:21

    Don't hurt your head, use it!

    • Install Fiddler.
    • Make the request in the browser, then look in Fiddler to see exactly what is posted. This includes all headers, cookies, and form variables.
    • Make the same request from your code and examine it in Fiddler again.
    • Compare the two requests and adjust your script to remove the differences.
    • Repeat.
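
For the script's request to show up in Fiddler next to the browser's, the cURL handle has to be pointed at Fiddler as a proxy. The following is a minimal sketch of that, not a drop-in fix. It assumes Fiddler is listening on its default address, 127.0.0.1:8888, and that the site accepts a plain form POST with a SearchTerm field; the real field names should be copied from what Fiddler shows for the browser request. The helper names buildDebugCurlOptions and postThroughProxy are hypothetical.

```php
<?php
// Hypothetical helper: builds cURL options for a form POST that is routed
// through a local debugging proxy. Assumption: Fiddler on 127.0.0.1:8888.
function buildDebugCurlOptions(string $url, array $fields, string $proxy = '127.0.0.1:8888'): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_POST           => true,
        // URL-encoded body, without a leading "?".
        CURLOPT_POSTFIELDS     => http_build_query($fields),
        // Return the response body instead of printing it.
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        // The user agent goes in CURLOPT_USERAGENT, without a "User-Agent:" prefix.
        CURLOPT_USERAGENT      => 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US;rv:1.8.1.16) Gecko/20080702 Firefox/2.0.0.16',
        CURLOPT_PROXY          => $proxy,
        // Keep the outgoing request headers so curl_getinfo() can report them.
        CURLINFO_HEADER_OUT    => true,
    ];
}

// Hypothetical helper: sends the POST and returns the body, transfer info, and any error.
function postThroughProxy(string $url, array $fields): array
{
    $ch = curl_init();
    curl_setopt_array($ch, buildDebugCurlOptions($url, $fields));
    $body = curl_exec($ch);

    return [
        'body'  => $body,
        'info'  => curl_getinfo($ch),
        'error' => curl_error($ch),
    ];
}

// Example call (needs Fiddler running; the ISBN is a placeholder value):
// $result = postThroughProxy('http://www.bookleberry.com/Search/SearchKeyword', ['SearchTerm' => '9780000000000']);
// echo $result['info']['request_header'];
```

With the proxy in place, the browser request and the script request sit side by side in Fiddler's session list, which makes missing cookies or headers easy to spot.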

    It also helps to install Firebug. Using its Copy XPath feature and pasting the result into a PHP DOMXPath query makes scraping fun and easy!
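
As a rough illustration of the XPath approach, here is a small self-contained sketch using PHP's DOMDocument and DOMXPath. The sample HTML, the price class name, and the extractPrices helper are all made up for the example; on the real page the query would be whatever XPath is copied from the browser for the price element.

```php
<?php
// Hypothetical helper: returns the trimmed text of every node matching an XPath query.
function extractPrices(string $html, string $xpathQuery): array
{
    $doc = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $nodes = $xpath->query($xpathQuery);

    $values = [];
    if ($nodes !== false) {
        foreach ($nodes as $node) {
            $values[] = trim($node->textContent);
        }
    }

    return $values;
}

// Made-up markup standing in for a search result page.
$sampleHtml = '<html><body><div class="book"><span class="price">$12.50</span></div></body></html>';

print_r(extractPrices($sampleHtml, '//span[@class="price"]'));
```

One thing to watch for: browsers insert tbody elements into tables, so an XPath copied from the browser may contain a tbody step that is not present in the raw HTML returned to cURL.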
