dozoqn3347 2013-08-27 07:59

How to extract the price from an Amazon URL without the Amazon API

I'm trying to load the HTML page behind an Amazon URL and extract the product price with a simple PHP function in Yii. I first fetch the whole page with the PHP function file_get_contents, and then extract only the price from the HTML with a DOM parser.

The DOM parser I'm using has convenient functions for reading the tags of an HTML file. This is the parser:

http://simplehtmldom.sourceforge.net/

The URL that PHP analyzes can be from amazon.com, amazon.co.uk, amazon.it, etc. In the future this feature will also be used to analyze URLs from sites other than Amazon.

I wrote a simple function that extracts the price from a URL. Here it is:

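public function findAmazonPriceFromUrl($url) {
    Yii::import('ext.HtmlDOMParser.*');
    require_once('simple_html_dom.php');

    // file_get_html() returns false when the page cannot be fetched or is too large
    $html = file_get_html($url);
    if (!$html) {
        return null;
    }

    // Try the known price element ids in turn
    $price = null;
    $item = $html->getElementsById('actualPriceValue');
    if ($item && $item[0]->firstChild()) {
        $price = $item[0]->firstChild()->innertext;
    } else {
        $item = $html->getElementsById('current-price');
        if ($item) {
            $price = $item[0]->innertext;
        }
    }
    return $price; // null when no price element was found
}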
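
The lookup by element id can also be tried against a saved HTML string, without any network access, which makes it easier to tell a parsing problem from a fetching problem. What follows is a minimal sketch that assumes PHP's built-in DOM extension (DOMDocument and DOMXPath) rather than simple_html_dom. The function name extractPriceFromHtml is hypothetical, and the two element ids are simply the ones used in the question; Amazon's markup may differ.

```php
<?php
// Hypothetical helper: returns the trimmed text of the first element whose id
// matches one of $ids, or null when none of them is present.
function extractPriceFromHtml(string $html, array $ids = ['actualPriceValue', 'current-price']): ?string
{
    if (trim($html) === '') {
        return null; // nothing to parse
    }

    $doc = new DOMDocument();
    // Real-world pages are rarely valid HTML; collect parser warnings
    // internally instead of letting them be emitted.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    foreach ($ids as $id) {
        $nodes = $xpath->query('//*[@id="' . $id . '"]');
        if ($nodes !== false && $nodes->length > 0) {
            $text = trim($nodes->item(0)->textContent);
            if ($text !== '') {
                return $text;
            }
        }
    }
    return null; // no known price element found
}
```

Because every lookup is checked before it is dereferenced, a page without a price element yields null instead of a fatal error. The sketch does not deal with character encoding, so currency symbols such as £ or € may need extra handling.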

The file_get_html function is the following:

function file_get_html($url) {
    $dom = new simple_html_dom();
    $contents = file_get_contents($url);
    if (empty($contents) || strlen($contents) > MAX_FILE_SIZE) {
        return false;
    }
    $dom->load($contents);
    return $dom;
}

I noticed that after a few requests (to various links), I always get an error from the server (Error 500). I checked my Apache log file, but nothing looks wrong there.

Could Amazon be blocking my requests after a certain time? How can I fix it?
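
Whether Amazon is throttling the requests cannot be determined from the information above. What can be done on the PHP side is to make a failed fetch visible instead of fatal. The following is a minimal sketch, assuming plain file_get_contents with a stream context. The function name fetchHtml, the timeout, and the User-Agent string are illustrative placeholders, not values known to work with Amazon.

```php
<?php
// Hypothetical helper: returns the response body as a string, or null on any failure.
function fetchHtml(string $url, int $timeoutSeconds = 10): ?string
{
    $context = stream_context_create([
        'http' => [
            'method'     => 'GET',
            'timeout'    => $timeoutSeconds, // give up instead of hanging
            'user_agent' => 'Mozilla/5.0 (compatible; PriceChecker/0.1)', // placeholder value
        ],
    ]);

    // With the default settings the HTTP wrapper treats 4xx/5xx responses as a
    // failure, so file_get_contents() returns false for them. The @ operator
    // suppresses the accompanying warning because the failure is handled here.
    $body = @file_get_contents($url, false, $context);
    if ($body === false || $body === '') {
        return null;
    }
    return $body;
}
```

A caller can then test the result for null and log the URL that failed, which helps show whether the failures start only after a certain number of requests.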

Thanks in advance for the help

1 answer

  • dongxing7083 2015-06-03 19:59

    I had the same problem, and this is my fix: I run the script again if the image is not parsed. The image is parsed first in my PHP script, so I check whether that worked, which tells me whether Amazon returned the product information. I hope it helps.

    if($html->find('#main-image')) {    
       foreach($html->find('#main-image') as $e) {
          echo '<span href="'. $e->src . '" class="imgblock parseimg">
                   <img src="'. $e->src . '" class="resultimg" alt="'.$name.'" title="'.$name.'">
                </span>
                <input type="hidden" name="my-item-img" value="'. $e->src . '" />';
       }
    } else {
       gethtml($url,$domain);
       die;
    }
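
Re-running the script whenever parsing fails can loop indefinitely if the page never returns the expected element. A bounded variant of the same idea is sketched below, with a growing pause between attempts. The function name retryUntilParsed, the attempt limit, and the delay values are assumptions made for illustration; the callable stands in for whatever fetch-and-parse step is being retried.

```php
<?php
// Hypothetical helper: calls $attempt up to $maxAttempts times and returns the
// first non-null result, or null if every attempt fails.
function retryUntilParsed(callable $attempt, int $maxAttempts = 3, int $baseDelayMs = 500)
{
    for ($try = 1; $try <= $maxAttempts; $try++) {
        $result = $attempt($try);
        if ($result !== null) {
            return $result;
        }
        if ($try < $maxAttempts) {
            // Double the pause after each failed attempt: base, 2x base, 4x base, ...
            usleep($baseDelayMs * 1000 * (2 ** ($try - 1)));
        }
    }
    return null; // give up after the last attempt
}
```

The callable receives the attempt number and should return null when the expected element was not found, for example when the search for '#main-image' comes back empty.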
    