dtmm0148603 2015-08-11 14:53

PHP / Scraping - Get Amazon product price

I am trying to retrieve the price of an Amazon product. I have tried two methods:

  1. file_get_contents() + regex: this works.
  2. DOMXPath: this does not work, for some reason.
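For reference, method 1 (download, then regex) might look roughly like the sketch below. It runs against a hard-coded HTML snippet instead of a live page; the `priceblock_ourprice` id and the surrounding markup are assumptions standing in for whatever Amazon actually serves, and `extract_price_regex` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return the text inside <span id="..."> for the given id,
// or null if no such span is found. The markup it expects is an assumption.
function extract_price_regex(string $html, string $id): ?string
{
    // Match <span ... id="ID" ...>TEXT</span> and capture TEXT (trimmed).
    $pattern = '/<span[^>]*\bid="' . preg_quote($id, '/') . '"[^>]*>\s*([^<]+?)\s*<\/span>/i';
    if (preg_match($pattern, $html, $m) === 1) {
        return $m[1];
    }
    return null;
}

// Sample markup standing in for the page downloaded with file_get_contents().
$html = '<table><tr><td><span id="priceblock_ourprice" class="a-size-medium">$19.99</span></td></tr></table>';

echo extract_price_regex($html, 'priceblock_ourprice') ?? 'not found', "\n";
```

A regex like this is tied to the exact attribute layout of the served HTML, so it breaks as soon as the markup changes; that brittleness is one reason to prefer a DOM/XPath query once the path problem is solved.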

I noticed that the XPath of the price differs depending on whether JavaScript is enabled or disabled.

Anyway, how can I retrieve the price using XPath?

This is what I am doing, but the code returns nothing (even though the same approach works on other websites):

(The XPath was copied from Firebug.)

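<?php
// Product page URL and the absolute XPath copied from Firebug
$url = 'http://www.amazon.com/dp/product/B00TRQPSXM/';
$path = '/html/body/div[3]/form/table[3]/tbody/tr[1]/td/div/table/tbody/tr[2]';

// Download the raw HTML (no JavaScript is executed)
$html = file_get_contents($url);
if ($html === false) {
    exit('Could not download the page');
}

$dom = new DOMDocument();
// Collect warnings about malformed HTML instead of printing them
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$elements = $xpath->query($path);

// query() returns false for an invalid expression and an empty list when nothing matches
if ($elements !== false && $elements->length > 0)
{
    foreach ($elements as $element)
    {
        echo $element->nodeName . '<br>';
        echo $element->nodeValue . '<br>';
    }
}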
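A likely reason the query returns nothing: when a browser builds the DOM it inserts `<tbody>` into tables (and page scripts may rewrite the tree further), so the path Firebug reports describes the browser's DOM. `DOMDocument::loadHTML()` parses the raw HTML as served and does not add `<tbody>`, so an absolute path containing `tbody` can point at nodes that do not exist in the document PHP sees. The sketch below uses a hard-coded snippet (the markup and the `priceblock_ourprice` id are assumptions, not Amazon's actual page) to compare a browser-style path, the same path without `tbody`, and a query anchored on an id.

```php
<?php
// Sample markup standing in for the raw HTML returned by file_get_contents().
// Note that it contains no <tbody>, which is common in served HTML.
$html = '<html><body><div><table><tr><td>Title</td></tr>'
      . '<tr><td><span id="priceblock_ourprice">$19.99</span></td></tr>'
      . '</table></div></body></html>';

$dom = new DOMDocument();
// Collect libxml parse warnings instead of silencing them with @.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Path as a browser would report it: includes the browser-inserted tbody.
$browserPath = '/html/body/div/table/tbody/tr[2]/td/span';
// Same path without tbody, matching the raw HTML.
$rawPath = '/html/body/div/table/tr[2]/td/span';
// Less brittle: anchor on the element's id instead of its position.
$idPath = '//span[@id="priceblock_ourprice"]';

foreach ([$browserPath, $rawPath, $idPath] as $path) {
    $nodes = $xpath->query($path);
    // query() returns false only for an invalid expression; check length too.
    $count = ($nodes === false) ? 0 : $nodes->length;
    $value = $count > 0 ? trim($nodes->item(0)->nodeValue) : '(no match)';
    echo $path, ' => ', $value, "\n";
}
```

Anchoring on an id or class is generally more robust than a long positional path, because it survives small layout changes elsewhere on the page.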

1 answer

  • dourong4031 2015-08-11 14:58

    Your requests will be blocked after a couple of tries, because Amazon checks for robot access. Instead of scraping their site, which by the way is against Amazon's terms of service, use their API at http://developer.amazonservices.com. It provides the price information you are after.

    There is also a PHP SDK you can use.

    Either way, file_get_contents() is not an option here; if you want to scrape the page, use cURL and make each request look like it comes from a unique visitor.
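A minimal cURL fetch along the lines the answer suggests might look like the sketch below. It only sends a browser-like User-Agent, follows redirects, and sets timeouts; the helper name `fetch_page` and the User-Agent string are illustrative assumptions. Nothing here guarantees that the request will not be blocked, and the point about Amazon's terms of service still applies.

```php
<?php
// Hypothetical helper: fetch a page with cURL, sending the given User-Agent.
// Returns the body on HTTP 200, or null on transport errors / other statuses.
function fetch_page(string $url, string $userAgent): ?string
{
    $ch = curl_init($url);
    curl_setopt_array($ch, [
        CURLOPT_RETURNTRANSFER => true,  // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,  // follow HTTP redirects
        CURLOPT_USERAGENT      => $userAgent,
        CURLOPT_CONNECTTIMEOUT => 10,    // seconds to wait for a connection
        CURLOPT_TIMEOUT        => 30,    // seconds for the whole request
        CURLOPT_ENCODING       => '',    // accept any content encoding cURL supports
    ]);
    $body = curl_exec($ch);
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);
    curl_close($ch);

    // Treat transport errors and non-200 responses as "no page".
    if ($body === false || $status !== 200) {
        return null;
    }
    return $body;
}

// Usage (performs a real network request; the User-Agent is only an example):
// $html = fetch_page('http://www.amazon.com/dp/product/B00TRQPSXM/',
//     'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0 Safari/537.36');
```

The returned HTML can then be passed to `DOMDocument::loadHTML()` in place of the `file_get_contents()` result.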
