dscdttg4389 2013-06-10 14:36
Views: 58
Accepted answer

What is the best practice for fetching specific data from multiple URLs?

I need to fetch data about a product from a given URL: images, product title, price, etc. I'm currently fetching all of the images on the page with a simple PHP file_get_contents call, and that works great. What is the best practice for fetching the other data, though? I need to be able to fetch data from Etsy, Zappos, ASOS, Net-a-Porter, Nordstrom and PopSugar. Do I need a bot? Is it even possible? Thank you very much in advance!


1 answer

  • drfu29983 2013-06-10 14:52

    You can use file_get_contents() to obtain the HTML for the page, but after that you will need to read the DOM to find the elements you want to read information from (the src attributes of images, the hrefs of anchors, etc.).

    There are actually several ways to do what you want, and without more information it is rather hard to give you a specific answer, but you can start with something like:

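    $html = file_get_contents('your url'); // returns false on failure
    libxml_use_internal_errors(true);      // real-world HTML is rarely valid; keep parser warnings quiet
    $Dom = new DOMDocument();
    $Dom->loadHTML($html);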
    

    At this point you have a DOMDocument (http://www.php.net/manual/en/class.domdocument.php) object loaded with the full markup of your page.
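As a minimal sketch of reading from the loaded document, the snippet below collects the src attribute of every image. The inline $html string is a hypothetical stand-in for a fetched page, and $imageSources is a name introduced only for this illustration.

```php
<?php
// Hypothetical sample markup standing in for the result of file_get_contents().
$html = '<html><body>'
      . '<img src="/a.jpg" alt="A">'
      . '<img src="https://example.com/b.png">'
      . '<img alt="no src here">'
      . '</body></html>';

// Collect parser warnings internally instead of printing them.
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();

// Walk every <img> element and keep only those that actually have a src.
$imageSources = [];
foreach ($dom->getElementsByTagName('img') as $img) {
    if ($img->hasAttribute('src')) {
        $imageSources[] = $img->getAttribute('src');
    }
}

print_r($imageSources);
```

Relative paths such as /a.jpg would still need to be resolved against the page URL before the images can be downloaded.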

    You can then select elements with, for example, XPath.

    An example:

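    $XPath = new DOMXPath($Dom);
    $Anchors = $XPath->query('//a'); // every <a> element in the document

    for ($i = 0; $i < $Anchors->length; $i++) {
        $Anchor = $Anchors->item($i);
        // getAttribute() returns an empty string when the anchor has no href
        echo 'Href #' . $i . ': ' . $Anchor->getAttribute('href') . '<br />';
    }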
    

    The code above prints the href of every anchor on the page. It is only a basic example, but the same approach is powerful enough to extract whatever you need. You will still need to dig into DOMDocument and XPath to learn how to get exactly what you want, but that shouldn't be too hard from this point on.
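To connect this back to the product data in the question, here is a rough sketch of pulling a title, image and price with XPath. Everything site-specific is an assumption: the sample markup is made up, the og:title and og:image meta tags are a common convention that a given shop may or may not follow, the "price" class is purely hypothetical, and firstValue() is a helper invented for this example. In practice each site would likely need its own set of queries.

```php
<?php
// Hypothetical product page; real sites differ and may not expose these tags.
$html = <<<'HTML'
<html><head>
<title>Blue Scarf | Example Shop</title>
<meta property="og:title" content="Blue Scarf">
<meta property="og:image" content="https://example.com/scarf.jpg">
</head><body>
<span class="price">$19.99</span>
</body></html>
HTML;

// Hypothetical helper: trimmed value of the first node matching $query, or null.
function firstValue(DOMXPath $xpath, string $query): ?string
{
    $nodes = $xpath->query($query);
    if ($nodes === false || $nodes->length === 0) {
        return null;
    }
    return trim($nodes->item(0)->nodeValue);
}

libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$product = [
    // Prefer the Open Graph title, fall back to the <title> element.
    'title' => firstValue($xpath, '//meta[@property="og:title"]/@content')
        ?? firstValue($xpath, '//title'),
    'image' => firstValue($xpath, '//meta[@property="og:image"]/@content'),
    // Match any element whose class list contains the (assumed) word "price".
    'price' => firstValue(
        $xpath,
        '//*[contains(concat(" ", normalize-space(@class), " "), " price ")]'
    ),
];

print_r($product);
```

Keeping the queries in one array per site makes it straightforward to add a new shop without touching the extraction logic, although any of these queries can break whenever a site changes its markup.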
