duanjiao5543 2012-10-05 12:47

Retrieving specific data from a website

I am currently building a scraper to scrape certain information from a website.

For example, I would like to get a restaurant name, address, opening hours & telephone number from a website.

By using curl, I managed to get the data from the website:

    $url = "http://localhost/test.html";
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_URL, $url);
    // Return the response body as a string instead of printing it.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    $data = curl_exec($ch); // false on failure
    curl_close($ch);

However, I need some ideas on how to point my scraper at the exact location in the page so that it can extract this information.

I have tried regular expressions, but was unable to get them to work.


  • dongwang3066 2012-10-05 12:48

    Use the Simple HTML DOM parser for PHP:
    http://simplehtmldom.sourceforge.net/

    Download here:
    http://sourceforge.net/projects/simplehtmldom/files/

    Documentation here:
    http://simplehtmldom.sourceforge.net/manual.htm

    In my experience, it is the best tool for parsing HTML with PHP.

    Also, you don't need curl to fetch the content unless you have a specific reason to use it; with Simple HTML DOM you can just use:

    include 'simple_html_dom.php'; // the library file from the download above
    $remote_html = file_get_html("http://www.somesite.com/");
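
To make "pinpointing" the data more concrete, here is a minimal sketch of selecting elements by class name once the HTML is in a string. It uses PHP's bundled DOMDocument and DOMXPath rather than Simple HTML DOM, only so that it runs without any download. The sample markup, the class names (name, address, hours, phone) and the helper extract_fields() are all assumptions made for illustration; the real page will need its own selectors.

```php
<?php
// Hypothetical markup standing in for the string returned by curl_exec().
// The class names below are assumptions, not taken from any real site.
$data = '<html><body>
  <div class="restaurant">
    <h1 class="name">Example Bistro</h1>
    <p class="address">1 Sample Street, Sampletown</p>
    <p class="hours">Mon-Fri 10:00-22:00</p>
    <p class="phone">555-0100</p>
  </div>
</body></html>';

/**
 * Hypothetical helper: return the text of the first element carrying each
 * class name. Fields that are not found map to null.
 */
function extract_fields(string $html, array $classes): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $result = [];
    foreach ($classes as $class) {
        // Match the class as a whole word inside the class attribute.
        $query = sprintf(
            '//*[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
            $class
        );
        $nodes = $xpath->query($query);
        $result[$class] = $nodes->length > 0
            ? trim($nodes->item(0)->textContent)
            : null;
    }
    return $result;
}

$fields = extract_fields($data, ['name', 'address', 'hours', 'phone']);
print_r($fields);
```

With Simple HTML DOM, the same kind of lookup would be a call along the lines of $html->find('.name', 0)->plaintext. Either way, the idea is to address elements through the document structure instead of matching raw text with regular expressions.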
    
    (Selected by the asker as the best answer.)