dpf7891 2013-05-13 10:24

PHP - Retrieve specific content from multiple pages of a website

What I want to accomplish may be ambitious, but I would like to know whether it is possible:

The question:
My question is the same as "PHP - Retrieve content from page", but I want to apply it to multiple pages.

The situation:
I'm using a website about TV shows. Every show page shares the same base URL, followed by the name of the show:
http://bierdopje.com/shows/NAME_OF_SHOW
Every show page has a line that tells you whether the show is cancelled or still running. I want to retrieve that line to build an overview of the cancelled shows (the website only offers an overview of running shows, so I want to add this feature myself).

The real question:
How can I tell DOM to retrieve all the shows and check the status of each one? (http://bierdopje.com/shows/*).

The Note:
I understand that this process may take a while because it is reading the whole website (or is it too much data?).

2 answers

  • dongsu4345 2013-05-13 11:49

    I use phpQuery to fetch data from web pages; it lets you query the DOM with jQuery-style selectors.

    For example, to get the list of all shows, you can do this:

    <?php
    require_once 'phpQuery/phpQuery/phpQuery.php';
    
    $doc = phpQuery::newDocumentHTML(
        file_get_contents('http://www.bierdopje.com/shows')
    );
    
    foreach (pq('.listing a') as $key => $a) {
    
        $url = pq($a)->attr('href'); // will give "/shows/07-ghost"
        $show = pq($a)->text(); // will give "07 Ghost"
    
    } 
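
If installing phpQuery is not an option, the same listing can probably be read with PHP's built-in DOM extension. The sketch below is an alternative to the phpQuery loop above, not the answerer's method: `extractShowLinks()` is a hypothetical helper, the sample HTML is made up, and the XPath expression assumes the listing markup matches the `.listing a` selector.

```php
<?php
// Sketch: list show links with the built-in DOM extension instead of phpQuery.
// Assumption: show links are <a> elements inside an element whose class
// attribute contains "listing", mirroring the '.listing a' selector.

function extractShowLinks(string $html): array
{
    $dom = new DOMDocument();

    // Real-world HTML is rarely valid; collect parser warnings silently.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);

    // XPath counterpart of the CSS selector ".listing a".
    $query = '//*[contains(concat(" ", normalize-space(@class), " "), " listing ")]//a[@href]';

    $shows = [];
    foreach ($xpath->query($query) as $a) {
        // Map href => link text, e.g. "/shows/07-ghost" => "07 Ghost".
        $shows[$a->getAttribute('href')] = trim($a->textContent);
    }

    return $shows;
}

// Made-up sample standing in for the real listing page.
$sample = '<div class="listing"><a href="/shows/07-ghost">07 Ghost</a>'
        . '<a href="/shows/alcatraz">Alcatraz</a></div>';

print_r(extractShowLinks($sample));
```

Because the function takes an HTML string rather than a URL, it can be tried on a saved copy of the page before pointing it at the live site with `file_get_contents()`.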
    

    Now you can process each show individually: create a new document with phpQuery::newDocumentHTML for each show page and use a selector to extract the information you need.


    Get the status of a show

    $html = file_get_contents('http://www.bierdopje.com/shows/alcatraz');
    $doc = phpQuery::newDocumentHTML($html);
    
    $status = pq('.content>span:nth-child(6)')->text();
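
The two steps above can be combined into one pass that keeps only the cancelled shows. This is a minimal sketch under stated assumptions: `findCancelledShows()` and its parameters are hypothetical names, the download and status-extraction steps are passed in as callables so the loop can be tried without network access, and the keyword `cancel` is a guess at what the status line contains (the real site may word it differently).

```php
<?php
// Sketch: visit every show page and keep the ones whose status looks cancelled.
// $shows         : array of path => title pairs built from the listing page
// $fetch         : callable(string $path) returning the page HTML, or false on failure
// $extractStatus : callable(string $html) returning the status text
// $needle        : ASSUMED keyword marking a cancelled show; adjust to the real wording
// $delayMicros   : pause between requests, to avoid hammering the site

function findCancelledShows(
    array $shows,
    callable $fetch,
    callable $extractStatus,
    string $needle = 'cancel',
    int $delayMicros = 0
): array {
    $cancelled = [];

    foreach ($shows as $path => $title) {
        $html = $fetch($path);
        if ($html === false || $html === null) {
            continue; // skip pages that could not be downloaded
        }

        $status = trim((string) $extractStatus($html));
        if (stripos($status, $needle) !== false) {
            $cancelled[$title] = $status;
        }

        if ($delayMicros > 0) {
            usleep($delayMicros);
        }
    }

    return $cancelled;
}

// Offline demo with made-up pages standing in for the real site.
$pages = [
    '/shows/alcatraz' => '<span>Status: Cancelled</span>',
    '/shows/07-ghost' => '<span>Status: Running</span>',
];

$result = findCancelledShows(
    ['/shows/alcatraz' => 'Alcatraz', '/shows/07-ghost' => '07 Ghost'],
    function ($path) use ($pages) { return $pages[$path] ?? false; },
    function ($html) { return strip_tags($html); }
);

print_r($result);

// Wiring it to the live site with phpQuery might look like this (untested):
// $cancelled = findCancelledShows(
//     $shows,
//     function ($path) { return @file_get_contents('http://www.bierdopje.com' . $path); },
//     function ($html) {
//         phpQuery::newDocumentHTML($html);
//         return pq('.content>span:nth-child(6)')->text();
//     },
//     'cancel',
//     500000 // wait 0.5 s between requests
// );
```

Since this issues one HTTP request per show, a full run may take a while; storing the result in a file or database and refreshing it occasionally is likely preferable to crawling on every page view.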
    
    Selected as the best answer by the asker.