dpf7891 2013-05-13 10:24

PHP: Retrieve specific content from multiple pages of a website

What I want to accomplish might be a little hardcore, but I want to know if it's possible:

The question:
My question is the same as "PHP-Retrieve content from page", but I want to apply it to multiple pages.

The situation:
I'm using a website about TV shows. Every show page has the same base URL followed by the name of the show:
http://bierdopje.com/shows/NAME_OF_SHOW
On every show page, there's a line that tells you whether the show is cancelled or still running. I want to retrieve that line to build an overview of the cancelled shows (the website only offers an overview of running shows, so I want to add this as an extra feature).

The real question:
How can I tell DOM to retrieve all the shows and check the status of each one? (http://bierdopje.com/shows/*).

The Note:
I understand that this process may take a while because it is reading the whole website (or is it too much data?).


2 answers

  • dongsu4345 2013-05-13 11:49

    I use phpQuery to fetch data from a web page; it lets you query the DOM with jQuery-style selectors.

    For example, to get the list of all shows, you can do this:

    <?php
    require_once 'phpQuery/phpQuery/phpQuery.php';
    
    $doc = phpQuery::newDocumentHTML(
        file_get_contents('http://www.bierdopje.com/shows')
    );
    
    // Each link inside the ".listing" element points to one show page.
    foreach (pq('.listing a') as $a) {
        $url = pq($a)->attr('href'); // will give "/shows/07-ghost"
        $show = pq($a)->text(); // will give "07 Ghost"
    }
    

    Now you can process each show individually: create a new document with phpQuery::newDocumentHTML for each show page and use a selector to extract the information you need.


    Get the status of a show

    $html = file_get_contents('http://www.bierdopje.com/shows/alcatraz');
    $doc = phpQuery::newDocumentHTML($html);
    
    $status = pq('.content>span:nth-child(6)')->text();
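
    The two steps above (list the shows, then read each show's status) can be combined into one loop. The following is only a rough sketch, not the method used above: it swaps phpQuery for PHP's built-in DOMDocument and DOMXPath so it runs without extra libraries, and it assumes the same page structure as the selectors above (links inside `.listing`, status in the sixth child of `.content`). The function names `loadXPath`, `listShows`, `showStatus` and `collectStatuses`, the `$fetch` callable and the delay parameter are all hypothetical helpers introduced for illustration.

```php
<?php
// Sketch only: built-in DOM extension instead of phpQuery; all helper names are hypothetical.

/** Parse an HTML string into a DOMXPath, silencing warnings caused by sloppy markup. */
function loadXPath(string $html): DOMXPath
{
    $doc = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);
    return new DOMXPath($doc);
}

/** Return a list of ['name' => ..., 'url' => ...] for every link inside a ".listing" element. */
function listShows(string $indexHtml): array
{
    $query = '//*[contains(concat(" ", normalize-space(@class), " "), " listing ")]//a[@href]';
    $shows = [];
    foreach (loadXPath($indexHtml)->query($query) as $a) {
        $shows[] = ['name' => trim($a->textContent), 'url' => $a->getAttribute('href')];
    }
    return $shows;
}

/** XPath equivalent of ".content>span:nth-child(6)"; returns null if no such element exists. */
function showStatus(string $showHtml): ?string
{
    $query = '//*[contains(concat(" ", normalize-space(@class), " "), " content ")]/*[6][self::span]';
    $node = loadXPath($showHtml)->query($query)->item(0);
    return $node === null ? null : trim($node->textContent);
}

/**
 * Fetch the show index, then every show page, and collect each show's status.
 * $fetch maps a URL to an HTML string (or false on failure), so the loop can be
 * exercised without network access. The optional delay spaces out the requests.
 */
function collectStatuses(string $baseUrl, callable $fetch, int $delayMicroseconds = 0): array
{
    $results = [];
    $index = $fetch($baseUrl . '/shows');
    if (!is_string($index) || $index === '') {
        return $results; // index page could not be fetched
    }
    foreach (listShows($index) as $show) {
        $html = $fetch($baseUrl . $show['url']); // hrefs are relative, e.g. "/shows/07-ghost"
        if (is_string($html) && $html !== '') {
            $show['status'] = showStatus($html);
            $results[] = $show;
        }
        usleep($delayMicroseconds);
    }
    return $results;
}
```

    Against the live site this could be called as `collectStatuses('http://www.bierdopje.com', 'file_get_contents', 500000)`, which waits half a second between requests. The result can then be filtered on whatever wording the site uses for cancelled shows; that wording is not shown here, so the filter is left out.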
    
    This answer was accepted by the asker as the best answer.