dongyi6845 2013-11-17 02:05

Need help selecting with PHP Simple HTML DOM Parser

I'm working on a community website, converting it from ASP to PHP. At the moment, the client manually enters the weekly movie times for our local theater, copying them from another website. Since we are redoing the site anyway, I figured I would try to automate this, and I found PHP Simple HTML DOM Parser. I'm stuck on selecting the movie's rating (PG, 18, etc.).

Here is a div that includes the information for one movie:

            <div class="mshow">
                <span style="float:right; font-size:11px;">
                    <a href="/trailers/enders-game/19330/" title="enders-game movie trailer" style="font-size:11px;">Trailer</a> | 
                    <a href="/reviews/enders-game/30945/" title="Ender's Game movie reviews" style="font-size:11px;">Rating: </a>
                    <b>Tribute</b>
                    <img src="/images/stars/4_sm.gif" alt="Current rating: 3.88" border="0" />
                </span>
                <strong>
                    <a href="/movies/enders-game/30945/" title="Ender's Game movie info">Ender's Game</a>
                </strong>
                (PG)<br />
                <div class="block">&nbsp;</div>
                <div class="rsd">Fri, Nov 15: </div>
                <div class="rst" >7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Sat, Nov 16: </div>
                <div class="rst" >1:00pm &nbsp;&nbsp;3:15pm &nbsp;&nbsp;7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Sun, Nov 17: </div>
                <div class="rst" >1:00pm &nbsp;&nbsp;3:15pm &nbsp;&nbsp;7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Mon, Nov 18: </div>
                <div class="rst" >7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Tue, Nov 19: </div>
                <div class="rst" >7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Wed, Nov 20: </div>
                <div class="rst" >7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
                <div class="rsd">Thu, Nov 21: </div>
                <div class="rst" >7:00pm &nbsp;&nbsp;9:20pm &nbsp;&nbsp;</div><br />
            </div>

And here is my code so far:

            <?php
            include_once('../simple_html_dom.php');

            $html = file_get_html('http://www.tribute.ca/showtimes/theatres/may-cinema-6/mayc5/?datefilter=-1');
            $movies = array();
            foreach ($html->find("div.mshow") as $movie) {
                $item['trailer'] = $movie->find('a', 0)->href;
                $item['reviews'] = $movie->find('a', 1)->href;
                $item['link'] = $movie->find('a', 2)->href;
                $item['title'] = $movie->find('a', 2)->plaintext;
                $movies[] = $item;
            }

            var_dump($movies);
            ?>

I can't figure out how to grab (PG). Any suggestions?

Edit: This works, but doesn't seem like a great solution.

            function parseDOM($url) {
                $movies = array();
                foreach ($url->find("div.mshow") as $movie) {
                    $item['trailer'] = $movie->find('a', 0)->href;
                    $item['reviews'] = $movie->find('a', 1)->href;
                    $item['link'] = $movie->find('a', 2)->href;
                    $item['title'] = $movie->find('a', 2)->plaintext;
                    $info = $movie->plaintext;
                    preg_match('/\((.*?)\)/', $info, $matches);
                    $item['rating'] = $matches[1];
                    $movies[] = $item;
                }
                return $movies;
            }
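
The regex in the edit above matches the first parenthesized group anywhere in the block's text, so a title that itself contains parentheses would be picked up instead of the rating. A minimal sketch of a tighter match, assuming the rating always follows the title directly and is a short alphanumeric code. `extractRating` is a hypothetical helper, not part of Simple HTML DOM, and the type declarations assume PHP 7.1 or later:

```php
<?php
// Hypothetical helper (not part of Simple HTML DOM): pull a rating such as
// "PG" out of the text that directly follows a known movie title.
function extractRating(string $text, string $title): ?string
{
    // Anchor on the title so parentheses elsewhere in the block are ignored.
    // Assumption: a rating is 1 to 4 letters or digits, e.g. G, PG, 14A.
    $pattern = '/' . preg_quote($title, '/') . '\s*\(([A-Za-z0-9]{1,4})\)/';
    if (preg_match($pattern, $text, $matches)) {
        return $matches[1];
    }
    return null; // no rating found; the caller decides how to handle it
}

// Sample text shaped like the plaintext of one div.mshow block.
$text = "Trailer | Rating: Tribute Ender's Game (PG) Fri, Nov 15: 7:00pm 9:20pm";
var_dump(extractRating($text, "Ender's Game"));             // string(2) "PG"
var_dump(extractRating("No rating here", "Ender's Game"));  // NULL
```

Inside the loop this could be called as `extractRating($movie->plaintext, $item['title'])`, which still relies on the page keeping the rating next to the title.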

1 answer

  • douliaodan2738 2013-11-17 05:11

    Unfortunately, the Simple HTML DOM library was a poor choice here. It doesn't support full XPath queries, nor does it seem to have a usable sibling-node selector.

    With PHP's built-in DOM extension you can get what you want easily:

    $dom = new DOMDocument;
    // Real-world HTML is rarely valid; collect parser warnings instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTMLFile('http://www.tribute.ca/showtimes/theatres/may-cinema-6/mayc5/?datefilter=-1');
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);
    $movies = array();

    foreach ($xpath->query("//div[@class='mshow']") as $movie) {
        $item = array();
        // Links appear in document order: trailer, reviews, movie page.
        $links = $xpath->query('.//a', $movie);
        $item['trailer'] = $links->item(0)->getAttribute('href');
        $item['reviews'] = $links->item(1)->getAttribute('href');
        $item['link'] = $links->item(2)->getAttribute('href');
        $item['title'] = $links->item(2)->nodeValue;
        // The rating is the first text node after <strong>, e.g. "(PG)".
        $item['rating'] = trim($xpath->query('.//strong/following-sibling::text()',
            $movie)->item(0)->nodeValue);
        $movies[] = $item;
    }

    var_dump($movies);
    

    This gave me the following:

    array(7) {
      [0]=>
      array(5) {
        ["trailer"]=>
        string(28) "/trailers/enders-game/19330/"
        ["reviews"]=>
        string(27) "/reviews/enders-game/30945/"
        ["link"]=>
        string(26) "/movies/enders-game/30945/"
        ["title"]=>
        string(12) "Ender's Game"
        ["rating"]=>
        string(4) "(PG)"
      }
      [1]=>
      array(5) {
        ["trailer"]=>
        string(27) "/trailers/free-birds/19436/"
        ["reviews"]=>
        string(26) "/reviews/free-birds/36183/"
        ["link"]=>
        string(25) "/movies/free-birds/36183/"
        ["title"]=>
        string(10) "Free Birds"
        ["rating"]=>
        string(3) "(G)"
      }
      [2]=>
      array(5) {
        ["trailer"]=>
        string(30) "/trailers/free-birds-3d/14421/"
        ["reviews"]=>
        string(29) "/reviews/free-birds-3d/37230/"
        ["link"]=>
        string(28) "/movies/free-birds-3d/37230/"
        ["title"]=>
        string(13) "Free Birds 3D"
        ["rating"]=>
        string(3) "(G)"
      }
      [3]=>
      array(5) {
        ["trailer"]=>
        string(45) "/trailers/jackass-presents-bad-grandpa/19318/"
        ["reviews"]=>
        string(44) "/reviews/jackass-presents-bad-grandpa/36493/"
        ["link"]=>
        string(43) "/movies/jackass-presents-bad-grandpa/36493/"
        ["title"]=>
        string(29) "Jackass Presents: Bad Grandpa"
        ["rating"]=>
        string(5) "(14A)"
      }
      [4]=>
      array(5) {
        ["trailer"]=>
        string(27) "/trailers/last-vegas/19291/"
        ["reviews"]=>
        string(26) "/reviews/last-vegas/35853/"
        ["link"]=>
        string(25) "/movies/last-vegas/35853/"
        ["title"]=>
        string(10) "Last Vegas"
        ["rating"]=>
        string(4) "(PG)"
      }
      [5]=>
      array(5) {
        ["trailer"]=>
        string(36) "/trailers/thor-the-dark-world/19327/"
        ["reviews"]=>
        string(35) "/reviews/thor-the-dark-world/32002/"
        ["link"]=>
        string(34) "/movies/thor-the-dark-world/32002/"
        ["title"]=>
        string(20) "Thor: The Dark World"
        ["rating"]=>
        string(4) "(PG)"
      }
      [6]=>
      array(5) {
        ["trailer"]=>
        string(39) "/trailers/thor-the-dark-world-3d/14425/"
        ["reviews"]=>
        string(38) "/reviews/thor-the-dark-world-3d/34705/"
        ["link"]=>
        string(37) "/movies/thor-the-dark-world-3d/34705/"
        ["title"]=>
        string(23) "Thor: The Dark World 3D"
        ["rating"]=>
        string(4) "(PG)"
      }
    }
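
The `following-sibling::text()` step can be tried without hitting the live site. The sketch below runs the same query against an inline snippet modeled on the markup in the question and additionally strips the parentheses, assuming a bare `PG` is what should be stored. The sample HTML is trimmed down for illustration and the variable names are my own:

```php
<?php
// Inline sample mirroring the structure of one div.mshow from the question.
$html = <<<'HTML'
<div class="mshow">
    <strong><a href="/movies/enders-game/30945/">Ender's Game</a></strong>
    (PG)<br />
    <div class="rsd">Fri, Nov 15: </div>
</div>
HTML;

$dom = new DOMDocument;
libxml_use_internal_errors(true); // keep parser warnings out of the output
$dom->loadHTML($html);
libxml_clear_errors();
$xpath = new DOMXPath($dom);

$ratings = array();
foreach ($xpath->query("//div[@class='mshow']") as $movie) {
    // First text node after <strong> at the same level, here "\n    (PG)".
    $node = $xpath->query('.//strong/following-sibling::text()', $movie)->item(0);
    // trim() with a character list removes surrounding whitespace and parentheses.
    $ratings[] = $node === null ? null : trim($node->nodeValue, "() \t\n\r");
}

var_dump($ratings); // array with one element: string(2) "PG"
```

The `null` check guards against a block with no text after `<strong>`, which the one-line version in the answer would turn into a fatal error.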
    
    This answer was selected by the asker as the best answer.
