douguaidian8021
2012-02-07 07:26

Scraping data from a website using PHP [closed]

I am trying to gather information into a text file that I will later upload to a MySQL database. Specifically, I want all of the PS3 trophy information from this website: http://www.ps3trophies.org/games/psn/1/. I need to go into each game on every page and get the game name, plus each of its trophies and all of the information about them. Thanks for any info you can give me.


3 answers

  • dongtuo7364 2012-02-07 07:34
    Accepted answer

    I recommend using the Simple HTML DOM Parser for this. It lets you navigate the elements on a page with jQuery-style CSS selectors. You could do something like this:

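    include 'simple_html_dom.php'; // the Simple HTML DOM Parser library file
    $html = file_get_html('http://www.ps3trophies.org/games/psn/1/');
    // Links whose href starts with /games/psn/, i.e. the 7 other listing pages
    $otherPages = $html->find('a[href^=/games/psn/]');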
    

    You can then build a similar selector for the individual game pages and load each of them the same way. Read through the parser documentation for everything else it can do.
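    As a rough illustration of the "select the links, then follow them" idea, here is a minimal sketch. It swaps in PHP's built-in DOMDocument and DOMXPath for the Simple HTML DOM Parser so that it runs without the extra library, and it works on an inline sample instead of the live site. The sample markup, the `/game/` link prefix, and the `collectLinks()` helper are assumptions made for the example, not the site's actual structure.

```php
<?php
// Sample markup standing in for a listing page. The real page's HTML is not
// shown in this thread, so this structure is an assumption.
$sampleHtml = <<<HTML
<html><body>
<a href="/games/psn/1/">1</a>
<a href="/games/psn/2/">2</a>
<a href="/games/psn/2/">Next</a>
<table>
  <tr><td><a href="/game/example-game-one/trophies/">Example Game One</a></td></tr>
  <tr><td><a href="/game/example-game-two/trophies/">Example Game Two</a></td></tr>
</table>
</body></html>
HTML;

/**
 * Hypothetical helper: return the unique href values of all links
 * whose href starts with $prefix, in document order.
 */
function collectLinks(string $html, string $prefix): array
{
    $dom = new DOMDocument();
    // Real-world HTML is rarely valid; buffer parse warnings instead of printing them.
    libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $hrefs = [];
    // starts-with() is the XPath counterpart of the CSS selector a[href^=...].
    foreach ($xpath->query('//a[starts-with(@href, "' . $prefix . '")]') as $anchor) {
        $hrefs[] = $anchor->getAttribute('href');
    }
    // Drop duplicates (for example a "Next" link repeating a page number).
    return array_values(array_unique($hrefs));
}

$pageLinks = collectLinks($sampleHtml, '/games/psn/');
$gameLinks = collectLinks($sampleHtml, '/game/');

print_r($pageLinks);
print_r($gameLinks);
```

    Against the real site you would fetch each listing page, run the same kind of query on it, and then load every game link it returns. The prefixes would need adjusting to whatever the pages actually use.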

  • doumibi6899 2012-02-07 07:32

    In short, you need to use the PHP function file_get_contents()

    like so:

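    // $numberOfPages must be set to the number of listing pages on the site.
    // Page numbering starts at 1, matching the URL in the question.
    for ($i = 1; $i <= $numberOfPages; $i++) {
        $url = 'http://www.ps3trophies.org/games/psn/' . $i . '/';
        $html = file_get_contents($url);
        if ($html === false) {
            continue; // the request failed; skip this page
        }

        // do a regex search on $html to pinpoint your data

        // save it
    }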
    

    Inside the loop, $html holds the raw markup of one page, so you can run a regular expression against it to find the data you need.
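    To make the regex step concrete, here is a minimal sketch that pulls link targets and names out of a page fragment with preg_match_all(). The fragment, the pattern, and the `/game/...` URLs are assumptions for illustration; the real markup would need its own pattern, and regexes tend to break when the HTML changes.

```php
<?php
// Hypothetical listing-page fragment; the real markup may differ.
$html = '<table>'
    . '<tr><td><a href="/game/example-game-one/trophies/">Example Game One</a></td></tr>'
    . '<tr><td><a href="/game/example-game-two/trophies/">Example Game Two</a></td></tr>'
    . '</table>';

// Capture the href and the link text of every anchor that opens a table cell.
// PREG_SET_ORDER groups the captures per match rather than per capture group.
$pattern = '~<td>\s*<a\s+href="([^"]+)"[^>]*>([^<]+)</a>~i';
preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

$games = [];
foreach ($matches as $match) {
    $games[] = [
        'url'  => $match[1],
        'name' => html_entity_decode(trim($match[2])),
    ];
}

// One tab-separated line per game, suitable for appending to a text file.
foreach ($games as $game) {
    echo $game['name'], "\t", $game['url'], PHP_EOL;
}
```

    Writing tab-separated lines like these to a file would give the kind of text file the question describes for a later MySQL import.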

  • dousha2020 2012-02-07 11:25

    Check this out; it will give you the expected output:

    <?php
    // Hide the warnings that libxml raises on malformed HTML
    error_reporting(E_ERROR | E_PARSE);
    $dom = new DOMDocument();
    $dom->loadHTMLFile('http://www.ps3trophies.org/games/psn/1/');
    $xml = simplexml_import_dom($dom);
    // Every link that sits directly inside a table cell
    $links = $xml->xpath('//table/tr/td/a');
    // Start at index 30 (presumably to skip links that precede the game list)
    for ($i = 30; $i < count($links); $i++):
    ?>
    <a target="_blank" href="http://www.ps3trophies.org<?php echo $links[$i]['href']; ?>"><?php echo $links[$i]['href']; ?></a><br/>
    <?php
    endfor;
    ?>
    