douti9253 2012-02-26 13:00

Fetching content from a web page using curl

First of all, have a look at this page:

www.zedge.net/txts/4519/

This page lists many text messages. I want my script to open each message and download it, but I am running into a problem.

This is my simple script to open the page:

<?php
 $ch = curl_init();
 curl_setopt($ch, CURLOPT_URL, "http://www.zedge.net/txts/4519");
 // Must be set before curl_exec(); otherwise the page is echoed instead of returned
 curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
 $contents = curl_exec($ch);
 curl_close($ch);
?>

The page downloads fine, but how would I open every text message page linked from it, one by one, and save its content in a text file? I know how to save the content of a single web page to a text file using curl, but in this case the page I've downloaded links to many different pages. How do I open each of them separately?

I have this idea, but I don't know whether it will work.

Download this page,

www.zedge.net/txts/4519

look for all the links to text message pages inside it, and save each link to a text file (one per line). Then run another curl session: open the text file, read the links one by one, open each link, copy the content from a particular DIV, and save it to a new file.
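
The storage and extraction steps of this idea can be sketched roughly as follows. This is only an illustration: the helper names (`saveLinks`, `readLinks`, `extractDivText`) and the idea of locating the DIV by its `id` are assumptions, since the question does not say how the target DIV is identified.

```php
<?php
// Hypothetical helper: write one link per line to a text file.
function saveLinks(string $file, array $links): void
{
    file_put_contents($file, implode("\n", $links) . "\n");
}

// Hypothetical helper: read the links back, skipping blank lines.
function readLinks(string $file): array
{
    $lines = file($file, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
    return $lines === false ? [] : $lines;
}

// Hypothetical helper: return the trimmed text of <div id="$divId">,
// or null when no such element exists. Assumes the id is a simple
// token without quote characters.
function extractDivText(string $html, string $divId): ?string
{
    libxml_use_internal_errors(true);   // tolerate imperfect real-world HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xPath = new DOMXPath($dom);
    $nodes = $xPath->query(sprintf('//div[@id="%s"]', $divId));

    return $nodes->length > 0 ? trim($nodes->item(0)->textContent) : null;
}
```

Each link read back with `readLinks()` would then be fetched with curl, passed through `extractDivText()`, and the result written out with `file_put_contents()`.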


  • drtj40036 2012-02-26 13:24

    The algorithm is pretty straightforward:

    • download www.zedge.net/txts/4519 with curl
    • parse it with DOM (or alternative) for links
    • either store them all in a text file/database or process them on the fly with a "subrequest"

     

    // Load main page
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, "http://www.zedge.net/txts/4519");
    $contents = curl_exec ($ch);
    libxml_use_internal_errors( true); // suppress warnings from imperfect HTML
    $dom = new DOMDocument();
    $dom->loadHTML( $contents);
    
    // Filter all the links (replace myLink with the class the page actually uses)
    $xPath = new DOMXPath( $dom);
    $items = $xPath->query( '//a[@class="myLink"]');
    
    foreach( $items as $link){
        $url = $link->getAttribute('href');
        if( strncmp( $url, 'http', 4) != 0){
            // Prepend http:// or something
        }
    
        // Open sub request for the link that was just extracted
        curl_setopt($ch, CURLOPT_URL, $url);
        $subContent = curl_exec( $ch);
    }
    curl_close( $ch);
    

    See the documentation and examples for DOMXPath::query. Note that DOMNodeList implements Traversable, and therefore you can iterate over it with foreach.
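
The link-filtering step, including the "prepend http:// or something" part, might look like the sketch below. The function name `extractLinks` and its parameters are illustrative assumptions, and the relative-URL handling is deliberately simplistic: every non-absolute link is treated as relative to the site root.

```php
<?php
// Hypothetical helper: collect the URLs of all <a> elements carrying a
// given class and turn relative links into absolute ones.
function extractLinks(string $html, string $baseUrl, string $class): array
{
    libxml_use_internal_errors(true);   // tolerate imperfect real-world HTML
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xPath = new DOMXPath($dom);
    // Match $class as a whole word, so class="big myLink" also qualifies.
    // Assumes $class is a simple token without quote characters.
    $query = sprintf(
        '//a[contains(concat(" ", normalize-space(@class), " "), " %s ")]',
        $class
    );

    $links = [];
    foreach ($xPath->query($query) as $a) {   // DOMNodeList is Traversable
        $href = trim($a->getAttribute('href'));
        if ($href === '') {
            continue;
        }
        if (strncmp($href, 'http', 4) !== 0) {
            // Simplification: treat the link as relative to the site root
            $href = rtrim($baseUrl, '/') . '/' . ltrim($href, '/');
        }
        $links[] = $href;
    }

    return array_values(array_unique($links));
}
```

It would be called as `extractLinks($contents, 'http://www.zedge.net', 'myLink')`, with the class name adjusted to whatever the page really uses.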

    Tips:

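    • Use the curl options CURLOPT_COOKIEJAR and CURLOPT_COOKIEFILE to keep cookies between requests
    • Use sleep(...) between requests so you do not flood the server
    • Raise the PHP time and memory limits (set_time_limit(), memory_limit)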
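
A minimal sketch of how these tips could fit together is shown below. The function names `makeCurlFetcher` and `fetchAll`, the 30-second timeout, and the one-second default delay are assumptions chosen for illustration, not values taken from the answer.

```php
<?php
// Hypothetical helper: build a fetch function around one reusable cURL
// handle that keeps cookies in $cookieFile.
function makeCurlFetcher(string $cookieFile): callable
{
    $ch = curl_init();
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);     // return body as string
    curl_setopt($ch, CURLOPT_COOKIEJAR, $cookieFile);   // cookies are written here
    curl_setopt($ch, CURLOPT_COOKIEFILE, $cookieFile);  // cookies are read from here
    curl_setopt($ch, CURLOPT_TIMEOUT, 30);              // assumed per-request limit

    return function (string $url) use ($ch) {
        curl_setopt($ch, CURLOPT_URL, $url);
        return curl_exec($ch);   // string on success, false on failure
    };
}

// Hypothetical helper: fetch each URL in turn, pausing between requests
// so the server is not flooded. Failed requests are skipped.
function fetchAll(array $urls, callable $fetch, int $delaySeconds = 1): array
{
    $pages = [];
    $first = true;
    foreach ($urls as $url) {
        if (!$first && $delaySeconds > 0) {
            sleep($delaySeconds);
        }
        $first = false;

        $body = $fetch($url);
        if ($body !== false) {
            $pages[$url] = $body;
        }
    }
    return $pages;
}

// Usage (performs real network requests, so it is left commented out):
// set_time_limit(0);                  // lift the execution time limit
// ini_set('memory_limit', '256M');    // assumed value; adjust as needed
// $fetch = makeCurlFetcher(sys_get_temp_dir() . '/cookies.txt');
// $pages = fetchAll($links, $fetch, 2);
```

Passing the fetcher in as a callable keeps the loop independent of cURL, so it can be tried out with a stub before hitting the real site.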
    This answer was accepted by the asker as the best answer.