dongwen9975 2013-04-28 01:45

Some content is missing from the cURL response

I am trying to develop a spider to collect data from other sites, purely for academic purposes. I am trying to crawl this page: http://urlmin.com/ngz. I can get all the data I want except the photo paths, because they are loaded with JavaScript. So far, so good. Here is the JS code that loads the image elements after the DOM is loaded:

var exibirImg = new ExibirImagens();
exibirImg.Imagens = [

    new ItemImagem(
        '../fotosanuncios/13886-Papucha 20074.JPG',
        '../fotosanuncios/13886-p-Papucha 20074.JPG'),

    new ItemImagem(
        '../fotosanuncios/13886-Motores Novos.JPG',
        '../fotosanuncios/13886-p-Motores Novos.JPG'),

    new ItemImagem(
        '../fotosanuncios/13886-Panther reformada5.JPG',
        '../fotosanuncios/13886-p-Panther reformada5.JPG'),

    new ItemImagem(
        '../fotosanuncios/13886-Panther reformada 2007.JPG',
        '../fotosanuncios/13886-p-Panther reformada 2007.JPG'),

];
exibirImg.PreLoad();
exibirImg.Titulo = 'Oferta A Gtr 323';
exibirImg.EscreveImagens();
exibirImg.TimeOutJs = 3500;
exibirImg.ImagemNotFound = 'imagens/ImagemNotFound.png';
exibirImg.IdImagemPrincipal = 'imagemPrincipalPF';
exibirImg.IdImagemMini = 'imagensPequenasPF';

It would be really easy if my cURL request returned the JS shown above, but it doesn't. It comes back like this:

var exibirImg = new ExibirImagens();
exibirImg.Imagens = [

];
exibirImg.PreLoad();
exibirImg.Titulo = 'Oferta A Gtr 323';
exibirImg.EscreveImagens();
exibirImg.TimeOutJs = 3500;
exibirImg.ImagemNotFound = 'imagens/ImagemNotFound.png';
exibirImg.IdImagemPrincipal = 'imagemPrincipalPF';
exibirImg.IdImagemMini = 'imagensPequenasPF';

exibirImg.Iniciar();

So the array must be loaded with AJAX or something similar. But the real puzzle is that if I turn off JavaScript support in my browser, the array still arrives with the image paths. So the only explanation is that it is generated on the server side. And the question is: if it comes from the server side, why doesn't my cURL request get it?

Thanks, I hope someone can understand me.

You can check that script on the same page, at line 262.

1 answer

  • dsc71976 2013-04-28 12:55

    Works for me:

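    $url = 'http://urlmin.com/ngz';

    $ch = curl_init($url);
    // Return the response body as a string instead of printing it directly.
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // Follow any redirects the server sends.
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, true);

    $result = curl_exec($ch);
    // curl_exec() returns false on failure; an empty body is not an error.
    if ($result !== false) {
        echo $result;
    } else {
        echo 'cURL error: ' . curl_error($ch);
    }

    curl_close($ch);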
    

    And $result contains:

    var exibirImg = new ExibirImagens();
    exibirImg.Imagens = [
    
        new ItemImagem(
            '../fotosanuncios/13886-Papucha 20074.JPG',
            '../fotosanuncios/13886-p-Papucha 20074.JPG'),
    
        new ItemImagem(
            '../fotosanuncios/13886-Motores Novos.JPG',
            '../fotosanuncios/13886-p-Motores Novos.JPG'),
    
        new ItemImagem(
            '../fotosanuncios/13886-Panther reformada5.JPG',
            '../fotosanuncios/13886-p-Panther reformada5.JPG'),
    
        new ItemImagem(
            '../fotosanuncios/13886-Panther reformada 2007.JPG',
            '../fotosanuncios/13886-p-Panther reformada 2007.JPG'),
    
    ];
    exibirImg.PreLoad();
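
Once the response contains the populated array, the image paths can be pulled out of the inline script with a regular expression. The following is only a minimal sketch, assuming the `new ItemImagem('...', '...')` layout shown above stays the same. The helper name `extractImagePairs` is hypothetical, and the sample string is a shortened copy of the output above rather than a live response.

```php
<?php
// Shortened sample of the inline script, copied from the output above.
// In real use this would be the $result string returned by curl_exec().
$result = <<<'HTML'
exibirImg.Imagens = [

    new ItemImagem(
        '../fotosanuncios/13886-Papucha 20074.JPG',
        '../fotosanuncios/13886-p-Papucha 20074.JPG'),

    new ItemImagem(
        '../fotosanuncios/13886-Motores Novos.JPG',
        '../fotosanuncios/13886-p-Motores Novos.JPG'),

];
exibirImg.PreLoad();
HTML;

/**
 * Hypothetical helper: returns one [first, second] path pair
 * per ItemImagem(...) call found in the given HTML/JS text.
 */
function extractImagePairs(string $html): array
{
    // Capture the two single-quoted arguments of each ItemImagem call.
    $pattern = "/new\s+ItemImagem\(\s*'([^']*)'\s*,\s*'([^']*)'\s*\)/";
    preg_match_all($pattern, $html, $matches, PREG_SET_ORDER);

    $pairs = [];
    foreach ($matches as $m) {
        $pairs[] = [$m[1], $m[2]];
    }
    return $pairs;
}

$pairs = extractImagePairs($result);
foreach ($pairs as [$first, $second]) {
    echo $first, ' | ', $second, PHP_EOL;
}
```

If the array comes back empty, as in the question, the helper returns an empty array, which makes that case easy to detect before going any further.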
    