duanbu9345 2014-06-04 23:55
Views: 61
Accepted answer

PHP web scraping with XPath and JSON

I've been learning web scraping with XPath in PHP and have successfully scraped content from many websites using a variety of selectors, until I tried it on a JSON response.

What I find odd is that when I run the command $x("//body/text()"); in the browser I get the desired result, but something is wrong in my code and I can't tell what it is.

This is an example on viper-7 in which I successfully scrape odds from a website: scraped Odds

On the other hand, I'm trying to use the same code to scrape JSON straight from the body and I can't get it right. I tried not only json_decode but also json_encode.

This is the code I can't seem to fix: scrape JSON


1 answer

  • douningqiu4991 2014-06-05 01:06

    If you want to search JSON, you should use JSONPath, not XPath:

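    <?php
        // json.php provides Services_JSON; jsonpath.php provides jsonPath()
        require_once('json.php');
        require_once('jsonpath.php');

        // Loose-type mode decodes JSON objects into associative arrays
        $parser = new Services_JSON(SERVICES_JSON_LOOSE_TYPE);

        $json = file_get_contents('https://www.realproperty.cl/mobilData.php?functName=getInmuebles&inmuebleID=561');
        $o = $parser->decode($json);
        // Select every "descripcion" value at any depth
        $result = jsonPath($o, "$..descripcion");

        echo '<ul>'."\n";
        foreach ($result as $item) {
            // Re-encode each match as JSON so it prints as a quoted string
            echo '    <li>'.$parser->encode($item).'</li>'."\n";
        }
        echo '</ul>'."\n";
    ?>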
    

    You will need jsonpath.php and json.php (the latter provides the Services_JSON class used above).
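If pulling in the two libraries is not an option, roughly the same lookup can be sketched with PHP's built-in json_decode and a small recursive search. This is only a sketch: the helper name findByKey and the sample payload are made up for illustration (the real endpoint's structure is not shown in this thread), and the order of matches may differ from what jsonPath returns.

```php
<?php
// Recursively collect every value stored under $key, at any depth.
// This approximates what the JSONPath expression "$..descripcion" selects.
function findByKey($data, string $key): array
{
    $found = [];
    if (is_array($data)) {
        foreach ($data as $k => $value) {
            // Strict comparison so numeric list indexes never match the key
            if ($k === $key) {
                $found[] = $value;
            }
            // Descend into nested objects and lists (both decoded as arrays)
            $found = array_merge($found, findByKey($value, $key));
        }
    }
    return $found;
}

// Hypothetical sample payload, NOT the real response from the site
$json = '{"descripcion":"Edificio","servicios":[{"descripcion":"Elevador"},{"descripcion":"Bodega"}]}';

$data = json_decode($json, true); // true => objects become associative arrays
$result = findByKey($data, 'descripcion');

foreach ($result as $item) {
    echo $item, "\n";
}
```

In real use, $json would come from file_get_contents on the URL in the listing above instead of the inline string.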

    This prints:

    <ul>
        <li>"Edificio Trancura se encuentra ubicado en un importante sector residencial de la comuna de Las Condes, a pasos de Av. Crist\u00f3bal Col\u00f3n, cercano a diversos servicios como supermercados, restaurantes, farmacias, strip center, etc.
    
    Este proyecto cuenta con un innovador dise\u00f1o que incluye espacios de doble altura en los departamentos (3 dormitorios), lo que genera una gran sensaci\u00f3n de amplitud y a su vez permite un mejor ingreso de luz natural.
    
    Recibimos su propiedad en parte de pago."</li>
        <li>"Elevador"</li>
        <li>"Condominio"</li>
        <li>"Estacionamiento Visitas"</li>
        <li>"Bodega"</li>
        <li>"Estacionamiento cubierto"</li>
        <li>"ATM"</li>
        <li>"Colegio"</li>
        <li>"Farmacia"</li>
        <li>"Mall"</li>
        <li>"Parada Bus"</li>
        <li>"Parada de taxi"</li>
        <li>"Restaurante"</li>
        <li>"Supermercado"</li>
        <li>"Universidad"</li>
    </ul>
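The \u00f3 and \u00f1 sequences in this output appear because each match is passed through $parser->encode, which JSON-escapes non-ASCII characters. If readable text is wanted instead, one option is to print the decoded strings directly and escape them for HTML. A minimal sketch, assuming the matches are UTF-8 strings; renderList and the sample values are hypothetical:

```php
<?php
// Build an HTML list from decoded values without JSON \u escapes.
function renderList(array $items): string
{
    $html = "<ul>\n";
    foreach ($items as $item) {
        // Strings are printed as-is; anything else (arrays, numbers) is
        // JSON-encoded with JSON_UNESCAPED_UNICODE to keep accents readable.
        $text = is_string($item)
            ? $item
            : json_encode($item, JSON_UNESCAPED_UNICODE);
        // Escape for HTML so markup inside the data cannot break the page
        $html .= '    <li>' . htmlspecialchars($text, ENT_QUOTES, 'UTF-8') . "</li>\n";
    }
    return $html . "</ul>\n";
}

// Hypothetical values standing in for $result from the listing above
$result = ['a pasos de Av. Cristóbal Colón', 'innovador diseño', 'Elevador'];

echo renderList($result);
```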
    

    See http://viper-7.com/hMxQLa (I pasted the required libraries - your code is at the end of the listing)
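The question mentions that json_decode did not seem to work. The cause is not visible in this thread, but one thing worth checking is whether the string being decoded is pure JSON: json_decode returns null on invalid input, and json_last_error_msg() says why. A small sketch of such a check; decodeJsonBody and both sample bodies are hypothetical:

```php
<?php
// Decode a response body as JSON and fail loudly instead of returning null.
function decodeJsonBody(string $body): array
{
    $data = json_decode($body, true);
    if (json_last_error() !== JSON_ERROR_NONE) {
        throw new RuntimeException('Invalid JSON: ' . json_last_error_msg());
    }
    if (!is_array($data)) {
        throw new RuntimeException('Expected a JSON object or array');
    }
    return $data;
}

// Hypothetical body that is plain JSON: decodes fine
$ok = decodeJsonBody('{"inmuebleID": 561}');
echo $ok['inmuebleID'], "\n";

// Hypothetical body where the JSON is wrapped in markup: decoding fails
try {
    decodeJsonBody('<html><body>{"inmuebleID": 561}</body></html>');
} catch (RuntimeException $e) {
    echo $e->getMessage(), "\n";
}
```

If the second case matches what the scraping code hands to json_decode, decoding the raw response from file_get_contents directly, without going through an HTML parser and XPath first, avoids the problem.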
