duanbu9345 2014-06-04 23:55

PHP web scraping with XPath and JSON

I've been learning web scraping with XPath in PHP, and I have successfully scraped content from many websites using a variety of selectors. That worked until I tried it on a JSON response.

What I find strange is that running $x("//body/text()"); in the browser console gives me the desired result, yet something is wrong in my PHP code and I can't tell what it is.

This is an example on viper-7 in which I successfully scrape odds from a website: scraped Odds

On the other hand, when I use the same code to scrape JSON straight from the response body, I can't get it to work. I tried json_decode as well as json_encode.

This is the code I can't seem to fix: scrape JSON

1 answer

  • douningqiu4991 2014-06-05 01:06

    XPath queries XML and HTML documents. If you want to search JSON, you should use JSONPath instead:

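    <?php
        // Pure-PHP JSON parser (Services_JSON) and the JSONPath implementation
        require_once('json.php');
        require_once('jsonpath.php');

        // LOOSE_TYPE makes decode() return associative arrays instead of objects
        $parser = new Services_JSON(SERVICES_JSON_LOOSE_TYPE);

        $json = file_get_contents('https://www.realproperty.cl/mobilData.php?functName=getInmuebles&inmuebleID=561');
        $o = $parser->decode($json);
        // "$..descripcion" selects every "descripcion" member at any depth
        $result = jsonPath($o, "$..descripcion");

        echo '<ul>'."\n";
        // jsonPath() returns false when nothing matches, so fall back to an empty list
        foreach ($result ?: array() as $item) {
            echo '    <li>'.$parser->encode($item).'</li>'."\n";
        }
        echo '</ul>'."\n";
    ?>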
    

    You will need jsonpath.php and json.php

    This prints:

    <ul>
        <li>"Edificio Trancura se encuentra ubicado en un importante sector residencial de la comuna de Las Condes, a pasos de Av. Crist\u00f3bal Col\u00f3n, cercano a diversos servicios como supermercados, restaurantes, farmacias, strip center, etc.
    
    Este proyecto cuenta con un innovador dise\u00f1o que incluye espacios de doble altura en los departamentos (3 dormitorios), lo que genera una gran sensaci\u00f3n de amplitud y a su vez permite un mejor ingreso de luz natural.
    
    Recibimos su propiedad en parte de pago."</li>
        <li>"Elevador"</li>
        <li>"Condominio"</li>
        <li>"Estacionamiento Visitas"</li>
        <li>"Bodega"</li>
        <li>"Estacionamiento cubierto"</li>
        <li>"ATM"</li>
        <li>"Colegio"</li>
        <li>"Farmacia"</li>
        <li>"Mall"</li>
        <li>"Parada Bus"</li>
        <li>"Parada de taxi"</li>
        <li>"Restaurante"</li>
        <li>"Supermercado"</li>
        <li>"Universidad"</li>
    </ul>
    

    See http://viper-7.com/hMxQLa (I pasted the required libraries - your code is at the end of the listing)
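
    As an aside, the same lookup can be sketched without the two external libraries, assuming a PHP version with the built-in json_decode (5.2 or later). This swaps JSONPath for a plain recursive walk, so it is an alternative to the code above rather than the same technique. The collectByKey helper and the sample JSON are hypothetical: the real response's exact shape is not shown here, so a small made-up document stands in for it.

```php
<?php
// Hypothetical helper (not part of the original answer): collects every value
// stored under $key at any depth of a decoded JSON array.
function collectByKey(array $data, $key)
{
    $found = array();
    foreach ($data as $k => $v) {
        if ($k === $key) {
            $found[] = $v;
        }
        if (is_array($v)) {
            // Descend into nested objects and lists
            $found = array_merge($found, collectByKey($v, $key));
        }
    }
    return $found;
}

// Made-up sample standing in for the body returned by file_get_contents()
$json = '{"inmueble":{"descripcion":"Edificio Trancura","servicios":'
      . '[{"descripcion":"Elevador"},{"descripcion":"Bodega"}]}}';

// Second argument true => associative arrays instead of stdClass objects
$data = json_decode($json, true);
if (!is_array($data)) {
    die('Response was not valid JSON');
}

$result = collectByKey($data, 'descripcion');

echo "<ul>\n";
foreach ($result as $item) {
    // Assumes each match is a string; escape it for HTML rather than
    // re-encoding it as JSON, so accented characters print as-is
    echo '    <li>' . htmlspecialchars($item, ENT_QUOTES, 'UTF-8') . "</li>\n";
}
echo "</ul>\n";
```

    Printing the decoded string directly, rather than passing it back through an encoder, avoids the \u00f3-style escapes seen in the output above.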

    This answer was accepted by the asker as the best answer.
