du5591 2018-07-31 18:06

Unable to get the correct value using XPath

I'm using source code that collects data and works as an API. The website it takes its data from has been redesigned, and now some of the XPath expressions work and some don't.

PHP:

protected $namexpath = ".//h1[contains(@itemprop,\"name\")]/a";

This works with the following HTML source:

 <h1 itemprop="name" class="fn itemTitle">
    <a title="https://www.paginegialle.it/altopascio-lu/lotto-ricevitorie/lucky-planet-duro-anastasia-tabaccheria-ricevitori" href="https://www.paginegialle.it/altopascio-lu/lotto-ricevitorie/lucky-planet-duro-anastasia-tabaccheria-ricevitori">
        Lucky <strong>Planet</strong> - Duro Anastasia <strong>Tabaccheria</strong> Ricevitoria Lotto
    </a>
</h1>

But this is not working:

PHP:

protected $telephonexpath = ".//div[@class=\"hidden-phone-elem visiblePhone\"]/span";

HTML Source:

<section itemscope="" itemtype="https://schema.org/LocalBusiness" class="vcard listElement   flFree " data-user="teverola-ce/bar/rivendita-generi-monopolio-n.-2-tozzi-iolanda" data-id="4" data-fl_free="true" data-cd_opec="GU01WAAW" data-cd_aggregazione="23787370" data-cd_id_sede="E57901ED-8833-A2AD-E040-A8C08D264C56">
<div class="container">
    <div class="row">
        <div class="col contentCol">
            <header>
                <div class="tabletOnlyBadge">
                </div>
                <h1 itemprop="name" class="fn itemTitle">
                    <a title="https://www.paginegialle.it/teverola-ce/bar/rivendita-generi-monopolio-n.-2-tozzi-iolanda" href="https://www.paginegialle.it/teverola-ce/bar/rivendita-generi-monopolio-n.-2-tozzi-iolanda">
                    <strong>Planet</strong> Cafe' di Tozzi Iolanda
                    </a>
                </h1>
                <span class="itemSubtitle">
                </span>
                <div>
                    <span class="itemAddress">
                        <span class="adr" itemprop="location" itemscope="" itemtype="https://schema.org/Place">
                            <div class="street-address">
                                <span>105, Via Roma</span> -
                                <span class="postal-code">81030</span> 
                                <span class="locality">Teverola</span> <span class="region">(CE)</span>
                            </div>
                            <div style="display: none;">
                                <span>40.99494</span>
                                <span>14.2077</span>
                            </div>
                        </span>
                    </span>
                </div>
            </header>
            <div>
                <div class="hidden-phone-wrapper">
                    <span class="custom-label"></span>
                    <div class="hidden-phone-elem">
                        <div class="btn btn-yellow btn-show-phone" data-pag="mostra telefono" data-context="listing">
                            <span>MOSTRA TELEFONO</span>
                        </div>
                        <div class="btn btn-hidden-phone">
                            <span class="phIco "></span>
                            <span class="phone-label">081 5034556</span>
                        </div>
                    </div>
                </div>
                <div class="itemGeoLinks">
                    <ul>
                    </ul>
                </div>
                <div class="itemPayoff">
                    <p class="payoff-title">
                        <a class="cat" href="//www.paginegialle.it/ricerca/cat/008647000" rel="nofollow"><strong>Tabacchi</strong>, sigarette e sigari - produzione e commercio</a>
                    </p>
                    <p itemprop="description" class="payoff-txt"></p>
                </div>
                <div class="itemInfoTags">
                </div>
            </div>
        </div>
        <div class="col-3 logoCol">
            <div class="itemRating">
                <a rel="nofollow" href="//www.paginegialle.it/teverola-ce/bar/rivendita-generi-monopolio-n.-2-tozzi-iolanda/commenti#scrivi">
                    <ul class="stars">
                        <li></li>
                        <li></li>
                        <li></li>
                        <li></li>
                        <li></li>
                    </ul>
                    <span class="label scriviRecensione">Scrivi una recensione</span>
                </a>
            </div>
            <figure class="itemLogo">
                <div class="img-container-ext">
                    <div class="img-container-int">
                        <a href="https://www.paginegialle.it/teverola-ce/bar/rivendita-generi-monopolio-n.-2-tozzi-iolanda" title="Dettagli azienda">
                        <img itemprop="image" alt="Planet Cafe' di Tozzi Iolanda" title="Planet Cafe' di Tozzi Iolanda" data-original="" class="elementImage photo" src="data:image/gif;base64,R0lGODlhAQABAIAAAAAAAP///yH5BAEAAAAALAAAAAABAAEAAAIBRAA7" pagespeed_url_hash="1859759222" onload="pagespeed.CriticalImages.checkImageForCriticality(this);">
                        </a>
                    </div>
                </div>
            </figure>
        </div>
    </div>
</div>
<div class="container">
    <div class="row">
        <div class="col">
            <nav class="itemFooter">
                <a class="btn btn-black icn-vetrina shinystat_ssxl" data-pag="vetrina" href="//www.paginegialle.it/teverola-ce/bar/rivendita-generi-monopolio-n.-2-tozzi-iolanda">Vetrina</a>
                <a class="btn btn-blank icn-showOnMap btnShowOnMap shinystat_ssxl" data-pag="vedimappa" href="https://www.paginegialle.it/teverola-ce/bar/rivendita-generi-monopolio-n.-2-tozzi-iolanda/mappa" rel="nofollow">  <span>Vedi su mappa</span></a>
            </nav>
        </div>
    </div>
</div>

The page is www.paginegialle.it//ricerca//TABACCO%20PLANET?mr=50, where the HTML may be easier to inspect.

Edit: I am adding this text because the site would not let me save the edit otherwise (it said there was too much code). I fixed the first part by changing the element from span to h1.


1 answer

  • du_1993 2018-08-01 19:53

    The XPath expression does not match the HTML. The relevant fragment seems to be:

    <div class="hidden-phone-elem">
        <div class="btn btn-yellow btn-show-phone" data-pag="mostra telefono" data-context="listing">
            <span>MOSTRA TELEFONO</span>
        </div>
        <div class="btn btn-hidden-phone">
            <span class="phIco "></span>
            <span class="phone-label">081 5034556</span>
        </div>
    </div>
    

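    The div has only the class hidden-phone-elem, so an exact comparison of @class with "hidden-phone-elem visiblePhone" fails. The spans are also descendants nested inside child divs, not direct children, so the /span step would not reach them either. XPath 1.0 has no function for matching a single token in a space-separated list, but it can be emulated with string functions.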

    • normalize-space() - collapses each run of whitespace into a single space and trims both ends
    • concat() - concatenates strings
    • contains() - tests whether a string contains a substring

    The trick is to normalize the attribute value to something like " classToMatch otherClass " and check whether that contains " classToMatch ". Note the spaces at the start and end of both strings: they ensure that only whole class names match.
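To see why the padding spaces matter, here is a minimal sketch that counts matching divs for several predicates. It assumes a reduced copy of the fragment from the question. The closure $token is a hypothetical helper for this example, not part of PHP or XPath.

```php
<?php
// Reduced copy of the fragment from the question; only the class attributes matter here.
$html = <<<'HTML'
<div class="hidden-phone-wrapper">
    <div class="hidden-phone-elem">
        <div class="btn btn-yellow btn-show-phone"><span>MOSTRA TELEFONO</span></div>
        <div class="btn btn-hidden-phone">
            <span class="phIco "></span>
            <span class="phone-label">081 5034556</span>
        </div>
    </div>
</div>
HTML;

libxml_use_internal_errors(true); // keep parser warnings out of the output
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Hypothetical helper: predicate that matches one whole class name.
$token = function (string $class): string {
    return 'contains(concat(" ", normalize-space(@class), " "), " ' . $class . ' ")';
};

$expressions = [
    // Exact comparison, as in the question: the attribute is only "hidden-phone-elem".
    'exact'           => 'count(//div[@class="hidden-phone-elem visiblePhone"])',
    // Plain substring test: also hits hidden-phone-wrapper and btn-hidden-phone.
    'substring'       => 'count(//div[contains(@class, "hidden-phone")])',
    // Padded test for the same name: no div carries the class "hidden-phone".
    'token_partial'   => 'count(//div[' . $token('hidden-phone') . '])',
    // Padded test for the full class name.
    'token_full'      => 'count(//div[' . $token('hidden-phone-elem') . '])',
];

$counts = [];
foreach ($expressions as $label => $expression) {
    // count() yields a float in PHP; cast it for readable output.
    $counts[$label] = (int) $xpath->evaluate($expression);
}
var_dump($counts);
```

The substring test over-matches because several class names merely contain the search text, while the padded form only accepts a complete, space-delimited class name.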

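    // $html is assumed to hold the page source shown in the question.
    libxml_use_internal_errors(true); // HTML5 tags such as <section> can trigger libxml warnings
    $document = new DOMDocument();
    $document->loadHTML($html);
    $xpath = new DOMXPath($document);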
    
    $expression = 'string(
        //div[
          contains(concat(" ", normalize-space(@class), " "), " hidden-phone-elem ")
        ]
        //span[
          contains(concat(" ", normalize-space(@class), " "), " phone-label ")
        ]
    )';
    
    var_dump($xpath->evaluate($expression));
    

    Output:

    string(11) "081 5034556"
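The expressions in the question start with ".//", which suggests they are evaluated relative to each listing. As a rough sketch under that assumption, the same token match can be run once per section element by passing the listing as the context node. The function classToken() is a hypothetical helper, and the second listing in the sample HTML is invented for illustration.

```php
<?php
// Hypothetical helper (not part of any library): builds the XPath 1.0
// predicate that tests for one whole class name in a class attribute.
function classToken(string $class): string
{
    return sprintf('contains(concat(" ", normalize-space(@class), " "), " %s ")', $class);
}

// Reduced sample with two listings; the second one is invented for illustration.
$html = <<<'HTML'
<div>
  <section class="vcard listElement   flFree ">
    <h1 itemprop="name" class="fn itemTitle"><a href="#">
        <strong>Planet</strong> Cafe' di Tozzi Iolanda
    </a></h1>
    <div class="hidden-phone-elem">
      <div class="btn btn-hidden-phone"><span class="phone-label">081 5034556</span></div>
    </div>
  </section>
  <section class="vcard listElement">
    <h1 itemprop="name" class="fn itemTitle"><a href="#">Example Listing</a></h1>
    <div class="hidden-phone-elem">
      <div class="btn btn-hidden-phone"><span class="phone-label">000 0000000</span></div>
    </div>
  </section>
</div>
HTML;

libxml_use_internal_errors(true); // <section> is HTML5; keep libxml warnings quiet
$document = new DOMDocument();
$document->loadHTML($html);
$xpath = new DOMXPath($document);

// Relative expressions: the leading "." is resolved against the context node.
$nameExpr  = 'normalize-space(.//h1[@itemprop="name"]/a)';
$phoneExpr = 'string(.//div[' . classToken('hidden-phone-elem') . ']'
           . '//span[' . classToken('phone-label') . '])';

$results = [];
foreach ($xpath->query('//section[' . classToken('listElement') . ']') as $listing) {
    // Passing $listing as the second argument limits the search to this listing.
    $results[] = [
        'name'  => $xpath->evaluate($nameExpr, $listing),
        'phone' => $xpath->evaluate($phoneExpr, $listing),
    ];
}
print_r($results);
```

Wrapping the name expression in normalize-space() also strips the line breaks and indentation that surround the link text in the page source.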
    
