dongyonglie5132 2015-07-08 11:21

Scraping HTML with DOMDocument + PHP

I would like to scrape the following HTML:

 <div class="venue-event-list " rel="GB">
                            <div class="tracks-list">
<div class="single-track">
            <a href="//livevideo.betfair.com/Default.do?mi=119408124" target="_blank" class="live-video-link"><div class="bf-icon-live-video tag-i13n i13n-ltxt-LVid i13n-sec-GB i13n-tab-today" title="Watch now on Betfair Live Video"></div></a>
    <div class="info-container">
        <span class="track-name">
            <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a>
        </span>
        <div class="races-list">


<div class="single-race" id="m-1_119408124">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408124"
            title="5f Nursery | 7 Runners">14:10</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408128">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408128"
            title="6f Mdn Stks | 11 Runners">14:40</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408132">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408132"
            title="7f Mdn Stks | 6 Runners">15:10</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408136">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408136"
            title="2m Hcap | 12 Runners">15:40</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408140">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408140"
            title="1m2f Sell Stks | 6 Runners">16:10</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408144">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408144"
            title="1m3f Hcap | 8 Runners">16:40</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408148">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408148"
            title="1m1f Hcap | 14 Runners">17:10</a>
    </span>
</div>
        </div>
    </div>
</div>
                    </div>
                            <div class="tracks-list">
<div class="single-track">
            <a href="//livevideo.betfair.com/Default.do?mi=119408153" target="_blank" class="live-video-link"><div class="bf-icon-live-video tag-i13n i13n-ltxt-LVid i13n-sec-GB i13n-tab-today" title="Watch now on Betfair Live Video"></div></a>
    <div class="info-container">
        <span class="track-name">
            <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408153">Wolverhampton</a>
        </span>
        <div class="races-list">


<div class="single-race" id="m-1_119408153">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408153"
            title="5f Mdn Stks | 7 Runners">14:20</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408157">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408157"
            title="1m6f Hcap | 7 Runners">14:50</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408161">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408161"
            title="1m4f Sell Stks | 5 Runners">15:20</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408165">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408165"
            title="1m1f Hcap | 13 Runners">15:50</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408169">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408169"
            title="1m1f Hcap | 11 Runners">16:20</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408173">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408173"
            title="1m Mdn Stks | 11 Runners">16:50</a>
    </span>
        <span class="separator">|</span>
</div>


<div class="single-race" id="m-1_119408177">
    <span class="race-time link-text">
        <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
            href="/exchange/plus/#/horse-racing/market/1.119408177"
            title="1m Hcap | 13 Runners">17:20</a>
    </span>
</div>
        </div>
    </div>
</div>
                    </div>

I have used the following code to pull the race name and the time of the race:

$url  = ""; // page URL (left blank in the original post)
$html = file_get_contents($url);

$dom = new DOMDocument();
$dom->preserveWhiteSpace = false;  // set before loading; assigning it afterwards has no effect
libxml_use_internal_errors(true);  // collect parser warnings instead of silencing them with @
$dom->loadHTML($html);
libxml_clear_errors();

$xpath = new DOMXPath($dom);

// Select the individual track cards for the day
$trackQuery = '//div[contains(@class, "tracks-list")]';
$tracks     = $xpath->query($trackQuery);

// Loop through each card and print its text
foreach ($tracks as $track) {
    echo $track->textContent . "<br />";
}

What I would like to do is pull the meeting name only if the link (shown below) contains "GB" and "today" in its class attribute:

>  <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today"
> href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a>

So the outcome would be Lingfield. If that condition holds, I would then like to pull the time of the race and the market ID from the following:

<a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today"
        href="/exchange/plus/#/horse-racing/market/1.119408124"
        title="5f Nursery | 7 Runners">14:10</a>

so the outcome would be:

Lingfield 14:10 1.119408124 
Lingfield 14:40 1.119408128
 ............................. 
Wolverhampton 14:20 1.119408153

1 answer

  • doremifasodo0008008 2015-07-08 12:04
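    $xpath->query("//a[contains(@class,'GB') and contains(@class,'today')]");

    The leading // makes the query search the whole document; without it, the bare a step is evaluated against the document node and matches nothing. This should help.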
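
Building on the class filter above, here is a minimal sketch of how the meeting name, race time and market ID could be pulled together. It assumes PHP 7.1 or later and that the page keeps the structure shown in the question (one `single-track` div per meeting, a `track-name` span holding the meeting link, and `race-link` anchors for the races). The function name `extractRaces` and the `$sample` string are hypothetical, introduced only for illustration, and the market ID is assumed to be the last path segment of the `href`.

```php
<?php
// Hypothetical helper (not from the original post): returns rows of
// [meeting name, race time, market id] for GB meetings listed under "today".
function extractRaces(string $html): array
{
    $dom = new DOMDocument();
    libxml_use_internal_errors(true);   // keep parser warnings out of the output
    $dom->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($dom);
    $rows  = [];

    // Assumption: one "single-track" div per meeting, as in the question's markup.
    foreach ($xpath->query('//div[contains(@class, "single-track")]') as $track) {
        // Accept the meeting link only if its class carries both tokens.
        $meeting = $xpath->query(
            './/span[contains(@class, "track-name")]'
            . '/a[contains(@class, "i13n-sec-GB") and contains(@class, "i13n-tab-today")]',
            $track
        )->item(0);

        if ($meeting === null) {
            continue; // not a GB/today meeting
        }

        // Race links are searched relative to this track only.
        foreach ($xpath->query('.//a[contains(@class, "race-link")]', $track) as $race) {
            $href   = $race->getAttribute('href');
            $rows[] = [
                trim($meeting->textContent),
                trim($race->textContent),
                substr($href, strrpos($href, '/') + 1), // text after the last "/"
            ];
        }
    }

    return $rows;
}

// Sample markup trimmed from the question, used here only as test input.
$sample = <<<'HTML'
<div class="single-track">
  <span class="track-name">
    <a class="tag-i13n i13n-ltxt-meeting i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124">Lingfield</a>
  </span>
  <div class="single-race">
    <a class="race-link  tag-i13n i13n-ltxt-race i13n-sec-GB i13n-tab-today" href="/exchange/plus/#/horse-racing/market/1.119408124" title="5f Nursery | 7 Runners">14:10</a>
  </div>
</div>
HTML;

foreach (extractRaces($sample) as [$name, $time, $marketId]) {
    echo "$name $time $marketId" . PHP_EOL; // prints "Lingfield 14:10 1.119408124"
}
```

Note that `contains(@class, 'GB')` is a substring test, so it would also match any unrelated class that happens to contain those letters. Matching on the full tokens `i13n-sec-GB` and `i13n-tab-today`, as the sketch does, is a little stricter. Passing `$track` as the second argument to `query()` and starting the expression with `.//` keeps each search inside the current meeting.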
