dsplos5731 2014-10-09 04:18
Accepted

I'm having trouble extracting the integers between the brackets from this website.

Part of the markup from the website:

<span class="b-label b-link-number" data-num="(322206)">Music &amp; Video</span>
<span class="b-label b-link-number" data-num="(954218)">Toys, Hobbies &amp; Games</span>
<span class="b-label b-link-number" data-num="(502981)">Kids, Baby &amp; Maternity</span>

How do I extract the integers between the brackets?

Desired output:

322206
954218
502981

Should I use a regex, since the spans all share the same class name? (A regex that simply grabs whatever sits between brackets won't work, because the page source contains other, unwanted bracketed values as well.)

Normally, this is how I extract information:

<?php
//header('Content-Type: text/html; charset=utf-8');
$grep = new DOMDocument();
@$grep->loadHTMLFile("http://global.rakuten.com/en/search/?tl=&k=");
$finder = new DOMXPath($grep);
$class = "b-list-item";
$nodes = $finder->query("//*[contains(@class, '$class')]");

foreach ($nodes as $node) {
    $span = $node->childNodes;
    $search = array(0,1,2,3,4,5,6,7,8,9,'(',')');
    $categories = str_replace($search, '', $span->item(0)->nodeValue);
    echo '<br>' . '<font color="green">' . $categories . '  ' . '</font>' ;

}
?>

But since the data I want is inside an attribute of the tag rather than in its text, how do I extract it?

2 answers

  • doyhq66282 2014-10-09 04:22

    Building on your current code, it's straightforward: change $class to the class you want, then use ->getAttribute() to read each data-num value:

    $grep = new DOMDocument();
    @$grep->loadHTMLFile("http://global.rakuten.com/en/search/?tl=&k=");
    $finder = new DOMXPath($grep);
    $class = "b-link-number"; // change to the span's class
    $nodes = $finder->query("//*[contains(@class, '$class')]"); // select the matching elements
    
    $numbers = array();
    foreach ($nodes as $node) { // for every element found
        $link_num = $node->getAttribute('data-num'); // read the `data-num` attribute
        $link_num = str_replace(['(', ')'], '', $link_num); // strip the parentheses
        $numbers[] = $link_num; // collect the result
    }
    
    echo '<pre>';
    print_r($numbers);
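
For trying the approach without hitting the live site, here is a minimal, self-contained sketch that runs against the three sample spans from the question. The function name `extractDataNums` is hypothetical, and the stricter class test and the `preg_match` digit extraction are assumptions layered on top of the answer above, not part of it. The XPath matches `b-link-number` as a whole class token, so a class such as `b-link-number-foo` would not match by accident.

```php
<?php
// Sample markup copied from the question, so the sketch needs no network access.
$html = implode("\n", [
    '<span class="b-label b-link-number" data-num="(322206)">Music &amp; Video</span>',
    '<span class="b-label b-link-number" data-num="(954218)">Toys, Hobbies &amp; Games</span>',
    '<span class="b-label b-link-number" data-num="(502981)">Kids, Baby &amp; Maternity</span>',
]);

// Hypothetical helper: returns the integers found in data-num attributes.
function extractDataNums(string $html): array
{
    $doc = new DOMDocument();

    // Collect libxml warnings internally instead of silencing them with @.
    $previous = libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);

    // Match the class as a whole token and require a data-num attribute.
    $query = "//span[contains(concat(' ', normalize-space(@class), ' '), ' b-link-number ')][@data-num]";

    $numbers = [];
    foreach ($xpath->query($query) as $span) {
        // Keep only the first run of digits, e.g. "(322206)" becomes 322206.
        if (preg_match('/\d+/', $span->getAttribute('data-num'), $match)) {
            $numbers[] = (int) $match[0];
        }
    }
    return $numbers;
}

$numbers = extractDataNums($html);
echo implode("\n", $numbers), "\n";
```

With the sample markup this prints 322206, 954218 and 502981, one per line. To run it against the real page, the string returned by `file_get_contents()` for the URL could be passed in instead, assuming the page still serves the same markup.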
    
    This answer was selected as the best answer by the asker.