duanmao2774 2016-07-02 23:44

Simple HTML DOM parser: table to array (extended)

There is this website

http://www.oxybet.com/france-vs-iceland/e/5209778/

I want to scrape only PARTS of this table, not the full table.

For example, I want to display only the rows for sportingbet, stoiximan and mybet, and only the 1, X and 2 columns rather than all of them. The numbers shown in a red box must also be scraped as they are, with the red box, or at least marked with an asterisk next to them in the scraped output. Can this be done, or do I need to store the whole table in a database first and then query the database?

What I have now is this code, borrowed from another similar question on this forum:

<?php

require 'simple_html_dom.php';

$html = file_get_html('http://www.oxybet.com/france-vs-iceland/e/5209778/');
if (!$html) {
    // file_get_html() returns false when the page cannot be loaded
    die('Could not load the page.');
}

// Take the first table on the page
$table = $html->find('table', 0);
$rowData = array();

foreach ($table->find('tr') as $row) {
    // Collect the text of every cell in this row
    $cells = array();
    foreach ($row->find('td') as $cell) {
        $cells[] = trim($cell->plaintext);
    }
    $rowData[] = $cells;
}

// Print the collected rows as a plain HTML table
echo '<table>';
foreach ($rowData as $tr) {
    echo '<tr>';
    foreach ($tr as $td) {
        echo '<td>' . $td . '</td>';
    }
    echo '</tr>';
}
echo '</table>';

?>

This returns the full table. Mainly, I want to detect the numbers shown in the red box (in the 1, X and 2 columns) and display an asterisk next to them in my scraped output. Secondly, I want to know whether it is possible to scrape only specific columns and rows instead of everything. Do I need to use XPath?

Could someone please point me in the right direction? I have spent hours on this, and the manual doesn't explain much: http://simplehtmldom.sourceforge.net/manual.htm


1 answer

  • doumen6605 2016-10-09 12:54

    The link is dead. However, you can do this with XPath by referencing the cells you want by their colour and position, among many other ways.

    This snippet will give you the general gist; it is taken from a project I'm working on at the moment:

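    // Hypothetical wrapper class; the original snippet shows only the two methods
    class PageScraper
    {
        private $dom;
        private $HTMLSource;
        private $xpath;

        public function __construct($URL)
        {
            // Create the DOM document that will hold the parsed nodes
            $this->dom = new DOMDocument();

            // Collect libxml parse warnings internally instead of printing them
            libxml_use_internal_errors(true);

            // Fetch the HTML source
            $this->HTMLSource = file_get_contents($URL);

            // Load the HTML into the DOM
            $this->dom->loadHTML($this->HTMLSource);

            // Make the document queryable with XPath
            $this->xpath = new DOMXPath($this->dom);
        }

        // Run an XPath query and return the matching DOMNodeList
        public function xPathQuery($query)
        {
            return $this->xpath->query($query);
        }
    }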
    

    Then simply pass a query to your DOMXPath, like //tr[1]
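
Since the link is dead, the real markup cannot be checked, so the following is only a rough sketch of how such queries could pick out specific rows and columns. Everything about the page structure here is an assumption: the inline sample table, the bookmaker name sitting in the first cell, the 1, X and 2 odds sitting in cells 2 to 4, and the `best` class standing in for the red box. The function name `scrapeOdds` is hypothetical as well.

```php
<?php
// Sample markup standing in for the real page (assumed structure, not the site's actual HTML).
$sampleHtml = <<<HTML
<table>
  <tr><th>Bookmaker</th><th>1</th><th>X</th><th>2</th><th>Over</th></tr>
  <tr><td>sportingbet</td><td class="best">1.30</td><td>5.00</td><td>9.50</td><td>1.80</td></tr>
  <tr><td>bet365</td><td>1.28</td><td>5.25</td><td>10.00</td><td>1.85</td></tr>
  <tr><td>stoiximan</td><td>1.29</td><td class="best">5.50</td><td>9.00</td><td>1.83</td></tr>
  <tr><td>mybet</td><td>1.27</td><td>5.10</td><td class="best">11.00</td><td>1.79</td></tr>
</table>
HTML;

/**
 * Return [bookmaker => [odds1, oddsX, odds2]] for the wanted bookmakers,
 * appending '*' to cells that carry the (assumed) highlight class.
 */
function scrapeOdds(string $html, array $wanted, string $highlightClass = 'best'): array
{
    libxml_use_internal_errors(true);   // collect parser warnings instead of printing them
    $dom = new DOMDocument();
    $dom->loadHTML($html);
    libxml_clear_errors();
    $xpath = new DOMXPath($dom);

    $result = [];
    // tr[td] skips the header row, which contains only th cells
    foreach ($xpath->query('//table[1]//tr[td]') as $row) {
        // Assumption: the bookmaker name is the text of the first cell
        $name = trim($xpath->evaluate('string(td[1])', $row));
        if (!in_array(strtolower($name), $wanted, true)) {
            continue;
        }

        $odds = [];
        // Assumption: the 1, X and 2 odds are cells 2 to 4 of the row
        foreach ($xpath->query('td[position() >= 2 and position() <= 4]', $row) as $cell) {
            $text = trim($cell->textContent);
            $classes = preg_split('/\s+/', trim($cell->getAttribute('class')));
            if (in_array($highlightClass, $classes, true)) {
                $text .= '*';   // mark the highlighted ("red box") value
            }
            $odds[] = $text;
        }
        $result[$name] = $odds;
    }
    return $result;
}

$odds = scrapeOdds($sampleHtml, ['sportingbet', 'stoiximan', 'mybet']);
print_r($odds);
```

No database is needed for this: the rows and columns are filtered while walking the DOM. On the real page, the row test, the column positions and the highlight test would have to be adapted to whatever the actual HTML uses (for instance an inline style rather than a class).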
