douye6812 2013-11-07 18:42

Extracting data from HTML using Simple HTML DOM Parser

For a college project, I am creating a website with some back-end algorithms, and to test these in a demo environment I need a lot of fake data. To get this data I intend to scrape some sites. One of these sites is freelancer.com. To extract the data I am using the Simple HTML DOM Parser, but so far I have been unsuccessful in my efforts to actually get the data I need.

Here is an example of the HTML layout of the page I intend to scrape. The red boxes mark the required data.

[Screenshot of the HTML code on freelancer.com]

Here is the code I have written so far after following some tutorials.

<?php
include "simple_html_dom.php";
// Create DOM from URL
$html = file_get_html('http://www.freelancer.com/jobs/Website-Design/1/');

//Get all data inside the <tr> of <table id="project_table">
foreach($html->find('table[id=project_table] tr') as $tr) {

    foreach($tr->find('td[class=title-col]') as $t) {
        //get the outer HTML (outertext includes the <td> tag itself)
        $data = $t->outertext;
        echo $data;
    }
}

?>

Hopefully someone can point me in the right direction as to how I can get this working.

Thanks.

1 answer

  • dongxizhe9755 2013-11-07 22:19

    The raw source code of the page differs from the markup shown in your screenshot, which is why you are not getting the expected results.

    You can check the raw source code with Ctrl+U. The data is in table[id=project_table_static], and the td cells have no attributes, so a selector such as td[class=title-col] matches nothing. Here is working code that gets all the URLs from the table:

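    include "simple_html_dom.php";

    $url = 'http://www.freelancer.com/jobs/Website-Design/1/';
    // Create DOM from URL; file_get_html() returns false if the page could not be loaded
    $html = file_get_html($url);
    if ($html === false) {
        die("Could not load ".$url);
    }

    // Loop over every <tr> inside <table id="project_table_static">
    foreach($html->find('table#project_table_static tbody tr') as $i=>$tr) {

        // Skip the first empty element
        if ($i==0) {
            continue;
        }

        echo "<br/>\$i=".$i;

        // Get the first anchor in the row; find() returns null if there is none
        $anchor = $tr->find('a', 0);
        if ($anchor !== null) {
            echo " => ".$anchor->href;
        }
    }

    // Clear dom object
    $html->clear();
    unset($html);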
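
For comparison, here is a minimal sketch of the same idea (first link of each row, looked up by the table id found in the raw source) using PHP's bundled DOM extension rather than Simple HTML DOM Parser. The sample markup, the example hrefs, and the helper name firstLinkPerRow are all made up for illustration; the real page's HTML is not reproduced here and may well differ.

```php
<?php
// Hypothetical sample markup standing in for the raw page source.
$rawHtml = <<<HTML
<table id="project_table_static">
  <tbody>
    <tr><td></td></tr>
    <tr><td><a href="/projects/example-one.html">Example one</a></td></tr>
    <tr><td><a href="/projects/example-two.html">Example two</a></td></tr>
  </tbody>
</table>
HTML;

/**
 * Return the href of the first <a> in each row of the table with the given id.
 * Rows without a link are skipped. $tableId is interpolated into the XPath
 * expression, so it is assumed to be a trusted value without quotes.
 */
function firstLinkPerRow(string $html, string $tableId): array
{
    $dom = new DOMDocument();
    // Collect parser warnings internally instead of printing them;
    // real-world HTML is rarely well-formed.
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $links = [];
    foreach ($xpath->query("//table[@id='$tableId']//tr") as $row) {
        // item(0) is null when the row contains no link.
        $anchor = $xpath->query('.//a[@href]', $row)->item(0);
        if ($anchor !== null) {
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}

// Id taken from the raw source: rows with links are found.
$links = firstLinkPerRow($rawHtml, 'project_table_static');
print_r($links);

// Id taken from the screenshot of the rendered page: nothing matches.
$missing = firstLinkPerRow($rawHtml, 'project_table');
var_dump(count($missing));
```

Running the lookup against both ids makes the mismatch visible: a selector built from the rendered page silently returns an empty result when the raw source uses a different id.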
    

    This answer was accepted by the asker as the best answer.
