douye6812 2013-11-07 18:42

Extracting data from HTML using Simple HTML DOM Parser

For a college project, I am creating a website with some back-end algorithms, and to test these in a demo environment I need a lot of fake data. To get this data I intend to scrape some sites, one of which is freelancer.com. To extract the data I am using the Simple HTML DOM Parser, but so far I have been unable to get the data I need.

Here is an example of the HTML layout of the page I intend to scrape. The red boxes mark the required data.

[Image: screenshot of the HTML code on Freelancer.com]

Here is the code I have written so far after following some tutorials.

<?php
include "simple_html_dom.php";
// Create DOM from URL
$html = file_get_html('http://www.freelancer.com/jobs/Website-Design/1/');

//Get all data inside the <tr> of <table id="project_table">
foreach($html->find('table[id=project_table] tr') as $tr) {

    foreach($tr->find('td[class=title-col]') as $t) {
        //get the outer HTML (the <td> element including its own tag)
        $data = $t->outertext;
        echo $data;
    }
}

?>

Hopefully someone can point me in the right direction as to how I can get this working.

Thanks.


1 answer

  • dongxizhe9755 2013-11-07 22:19

    The raw source code of the page differs from the HTML in your screenshot, which is why you are not getting the expected results.
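
One way to check this from PHP instead of the browser is to list the ids of the tables that actually appear in the markup the server returns. The sketch below is only an illustration: it uses PHP's built-in DOM extension rather than Simple HTML DOM, `listTableIds` is a hypothetical helper name, and `$sample` is a made-up stand-in for the raw page (on a live page the string would come from something like `file_get_contents($url)`).

```php
<?php
// Hypothetical helper: return the id attribute of every <table> found
// in an HTML string.
function listTableIds(string $html): array
{
    $doc = new DOMDocument();
    // Collect libxml warnings internally instead of printing them,
    // since real-world markup is rarely perfectly valid.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $ids = [];
    foreach ($doc->getElementsByTagName('table') as $table) {
        if ($table->hasAttribute('id')) {
            $ids[] = $table->getAttribute('id');
        }
    }
    return $ids;
}

// Made-up sample standing in for the raw source of the page.
$sample = '<html><body>'
    . '<table id="project_table_static"><tbody><tr><td>x</td></tr></tbody></table>'
    . '<table><tr><td>y</td></tr></table>'
    . '</body></html>';

print_r(listTableIds($sample));
```

If the id you are selecting on is missing from that list, the selector cannot match, whatever the browser's rendered view shows.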

    You can view the raw source with Ctrl+U. There, the data is in table[id=project_table_static] and the td cells have no attributes, so a selector such as td[class=title-col] matches nothing. Here is working code that prints the first URL from each row of that table:

    <?php
    include "simple_html_dom.php";

    $url = 'http://www.freelancer.com/jobs/Website-Design/1/';
    // Create DOM from URL; file_get_html() returns false on failure
    $html = file_get_html($url);
    if (!$html) {
        die("Could not load $url");
    }

    // Loop over every <tr> inside <table id="project_table_static">
    foreach ($html->find('table#project_table_static tbody tr') as $i => $tr) {

        // Skip the first row, which is empty
        if ($i == 0) {
            continue;
        }

        // Get the first anchor in the row; skip rows that have none
        $anchor = $tr->find('a', 0);
        if (!$anchor) {
            continue;
        }
        echo "<br/>\$i=" . $i . " => " . $anchor->href;
    }

    // Clear the DOM object to free memory
    $html->clear();
    unset($html);
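
For comparison, the same row-by-row extraction can be sketched with PHP's built-in DOMDocument and DOMXPath classes instead of Simple HTML DOM. This is not the method used above, only an alternative under stated assumptions: `firstLinkPerRow` is a hypothetical helper, the sample markup and its paths are invented for illustration, and the table id is assumed to contain no quote characters because it is interpolated directly into the XPath expression.

```php
<?php
// Hypothetical helper: return the href of the first link in each row of
// the table with the given id. Rows without a link are skipped.
function firstLinkPerRow(string $html, string $tableId): array
{
    $doc = new DOMDocument();
    // Keep libxml warnings about imperfect markup out of the output.
    libxml_use_internal_errors(true);
    $doc->loadHTML($html);
    libxml_clear_errors();

    $xpath = new DOMXPath($doc);
    // Assumption: $tableId contains no quotes, so it is safe to embed here.
    $rows = $xpath->query("//table[@id='$tableId']//tr");

    $links = [];
    foreach ($rows as $row) {
        // (.//a[@href])[1] selects the first anchor below this row, if any.
        $anchor = $xpath->query('(.//a[@href])[1]', $row)->item(0);
        if ($anchor !== null) {
            $links[] = $anchor->getAttribute('href');
        }
    }
    return $links;
}

// Invented sample markup: one empty row, then two rows with links.
$sample = <<<HTML
<table id="project_table_static">
  <tbody>
    <tr><td></td></tr>
    <tr><td><a href="/projects/a.html">A</a></td><td><a href="/u/1">u</a></td></tr>
    <tr><td><a href="/projects/b.html">B</a></td></tr>
  </tbody>
</table>
HTML;

print_r(firstLinkPerRow($sample, 'project_table_static'));
```

Skipping rows that have no anchor replaces the hard-coded "skip the first row" check, so the loop does not depend on where the empty row sits.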
    


    This answer was accepted by the asker as the best answer.
