dongluanban3536 2014-11-11 01:18
24 views
Answer accepted

Simple HTML DOM Parser: scraping divs

I am trying to scrape some data, with Simple HTML DOM Parser, from a page that has the following structure:

    <div class='image'>
        <img class='a' src='1.jpg'>
    </div>
    <div class='data'>
        lorem ipsum 1
    </div>
    <div class='data'>
        lorem ipsum 2
    </div>
    <div class='data'>
        lorem ipsum 3
    </div>

    <div class='image'>
        <img class='a' src='2.jpg'>
    </div>
    <div class='data'>
        lorem ipsum 4
    </div>

    <div class='image'>
        <img class='a' src='3.jpg'>
    </div>
    <div class='data'>
        lorem ipsum 5
    </div>
    <div class='data'>
        lorem ipsum 6
    </div>

I can easily get all the data. My problem is that I cannot associate each image with the data divs that follow it. (The divs are siblings, not nested.)

I need to associate image 1.jpg with data 1, 2 and 3; image 2.jpg with data 4; and image 3.jpg with data 5 and 6.

The number of data divs between two image divs varies.

Is there any way to count the number of divs between two divs with class image, even though they are not nested?

I apologize if the question seems complicated, but I assure you the question is very simple if you look at it carefully.

1 answer

  • duancenxiao0482 2014-11-11 01:46

    You could walk the divs in document order with a foreach loop. If a div has the class image, increment the grouping key and store the image under the new key; if it has the class data, push its contents under the current key.

    Rough example:

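    // Assumes the Simple HTML DOM library is already included
    // and $html_markup holds the page source.
    $data = array();
    $html = str_get_html($html_markup);
    $current_key = 0;
    foreach ($html->find('div') as $div) {
        if ($div->class == 'image') {
            // An image div starts a new group.
            $current_key++;
            $img = $div->find('img', 0);
            // Guard against an image div that contains no <img>.
            $data[$current_key]['image'] = $img ? $img->src : null;
        } elseif ($div->class == 'data') {
            // A data div belongs to the most recent image div.
            $data[$current_key]['data'][] = $div->innertext;
        }
    }

    echo '<pre>';
    print_r($data);
    echo '</pre>';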
    

    The data should be grouped something like this:

    Array
    (
        [1] => Array
        (
            [image] => 1.jpg
            [data] => Array
            (
                [0] =>      lorem ipsum 1 
                [1] =>      lorem ipsum 2 
                [2] =>      lorem ipsum 3 
            )
        )
    
        [2] => Array
        (
            [image] => 2.jpg
            [data] => Array
            (
                [0] =>      lorem ipsum 4 
            )
        )
    
        [3] => Array
        (
            [image] => 3.jpg
            [data] => Array
            (
                [0] =>      lorem ipsum 5 
                [1] =>      lorem ipsum 6 
            )
    
        )
    )
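To answer the counting part of the question directly, here is a minimal sketch that walks the siblings after each image div and counts the data divs until the next image div. Note that it swaps in PHP's built-in DOM extension (DOMDocument and DOMXPath) instead of Simple HTML DOM Parser, so that it runs without any third-party file. The function name `countDataPerImage`, the `$markup` sample, and the `root` wrapper div are assumptions made for illustration, not part of the original page.

```php
<?php
// Hypothetical sample markup mirroring the structure in the question.
$markup = <<<HTML
<div class='image'><img class='a' src='1.jpg'></div>
<div class='data'>lorem ipsum 1</div>
<div class='data'>lorem ipsum 2</div>
<div class='data'>lorem ipsum 3</div>
<div class='image'><img class='a' src='2.jpg'></div>
<div class='data'>lorem ipsum 4</div>
<div class='image'><img class='a' src='3.jpg'></div>
<div class='data'>lorem ipsum 5</div>
<div class='data'>lorem ipsum 6</div>
HTML;

// Returns an array mapping each image src to the number of data divs
// that follow it before the next image div (illustrative helper).
function countDataPerImage(string $markup): array
{
    $doc = new DOMDocument();
    // Wrap the fragment in one root div so every div shares a parent.
    $doc->loadHTML(
        '<div id="root">' . $markup . '</div>',
        LIBXML_NOERROR | LIBXML_NOWARNING
    );
    $xpath = new DOMXPath($doc);

    $counts = [];
    foreach ($xpath->query("//div[@class='image']") as $imageDiv) {
        $img = $imageDiv->getElementsByTagName('img')->item(0);
        $src = $img ? $img->getAttribute('src') : '';

        $count = 0;
        // Walk forward through the siblings until the next image div.
        for ($node = $imageDiv->nextSibling; $node !== null; $node = $node->nextSibling) {
            if (!($node instanceof DOMElement)) {
                continue; // skip whitespace text nodes between the divs
            }
            if ($node->getAttribute('class') === 'image') {
                break; // the next group starts here
            }
            if ($node->getAttribute('class') === 'data') {
                $count++;
            }
        }
        $counts[$src] = $count;
    }
    return $counts;
}

print_r(countDataPerImage($markup));
```

For the sample markup this gives 3 for 1.jpg, 1 for 2.jpg and 2 for 3.jpg. The single-pass loop in the answer above is simpler when you need the data itself; sibling walking is mainly useful when you only want the counts or need to stop at a specific boundary.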
    
    Accepted by the asker as the best answer.
