dongluanban3536 2014-11-11 01:18
Accepted

Scraping divs with Simple HTML DOM Parser

I am trying to scrape some data, with Simple HTML DOM Parser, from a page that has the following structure:

    <div class='image'>
        <img class='a' src='1.jpg'>
    </div>
    <div class='data'>
        lorem ipsum 1
    </div>
    <div class='data'>
        lorem ipsum 2
    </div>
    <div class='data'>
        lorem ipsum 3
    </div>

    <div class='image'>
        <img class='a' src='2.jpg'>
    </div>
    <div class='data'>
        lorem ipsum 4
    </div>

    <div class='image'>
        <img class='a' src='3.jpg'>
    </div>
    <div class='data'>
        lorem ipsum 5
    </div>
    <div class='data'>
        lorem ipsum 6
    </div>

I can easily get all the data. My problem is that I cannot associate the images with the data divs underneath. (Divs are not nested)

I need to associate image 1.jpg with data 1, 2 and 3; image 2.jpg with data 4; and image 3.jpg with data 5 and 6.

The number of data divs between two image divs varies.

Is there any way to count the number of divs between two divs with class image, even though they are not nested?

I apologize if the question seems complicated, but I assure you the question is very simple if you look at it carefully.

1 answer

  • duancenxiao0482 2014-11-11 01:46

    You could walk the divs in document order with a loop (foreach). If a div has the class image, increment the grouping key; if it has the class data, push its contents under the current key.

    Rough example:

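    // Assumption: simple_html_dom.php sits next to this script; adjust the path as needed.
    include 'simple_html_dom.php';

    // $html_markup is assumed to already hold the page source as a string.
    $data = array();
    $html = str_get_html($html_markup);
    $current_key = 0;
    // find('div') returns the divs in document order, so each data div
    // belongs to the most recently seen image div.
    foreach ($html->find('div') as $div) {
        if ($div->class == 'image') {
            // A new image div starts a new group.
            $current_key++;
            $img = $div->find('img', 0);
            $data[$current_key]['image'] = $img ? $img->src : null;
        } elseif ($div->class == 'data') {
            $data[$current_key]['data'][] = $div->innertext;
        }
    }

    echo '<pre>';
    print_r($data);
    echo '</pre>';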
    

    The data should be grouped something like this:

    Array
    (
        [1] => Array
        (
            [image] => 1.jpg
            [data] => Array
            (
                [0] =>      lorem ipsum 1 
                [1] =>      lorem ipsum 2 
                [2] =>      lorem ipsum 3 
            )
        )
    
        [2] => Array
        (
            [image] => 2.jpg
            [data] => Array
            (
                [0] =>      lorem ipsum 4 
            )
        )
    
        [3] => Array
        (
            [image] => 3.jpg
            [data] => Array
            (
                [0] =>      lorem ipsum 5 
                [1] =>      lorem ipsum 6 
            )
    
        )
    )
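The question also asks how to count the data divs that belong to each image. Once the divs are grouped, that count is just the size of each group. Below is a minimal, self-contained sketch of the same grouping idea. Note that it swaps in PHP's built-in DOMDocument and DOMXPath instead of Simple HTML DOM Parser so that it runs without extra files; the sample markup and the names `$groups` and `$current` are made up for illustration.

```php
<?php
// Sample markup mirroring the flat (non-nested) structure from the question.
$html_markup = <<<HTML
<div class='image'><img class='a' src='1.jpg'></div>
<div class='data'>lorem ipsum 1</div>
<div class='data'>lorem ipsum 2</div>
<div class='data'>lorem ipsum 3</div>
<div class='image'><img class='a' src='2.jpg'></div>
<div class='data'>lorem ipsum 4</div>
<div class='image'><img class='a' src='3.jpg'></div>
<div class='data'>lorem ipsum 5</div>
<div class='data'>lorem ipsum 6</div>
HTML;

// Parse with the built-in DOM extension; collect parser warnings quietly.
libxml_use_internal_errors(true);
$doc = new DOMDocument();
$doc->loadHTML($html_markup);
libxml_clear_errors();

$xpath = new DOMXPath($doc);
$groups = array();
$current = -1; // index of the group opened by the last image div seen

// '//div' yields the divs in document order.
foreach ($xpath->query('//div') as $div) {
    $class = $div->getAttribute('class');
    if ($class === 'image') {
        // An image div opens a new group.
        $img = $xpath->query('.//img', $div)->item(0);
        $groups[++$current] = array(
            'image' => $img ? $img->getAttribute('src') : null,
            'data'  => array(),
        );
    } elseif ($class === 'data' && $current >= 0) {
        // A data div joins the most recent group; data divs that appear
        // before any image div are skipped.
        $groups[$current]['data'][] = trim($div->textContent);
    }
}

// The number of data divs per image is the size of each group.
foreach ($groups as $group) {
    echo $group['image'], ': ', count($group['data']), " data div(s)\n";
}
```

For the sample markup this prints 3 data divs for 1.jpg, 1 for 2.jpg and 2 for 3.jpg. With Simple HTML DOM Parser the same count is available as `count($data[$key]['data'])` on the array built in the answer above.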
    