drtzb06222 2014-03-28 14:43
Views: 119

simple_html_dom won't grab the image src when the img has a data-thumb attribute

I have a problem and I don't understand why it's happening. On img elements that have a data-thumb attribute, simple_html_dom won't grab the image src, and I can't figure out why.

Here is an example of how the HTML page is formatted. Let's call it somepage.com/search?q=singing

<div class="videos">
    <div class="thumbWrapper">
        <div class="postThumbnail">
           <img id="2019485" class="videoThumb" width="190" height="143" alt="some post title" src="http://imageurl.com/uploaded/image/3.jpg" category="7">
        </div>
    </div>
    <div class="thumbWrapper">
        <div class="postThumbnail">
           <img id="2019485" class="videoThumb" width="190" height="143" alt="some post title" data-thumb="http://imageurl.com/uploaded/image/3.jpg" src="http://imageurl.com/uploaded/image/3.jpg" category="7">
        </div>
    </div>
    <div class="thumbWrapper">
        <div class="postThumbnail">
           <img id="2019485" class="videoThumb" width="190" height="143" alt="some post title" data-thumb="http://imageurl.com/uploaded/image/3.jpg" src="http://imageurl.com/uploaded/image/3.jpg" category="7">
        </div>
    </div>
    <div class="thumbWrapper">
        <div class="postThumbnail">
           <img id="2019485" class="videoThumb" width="190" height="143" alt="some post title" src="http://imageurl.com/uploaded/image/3.jpg" category="7">
        </div>
    </div>
</div>

As you can see, some images have data-thumb and some don't. It looks completely random, and both kinds appear on the same page.

Here is how I fetch the page:

            $get = curl_init();
            curl_setopt($get, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
            curl_setopt($get, CURLOPT_URL, 'somepage.com/search?q=singing');
            curl_setopt($get, CURLOPT_RETURNTRANSFER, 1);
            curl_setopt($get, CURLOPT_CONNECTTIMEOUT, 10);
            $str = curl_exec($get);
            curl_close($get);

            $URL = str_get_html($str);

This works, or at least it appears to. The next step is to extract the elements from the page and get those thumbnails.

            foreach ($URL->find('div[class="thumbWrapper"]') as $video) {
                $thumb = $video->find('img[class="videoThumb"]');
                $image = $thumb[0]->src;
            }

This is where the problem appears: on img elements that have data-thumb, it doesn't get the image.

The simplehtmldom page just says to use something like this:

            $video->find('img');
            $thumb->src;

But that doesn't work; I had to specify the img class and use index [0] of the returned array. My guess is that when data-thumb is present the array is shifted, so src is no longer at [0]?

I don't know; I've just started using simplehtmldom and I'm still learning. Any suggestions?

1 answer

  • douqianmin5367 2014-03-28 18:19

    I have found a workaround of sorts:

                    if($thumb[0]->{'data-thumb'} != '') {
                        $image = $thumb[0]->{'data-thumb'};
                    } else {
                        $image = $thumb[0]->src;
                    }
    

    I don't know if it's the best approach, but it works: if data-thumb exists, use that image; otherwise use src.
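
The same fallback could be pulled into a small helper so the loop body stays short. This is only a sketch: pickThumbUrl is a made-up name, not part of simple_html_dom. It assumes that a missing attribute comes back as false or an empty string, which is how simple_html_dom's property access appears to behave.

```php
<?php
/**
 * Hypothetical helper (not part of simple_html_dom):
 * prefer the data-thumb value, fall back to src.
 *
 * @param mixed $dataThumb value read from the data-thumb attribute
 * @param mixed $src       value read from the src attribute
 * @return string|null     the chosen URL, or null if neither is usable
 */
function pickThumbUrl($dataThumb, $src)
{
    // Assumption: an absent attribute arrives as false, null or ''.
    if (is_string($dataThumb) && trim($dataThumb) !== '') {
        return trim($dataThumb);
    }
    if (is_string($src) && trim($src) !== '') {
        return trim($src);
    }
    // Neither attribute holds a usable URL.
    return null;
}
```

Inside the loop from the question it would be called roughly like this:

            $image = pickThumbUrl($thumb[0]->{'data-thumb'}, $thumb[0]->src);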
