dongwenhui8900 asked 2019-04-19 18:54

How to get images as an array using file_get_contents

I have the following problem with getting images as an array. In this code I check whether images exist for the search term Test 1: if yes, display them; if not, fall back to Test 2 and stop there. The current code works, but it is very slow.

The check if (sizeof($matches[1]) > 3) { exists because the first few matches on the crawled website are sometimes advertisements, so this is my safeguard to skip them.

My question is: how can I speed up the code below so that the if (sizeof($matches[1]) > 3) { check is reached faster? I believe this is what makes the code so slow, because the array may contain up to 1000 images.

$get_search = 'Test 1';
$ch_foreach = 0; // becomes 1 once a search term yields enough results
$tmp = 0;        // counts how many matches have been checked

$html = file_get_contents('https://www.everypixel.com/search?q=' . urlencode($get_search) . '&is_id=1&st=free');
preg_match_all('|<img.*?src=[\'"](.*?)[\'"].*?>|i', $html, $matches);

if (sizeof($matches[1]) > 3) {
  $ch_foreach = 1;
}

if ($ch_foreach == 0) {

  // first term yielded nothing useful, fall back to the second search term
  $get_search = 'Test 2';

  $html = file_get_contents('https://www.everypixel.com/search?q=' . urlencode($get_search) . '&is_id=1&st=free');
  preg_match_all('|<img.*?src=[\'"](.*?)[\'"].*?>|i', $html, $matches);

  if (sizeof($matches[1]) > 3) {
    $ch_foreach = 1;
  }

}

foreach ($matches[1] as $match) {
  if ($tmp++ >= 20) {
    break; // only consider the first 20 matches
  }
  if (@getimagesize($match)) {
    // display image
    echo $match;
  }
}

1 answer

  • duanchuonong5370 answered 2019-04-20 08:26
    $html = file_get_contents('https://www.everypixel.com/search?q=' . urlencode($get_search) . '&is_id=1&st=free');
    

    Unless the www.everypixel.com server is on the same LAN (in which case the compression overhead may cost more than transferring the page uncompressed), curl with CURLOPT_ENCODING should fetch the page faster than file_get_contents. Even on the same LAN, curl should still be faster, because file_get_contents keeps reading until the server closes the connection, while curl stops as soon as Content-Length bytes have been read, which is quicker than waiting for the server to close the socket. So do this instead:

    $ch = curl_init('https://www.everypixel.com/search?q=' . urlencode($get_search) . '&is_id=1&st=free');
    curl_setopt_array($ch, array(
        CURLOPT_ENCODING       => '', // accept any compression curl supports (gzip, deflate, ...)
        CURLOPT_RETURNTRANSFER => 1,  // return the body from curl_exec() instead of printing it
    ));
    $html = curl_exec($ch);
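
    A small addition of my own (not part of the original answer): check that the transfer actually succeeded before parsing the result:

    // my addition, not part of the original answer: fail loudly on transport errors
    if ($html === false) {
        die('curl error: ' . curl_error($ch));
    }
    curl_close($ch);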
    

    About your regex:

    preg_match_all('|<img.*?src=[\'"](.*?)[\'"].*?>|i', $html, $matches);
    

    DOMDocument with getElementsByTagName("img") and getAttribute("src") should be faster than using your regex, so do this instead:

    $domd = new DOMDocument();
    @$domd->loadHTML($html); // @ suppresses warnings from real-world malformed HTML
    $urls = [];
    foreach ($domd->getElementsByTagName("img") as $img) {
        $url = $img->getAttribute("src");
        if (!empty($url)) {
            $urls[] = $url;
        }
    }
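
    One caveat from me (not in the original answer): img src values are often relative or protocol-relative, and getimagesize() needs absolute URLs, so normalize them first. A minimal sketch, assuming the page lives on https://www.everypixel.com:

    // my sketch: make protocol-relative and root-relative src values absolute
    foreach ($urls as $i => $url) {
        if (strpos($url, '//') === 0) {
            $urls[$i] = 'https:' . $url;                     // e.g. //cdn.example.com/img.jpg
        } elseif (strpos($url, '/') === 0) {
            $urls[$i] = 'https://www.everypixel.com' . $url; // e.g. /images/img.jpg
        }
    }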
    

    And probably the slowest part of your entire code: the @getimagesize($match) inside a loop over potentially 1000+ URLs. Every call to getimagesize() with a URL makes PHP download the image, and it uses the same mechanism as file_get_contents, so it suffers from the same Content-Length issue that makes file_get_contents slow. On top of that, the images are downloaded sequentially; downloading them in parallel should be much faster, which can be done with the curl_multi API. Doing that properly is a complex task and I won't write a full example here, but I can point you to one: https://stackoverflow.com/a/54717579/1067003
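
    For a rough idea of the shape of such code, here is a minimal curl_multi sketch of my own (an outline under assumptions, not the code from the linked answer): it sends parallel HEAD requests and keeps only the URLs whose Content-Type starts with image/. Some servers reject HEAD requests, so treat this as a starting point rather than a drop-in solution:

    // my sketch: validate image URLs in parallel instead of calling getimagesize() one by one
    function filter_image_urls(array $urls): array {
        $mh = curl_multi_init();
        $handles = [];
        foreach ($urls as $url) {
            $ch = curl_init($url);
            curl_setopt_array($ch, array(
                CURLOPT_NOBODY         => true, // HEAD request: headers only, no image body
                CURLOPT_RETURNTRANSFER => true,
                CURLOPT_FOLLOWLOCATION => true,
                CURLOPT_TIMEOUT        => 10,
            ));
            curl_multi_add_handle($mh, $ch);
            $handles[] = array($url, $ch);
        }
        // drive all transfers concurrently
        do {
            curl_multi_exec($mh, $running);
            if (curl_multi_select($mh) === -1) {
                usleep(10000); // avoid a busy loop if select() fails
            }
        } while ($running > 0);
        $images = [];
        foreach ($handles as list($url, $ch)) {
            $type = curl_getinfo($ch, CURLINFO_CONTENT_TYPE);
            if (is_string($type) && strpos($type, 'image/') === 0) {
                $images[] = $url;
            }
            curl_multi_remove_handle($mh, $ch);
            curl_close($ch);
        }
        curl_multi_close($mh);
        return $images;
    }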

