dongwei3336 2015-10-23 23:27

Memory-efficient base64 decoding

We're having trouble in our application when people paste images into our rich-text WYSIWYG editor, at which point the images exist as base64-encoded strings, e.g.:

<img src="..." />

The form is submitted and processed just fine, but when our application generates a page containing multiple such images, it can cause PHP to hit its memory limit, besides bloating the page source.

So I've written some code for our form processor that extracts the embedded images, writes them to files, and puts each file's URL in the src attribute instead. The problem is that while an image is being processed, memory usage spikes to 4x the size of the data, which could break the form processor as well.

My POC code:

<?php
function profile($label) {
    printf("%10s %11d %11d\n", $label, memory_get_usage(), memory_get_peak_usage());
}

function handleEmbedded(&$src) {
    $dom = new DOMDocument;
    $dom->loadHTML($src);
    profile('domload');
    $images = $dom->getElementsByTagName('img');
    profile('getimgs');
    foreach ($images as $image) {
        if( strpos($image->getAttribute('src'), 'data:') === 0 ) {
            $image->setAttribute('src', saneImage($image->getAttribute('src')));
        }
    }
    profile('presave');
    $src = $dom->saveHTML();
    profile('postsave');
}

function saneImage($data) {
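    // "data:image/png;base64,..." -> "png"
    $type = explode('/', substr($data, 5, strpos($data, ';')-5))[1];
    $filename = generateFilename('./', 'data_', $type);
    //file_put_contents($filename, base64_decode(substr($data, strpos($data, ';')+8)));
    $fh = fopen($filename, 'w');
    // Decode while writing instead of building a decoded copy in memory.
    stream_filter_append($fh, 'convert.base64-decode');
    // Skip ";base64," (8 characters); substr() still copies the whole payload.
    fwrite($fh, substr($data, strpos($data, ';')+8));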
    fclose($fh);
    profile('filesaved');
    return $filename;
}

function generateFilename($dir, $prefix, $suffix) {
    $dir = preg_replace('@/$@', '', $dir);
    do {
        $filename = sprintf("%s/%s%s.%s", $dir, $prefix, md5(mt_rand()), $suffix);
    } while( file_exists($filename) );
    return $filename;
}

profile('start');
$src = file_get_contents('derp.txt');
profile('load');
handleEmbedded($src);
profile('end');

Output:

     start      236296      243048
      load     1306264     1325312
   domload     1306640     2378768
   getimgs     1306880     2378768
 filesaved     2371080     4501168
   presave     1307264     4501168
  postsave      244152     4501168
       end      243480     4501168

As you can see, memory usage still jumps into the 4 MB range while the file is saved, despite my attempt to shave bytes by using a stream filter. I suspect some buffering is happening in the background. If I were simply copying from one file to another I'd break the data into chunks, but I don't know whether that is feasible or advisable in this case.
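For comparison, the file-to-file chunking mentioned above can be sketched as follows. This is a minimal sketch, not the poster's code: `decodeFileChunked()` is a hypothetical helper, and it assumes a local input file that holds only the base64 payload (no `data:` header, no whitespace or line breaks). It relies on base64 turning every 4 characters into 3 bytes, so chunks whose length is a multiple of 4 can be decoded independently with `base64_decode()`.

```php
<?php
// Decode a file of raw base64 text into a binary file, chunk by chunk.
// Hypothetical helper; assumes a local input file holding only base64
// characters (no "data:" header, no whitespace or line breaks).
function decodeFileChunked(string $inPath, string $outPath, int $chunkSize = 8192): void
{
    // Every 4 base64 characters decode to 3 bytes, so a chunk whose length
    // is a multiple of 4 decodes independently of the chunks around it.
    $chunkSize -= $chunkSize % 4;
    if ($chunkSize < 4) {
        throw new InvalidArgumentException('chunk size must be at least 4');
    }
    $in = fopen($inPath, 'rb');
    if ($in === false) {
        throw new RuntimeException("cannot read $inPath");
    }
    $out = fopen($outPath, 'wb');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("cannot write $outPath");
    }
    try {
        // Only one input chunk and its decoded bytes are held at a time.
        while (($chunk = fread($in, $chunkSize)) !== false && $chunk !== '') {
            $bytes = base64_decode($chunk, true); // strict: false on invalid input
            if ($bytes === false) {
                throw new RuntimeException('invalid base64 input');
            }
            fwrite($out, $bytes);
        }
    } finally {
        fclose($in);
        fclose($out);
    }
}
```

Peak memory here is bounded by the chunk size rather than the payload size; the catch in the form processor is that the payload already lives in a PHP string, not a file.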

Is there anywhere I might be able to pare down my memory usage?


Notes:

  • Using file_put_contents() instead, or changing handleEmbedded() to not take its argument by reference, results in the same memory usage.
  • derp.txt contains a snippet of HTML with a single base64-encoded image.
  • 4 MB is not the end of the world; however, just yesterday someone tried to upload a 61 MB JPEG, so who knows what someone will paste into a rich-text box. :I
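Since the length of a base64 payload determines its decoded size, an oversized paste (like the 61 MB JPEG) could be rejected before anything is decoded or copied. A minimal sketch, assuming standard base64 (padded or unpadded) with no whitespace; `base64DecodedSize()` and its `$offset` parameter are hypothetical, not part of PHP.

```php
<?php
// Size that a base64 payload will decode to, computed from its length only,
// without decoding or copying it. Hypothetical helper; assumes standard
// base64 (padded or unpadded) with no whitespace, starting at $offset.
function base64DecodedSize(string $data, int $offset = 0): int
{
    $len = strlen($data) - $offset;
    if ($len <= 0) {
        return 0;
    }
    // Each trailing "=" pad character stands for one byte that is not there.
    $padding = 0;
    $end = strlen($data) - 1;
    while ($padding < 2 && $padding < $len && $data[$end - $padding] === '=') {
        $padding++;
    }
    // 4 characters carry 3 bytes; integer division drops leftover bits.
    return intdiv($len * 3, 4) - $padding;
}
```

For example, with `$offset` set to the position just past the comma of a `data:` URI, a check like `base64DecodedSize($src, $offset) > 10 * 1024 * 1024` could gate `saneImage()`; the 10 MB limit is an arbitrary placeholder.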

1 answer

  • douyan1921 2015-10-26 17:11

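    Props to Norbert for punching a hole in my mental block: feed the base64 payload to the stream filter in fixed-size chunks rather than as one large substr() copy.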

    function saneImage($data) {
        $type = explode('/', substr($data, 5, strpos($data, ';')-5))[1];
        $filename = generateFilename('./', 'data_', $type);
        writefile($filename, $data);
        profile('filesaved');
        return $filename;
    }
    
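    function writefile($filename, $data) {
        $fh = fopen($filename, 'w');
        // Decode on the way to disk so the decoded image never sits in memory.
        stream_filter_append($fh, 'convert.base64-decode');
        $chunksize = 12*1024;              // a multiple of 4 base64 characters
        $offset = strpos($data, ';')+8;    // skip ";base64," (8 characters)
        $length = strlen($data);
        // Loop on positions rather than on substr()'s truthiness, so a final
        // chunk such as "0" cannot end the loop early.
        for( $pos=$offset; $pos<$length; $pos+=$chunksize ) {
            fwrite($fh, substr($data, $pos, $chunksize));
        }
        fclose($fh);
    }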
    

    Output (peak usage now stays around 2.4 MB instead of about 4.5 MB):

         start      237952      244672
          load     1307920     1327000
       domload     1308296     2380664
       getimgs     1308536     2380664
     filesaved     2372712     2400592
       presave     1308944     2400592
      postsave      245832     2400592
           end      245160     2400592
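    A self-contained variant of the chunked write above, which can be checked with a round trip against base64_decode(). It is a sketch under stated assumptions, not the answerer's code: writeDataUriToFile() is a hypothetical name, the URI must carry the ";base64" flag, and the payload is located by searching for the first comma rather than assuming ";base64," sits right after the MIME type (as strpos($data, ';')+8 does).

```php
<?php
// Write the payload of a base64 data: URI to $path, decoding it on the way
// out through PHP's convert.base64-decode stream filter in fixed-size chunks.
// Hypothetical helper; requires the ";base64" flag in the header.
function writeDataUriToFile(string $path, string $uri, int $chunkSize = 12288): void
{
    // Keep chunks a whole number of 4-character base64 quanta.
    if ($chunkSize < 4 || $chunkSize % 4 !== 0) {
        throw new InvalidArgumentException('chunk size must be a positive multiple of 4');
    }
    $comma = strpos($uri, ',');
    if (strncmp($uri, 'data:', 5) !== 0 || $comma === false) {
        throw new InvalidArgumentException('not a data: URI');
    }
    // The header is small, e.g. "image/png;base64", so copying it is cheap.
    $header = substr($uri, 5, $comma - 5);
    if (strcasecmp(substr($header, -7), ';base64') !== 0) {
        throw new InvalidArgumentException('payload is not base64-encoded');
    }
    $fh = fopen($path, 'wb');
    if ($fh === false) {
        throw new RuntimeException("cannot open $path");
    }
    stream_filter_append($fh, 'convert.base64-decode', STREAM_FILTER_WRITE);
    // Step through byte positions so only one chunk-sized substring exists at a time.
    $len = strlen($uri);
    for ($pos = $comma + 1; $pos < $len; $pos += $chunkSize) {
        fwrite($fh, substr($uri, $pos, $chunkSize));
    }
    fclose($fh); // flushes the filter's remaining output
}
```

    Rejecting URIs without ";base64" matters because the filter would otherwise try to decode percent-encoded text.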
    
    This answer was accepted by the asker as the best answer.