dongwei3336 2015-10-23 23:27

Memory-efficient base64 decoding

We're having some trouble in our application with people pasting images into our rich-text WYSIWYG editor, at which point they exist as base64-encoded strings, e.g.:

<img src="..." />

The form submits and is processed just fine, but when our application generates a page containing multiple images, it can cause PHP to hit its memory limit, as well as bloating the page source, etc.

What I've done is write some code for our form processor that extracts the embedded images, writes them to files, and then puts the URL in the src attribute. The problem is that, while processing an image, memory usage spikes to 4x the size of the data, which could potentially break the form processor as well.

My POC code:

<?php
function profile($label) {
    printf("%10s %11d %11d\n", $label, memory_get_usage(), memory_get_peak_usage());
}

function handleEmbedded(&$src) {
    $dom = new DOMDocument;
    $dom->loadHTML($src);
    profile('domload');
    $images = $dom->getElementsByTagName('img');
    profile('getimgs');
    foreach ($images as $image) {
        if( strpos($image->getAttribute('src'), 'data:') === 0 ) {
            $image->setAttribute('src', saneImage($image->getAttribute('src')));
        }
    }
    profile('presave');
    $src = $dom->saveHTML();
    profile('postsave');
}

function saneImage($data) {
    $type = explode('/', substr($data, 5, strpos($data, ';')-5))[1];
    $filename = generateFilename('./', 'data_', $type);
    //file_put_contents($filename, base64_decode(substr($data, strpos($data, ';')+8)));
    $fh = fopen($filename, 'w');
    stream_filter_append($fh, 'convert.base64-decode');
    fwrite($fh, substr($data, strpos($data, ';')+8));
    fclose($fh);
    profile('filesaved');
    return $filename;
}

function generateFilename($dir, $prefix, $suffix) {
    $dir = preg_replace('@/$@', '', $dir);
    do {
        $filename = sprintf("%s/%s%s.%s", $dir, $prefix, md5(mt_rand()), $suffix);
    } while( file_exists($filename) );
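    // Note: this early return makes the next line unreachable, so every image is written to foo.<type>.
    return "foo.$suffix";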
    return $filename;
}

profile('start');
$src = file_get_contents('derp.txt');
profile('load');
handleEmbedded($src);
profile('end');

Output:

     start      236296      243048
      load     1306264     1325312
   domload     1306640     2378768
   getimgs     1306880     2378768
 filesaved     2371080     4501168
   presave     1307264     4501168
  postsave      244152     4501168
       end      243480     4501168

As you can see, memory usage still jumps into the 4MB range while the file is saved, despite my trying to shave bytes by using a stream filter. I think there's some buffering happening in the background. If I were simply transcribing between files I'd break the data into chunks, but I don't know if that is feasible or advisable in this case.

Is there anywhere I might be able to pare down my memory usage?
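
For reference, the file-to-file chunking mentioned above could be sketched roughly as follows. This is only an illustration: the helper name decodeFileInChunks() and its parameters are assumptions, not part of the original code. It reads base64 text from one file in fixed-size pieces and lets the convert.base64-decode filter decode each piece as it is written.

```php
<?php
// Hypothetical helper (not from the original post): decode a file containing
// base64 text into a binary file, reading a fixed-size chunk at a time.
function decodeFileInChunks(string $srcPath, string $dstPath, int $chunkSize = 12288): void {
    $in = fopen($srcPath, 'rb');
    if ($in === false) {
        throw new RuntimeException("Cannot open $srcPath");
    }
    $out = fopen($dstPath, 'wb');
    if ($out === false) {
        fclose($in);
        throw new RuntimeException("Cannot open $dstPath");
    }
    // Decode on the fly: data written to $out passes through the filter first.
    stream_filter_append($out, 'convert.base64-decode', STREAM_FILTER_WRITE);
    while (!feof($in)) {
        // The default chunk size is a multiple of 4, so chunks line up with
        // whole 4-character base64 groups.
        $chunk = fread($in, $chunkSize);
        if ($chunk === false || $chunk === '') {
            break;
        }
        fwrite($out, $chunk);
    }
    fclose($in);
    fclose($out); // closing the stream flushes the filter
}
```

With this approach only one chunk of encoded text is held in PHP userland at a time, regardless of how large the source file is.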


Notes:

  • Using file_put_contents() instead, or changing handleEmbedded() to not pass by reference, gives the same memory usage.
  • derp.txt contains a snippet of HTML with a single base64-encoded image.
  • 4MB is not the end of the world, however just yesterday someone tried to upload a 61MB JPEG so who knows what someone will put in a richtext box. :I
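
Regarding the 61MB JPEG concern in the notes: one possible guard is to estimate the decoded size from the length of the base64 text before decoding anything, since every 4 encoded characters yield 3 bytes, minus one byte per trailing '='. The sketch below assumes padded base64 with no embedded whitespace; the names estimateEmbeddedSize(), isEmbeddedImageTooLarge() and MAX_EMBEDDED_BYTES are hypothetical and the limit value is arbitrary.

```php
<?php
// Hypothetical limit for a single embedded image; choose a value that suits the application.
const MAX_EMBEDDED_BYTES = 5 * 1024 * 1024;

// Estimate the decoded size of a data URI payload without decoding or copying it.
// Assumes padded base64 with no whitespace inside the payload.
function estimateEmbeddedSize(string $dataUri): int {
    $comma = strpos($dataUri, ',');
    if ($comma === false) {
        throw new InvalidArgumentException('Not a data URI');
    }
    $length = strlen($dataUri) - $comma - 1; // number of base64 characters
    if ($length < 4) {
        return 0;
    }
    // Each trailing '=' means one byte less in the final 3-byte group.
    $padding = substr_count(substr($dataUri, -2), '=');
    return intdiv($length, 4) * 3 - $padding;
}

// True when the estimated decoded size exceeds the (hypothetical) limit.
function isEmbeddedImageTooLarge(string $dataUri): bool {
    return estimateEmbeddedSize($dataUri) > MAX_EMBEDDED_BYTES;
}
```

A check like this could run inside the foreach loop of handleEmbedded() before saneImage() is called, so oversized images are rejected without allocating any extra buffers.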

1 answer

  • douyan1921 2015-10-26 17:11

    Props to Norbert for punching a hole in my mental block:

    function saneImage($data) {
        $type = explode('/', substr($data, 5, strpos($data, ';')-5))[1];
        $filename = generateFilename('./', 'data_', $type);
        writefile($filename, $data);
        profile('filesaved');
        return $filename;
    }
    
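    function writefile($filename, $data) {
        $fh = fopen($filename, 'w');
        // Decode base64 on the fly as data is written to the file.
        stream_filter_append($fh, 'convert.base64-decode');
        // 12 KiB is a multiple of 4, so every chunk holds whole base64 groups.
        $chunksize = 12*1024;
        // Skip the "data:<type>;base64," header (";base64," is 8 characters).
        $offset = strpos($data, ';')+8;
        $length = strlen($data);
        // Write one small slice at a time instead of copying the whole payload.
        for( $pos=$offset; $pos<$length; $pos+=$chunksize ) {
            fwrite($fh, substr($data, $pos, $chunksize));
        }
        fclose($fh);
    }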
    

    Output:

         start      237952      244672
          load     1307920     1327000
       domload     1308296     2380664
       getimgs     1308536     2380664
     filesaved     2372712     2400592
       presave     1308944     2400592
      postsave      245832     2400592
           end      245160     2400592
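
One way to gain confidence that writing in chunks yields the same bytes as a one-shot base64_decode() is a small round-trip check like the one below. It is a sketch under assumptions: writeDecodedChunks() is a hypothetical stand-alone restatement of the chunked writer, the payload is random test data rather than a real image, and the payload is located by the first comma in the data URI.

```php
<?php
// Hypothetical stand-alone version of the chunked writer, for testing only.
function writeDecodedChunks(string $filename, string $data, int $chunkSize = 12288): void {
    $fh = fopen($filename, 'wb');
    if ($fh === false) {
        throw new RuntimeException("Cannot open $filename");
    }
    stream_filter_append($fh, 'convert.base64-decode', STREAM_FILTER_WRITE);
    $offset = strpos($data, ',') + 1; // payload starts right after the first comma
    $length = strlen($data);
    for ($pos = $offset; $pos < $length; $pos += $chunkSize) {
        // Only one small slice of the payload is copied per iteration.
        fwrite($fh, substr($data, $pos, $chunkSize));
    }
    fclose($fh);
}

// Round trip: random bytes -> data URI -> chunked decode -> compare with the original.
$raw = random_bytes(100000);
$uri = 'data:image/png;base64,' . base64_encode($raw);
$tmp = tempnam(sys_get_temp_dir(), 'img');
writeDecodedChunks($tmp, $uri);
var_dump(file_get_contents($tmp) === $raw); // reports whether the bytes match
unlink($tmp);
```

Keeping the chunk size a multiple of 4 means each slice contains only complete base64 groups, which is the same choice the 12*1024 value above makes.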
    
    Accepted by the asker as the best answer.
