dongtiao2105 2013-08-07 02:15
浏览 39
已采纳

整数字符串压缩算法[关闭]

can someone please name an existing algo which is used for compressing numbers? numbers are integers and totally random with no spaces and decimals, eg. 35637462736423478235687479567456....n

well, so far, all i have is this, it converts the integers into ascii reducing approx 40% of the original size

function intergerToChar($v)
{
    $buffer="";
    $charsLen=strlen($v);
    for($i = 0; $i <= $charsLen; $i++)
    {     
        $asc=$v[$i];
        if($asc==0){$buffer[]=0;}
        elseif($asc==1){$buffer[]=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
        elseif($asc==2)
        {
            if($v[$i+1]<5){$buffer[]=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
            elseif($v[$i+1]==5 && $v[$i+2]<6){$buffer[]=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
            else{$buffer[]=$v[$i].$v[$i+1];$i++;}       
        }
        else{$buffer[]=$v[$i].$v[$i+1];$i++;}  
    }
    return $buffer;   
}

btw, i know PHP is not meant for building a compression tool. I'll be using C/C++

UPDATE: This is another PHP code with better compressing result than the above code, it can compress upto 66% if the integers on the position 1st,6th,12,th, and so on has vales of less than 256 and the 3 integers following them have a values not more than 256 than the preceding 3 integers egs, 134298156286159.... can be compressed upto 66% i knw its not optimal, please feel free to make suggestions/corrections

function intergerToChar2($v)
{
    $buffer="";
    $charsLen=strlen($v);
    for($i = 0; $i <= $charsLen; $i++)
    {     
        if($v[$i].$v[$i+1].$v[$i+2]<256){$base=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
        else{$base=$v[$i].$v[$i+1];$i=$i+1;}$i=$i+1;

        if($v[$i].$v[$i+1].$v[$i+2]<256){$next=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
        else{$next=$v[$i].$v[$i+1];$i=$i+1;}

        if($next!=="")
        {
            $next=$next-$base;
            if($next<0)$next=255+$next;
        }

        $buffer[]=$base;
        $buffer[]=$next;
    }
    return $buffer;   
}

btw, 10 bit encoding or 40 bit encoding can be easily done using base_convert() or 4th comment from http://php.net/manual/en/ref.bc.php page which always shows a compression of about 58.6%.

  • 写回答

1条回答 默认 最新

  • douzi9744 2013-08-07 03:51
    关注

    If the digits are random, then you cannot compress the sequence more than the information-theoretic limit, which is log210 bits/digit. (Actually, it's slightly more than that unless the precise length of the string is fixed.) You can achieve that limit by representing the digits as a (very long) binary number; however, that's awkward and timeconsuming to compress and decompress.

    A very near optimal solution results from the fact that 1000 is only slightly less than 210, so you can represent 3 digits using 10 bits. That's 3.33 bits/digits, compared with the theoretically optimal 3.32 bits/digit. (In other words, it's about 99.7% optimal.)

    Since there actually 1024 possible 10-bit codes, and you only need 1000 of them to represent 3 digits, you have some left over; one of them can be used to indicate the end of the stream, if necessary.

    It's a little bit annoying to output 10-bit numbers. It's easier to output 40-bit numbers, since 40 bits is exactly five bytes. Fortunately, most languages these days support 40-bit arithmetic (actually 64-bit arithmetic).

    (Note: This is not that different from your solution. But it's a bit easier and a bit more compressed.)

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 有没有会使用flac3d软件的家人
  • ¥20 360摄像头无法解绑使用,请教解绑当前账号绑定问题,
  • ¥15 docker实践项目
  • ¥15 数电几道习题,写出作答过程,ai一律不采用
  • ¥15 利用pthon计算薄膜结构的光导纳
  • ¥15 海康hlss视频流怎么播放
  • ¥15 Paddleocr:out of memory error on GPU
  • ¥30 51单片机C语言数码管驱动单片机为AT89C52
  • ¥100 只改动本课件的 cal_portfolio_weight_series(decision_date), 跑完本课件。设计一个信息比率尽量高的策略。
  • ¥20 如何在visual studio 2022中添加ImageMagick库