dongtiao2105 2013-08-07 02:15
浏览 39
已采纳

整数字符串压缩算法[关闭]

can someone please name an existing algo which is used for compressing numbers? numbers are integers and totally random with no spaces and decimals, eg. 35637462736423478235687479567456....n

well, so far, all i have is this, it converts the integers into ascii reducing approx 40% of the original size

function intergerToChar($v)
{
    $buffer="";
    $charsLen=strlen($v);
    for($i = 0; $i <= $charsLen; $i++)
    {     
        $asc=$v[$i];
        if($asc==0){$buffer[]=0;}
        elseif($asc==1){$buffer[]=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
        elseif($asc==2)
        {
            if($v[$i+1]<5){$buffer[]=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
            elseif($v[$i+1]==5 && $v[$i+2]<6){$buffer[]=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
            else{$buffer[]=$v[$i].$v[$i+1];$i++;}       
        }
        else{$buffer[]=$v[$i].$v[$i+1];$i++;}  
    }
    return $buffer;   
}

btw, i know PHP is not meant for building a compression tool. I'll be using C/C++

UPDATE: This is another PHP code with better compressing result than the above code, it can compress upto 66% if the integers on the position 1st,6th,12,th, and so on has vales of less than 256 and the 3 integers following them have a values not more than 256 than the preceding 3 integers egs, 134298156286159.... can be compressed upto 66% i knw its not optimal, please feel free to make suggestions/corrections

function intergerToChar2($v)
{
    $buffer="";
    $charsLen=strlen($v);
    for($i = 0; $i <= $charsLen; $i++)
    {     
        if($v[$i].$v[$i+1].$v[$i+2]<256){$base=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
        else{$base=$v[$i].$v[$i+1];$i=$i+1;}$i=$i+1;

        if($v[$i].$v[$i+1].$v[$i+2]<256){$next=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
        else{$next=$v[$i].$v[$i+1];$i=$i+1;}

        if($next!=="")
        {
            $next=$next-$base;
            if($next<0)$next=255+$next;
        }

        $buffer[]=$base;
        $buffer[]=$next;
    }
    return $buffer;   
}

btw, 10 bit encoding or 40 bit encoding can be easily done using base_convert() or 4th comment from http://php.net/manual/en/ref.bc.php page which always shows a compression of about 58.6%.

  • 写回答

1条回答 默认 最新

  • douzi9744 2013-08-07 03:51
    关注

    If the digits are random, then you cannot compress the sequence more than the information-theoretic limit, which is log210 bits/digit. (Actually, it's slightly more than that unless the precise length of the string is fixed.) You can achieve that limit by representing the digits as a (very long) binary number; however, that's awkward and timeconsuming to compress and decompress.

    A very near optimal solution results from the fact that 1000 is only slightly less than 210, so you can represent 3 digits using 10 bits. That's 3.33 bits/digits, compared with the theoretically optimal 3.32 bits/digit. (In other words, it's about 99.7% optimal.)

    Since there actually 1024 possible 10-bit codes, and you only need 1000 of them to represent 3 digits, you have some left over; one of them can be used to indicate the end of the stream, if necessary.

    It's a little bit annoying to output 10-bit numbers. It's easier to output 40-bit numbers, since 40 bits is exactly five bytes. Fortunately, most languages these days support 40-bit arithmetic (actually 64-bit arithmetic).

    (Note: This is not that different from your solution. But it's a bit easier and a bit more compressed.)

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 App的会员连续扣费
  • ¥15 不同数据类型的特征融合应该怎么做
  • ¥15 用proteus软件设计一个基于8086微处理器的简易温度计
  • ¥15 用联想小新14Pro
  • ¥15 multisim中关于74ls192n和DSWPK开关仿真图分析(减法计数器)
  • ¥15 基于8255的交通灯设计
  • ¥15 w3wp,exe 中发生未处理的 Microsoft ,NETFramework 异常。
  • ¥20 C51单片机程序及仿真(加减器)
  • ¥15 AQWA | 水动力分析 二阶波浪力
  • ¥15 c语言题目:成绩管理系统