dongtiao2105 2013-08-07 02:15
浏览 39
已采纳

整数字符串压缩算法[关闭]

can someone please name an existing algo which is used for compressing numbers? numbers are integers and totally random with no spaces and decimals, eg. 35637462736423478235687479567456....n

well, so far, all i have is this, it converts the integers into ascii reducing approx 40% of the original size

function intergerToChar($v)
{
    $buffer="";
    $charsLen=strlen($v);
    for($i = 0; $i <= $charsLen; $i++)
    {     
        $asc=$v[$i];
        if($asc==0){$buffer[]=0;}
        elseif($asc==1){$buffer[]=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
        elseif($asc==2)
        {
            if($v[$i+1]<5){$buffer[]=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
            elseif($v[$i+1]==5 && $v[$i+2]<6){$buffer[]=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
            else{$buffer[]=$v[$i].$v[$i+1];$i++;}       
        }
        else{$buffer[]=$v[$i].$v[$i+1];$i++;}  
    }
    return $buffer;   
}

btw, i know PHP is not meant for building a compression tool. I'll be using C/C++

UPDATE: This is another PHP code with better compressing result than the above code, it can compress upto 66% if the integers on the position 1st,6th,12,th, and so on has vales of less than 256 and the 3 integers following them have a values not more than 256 than the preceding 3 integers egs, 134298156286159.... can be compressed upto 66% i knw its not optimal, please feel free to make suggestions/corrections

function intergerToChar2($v)
{
    $buffer="";
    $charsLen=strlen($v);
    for($i = 0; $i <= $charsLen; $i++)
    {     
        if($v[$i].$v[$i+1].$v[$i+2]<256){$base=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
        else{$base=$v[$i].$v[$i+1];$i=$i+1;}$i=$i+1;

        if($v[$i].$v[$i+1].$v[$i+2]<256){$next=$v[$i].$v[$i+1].$v[$i+2];$i=$i+2;}
        else{$next=$v[$i].$v[$i+1];$i=$i+1;}

        if($next!=="")
        {
            $next=$next-$base;
            if($next<0)$next=255+$next;
        }

        $buffer[]=$base;
        $buffer[]=$next;
    }
    return $buffer;   
}

btw, 10 bit encoding or 40 bit encoding can be easily done using base_convert() or 4th comment from http://php.net/manual/en/ref.bc.php page which always shows a compression of about 58.6%.

  • 写回答

1条回答 默认 最新

  • douzi9744 2013-08-07 03:51
    关注

    If the digits are random, then you cannot compress the sequence more than the information-theoretic limit, which is log210 bits/digit. (Actually, it's slightly more than that unless the precise length of the string is fixed.) You can achieve that limit by representing the digits as a (very long) binary number; however, that's awkward and timeconsuming to compress and decompress.

    A very near optimal solution results from the fact that 1000 is only slightly less than 210, so you can represent 3 digits using 10 bits. That's 3.33 bits/digits, compared with the theoretically optimal 3.32 bits/digit. (In other words, it's about 99.7% optimal.)

    Since there actually 1024 possible 10-bit codes, and you only need 1000 of them to represent 3 digits, you have some left over; one of them can be used to indicate the end of the stream, if necessary.

    It's a little bit annoying to output 10-bit numbers. It's easier to output 40-bit numbers, since 40 bits is exactly five bytes. Fortunately, most languages these days support 40-bit arithmetic (actually 64-bit arithmetic).

    (Note: This is not that different from your solution. But it's a bit easier and a bit more compressed.)

    本回答被题主选为最佳回答 , 对您是否有帮助呢?
    评论

报告相同问题?

悬赏问题

  • ¥15 YoloV5 第三方库的版本对照问题
  • ¥15 请完成下列相关问题!
  • ¥15 drone 推送镜像时候 purge: true 推送完毕后没有删除对应的镜像,手动拷贝到服务器执行结果正确在样才能让指令自动执行成功删除对应镜像,如何解决?
  • ¥15 求daily translation(DT)偏差订正方法的代码
  • ¥15 js调用html页面需要隐藏某个按钮
  • ¥15 ads仿真结果在圆图上是怎么读数的
  • ¥20 Cotex M3的调试和程序执行方式是什么样的?
  • ¥20 java项目连接sqlserver时报ssl相关错误
  • ¥15 一道python难题3
  • ¥15 牛顿斯科特系数表表示