dongtiao2105 2013-08-07 02:15

# 整数字符串压缩算法[关闭]

can someone please name an existing algo which is used for compressing numbers? numbers are integers and totally random with no spaces and decimals, eg. 35637462736423478235687479567456....n

well, so far, all i have is this, it converts the integers into ascii reducing approx 40% of the original size

``````function intergerToChar(\$v)
{
\$buffer="";
\$charsLen=strlen(\$v);
for(\$i = 0; \$i <= \$charsLen; \$i++)
{
\$asc=\$v[\$i];
if(\$asc==0){\$buffer[]=0;}
elseif(\$asc==1){\$buffer[]=\$v[\$i].\$v[\$i+1].\$v[\$i+2];\$i=\$i+2;}
elseif(\$asc==2)
{
if(\$v[\$i+1]<5){\$buffer[]=\$v[\$i].\$v[\$i+1].\$v[\$i+2];\$i=\$i+2;}
elseif(\$v[\$i+1]==5 && \$v[\$i+2]<6){\$buffer[]=\$v[\$i].\$v[\$i+1].\$v[\$i+2];\$i=\$i+2;}
else{\$buffer[]=\$v[\$i].\$v[\$i+1];\$i++;}
}
else{\$buffer[]=\$v[\$i].\$v[\$i+1];\$i++;}
}
return \$buffer;
}
``````

btw, i know PHP is not meant for building a compression tool. I'll be using C/C++

UPDATE: This is another PHP code with better compressing result than the above code, it can compress upto 66% if the integers on the position 1st,6th,12,th, and so on has vales of less than 256 and the 3 integers following them have a values not more than 256 than the preceding 3 integers egs, `1`34`2`98`1`56`2`86`1`59.... can be compressed upto 66% i knw its not optimal, please feel free to make suggestions/corrections

``````function intergerToChar2(\$v)
{
\$buffer="";
\$charsLen=strlen(\$v);
for(\$i = 0; \$i <= \$charsLen; \$i++)
{
if(\$v[\$i].\$v[\$i+1].\$v[\$i+2]<256){\$base=\$v[\$i].\$v[\$i+1].\$v[\$i+2];\$i=\$i+2;}
else{\$base=\$v[\$i].\$v[\$i+1];\$i=\$i+1;}\$i=\$i+1;

if(\$v[\$i].\$v[\$i+1].\$v[\$i+2]<256){\$next=\$v[\$i].\$v[\$i+1].\$v[\$i+2];\$i=\$i+2;}
else{\$next=\$v[\$i].\$v[\$i+1];\$i=\$i+1;}

if(\$next!=="")
{
\$next=\$next-\$base;
if(\$next<0)\$next=255+\$next;
}

\$buffer[]=\$base;
\$buffer[]=\$next;
}
return \$buffer;
}
``````

btw, 10 bit encoding or 40 bit encoding can be easily done using base_convert() or 4th comment from http://php.net/manual/en/ref.bc.php page which always shows a compression of about 58.6%.

• 写回答

#### 1条回答默认 最新

• douzi9744 2013-08-07 03:51
关注

If the digits are random, then you cannot compress the sequence more than the information-theoretic limit, which is log210 bits/digit. (Actually, it's slightly more than that unless the precise length of the string is fixed.) You can achieve that limit by representing the digits as a (very long) binary number; however, that's awkward and timeconsuming to compress and decompress.

A very near optimal solution results from the fact that 1000 is only slightly less than 210, so you can represent 3 digits using 10 bits. That's 3.33 bits/digits, compared with the theoretically optimal 3.32 bits/digit. (In other words, it's about 99.7% optimal.)

Since there actually 1024 possible 10-bit codes, and you only need 1000 of them to represent 3 digits, you have some left over; one of them can be used to indicate the end of the stream, if necessary.

It's a little bit annoying to output 10-bit numbers. It's easier to output 40-bit numbers, since 40 bits is exactly five bytes. Fortunately, most languages these days support 40-bit arithmetic (actually 64-bit arithmetic).

(Note: This is not that different from your solution. But it's a bit easier and a bit more compressed.)

本回答被题主选为最佳回答 , 对您是否有帮助呢?
评论

#### 悬赏问题

• ¥15 App的会员连续扣费
• ¥15 不同数据类型的特征融合应该怎么做
• ¥15 用proteus软件设计一个基于8086微处理器的简易温度计
• ¥15 用联想小新14Pro
• ¥15 multisim中关于74ls192n和DSWPK开关仿真图分析（减法计数器）
• ¥15 基于8255的交通灯设计
• ¥15 w3wp,exe 中发生未处理的 Microsoft ,NETFramework 异常。
• ¥20 C51单片机程序及仿真(加减器)
• ¥15 AQWA | 水动力分析 二阶波浪力
• ¥15 c语言题目:成绩管理系统