duanhe1976 2013-02-27 09:42
Accepted

Working with bits in PHP

Say I want to store a sequence of words in PHP, where each word is one of only 8 possible words, and I don't want to use compression.

Since there are only 8 words, I could assign each one a binary value and then store these binary values in a file instead of the ASCII words.

The possible binary values would be:

000, 001, 010, 011, 100, 101, 110, 111

This would be much more efficient to parse because (1) each word is now the same size, and (2) it takes up much less space.

My question is:

How can I do this in PHP? How can I assign a binary value to something, then write this to a file (writing the bits how I want them), then read this back again?

The reason I want to do this is to create an efficient indexing system.

1 answer

  • duanli0119 2013-02-28 04:33

    First, if you want to compress data, use PHP's built-in functions for that, such as the zlib (gzip) extension.

    But as you requested, I've prepared an example of how this can be done in PHP. It is not perfect, just a trivial implementation. Each 32-bit unsigned integer carries ten 3-bit codes (30 bits), so the top two bits of every integer stay unused; the compression ratio could be better if I used them as well. Maybe I will add this feature later. I've used 32-bit unsigned integers rather than single bytes because with them the loss is 2 bits per 32 bits instead of 2 bits per byte.
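The two unused bits per integer mentioned above could be avoided by treating the output as one continuous bit stream, where 8 codes fill exactly 3 bytes. The following is only a minimal sketch of that idea and not part of the example below; `packCodes()` and `unpackCodes()` are made-up names, and the sketch assumes the number of codes is stored or known separately, because the last byte is zero-padded.

```php
<?php
// Hypothetical helpers (not from the answer's example): pack 3-bit codes
// (integers 0..7) into a continuous bit stream without unused bits.
function packCodes(array $codes): string {
    $out = '';
    $buffer = 0; // bit accumulator
    $bits = 0;   // number of valid bits currently held in $buffer
    foreach ($codes as $code) {
        $buffer = ($buffer << 3) | ($code & 7);
        $bits += 3;
        while ($bits >= 8) {
            $bits -= 8;
            $out .= chr(($buffer >> $bits) & 0xFF); // emit the oldest 8 bits
            $buffer &= (1 << $bits) - 1;            // keep only the unwritten bits
        }
    }
    if ($bits > 0) {
        // flush the remainder, left-aligned and padded with zero bits
        $out .= chr(($buffer << (8 - $bits)) & 0xFF);
    }
    return $out;
}

function unpackCodes(string $bin, int $count): array {
    $codes = [];
    $buffer = 0;
    $bits = 0;
    $pos = 0;
    while (count($codes) < $count) {
        if ($bits < 3) {
            // refill: append the next byte to the accumulator
            $buffer = ($buffer << 8) | ord($bin[$pos++]);
            $bits += 8;
        }
        $bits -= 3;
        $codes[] = ($buffer >> $bits) & 7;
        $buffer &= (1 << $bits) - 1;
    }
    return $codes;
}

$codes = [1, 7, 2, 7, 3, 7, 4, 7, 5, 7];
$packed = packCodes($codes);
echo strlen($packed), ' bytes', PHP_EOL;                   // 4 bytes
var_dump(unpackCodes($packed, count($codes)) === $codes);  // bool(true)
```

For ten codes both layouts need 4 bytes; the difference only shows with longer inputs, where this layout stores 32 codes in 12 bytes instead of 30.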

    First we prepare the lookup table that maps each word to a numeric code. This is the coding table:

    <?php
    
    // coding table
    $lookupTable = array (
    //  'word0' => chr(0), // code 0 is reserved: it fills the unused positions of the last integer
        'word1' => chr(1),
        'word2' => chr(2),
        'word3' => chr(3),
        'word4' => chr(4),
        'word5' => chr(5),
        'word6' => chr(6),
        // reserve one word for white space
        ' ' => chr(7)
    );
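To see what the coding table does on its own, here is a small self-contained sketch. It uses a shortened table of the same shape (the variable name `$table` is only for this illustration) and prints each code as a 3-bit pattern.

```php
<?php
// Shortened table with the same shape as the coding table above.
$table = array(
    'word1' => chr(1),
    'word2' => chr(2),
    ' '     => chr(7),
);

// strtr() replaces every known word by its one-byte code.
$encoded = strtr('word1 word2', $table);
echo bin2hex($encoded), PHP_EOL; // 010702

// Every code has to fit into 3 bits, i.e. be in the range 1..7 here.
foreach ($table as $word => $code) {
    printf("%-7s => %03b\n", var_export($word, true), ord($code));
}
```

The loop prints `001`, `010` and `111` for the three entries, which are exactly the bit patterns that end up in the packed integers.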
    

    Then comes the compression function:

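    /**
     * Compresses $text to 3 bits per word: words are first replaced by
     * one-byte codes, then ten codes are packed into each 32-bit integer.
     */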
    function _3bit_compress($text, $lookupTable) {
    
        echo 'before compression                  : ' . strlen($text) . ' chars', PHP_EOL;
    
        // first step is one byte compression using the lookup table
        $text = strtr($text, $lookupTable);
        echo 'after one byte per word compression : ' . strlen($text) . ' chars', PHP_EOL;
    
        $bin = ''; // the result
        $carrier = 0; // a 32-bit unsigned int can 'carry' 10 words in 3-bit notation
    
        for($c = 0; $c < strlen($text); $c++) {
            $triplet = $c % 10;
            // every 10 codes (30 bits) we append the 4-byte unsigned integer to $bin;
            // see the PHP manual for pack(): 'N' is unsigned 32 bit, big endian
            if($triplet === 0 && $carrier !== 0) {
                $bin .= pack('N', $carrier);
                $carrier = 0;
            }
    
            $char = $text[$c];
            $carrier <<= 3; // make space for the next 3 bits
            $carrier += ord($char); // add the next 3 bit pattern
            // echo $carrier, ' added ' . ord($char), PHP_EOL;
        }
        $bin .= pack('N', $carrier); // don't forget the remaining bits
        echo 'after 3 bit compression             : ' . strlen($bin) . ' chars', PHP_EOL;
        return $bin;
    }
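As a worked example of the shifting in the loop above, this sketch packs just the three codes 1, 7 and 2 by hand and shows the resulting integer and the bytes that `pack('N', ...)` produces for it.

```php
<?php
// Pack the codes 1, 7, 2 the same way the compression loop does.
$carrier = 0;
foreach (array(1, 7, 2) as $code) {
    $carrier <<= 3;    // make room for 3 more bits
    $carrier += $code; // append the code in the lowest 3 bits
}

printf("%09b = %d\n", $carrier, $carrier);  // 001111010 = 122
echo bin2hex(pack('N', $carrier)), PHP_EOL; // 0000007a
```

Note that a group with fewer than ten codes ends up right-aligned in its integer, so the unused leading positions read back as code 0. That is why 0 is reserved in the coding table.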
    

    And the decompression function:

    /**
     * Reverses _3bit_compress(): splits each 32-bit integer into up to
     * ten 3-bit codes and maps the codes back to words.
     */
    function _3_bit_uncompress($compressed, $lookupTable) {
        $len = strlen($compressed);
        echo 'compressed length:            : ' . $len . ' chars', PHP_EOL;
    
        $text = '';
        // unpack the string as 4-byte unsigned integers
        foreach(unpack('N*', $compressed) as $carrier) {
            $tmp = '';
            for($i = 0; $i < 10; $i++) {
                $code = $carrier & 7; // the lowest 3 bits hold the last code not yet read
                // code 0 only occurs as padding in a partially filled last integer; skip it
                if($code !== 0) {
                    $tmp = chr($code) . $tmp; // prepend, because the codes come out in reverse order
                }
                $carrier >>= 3; // shift down to the next 3 bits
            }
            $text .= $tmp;
        }
        // reverse translate from codes to words
        return strtr($text, array_flip($lookupTable));
    }
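Because every code has the same width, as the question points out, a single code can be read without decoding everything before it. The sketch below assumes the layout used in this answer (ten codes per big-endian 32-bit integer, a shorter last group right-aligned) and assumes the total number of codes is known; `codeAt()` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: return the $n-th code (0-based) from data laid out
// as ten 3-bit codes per 32-bit big-endian integer.
function codeAt(string $bin, int $n, int $total): int {
    $group = intdiv($n, 10);
    // The last group may hold fewer than ten codes (right-aligned).
    $inGroup = min(10, $total - $group * 10);
    $carrier = unpack('N', substr($bin, $group * 4, 4))[1];
    $shift = 3 * ($inGroup - 1 - $n % 10);
    return ($carrier >> $shift) & 7;
}

// Build sample data with the same layout: 12 codes, i.e. 10 + 2.
$codes = array(1, 7, 2, 7, 3, 7, 4, 7, 5, 7, 6, 7);
$bin = '';
foreach (array_chunk($codes, 10) as $chunk) {
    $carrier = 0;
    foreach ($chunk as $code) {
        $carrier = ($carrier << 3) | $code;
    }
    $bin .= pack('N', $carrier);
}

echo codeAt($bin, 4, count($codes)), PHP_EOL;  // 3
echo codeAt($bin, 10, count($codes)), PHP_EOL; // 6
```

For an index this means one `substr()` of 4 bytes plus a shift and a mask per lookup, independent of how long the stored sequence is.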
    

    Now it's time to test the functions :)

    $original = <<<EOF
    word1 word2 word3 word4 word5 word6 word1 word3 word3  word2
    EOF;
    
    
    $compressed = _3bit_compress($original, $lookupTable);
    $restored = _3_bit_uncompress($compressed, $lookupTable);
    
    echo 'compressed size: ' . round(strlen($compressed) * 100 / strlen($original), 2) . '%', PHP_EOL;
    
    echo 'Message before compression  : ' . $original, PHP_EOL;
    echo 'Message after decompression : ' . $restored, PHP_EOL;
    

    The example should give you:

    before compression                  : 60 chars
    after one byte per word compression : 20 chars
    after 3 bit compression             : 8 chars
    compressed length:            : 8 chars
    compressed size: 13.33%
    Message before compression  : word1 word2 word3 word4 word5 word6 word1 word3 word3  word2
    Message after decompression : word1 word2 word3 word4 word5 word6 word1 word3 word3  word2
    

    If we test with loooong words, the compression ratio will of course get even better:

    before compression                  : 112 chars
    after one byte per word compression : 16 chars
    after 3 bit compression             : 8 chars
    compressed length:            : 8 chars
    compressed size: 7.14%
    Message before compression  : wooooooooord1 wooooooooord2 wooooooooord2 wooooooooord3 wooooooooord1 wooooooooord2 wooooooooord2 wooooooooord3 
    Message after decompression : wooooooooord1 wooooooooord2 wooooooooord2 wooooooooord3 wooooooooord1 wooooooooord2 wooooooooord2 wooooooooord3 
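The question also asks about writing the result to a file and reading it back. A file can only be written in whole bytes, which is why the bits are grouped into integers first; after that the packed string can be stored like any other binary string. A minimal sketch, where the temporary file and the two sample integers are placeholders standing in for real compressed output:

```php
<?php
// Placeholder file for illustration only.
$file = tempnam(sys_get_temp_dir(), 'bits');

// Two arbitrary 32-bit groups standing in for the output of the compression step.
$compressed = pack('N*', 0x0F4F7F7F, 0x0000007A);

// file_put_contents() writes the string byte for byte, no text conversion involved.
file_put_contents($file, $compressed);

// file_get_contents() returns exactly the same bytes.
$readBack = file_get_contents($file);
var_dump($readBack === $compressed);              // bool(true)
echo strlen($readBack), ' bytes read', PHP_EOL;   // 8 bytes read

unlink($file); // clean up the placeholder file
```

The string read back can then be passed to the decompression function unchanged.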
    