doucao1888
2014-01-04 00:54

Shortest possible query string for a numerically indexed array in PHP

I’m looking for the most concise URL rather than the shortest PHP code. I don’t want my users to be scared by the hideous URLs that PHP creates when encoding arrays.

PHP repeats the parameter name for every element in the query string if you just pass an array ($fs) through http_build_query:

$fs = array(5, 12, 99);
$url = "http://$_SERVER[HTTP_HOST]/?" .
    http_build_query(array('c' => 'asdf', 'fs' => $fs));

The resulting $url is (shown decoded for readability; http_build_query itself emits the brackets as %5B and %5D)

http://example.com/?c=asdf&fs[0]=5&fs[1]=12&fs[2]=99

How do I get it down to a minimum (using PHP or methods easily implemented in PHP)?


3 answers

  • duankui1532
    2014-01-04 01:23
    Accepted answer

    Default PHP way

    What http_build_query does is a common way to serialize arrays into a URL. PHP automatically deserializes the result into $_GET.
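
    To make the default behaviour concrete, here is a minimal sketch of the round trip. It uses parse_str as a stand-in for what PHP does when it fills $_GET; the variable names $query and $params are only illustrative.

```php
<?php
// Build the query string the default way.
$query = http_build_query(array('c' => 'asdf', 'fs' => array(5, 12, 99)));

// http_build_query() percent-encodes the brackets, so the raw string is
// longer than the decoded form that is usually shown.
echo $query, "\n";             // c=asdf&fs%5B0%5D=5&fs%5B1%5D=12&fs%5B2%5D=99
echo urldecode($query), "\n";  // c=asdf&fs[0]=5&fs[1]=12&fs[2]=99

// parse_str() applies the same parsing rules PHP uses to populate $_GET.
parse_str($query, $params);
var_dump($params['fs']);       // the strings "5", "12", "99" under keys 0, 1, 2
```

    Note that the values come back as strings, so they still need an integer conversion on the receiving side.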

    When wanting to serialize just a (non-associative) array of integers, you have other options.

    Small arrays

    For small arrays, conversion to an underscore-separated list is quite convenient and efficient. It is done by $fs = implode('_', $fs). Then your URL would look like this:

    http://example.com/?c=asdf&fs=5_12_99
    

    The downside is that you’ll have to explicitly explode('_', $_GET['fs']) to get the values back as an array.
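
    A minimal sketch of that round trip, assuming the list only ever holds non-negative integers. The helper names fs_to_param and param_to_fs are hypothetical; in a real request the argument would be $_GET['fs'].

```php
<?php
// Hypothetical helper: join the integers with underscores.
function fs_to_param(array $nums) {
    return implode('_', $nums);
}

// Hypothetical helper: parse the parameter back into integers.
// Anything that is not digits separated by single underscores is rejected,
// because the value comes straight from the user.
function param_to_fs($param) {
    if (!is_string($param) || !preg_match('/\A\d+(?:_\d+)*\z/', $param)) {
        return array();
    }
    return array_map('intval', explode('_', $param));
}

$param = fs_to_param(array(5, 12, 99));
echo $param, "\n";                 // 5_12_99
var_dump(param_to_fs($param));     // the integers 5, 12, 99
var_dump(param_to_fs('5__x'));     // empty array: malformed input
```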

    Other delimiters may be used too. Underscore is considered alphanumeric and as such rarely has special meaning. In URLs, it is usually used as a space replacement (e.g. by MediaWiki), but it is hard to distinguish in underlined text. Hyphen is another common replacement for space; it is also often used as a minus sign. Comma is a typical list separator, but unlike underscore and hyphen it is percent-encoded by http_build_query and has special meaning almost everywhere. The situation is similar for the vertical bar (“pipe”).
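
    The difference between the delimiters is easy to see by running each candidate through http_build_query. This is only an illustrative sketch; the $delimiters list is simply the four characters discussed above.

```php
<?php
// Candidate delimiters from the paragraph above.
$delimiters = array('_', '-', ',', '|');

foreach ($delimiters as $d) {
    $value = implode($d, array(5, 12, 99));
    // http_build_query() applies urlencode() rules to the value by default.
    echo http_build_query(array('fs' => $value)), "\n";
}
// fs=5_12_99
// fs=5-12-99
// fs=5%2C12%2C99
// fs=5%7C12%7C99
```

    Each comma or pipe therefore costs three characters instead of one, unless the value is appended to the URL by hand.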

    Large arrays

    When you have large arrays in URLs, you should first stop coding and start thinking. This almost always indicates bad design. Wouldn’t the HTTP POST method be more appropriate? Don’t you have a more readable and space-efficient way of identifying the addressed resource?

    URLs should ideally be easy to understand and (at least partially) remember. Placing a large blob inside is really a bad idea.

    Now I have warned you. If you still need to embed a large array in a URL, go ahead. Compress the data as much as you can, base64-encode it to convert the binary blob to text, and url-encode the text to sanitize it for embedding in the URL.

    Modified base64

    Or, better, use a modified version of base64. The one I chose

    • uses - instead of +,
    • uses _ instead of /, and
    • omits the padding =.
    define('URL_BASE64_FROM', '+/');
    define('URL_BASE64_TO', '-_');

    // Encode binary data as URL-safe base64 without padding.
    function url_base64_encode($data) {
        $encoded = base64_encode($data);
        return rtrim(strtr($encoded, URL_BASE64_FROM, URL_BASE64_TO), '=');
    }

    // Decode a string produced by url_base64_encode().
    function url_base64_decode($data) {
        if (!is_string($data)) {
            return false;
        }
        // str_pad() takes the target length, so round up to a multiple of 4.
        $len = strlen($data);
        $padded = str_pad($data, $len + (4 - $len % 4) % 4, '=', STR_PAD_RIGHT);
        return base64_decode(strtr($padded, URL_BASE64_TO, URL_BASE64_FROM));
    }
    

    This saves two bytes for each character that would otherwise be percent-encoded. There is also no need to call the urlencode function.
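
    A small self-contained sketch of where the saving comes from. The four input bytes are chosen purely so that the standard encoding contains all three problematic characters; the variable names are illustrative.

```php
<?php
// The three characters of standard base64 that urlencode() has to escape.
echo urlencode('+/='), "\n";        // %2B%2F%3D

// Bytes picked so that standard base64 contains '+', '/' and '='.
$raw      = "\xfb\xff\xfe\xff";
$standard = base64_encode($raw);                        // +//+/w==
$urlsafe  = rtrim(strtr($standard, '+/', '-_'), '=');   // -__-_w

echo urlencode($standard), "\n";    // %2B%2F%2F%2B%2Fw%3D%3D (22 characters)
echo urlencode($urlsafe), "\n";     // -__-_w (6 characters)

// Decoding: restore the padding, map the characters back, then decode.
$padded = $urlsafe . str_repeat('=', (4 - strlen($urlsafe) % 4) % 4);
$back   = base64_decode(strtr($padded, '-_', '+/'));
var_dump($back === $raw);           // bool(true)
```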

    Compression

    A choice has to be made between gzip-style compression (gzcompress, which emits the zlib format) and bzip2 (bzcompress). I did not want to invest time in a thorough comparison, but gzcompress looked better on several relatively small inputs (around 100 characters) for any setting of the bzip2 block size.
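
    If you would rather measure than guess, a sketch like the following prints the compressed size for each candidate. The sample payload is an arbitrary assumption, and gzdeflate is an extra option not discussed above: it produces a raw DEFLATE stream without the zlib header and checksum, so it is usually a few bytes smaller than gzcompress (decode it with gzinflate).

```php
<?php
// Sample payload: the comma-separated form of a 40-element list (arbitrary).
$payload = implode(',', range(1, 40));

$sizes = array(
    'raw'        => strlen($payload),
    'gzcompress' => strlen(gzcompress($payload, 9)),  // zlib format
    'gzdeflate'  => strlen(gzdeflate($payload, 9)),   // raw DEFLATE stream
);

// The bz2 extension is optional, so only measure it when it is loaded.
if (function_exists('bzcompress')) {
    $sizes['bzcompress'] = strlen(bzcompress($payload, 9));
}

foreach ($sizes as $name => $bytes) {
    printf("%-10s %d bytes\n", $name, $bytes);
}
```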

    Packing

    But what data should be fed into the compression algorithm?

    In C, one would cast array of integers to array of chars (bytes) and hand it over to the compression function. That’s the most obvious way to do things. In PHP the most obvious way to do things is converting all the integers to their decimal representation as strings, then concatenation using delimiters, and only after that compression. What a waste of space!

    So, let’s use the C approach! We’ll get rid of the delimiters and otherwise wasted space and encode each integer in 2 bytes using pack (which limits each value to the range 0 to 65535):

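    define('PACK_NUMS_FORMAT', 'n*');  // unsigned 16-bit, big-endian

    // Pack a list of integers (0..65535) into a binary string, 2 bytes each.
    function pack_nums($num_arr) {
        array_unshift($num_arr, PACK_NUMS_FORMAT);
        return call_user_func_array('pack', $num_arr);
    }

    // unpack() returns 1-based keys; array_values() restores a 0-based list.
    function unpack_nums($packed_arr) {
        return array_values(unpack(PACK_NUMS_FORMAT, $packed_arr));
    }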
    

    Warning: pack and unpack behavior is machine-dependent for some format codes. For example, s, S, l and L use the machine’s byte order, and i and I also depend on the machine’s integer size. With those codes, links would break if you switched to a system with different endianness or integrated several systems. The n code used here is always 16-bit big-endian, so the packed data is portable. The price is the 16-bit range: values above 65535 are silently truncated, so use N (32-bit big-endian) if larger numbers can occur.
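
    A quick sketch of what the n* format actually produces, useful for checking the byte layout and the value range before relying on it. The numbers are just sample values.

```php
<?php
// 'n' packs each value as an unsigned 16-bit integer in big-endian order.
$packed = pack('n*', 5, 12, 99);
echo strlen($packed), "\n";    // 6 (two bytes per value)
echo bin2hex($packed), "\n";   // 0005000c0063

// unpack() returns an array whose keys start at 1, not 0.
$unpacked = unpack('n*', $packed);
var_dump(array_keys($unpacked));   // keys 1, 2, 3

// Values outside 0..65535 do not fit in 16 bits and are silently truncated.
$wrapped = unpack('n', pack('n', 70000));
echo $wrapped[1], "\n";        // 4464, i.e. 70000 - 65536
```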

    Encoding together

    Now packing, compression and modified base64, all in one:

    function url_embed_array($arr) {
        return url_base64_encode(gzcompress(pack_nums($arr)));
    }
    function url_parse_array($data) {
        return unpack_nums(gzuncompress(url_base64_decode($data)));
    }
    

    See the result on IdeOne. On OP’s 40-element array, my solution produces 91 characters while his produces 98. With range(1, 1000) (which generates array(1, 2, 3, …, 1000)) as a benchmark, OP’s solution produces 2712 characters while mine produces just 2032. That is about 25 % better.

    For the sake of completeness, OP’s solution is

    function url_embed_array($arr) {
        return urlencode(base64_encode(gzcompress(implode(',', $arr))));
    }
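
    For anyone who wants to reproduce the comparison, here is a self-contained sketch. The helper names embed_packed, parse_packed and embed_csv are made up for this example, and the inputs are plain ranges rather than the 40-element array from the question, so the printed lengths need not match the figures quoted above.

```php
<?php
// Packed variant: 16-bit big-endian integers, zlib, URL-safe base64.
function embed_packed(array $nums) {
    $packed = pack('n*', ...$nums);
    return rtrim(strtr(base64_encode(gzcompress($packed)), '+/', '-_'), '=');
}

function parse_packed($data) {
    // Restore the padding that embed_packed() stripped.
    $data .= str_repeat('=', (4 - strlen($data) % 4) % 4);
    $binary = gzuncompress(base64_decode(strtr($data, '-_', '+/')));
    return array_values(unpack('n*', $binary));
}

// Comma-separated variant, as in OP's solution.
function embed_csv(array $nums) {
    return urlencode(base64_encode(gzcompress(implode(',', $nums))));
}

foreach (array(range(1, 40), range(1, 1000)) as $nums) {
    printf("%4d elements: packed %d chars, csv %d chars\n",
        count($nums), strlen(embed_packed($nums)), strlen(embed_csv($nums)));
}
```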
    
  • doubingqi5829
    2014-01-04 01:10

    There are multiple approaches possible:

    1. serialize + base64 - can swallow any object, but data overhead is horrible.
    2. implode + base64 - limited to arrays, forces user to find unused char as delimiter, data overhead is much smaller.
    3. implode - unsafe for unescaped strings. Requires strict data control.
    $foo = array('some unsafe data', '&&&==http://', '65535');
    $ser = base64_encode(serialize($foo));
    $imp = implode('|', $foo);  // separator first; the reversed order fails on PHP 8
    $imp2 = base64_encode($imp);
    echo "$ser\n$imp\n$imp2";
    

    Results are as follows:

    YTozOntpOjA7czoxNjoic29tZSB1bnNhZmUgZGF0YSI7aToxO3M6MTI6IiYmJj09aHR0cDovLyI7aToyO3M6NToiNjU1MzUiO30=
    some unsafe data|&&&==http://|65535
    c29tZSB1bnNhZmUgZGF0YXwmJiY9PWh0dHA6Ly98NjU1MzU=
    

    While the serialize+base64 result is horribly long, implode+base64 gives output of manageable length that is safe for GET, except for the = at the end (and any + or / that base64 may produce).
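
    For completeness, a sketch of the receiving side for approaches 1 and 2. The allowed_classes option is an addition of this example rather than part of the comparison above; it stops unserialize from instantiating objects, which matters because the query string is controlled by the user.

```php
<?php
$foo = array('some unsafe data', '&&&==http://', '65535');

// Approach 1: serialize + base64. Forbid object instantiation when decoding.
$ser     = base64_encode(serialize($foo));
$decoded = unserialize(base64_decode($ser), array('allowed_classes' => false));

// Approach 2: implode + base64. The delimiter must not occur in the data.
$imp2     = base64_encode(implode('|', $foo));
$decoded2 = explode('|', base64_decode($imp2));

var_dump($decoded === $foo, $decoded2 === $foo);   // bool(true) twice

// urlencode() escapes the trailing '=' (and '+' or '/' if present).
echo urlencode($imp2), "\n";
```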

  • duanli4146
    2014-01-04 02:02

    I believe the answer depends on the size of the query string.

    Short query strings

    For shorter query strings, this may be the best way:

    $fs = array(5, 12, 99);
    $fs_no_array = implode(',', $fs);
    $url = "http://$_SERVER[HTTP_HOST]/?" .
        http_build_query(array('c' => 'asdf', 's' => 'jkl')) . '&fs=' . $fs_no_array;
    

    resulting in

    http://example.com/?c=asdf&s=jkl&fs=5,12,99
    

    On the other end you do this to get your array back:

    $fs = array_map('intval', explode(',', $_GET['fs']));
    

    Quick note about delimiters: A valid reason to avoid commas is that they are used as delimiters in so many other applications. On the off-chance you want to parse your URLs in Excel, for example, the commas might make it slightly more difficult. Underscores would also work, but they can blend in with the underlining that is standard in web formatting for links. So dashes may actually be a better choice than either commas or underscores.

    Long query strings

    I came across another possible solution:

    $fs_compressed = urlencode(base64_encode(gzcompress($fs_no_array)));
    

    On the other end it can be decompressed by

    $fs_decompressed = gzuncompress(base64_decode($_GET['fs']));
    $fs = array_map('intval', explode(',', $fs_decompressed));
    

    assuming it is passed in through a GET variable. PHP has already percent-decoded $_GET by then, so no urldecode call is needed.
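
    Since the parameter comes from the user, the decoding side may receive garbage. Below is a defensive sketch under stated assumptions: the helper name decode_fs is hypothetical, the 65536-byte cap on the decompressed size is an arbitrary choice, and only non-negative integers are accepted.

```php
<?php
// Hypothetical helper: decode the compressed "fs" parameter defensively.
// Returns an array of integers, or null when the input is not valid.
function decode_fs($param) {
    if (!is_string($param)) {
        return null;
    }
    // Strict mode rejects characters outside the base64 alphabet.
    $binary = base64_decode($param, true);
    if ($binary === false) {
        return null;
    }
    // Cap the output size so a crafted payload cannot expand without limit.
    // The @ silences the warning gzuncompress() raises on corrupt data.
    $csv = @gzuncompress($binary, 65536);
    if ($csv === false || !preg_match('/\A\d+(?:,\d+)*\z/', $csv)) {
        return null;
    }
    return array_map('intval', explode(',', $csv));
}

// Encoding side as above, without urlencode because nothing travels
// through a URL in this example.
$encoded = base64_encode(gzcompress(implode(',', array(5, 12, 99))));
var_dump(decode_fs($encoded));        // the integers 5, 12, 99
var_dump(decode_fs('not valid!'));    // NULL
```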

    Efficiency tests

    31 elements

    $fs = array(7,2,3,4,5,6,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,52,53,54,61);
    

    Result:

    eJwFwckBwCAQxLCG%2FMh4D6D%2FxiIdpGiG5fLIR0IkRZoMWXLIJQ8%2FDIqFjYOLBy8jU0yz%2BQGlbxAB
    

    $fs_no_array is 84 characters long, $fs_compressed 84 characters long. The same!

    40 elements

    $fs = array(7,2,3,4,5,6,8,9,10,11,12,13,14,15,16,17,18,19,20,21,22,23,24,25,26,27,28,29,30,31,32,33,34,35,36,37,52,53,54,61);
    

    Result:

    eJwNzEkBwDAQAzFC84jtPRL%2BxFoB0GJC0QyXhw4SMgoq1GjQoosePljYOLhw48GLL37kEJE%2FDCnSZMjSpkMXow%2BdIBUs
    

    $fs_no_array is 111 characters long, $fs_compressed 98 characters long.

    Summary

    The savings are only about 10 % at this size, but at greater lengths they will increase to beyond 50 %.

    If you use Yahoo sites, you notice things like comma separated lists as well as sometimes a series of random looking characters. They may be employing these solutions in the wild already.

    Also check out this stack question, which talks in way too much detail about what is allowed in a URI.
