An Android app that I am writing acquires data compressed using LZString and sent out as Base64. I am using this implementation for LZString in Java along with this one in PHP. Both of these implementations are the top recommendations listed here by the author of the original JavaScript port of LZW.
I have had a really tough time understanding why the LZString data sent out by PHP throws exceptions in Java. After much experimentation and frayed nerves, I have eventually worked out that the issue comes down to apparent padding that is expected in Java and is missing from the data sent out by PHP. Take the following as an example.
Original string being compressed:
Betty bought a bit of butter but it was bitter so she bought some better butter to make the bitter butter better
This is a sentence I use for testing since, with its multiple repetitions, it is likely to compress well.
The PHP implementation of LZString spits out the following byte array
69 73 85 119 76 109 67 101 65 69 66 71 68 50 66 88 65 53 103 67 122 78 65
104 110 65 108 104 43 65 90 110 73 104 67 65 69 55 69 90 55 81 68 117 109 65
122 114 113 82 102 102 78 80 97 105 72 69 109 104 113 119 76 90 100 89 52 77
79 85 113 105 75 89 78 118 48 119 66 114 76 109 69 53 77 74 52 115 99 79 90
65
while the Java implementation generates the following byte array
69 73 85 119 76 109 67 101 65 69 66 71 68 50 66 88 65 53 103 67 122 78 65
104 110 65 108 104 43 65 90 110 73 104 67 65 69 55 69 90 55 81 68 117 109 65
122 114 113 82 102 102 78 80 97 105 72 69 109 104 113 119 76 90 100 89 52 77
79 85 113 105 75 89 78 118 48 119 66 114 76 109 69 53 77 74 52 115 99 79 90
65 **65 65 61 61**
You will note that the Java implementation tags on an extra **AA==**.
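To make the difference easier to see, the numbers above can be turned back into text and fed through a standard Base64 decoder. This is only a minimal sketch: the class and method names are made up for illustration, it assumes the listed numbers are the ASCII codes of the Base64 output, and it uses java.util.Base64 (on Android that needs API level 26 or later) rather than either LZString library.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical helper class, not part of either LZString implementation.
class CompareOutputs {
    // Character codes copied from the PHP output listed above.
    static final int[] PHP_CODES = {
        69, 73, 85, 119, 76, 109, 67, 101, 65, 69, 66, 71, 68, 50, 66, 88, 65, 53, 103, 67, 122, 78, 65,
        104, 110, 65, 108, 104, 43, 65, 90, 110, 73, 104, 67, 65, 69, 55, 69, 90, 55, 81, 68, 117, 109, 65,
        122, 114, 113, 82, 102, 102, 78, 80, 97, 105, 72, 69, 109, 104, 113, 119, 76, 90, 100, 89, 52, 77,
        79, 85, 113, 105, 75, 89, 78, 118, 48, 119, 66, 114, 76, 109, 69, 53, 77, 74, 52, 115, 99, 79, 90,
        65
    };

    // The four extra codes that only the Java output ends with.
    static final int[] EXTRA_CODES = {65, 65, 61, 61};

    // Turn a list of ASCII codes into the text they spell.
    static String toText(int[] codes) {
        StringBuilder sb = new StringBuilder();
        for (int c : codes) {
            sb.append((char) c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        String php = toText(PHP_CODES);
        String java = php + toText(EXTRA_CODES);

        // Lengths of the two Base64 strings.
        System.out.println(php.length() + " vs " + java.length());   // 92 vs 96

        // What the extra four characters decode to on their own.
        System.out.println(Arrays.toString(Base64.getDecoder().decode("AA==")));   // [0]

        // Decoded byte counts of the two payloads.
        System.out.println(Base64.getDecoder().decode(php).length);    // 69
        System.out.println(Base64.getDecoder().decode(java).length);   // 70
    }
}
```

Under those assumptions, the PHP string is already a multiple of four characters long, and the extra AA== decodes to a single zero byte, so the two payloads differ by one trailing byte (69 versus 70).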
I can, at a pinch, understand why there is an ==: padding to get to the desired length multiple. However, I cannot understand why the AA is there or where it is coming from.
I tested LZString.decompressFromBase64 in Java after tagging on an additional AA== and found that it works. On the other hand, simply tagging on an == threw an exception. Further experiment revealed that tagging on ==== worked, and so too did BB==, indicating that these four bytes are simply used for padding and not put to any other use.
At this point I could quite simply append padding as appropriate in Java prior to calling LZString.decompressFromBase64. However, I fear that would be a "solution" implemented without a full understanding of what is happening here. Perhaps someone here can shed some light?
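For reference, the stopgap described above could be sketched roughly as follows. It rests on an unconfirmed guess: that the Java port wants the decoded payload to contain an even number of bytes. That guess fits the sample above but is not taken from either library's source. The helper name padToEvenByteCount is hypothetical, and java.util.Base64 needs API level 26 or later on Android.

```java
import java.util.Arrays;
import java.util.Base64;

// Hypothetical workaround, not part of either LZString implementation.
class PadForDecoder {
    // Re-encode a Base64 string so that its decoded payload has an even
    // number of bytes, appending a single zero byte when needed.
    // ASSUMPTION: an even byte count is what the Java decoder requires.
    static String padToEvenByteCount(String base64) {
        byte[] raw = Base64.getDecoder().decode(base64);
        if (raw.length % 2 != 0) {
            raw = Arrays.copyOf(raw, raw.length + 1); // new last byte is 0
        }
        return Base64.getEncoder().encodeToString(raw);
    }

    public static void main(String[] args) {
        // The PHP output from above, written out as text.
        String fromPhp = "EIUwLmCeAEBGD2BXA5gCzNAhnAlh+AZnIhCAE7EZ7QDumA"
                + "zrqRffNPaiHEmhqwLZdY4MOUqiKYNv0wBrLmE5MJ4scOZA";

        String padded = padToEvenByteCount(fromPhp);

        // Print only what was appended to the original string.
        System.out.println(padded.substring(fromPhp.length()));   // AA==

        // `padded` would then be handed to LZString.decompressFromBase64.
    }
}
```

For this sample the result is the PHP string with AA== appended, which matches what the Java implementation itself produces. Whether that holds for other inputs depends on the guess above being right.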