It depends on the encoding you're talking about. In both UTF-16LE and UTF-32LE, for example, tons of characters are encoded with trailing null bytes, which trim removes by default.
The string "a" in UTF-16LE consists of the bytes 0x61 0x00, and trim will remove the null byte, leaving just 0x61.
Note that this problem goes the other way too: trim strips bytes from the beginning of strings as well as the end. If your string "a" is in UTF-16BE it will be encoded as 0x00 0x61, with trim again leaving you with just 0x61.
Example:
// Encode "a" in both byte orders
$utf16le = iconv("ASCII", "UTF-16LE", "a");
$utf16be = iconv("ASCII", "UTF-16BE", "a");
// Dump the raw bytes before and after trim
var_dump(
    bin2hex($utf16le),
    bin2hex(trim($utf16le)),
    bin2hex($utf16be),
    bin2hex(trim($utf16be))
);
Output:
string(4) "6100"
string(2) "61"
string(4) "0061"
string(2) "61"
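If you do need to trim UTF-16 data, one possible workaround is to convert to UTF-8, trim there, and convert back. The following is only a sketch under that assumption: trim_utf16 is a hypothetical helper name, not a built-in, and it assumes the input really is valid in the encoding you pass.

```php
<?php
// Hypothetical helper: trim a UTF-16 string by round-tripping through UTF-8,
// where trim's default byte mask cannot hit the middle of a character.
function trim_utf16(string $s, string $encoding = "UTF-16LE"): string
{
    $utf8 = iconv($encoding, "UTF-8", $s);
    if ($utf8 === false) {
        throw new InvalidArgumentException("Input is not valid $encoding");
    }
    // Trim in UTF-8, then convert back to the original encoding
    return iconv("UTF-8", $encoding, trim($utf8));
}

// "  a  " in UTF-16LE: 2000 2000 6100 2000 2000
$padded = iconv("UTF-8", "UTF-16LE", "  a  ");
var_dump(bin2hex(trim_utf16($padded))); // string(4) "6100"
```

The null byte that belongs to "a" survives here because the trimming happens while the string is in UTF-8.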
If you're only worried about UTF-8 then no, there aren't any conflicts. It is ASCII-compatible: all single-byte characters in UTF-8 have the form 0xxx xxxx, while every byte of a multibyte character has its most significant bit set (1xxx xxxx), so there is no ambiguity. With UTF-8, trim using its default character mask is safe.
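A quick way to convince yourself of the UTF-8 case is to trim a string containing multibyte characters and look at the bytes that come out. This is just an illustrative check; the sample string "été" is my own choice, and the empty pattern with the /u modifier is used only as a cheap "is this valid UTF-8?" test.

```php
<?php
// "été" surrounded by ASCII whitespace; é is U+00E9, encoded as c3 a9
$utf8 = " \t\u{e9}t\u{e9}\n";
$trimmed = trim($utf8);

// Only the ASCII whitespace bytes are gone; the multibyte bytes are untouched
var_dump(bin2hex($trimmed));               // string(10) "c3a974c3a9"
// preg_match with /u fails on invalid UTF-8, so 1 means the result is still valid
var_dump(preg_match('//u', $trimmed) === 1); // bool(true)
```

Keep in mind that "safe" here means trim won't corrupt the string. Non-ASCII whitespace such as U+00A0 is not in the default mask, so it is simply left in place.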
If you're concerned about other encodings then it's going to depend on what they are. If you try using multibyte characters as part of trim's character mask you'll definitely run into problems, as each byte will be treated individually.
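To make the character mask problem concrete, here is a small sketch in UTF-8. The mask "é" (c3 a9) shares its lead byte with "à" (c3 a0), so trim eats half of a character. The helper utf8_trim_chars is a hypothetical name of my own, shown as one possible alternative built on preg_replace with the /u modifier; it assumes both arguments are valid UTF-8.

```php
<?php
// "àbc" is c3 a0 62 63 in UTF-8; the mask "é" is the two bytes c3 a9
$s = "\u{e0}bc";
$broken = trim($s, "\u{e9}");   // strips the shared lead byte c3 only
var_dump(bin2hex($broken));     // string(6) "a06263" (no longer valid UTF-8)

// Hypothetical helper: strip whole characters (not bytes) from both ends.
function utf8_trim_chars(string $s, string $chars): string
{
    if ($chars === '') {
        return $s; // nothing to strip, and an empty character class is invalid
    }
    // Build a character class from the mask; /u makes PCRE work on code points
    $class = '[' . preg_quote($chars, '/') . ']+';
    $result = preg_replace('/^' . $class . '|' . $class . '\z/u', '', $s);
    if ($result === null) {
        throw new InvalidArgumentException("Input is not valid UTF-8");
    }
    return $result;
}

// "éàbcé" with mask "é" leaves "àbc" intact
var_dump(bin2hex(utf8_trim_chars("\u{e9}\u{e0}bc\u{e9}", "\u{e9}"))); // string(8) "c3a06263"
```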