Are these characters really multi-byte?
As shown by this online demo, the following code prints 1 (the value preg_match returns for a successful match) three times:
$regex = "@^[0-9A-ZĄąĆćĘꣳÓ󯿏źŃńŚś., -]{5,35}$@i";
echo preg_match($regex, "Żółkiewski") . "\n";
echo preg_match($regex, "Zielona Góra") . "\n";
echo preg_match($regex, "Równina") . "\n";
Therefore the problem is not with the regex itself, but with a mismatch between the encoding of the script where the regex lives and the encoding of the input fed to the regex. It may well be, for instance, that your script is saved in one of the Windows or ISO Eastern European encodings (such as Windows-1250 or ISO-8859-2), in which case these characters are not multi-byte at all: each one is stored as a single byte. Many IDEs and editors are able to convert a text file's encoding.
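To make the mismatch concrete, here is a minimal sketch that writes "ó" as explicit bytes, so the result does not depend on how the file itself is saved. It assumes the mbstring extension is available. The helper name looksLikeUtf8 and the simplified pattern are illustrative only and are not taken from the question.

```php
<?php
// "ó" written as raw bytes, independent of this file's encoding.
$utf8   = "\xC3\xB3";                                         // UTF-8: two bytes
$latin2 = mb_convert_encoding($utf8, 'ISO-8859-2', 'UTF-8');  // ISO-8859-2: one byte

echo strlen($utf8), "\n";     // 2
echo strlen($latin2), "\n";   // 1
echo bin2hex($latin2), "\n";  // f3

// Hypothetical helper: true if the byte string is well-formed UTF-8.
function looksLikeUtf8(string $s): bool
{
    return mb_check_encoding($s, 'UTF-8');
}

// Simplified pattern whose character class holds the UTF-8 bytes of "ó".
$pattern = "/^[A-Za-z\xC3\xB3]+$/";

$inputUtf8   = "R\xC3\xB3wnina";  // "Równina" encoded as UTF-8
$inputLatin2 = "R\xF3wnina";      // "Równina" encoded as ISO-8859-2

echo preg_match($pattern, $inputUtf8), "\n";    // 1: script and input agree
echo preg_match($pattern, $inputLatin2), "\n";  // 0: byte 0xF3 is not in the class

var_dump(looksLikeUtf8($inputUtf8));    // bool(true)
var_dump(looksLikeUtf8($inputLatin2));  // bool(false)
```

The same pattern accepts or rejects the same word depending only on the bytes it receives. A quick check like looksLikeUtf8 on the incoming data is one way to find out which side of the mismatch you are on.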
The best choice for future-proofing is to make sure every component of your system talks UTF-8:
- The script's encoding
- The header sent by the script
- The meta tag
- The connection to the database
- The data in the database
And so on. Addressing how to achieve all of those is a topic for a book chapter and beyond the scope of this question.
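Still, as a rough orientation only, a few of the items above might look something like this in PHP. This is a sketch under assumptions, not a complete recipe: it assumes a MySQL database accessed through PDO, the host and database names are placeholders, and the helper utf8Dsn is hypothetical.

```php
<?php
// Hypothetical helper: builds a PDO MySQL DSN that requests a UTF-8 connection.
function utf8Dsn(string $host, string $db): string
{
    return "mysql:host={$host};dbname={$db};charset=utf8mb4";
}

// The header sent by the script: declare the response body as UTF-8.
// This must happen before any output is sent.
if (!headers_sent()) {
    header('Content-Type: text/html; charset=utf-8');
}

// Make mbstring functions treat strings as UTF-8 by default.
mb_internal_encoding('UTF-8');

// The connection to the database (placeholder host and database name).
$dsn = utf8Dsn('localhost', 'example');
// $pdo = new PDO($dsn, $user, $password);  // credentials deliberately omitted

// The meta tag, emitted as part of the page's <head>.
echo '<meta charset="utf-8">', "\n";
```

The script file itself still has to be saved as UTF-8 in your editor, and the tables and columns in the database need a UTF-8 character set as well. Neither of those is something this snippet can do for you.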