Let's say I have the following:
$str1 = "via Tokyo";
$str2 = "via 東京";
I want to match any non-whitespace characters after the "via ". Normally I'd use the following:
preg_match("/via\s(\S+)/", $str2, $match);
to obtain the matching characters. I assumed this wouldn't work on $str2 because preg_match does not understand UTF-8 by default; however, it works perfectly in this case.
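For reference, here is a minimal sketch of how I'm testing this. It assumes the script file itself is saved as UTF-8; the `$pattern`, `$inputs`, and `$captures` names are just for illustration.

```php
<?php
// Same pattern as above, with no /u modifier, run against both strings.
$pattern = "/via\s(\S+)/";
$inputs = ["via Tokyo", "via 東京"];

$captures = [];
foreach ($inputs as $input) {
    // preg_match returns 1 on a match, 0 on no match, false on error.
    if (preg_match($pattern, $input, $match) === 1) {
        $captures[] = $match[1];
        echo $match[1], "\n";
    }
}
```

On my setup this prints both "Tokyo" and "東京", which is the behavior I'm asking about.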
Is this working because preg_match is simply matching bytes that aren't whitespace? If so, is it safe to use this for any UTF-8 characters?
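To check my guess at the byte level, I dumped the bytes of the Japanese text and compared them against the ASCII whitespace bytes. This is only a sketch: the list in `$asciiWhitespace` is my assumption about what `\s` covers without the /u modifier in the default C locale, and other locales may differ.

```php
<?php
// Dump the UTF-8 bytes of the text and look for ASCII whitespace bytes.
// Assumes this file is saved as UTF-8.
$text = "東京";

// Assumption: space, \t, \n, \v, \f, \r are the bytes \s treats as whitespace.
$asciiWhitespace = [0x20, 0x09, 0x0A, 0x0B, 0x0C, 0x0D];

// unpack('C*', ...) yields one unsigned integer per byte (1-indexed),
// so array_values() is used to get a plain 0-indexed list.
$bytes = array_values(unpack('C*', $text));

$hex = implode(' ', array_map(function ($b) {
    return sprintf('%02X', $b);
}, $bytes));
echo $hex, "\n"; // E6 9D B1 E4 BA AC

// Collect any bytes that are also in the whitespace list.
$hits = array_filter($bytes, function ($b) use ($asciiWhitespace) {
    return in_array($b, $asciiWhitespace, true);
});
echo count($hits) === 0 ? "no whitespace bytes\n" : "whitespace byte found\n";
```

None of the six bytes falls in the ASCII range here, which is consistent with my guess, but it only covers this one string rather than UTF-8 in general.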
P.S. I'm aware that I should really be using the mb_ereg functions for this (or avoiding PHP altogether), but I'm looking for a better understanding of why this works. Thanks!