anubhava's answer about matching ranges of Unicode characters led me to the regex I need for stripping a specific range of single-code-point characters. With it, I can now match all the miscellaneous symbols in this list (which includes emoticons) with this simple expression:
preg_replace('/[\x{2600}-\x{26FF}]/u', '', $str);
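For reference, here is a minimal sketch of how I am using that expression. The function name `strip_misc_symbols` is just something I made up for illustration, and the `"\u{...}"` string escapes assume PHP 7 or later:

```php
<?php
// Hypothetical helper wrapping the expression above.
function strip_misc_symbols(string $str): string
{
    // The /u modifier makes PCRE treat both pattern and subject as UTF-8,
    // so \x{2600}-\x{26FF} is a range of code points rather than bytes.
    $result = preg_replace('/[\x{2600}-\x{26FF}]/u', '', $str);

    // preg_replace() returns null on failure (for example, when the
    // subject is not valid UTF-8); fall back to the unchanged input.
    return $result ?? $str;
}

// U+2600 (sun) and U+2603 (snowman) both fall inside the range.
echo strip_misc_symbols("Sunny \u{2600} and snowy \u{2603} today"), "\n";
```

Characters outside U+2600 to U+26FF are left untouched by this.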
However, I also want to match the emoji in this list, which are written as surrogate pairs, but as nhahtdh explained in a comment:
There is a range from d800 to dfff to specify surrogates in UTF-16 to allow for more characters to be specified. A single surrogate is not a valid character in UTF-16 (a pair is necessary to specify a valid character).
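As far as I understand it, the standard UTF-16 arithmetic combines a high surrogate (D800 to DBFF) and a low surrogate (DC00 to DFFF) into one code point above U+FFFF. A rough sketch of that calculation follows; the function name is hypothetical and only there for illustration:

```php
<?php
// Hypothetical helper: combine a UTF-16 surrogate pair into one code point.
function surrogate_pair_to_code_point(int $high, int $low): int
{
    // A high surrogate must be in D800..DBFF and a low one in DC00..DFFF.
    if ($high < 0xD800 || $high > 0xDBFF || $low < 0xDC00 || $low > 0xDFFF) {
        throw new InvalidArgumentException('Not a valid surrogate pair');
    }

    // Each surrogate carries 10 bits; the result is offset by 0x10000.
    return 0x10000 + (($high - 0xD800) << 10) + ($low - 0xDC00);
}

// The pair D83D DE00 works out to U+1F600.
printf("U+%X\n", surrogate_pair_to_code_point(0xD83D, 0xDE00));
```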
So, for example, when I try this:
preg_replace('/\x{D83D}\x{DE00}/u', '', $str);
to replace only the first of the surrogate pairs on this list, i.e.: