After converting my site to use utf-8, I'm now faced with the prospect of validating all incoming utf data, to ensure its valid and coherent.
There seems to be various regexp's and PHP API to detect whether a string is utf, but the ones Ive seen seem incomplete (regexps which validate utf, but still allow invalid 3rd bytes etc).
I'm also concerned about detecting (and preventing) overlong encoding, meaning ASCII characters that can be encoded as multibyte utf sequences.
Any suggestions or links welcome!