donglu8344812 2011-07-21 17:13
Views: 132

Remove all types of characters

I constantly have problems with data where odd characters show up in our database and cause everything to break at some point down the line. I need to put a system in place that only lets specific characters through and drops all of the crazy things that can be pasted in from Microsoft Office. Is there something like this built in, or should I start from scratch?

2 answers

  • doucai6663 2011-07-21 17:29

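    Well, you can remove all such characters with e.g. $text = preg_replace('@[^\d\w\s,.;:]@', '', $text); where the negated class [^\d\w\s,.;:] matches every character that is not in the set, so the characters listed after the ^ are the ones that get kept (\d\w\s covers digits, word characters, i.e. letters, digits, and underscore, and whitespace). Extend the set with any other characters you want to keep.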
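    To make the whitelist easier to adjust, the same pattern can be wrapped in a small function. This is only a sketch: keep_allowed_chars() and its $extra parameter are hypothetical names that are not part of the answer above, and it assumes PHP 7 or later for the type declarations.

```php
<?php
// Hypothetical helper (not from the original answer): strips every character
// that is not a word character, whitespace, or one of , . ; : plus any
// caller-supplied extras.
function keep_allowed_chars(string $text, string $extra = ''): string
{
    // preg_quote() escapes regex metacharacters (and the '@' delimiter)
    // so the extra characters are safe inside the character class.
    $allowed = '\w\s,.;:' . preg_quote($extra, '@');

    // The leading ^ negates the class: everything NOT listed is removed.
    // preg_replace() returns null on error, so fall back to an empty string.
    return preg_replace('@[^' . $allowed . ']@', '', $text) ?? '';
}

echo keep_allowed_chars('Hello, world! <b>#1</b>'), "\n";
echo keep_allowed_chars('Hello, world! <b>#1</b>', '!#'), "\n";
```

    Passing the extra characters as an argument keeps the base set in one place, so different fields can use different whitelists.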

    However, that is the wrong approach. You should instead make sure your entire application uses and processes UTF-8 from the ground up (input handling, storage, and output), so that you can store and handle those characters correctly. Building an ASCII or ISO Latin site in this day and age is just weird, and it essentially causes data loss because it cuts out characters that people actually use.
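    One piece of handling UTF-8 end to end is normalizing incoming text before it is stored. The sketch below is an assumption-laden illustration, not a complete solution: to_utf8() is a hypothetical helper, it requires the mbstring extension, and it guesses that anything that is not valid UTF-8 is Windows-1252 (a common encoding for text pasted from Microsoft Office). Your real input may need a different fallback.

```php
<?php
// Hypothetical helper (not from the original answer): returns the text as
// UTF-8, converting from a fallback encoding only when the input is not
// already valid UTF-8. Requires the mbstring extension.
function to_utf8(string $text, string $fallbackEncoding = 'Windows-1252'): string
{
    // Valid UTF-8 is passed through untouched.
    if (mb_check_encoding($text, 'UTF-8')) {
        return $text;
    }

    // Assumption: invalid input is in the fallback encoding.
    return mb_convert_encoding($text, 'UTF-8', $fallbackEncoding);
}

// "\x93" and "\x94" are the Windows-1252 curly quotes.
$clean = to_utf8("\x93quoted\x94");
echo bin2hex($clean), "\n";
```

    The database connection and the pages you serve would also need to be set to UTF-8 for the characters to survive the round trip.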
