doucaishi0077 2014-09-18 09:10

ICU: Transliterate, then remove all non-alphanumeric characters

Can it be done with ICU without falling back to regex?

Currently I normalize filenames like this:

protected function normalizeFilename($filename)
{
    // create() takes a compound transform ID; createFromRules() expects rule syntax instead.
    $transliterator = Transliterator::create(
        'Any-Latin; Latin-ASCII; [:Punctuation:] Remove;'
    );
    $filename = $transliterator->transliterate($filename);
    // Strip whatever is still outside the allowed character set.
    $filename = preg_replace('/[^A-Za-z0-9_]/', '', $filename);
    return $filename;
}

Can I get rid of the regular expression here and do everything with ICU calls?


1 answer

  • dongxu1668 2014-09-25 21:46

    Use the correct tool for the job

    I don't see anything wrong with what you're doing now.

    ICU transliteration is first and foremost language oriented. It tries to preserve meaning.

    Regular expressions, on the other hand, can manipulate characters in detail, giving you the assurance that the file name is restricted to the selected characters.

    The combination is perfect, in this case.

    I have, of course, looked for a solution to your question. But to be honest, I couldn't find anything that would work on all possible inputs.

    For instance, not every character we would consider a punctuation mark is removed by [:Punctuation:] Remove;. Try the Russian name Корнильев, Кирилл. After applying your id it becomes Kornilʹev Kirill. ICU clearly does not treat the remaining ʹ as a punctuation mark, but you don't want it in your file name.
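    To see for yourself what a given id leaves behind, something like the following rough sketch can help. It assumes the intl extension is loaded; leftoverCharacters() is a made-up helper name, not part of ICU or PHP. It lists every character that the regex from the question would still have to strip, with its code point and Unicode name.

```php
<?php
// Requires the intl extension (Transliterator, IntlChar).

// Hypothetical helper: reports what an ICU id leaves behind that is not in [A-Za-z0-9_].
function leftoverCharacters(string $input, string $id): array
{
    $transliterator = Transliterator::create($id);
    if ($transliterator === null) {
        throw new InvalidArgumentException("Invalid transliterator ID: $id");
    }
    $result = $transliterator->transliterate($input);

    // Collect every character the question's regex would still have to remove.
    preg_match_all('/[^A-Za-z0-9_]/u', $result, $matches);

    $report = [];
    foreach (array_unique($matches[0]) as $char) {
        // Code point and official Unicode name, e.g. "U+0020 SPACE".
        $report[] = sprintf('U+%04X %s', IntlChar::ord($char), IntlChar::charName($char));
    }
    return $report;
}

print_r(leftoverCharacters(
    'Корнильев, Кирилл',
    'Any-Latin; Latin-ASCII; [:Punctuation:] Remove;'
));
```

    The exact list may differ between ICU versions, which is one more reason not to rely on the id alone.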

    So I would advise using the correct tool for the job:

    1. Use ICU to get the best ASCII equivalent. For Latin-script input, using only Latin-ASCII; as the id will do (keep Any-Latin; in front of it if the input can be in other scripts, such as the Russian name above). Nice and simple.
    2. Then use a regular expression, just like you did, to make sure you're left with only the characters you need.
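    The two steps above could look roughly like this. It is a minimal sketch, assuming the intl extension is loaded. It is written as a plain function rather than the class method from the question, and keeping Any-Latin; in front of Latin-ASCII is my assumption so that non-Latin input is romanized before the ASCII step.

```php
<?php
// Requires the intl extension for the Transliterator class.

function normalizeFilename(string $filename): string
{
    // Step 1: let ICU pick the closest ASCII equivalents.
    // 'Any-Latin;' is an assumption here: it romanizes non-Latin scripts first.
    $transliterator = Transliterator::create('Any-Latin; Latin-ASCII');
    if ($transliterator === null) {
        throw new RuntimeException('Could not create transliterator: ' . intl_get_error_message());
    }

    $ascii = $transliterator->transliterate($filename);
    if ($ascii === false) {
        throw new RuntimeException('Transliteration failed: ' . $transliterator->getErrorMessage());
    }

    // Step 2: whitelist. Whatever ICU left behind that is not a letter,
    // digit or underscore is dropped, so the result is guaranteed to be safe.
    return preg_replace('/[^A-Za-z0-9_]/', '', $ascii);
}

echo normalizeFilename('Ünïcödé file_name 01.txt'), PHP_EOL;
```

    The regex is what gives the guarantee; ICU only decides how good the ASCII approximation is before the whitelist is applied.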

    There is really nothing wrong with this.

    PS: Personally I think the person, or persons, who wrote the ICU user guide should not be complimented on a job well done. What a mess.

    This answer was selected by the asker as the best answer.
