iteye_8238 2011-01-11 10:12
Views: 366
Accepted

Question about pattern matching on Chinese strings in Erlang

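Create a file named data.txt, encoded as UTF-8 without a BOM, containing only the two characters 飞机. Running the following test program against it raises an exception: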
-module(demo).
-compile(export_all).

test_cn() ->
    {ok, Fp} = file:open("data.txt", [read, {encoding, utf8}]),
    {ok, Content} = file:read_line(Fp),
    file:close(Fp),
    io:format("~w~n", [get_id_by_name(Content)]).

get_id_by_name("飞机") -> %% this could of course be written as get_id_by_name([39134,26426]),
                          %% but the real project has dozens or even hundreds of names, so that is not practical
    plane;
get_id_by_name("火车") ->
    train.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

exception error: no function clause matching
                    demo:get_id_by_name([39134,26426])
     in function  demo:test_cn/0

Could someone explain what is going on here? How can I pattern match on Chinese strings?
Update to the question

jigloo wrote:
Save demo.erl as UTF-8 as well,
then try this:
io:format("~w~n", [get_id_by_name(unicode:characters_to_binary(Content, unicode, utf8))]).


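Thanks jigloo, that solved the problem.
For the record, demo.erl was already saved as UTF-8. I had also tried unicode:characters_to_binary before, but without the last two arguments. After adding them it still did not work; it only worked once I wrapped the Chinese match patterns in <<>>. Here is the complete working code, for beginners or anyone who runs into the same problem later.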
-module(demo).
-compile(export_all).

test_cn() ->
    {ok, Fp} = file:open("data.txt", [read, {encoding, utf8}]),
    {ok, Content} = file:read_line(Fp),
    file:close(Fp),
    io:format("~w~n", [get_id_by_name(unicode:characters_to_binary(Content, unicode, utf8))]).

get_id_by_name(<<"飞机">>) ->
    plane;
get_id_by_name(<<"火车">>) ->
    train.
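
A possible explanation, not stated in the thread: file:read_line/1 on a file opened with {encoding, utf8} returns a list of Unicode code points ([39134,26426] in the error above), whereas compilers of that era read source files as Latin-1, so the literal "飞机" in a UTF-8 demo.erl would be seen as six separate bytes. Converting the input to a UTF-8 binary yields those same six bytes, which would explain why the <<"飞机">> clauses match. On Erlang/OTP 17 and later, where source files default to UTF-8, the same intent is usually written with the /utf8 type specifier. A minimal sketch under that assumption; the module name demo_utf8, the lookup/1 helper, and the catch-all unknown clause are illustrative additions, and string:trim/1 requires OTP 20 or later:

```erlang
%% -*- coding: utf-8 -*-
-module(demo_utf8).
-export([get_id_by_name/1, lookup/1]).

%% Hypothetical variant of the module above, not the original poster's code.
%% The /utf8 type specifier makes each literal a UTF-8 encoded binary,
%% even though the compiler reads the source text as Unicode code points.
get_id_by_name(<<"飞机"/utf8>>) -> plane;
get_id_by_name(<<"火车"/utf8>>) -> train;
get_id_by_name(_Other) -> unknown.   %% assumption: fall back instead of crashing

%% Takes a line as returned by file:read_line/1 (a list of code points,
%% possibly ending in a newline), trims it, and looks it up.
lookup(Line) ->
    Trimmed = string:trim(Line),
    get_id_by_name(unicode:characters_to_binary(Trimmed, unicode, utf8)).
```

Trimming first matters because file:read_line/1 keeps the trailing newline when the line has one, and a binary with an extra newline byte would not match any of the named clauses.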

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
To make the issue easier to understand, here is the official API documentation:
characters_to_binary(Data, InEncoding, OutEncoding) -> binary() | {error, binary(), RestData} | {incomplete, binary(), binary()}

Types:
Data = latin1_chardata() | chardata() | external_chardata()
RestData = latin1_chardata() | chardata() | external_chardata()
InEncoding = latin1 | unicode | utf8 | utf16 | utf32 | {utf16,little} | {utf16,big} | {utf32,little} | {utf32,big}
OutEncoding = latin1 | unicode | utf8 | utf16 | utf32 | {utf16,little} | {utf16,big} | {utf32,little} | {utf32,big}

This function behaves as characters_to_list/2, but produces a binary instead of a unicode list. The InEncoding defines how input is to be interpreted if binaries are present in the Data, while OutEncoding defines in what format output is to be generated.

The option unicode is an alias for utf8, as this is the preferred encoding for Unicode characters in binaries. utf16 is an alias for {utf16,big} and utf32 is an alias for {utf32,big}. The big and little atoms denote big or little endian encoding.

Errors and exceptions occur as in characters_to_list/2, but the second element in the error or incomplete tuple will be a binary() and not a list().
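
As an illustration of the description above, the sketch below encodes the code-point list from the error message into a UTF-8 binary and decodes it back. The module name utf8_sketch and the function demo/0 are hypothetical and not part of the original post; only unicode:characters_to_binary/3 and unicode:characters_to_list/2 come from the documented API.

```erlang
-module(utf8_sketch).
-export([demo/0]).

%% Hypothetical helper for illustration only.
demo() ->
    %% Code points of the two characters, as shown in the error message.
    Chars = [39134, 26426],
    %% Data is a plain list here, so InEncoding has no binaries to interpret;
    %% OutEncoding selects UTF-8 for the resulting binary.
    Bin = unicode:characters_to_binary(Chars, unicode, utf8),
    %% Each of these characters takes three bytes in UTF-8.
    6 = byte_size(Bin),
    %% Decoding the binary as UTF-8 gives back the original code points.
    Chars = unicode:characters_to_list(Bin, utf8),
    {Chars, Bin}.
```

Calling utf8_sketch:demo() should return {[39134,26426], <<233,163,158,230,156,186>>}, which shows the difference between the code-point list and its byte representation.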

1 answer

  • jigloo123 2011-01-11 10:12

    Save demo.erl as UTF-8 as well,
    then try this:
    io:format("~w~n", [get_id_by_name(unicode:characters_to_binary(Content, unicode, utf8))]).

    This answer was selected as the best answer by the asker.
