iteye_8238
2011-01-11 10:12

Question about pattern matching on Chinese text in Erlang

Accepted

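I created a file named data.txt, encoded as UTF-8 without a BOM, containing only the two characters 飞机. Testing it with the following program throws an exception: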
-module(demo).
-compile(export_all).

test_cn() ->
    {ok, Fp} = file:open("data.txt", [read, {encoding, utf8}]),
    {ok, Content} = file:read_line(Fp),
    file:close(Fp),
    io:format("~w~n", [get_id_by_name(Content)]).

get_id_by_name("飞机") -> %% Rewriting this as get_id_by_name([39134,26426]) would of course work,
                          %% but the real project has dozens or even hundreds of names, so that is not practical
    plane;
get_id_by_name("火车") ->
    train.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

exception error: no function clause matching
                    demo:get_id_by_name([39134,26426])
     in function  demo:test_cn/0

Could someone explain what is going on here? How can I pattern match on Chinese text?
Additional information

jigloo wrote:
demo.erl also needs to be saved as UTF-8,
then try this:
io:format("~w~n", [get_id_by_name(unicode:characters_to_binary(Content, unicode, utf8))]).


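Thanks, jigloo, that solved the problem.
Also, demo.erl was already saved as UTF-8. I had tried unicode:characters_to_binary before, but without the last two arguments; after adding them there was still a problem. It finally worked once I wrapped the Chinese strings in the match patterns in <<>>. The complete working code is posted below for beginners and for anyone who runs into the same problem later.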
-module(demo).
-compile(export_all).

test_cn() ->
    {ok, Fp} = file:open("data.txt", [read, {encoding, utf8}]),
    {ok, Content} = file:read_line(Fp),
    file:close(Fp),
    io:format("~w~n", [get_id_by_name(unicode:characters_to_binary(Content, unicode, utf8))]).

get_id_by_name(<<"飞机">>) ->
    plane;
get_id_by_name(<<"火车">>) ->
    train.
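
A note on why the binary version matches, offered as a likely explanation rather than a confirmed diagnosis: as far as I know, Erlang releases of that period read source files as Latin-1, so a "飞机" literal in a UTF-8 saved demo.erl compiles to the six raw UTF-8 bytes, while file:read_line/1 on a file opened with {encoding, utf8} returns two code points. Converting Content to a UTF-8 binary produces those same six bytes. The small module below only makes the two representations visible; the module name cn_bytes and its function names are made up for illustration.

```erlang
-module(cn_bytes).
-export([codepoints/0, utf8_bytes/0, roundtrip/0]).

%% The two code points printed in the error message above.
codepoints() ->
    [39134, 26426].

%% The same two characters encoded as UTF-8, shown as a list of bytes.
utf8_bytes() ->
    binary_to_list(unicode:characters_to_binary(codepoints(), unicode, utf8)).

%% Decoding those bytes as UTF-8 gives the original code points back.
roundtrip() ->
    unicode:characters_to_list(list_to_binary(utf8_bytes()), utf8).
```

cn_bytes:utf8_bytes() returns [233,163,158,230,156,186], which is not the list [39134,26426] that the original clause was called with.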

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
To make the issue easier to understand, here is the official API description:
characters_to_binary(Data, InEncoding, OutEncoding) -> binary() | {error, binary(), RestData} | {incomplete, binary(), binary()}

Types:
Data = latin1_chardata() | chardata() | external_chardata()
RestData = latin1_chardata() | chardata() | external_chardata()
InEncoding = latin1 | unicode | utf8 | utf16 | utf32 | {utf16,little} | {utf16,big} | {utf32,little} | {utf32,big}
OutEncoding = latin1 | unicode | utf8 | utf16 | utf32 | {utf16,little} | {utf16,big} | {utf32,little} | {utf32,big}

This function behaves as characters_to_list/2, but produces an binary instead of a unicode list. The InEncoding defines how input is to be interpreted if binaries are present in the Data, while OutEncoding defines in what format output is to be generated.

The option unicode is an alias for utf8, as this is the preferred encoding for Unicode characters in binaries. utf16 is an alias for {utf16,big} and utf32 is an alias for {utf32,big}. The big and little atoms denote big or little endian encoding.

Errors and exceptions occur as in characters_to_list/2, but the second element in the error or incomplete tuple will be a binary() and not a list().
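
Since the function can return an error or incomplete tuple instead of a binary, a caller may want to handle all three shapes before pattern matching on the result. A minimal sketch follows; the module name to_utf8, the function convert/1, and the shape of its return values are my own choices, not part of the unicode module.

```erlang
-module(to_utf8).
-export([convert/1]).

%% Wraps unicode:characters_to_binary/3 and tags each possible outcome.
convert(Chars) ->
    case unicode:characters_to_binary(Chars, unicode, utf8) of
        Bin when is_binary(Bin) ->
            {ok, Bin};
        {error, Converted, Rest} ->
            %% Invalid input: Converted is what was encoded before the failure.
            {error, {invalid, Converted, Rest}};
        {incomplete, Converted, Rest} ->
            %% The input ended in the middle of a multi-byte character.
            {error, {incomplete, Converted, Rest}}
    end.
```

For example, to_utf8:convert([39134,26426]) gives {ok, Bin} with the six-byte UTF-8 binary, while to_utf8:convert(<<255>>) gives an error tuple because 255 is not valid UTF-8.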

1 answer

  • jigloo123, 10 years ago

    demo.erl also needs to be saved as UTF-8,
    then try this:
    io:format("~w~n", [get_id_by_name(unicode:characters_to_binary(Content, unicode, utf8))]).
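
For anyone trying this on a newer release: to my knowledge, OTP 17 and later read source files as UTF-8 by default, and there a plain <<"飞机">> keeps only the low byte of each code point, so the patterns need the /utf8 type. The sketch below assumes such a release and a UTF-8 saved source file; the module name demo_utf8, the lookup/1 helper, and the unknown fallback clause are illustrative additions, not part of the original code. It also drops line-ending characters, since file:read_line/1 keeps the newline when the file has one.

```erlang
-module(demo_utf8).
-export([get_id_by_name/1, lookup/1]).

%% /utf8 makes each pattern the UTF-8 encoding of the literal's code points.
get_id_by_name(<<"飞机"/utf8>>) ->
    plane;
get_id_by_name(<<"火车"/utf8>>) ->
    train;
get_id_by_name(_) ->
    unknown.

%% Takes a list of code points, as returned by file:read_line/1 for a file
%% opened with {encoding, utf8}, removes any CR or LF characters, and looks
%% up the UTF-8 binary form.
lookup(Chars) ->
    Trimmed = [C || C <- Chars, C =/= $\n, C =/= $\r],
    get_id_by_name(unicode:characters_to_binary(Trimmed, unicode, utf8)).
```

With this, demo_utf8:lookup(Content) can replace the nested call in test_cn/0.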
