Erlang: question about pattern matching on Chinese strings

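Create a file named data.txt, encoded as UTF-8 without a BOM, that contains only the two characters 飞机. Testing it with the following program throws an exception: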
-module(demo).
-compile(export_all).

test_cn() ->
    {ok, Fp} = file:open("data.txt", [read, {encoding, utf8}]),
    {ok, Content} = file:read_line(Fp),
    file:close(Fp),
    io:format("~w~n", [get_id_by_name(Content)]).

get_id_by_name("飞机") -> %% This could of course be written as get_id_by_name([39134,26426]),
                          %% but the real project has dozens or even hundreds of names, so that is not practical
    plane;
get_id_by_name("火车") ->
    train.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

exception error: no function clause matching
                    demo:get_id_by_name([39134,26426])
     in function  demo:test_cn/0

Could someone explain what is going on here? How can I pattern match on Chinese strings?
Additional details

jigloo wrote:
Save demo.erl as UTF-8 as well,
then try this:
io:format("~w~n", [get_id_by_name(unicode:characters_to_binary(Content, unicode, utf8))]).


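Thanks jigloo, that solved the problem.
For the record, demo.erl was already saved as UTF-8. I had tried unicode:characters_to_binary earlier, but without the last two arguments. After adding them it still failed; it only worked once I also wrapped the Chinese patterns in <<>>. Here is the complete working code, for beginners or anyone who runs into the same problem later.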
-module(demo).
-compile(export_all).

test_cn() ->
    {ok, Fp} = file:open("data.txt", [read, {encoding, utf8}]),
    {ok, Content} = file:read_line(Fp),
    file:close(Fp),
    io:format("~w~n", [get_id_by_name(unicode:characters_to_binary(Content, unicode, utf8))]).

get_id_by_name(<<"飞机">>) ->
    plane;
get_id_by_name(<<"火车">>) ->
    train.
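
A likely explanation, assuming the compiler read demo.erl byte by byte as Latin-1 (the default source encoding before Erlang/OTP 17): the literal "飞机" in the source became the six-byte list [233,163,158,230,156,186], while file:read_line/1 on a file opened with {encoding, utf8} returned the two code points [39134,26426]. Converting the content to a UTF-8 binary produces those same six bytes, which is what <<"飞机">> compiles to under that assumption, so the clauses match. On releases where source files are read as UTF-8, a bare <<"飞机">> would instead truncate each code point to one byte. The sketch below uses the /utf8 type specifier to avoid that; the module name demo_utf8, the lookup_line/1 helper, and the unknown fallback are illustrative and not part of the original post.

```erlang
%% -*- coding: utf-8 -*-
-module(demo_utf8).
-export([get_id_by_name/1, lookup_line/1]).

%% Match on UTF-8 encoded binaries. The /utf8 specifier encodes each
%% code point of the literal as UTF-8 (assumes the source is read as UTF-8).
get_id_by_name(<<"飞机"/utf8>>) -> plane;
get_id_by_name(<<"火车"/utf8>>) -> train;
get_id_by_name(_Other) -> unknown.   %% illustrative fallback clause

%% Take a line read with {encoding, utf8} (a list of code points),
%% remove any newline characters, and look it up as a UTF-8 binary.
lookup_line(Line) ->
    Chars = [C || C <- Line, C =/= $\n],
    get_id_by_name(unicode:characters_to_binary(Chars, unicode, utf8)).
```

With this variant, demo_utf8:lookup_line(Content) can replace the get_id_by_name(...) call in test_cn/0, and a trailing newline in data.txt no longer causes a mismatch.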

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
To make the problem easier to understand, here is the official API documentation:
characters_to_binary(Data, InEncoding, OutEncoding) -> binary() | {error, binary(), RestData} | {incomplete, binary(), binary()}

Types:
Data = latin1_chardata() | chardata() | external_chardata()
RestData = latin1_chardata() | chardata() | external_chardata()
InEncoding = latin1 | unicode | utf8 | utf16 | utf32 | {utf16,little} | {utf16,big} | {utf32,little} | {utf32,big}
OutEncoding = latin1 | unicode | utf8 | utf16 | utf32| {utf16,little} | {utf16,big} | {utf32,little} | {utf32,big}

This function behaves as characters_to_list/2, but produces a binary instead of a unicode list. The InEncoding defines how input is to be interpreted if binaries are present in the Data, while OutEncoding defines in what format output is to be generated.

The option unicode is an alias for utf8, as this is the preferred encoding for Unicode characters in binaries. utf16 is an alias for {utf16,big} and utf32 is an alias for {utf32,big}. The big and little atoms denote big or little endian encoding.

Errors and exceptions occur as in characters_to_list/2, but the second element in the error or incomplete tuple will be a binary() and not a list().
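
To make the description above concrete, here is a minimal sketch of how OutEncoding changes the result for the two code points of 飞机. The module name enc_demo and the function examples/0 are made up for illustration. Because the input is a plain list with no binaries in it, InEncoding plays no role here.

```erlang
-module(enc_demo).
-export([examples/0]).

examples() ->
    Chars = [39134, 26426],   %% code points of the two characters
    %% UTF-8 output: three bytes per character for these code points.
    Utf8 = unicode:characters_to_binary(Chars, unicode, utf8),
    %% utf16 is an alias for {utf16,big}: two bytes per character here.
    Utf16 = unicode:characters_to_binary(Chars, unicode, utf16),
    %% Latin-1 cannot represent these characters, so an error tuple is returned.
    Latin1 = unicode:characters_to_binary(Chars, unicode, latin1),
    %% characters_to_binary/1 defaults both encodings to unicode (UTF-8).
    SameAsDefault = (unicode:characters_to_binary(Chars) =:= Utf8),
    {Utf8, Utf16, Latin1, SameAsDefault}.
```

Since the one-argument form already defaults to UTF-8 in and out, adding the two encoding arguments alone would not be expected to change the result; the change that made the clauses match was switching the patterns to binaries.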

1 answer

Save demo.erl as UTF-8 as well,
then try this:
io:format("~w~n", [get_id_by_name(unicode:characters_to_binary(Content, unicode, utf8))]).
