iteye_8238 2011-01-11 10:12

Question about pattern matching on Chinese strings in Erlang

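I created a file named data.txt, encoded as UTF-8 without a BOM, that contains only the two characters 飞机. Testing it with the following program throws an exception: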
-module(demo).
-compile(export_all).

test_cn() ->
    {ok, Fp} = file:open("data.txt", [read, {encoding, utf8}]),
    {ok, Content} = file:read_line(Fp),
    file:close(Fp),
    io:format("~w~n", [get_id_by_name(Content)]).

get_id_by_name("飞机") -> %% Of course this could be rewritten as get_id_by_name([39134,26426]),
                          %% but the real project has dozens or even hundreds of names, so that is not practical
    plane;
get_id_by_name("火车") ->
    train.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

exception error: no function clause matching
                    demo:get_id_by_name([39134,26426])
     in function  demo:test_cn/0

What is going on here, and how can I pattern match on Chinese strings?
Follow-up

jigloo wrote:
Save demo.erl as UTF-8 as well,
then try this:
io:format("~w~n", [get_id_by_name(unicode:characters_to_binary(Content, unicode, utf8))]).


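Thanks, jigloo, that solved the problem.
For the record, demo.erl was already saved as UTF-8. I had tried unicode:characters_to_binary before, but without the last two arguments. Adding them was not enough on its own; it only worked once I also wrapped the Chinese patterns in the function heads in <<>>. The complete working code is posted below for beginners and for anyone who runs into the same problem later.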
-module(demo).
-compile(export_all).

test_cn() ->
    {ok, Fp} = file:open("data.txt", [read, {encoding, utf8}]),
    {ok, Content} = file:read_line(Fp),
    file:close(Fp),
    io:format("~w~n", [get_id_by_name(unicode:characters_to_binary(Content, unicode, utf8))]).

get_id_by_name(<<"飞机">>) ->
    plane;
get_id_by_name(<<"火车">>) ->
    train.
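
Why the binary version matches can be checked with the numbers from the error message. On the Erlang releases current when this thread was written, the compiler read source files as Latin-1, so the literal "飞机" in a UTF-8 demo.erl compiled to six separate bytes, while file:read_line/1 with {encoding, utf8} returned the two code points [39134,26426]. Converting the input to a UTF-8 binary puts both sides into the same six bytes. The following is only a sketch: the module name enc_check and its function names are made up for illustration, and it uses numeric code points so that it does not depend on how the source file is encoded.

```erlang
-module(enc_check).
-export([utf8_bytes/0, same_bytes/0, truncated/0]).

%% Code points of the two characters, as printed in the error message.
-define(NAME, [39134, 26426]).

%% Encode the code point list as a UTF-8 binary (three bytes per character here).
%% Returns <<233,163,158,230,156,186>>.
utf8_bytes() ->
    unicode:characters_to_binary(?NAME, unicode, utf8).

%% The /utf8 segment type encodes each code point the same way.
same_bytes() ->
    utf8_bytes() =:= <<39134/utf8, 26426/utf8>>.

%% Without /utf8, each integer segment keeps only its low 8 bits.
%% Returns <<222,58>>, which is not the UTF-8 encoding.
truncated() ->
    [A, B] = ?NAME,
    <<A, B>>.
```

On compilers that read source files as UTF-8 (the default since OTP 17), a plain <<"飞机">> pattern is truncated as in truncated/0, so there the pattern would need to be written <<"飞机"/utf8>>.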

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
To make the problem easier to understand, here is the official API description:
characters_to_binary(Data, InEncoding, OutEncoding) -> binary() | {error, binary(), RestData} | {incomplete, binary(), binary()}

Types:
Data = latin1_chardata() | chardata() | external_chardata()
RestData = latin1_chardata() | chardata() | external_chardata()
InEncoding = latin1 | unicode | utf8 | utf16 | utf32 | {utf16,little} | {utf16,big} | {utf32,little} | {utf32,big}
OutEncoding = latin1 | unicode | utf8 | utf16 | utf32 | {utf16,little} | {utf16,big} | {utf32,little} | {utf32,big}

This function behaves as characters_to_list/2, but produces a binary instead of a unicode list. The InEncoding defines how input is to be interpreted if binaries are present in the Data, while OutEncoding defines in what format output is to be generated.

The option unicode is an alias for utf8, as this is the preferred encoding for Unicode characters in binaries. utf16 is an alias for {utf16,big} and utf32 is an alias for {utf32,big}. The big and little atoms denote big or little endian encoding.

Errors and exceptions occur as in characters_to_list/2, but the second element in the error or incomplete tuple will be a binary() and not a list().
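
A small sketch of the behaviour described above, using the same two code points. The module name c2b_sketch and its function names are hypothetical. It relies on characters_to_binary/1 using unicode for both encodings, which is why leaving out the last two arguments gives the same binary as passing unicode and utf8 explicitly.

```erlang
-module(c2b_sketch).
-export([same_as_default/0, utf16_bytes/0, latin1_outcome/0]).

%% characters_to_binary/1 defaults to unicode (utf8) in and out,
%% so it produces the same binary as the three-argument call.
same_as_default() ->
    Chars = [39134, 26426],
    unicode:characters_to_binary(Chars) =:=
        unicode:characters_to_binary(Chars, unicode, utf8).

%% utf16 is an alias for {utf16,big}: two bytes per character here.
%% Returns <<152,222,103,58>>.
utf16_bytes() ->
    unicode:characters_to_binary([39134, 26426], unicode, utf16).

%% These characters cannot be represented in Latin-1, so the call
%% returns an error tuple instead of a binary.
latin1_outcome() ->
    case unicode:characters_to_binary([39134, 26426], unicode, latin1) of
        Bin when is_binary(Bin) -> {ok, Bin};
        {error, _Encoded, _Rest} -> unencodable;
        {incomplete, _Done, _Tail} -> incomplete
    end.
```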

1 answer

  • jigloo123 2011-01-11 10:12

    Save demo.erl as UTF-8 as well,
    then try this:
    io:format("~w~n", [get_id_by_name(unicode:characters_to_binary(Content, unicode, utf8))]).
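
One thing to watch with this approach: file:read_line/1 returns the line including its trailing line feed when the file has one, so a data.txt that ends with a newline would not match even after the conversion. Below is a hedged sketch of a lookup that tolerates this. The module name lookup_sketch, the helper to_key/1, and the catch-all unknown clause are illustrative additions, not part of the original code. It assumes a release that honours the coding comment (R16B or later) and uses /utf8 patterns, so it is not the form that worked on the release used in this thread.

```erlang
%% -*- coding: utf-8 -*-
-module(lookup_sketch).
-export([to_key/1, get_id_by_name/1]).

%% Drop one trailing line feed, if present, then encode as UTF-8.
to_key(Chars) ->
    unicode:characters_to_binary(strip_lf(Chars), unicode, utf8).

%% Remove a single "\n" from the end of a code point list.
strip_lf(Chars) ->
    case lists:reverse(Chars) of
        [$\n | Rest] -> lists:reverse(Rest);
        _ -> Chars
    end.

%% The /utf8 flag makes each literal match its UTF-8 encoding.
get_id_by_name(<<"飞机"/utf8>>) -> plane;
get_id_by_name(<<"火车"/utf8>>) -> train;
get_id_by_name(_) -> unknown.
```

With this, lookup_sketch:get_id_by_name(lookup_sketch:to_key(Content)) gives plane whether or not the file ends with a newline.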

    The asker selected this answer as the best answer.
