iteye_8238 2011-01-11 10:12

Question about pattern matching on Chinese strings in Erlang

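Create a file named data.txt, encoded as UTF-8 without a BOM, containing only the two characters 飞机. Testing with the following program then throws an exception: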
-module(demo).
-compile(export_all).

test_cn() ->
    {ok, Fp} = file:open("data.txt", [read, {encoding, utf8}]),
    {ok, Content} = file:read_line(Fp),
    file:close(Fp),
    io:format("~w~n", [get_id_by_name(Content)]).

get_id_by_name("飞机") -> %% Matching on get_id_by_name([39134,26426]) would work, of course,
                          %% but the real project has dozens or even hundreds of names, so that is not practical
    plane;
get_id_by_name("火车") ->
    train.

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%

exception error: no function clause matching
                    demo:get_id_by_name([39134,26426])
     in function  demo:test_cn/0

Could someone explain what is going on here? How can I pattern match on Chinese strings?
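A likely explanation, assuming an Erlang release from that period that read source files as Latin-1: the literal "飞机" in a UTF-8 encoded demo.erl is compiled as a list of six bytes, while file:read_line/1 on a file opened with {encoding, utf8} returns a list of two Unicode code points. The sketch below only illustrates that difference; the module name encoding_mismatch is hypothetical.

```erlang
-module(encoding_mismatch).
-export([run/0]).

%% Shows that the UTF-8 bytes of "飞机" and its code points are
%% two different lists, so one cannot match the other.
run() ->
    %% The six UTF-8 bytes of the two characters.
    Utf8Bytes = [233,163,158,230,156,186],
    %% Decoding those bytes yields the two code points from the error message.
    CodePoints = unicode:characters_to_list(list_to_binary(Utf8Bytes), utf8),
    [39134, 26426] = CodePoints,
    %% The two representations are not equal as lists.
    false = (Utf8Bytes =:= CodePoints),
    %% Encoding the code points again gives back the same bytes.
    Utf8Bin = unicode:characters_to_binary(CodePoints, unicode, utf8),
    Utf8Bytes = binary_to_list(Utf8Bin),
    ok.
```

Calling encoding_mismatch:run() returns ok if every match above holds.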
Additional information

jigloo wrote:
Save demo.erl as UTF-8 as well,
then try this:
io:format("~w~n", [get_id_by_name(unicode:characters_to_binary(Content, unicode, utf8))]).


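Thanks jigloo, that solved the problem.
Also, demo.erl was already saved as UTF-8. I had tried unicode:characters_to_binary before, but without the last two arguments; after adding them it still did not work. Later I found that wrapping the Chinese patterns in <<>> fixed it. Here is the complete working code, so that beginners and anyone who runs into the same problem in the future can refer to it.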
-module(demo).
-compile(export_all).

test_cn() ->
    {ok, Fp} = file:open("data.txt", [read, {encoding, utf8}]),
    {ok, Content} = file:read_line(Fp),
    file:close(Fp),
    io:format("~w~n", [get_id_by_name(unicode:characters_to_binary(Content, unicode, utf8))]).

get_id_by_name(<<"飞机">>) ->
    plane;
get_id_by_name(<<"火车">>) ->
    train.
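A note for readers on newer releases: from Erlang/OTP 17 onward, source files are read as UTF-8 by default, so a plain <<"飞机">> literal no longer holds the UTF-8 bytes of the text, and the /utf8 type specifier is needed. The following is a minimal sketch under that assumption. The module name demo_utf8, the lookup/1 helper, and the catch-all clause are illustrative additions, not part of the original code.

```erlang
%% -*- coding: utf-8 -*-
-module(demo_utf8).
-export([get_id_by_name/1, lookup/1]).

%% Assumes this file is saved as UTF-8 and compiled on OTP 17 or later.
%% The /utf8 specifier makes each literal the UTF-8 encoding of its text.
get_id_by_name(<<"飞机"/utf8>>) -> plane;
get_id_by_name(<<"火车"/utf8>>) -> train;
get_id_by_name(_Other) -> unknown.   %% hypothetical fallback instead of a crash

%% Converts a list of code points (as returned by file:read_line/1 on a
%% file opened with {encoding, utf8}) to a UTF-8 binary, then matches it.
lookup(Chars) ->
    get_id_by_name(unicode:characters_to_binary(Chars, unicode, utf8)).
```

With this module, demo_utf8:lookup([39134,26426]) returns plane.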

%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%%
To make the problem easier to understand, here is the official API description:
characters_to_binary(Data, InEncoding, OutEncoding) -> binary() | {error, binary(), RestData} | {incomplete, binary(), binary()}

Types:
Data = latin1_chardata() | chardata() | external_chardata()
RestData = latin1_chardata() | chardata() | external_chardata()
InEncoding = latin1 | unicode | utf8 | utf16 | utf32 | {utf16,little} | {utf16,big} | {utf32,little} | {utf32,big}
OutEncoding = latin1 | unicode | utf8 | utf16 | utf32| {utf16,little} | {utf16,big} | {utf32,little} | {utf32,big}

This function behaves as characters_to_list/2, but produces a binary instead of a unicode list. The InEncoding defines how input is to be interpreted if binaries are present in the Data, while OutEncoding defines in what format output is to be generated.

The option unicode is an alias for utf8, as this is the preferred encoding for Unicode characters in binaries. utf16 is an alias for {utf16,big} and utf32 is an alias for {utf32,big}. The big and little atoms denote big or little endian encoding.

Errors and exceptions occur as in characters_to_list/2, but the second element in the error or incomplete tuple will be a binary() and not a list().
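The description above can be tried out with a small sketch. The module name unicode_api_demo is hypothetical, and the input is the code point list from the question. It also shows that for a list of code points the one-argument form gives the same result as passing unicode and utf8 explicitly, which may be why adding the two arguments alone made no difference.

```erlang
-module(unicode_api_demo).
-export([run/0]).

run() ->
    Chars = [39134, 26426],   %% code points of the two characters
    %% For a code point list, the defaults already produce UTF-8.
    Utf8 = unicode:characters_to_binary(Chars, unicode, utf8),
    Utf8 = unicode:characters_to_binary(Chars),
    <<233,163,158,230,156,186>> = Utf8,
    %% utf16 is an alias for {utf16, big}.
    Utf16 = unicode:characters_to_binary(Chars, unicode, utf16),
    Utf16 = unicode:characters_to_binary(Chars, unicode, {utf16, big}),
    <<16#98DE:16, 16#673A:16>> = Utf16,
    %% Latin-1 cannot represent these characters: an error tuple is returned.
    {error, _Converted, _Rest} =
        unicode:characters_to_binary(Chars, unicode, latin1),
    %% A truncated UTF-8 sequence in a binary gives an incomplete tuple.
    {incomplete, _Done, _Tail} =
        unicode:characters_to_binary(<<233,163>>, utf8, utf8),
    ok.
```

Calling unicode_api_demo:run() returns ok if every match above holds.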
1 answer

  • jigloo123 2011-01-11 10:12

    Save demo.erl as UTF-8 as well,
    then try this:
    io:format("~w~n", [get_id_by_name(unicode:characters_to_binary(Content, unicode, utf8))]).

    Accepted by the asker as the best answer.
