为什么现代 Perl 默认避免使用 UTF-8？

I wonder why most modern solutions built using Perl don't enable UTF-8 by default.

I understand there are many legacy problems for core Perl scripts, where it may break things. But, from my point of view, in the 21^st century, big new projects (or projects with a big perspective) should make their software UTF-8 proof from scratch. Still I don't see it happening. For example, Moose enables strict and warnings, but not Unicode. Modern::Perl reduces boilerplate too, but no UTF-8 handling.

Why? Are there some reasons to avoid UTF-8 in modern Perl projects in the year 2011?

Commenting @tchrist got too long, so I'm adding it here.

It seems that I did not make myself clear. Let me try to add some things.

tchrist and I see situation pretty similarly, but our conclusions are completely in opposite ends. I agree, the situation with Unicode is complicated, but this is why we (Perl users and coders) need some layer (or pragma) which makes UTF-8 handling as easy as it must be nowadays.

tchrist pointed to many aspects to cover, I will read and think about them for days or even weeks. Still, this is not my point. tchrist tries to prove that there is not one single way "to enable UTF-8". I have not so much knowledge to argue with that. So, I stick to live examples.

I played around with Rakudo and UTF-8 was just there as I needed. I didn't have any problems, it just worked. Maybe there are some limitation somewhere deeper, but at start, all I tested worked as I expected.

Shouldn't that be a goal in modern Perl 5 too? I stress it more: I'm not suggesting UTF-8 as the default character set for core Perl, I suggest the possibility to trigger it with a snap for those who develop new projects.

Another example, but with a more negative tone. Frameworks should make development easier. Some years ago, I tried web frameworks, but just threw them away because "enabling UTF-8" was so obscure. I did not find how and where to hook Unicode support. It was so time-consuming that I found it easier to go the old way. Now I saw here there was a bounty to deal with the same problem with Mason 2: How to make Mason2 UTF-8 clean?. So, it is pretty new framework, but using it with UTF-8 needs deep knowledge of its internals. It is like a big red sign: STOP, don't use me!

I really like Perl. But dealing with Unicode is painful. I still find myself running against walls. Some way tchrist is right and answers my questions: new projects don't attract UTF-8 because it is too complicated in Perl 5.

转载于:https://stackoverflow.com/questions/6162484/why-does-modern-perl-avoid-utf-8-by-default

写回答
好问题 0 提建议
关注问题
分享
邀请回答
编辑收藏删除结题
收藏举报

5条回答默认最新

关注

码龄粉丝数原力等级 --

被采纳

被点赞

采纳率
10.24 2011-05-28 15:19
关注
There's a truly horrifying amount of ancient code out there in the wild, much of it in the form of common CPAN modules. I've found I have to be fairly careful enabling Unicode if I use external modules that might be affected by it, and am still trying to identify and fix some Unicode-related failures in several Perl scripts I use regularly (in particular, iTiVo fails badly on anything that's not 7-bit ASCII due to transcoding issues).

解决无用
评论打赏
分享
举报

评论

按下Enter换行，Ctrl+Enter发表内容

报告相同问题？

关注问题

UTF-8和不带BOM的UTF-8有什么区别？
2019-12-20 09:24

p15097962069的博客没有BOM的 UTF-8和UTF-8有什么区别？哪个更好？
ANSI、UTF-8带BOM及UTF-8无BOM的三种不同编码格式
2025-03-19 20:06

BabyFish13的博客：默认为 UTF-8ANSI 编码表示英文字符时用一个字节，表示中文用两个或四个字节 —— 这带来了存储空间的减少，但却带来的格式的不统一和混乱。ANSI是一种字符代码，为使计算机支持更多语言，通常使用 0x00~0x79 范围...
perl_utf8_cheat_sheet
2021-05-20 10:54

- Perl 5.8及以后的版本默认启用UTF-8内部字符串表示，这意味着字符串可以包含Unicode字符。 - `use utf8` 指令告诉编译器源代码中的字面量字符串是用UTF-8编码的。 - `binmode(STDIN, ":encoding(UTF-8)");` 和 ...
perl中utf-8编码的处理
2013-08-11 01:02

ok_我的心的博客这里顺便告诉大家一个秘诀，只要文本被perl 按正确编码解释后，利用/w就可以匹配一个字母、数字、_、汉字，这个特性是不是很方便，所以我们只要用如下两次正则表达式就可以去掉所有非汉字字符，包括全角的一些标点...
Unicode-ICU：从perl到ICU的接口
2021-02-05 18:19

2. **字符转换**：ICU可以处理各种字符编码之间的转换，如UTF-8、UTF-16等。 3. **日期和时间格式化**：ICU可以根据不同的地区设置，将日期和时间格式化为用户友好的字符串。 4. **数字格式化**：ICU能够按照各种...
Python - 高级动态编程语言 - 入门基础知识（上）
2021-04-17 09:58

名字里有三个木的博客 Python 是一种易于学习、功能强大的高级编程语言。它提供了高效的高级数据结构，还能简单有效地面向对象编程。Python 优雅的语法和动态类型，以及解释型语言的本质，使它成为多数平台上写脚本和快速开发应用的理想...
讨论perl中的utf-8编码处理
2010-01-17 03:53

波特王子的博客这里顺便告诉大家一个秘诀，只要文本被perl 按正确编码解释后，利用/w就可以匹配一个字母、数字、_、汉字，这个特性是不是很方便，所以我们只要用如下两次正则表达式就可以去掉所有非汉字字符，包括全角的一些标点...
Perl 语言入门学习
2021-07-13 14:01

Drifting Kern的博客 Perl 语言学习 – 入门篇入门篇包括但不限于以下内容 Perl 简介版本历史第一个Perl程序 Perl的基本数据类型 Perl的运算操作符和判断操作符 Perl的数组操作 Perl的控制结构语句 Perl的子程序注：尽管本文中...
php设置mysql 编码_php设置mysql编码为utf-8的方法
2021-01-28 13:49

weixin_39912368的博客【摘要】PHP即“超文本预处理器”，是一种通用开源...下面是php设置mysql编码为utf-8的方法，让我们一起来看看php设置mysql编码为utf-8的方法的具体内容吧！php设置mysql编码为utf-8的方法php设置mysql编码为utf-8...
PCB设计中的‘安全距离‘博弈：3W原则在成本与性能间的平衡艺术
2025-12-09 02:03

wdx01234567的博客涵盖将sed、awk、C代码及shell脚本转换为Perl的方法，介绍find2perl等实用工具，并详细解析Perl中Unicode的支持机制，包括字符编码、规范化形式（NFD/NFKD）、组合字符处理以及源码中的UTF-8使用。通过实际示例和...
没有解决我的问题, 去提问

为什么现代 Perl 默认避免使用 UTF-8？

5条回答 默认 最新

5条回答默认最新