weixin_39715652
2020-11-28 00:50

Cannot handle CJK characters correctly

In terminals, each CJK character usually takes up 2 visual spaces (columns) instead of one.

This means that 4 Chinese characters usually have the same visual length as 8 English characters.

This figure compares 4 Chinese characters with 8 English characters. They have the same visual length.
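To make the 4-versus-8 comparison concrete, here is a minimal Python sketch that counts terminal columns using the East Asian Width property from the standard `unicodedata` module. The helper name `display_width` is made up for illustration and is not part of asciiflow2. It assumes that only the Wide (`W`) and Fullwidth (`F`) classes take two columns, and it ignores combining characters and ambiguous-width characters.

```python
import unicodedata


def display_width(text: str) -> int:
    """Count terminal columns: Wide/Fullwidth characters take 2, everything else 1.

    Simplification (assumption): combining marks and ambiguous-width
    characters are counted as 1 column.
    """
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


# Character count versus column count for the two strings in the report.
print(len("这是中文"), display_width("这是中文"))  # 4 8
print(len("abcdefgh"), display_width("abcdefgh"))  # 8 8
```

The character counts differ (4 versus 8), but both strings occupy 8 columns, which is the number a drawing tool would need for alignment.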

asciiflow2 treats CJK characters as having the same visual length as English characters, which is wrong and breaks the figure:

In the preview, each CJK character is squeezed into one visual space, so the characters run into each other and look ugly.

After export, each CJK character is shown correctly with a visual length of 2, but the figure breaks.

There's one thing that makes this more complicated. In terminals it is true that CJK characters have double visual length, but this may not be true for web browsers or other applications. They render rich text, so the visual length depends on the font in use.

Different English monospace fonts have different visual lengths. But CJK characters that are not defined in those fonts all fall back to the same default CJK font, which keeps the same visual length regardless of the English font chosen.

So the web preview may still break the figure even if the visual length is considered, unless the font is carefully chosen and defined.

But the figure will be correct after copying it to a terminal if the visual length is considered, and no font definition is required.
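As a rough sketch of what "considering the visual length" could mean on export, the Python snippet below pads each row of a box by its column count rather than its character count. The names `display_width` and `boxed` are hypothetical and do not come from asciiflow2. The width rule (2 columns for East Asian Wide and Fullwidth characters, 1 otherwise) is an assumption that matches typical terminal behaviour.

```python
import unicodedata


def display_width(text):
    """Terminal columns used by text (assumption: W/F classes are 2 wide)."""
    return sum(
        2 if unicodedata.east_asian_width(ch) in ("W", "F") else 1
        for ch in text
    )


def boxed(lines):
    """Draw an ASCII box around lines, padding by columns, not characters."""
    inner = max(display_width(line) for line in lines)
    border = "+" + "-" * (inner + 2) + "+"
    rows = [border]
    for line in lines:
        # Padding with len(line) here would leave CJK rows too long.
        pad = " " * (inner - display_width(line))
        rows.append("| " + line + pad + " |")
    rows.append(border)
    return rows


for row in boxed(["abcdefgh", "这是中文", "abc"]):
    print(row)
```

In a terminal the right-hand border should line up on every row. In a browser the result still depends on the fonts, as described above.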

The following code renders like this in a web view:


<p style="font-family: 'Courier New';">
abcdefgh<br>
这是中文
</p>
<p style="font-family: 'Lucida Console';">
abcdefgh<br>
这是中文
</p>

This question comes from the open-source project: lewish/asciiflow2


4 answers

  • weixin_39715652 5 months ago

    There's a full description of CJK character width in Unicode® Standard Annex #11 (East Asian Width).

  • weixin_39604983 5 months ago

    Absolutely a bug, thank you for the excellently written report. I'm pretty much as white and western as they come, so Chinese characters are a bit of a new area for me. Is the assumption that CJK characters are exactly (or supposed to be) double the width of Latin characters a standard thing? Or is it just convention?

    Thanks again :)

  • weixin_39715652 5 months ago

    I'm not a language expert, so I don't know if my understanding is correct.

    AFAIK all Chinese characters should be shaped like a square, and all characters at the same font size should have the same width and height (there is no difference between proportional and monospace fonts). Latin characters, on the other hand, should be shaped like a tall rectangle, and may have different widths even at the same font size (unless a monospace font is used instead of a proportional one).

    For Japanese, things get a little more complex. Japanese uses three different character systems together. Some Japanese characters (those borrowed from Chinese) should be shaped like a square (these are called "fullwidth characters"), and some other characters (those not borrowed from Chinese) should be shaped like a tall rectangle (these are called "halfwidth characters").

    I'm not familiar with Korean. From some Korean text I've seen, I think Korean uses fullwidth characters for text and Latin characters for punctuation. Japanese also uses Latin punctuation, while Chinese has its own fullwidth punctuation characters, "，" instead of "," for example.

    It is not required that fullwidth characters are exactly double the width of Latin characters. But when alignment is important, such as when writing program code, drawing ASCII art, or in any other situation where a monospace font should be used, fullwidth CJK characters are supposed to be double the width of Latin characters and the same height as Latin characters, and halfwidth CJK characters are supposed to be the same width and the same height as Latin characters. This is the only way to make the alignment correct.

  • weixin_39604983 5 months ago

    "fullwidth CJK characters are supposed to be double the width of Latin characters and the same height as Latin characters, and halfwidth CJK characters are supposed to be the same width and the same height as Latin characters"

    This is what I was looking for. I'll make sure to incorporate this into v3.

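The width rule discussed in the answers above (fullwidth characters take two cells, halfwidth and Latin characters take one) can be sketched in Python with the East Asian Width classes that Unicode Standard Annex #11 defines, exposed by `unicodedata.east_asian_width`. The helper `cell_width` and the sample labels are illustrative only. Treating the Ambiguous (`A`) and Neutral (`N`) classes as one cell is an assumption, since their width depends on the terminal. In the Unicode data, hiragana and ordinary katakana are classified as Wide as well; only the legacy halfwidth katakana forms are Halfwidth.

```python
import unicodedata

# East Asian Width classes assumed to occupy two terminal cells.
WIDE_CLASSES = {"W", "F"}


def cell_width(ch: str) -> int:
    """Return 2 for Wide/Fullwidth characters, 1 for everything else."""
    return 2 if unicodedata.east_asian_width(ch) in WIDE_CLASSES else 1


samples = {
    "中": "Chinese ideograph",
    "あ": "Japanese hiragana",
    "ｱ": "halfwidth katakana",
    "한": "Korean Hangul syllable",
    "，": "fullwidth comma",
    ",": "ASCII comma",
}

for ch, label in samples.items():
    eaw = unicodedata.east_asian_width(ch)
    print(f"{label:24} class={eaw:2} cells={cell_width(ch)}")
```

A grid-based editor could use a lookup like this to reserve two cells for a wide character, both when drawing and when exporting.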
