doulang9521 2015-04-02 06:24

LeftStr, RightStr, SubStr in Go

I believe Go has no built-in LeftStr(str, n) (take at most the first n characters), RightStr(str, n) (take at most the last n characters), or SubStr(str, pos, n) (take the first n characters after pos) functions, so I tried to write my own:

// take at most n first characters
func Left(str string, num int) string {
    if num <= 0 {
        return ``
    }
    if num > len(str) {
        num = len(str)
    }
    return str[:num]
}

// take at most last n characters
func Right(str string, num int) string {
    if num <= 0 {
        return ``
    }
    max := len(str)
    if num > max {
        num = max
    }
    num = max - num
    return str[num:]
}

But I believe these functions will give incorrect output when the string contains Unicode characters, since they count bytes rather than characters. What is the fastest solution for these functions? Is a for range loop the only way?


1 answer

  • drfu80954 2015-04-02 16:06

    As already mentioned in the comments, combining characters, modifying runes, and other multi-rune "characters" can cause difficulties.

    Anyone interested in Unicode handling in Go should probably read the Go Blog articles "Strings, bytes, runes and characters in Go" and "Text normalization in Go". In particular, the latter discusses the golang.org/x/text/unicode/norm package, which can help in handling some of this.

    You can consider several increasingly accurate (or increasingly Unicode-aware) levels of splitting off the first (or last) "n characters" of a string.

    1. Just use n bytes. This may split in the middle of a rune but is O(1), is very simple, and in many cases you know the input consists of only single byte runes. E.g. str[:n].

    2. Split after n runes. This may split in the middle of a character. This can be done easily, but at the expense of copying and converting with just string([]rune(str)[:n]). You can avoid the conversion and copying by using the unicode/utf8 package's DecodeRuneInString (and DecodeLastRuneInString) functions to get the length of each of the first n runes in turn and then return str[:sum] (O(n), no allocation).

    3. Split after the n'th "boundary". One way to do this is to use norm.NFC.FirstBoundaryInString(str) repeatedly or norm.Iter to find the byte position to split at and then return str[:pos].
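The second level can be sketched with only the standard library. This is a minimal illustration, not a definitive implementation: the names LeftRunes and RightRunes are hypothetical, and the functions count runes (code points), so they can still split a base letter from a combining mark that follows it.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// LeftRunes (hypothetical name) returns at most the first n runes of s.
// It walks the string one rune at a time and slices once, so it does
// not allocate. Invalid UTF-8 bytes are counted as one rune each.
func LeftRunes(s string, n int) string {
	end := 0
	for ; n > 0 && end < len(s); n-- {
		_, size := utf8.DecodeRuneInString(s[end:])
		end += size
	}
	return s[:end]
}

// RightRunes (hypothetical name) returns at most the last n runes of s,
// walking backwards from the end of the string.
func RightRunes(s string, n int) string {
	start := len(s)
	for ; n > 0 && start > 0; n-- {
		_, size := utf8.DecodeLastRuneInString(s[:start])
		start -= size
	}
	return s[start:]
}

func main() {
	fmt.Println(LeftRunes("caf\u00E9s", 4))
	fmt.Println(RightRunes("caf\u00E9s", 2))
}
```

The third level needs golang.org/x/text/unicode/norm, which is outside the standard library, so it is left out of this sketch.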

    Consider the displayed string "cafés", which could be represented in Go code as "cafés", "caf\u00E9s", or "caf\xc3\xa9s", all of which result in the identical six bytes. Alternatively, it could be represented as "cafe\u0301s" or "cafe\xcc\x81s", both of which result in the identical seven bytes.

    The first "method" above may split those into "caf\xc3"+"\xa9s" and "cafe\xcc"+"\x81s".

    The second may split them into "caf\u00E9"+"s" ("café"+"s") and "cafe"+"\u0301s" ("cafe"+"́s").

    The third should split them into "caf\u00E9"+"s" and "cafe\u0301"+"s" (both shown as "café"+"s").
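The byte and rune splits described above can be reproduced with a short program. This is only a sketch: splitBytes and splitRunes are hypothetical helpers, both assume n is within range for the input, and the %+q verb is used so that broken bytes and combining marks show up as escapes rather than rendering ambiguously.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// splitBytes (hypothetical helper) splits s after n bytes.
// Assumes 0 <= n <= len(s).
func splitBytes(s string, n int) (string, string) {
	return s[:n], s[n:]
}

// splitRunes (hypothetical helper) splits s after n runes using the
// simple copy-and-convert approach. Assumes n does not exceed the
// number of runes in s.
func splitRunes(s string, n int) (string, string) {
	r := []rune(s)
	return string(r[:n]), string(r[n:])
}

func main() {
	precomposed := "caf\u00E9s" // é as the single code point U+00E9
	decomposed := "cafe\u0301s" // e followed by combining acute U+0301

	for _, s := range []string{precomposed, decomposed} {
		fmt.Printf("%+q: %d bytes, %d runes\n", s, len(s), utf8.RuneCountInString(s))
		a, b := splitBytes(s, 4)
		fmt.Printf("  after 4 bytes: %+q + %+q\n", a, b)
		a, b = splitRunes(s, 4)
		fmt.Printf("  after 4 runes: %+q + %+q\n", a, b)
	}
}
```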
