douwen1549 2014-07-10 05:29

How to write a function that detects whether a string in Go is binary safe (golang)?

I have received a few close requests saying it's unclear what I am asking. To me it's extremely clear what I am asking. I may have added a few extra thoughts on the issue, but my question is very direct. Here is the goal/thesis of my question:

How does one detect whether a string is binary safe in Go?

A function like:

IsBinarySafe(str) // returns true if it's safe and false if it's not

Please leave clarifying comments if anything is unclear. I always take a lot of care and time to write a good question, so please put in as much effort as I do and help me improve it, so that everyone can benefit as much as I do.

Everything after this point is just things I have thought about or tried in order to solve this:


I assumed there must already be a library that does this, but I had a tough time finding one. If there isn't one, how would you implement it?

I thought of some solutions but wasn't really convinced they were good. One was to iterate over the bytes and keep a hash map of all the illegal byte sequences. I also thought of writing a regex that matches all the illegal strings, but wasn't sure that was a good solution either. I was also unsure whether byte sequences from other languages count as binary safe. Take the typical golang example:

世界

Would:

IsBinarySafe("世界") // true or false?

Would it return true or false? I was assuming that every binary-safe string should use only 1 byte per character, so I would iterate over it like this:

package main
import (
    "fmt"
    "unicode/utf8"
)

func main() {
    const nihongo = "日本語abc日本語"
    for i, w := 0, 0; i < len(nihongo); i += w {
        runeValue, width := utf8.DecodeRuneInString(nihongo[i:])
        fmt.Printf("%#U starts at byte position %d\n", runeValue, i)
        w = width
    }
}

and return false whenever the width is greater than 1. These are just some ideas I had in case there isn't already a library for something like this, but I wasn't sure.
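The ideas above (a lookup table of illegal bytes, a regex, and the width check) can be sketched side by side. This is a minimal sketch under an assumption the question leaves open: for the lookup-table and regex variants it treats NUL and the other ASCII control bytes (0x00 through 0x1F, plus 0x7F) as the "illegal" set, and it checks single bytes rather than longer sequences. All function and variable names (illegal, hasIllegalByte, illegalRe, allWidthOne, isASCII) are hypothetical.

```go
package main

import (
	"fmt"
	"regexp"
	"unicode/utf8"
)

// Idea 1: a lookup table of "illegal" bytes.
// Assumption: NUL and the other ASCII control bytes (0x00-0x1F, 0x7F).
var illegal = func() map[byte]bool {
	m := make(map[byte]bool)
	for b := 0; b <= 0x1F; b++ {
		m[byte(b)] = true
	}
	m[0x7F] = true
	return m
}()

// hasIllegalByte walks the raw bytes of s (not runes) and reports
// whether any of them is in the assumed illegal set.
func hasIllegalByte(s string) bool {
	for i := 0; i < len(s); i++ {
		if illegal[s[i]] {
			return true
		}
	}
	return false
}

// Idea 2: a regex matching the same assumed set of control characters.
var illegalRe = regexp.MustCompile(`[\x00-\x1F\x7F]`)

// Idea 3: reject the string as soon as a decoded rune is wider than
// one byte, as in the question's loop.
func allWidthOne(s string) bool {
	for i := 0; i < len(s); {
		_, width := utf8.DecodeRuneInString(s[i:])
		if width > 1 {
			return false
		}
		i += width
	}
	return true
}

// isASCII checks the raw bytes directly: every byte must be below
// utf8.RuneSelf (0x80), i.e. plain 7-bit ASCII.
func isASCII(s string) bool {
	for i := 0; i < len(s); i++ {
		if s[i] >= utf8.RuneSelf {
			return false
		}
	}
	return true
}

func main() {
	for _, s := range []string{"abc", "世界", "a\x00b", "\xff"} {
		fmt.Printf("%q illegalByte=%v regex=%v widthOne=%v ascii=%v\n",
			s, hasIllegalByte(s), illegalRe.MatchString(s), allWidthOne(s), isASCII(s))
	}
}
```

Two things stand out. Under the assumed illegal set, "世界" is not flagged, because every byte of a multi-byte UTF-8 character is 0x80 or above. And the width test is not the same as an ASCII test: utf8.DecodeRuneInString returns (utf8.RuneError, 1) for a byte it cannot decode, so "\xff" passes allWidthOne even though isASCII rejects it.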


1 answer

  • dongshi3818 2014-07-10 12:54

    Binary safety has nothing to do with how wide a character is; roughly speaking, it is about non-printable characters, such as null bytes.

    From Wikipedia:

    Binary-safe is a computer programming term mainly used in connection with string manipulating functions. A binary-safe function is essentially one that treats its input as a raw stream of data without any specific format. It should thus work with all 256 possible values that a character can take (assuming 8-bit characters).

    I'm not sure what your goal is; almost all languages handle UTF-8 and UTF-16 just fine now. For your specific question, however, there's a rather simple solution:

    package main
    
    import (
        "fmt"
        "unicode"
    )
    
    // IsAsciiPrintable reports whether s is ASCII and printable, i.e. it
    // contains no tab, backspace, or other control characters.
    func IsAsciiPrintable(s string) bool {
        for _, r := range s {
            if r > unicode.MaxASCII || !unicode.IsPrint(r) {
                return false
            }
        }
        return true
    }
    
    func main() {
        s := "世界" // example input; assumed, since the original value of s is not shown
        fmt.Printf("len([]rune(s)) = %d, len([]byte(s)) = %d\n", len([]rune(s)), len([]byte(s)))
    
        fmt.Println(IsAsciiPrintable(s), IsAsciiPrintable("test"))
    }
    


    From the unicode.IsPrint documentation:

    IsPrint reports whether the rune is defined as printable by Go. Such characters include letters, marks, numbers, punctuation, symbols, and the ASCII space character, from categories L, M, N, P, S and the ASCII space character. This categorization is the same as IsGraphic except that the only spacing character is ASCII space, U+0020.

    This answer was accepted by the asker as the best answer.
