du8864 2014-01-27 01:37
Accepted

Reading files with a BOM in Go

I need to read Unicode files that may or may not contain a byte-order mark. I could of course check the first few bytes of the file myself, and discard a BOM if I find one. But before I do, is there any standard way of doing this, either in the core libraries or in a third-party package?

3 answers

  • duan0531 2014-01-27 07:45

    There is no standard way, IIRC (and the standard library would really be the wrong layer to implement such a check in), so here are two examples of how you could deal with it yourself.

    One is to wrap your data stream in a buffered reader:

    package main

    import (
        "bufio"
        "log"
        "os"
    )

    func main() {
        fd, err := os.Open("filename")
        if err != nil {
            log.Fatal(err)
        }
        defer fd.Close() // read-only file, so the Close error is ignored
        br := bufio.NewReader(fd)
        // Decode the first rune; an empty file yields io.EOF here
        r, _, err := br.ReadRune()
        if err != nil {
            log.Fatal(err)
        }
        if r != '\uFEFF' {
            // Not a BOM -- put the rune back
            if err := br.UnreadRune(); err != nil {
                log.Fatal(err)
            }
        }
        // Now work with br as you would do with fd
        // ...
    }
    

    Another approach, which works with objects implementing the io.Seeker interface, is to read the first three bytes and, if they are not a BOM, call Seek() to go back to the beginning, as in:

    package main

    import (
        "io"
        "log"
        "os"
    )

    func main() {
        fd, err := os.Open("filename")
        if err != nil {
            log.Fatal(err)
        }
        defer fd.Close() // read-only file, so the Close error is ignored
        var bom [3]byte
        _, err = io.ReadFull(fd, bom[:])
        // A file shorter than three bytes cannot hold a BOM; treat it as "no BOM"
        if err != nil && err != io.EOF && err != io.ErrUnexpectedEOF {
            log.Fatal(err)
        }
        if bom[0] != 0xef || bom[1] != 0xbb || bom[2] != 0xbf {
            _, err = fd.Seek(0, io.SeekStart) // Not a BOM -- seek back to the beginning
            if err != nil {
                log.Fatal(err)
            }
        }
        // The next read operation on fd will read real data
        // ...
    }
    

    This is possible since instances of *os.File (what os.Open() returns) support seeking and hence implement io.Seeker. Note that that's not the case for, say, the Body reader of an HTTP response, since you can't "rewind" it. bufio.Reader works around this limitation of non-seekable streams by performing some buffering (obviously); that's what allows you to call UnreadRune() on it.

    Note that both examples assume the file we're dealing with is encoded in UTF-8. If you need to deal with another (or an unknown) encoding, things get more complicated.

    The asker selected this answer as the best answer.