dongzhanbi0027 2018-02-02 05:36

Reading a file with bufio and semi-complex sequencing through the file

There may already be questions like this, but it's not an easy thing to google. Basically, I have a file that's a set of protobuf messages, encoded and sequenced the way they normally are under the protobuf spec.

So, think of the byte values as being chunked like this throughout the file:

[EncodeVarint(size of protobuf struct)] [protobuf struct bytes]
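To make the layout concrete, here is a minimal sketch that writes records in this framing with the standard library's encoding/binary.PutUvarint. It assumes the messages are already marshaled to bytes (plain byte slices stand in for real protobuf output here), and frameRecords is a hypothetical helper name, not part of any library.

```go
package main

import (
	"encoding/binary"
	"fmt"
)

// frameRecords concatenates messages as [uvarint length][message bytes].
// The inputs stand in for already-marshaled protobuf messages (an assumption
// of this sketch); any byte slices are framed the same way.
func frameRecords(msgs [][]byte) []byte {
	var out []byte
	var lenBuf [binary.MaxVarintLen64]byte
	for _, m := range msgs {
		// PutUvarint writes the length as a base-128 varint and returns
		// how many bytes it used (1 byte for lengths below 128).
		n := binary.PutUvarint(lenBuf[:], uint64(len(m)))
		out = append(out, lenBuf[:n]...)
		out = append(out, m...)
	}
	return out
}

func main() {
	framed := frameRecords([][]byte{[]byte("abc"), make([]byte, 300)})
	// 1+3 bytes for the first record, 2+300 for the second.
	fmt.Println(len(framed)) // 306
}
```

A length of 300 needs two prefix bytes (0xAC 0x02), which is why the reader cannot know the prefix width until it has inspected each byte's high bit.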

So a few bytes are read one at a time to get the size, which is then used for one large read of the protobuf structure.
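The size prefix is a base-128 varint: each byte contributes its low 7 bits, and a set high bit (a byte value above 127, which is exactly what the loop in Next checks) means another byte follows. The hand-rolled decoder below is only meant to illustrate that rule; in practice the standard library's binary.Uvarint does the same job. decodeUvarint is a hypothetical name.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
)

// decodeUvarint decodes a base-128 varint from the start of buf and returns
// the value and the number of bytes it consumed.
func decodeUvarint(buf []byte) (uint64, int, error) {
	var x uint64
	for i, b := range buf {
		if i == binary.MaxVarintLen64 {
			return 0, 0, errors.New("varint longer than 10 bytes")
		}
		// The low 7 bits are payload, least significant group first.
		x |= uint64(b&0x7f) << (7 * uint(i))
		if b < 0x80 {
			// High bit clear: this is the last byte of the varint.
			return x, i + 1, nil
		}
	}
	return 0, 0, errors.New("truncated varint")
}

func main() {
	v, n, err := decodeUvarint([]byte{0xAC, 0x02, 0xFF})
	fmt.Println(v, n, err) // 300 2 <nil>

	// Cross-check against the standard library decoder.
	w, m := binary.Uvarint([]byte{0xAC, 0x02, 0xFF})
	fmt.Println(w, m) // 300 2
}
```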

My current implementation, which uses the ReadAt method of os.File, looks something like this:

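// Next advances to the next length-prefixed feature. It decodes the varint
// size prefix at Pos, stores [feature size, feature offset] in Feat_Pos,
// and moves Pos past the feature. It returns false at the end of the data
// or if the prefix cannot be read.
func (geobuf *Geobuf_Reader) Next() bool {
    if geobuf.EndPos <= geobuf.Pos {
        return false
    }
    startpos := int64(geobuf.Pos)

    // Varint continuation bytes have the high bit set (value > 127);
    // the first byte <= 127 ends the prefix.
    for int(geobuf.Get_Byte(geobuf.Pos)) > 127 {
        geobuf.Pos += 1
    }
    geobuf.Pos += 1

    // Re-read the whole prefix and decode it.
    sizebytes := make([]byte, geobuf.Pos-int(startpos))
    if _, err := geobuf.File.ReadAt(sizebytes, startpos); err != nil {
        return false
    }
    size, _ := DecodeVarint(sizebytes)

    geobuf.Feat_Pos = [2]int{int(size), geobuf.Pos}
    geobuf.Pos += int(size)
    return true
}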

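// Feature reads the raw bytes of the current feature (located by Next)
// and decodes them as GeoJSON. It returns nil if the read fails.
func (geobuf *Geobuf_Reader) Feature() *geojson.Feature {
    a := make([]byte, geobuf.Feat_Pos[0])
    if _, err := geobuf.File.ReadAt(a, int64(geobuf.Feat_Pos[1])); err != nil {
        return nil
    }
    return Read_Feature(a)
}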

How can I implement something like bufio, or another chunked reading mechanism, to avoid so many file ReadAt calls? Most bufio examples I've seen rely on a specific delimiter. Thanks in advance; hopefully this isn't a horrible question.


1 answer

  • dongqin5604 2018-02-02 06:22

    Package bufio

    import "bufio" 
    

    type SplitFunc

    SplitFunc is the signature of the split function used to tokenize the input. The arguments are an initial substring of the remaining unprocessed data and a flag, atEOF, that reports whether the Reader has no more data to give. The return values are the number of bytes to advance the input and the next token to return to the user, plus an error, if any. If the data does not yet hold a complete token, for instance if it has no newline while scanning lines, SplitFunc can return (0, nil, nil) to signal the Scanner to read more data into the slice and try again with a longer slice starting at the same point in the input.

    If the returned error is non-nil, scanning stops and the error is returned to the client.

    The function is never called with an empty data slice unless atEOF is true. If atEOF is true, however, data may be non-empty and, as always, holds unprocessed text.

    type SplitFunc func(data []byte, atEOF bool) (advance int, token []byte, err error)
    

    Use a bufio.Scanner and write a custom SplitFunc that returns one length-prefixed protobuf message per token.
