dtcu5885 2015-12-21 02:12
浏览 270

将换行符分隔的JSON Blob的整个文件读取到内存中,并以golang中最少的转换量对每个Blob进行编组?

I'm new to go, so don't know a whole lot about the language specific constructs.

My use case is first to read into memory an input file containing JSON blobs that are newline delimited. From this "array" of JSON source, I'd like to unmarshal each array element to deal with it in golang. The expected structure mapping is already defined.

I typically like to read all lines at once, so ioutil.ReadFile() as mentioned in How can I read a whole file into a string variable in Golang? seems like a good choice. And json.Unmarshal appears to take byte array as the source. But if I'm using ReadFile(), I have a single array of bytes for the whole file. How might I extract slices of this byte array such that the newline bytes (as delimiters) are skipped and each slice is one of those JSON blobs? I'd assume the best technique is one that doesn't do or minimizes data type conversions. As the easy hack would be something like convert the byte array to string, split the newline delimited string to array then cast each string array element back to bytes to pass to json.Unmarshal. I'd prefer the optimized approach but not sure how to tackle the implementation algorithm details in go, could use some tips here.

Ideally, I'd like the preprocessing done beforehand, so that I'm not dealing with the content of the JSON byte array from file as I'm iterating over the slices, etc. Rather I'd like to preprocess the single byte array read from file into an array of byte array slices, with all the newline bytes removed, each slice being the segments that were delimited by newline.

  • 写回答

1条回答 默认 最新

  • douju8782 2015-12-21 02:15
    关注

    Use bufio.Scanner to read a line at a time:

     f, err := os.Open(fname)
     if err != nil {
         // handle error
     }
     s := bufio.NewScanner(f)
     for s.Scan() {
        var v ValueTypeToUnmarshalTo
        if err := json.Unmarshal(s.Bytes(), &v); err != nil {
           //handle error
        }
        // do something with v
    }
    if s.Err() != nil {
        // handle scan error
    }
    

    or use ioutil.ReadFile to slurp up the entire file and bytes.Split to break the file into lines:

     p, err := ioutil.ReadFile(fname)
     if err != nil {
        // handle error
     }
     for _, line := range bytes.Split(p, []byte{'
    '}) {
        var v ValueTypeToUnmarshalTo
        if err := json.Unmarshal(line, &v); err != nil {
           //handle error
        }
        // do something with v
     }
    

    or use the json.Decoder built-in streaming feature to read mulitple values from the file:

     f, err := os.Open(fname)
     if err != nil {
        // handle error
     }
     d := json.NewDecoder(f)
     for {
        var v ValueTypeToUnmarshalTo
        if err := d.Decode(&v); err == io.EOF {
           break // done decoding file
        } else if err != nil {
           // handle error
        }
        // do something with v
    }
    

    <kbd>Run the code on the playground</kbd>

    The ioutil.ReadFile approach uses more memory than the other approaches (one byte for each byte in file plus one slice header for each line).

    Because the decoder ignores whitespace following a JSON value, the three approaches handle line terminators.

    There are no data conversions in any of these approaches other than those inherent to unmarshalling JSON bytes to Go values.

    评论

报告相同问题?

悬赏问题

  • ¥15 应该如何判断含间隙的曲柄摇杆机构,轴与轴承是否发生了碰撞?
  • ¥15 vue3+express部署到nginx
  • ¥20 搭建pt1000三线制高精度测温电路
  • ¥15 使用Jdk8自带的算法,和Jdk11自带的加密结果会一样吗,不一样的话有什么解决方案,Jdk不能升级的情况
  • ¥15 画两个图 python或R
  • ¥15 在线请求openmv与pixhawk 实现实时目标跟踪的具体通讯方法
  • ¥15 八路抢答器设计出现故障
  • ¥15 opencv 无法读取视频
  • ¥15 按键修改电子时钟,C51单片机
  • ¥60 Java中实现如何实现张量类,并用于图像处理(不运用其他科学计算库和图像处理库))