dsqpx86002 2015-10-19 19:11

Speeding up JSON parsing in Go

We have transaction log files in which each transaction is a single line in JSON format. We often need to take selected parts of the data, perform a single time conversion, and feed results into another system in a specific format. I wrote a Python script that does this as we need, but I hoped that Go would be faster, and would give me a chance to start learning Go. So, I wrote the following:

package main
import "encoding/json"
import "fmt"
import "time"
import "bufio"
import "os"

func main() {

    sep := ","

    reader := bufio.NewReader(os.Stdin)

    for {
        data, _ := reader.ReadString('\n')
        byt := []byte(data)

        var dat map[string]interface{}

        if err := json.Unmarshal(byt, &dat); err != nil {
            break
        }

        status := dat["status"].(string)
        a_status := dat["a_status"].(string)
        method := dat["method"].(string)
        path := dat["path"].(string)
        element_uid := dat["element_uid"].(string)
        time_local := dat["time_local"].(string)
        etime, _ := time.Parse("[02/Jan/2006:15:04:05 -0700]", time_local)
        fmt.Print(status, sep, a_status, sep, method, sep, path, sep, element_uid, sep, etime.Unix(), "\n")
    }
}

That compiles without complaint, but I'm surprised at the lack of performance improvement. To test, I placed 2,000,000 lines of logs into a tmpfs (to ensure that disk I/O would not be a limitation) and compared the two versions of the script. My results:

$ time cat /mnt/ramdisk/logfile | ./stdin_conv > /dev/null 
real    0m51.995s

$ time cat /mnt/ramdisk/logfile | ./stdin_conv.py > /dev/null 
real    0m52.471s

$ time cat /mnt/ramdisk/logfile > /dev/null 
real    0m0.149s

How can this be made faster? I have made some rudimentary efforts. The ffjson project, for example, proposes to create static functions that make reflection unnecessary; however, I have failed so far to get it to work, getting the error:

Error: Go Run Failed for: /tmp/ffjson-inception810284909.go
STDOUT:

STDERR:
/tmp/ffjson-inception810284909.go:9:2: import "json_parse" is a program, not an importable package

:

Besides, wouldn't what I have above be considered statically typed? Possibly not-- I am positively dripping behind the ears where Go is concerned. I have tried selectively disabling different attributes in the Go code to see if one is especially problematic. None have had an appreciable effect on performance. Any suggestions on improving performance, or is this simply a case where compiled languages have no substantial benefit over others?

3 answers

  • dtrnish3637 2015-10-19 19:30

    Try decoding into a struct type to remove all of that unnecessary assignment and type assertion:

    type RenameMe struct {
        Status     string `json:"status"`
        Astatus    string `json:"a_status"`
        Method     string `json:"method"`
        Path       string `json:"path"`
        ElementUid string `json:"element_uid"`
        // Kept as a string: the log's time format is not RFC 3339,
        // so json.Unmarshal cannot decode it straight into a time.Time.
        TimeLocal string    `json:"time_local"`
        Etime     time.Time `json:"-"` // deal with this after the fact
    }

    data := &RenameMe{}
    if err := json.Unmarshal(byt, data); err != nil {
        break
    }

    // Plain assignment here: := cannot be used with a struct field.
    data.Etime, _ = time.Parse("[02/Jan/2006:15:04:05 -0700]", data.TimeLocal)
    

    I'm not going to test this to ensure it outperforms your code but I bet it does by a large margin. Give it a try and let me know please.
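For reference, here is a minimal sketch of how the struct approach might look as a complete program. The names logEntry, convertLine and timeLayout are made up for illustration, and the 1 MiB line limit is an assumption about the log files. It also wraps stdout in a bufio.Writer, which may help because fmt.Print on os.Stdout issues a write per call. Unlike the loop in the question, it skips a malformed line instead of stopping at it. It has not been benchmarked against the original.

```go
package main

import (
	"bufio"
	"encoding/json"
	"fmt"
	"os"
	"time"
)

// logEntry is a hypothetical name; the fields mirror the keys read in the question.
type logEntry struct {
	Status     string `json:"status"`
	AStatus    string `json:"a_status"`
	Method     string `json:"method"`
	Path       string `json:"path"`
	ElementUID string `json:"element_uid"`
	TimeLocal  string `json:"time_local"`
}

// timeLayout is the layout string used in the question.
const timeLayout = "[02/Jan/2006:15:04:05 -0700]"

// convertLine decodes one JSON log line and returns the comma-separated record.
func convertLine(line []byte) (string, error) {
	var e logEntry
	if err := json.Unmarshal(line, &e); err != nil {
		return "", err
	}
	t, err := time.Parse(timeLayout, e.TimeLocal)
	if err != nil {
		return "", err
	}
	return fmt.Sprintf("%s,%s,%s,%s,%s,%d",
		e.Status, e.AStatus, e.Method, e.Path, e.ElementUID, t.Unix()), nil
}

func main() {
	scanner := bufio.NewScanner(os.Stdin)
	// Assumed limit: accept lines of up to 1 MiB.
	scanner.Buffer(make([]byte, 0, 64*1024), 1024*1024)

	// Buffer the output and flush once at exit.
	out := bufio.NewWriter(os.Stdout)
	defer out.Flush()

	for scanner.Scan() {
		rec, err := convertLine(scanner.Bytes())
		if err != nil {
			fmt.Fprintln(os.Stderr, "skipping line:", err)
			continue
		}
		fmt.Fprintln(out, rec)
	}
	if err := scanner.Err(); err != nil {
		fmt.Fprintln(os.Stderr, "read error:", err)
	}
}
```

It would be run the same way as in the question, for example: cat /mnt/ramdisk/logfile | ./stdin_conv > /dev/null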

    This answer was selected by the asker as the best answer.