dongnao2582 2017-05-09 03:55

Writing to InfluxDB using the Go client results in random errors

I am using the Go InfluxDB client library (client/v2) to write to my InfluxDB. My code, based on the reference example in the client documentation, is fairly straightforward:

import (
    "log"
    "time"

    client "github.com/influxdata/influxdb/client/v2"
)

// conf, Resultset, avgpcpu and tablename are defined elsewhere in my code (not shown).
func SaveMetricsToInflux(id string, rs Resultset, mtype string) {
    defer conf.Trace()()
    // Create a new HTTPClient
    c, err := client.NewHTTPClient(client.HTTPConfig{
        Addr:     conf.InfluxEP,
        Username: conf.InfluxUser,
        Password: conf.InfluxPasswd,
    })
    if err != nil {
        log.Println("Error connecting to the Influx server", err)
        return // c is not usable if creation failed
    }
    defer c.Close()

    // One batch per call; Precision "s" means timestamps are sent in seconds
    bp, err := client.NewBatchPoints(client.BatchPointsConfig{
        Database:  conf.InfluxDb,
        Precision: "s",
    })
    if err != nil {
        log.Println("Error creating a NewBatchPoint", err)
        return
    }
    for _, v := range rs.Values {
        tags := map[string]string{
            "id":    id,
            "mtype": mtype,
        }
        fields := map[string]interface{}{
            avgpcpu: v.MetricValue,
        }
        pt, err := client.NewPoint(
            tablename,
            tags,
            fields,
            time.Unix(v.MetricTime, 0),
        )
        if err != nil {
            log.Println("Error creating NewPoint ", err)
            continue // skip this row rather than adding a nil point
        }
        bp.AddPoint(pt)
        //      f, _ := pt.Fields()
        //      log.Printf("adding point %s >> %v >> %s >> %d >> %v", pt.Name(), pt.Tags(), pt.Time(), pt.UnixNano(), f)
    }
    if err := c.Write(bp); err != nil {
        log.Println("Error writing to Influx", err)
    }
}

In general, the Resultset has about 4200 rows (sometimes only in the low 100s). Each row is converted to a point with two string tags and one float field, and that point is added to the batch. The metric is retrieved from CloudWatch and pushed into InfluxDB. SaveMetricsToInflux is called in a loop, once per EC2 instance.
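Each call sends a single batch of up to roughly 4200 points. One way to rule batch size in or out is to split rs.Values into fixed-size slices and write each slice as its own BatchPoints. The helper below, chunkBounds, is hypothetical and not part of the InfluxDB client. The chunk size of 1000 used in the usage note is an arbitrary assumption.

```go
package main

// chunkBounds splits n items into consecutive [start, end) ranges of at most
// size items each. Hypothetical helper, not part of the InfluxDB client:
// each range could become its own BatchPoints and its own Write call.
func chunkBounds(n, size int) [][2]int {
	if size <= 0 {
		size = n // treat a non-positive size as "everything in one batch"
	}
	var bounds [][2]int
	for start := 0; start < n; start += size {
		end := start + size
		if end > n {
			end = n
		}
		bounds = append(bounds, [2]int{start, end})
	}
	return bounds
}
```

Inside SaveMetricsToInflux, the loop could then become something like `for _, r := range chunkBounds(len(rs.Values), 1000) { ... }`. The body would build a fresh BatchPoints from rs.Values[r[0]:r[1]] and call c.Write on it. This only isolates batch size as a variable; it is not a known fix.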

The problem I am facing is that almost every time, the write operation returns a partial write error. The details of the error vary from one attempt to the next.

For example, the following error came back during my latest run (writing 196 records):

Error writing to Influx {"error":"partial write: unable to parse 'metrics,id=i-094385fbc09268fd6,mtype=ec2-cpu avg-pcpu=1.662-cpu avg-pcpu=1.802 1494277200': invalid number unable to parse '6 1494244500': invalid field format unable to parse 'metrics,id=i-094385fbc09268fd6,mtype=ec': missing fields"}

As can be seen, this looks like data corruption. At first I thought it was caused by sending too many records. However, as the example above shows, it also occurs with only a couple of hundred records.
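For comparison, each point in the batch should serialize to one line of InfluxDB line protocol. That line contains the measurement and its comma-separated tags, a space, the fields, a space, and the timestamp (in seconds here, because Precision is "s"). The sketch below uses a hypothetical helper, lineFor, to produce the expected line by hand. The measurement name metrics and the field key avg-pcpu are assumed values of tablename and avgpcpu, inferred from the error message. This is not the client library's own encoder.

```go
package main

import (
	"sort"
	"strconv"
	"strings"
)

// Line protocol escaping: measurement names escape commas and spaces;
// tag keys, tag values and field keys additionally escape '='.
var (
	measurementEscaper = strings.NewReplacer(",", `\,`, " ", `\ `)
	keyEscaper         = strings.NewReplacer(",", `\,`, "=", `\=`, " ", `\ `)
)

// lineFor renders one point with string tags and a single float field as a
// line of InfluxDB line protocol, with the timestamp in seconds.
// Hypothetical helper for comparison only; not the client library's encoder.
func lineFor(measurement string, tags map[string]string, fieldKey string, value float64, unixSec int64) string {
	keys := make([]string, 0, len(tags))
	for k := range tags {
		keys = append(keys, k)
	}
	sort.Strings(keys) // tags sorted by key, as InfluxDB recommends

	var b strings.Builder
	b.WriteString(measurementEscaper.Replace(measurement))
	for _, k := range keys {
		b.WriteString("," + keyEscaper.Replace(k) + "=" + keyEscaper.Replace(tags[k]))
	}
	b.WriteString(" " + keyEscaper.Replace(fieldKey) + "=" + strconv.FormatFloat(value, 'f', -1, 64))
	b.WriteString(" " + strconv.FormatInt(unixSec, 10))
	return b.String()
}
```

Calling lineFor with the tags from the error gives `metrics,id=i-094385fbc09268fd6,mtype=ec2-cpu avg-pcpu=1.662 1494277200`. The rejected text appears instead to be fragments of several such lines spliced together (for example `avg-pcpu=1.662-cpu avg-pcpu=1.802`), rather than individually malformed values.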

Next, I thought I might be corrupting memory somehow in my code. So I logged each point as it was added to the batch. I also logged the HTTP request data (from the Influx Go client) for the records being sent. The data in both places was pristine and correct, and yet I still received the error.
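Logging inside the Go client shows what the client intended to send, not what actually reached the server. One way to check the latter is to put a small logging reverse proxy between the client and InfluxDB, then point conf.InfluxEP at it. This is a minimal sketch, assuming an InfluxDB 1.x /write endpoint and an uncompressed request body. newLoggingProxy and checkProxy are hypothetical helpers. checkProxy only verifies the proxy itself against a fake endpoint.

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httputil"
	"net/url"
	"strings"
)

// newLoggingProxy returns a handler that logs each request body exactly as it
// arrived, then forwards the same bytes to target (the real InfluxDB URL).
func newLoggingProxy(target *url.URL, logf func(format string, args ...interface{})) http.Handler {
	proxy := httputil.NewSingleHostReverseProxy(target)
	return http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		body, err := io.ReadAll(r.Body)
		r.Body.Close()
		if err != nil {
			http.Error(w, err.Error(), http.StatusBadRequest)
			return
		}
		logf("%s %s (%d bytes)\n%s", r.Method, r.URL.RequestURI(), len(body), body)
		// Restore the body so the reverse proxy forwards it unchanged.
		r.Body = io.NopCloser(bytes.NewReader(body))
		r.ContentLength = int64(len(body))
		proxy.ServeHTTP(w, r)
	})
}

// checkProxy runs the proxy in front of a fake /write endpoint and confirms
// that it logs and forwards the body byte-for-byte.
func checkProxy(line string) error {
	forwarded := make(chan string, 1)
	backend := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		b, _ := io.ReadAll(r.Body)
		forwarded <- string(b)
		w.WriteHeader(http.StatusNoContent) // InfluxDB 1.x answers 204 on success
	}))
	defer backend.Close()

	target, err := url.Parse(backend.URL)
	if err != nil {
		return err
	}
	logged := make(chan string, 1)
	front := httptest.NewServer(newLoggingProxy(target, func(f string, a ...interface{}) {
		logged <- fmt.Sprintf(f, a...)
	}))
	defer front.Close()

	resp, err := http.Post(front.URL+"/write?db=test&precision=s", "text/plain", strings.NewReader(line))
	if err != nil {
		return err
	}
	resp.Body.Close()
	if resp.StatusCode != http.StatusNoContent {
		return fmt.Errorf("unexpected status %s", resp.Status)
	}
	if got := <-forwarded; got != line {
		return fmt.Errorf("backend received %q, want %q", got, line)
	}
	if entry := <-logged; !strings.Contains(entry, line) {
		return fmt.Errorf("log entry %q does not contain the body", entry)
	}
	return nil
}
```

To use it against the real server, parse the InfluxDB URL into target and run something like `http.ListenAndServe("localhost:8087", newLoggingProxy(target, log.Printf))`. Then set conf.InfluxEP to http://localhost:8087 (the port is arbitrary). If the proxy already logs garbled bodies, the damage happens before the request reaches it. If the logged bodies are clean but InfluxDB still reports parse errors, the damage happens somewhere between the proxy and the server.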

So far the partial writes have lost fewer than 100 records. Inside InfluxDB they are also polluting my data by creating invalid measurements and tag values:

> show measurements
name: measurements
name
----
metricd01d7a  <-- this is invalid
metrics       <-- valid
s             <-- this is invalid

> select count(*) from metrics group by mtype
name: metrics
tags: mtype=             <-- this is invalid
time count_avg-pcpu
---- --------------
0    4

name: metrics
tags: mtype=ec2-cec2-cpu <-- this is invalid
time count_avg-pcpu
---- --------------
0    1

name: metrics            <-- valid
tags: mtype=ec2-cpu
time count_avg-pcpu
---- --------------
0    42210

Since I have not seen this issue widely reported, I have to assume it is specific to my environment. In a previous implementation, I used Node.js and the JavaScript API to write to the same InfluxDB (with roughly the same number of rows), and this issue never occurred. That seems to put the blame back on the Go code I have written.

I am stumped on what else to do to debug this issue further. Any help or guidance would be greatly appreciated.
