weixin_39665787
2020-12-06 19:01

Increasing writing time

Hello,

I am moving financial data from .csv files to InfluxDB. I ran a test with 500,000 rows of data containing only one field. These are the timings for reading the CSV and inserting the data into the database:


Converting lines 0 to 100000.
Reading csv...
Reading time: 17.01147222518921
Writing into influx
Writing time: 4.185628175735474

Converting lines 100000 to 200000.
Reading csv...
Reading time: 44.90596842765808
Writing into influx
Writing time: 11.27813982963562

Converting lines 200000 to 300000.
Reading csv...
Reading time: 60.74674725532532
Writing into influx
Writing time: 11.742997169494629

Converting lines 300000 to 400000.
Reading csv...
Reading time: 67.56130242347717
Writing into influx
Writing time: 14.29999566078186

Converting lines 400000 to 500000.
Reading csv...
Reading time: 81.25416588783264
Writing into influx
Writing time: 21.573907613754272

Converting lines 500000 to 600000.
Reading csv...
Reading time: 158.86228156089783
Writing into influx
Writing time: 23.835582971572876

As you can see, the write speed at the beginning is pretty good, around 25,000 rows per second, but each successive batch is slower. By the end, Influx is inserting fewer than 5,000 rows per second, which is a dramatic decrease. I want to have tables (measurements, sorry...) with hundreds of millions of rows of financial data, so I am afraid the writing time will become extremely slow. Is there anything that I can do?

The code for writing the data is pretty simple. The batch size is 5000 as I saw you recommended in another issue. I use this function:


from time import time

from influxdb import DataFrameClient


def write_influx(df, measurement, database, verbose=True):
    t0 = time()

    # Open a client against the default host/port and write the DataFrame
    # in batches of 5,000 points using the JSON protocol.
    client = DataFrameClient(database=database)
    client.write_points(df, measurement, batch_size=5000, protocol='json')
    client.close()

    if verbose:
        print("Writing time: {}".format(time() - t0))

I am very pleased with Influx's read speed, but I would like to improve the write performance. Is there anything I can do?

Thanks.

This question comes from the open source project: influxdata/influxdb-python

4 replies

  • weixin_39852121 2020-12-06 19:01

    +1

  • weixin_39907131 2020-12-06 19:01

    Same thing here. I'm inserting 1 million points with a batch size of 5000 and it takes 368.54 seconds (about 2,714 points per second).

  • weixin_39634351 2020-12-06 19:01

    Same here, seems to be a serious problem with the driver.

    EDIT: We switched to "requests" and line protocol and saw a 10x write speed increase (a rough sketch of that approach appears after the replies).

  • weixin_39603476 2020-12-06 19:01

    Did you use the HTTP API with requests and line protocol to write to InfluxDB instead of using the Python library? I am considering doing the same. Could you give me some details if possible?

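For reference, here is a minimal sketch of the requests + line protocol approach mentioned above. It assumes InfluxDB 1.x and its HTTP /write endpoint on localhost:8086; the database name, measurement, field name, and row layout are hypothetical.


import requests

# Hypothetical example: build line protocol by hand and POST a batch of points
# to the InfluxDB 1.x /write endpoint. Database, measurement, and field names
# are made up for illustration; timestamps are nanoseconds since the epoch.
def write_lines(rows, database="finance", measurement="prices",
                url="http://localhost:8086/write"):
    # One line per point: <measurement> <field>=<value> <timestamp_ns>
    lines = "\n".join(
        "{} price={} {}".format(measurement, row["price"], row["ts_ns"])
        for row in rows
    )
    resp = requests.post(url, params={"db": database, "precision": "ns"},
                         data=lines.encode("utf-8"))
    resp.raise_for_status()

Sending a few thousand lines per request this way avoids the per-point serialization overhead in the client library, which is likely where the reported 10x speed-up comes from.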
