dsoy71058 2014-07-27 01:55
53 views
Accepted

Importing and parsing a large CSV file with Go and the App Engine datastore

Locally, I am successfully able to (in a task; see the sketch after this list):

  • Open the CSV
  • Scan through each line (using Scanner.Scan)
  • Map each parsed CSV line to my desired struct
  • Save the struct to the datastore
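
For concreteness, a minimal sketch of that local flow, written against the google.golang.org/appengine packages and assuming a hypothetical two-column Record struct (substitute your own fields and parsing):

```go
package app

import (
	"bufio"
	"context"
	"os"
	"strings"

	"google.golang.org/appengine/datastore"
)

// Record is a hypothetical struct; replace it with your real schema.
type Record struct {
	Name  string
	Value string
}

// importLocalCSV opens a local CSV, scans it line by line, maps each
// line to a Record, and saves the Record to the datastore.
func importLocalCSV(ctx context.Context, path string) error {
	f, err := os.Open(path)
	if err != nil {
		return err
	}
	defer f.Close()

	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		// Naive split; encoding/csv would handle quoted fields properly.
		fields := strings.Split(scanner.Text(), ",")
		if len(fields) < 2 {
			continue // skip malformed lines
		}
		r := &Record{Name: fields[0], Value: fields[1]}
		key := datastore.NewIncompleteKey(ctx, "Record", nil)
		if _, err := datastore.Put(ctx, key, r); err != nil {
			return err
		}
	}
	return scanner.Err()
}
```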

I see that blobstore has a reader that would allow me to read the value directly using a streaming, file-like interface, but that seems to have a limit of 32 MB. I also see there's a bulk upload tool, bulk_uploader.py, but it won't do all the data massaging I require, and I'd like to limit the writes (and, really, the cost) of this bulk insert.
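
For reference, a short sketch of that streaming interface, assuming you already hold a blobKey for the uploaded file:

```go
package app

import (
	"bufio"
	"context"

	"google.golang.org/appengine"
	"google.golang.org/appengine/blobstore"
)

// scanBlob streams a blob line by line through the file-like reader
// instead of loading the whole value into memory.
func scanBlob(ctx context.Context, blobKey appengine.BlobKey) error {
	reader := blobstore.NewReader(ctx, blobKey)
	scanner := bufio.NewScanner(reader)
	for scanner.Scan() {
		_ = scanner.Text() // parse each line as in the local flow
	}
	return scanner.Err()
}
```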

How would one effectively read and parse a very large (500 MB+) CSV file without the benefit of reading from local storage?


2 answers

  • drblhw5731 2014-07-27 10:55

    You will need to look at the following options and see if they work for you:

    1. Given the large file size, you should consider using Google Cloud Storage for the file. You can use the command-line utilities that GCS provides to upload the file to your bucket. Once it is uploaded, you can use the JSON API directly to work with the file and import it into your datastore layer (see the first sketch after this list). Take a look at the following: https://developers.google.com/storage/docs/json_api/v1/json-api-go-samples

    2. If this is a one-time import of a large file, another option could be spinning up a Google Compute Engine VM, writing an app there that reads from GCS and passes the data on in smaller chunks to a service running in App Engine Go, which can then accept and persist the data (see the second sketch below).
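
    A minimal sketch of the first option, using the Go GCS client (cloud.google.com/go/storage) rather than the raw JSON API the link above covers; the bucket and object names are placeholders:

```go
package main

import (
	"bufio"
	"context"

	"cloud.google.com/go/storage"
)

// streamFromGCS reads a large object line by line; the reader streams,
// so the 500 MB file never has to fit in memory at once.
func streamFromGCS(ctx context.Context, bucket, object string) error {
	client, err := storage.NewClient(ctx)
	if err != nil {
		return err
	}
	defer client.Close()

	rc, err := client.Bucket(bucket).Object(object).NewReader(ctx)
	if err != nil {
		return err
	}
	defer rc.Close()

	scanner := bufio.NewScanner(rc)
	for scanner.Scan() {
		_ = scanner.Text() // massage and persist each row here
	}
	return scanner.Err()
}
```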

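    And a sketch of the App Engine side of the second option, assuming a hypothetical /import/batch handler and Record struct: it accepts one chunk of rows as JSON and persists them with a single batched write, which also helps limit write costs:

```go
package app

import (
	"encoding/json"
	"net/http"

	"google.golang.org/appengine"
	"google.golang.org/appengine/datastore"
)

// Record is a hypothetical struct matching the CSV columns.
type Record struct {
	Name  string
	Value string
}

// handleBatch accepts a JSON array of records (one chunk sent by the
// Compute Engine worker) and persists it in a single RPC.
func handleBatch(w http.ResponseWriter, r *http.Request) {
	ctx := appengine.NewContext(r)

	var records []Record
	if err := json.NewDecoder(r.Body).Decode(&records); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}

	keys := make([]*datastore.Key, len(records))
	for i := range keys {
		keys[i] = datastore.NewIncompleteKey(ctx, "Record", nil)
	}
	// PutMulti batches the writes; keep chunks at or below the
	// datastore limit of 500 entities per call.
	if _, err := datastore.PutMulti(ctx, keys, records); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusNoContent)
}

func init() {
	http.HandleFunc("/import/batch", handleBatch)
}
```
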
    This answer was selected by the asker as the best answer.
