dsoy71058 2014-07-27 01:55
53 views
Accepted

Importing and parsing a large CSV file with Go and the App Engine datastore

Locally I am successfully able to (in a task), roughly as in the sketch after this list:

  • Open the CSV
  • Scan through each line (using Scanner.Scan)
  • Map the parsed CSV line to my desired struct
  • Save the struct to datastore
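
A minimal sketch of that local flow, assuming a hypothetical two-column CSV (name,count) and the modern cloud.google.com/go/datastore client (in 2014 this would have been the appengine/datastore package):

```go
package main

import (
	"bufio"
	"context"
	"log"
	"os"
	"strconv"
	"strings"

	"cloud.google.com/go/datastore"
)

// Record is a hypothetical struct matching the CSV columns.
type Record struct {
	Name  string
	Count int
}

func main() {
	ctx := context.Background()
	client, err := datastore.NewClient(ctx, "my-project") // hypothetical project ID
	if err != nil {
		log.Fatal(err)
	}
	defer client.Close()

	f, err := os.Open("data.csv") // hypothetical file name
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Scan through each line (Scanner.Scan), as described above.
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		// Map the parsed CSV line to the struct.
		parts := strings.Split(sc.Text(), ",")
		if len(parts) != 2 {
			continue // skip malformed lines
		}
		n, _ := strconv.Atoi(parts[1])
		rec := &Record{Name: parts[0], Count: n}

		// Save the struct to the datastore, one entity per row.
		key := datastore.IncompleteKey("Record", nil)
		if _, err := client.Put(ctx, key, rec); err != nil {
			log.Fatal(err)
		}
	}
	if err := sc.Err(); err != nil {
		log.Fatal(err)
	}
}
```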

I see that blobstore has a reader that would let me read the value directly through a streaming, file-like interface -- but that seems to have a 32 MB limit. I also see there's a bulk upload tool -- bulk_uploader.py -- but it won't do all the data massaging I require, and I'd like to limit the writes (and, really, the cost) of this bulk insert.

How would one effectively read and parse a very large (500 MB+) CSV file without the benefit of reading from local storage?


2 answers

  • drblhw5731 2014-07-27 10:55

    You will need to look at the following options and see if one of them works for you:

    1. Given the large file size, you should consider using Google Cloud Storage for the file. You can use the command-line utilities that GCS provides to upload your file to your bucket. Once it's uploaded, you can work with the file directly via the JSON API and import it into your datastore layer (a streaming sketch follows this list). Take a look at the following: https://developers.google.com/storage/docs/json_api/v1/json-api-go-samples

    2. If this is a one-time import of a large file, another option could be spinning up a Google Compute Engine VM, writing an app there to read from GCS and pass the data on in smaller chunks to a service running in App Engine Go, which can then accept and persist the data (see the second sketch below).
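
For option 1, a minimal streaming sketch, assuming the modern cloud.google.com/go/storage and cloud.google.com/go/datastore clients rather than raw JSON API calls; the bucket name, object name, project ID, and Record struct are all hypothetical. Because the object is read through a streaming reader, the 500 MB file never has to fit in memory, and rows are written in batches of 500 entities (the datastore's per-call limit) to keep the number of write RPCs down:

```go
package main

import (
	"context"
	"encoding/csv"
	"io"
	"log"

	"cloud.google.com/go/datastore"
	"cloud.google.com/go/storage"
)

// Record is a hypothetical struct matching the CSV columns.
type Record struct {
	Name  string
	Count string
}

func main() {
	ctx := context.Background()

	gcs, err := storage.NewClient(ctx)
	if err != nil {
		log.Fatal(err)
	}
	defer gcs.Close()

	ds, err := datastore.NewClient(ctx, "my-project") // hypothetical project ID
	if err != nil {
		log.Fatal(err)
	}
	defer ds.Close()

	// Open a streaming reader over the uploaded object; nothing is
	// buffered beyond what encoding/csv needs for the current record.
	r, err := gcs.Bucket("my-bucket").Object("big.csv").NewReader(ctx) // hypothetical names
	if err != nil {
		log.Fatal(err)
	}
	defer r.Close()

	cr := csv.NewReader(r)
	var keys []*datastore.Key
	var recs []*Record

	// flush writes the buffered batch with a single PutMulti RPC.
	flush := func() {
		if len(recs) == 0 {
			return
		}
		if _, err := ds.PutMulti(ctx, keys, recs); err != nil {
			log.Fatal(err)
		}
		keys, recs = keys[:0], recs[:0]
	}

	for {
		row, err := cr.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			log.Fatal(err)
		}
		keys = append(keys, datastore.IncompleteKey("Record", nil))
		recs = append(recs, &Record{Name: row[0], Count: row[1]})
		if len(recs) == 500 { // datastore accepts at most 500 entities per call
			flush()
		}
	}
	flush() // write the final partial batch
}
```

Batching is also what keeps the cost down: each PutMulti call covers up to 500 rows instead of one write RPC per row.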
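For option 2, a sketch of the App Engine side, assuming a hypothetical /import endpoint that the Compute Engine worker POSTs small JSON-encoded chunks of rows to; each chunk is persisted with a single batched write:

```go
package main

import (
	"context"
	"encoding/json"
	"log"
	"net/http"

	"cloud.google.com/go/datastore"
)

// Record is the same hypothetical row struct the worker sends as JSON.
type Record struct {
	Name  string
	Count int
}

var ds *datastore.Client

// importHandler accepts one JSON chunk of rows per request and
// persists it with a single PutMulti call.
func importHandler(w http.ResponseWriter, r *http.Request) {
	var chunk []*Record
	if err := json.NewDecoder(r.Body).Decode(&chunk); err != nil {
		http.Error(w, err.Error(), http.StatusBadRequest)
		return
	}
	keys := make([]*datastore.Key, len(chunk))
	for i := range keys {
		keys[i] = datastore.IncompleteKey("Record", nil)
	}
	if _, err := ds.PutMulti(r.Context(), keys, chunk); err != nil {
		http.Error(w, err.Error(), http.StatusInternalServerError)
		return
	}
	w.WriteHeader(http.StatusNoContent)
}

func main() {
	var err error
	ds, err = datastore.NewClient(context.Background(), "my-project") // hypothetical project ID
	if err != nil {
		log.Fatal(err)
	}
	http.HandleFunc("/import", importHandler)
	log.Fatal(http.ListenAndServe(":8080", nil))
}
```

Keeping each chunk to a few hundred rows keeps every request well under App Engine's request-size and deadline limits.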

    This answer was accepted by the asker.
