Locally I am successfully able to (in a task; a rough sketch follows this list):
- Open the CSV
- Scan through each line (using Scanner.Scan)
- Map the parsed CSV line to my desired struct
- Save the struct to datastore
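
For context, here is a minimal sketch of that local flow. The `Record` struct, the file name, and the `saveToDatastore` helper are all hypothetical stand-ins; the real mapping depends on the CSV's columns, and the real save is a datastore `Put`:

```go
package main

import (
	"bufio"
	"log"
	"os"
	"strings"
)

// Record is a hypothetical target struct; real fields depend on the CSV.
type Record struct {
	Name  string
	Email string
}

// saveToDatastore is a placeholder for the actual datastore write.
func saveToDatastore(r Record) {
	log.Printf("would save %+v", r)
}

func main() {
	f, err := os.Open("data.csv") // hypothetical file name
	if err != nil {
		log.Fatal(err)
	}
	defer f.Close()

	// Scan line by line, as in the flow above. Naive comma split here;
	// encoding/csv handles quoting and embedded commas properly.
	scanner := bufio.NewScanner(f)
	for scanner.Scan() {
		cols := strings.Split(scanner.Text(), ",")
		if len(cols) < 2 {
			continue // skip malformed rows
		}
		saveToDatastore(Record{Name: cols[0], Email: cols[1]})
	}
	if err := scanner.Err(); err != nil {
		log.Fatal(err)
	}
}
```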
I see that blobstore has a reader that would allow me to read the value directly using a streaming, file-like interface, but that seems to have a 32MB limit. I also see there's a bulk upload tool, bulk_uploader.py, but it won't do all the data massaging I require, and I'd like to limit the writes (and, really, the cost) of this bulk insert.
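
For reference, here is a sketch of what the blobstore streaming approach might look like, assuming the classic `appengine/*` Go packages, a handler that already has the blob key, and the same hypothetical `Record` struct and kind name as above. Whether this runs into the 32MB limit in practice is exactly the open question:

```go
package myapp

import (
	"encoding/csv"
	"io"
	"net/http"

	"appengine"
	"appengine/blobstore"
	"appengine/datastore"
)

// Record is a hypothetical target struct, as in the local sketch.
type Record struct {
	Name  string
	Email string
}

func importHandler(w http.ResponseWriter, r *http.Request) {
	c := appengine.NewContext(r)
	// Hypothetical: the blob key arrives as a request parameter.
	blobKey := appengine.BlobKey(r.FormValue("blobKey"))

	// blobstore.NewReader returns a streaming, seekable reader over the
	// blob, which encoding/csv can consume row by row without buffering
	// the whole file in memory.
	reader := csv.NewReader(blobstore.NewReader(c, blobKey))
	for {
		cols, err := reader.Read()
		if err == io.EOF {
			break
		}
		if err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
		rec := Record{Name: cols[0], Email: cols[1]}
		// One write per row; batching with datastore.PutMulti would
		// cut the number of datastore calls.
		key := datastore.NewIncompleteKey(c, "Record", nil)
		if _, err := datastore.Put(c, key, &rec); err != nil {
			http.Error(w, err.Error(), http.StatusInternalServerError)
			return
		}
	}
}
```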
How would one effectively read and parse a very large (500MB+) CSV file without the benefit of reading from local storage?