I have some large JSON files I want to parse, and I want to avoid loading all of the data into memory at once. I'd like a function or loop that gives me the characters one at a time.
I found this example for iterating over words in a string, and the ScanRunes function in the bufio package looks like it could return a character at a time. I also had the ReadRune function from bufio mostly working, but that felt like a pretty heavy approach.
EDIT
I compared 3 approaches. All used a loop to pull content from either a bufio.Reader or a bufio.Scanner.
1. Read runes in a loop using .ReadRune on a bufio.Reader. Checked for errors from the call to .ReadRune.
2. Read bytes from a bufio.Scanner after calling .Split(bufio.ScanRunes) on the scanner. Called .Scan and .Bytes on each iteration, checking the .Scan call for errors.
3. Same as #2, but read text from a bufio.Scanner instead of bytes using .Text. Instead of joining a slice of runes with string([]runes), I joined a slice of strings with strings.Join([]strings, "") to form the final blobs of text.
The timing for 10 runs of each on a 23 MB json file was:
1. 0.65 s
2. 2.40 s
3. 0.97 s
So it looks like ReadRune is not too bad after all. It also results in shorter, less verbose code, because each rune is fetched in one operation (.ReadRune) instead of two (.Scan and .Bytes).