I have some large JSON files I want to parse, and I want to avoid loading all of the data into memory at once. I'd like a function or loop that returns each character one at a time.
I found this example for iterating over words in a string, and the ScanRunes function in the bufio package looks like it could return one character at a time. I also had the ReadRune function from bufio mostly working, but that felt like a pretty heavy approach.
EDIT
I compared 3 approaches. All used a loop to pull content from either a bufio.Reader or a bufio.Scanner.
1. Read runes in a loop using `.ReadRune` on a `bufio.Reader`. Checked for errors from the call to `.ReadRune`.
2. Read bytes from a `bufio.Scanner` after calling `.Split(bufio.ScanRunes)` on the scanner. Called `.Scan` and `.Bytes` on each iteration, checking the `.Scan` call for errors.
3. Same as #2, but read text from a `bufio.Scanner` instead of bytes using `.Text`. Instead of joining a slice of runes with `string([]runes)`, I joined a slice of strings with `strings.Join([]strings, "")` to form the final blobs of text.
The timing for 10 runs of each on a 23 MB JSON file was:
1. 0.65 s
2. 2.40 s
3. 0.97 s
So it looks like `ReadRune` is not too bad after all. It also results in shorter, less verbose code, because each rune is fetched in 1 operation (`.ReadRune`) instead of 2 (`.Scan` and `.Bytes`).