Go doesn't follow the "strings are for text, byte types are for everything else" rule that some other languages (e.g. Python 3) do. "In Go, a string is in effect a read-only slice of bytes." The string type has a few behaviors attached that are handy for dealing with UTF-8 text, but it'll hold whatever bytes you put in it. Text-handling code in the standard library is often written to work with []byte slices too, e.g. package bytes mirrors package strings, and regexp deals in either.
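To make that concrete, here is a minimal sketch (the describe helper is just a name made up for illustration) showing a string holding non-UTF-8 bytes, and the string and []byte variants of the same standard-library operations side by side:

```go
package main

import (
	"bytes"
	"fmt"
	"regexp"
	"strings"
	"unicode/utf8"
)

// describe is a hypothetical helper: it reports the byte length of s
// and whether s happens to be valid UTF-8.
func describe(s string) (int, bool) {
	return len(s), utf8.ValidString(s)
}

func main() {
	// A string can hold arbitrary bytes, not just valid UTF-8 text.
	raw := string([]byte{0xff, 0x00, 0xfe})
	n, ok := describe(raw)
	fmt.Println(n, ok) // 3 false

	// package bytes mirrors package strings.
	fmt.Println(strings.ToUpper("go"))
	fmt.Println(string(bytes.ToUpper([]byte("go"))))

	// regexp has both string and []byte variants of its methods.
	re := regexp.MustCompile(`o+`)
	fmt.Println(re.MatchString("gooo"), re.Match([]byte("gooo")))
}
```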
Given that there's no rule about text or binary data semantically belonging in one type or the other, the choice to use []byte was probably made for practical reasons. Since strings are read-only, almost all operations that change a string have to copy bytes to a new string instead of modifying the existing one. (String slicing is a key exception; it just makes a new string header that can point into the old string's bytes.)
Copying string contents for each operation leads to a quadratic slowdown, since the string length and the number of copies both grow with input size. On top of the direct cost of the copies, allocating the space for them makes garbage collection happen more often. For those reasons, almost everything in Go that builds up content via a lot of small operations uses a []byte internally. That includes Go's JSON-marshalling code and the strings.Builder type added in Go 1.10.
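As a rough illustration of the difference, the sketch below builds the same string two ways. The function names joinConcat and joinBuilder are made up for this example; only strings.Builder and its methods come from the standard library.

```go
package main

import (
	"fmt"
	"strings"
)

// joinConcat builds the result with +=. Each iteration allocates a new
// string and copies everything accumulated so far into it.
func joinConcat(parts []string) string {
	s := ""
	for _, p := range parts {
		s += p
	}
	return s
}

// joinBuilder appends to strings.Builder's internal []byte, which grows
// as needed instead of being recopied in full on every append.
func joinBuilder(parts []string) string {
	var b strings.Builder
	for _, p := range parts {
		b.WriteString(p)
	}
	return b.String()
}

func main() {
	parts := []string{"a", "b", "c"}
	fmt.Println(joinConcat(parts), joinBuilder(parts))
}
```

Both return the same string; the difference only shows up in the amount of copying and allocation as the input grows, which is something you'd want to confirm with a benchmark for your own workload.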
(For similar reasons, Java and C# offer string-builder types as well, and modern JavaScript VMs have clever tricks to defer copying bytes until after a long series of concat operations, such as V8's cons strings and SpiderMonkey's ropes.)
Because []byte slices are read-write and strings are read-only, converting one to the other also has to copy bytes. If MarshalJSON returned a string, that would require making another copy of the content (with the associated load on the GC). Also, if you're ultimately going to do I/O with the result, Write() takes a byte slice, so you'd have to convert back, creating yet another copy. (To slightly mitigate that, some I/O types, including *os.File, support WriteString() as well. But not all do!)
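You can see the copy happen with a small experiment. In this sketch (roundTrip is a hypothetical helper, not anything from the standard library), mutating the slice after the conversion leaves the string untouched. It also shows io.WriteString, which calls the writer's WriteString method if it has one and otherwise falls back to Write:

```go
package main

import (
	"fmt"
	"io"
	"log"
	"os"
)

// roundTrip is a hypothetical helper: it converts b to a string, then
// mutates b, and returns the string made before the mutation.
func roundTrip(b []byte) string {
	s := string(b) // the conversion copies the bytes
	b[0] = 'X'     // so this write does not affect s
	return s
}

func main() {
	b := []byte("hello")
	s := roundTrip(b)
	fmt.Println(s, string(b)) // hello Xello

	// io.WriteString uses w.WriteString when w implements
	// io.StringWriter (as *os.File does), else it calls w.Write.
	if _, err := io.WriteString(os.Stdout, s+"\n"); err != nil {
		log.Fatal(err)
	}
}
```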
So it makes more sense for json.Marshal to return the []byte it built up internally; you can of course call string(bytes) on the result if you need a string and the copying isn't a problem.
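In practice that looks something like the following sketch. The point type and marshalPoint wrapper are invented for the example; json.Marshal itself is the real API.

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"os"
)

// point is a hypothetical type used only for illustration.
type point struct {
	X int `json:"x"`
	Y int `json:"y"`
}

// marshalPoint returns the []byte that json.Marshal produces for p.
func marshalPoint(p point) ([]byte, error) {
	return json.Marshal(p)
}

func main() {
	b, err := marshalPoint(point{X: 1, Y: 2})
	if err != nil {
		log.Fatal(err)
	}

	// Writing the bytes needs no conversion.
	if _, err := os.Stdout.Write(b); err != nil {
		log.Fatal(err)
	}
	fmt.Println()

	// Convert (and pay for the copy) only where a string is needed.
	s := string(b)
	fmt.Println(s) // {"x":1,"y":2}
}
```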
A bit out of the original question's scope, but often the best-performing option is just to stream the output directly to an io.Writer using a json.Encoder. You never have to allocate the whole chunk of output at once, and it can make your code simpler as well, since there's no temp variable and you can handle marshalling and I/O errors in one place.