I am going through the book Building Git by James Coglan, where James walks you through implementing a basic version of Git in Ruby. I decided to make things more complicated for myself by doing my implementation in Go.
I've gotten to the part where I need to store compressed hashes of file contents into a tree to write to disk, but I am having trouble doing this kind of hex compression/packing that Git is looking for.
Here is the Ruby code I'm working from:
ENTRY_FORMAT = "A7Z*H40"
MODE = "100644"
FILE_NAME = "tree.rb"
SHA = "baae99010b237a699ff0aba02fd5310c18903b1b"
[MODE, FILE_NAME, SHA].pack(ENTRY_FORMAT)
Here is how the Ruby pack method is described:
The Array#pack method takes an array of various kinds of values and returns a string that represents those values. Exactly how each value gets represented in the string is determined by the format string we pass to pack.
I think I'm fine with the encoding for MODE and FILE_NAME. It's the last part, which encodes the SHA, that I'm struggling with:
• H40: this encodes a string of forty hexadecimal digits, entry.oid, by packing each pair of digits into a single byte
It's the "packing each pair of digits into a single byte" part that I can't get my head around. This is my current attempt:
mode := 100644
fileName := "tree.go"
sha := "baae99010b237a699ff0aba02fd5310c18903b1b"
// slice of strings for constructing the packed sha
var eid []string
// iterate through each character in id
for i := 0; i < len(sha); i += 2 {
    // gathering them in pairs of two
    one, two := sha[i], sha[i+1]
    // compress two digits into one byte
    // using bitwise or?? addition?? bit shifting?? not sure.
    eid = append(eid, string(one|two))
}
// concat the new packed id with the mode and file name.
stringRep := fmt.Sprintf("%-7d", mode) + fileName + "\x00" + strings.Join(eid, "")
For some reason that I can't figure out, the string representation of a tree entry that this code produces isn't compatible with how Git stores trees on disk. I've tried shifting the bits before OR-ing them, and I've tried just adding the bytes together, but nothing seems to work. I basically need to replicate the behavior of the Ruby Array#pack method in a way that Git will accept.
Any guidance or advice is greatly appreciated. I'd be happy to explain more or post more code samples if necessary. Thank you so much for your time!
P.S. More context on the packing Git performs, from Building Git:
Git is storing the ID of each entry in a packed format, using twenty bytes for each one. Each hexadecimal digit represents a number from zero to fifteen, where ten is represented by a, eleven by b, and so on up to f for fifteen. In a forty-digit object ID, each digit stands for four bits of a 160-bit number. Instead of splitting those bits into forty chunks of four bits each, we can split it into twenty blocks of eight bits—and eight bits is one byte. So all that’s happening here is that the 160-bit object ID is being stored in binary as twenty bytes, rather than as forty characters standing for hexadecimal digits.