I have a function that I would like to provide an assembly implementation for on the `amd64` architecture. For the sake of discussion let's just suppose it's an `Add` function, but it's actually more complicated than this. I have the assembly version working, but my question concerns getting the godoc to display correctly. I have a feeling this is currently impossible, but I wanted to seek advice.
Some more details:
- The assembly implementation of this function contains only a few instructions. In particular, the mere cost of calling the function is a significant part of the entire cost.
- It makes use of special instructions (`BMI2`), and therefore can only be used following a `CPUID` capability check.
The implementation is structured like this gist. At a high level:
- In the generic (non-`amd64`) case, the function is defined by delegating to `addGeneric`.
- In the `amd64` case, the function is actually a variable, initially set to `addGeneric` but replaced by `addAsm` in the `init` function if a `cpuid` check passes.
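For readers without the gist to hand, the `amd64` arrangement described above might look roughly like the sketch below. It is only an illustration: the `uint64` signature and the `hasBMI2` helper are assumptions, and `addAsm` is written in plain Go so the snippet compiles on its own. In the real package `addAsm` would be a body-less declaration backed by a `.s` file, and the capability check could come from something like `cpu.X86.HasBMI2` in `golang.org/x/sys/cpu`.

```go
package main

import "fmt"

// addGeneric is the portable pure-Go fallback.
func addGeneric(x, y uint64) uint64 { return x + y }

// addAsm stands in for the assembly routine. In the real package this would
// be a declaration without a body, implemented in a .s file; it is plain Go
// here only so the sketch is self-contained.
func addAsm(x, y uint64) uint64 { return x + y }

// hasBMI2 is a hypothetical placeholder for the CPUID capability check.
func hasBMI2() bool { return false }

// Add is a package-level variable holding a function value, which is why
// godoc documents it as a variable rather than as a function.
var Add = addGeneric

func init() {
	// Swap in the fast path once, at package initialisation.
	if hasBMI2() {
		Add = addAsm
	}
}

func main() {
	fmt.Println(Add(2, 3))
}
```

Callers use `Add(x, y)` exactly as they would a normal function; the only visible difference is how the identifier is declared and therefore how it is documented.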
This approach works. However, the godoc output is poor because in the `amd64` case `Add` is documented as a variable rather than as a function. Note godoc appears to be picking up the same build tags as the machine it's running on. I'm not sure what godoc.org would do.
Alternatives considered:
- The `Add` function delegates to `addImpl`. Then we pull some similar trick to replace `addImpl` in the `amd64` case. The problem with this is that (in my experiments) Go doesn't seem to be able to inline the call, so the assembly is now wrapped in two function calls. Since the assembly is so small already, this has a noticeable impact on performance.
- In the `amd64` case we define a plain function `Add` that has the `useAsm` check inside it, and calls one of `addGeneric` and `addAsm` depending on the result. This would have an even worse impact on performance.
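To make the two alternatives concrete, here is a rough sketch of both side by side. The names `AddViaImpl` and `AddViaCheck` are hypothetical (in practice each would simply be called `Add`), the `uint64` signature is an assumption, and `addAsm` is again a plain-Go stand-in for the assembly routine. Both are ordinary `func` declarations, so godoc would list them as functions; the cost is the extra call or branch on every invocation.

```go
package main

import "fmt"

// addGeneric is the portable pure-Go fallback.
func addGeneric(x, y uint64) uint64 { return x + y }

// addAsm is a plain-Go stand-in for the assembly routine.
func addAsm(x, y uint64) uint64 { return x + y }

// useAsm would be set once from the CPUID check; it is hard-coded here.
var useAsm = false

// addImpl is the unexported function variable that an init function
// would replace on amd64 when the CPUID check passes.
var addImpl = addGeneric

// AddViaImpl is the first alternative: a real function that delegates
// through the function variable, adding a second call around the assembly.
func AddViaImpl(x, y uint64) uint64 { return addImpl(x, y) }

// AddViaCheck is the second alternative: a real function that tests the
// capability flag on every call before dispatching.
func AddViaCheck(x, y uint64) uint64 {
	if useAsm {
		return addAsm(x, y)
	}
	return addGeneric(x, y)
}

func main() {
	fmt.Println(AddViaImpl(2, 3), AddViaCheck(2, 3))
}
```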
So I guess the questions are:
- Is there a better way to structure the code to achieve the performance I want and have it appear properly in documentation?
- If there is no alternative, is there some other way to "trick" godoc?