I've been working on a VPN written in Go and I'm starting to optimize the data flow. At a glance, the implementation seems sound: there are no memory leaks and CPU doesn't seem to be a constraint.
So I moved to pprof, and the problem I am seeing is that most of the execution time is spent in syscall.Syscall. I took a 6-second profile during a running iperf throughput test and this is what I see:
This test is run with both the client and server inside Docker containers, with the client getting a --link to the server. Running iperf directly over the base bridge networking yields around 40 Gbit/s of throughput; iperf over this VPN implementation, on top of the same bridge, nets about 500 Mbit/s.
A quick look at htop shows that about 3/4 of the CPU time is spent in the kernel (system time).
I've tried a couple of approaches to speed up the single-client case, but I can't seem to find a way to reduce the cost of writing packets in a VPN server. NB: iperf uses full MTU-sized packets during its test, which rules out some obvious optimizations.
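To put the numbers in perspective, here is a rough back-of-the-envelope sketch of the packet rate behind the 500 Mbit/s figure. The 1500-byte MTU is an assumption (I haven't checked what the containers actually negotiate), and `packetsPerSecond` is just a helper name made up for this calculation:

```go
package main

import "fmt"

// packetsPerSecond returns how many packets of packetBytes bytes must be
// moved each second to sustain a throughput of bitsPerSecond.
func packetsPerSecond(bitsPerSecond, packetBytes float64) float64 {
	return bitsPerSecond / (packetBytes * 8)
}

func main() {
	const mtu = 1500 // ASSUMPTION: typical Ethernet/bridge MTU, not measured

	pps := packetsPerSecond(500e6, mtu)
	// Time budget per packet if a single goroutine handles them serially.
	fmt.Printf("%.0f packets/s, %.1f µs per packet\n", pps, 1e6/pps)
	// prints "41667 packets/s, 24.0 µs per packet"
}
```

So under that assumption the tunnel is moving roughly 42k packets per second, and if every packet costs at least one read and one write syscall, the per-packet syscall overhead is what the profile would be dominated by.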
listing Syscall:
I'm not sure why this shows CMPQ taking all the time; I'd think that time should be attributed to SYSCALL.