I'm developing a simple TCP client and server, and I want to achieve high throughput (300,000 requests/second), which is easy to reach with a C or C++ TCP client and server on server hardware. By that I mean a server with 48 cores and 64 GB of memory.
On my testbed, both client and server have a 10G network interface card, with receive-side scaling enabled on the server and transmit packet steering enabled on the client.
I configure the client to send 10 thousand requests per second. To increase the throughput, I just launch multiple instances (go run client.go) from a bash script. However, this way Go creates lots of operating system threads, a large number of threads results in high context-switching cost, and I could not get close to that throughput. I suspected the cause was the number of Go instances I was running from the command line. Below is the client code for this first approach:
package client

import (
	"encoding/binary"
	"fmt"
	"io"
	"net"
	"os"
	"time"
)

// nextTime is defined elsewhere in this package (not shown here).
func Main(cmd_rate_int int, cmd_port string) {
	//runtime.GOMAXPROCS(2) // cap the number of OS threads running Go code at once
	rate := float64(cmd_rate_int)
	port := cmd_port
	conn, err := net.Dial("tcp", port)
	if err != nil {
		fmt.Println("ERROR", err)
		os.Exit(1)
	}
	byte_message := make([]byte, 8)
	// Reader: read each 8-byte reply, treat it as the timestamp that was sent,
	// and print the elapsed time in microseconds.
	go func(conn net.Conn) {
		buf := make([]byte, 8)
		for {
			// Use a goroutine-local err rather than sharing the outer one.
			_, err := io.ReadFull(conn, buf)
			now := time.Now().UnixNano()
			if err != nil {
				return
			}
			last := int64(binary.LittleEndian.Uint64(buf))
			fmt.Println((now - last) / 1000)
		}
	}(conn)
	// Writer: sleep for a random gap, then send the current time as 8 little-endian bytes.
	for {
		my_random_number := nextTime(rate) * 1000000
		my_random_int := int(my_random_number)
		time.Sleep(time.Microsecond * time.Duration(my_random_int))
		int_message := time.Now().UnixNano()
		binary.LittleEndian.PutUint64(byte_message, uint64(int_message))
		if _, err := conn.Write(byte_message); err != nil {
			fmt.Println("ERROR", err)
			return
		}
	}
}
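The client code calls nextTime without showing it. Here is a minimal sketch of what it might look like, assuming it returns an exponentially distributed gap in seconds with mean 1/rate, so that requests arrive as a Poisson process. This is a guess based on how the result is multiplied by 1000000 and used as microseconds; it is not the original helper.

```go
package main

import (
	"fmt"
	"math/rand"
	"time"
)

// nextTime is a hypothetical stand-in for the helper used by the client.
// It returns the gap to the next request in seconds, drawn from an
// exponential distribution with mean 1/rate (rate is in requests/second).
func nextTime(rate float64) float64 {
	// rand.ExpFloat64 has rate parameter 1, so divide to rescale.
	return rand.ExpFloat64() / rate
}

func main() {
	const rate = 10000.0 // 10 thousand requests per second, as in the question
	gap := time.Duration(nextTime(rate) * float64(time.Second))
	fmt.Println("next request in", gap)
}
```

With rate = 10000 the average gap is 100 microseconds, which is already close to the practical granularity of time.Sleep on many systems, so the achieved per-client rate may be lower than the configured one.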
So I tried starting all my clients as goroutines by calling go client.Main(...) from main, instead of running multiple instances from the Linux command line. I thought this might be a better idea, and it is: the number of operating system threads no longer climbs toward 700 or so. But the throughput is still low, and it does not seem to use the full capability of the underlying hardware. Here is the code I ran for the second approach:
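func main() {
	//runtime.GOMAXPROCS(2) // cap the number of OS threads running Go code at once
	args := os.Args[1:]
	if len(args) < 3 {
		fmt.Println("usage: <rate> <client_size> <address>")
		os.Exit(1)
	}
	rate_int, err := strconv.Atoi(args[0])
	if err != nil {
		fmt.Println("ERROR", err)
		os.Exit(1)
	}
	client_size, err := strconv.Atoi(args[1])
	if err != nil {
		fmt.Println("ERROR", err)
		os.Exit(1)
	}
	port := args[2]
	// Start exactly client_size clients (i <= client_size started one extra).
	for i := 0; i < client_size; i++ {
		go client.Main(rate_int, port)
	}
	// Block forever without spinning; an empty for loop keeps one core busy.
	select {}
}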
What is the best practice for reaching high throughput? I have always heard that Go is lightweight and performant, and pretty comparable to C/C++ with pthreads. However, it seems to me that in terms of performance C/C++ is still far better than Go. I may be doing something really wrong here, so I would be happy if anybody could help me achieve high throughput with Go.