Go syscall vs. C system call

Go and C both invoke system calls directly (technically, C calls a stub).

Technically, write is both a system call and a C function (at least on many systems). However, the C function is just a stub which invokes the system call. Go does not call this stub; it invokes the system call directly, which means that C is not involved here.

From Differences between C write call and Go syscall.Write

My benchmark shows that a pure C system call is 15.82% faster than a pure Go system call in the latest release (go1.11).

What did I miss? What could be a reason and how to optimize them?

Benchmarks:

Go:

package main_test

import (
    "syscall"
    "testing"
)

func writeAll(fd int, buf []byte) error {
    for len(buf) > 0 {
        n, err := syscall.Write(fd, buf)
        if err != nil {
            return err
        }
        buf = buf[n:]
    }
    return nil
}

func BenchmarkReadWriteGoCalls(b *testing.B) {
    fds, _ := syscall.Socketpair(syscall.AF_UNIX, syscall.SOCK_STREAM, 0)
    message := "hello, world!"
    buffer := make([]byte, 13)
    for i := 0; i < b.N; i++ {
        writeAll(fds[0], []byte(message))
        syscall.Read(fds[1], buffer)
    }
}

C:

#include <time.h>
#include <stdio.h>
#include <unistd.h>
#include <sys/socket.h>

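int write_all(int fd, const void *buffer, size_t length) {
    /* Use a char pointer: arithmetic on void* is a GNU extension. */
    const char *p = buffer;
    while (length > 0) {
        /* write(2) returns ssize_t, not int. */
        ssize_t written = write(fd, p, length);
        if (written < 0)
            return -1;
        length -= (size_t)written;
        p += written;
    }
    return 0;
}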

int read_call(int fd, void *buffer, size_t length) {
    return read(fd, buffer, length);
}

struct timespec timer_start(){
    struct timespec start_time;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &start_time);
    return start_time;
}

long timer_end(struct timespec start_time){
    struct timespec end_time;
    clock_gettime(CLOCK_PROCESS_CPUTIME_ID, &end_time);
    long diffInNanos = (end_time.tv_sec - start_time.tv_sec) * (long)1e9 + (end_time.tv_nsec - start_time.tv_nsec);
    return diffInNanos;
}

int main() {
    int i = 0;
    int N = 500000;
    int fds[2];
    char message[14] = "hello, world!\0";
    char buffer[14] = {0};

    socketpair(AF_UNIX, SOCK_STREAM, 0, fds);
    struct timespec vartime = timer_start();
    for(i = 0; i < N; i++) {
        write_all(fds[0], message, sizeof(message));
        read_call(fds[1], buffer, 14);
    }
    long time_elapsed_nanos = timer_end(vartime);
    printf("BenchmarkReadWritePureCCalls\t%d\t%ld ns/op\n", N, time_elapsed_nanos/N);
}

Results from 340 separate runs; each C run performs 500000 iterations, and each Go run performs b.N iterations (mostly 500000, occasionally 1000000):

T-Test for 2 Independent Means: The t-value is -22.45426. The p-value is < .00001. The result is significant at p < .05.

T-Test Calculator for 2 Dependent Means: The value of t is 15.902782. The value of p is < 0.00001. The result is significant at p ≤ 0.05.

Update: I implemented the proposal from the answer and wrote another benchmark. It shows that the proposed approach significantly reduces the performance of massive I/O calls; its performance is close to that of cgo calls.

Benchmark:

func BenchmarkReadWriteNetCalls(b *testing.B) {
    cs, _ := socketpair()
    message := "hello, world!"
    buffer := make([]byte, 13)
    for i := 0; i < b.N; i++ {
        cs[0].Write([]byte(message))
        cs[1].Read(buffer)
    }
}

func socketpair() (conns [2]net.Conn, err error) {
    fds, err := syscall.Socketpair(syscall.AF_LOCAL, syscall.SOCK_STREAM, 0)
    if err != nil {
        return
    }
    conns[0], err = fdToFileConn(fds[0])
    if err != nil {
        return
    }
    conns[1], err = fdToFileConn(fds[1])
    if err != nil {
        conns[0].Close()
        return
    }
    return
}

func fdToFileConn(fd int) (net.Conn, error) {
    f := os.NewFile(uintptr(fd), "")
    defer f.Close()
    return net.FileConn(f)
}

The figure above shows 100 separate runs; each C run performs 500000 iterations, and each Go run performs b.N iterations (mostly 500000, occasionally 1000000).

1 answer

duangenshi9836 2018-09-12 07:14
My benchmark shows that a pure C system call is 15.82% faster than a pure Go system call in the latest release (go1.11).

What did I miss? What could be a reason and how to optimize them?

The reason is that while both C and Go (on a typical platform Go supports—such as Linux or *BSD or Windows) are compiled down to machine code, Go-native code runs in an environment quite different from that of C.

The two chief differences to C are:

Go code runs in the context of so-called goroutines which are freely scheduled by the Go runtime on different OS threads.

Goroutines use their own (growable and reallocatable) lightweight stacks which have nothing to do with the OS-supplied stack C code uses.

So, when Go code wants to make a syscall, quite a lot has to happen:

The goroutine which is about to enter a syscall must be "pinned" to the OS thread on which it's currently running.

The execution must be switched to use the OS-supplied C stack.

The necessary preparations in the Go runtime's scheduler are made.

The goroutine enters the syscall.

On exit, execution of the goroutine has to be resumed, which is a relatively involved process in itself. It may be additionally hampered if the goroutine was in the syscall for too long and the scheduler took the so-called "processor" away from that goroutine, spawned another OS thread, and made that processor run another goroutine ("processors", or Ps, are the runtime entities that run goroutines on OS threads).

Update to answer the OP's comment

<…> Thus there is no way to optimize and I must suffer that if I make massive IO calls, mustn't I?

It heavily depends on the nature of the "massive I/O" you're after.

If your example (with socketpair(2)) is not a toy, there is simply no reason to use syscalls directly: the FDs returned by socketpair(2) are "pollable" and hence the Go runtime may use its native "netpoller" machinery to perform I/O on them. Here is working code from one of my projects which properly "wraps" FDs produced by socketpair(2) so that they can be used as "regular" sockets (produced by functions from the net standard package):

func socketpair() (net.Conn, net.Conn, error) {
    fds, err := syscall.Socketpair(syscall.AF_LOCAL, syscall.SOCK_STREAM, 0)
    if err != nil {
        return nil, nil, err
    }
    c1, err := fdToFileConn(fds[0])
    if err != nil {
        return nil, nil, err
    }
    c2, err := fdToFileConn(fds[1])
    if err != nil {
        c1.Close()
        return nil, nil, err
    }
    return c1, c2, err
}

func fdToFileConn(fd int) (net.Conn, error) {
    f := os.NewFile(uintptr(fd), "")
    defer f.Close()
    return net.FileConn(f)
}
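A minimal usage sketch for the wrapper, assuming a Unix-like system where syscall.Socketpair is available. The two helpers are repeated so the program compiles on its own; roundTrip is a hypothetical name introduced only for this example. It sends one message through the pair and reads it back via the net.Conn interface.

```go
package main

import (
	"fmt"
	"io"
	"net"
	"os"
	"syscall"
)

// socketpair and fdToFileConn repeat the helpers from the answer.
func socketpair() (net.Conn, net.Conn, error) {
	fds, err := syscall.Socketpair(syscall.AF_LOCAL, syscall.SOCK_STREAM, 0)
	if err != nil {
		return nil, nil, err
	}
	c1, err := fdToFileConn(fds[0])
	if err != nil {
		return nil, nil, err
	}
	c2, err := fdToFileConn(fds[1])
	if err != nil {
		c1.Close()
		return nil, nil, err
	}
	return c1, c2, nil
}

func fdToFileConn(fd int) (net.Conn, error) {
	f := os.NewFile(uintptr(fd), "")
	// net.FileConn duplicates the descriptor, so the original can be closed.
	defer f.Close()
	return net.FileConn(f)
}

// roundTrip (hypothetical helper) writes msg on one end of the pair and
// reads exactly len(msg) bytes back from the other end.
func roundTrip(msg string) (string, error) {
	c1, c2, err := socketpair()
	if err != nil {
		return "", err
	}
	defer c1.Close()
	defer c2.Close()

	if _, err := c1.Write([]byte(msg)); err != nil {
		return "", err
	}
	buf := make([]byte, len(msg))
	if _, err := io.ReadFull(c2, buf); err != nil {
		return "", err
	}
	return string(buf), nil
}

func main() {
	got, err := roundTrip("hello, world!")
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(got)
}
```

Because the connections go through the runtime's netpoller, a Read that has to wait parks the goroutine rather than blocking an OS thread. That helps when many goroutines do I/O concurrently, though it may not help a tight single-goroutine ping-pong loop.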

If you're talking about some other sort of I/O, the answer is that yes, syscalls are not really cheap and if you must do lots of them, there are ways to work around their cost (such as offloading to some C code—linked in or hooked up as an external process—which would somehow batch them so that each call to that C code would result in several syscalls done by the C side).
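The answer suggests batching on the C side. A different and simpler technique, sketched here in pure Go, is to reduce the number of syscalls by coalescing small writes with bufio.Writer. This is an illustration under assumptions, not the approach from the answer: countingWriter and batchedWrites are hypothetical names, and each underlying Write on the pipe is assumed to map to roughly one write(2) call.

```go
package main

import (
	"bufio"
	"fmt"
	"io"
	"os"
)

// countingWriter (hypothetical) counts how often the underlying file is
// written to and how many bytes pass through.
type countingWriter struct {
	f      *os.File
	writes int
	bytes  int
}

func (c *countingWriter) Write(p []byte) (int, error) {
	c.writes++
	n, err := c.f.Write(p)
	c.bytes += n
	return n, err
}

// batchedWrites (hypothetical) pushes count copies of msg through a
// 4 KiB bufio.Writer into a pipe and reports the number of underlying
// Write calls and the total bytes written.
func batchedWrites(msg string, count int) (int, int, error) {
	r, w, err := os.Pipe()
	if err != nil {
		return 0, 0, err
	}
	defer r.Close()

	// Drain the read end so the writer never blocks on a full pipe.
	done := make(chan struct{})
	go func() {
		defer close(done)
		io.Copy(io.Discard, r)
	}()

	cw := &countingWriter{f: w}
	bw := bufio.NewWriterSize(cw, 4096)
	var werr error
	for i := 0; i < count && werr == nil; i++ {
		_, werr = bw.WriteString(msg)
	}
	if werr == nil {
		werr = bw.Flush()
	}
	w.Close() // signals EOF to the draining goroutine
	<-done
	return cw.writes, cw.bytes, werr
}

func main() {
	writes, total, err := batchedWrites("hello, world!", 1000)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Printf("1000 messages, %d bytes, %d underlying writes\n", total, writes)
}
```

Buffering trades latency for throughput: data is not visible to the reader until the buffer fills or Flush is called, so it does not suit a request/response loop like the benchmark in the question.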