Why is "MOVQ 0x30(SP), DX" slow?

Please see the following pprof session. Line 42 of treesort.add is an int comparison, and I think it accounts for 64% of all CPU time. In the disassembly, the instruction that carries the time is "MOVQ 0x30(SP), DX". Why is it so slow?

File: treesort_bench.test.exe
Type: cpu
Time: Sep 7, 2018 at 3:15pm (EDT)
Duration: 2.60s, Total samples = 2.43s (93.44%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 10
Showing nodes accounting for 2.41s, 99.18% of 2.43s total
Dropped 2 nodes (cum <= 0.01s)
      flat  flat%   sum%        cum   cum%
     2.40s 98.77% 98.77%      2.42s 99.59%  gopl.io/ch4/treesort.add
     0.01s  0.41% 99.18%      0.02s  0.82%  runtime.mallocgc
         0     0% 99.18%      0.26s 10.70%  gopl.io/ch4/treesort.Sort
         0     0% 99.18%      0.25s 10.29%  gopl.io/ch4/treesort_bench.BenchmarkSort
         0     0% 99.18%      0.26s 10.70%  gopl.io/ch4/treesort_bench.run
         0     0% 99.18%      0.02s  0.82%  runtime.newobject
         0     0% 99.18%      0.22s  9.05%  testing.(*B).launch
         0     0% 99.18%      0.02s  0.82%  testing.(*B).run1.func1
         0     0% 99.18%      0.25s 10.29%  testing.(*B).runN
(pprof) list add
Total: 2.43s
ROUTINE ======================== gopl.io/ch4/treesort.add in go\src\gopl.io\ch4\treesort\sort.go
     2.40s      4.45s (flat, cum) 183.13% of Total
         .          .     30:           values = appendValues(values, t.right)
         .          .     31:   }
         .          .     32:   return values
         .          .     33:}
         .          .     34:
      90ms       90ms     35:func add(t *tree, value int) *tree {
         .          .     36:   if t == nil {
         .          .     37:           // Equivalent to return &tree{value: value}.
         .       20ms     38:           t = new(tree)
         .          .     39:           t.value = value
         .          .     40:           return t
         .          .     41:   }
     1.55s      1.55s     42:   flag := value < t.value
         .          .     43:   if flag {
         .      240ms     44:           t.left = add(t.left, value)
         .          .     45:   } else {
     630ms      2.42s     46:           t.right = add(t.right, value)
         .          .     47:   }
     130ms      130ms     48:   return t
         .          .     49:}
         .          .     50:
         .          .     51://!-
(pprof) disasm add
Total: 2.43s
ROUTINE ======================== gopl.io/ch4/treesort.add
     2.40s      5.08s (flat, cum) 209.05% of Total
      50ms       50ms     4fcb66: MOVQ 0(AX), CX                          ;gopl.io/ch4/treesort.add sort.go:42
     1.48s      1.48s     4fcb69: MOVQ 0x30(SP), DX
      20ms       20ms     4fcb6e: CMPQ CX, DX
         .          .     4fcb71: JGE 0x4fcbbb                            ;sort.go:43

dounao4179 2018-09-08 02:58
Why is “MOVQ 0x30(SP), DX” slow?

You have provided insufficient evidence to show that the instruction is slow.

MOVQ (Move Quadword) is an instruction from the Intel 64 and IA-32 architectures instruction set. See the Intel® 64 and IA-32 Architectures Software Developer Manuals.

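The MOVQ 0x30(SP), DX instruction moves 8 bytes from memory, at offset 0x30 from the stack pointer, into the DX register. It supplies one of the two operands of the value < t.value comparison on line 42; the other operand is loaded by the preceding MOVQ 0(AX), CX.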

Performance measurement, like any other scientific endeavor, relies on reproducible results. You have provided insufficient information to reproduce your results. For example, where is the code for treesort_bench.test.exe? What processor, what memory, and what operating system did you use?

I've tried, but I'm unable to reproduce your results. Add your code and the steps to reproduce your results to your question.
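As an illustration of the kind of self-contained reproduction being asked for, here is a minimal sketch. It is not the asker's treesort_bench code: the tree field layout and the bodies of Sort and appendValues are reconstructed from the fragments visible in the listing and are assumptions, and makeInput, benchSort, the input size of 2000, and the random seed are hypothetical choices made only for this example.

```go
package main

import (
	"fmt"
	"math/rand"
	"testing"
)

// tree is assumed to hold a value and two children, as the listing implies.
type tree struct {
	value       int
	left, right *tree
}

// add inserts value into the tree, following the listing in the question.
func add(t *tree, value int) *tree {
	if t == nil {
		return &tree{value: value}
	}
	if value < t.value {
		t.left = add(t.left, value)
	} else {
		t.right = add(t.right, value)
	}
	return t
}

// appendValues appends the elements of t to values in ascending order.
func appendValues(values []int, t *tree) []int {
	if t != nil {
		values = appendValues(values, t.left)
		values = append(values, t.value)
		values = appendValues(values, t.right)
	}
	return values
}

// Sort sorts values in place using an unbalanced binary tree (assumed body).
func Sort(values []int) {
	var root *tree
	for _, v := range values {
		root = add(root, v)
	}
	appendValues(values[:0], root)
}

// makeInput is a hypothetical helper: it returns n ints, either already
// ascending or pseudo-random with a fixed seed so runs are repeatable.
func makeInput(n int, sorted bool) []int {
	r := rand.New(rand.NewSource(1))
	data := make([]int, n)
	for i := range data {
		if sorted {
			data[i] = i
		} else {
			data[i] = r.Intn(n)
		}
	}
	return data
}

// benchSort is a hypothetical helper that times Sort on a fresh copy of
// the same input in every iteration.
func benchSort(n int, sorted bool) testing.BenchmarkResult {
	src := makeInput(n, sorted)
	buf := make([]int, n)
	return testing.Benchmark(func(b *testing.B) {
		for i := 0; i < b.N; i++ {
			copy(buf, src)
			Sort(buf)
		}
	})
}

func main() {
	testing.Init() // registers testing flags when run outside "go test"
	fmt.Println("random input:", benchSort(2000, false))
	fmt.Println("sorted input:", benchSort(2000, true))
}
```

Posting something of this shape, together with the Go version, processor, and operating system, lets others run the same measurement. Comparing the two inputs is only a suggestion: the listing attributes most cumulative time to the right-hand recursion on line 46, which would be consistent with mostly ascending input, but the question does not say what input was used.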

This answer was selected by the asker as the best answer.
