dounuo7954
2018-09-07 19:33

Why is "MOVQ 0x30(SP), DX" slow?

Accepted

Please see the following pprof session. In treesort.add, line 42, there's an int comparison. I think it accounts for 64% of all CPU time. In the disassembly, the operation is "MOVQ 0x30(SP), DX". Why is it so slow?

File: treesort_bench.test.exe
Type: cpu
Time: Sep 7, 2018 at 3:15pm (EDT)
Duration: 2.60s, Total samples = 2.43s (93.44%)
Entering interactive mode (type "help" for commands, "o" for options)
(pprof) top 10
Showing nodes accounting for 2.41s, 99.18% of 2.43s total
Dropped 2 nodes (cum <= 0.01s)
      flat  flat%   sum%        cum   cum%
     2.40s 98.77% 98.77%      2.42s 99.59%  gopl.io/ch4/treesort.add
     0.01s  0.41% 99.18%      0.02s  0.82%  runtime.mallocgc
         0     0% 99.18%      0.26s 10.70%  gopl.io/ch4/treesort.Sort
         0     0% 99.18%      0.25s 10.29%  gopl.io/ch4/treesort_bench.BenchmarkSort
         0     0% 99.18%      0.26s 10.70%  gopl.io/ch4/treesort_bench.run
         0     0% 99.18%      0.02s  0.82%  runtime.newobject
         0     0% 99.18%      0.22s  9.05%  testing.(*B).launch
         0     0% 99.18%      0.02s  0.82%  testing.(*B).run1.func1
         0     0% 99.18%      0.25s 10.29%  testing.(*B).runN
(pprof) list add
Total: 2.43s
ROUTINE ======================== gopl.io/ch4/treesort.add in go\src\gopl.io\ch4\treesort\sort.go
     2.40s      4.45s (flat, cum) 183.13% of Total
         .          .     30:           values = appendValues(values, t.right)
         .          .     31:   }
         .          .     32:   return values
         .          .     33:}
         .          .     34:
      90ms       90ms     35:func add(t *tree, value int) *tree {
         .          .     36:   if t == nil {
         .          .     37:           // Equivalent to return &tree{value: value}.
         .       20ms     38:           t = new(tree)
         .          .     39:           t.value = value
         .          .     40:           return t
         .          .     41:   }
     1.55s      1.55s     42:   flag := value < t.value
         .          .     43:   if flag {
         .      240ms     44:           t.left = add(t.left, value)
         .          .     45:   } else {
     630ms      2.42s     46:           t.right = add(t.right, value)
         .          .     47:   }
     130ms      130ms     48:   return t
         .          .     49:}
         .          .     50:
         .          .     51://!-
(pprof) disasm add
Total: 2.43s
ROUTINE ======================== gopl.io/ch4/treesort.add
     2.40s      5.08s (flat, cum) 209.05% of Total
      50ms       50ms     4fcb66: MOVQ 0(AX), CX                          ;gopl.io/ch4/treesort.add sort.go:42
     1.48s      1.48s     4fcb69: MOVQ 0x30(SP), DX
      20ms       20ms     4fcb6e: CMPQ CX, DX
         .          .     4fcb71: JGE 0x4fcbbb                            ;sort.go:43

1 answer

  • dounao4179, 3 years ago

    Why is “MOVQ 0x30(SP), DX” slow?

    You have provided insufficient evidence to show that the instruction is slow.


    MOVQ (Move Quadword) is an instruction from the Intel 64 and IA-32 architectures instruction set. See the Intel® 64 and IA-32 Architectures Software Developer Manuals.

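    The MOVQ 0x30(SP), DX instruction loads 8 bytes from the stack slot at offset 0x30 from SP into the DX register. Judging from the listing, that slot most likely holds the value argument of add, while the preceding MOVQ 0(AX), CX loads t.value through the pointer in AX; the CMPQ that follows compares the two.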


    Performance measurement, like any other scientific endeavor, relies on reproducible results. You have provided insufficient information to reproduce your results. For example: where is the code for treesort_bench.test.exe, and what processor, memory, and operating system did you use?

    I've tried, but I'm unable to reproduce your results. Add your code and the steps to reproduce your results to your question.
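
As an illustration of what a reproducible case could look like, here is a minimal, self-contained sketch. It is not the asker's benchmark: the tree declaration, the appendValues and Sort bodies, the randomInts helper, the input size, and the seed are all assumptions, reconstructed only from the field and function names visible in the pprof listing. It uses testing.Benchmark so it can run as an ordinary program.

```go
package main

import (
	"fmt"
	"math/rand"
	"testing"
)

// tree is an assumed declaration; the listing only shows the field names
// value, left and right, not the actual struct.
type tree struct {
	value       int
	left, right *tree
}

// add inserts value into the unbalanced binary tree rooted at t,
// following the logic shown at sort.go:35-49 in the listing.
func add(t *tree, value int) *tree {
	if t == nil {
		return &tree{value: value}
	}
	if value < t.value {
		t.left = add(t.left, value)
	} else {
		t.right = add(t.right, value)
	}
	return t
}

// appendValues appends the elements of t to values in sorted order.
func appendValues(values []int, t *tree) []int {
	if t != nil {
		values = appendValues(values, t.left)
		values = append(values, t.value)
		values = appendValues(values, t.right)
	}
	return values
}

// Sort sorts values in place by building a tree and walking it in order.
func Sort(values []int) {
	var root *tree
	for _, v := range values {
		root = add(root, v)
	}
	appendValues(values[:0], root)
}

// randomInts (hypothetical helper) returns n pseudo-random ints from a
// fixed seed, so every run sorts the same data.
func randomInts(n int, seed int64) []int {
	r := rand.New(rand.NewSource(seed))
	out := make([]int, n)
	for i := range out {
		out[i] = r.Int()
	}
	return out
}

func main() {
	input := randomInts(10000, 1) // size and seed are arbitrary choices
	res := testing.Benchmark(func(b *testing.B) {
		buf := make([]int, len(input))
		b.ResetTimer()
		for i := 0; i < b.N; i++ {
			copy(buf, input) // Sort works in place, so restore the input first
			Sort(buf)
		}
	})
	fmt.Println(res)
}
```

Posting something of this shape, together with the Go version, the processor, and the operating system, lets others rerun the measurement. The choice of input matters: already sorted input turns an unbalanced tree into a linked list, so random and sorted data can give very different profiles, and the question does not say which was used.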
