dpka7974 2016-03-12 03:24
Viewed 89 times
Accepted

What is the difference between the distribute-work-synchronize and fork-join parallel programming approaches?

In this article on Wikipedia: https://en.wikipedia.org/wiki/Go_(programming_language)#Suitability_for_parallel_programming

It is claimed that the Go expert used a distribute-work-synchronize pattern to organize his parallel programs, while the non-expert used fork-join: https://upload.wikimedia.org/wikipedia/commons/thumb/f/f1/Fork_join.svg/2000px-Fork_join.svg.png

I am familiar with fork-join from school, but I was wondering what the distribute-work-synchronize pattern is, and how it differs from the classical fork-join model I'm familiar with.

When I do fork-join, I typically run as many threads as I have cores. The paper says the Go expert did this as well, and it briefly mentions the overhead of creating new threads as one of the ways the Go expert optimized the code, but it doesn't seem to go into detail.


1 Answer

  • drmq16019 2016-03-12 10:56

    I would be very careful about taking the statement you've quoted from https://en.wikipedia.org/wiki/Go_(programming_language)#Suitability_for_parallel_programming as a general truth for Go programming.

    I assume that what the study describes as distribute-work-synchronize is the approach of dividing a problem into subtasks that are determined mainly by the parallelism achievable in hardware, and less by the natural way the problem decomposes into smaller tasks.
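    To make this concrete, here is a minimal Go sketch of my reading of distribute-work-synchronize (the summing example and the sumChunk helper are my own illustration, not taken from the study): the input is split up front into as many contiguous chunks as there are cores, each chunk is handled by one goroutine, and a single barrier synchronizes the results at the end.

    ```go
    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    // sumChunk is a stand-in for whatever per-chunk work the real problem needs.
    func sumChunk(nums []int) int {
        total := 0
        for _, n := range nums {
            total += n
        }
        return total
    }

    func main() {
        nums := make([]int, 1000000)
        for i := range nums {
            nums[i] = 1
        }

        // Distribute: one contiguous chunk per core, sized by the hardware
        // parallelism rather than by the problem's natural decomposition.
        workers := runtime.NumCPU()
        chunkSize := (len(nums) + workers - 1) / workers
        partial := make([]int, workers)

        var wg sync.WaitGroup
        for w := 0; w < workers; w++ {
            lo := w * chunkSize
            if lo >= len(nums) {
                break
            }
            hi := lo + chunkSize
            if hi > len(nums) {
                hi = len(nums)
            }
            wg.Add(1)
            go func(w, lo, hi int) { // Work: one goroutine per chunk.
                defer wg.Done()
                partial[w] = sumChunk(nums[lo:hi])
            }(w, lo, hi)
        }
        wg.Wait() // Synchronize: a single barrier at the end.

        total := 0
        for _, p := range partial {
            total += p
        }
        fmt.Println(total) // 1000000
    }
    ```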

    This approach may give you some performance benefits depending on your specific problem and your expertise, but it may not be trivial to apply even for embarrassingly parallel problems. In addition, this approach is more dependent on the specific hardware you're using (e.g. 2-32 vs 64-1024 cores, regular CAS vs LL/SC), on the specific problem size (e.g. fits in L1 vs barely fits in RAM), and most importantly, on your expertise with that approach and with the tool you're using to solve your problem.

    The above is standard "premature optimization is the root of all evil" / "simple, adequate and correct trumps complex, super-fast and with an insidious bug" advice, but the actual experiment cited in the paper also gives some examples of why you should use your own judgment about which approach to take.

    1. The study used Go 1.0.3. The significant improvements made since then to scheduling, garbage collection, and goroutine/channel performance may have effectively made the results obsolete.

      1.1. Ironically, the paper mentions that for one of the Chapel solutions, the expert version was ~68% slower when Chapel 1.6 (instead of 1.5) was used.

    2. The study does not claim to provide statistically significant results: for each of the 4 platforms, a single non-expert solved 6 synthetic problems that fit a specific blueprint, then rewrote his solutions according to the advice of a single expert.

    3. The expert should not be blamed for his advice being applied outside of its specific context: distribute-work-synchronize was the better approach for these specific problems, if you were a Staff Software Engineer on the Go team (Luuk van Dijk) and your alternative was plain divide-and-conquer with Go 1.0.3.

    "When I do fork-join, I typically run as many threads as I have cores. The paper says the Go expert did this as well, and it briefly mentions the overhead of creating new threads as one of the ways the Go expert optimized the code, but it doesn't seem to go into detail."

    I'm assuming the overhead of creating new threads is related to the proliferation of tasks toward the bottom of the recursion tree.

    I would expect algorithms that fall into Cases 2 and 3 of the Master Theorem to be particularly affected, but even algorithms that fall into Case 1 (where the work done at the leaf level of the tree is the most significant, i.e. the overhead of the spawned threads is diluted the most) will be affected. For example, creating a new logical task for the comparison of each pair of elements in a top-down merge sort is probably redundant.
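    As an illustration of the usual mitigation (a minimal sketch; the cutoff parameter and function names are my own assumptions, not taken from the paper): fork a new task only while the subproblem is large enough to amortize its overhead, and fall back to plain sequential recursion near the leaves. Choosing the cutoff is itself hardware- and problem-dependent, which echoes the portability concerns above.

    ```go
    // parallelMergeSort forks a goroutine for one half only while the slice
    // is larger than cutoff; below that, task overhead would dominate the
    // actual work, so it recurses sequentially instead.
    func parallelMergeSort(a []int, cutoff int) {
        if len(a) < 2 {
            return
        }
        mid := len(a) / 2
        if len(a) <= cutoff {
            // Near the leaves: plain sequential divide-and-conquer.
            parallelMergeSort(a[:mid], cutoff)
            parallelMergeSort(a[mid:], cutoff)
        } else {
            done := make(chan struct{})
            go func() { // Fork.
                parallelMergeSort(a[:mid], cutoff)
                close(done)
            }()
            parallelMergeSort(a[mid:], cutoff)
            <-done // Join.
        }
        merge(a, mid)
    }

    // merge merges the sorted halves a[:mid] and a[mid:] in place,
    // using a temporary buffer.
    func merge(a []int, mid int) {
        tmp := make([]int, 0, len(a))
        i, j := 0, mid
        for i < mid && j < len(a) {
            if a[i] <= a[j] {
                tmp = append(tmp, a[i])
                i++
            } else {
                tmp = append(tmp, a[j])
                j++
            }
        }
        tmp = append(tmp, a[i:mid]...)
        tmp = append(tmp, a[j:]...)
        copy(a, tmp)
    }
    ```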

    IMHO, running as many threads as you have cores, while logically dividing the work naturally/at each recursion level, is a great compromise between my understanding of distribute-work-synchronize and plain divide-and-conquer.
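    A minimal sketch of that compromise, assuming a hypothetical process function standing in for the per-task work: the problem is still divided into N tasks along its natural boundaries, but only K = runtime.NumCPU() worker goroutines are ever spawned, pulling tasks from a channel.

    ```go
    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    // process is a hypothetical stand-in for the work done on one
    // naturally-sized subtask.
    func process(task int) int {
        return task * task
    }

    func main() {
        tasks := make(chan int)
        results := make(chan int)

        // K worker goroutines, one per core; N logical tasks flow through them.
        var wg sync.WaitGroup
        for w := 0; w < runtime.NumCPU(); w++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for t := range tasks {
                    results <- process(t)
                }
            }()
        }

        // Close results once every worker has drained the task channel.
        go func() {
            wg.Wait()
            close(results)
        }()

        // Produce N tasks, divided along the problem's natural boundaries.
        go func() {
            for t := 0; t < 100; t++ {
                tasks <- t
            }
            close(tasks)
        }()

        sum := 0
        for r := range results {
            sum += r
        }
        fmt.Println(sum)
    }
    ```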

    You are still paying some price in complexity (and probably, but not necessarily, in running time) for the scheduling of your N tasks onto the K worker threads. This price may cost you missed parallelization opportunities at runtime, for example because of cache thrashing, sub-optimal scheduling, thread or core affinity in play, etc. However, this may be abstracted away from your program at the language platform level, at the Fork-Join framework level (in case you are using a framework), or maybe at the OS level. In that case, your solution is not fully responsible for adapting to changes in your language platform, in your problem size, or in your machine hardware, and you should be able to benefit from optimizations in the underlying layers without touching your solution.

    My bet is that the increased complexity and the reduced portability of a custom-tailored distribute-work-synchronize solution are worth it only if you can prove that your natural divide-and-conquer solution is not enough, and you are aware of the consequences.

    This answer was accepted by the asker.
