dpka7974 2016-03-12 03:24
Viewed 89 times
Accepted

What is the difference between the distribute-work-synchronize and fork-join parallel programming approaches?

In this article on Wikipedia: https://en.wikipedia.org/wiki/Go_(programming_language)#Suitability_for_parallel_programming

It is claimed that the Go expert used a distribute-work-synchronize pattern to organize his parallel programs, while the non-expert used fork-join: https://upload.wikimedia.org/wikipedia/commons/thumb/f/f1/Fork_join.svg/2000px-Fork_join.svg.png

I am familiar with fork-join from school, but I was wondering what the distribute-work-synchronize pattern is, and how it differs from the classical fork-join model I'm familiar with.

When I do fork-join, I typically run as many threads as I have cores. The paper says the Go expert did this as well, and it briefly mentions the overhead of creating new threads as one of the ways the Go expert optimized the code, but it doesn't seem to go into detail.


1 Answer

  • drmq16019 2016-03-12 10:56

    I would be very careful about taking the statement you've quoted from https://en.wikipedia.org/wiki/Go_(programming_language)#Suitability_for_parallel_programming as a general truth for Go programming.

    I assume that what the study describes as distribute-work-synchronize is the approach of dividing a problem into subtasks that are determined mainly by the parallelism achievable in hardware, and less by the natural way the problem decomposes into smaller tasks.
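    To make this concrete, here is a minimal Go sketch of my reading of distribute-work-synchronize (the summing example and the sumChunk helper are my own illustration, not taken from the study): the input is split up front into as many contiguous chunks as there are cores, each chunk is handled by one goroutine, and a single barrier synchronizes the results at the end.

    ```go
    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    // sumChunk is a stand-in for whatever per-chunk work the real problem needs.
    func sumChunk(nums []int) int {
        total := 0
        for _, n := range nums {
            total += n
        }
        return total
    }

    func main() {
        nums := make([]int, 1000000)
        for i := range nums {
            nums[i] = 1
        }

        // Distribute: one contiguous chunk per core, sized by the hardware
        // parallelism rather than by the problem's natural decomposition.
        workers := runtime.NumCPU()
        chunkSize := (len(nums) + workers - 1) / workers
        partial := make([]int, workers)

        var wg sync.WaitGroup
        for w := 0; w < workers; w++ {
            lo := w * chunkSize
            if lo >= len(nums) {
                break
            }
            hi := lo + chunkSize
            if hi > len(nums) {
                hi = len(nums)
            }
            wg.Add(1)
            go func(w, lo, hi int) { // Work: one goroutine per chunk.
                defer wg.Done()
                partial[w] = sumChunk(nums[lo:hi])
            }(w, lo, hi)
        }
        wg.Wait() // Synchronize: a single barrier at the end.

        total := 0
        for _, p := range partial {
            total += p
        }
        fmt.Println(total) // 1000000
    }
    ```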

    This approach may give you some performance benefits depending on your specific problem and your expertise, but it may not be trivial to apply even for embarrassingly parallel problems. In addition, this approach is more dependent on the specific hardware you're using (e.g. 2-32 vs 64-1024 cores, regular CAS vs LL/SC), on the specific problem size (e.g. fits in L1 vs barely fits in RAM), and most importantly, on your expertise with that approach and with the tool you're using to solve your problem.

    The above is standard "premature optimization is the root of all evil" / "simple, adequate and correct trumps complex, super-fast and with an insidious bug" advice, but the actual experiment cited in the paper also gives some examples of why you should use your own judgment about which approach to take.

    1. The study used Go 1.0.3. The significant improvements made since then to scheduling, garbage collection, and goroutine/channel performance may have effectively made the results obsolete.

      1.1. Ironically, the paper mentions that for one of the Chapel solutions, the expert version was ~68% slower when Chapel 1.6 (instead of 1.5) was used.

    2. The study does not claim to provide statistically significant results: for each of the 4 platforms, a single non-expert solved 6 synthetic problems that fit a specific blueprint, then rewrote his solutions according to the advice of a single expert.

    3. The expert should not be blamed for his advice being applied outside of its specific context: distribute-work-synchronize was the better approach for these specific problems, if you were a Staff Software Engineer on the Go team (Luuk van Dijk) and your alternative was plain divide-and-conquer with Go 1.0.3.

    "When I do fork-join, I typically run as many threads as I have cores. The paper says the Go expert did this as well, and it briefly mentions the overhead of creating new threads as one of the ways the Go expert optimized the code, but it doesn't seem to go into detail."

    I'm assuming the overhead of creating new threads is related to the proliferation of tasks toward the bottom of the recursion tree.

    I would expect algorithms that fall into Cases 2 and 3 of the Master Theorem to be particularly affected, but even algorithms that fall into Case 1 (where the work done at the leaf level of the tree is the most significant, i.e. the overhead of the spawned threads is diluted the most) will be affected. For example, creating a new logical task for the comparison of each pair of elements in a top-down merge sort is probably redundant.
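    As an illustration of the usual mitigation (a minimal sketch; the cutoff parameter and function names are my own assumptions, not taken from the paper): fork a new task only while the subproblem is large enough to amortize its overhead, and fall back to plain sequential recursion near the leaves. Choosing the cutoff is itself hardware- and problem-dependent, which echoes the portability concerns above.

    ```go
    // parallelMergeSort forks a goroutine for one half only while the slice
    // is larger than cutoff; below that, task overhead would dominate the
    // actual work, so it recurses sequentially instead.
    func parallelMergeSort(a []int, cutoff int) {
        if len(a) < 2 {
            return
        }
        mid := len(a) / 2
        if len(a) <= cutoff {
            // Near the leaves: plain sequential divide-and-conquer.
            parallelMergeSort(a[:mid], cutoff)
            parallelMergeSort(a[mid:], cutoff)
        } else {
            done := make(chan struct{})
            go func() { // Fork.
                parallelMergeSort(a[:mid], cutoff)
                close(done)
            }()
            parallelMergeSort(a[mid:], cutoff)
            <-done // Join.
        }
        merge(a, mid)
    }

    // merge merges the sorted halves a[:mid] and a[mid:] in place,
    // using a temporary buffer.
    func merge(a []int, mid int) {
        tmp := make([]int, 0, len(a))
        i, j := 0, mid
        for i < mid && j < len(a) {
            if a[i] <= a[j] {
                tmp = append(tmp, a[i])
                i++
            } else {
                tmp = append(tmp, a[j])
                j++
            }
        }
        tmp = append(tmp, a[i:mid]...)
        tmp = append(tmp, a[j:]...)
        copy(a, tmp)
    }
    ```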

    IMHO, running as many threads as you have cores, while logically dividing the work naturally/at each recursion level, is a great compromise between my understanding of distribute-work-synchronize and plain divide-and-conquer.
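    A minimal sketch of that compromise, assuming a hypothetical process function standing in for the per-task work: the problem is still divided into N tasks along its natural boundaries, but only K = runtime.NumCPU() worker goroutines are ever spawned, pulling tasks from a channel.

    ```go
    package main

    import (
        "fmt"
        "runtime"
        "sync"
    )

    // process is a hypothetical stand-in for the work done on one
    // naturally-sized subtask.
    func process(task int) int {
        return task * task
    }

    func main() {
        tasks := make(chan int)
        results := make(chan int)

        // K worker goroutines, one per core; N logical tasks flow through them.
        var wg sync.WaitGroup
        for w := 0; w < runtime.NumCPU(); w++ {
            wg.Add(1)
            go func() {
                defer wg.Done()
                for t := range tasks {
                    results <- process(t)
                }
            }()
        }

        // Close results once every worker has drained the task channel.
        go func() {
            wg.Wait()
            close(results)
        }()

        // Produce N tasks, divided along the problem's natural boundaries.
        go func() {
            for t := 0; t < 100; t++ {
                tasks <- t
            }
            close(tasks)
        }()

        sum := 0
        for r := range results {
            sum += r
        }
        fmt.Println(sum)
    }
    ```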

    You are still paying some price in complexity (and probably, but not necessarily, in running time) for the scheduling of your N tasks onto the K worker threads. This price may cost you missed parallelization opportunities at runtime, for example because of cache thrashing, sub-optimal scheduling, thread or core affinity in play, etc. However, this may be abstracted away from your program at the language platform level, at the Fork-Join framework level (in case you are using a framework), or maybe at the OS level. In that case, your solution is not fully responsible for adapting to changes in your language platform, in your problem size, or in your machine hardware, and you should be able to benefit from optimizations in the underlying layers without touching your solution.

    My bet is that the increased complexity and the reduced portability of a custom-tailored distribute-work-synchronize solution are worth it only if you can prove that your natural divide-and-conquer solution is not enough, and you are aware of the consequences.

    This answer was accepted by the asker.
