dtrhd2850 2015-04-13 22:37

Worker pool for potentially recursive tasks (i.e., each job can enqueue other jobs)

I'm writing an application that the user starts with a number of "jobs" (URLs, actually). At the beginning (in the main routine), I add these URLs to a queue, then start x goroutines that work on them.

In special cases, the resource a URL points to may contain even more URLs, which have to be added to the queue. The workers (3, in my case) wait for new jobs to come in and process them. The problem is: once EVERY worker is waiting for a job (and none is producing any), the workers should stop altogether. So either all of them work or none works.

My current implementation looks something like this, and I don't think it's elegant. Unfortunately, I couldn't think of a better way that avoids race conditions, and I'm not entirely sure this implementation actually works as intended:

var queue // from somewhere
const WORKER_COUNT = 3
var done chan struct{} // initialized elsewhere (a nil channel would block the send forever)

func work(working chan int) {
  absent := make(chan struct{}, 1)
  // If more than one job in a row is popped, send only one struct to the "absent" channel.
  // This implementation also assumes that select evaluates its cases "in order"
  // (channel 2 only if channel 1 yields nothing). That assumption is wrong: per the
  // spec, only the channel operand expressions are evaluated in source order; if
  // several cases are ready, one is chosen uniformly at random.
  one := false
  for {
    select {
    case u, ok := <-queue.Pop():
      if !ok {
        close(absent)
        return
      }
      if !one {
        // I have started working (delta + 1)
        working <- 1
        absent <- struct{}{}
        one = true
      }
      // do work with u (which may lead to queue.Push(urls...))
    case <-absent: // no jobs at the moment. consume absent => wait
      one = false
      working <- -1
    }
  }
}

func Start() {
  working := make(chan int)
  for i := 0; i < WORKER_COUNT; i++ {
    go work(working)
  }
  // the amount of actually working workers...
  sum := 0
  for {
    delta := <-working
    sum += delta
    if sum == 0 {
      queue.Close() // close channel -> kill workers.
      done <- struct{}{}
      return
    }
  }
}

Is there a better way to tackle this problem?

1 answer

  • duandianwen1723 2015-04-13 23:17

    You can use a sync.WaitGroup (see docs) to control the lifetime of the workers, and use a non-blocking send so workers can't deadlock when they try to queue up more jobs:

    package main
    
    import "sync"
    
    const workers = 4
    
    type job struct{}
    
    func (j *job) do(enqueue func(job)) {
        // do the job, calling enqueue() for subtasks as needed
    }
    
    func main() {
        jobs, wg := make(chan job), new(sync.WaitGroup)
        var enqueue func(job)
    
        // workers
        for i := 0; i < workers; i++ {
            go func() {
                for j := range jobs {
                    j.do(enqueue)
                    wg.Done()
                }
            }()
        }
    
        // how to queue a job
        enqueue = func(j job) {
            wg.Add(1)
            select {
            case jobs <- j: // another worker took it
            default: // no free worker; do the job now
                j.do(enqueue)
                wg.Done()
            }
        }
    
        todo := make([]job, 1000)
        for _, j := range todo {
            enqueue(j)
        }
        wg.Wait()
        close(jobs)
    }
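
    Two details in the code above are worth calling out: wg.Add(1) happens inside enqueue before the job is handed off, so the counter never reaches zero while a job is still pending; and the default branch runs the job on the calling goroutine, which is what lets a busy worker make progress on its own subtasks instead of blocking on a send that no idle worker can service.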
    

    It might appear as though buffering the jobs channel would prevent deadlocks when adding jobs, but it wouldn't: the buffer could fill up, and then you're back where you started. Buffering is fine, and can be efficient in some scenarios; it's just neither necessary nor sufficient to prevent a deadlock.
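
    To make that concrete, here is a minimal sketch (my own, not part of the original answer; the job type, buffer size, and spawn function are invented for illustration) of the buffered variant failing. With a plain blocking send, the lone worker eventually blocks inside enqueue while the buffer is full, and the runtime aborts:

    package main
    
    import "sync"
    
    func main() {
        // Buffered variant with a plain (blocking) send: deadlocks by design.
        jobs := make(chan func(), 2) // small buffer so the failure shows up quickly
        wg := new(sync.WaitGroup)
    
        var enqueue func(func())
        enqueue = func(fn func()) {
            wg.Add(1)
            jobs <- fn // blocks once the buffer is full and no worker can receive
        }
    
        // A single worker whose jobs each spawn more jobs than the buffer can hold.
        go func() {
            for fn := range jobs {
                fn()
                wg.Done()
            }
        }()
    
        var spawn func()
        spawn = func() {
            enqueue(spawn)
            enqueue(spawn)
        }
        enqueue(spawn)
    
        // Never returns: the worker ends up stuck sending into a full buffer,
        // and the runtime aborts with "all goroutines are asleep - deadlock!".
        wg.Wait()
    }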

    I ran into this situation in a function that kicks off a parallel sort, which is recursive in the same way your URL fetching is. It's probably harder to read than the example above, since details specific to sorting, like treating small tasks differently from big ones, are mixed in.
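
    Since the question is about URLs, here is a toy sketch of the same pattern applied to recursive link-following (my own illustration, not the sort code; the links map and seen-set are invented stand-ins for real fetching and deduplication):

    package main
    
    import (
        "fmt"
        "sync"
    )
    
    const workers = 4
    
    // links stands in for "fetch the page and extract its URLs".
    var links = map[string][]string{
        "a": {"b", "c"},
        "b": {"d"},
        "c": {"d", "e"},
    }
    
    func main() {
        jobs, wg := make(chan string), new(sync.WaitGroup)
        var mu sync.Mutex
        seen := map[string]bool{"a": true}
        var enqueue func(string)
    
        // process "fetches" a URL and enqueues any unseen URLs it contains.
        process := func(url string) {
            fmt.Println("fetching", url)
            for _, next := range links[url] {
                mu.Lock()
                first := !seen[next]
                seen[next] = true
                mu.Unlock()
                if first {
                    enqueue(next) // jobs queue further jobs, as in the question
                }
            }
        }
    
        for i := 0; i < workers; i++ {
            go func() {
                for u := range jobs {
                    process(u)
                    wg.Done()
                }
            }()
        }
    
        enqueue = func(u string) {
            wg.Add(1)
            select {
            case jobs <- u:
            default: // every worker is busy: do the job on this goroutine
                process(u)
                wg.Done()
            }
        }
    
        enqueue("a")
        wg.Wait()
        close(jobs)
    }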

    This answer was accepted by the asker.