The same RecvTensor (GrpcWorker) request was received twice

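Error when training a deep reinforcement learning algorithm with TensorFlow 1.15.5 + Python 3.8/3.7
The algorithm uses a multi-process distributed training architecture with one ps and two workers. On one of the workers, every time training reaches the second sess.run() call, it fails with the following error: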


tensorflow.python.framework.errors_impl.AbortedError: From /job:train/replica:0/task:0:
The same RecvTensor (GrpcWorker) request was received twice. step_id: 105411384561817065 rendezvous_key: "/job:ps/replica:0/task:0/device:GPU:0;9d0efc4e4612caec;/job:train/replica:0/task:0/device:GPU:0;edge_206_pred_0/d1/bias/read;0:0" request_id: 7357696461822534118
Additional GRPC error information:
{"created":"@1686189090.458307545","description":"Error received from peer","file":"external/grpc/src/core/lib/surface/call.cc","file_line":1039,"grpc_message":"The same RecvTensor (GrpcWorker) request was received twice. step_id: 105411384561817065 rendezvous_key: "/job:ps/replica:0/task:0/device:GPU:0;9d0efc4e4612caec;/job:train/replica:0/task:0/device:GPU:0;edge_206_pred_0/d1/bias/read;0:0" request_id: 7357696461822534118","grpc_status":10}
     [[{{node pred_0/d1/bias/read}}]]
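To see which tensor and which pair of tasks the failing transfer involves, the rendezvous_key in the log can be split into its fields. The sketch below assumes the key layout visible in this log (source device; source incarnation; destination device; tensor name; frame_id:iter_id). The helper names parse_rendezvous_key and job_of are hypothetical and not part of TensorFlow.

```python
from typing import NamedTuple


class RendezvousKey(NamedTuple):
    """Fields of a rendezvous key, in the order they appear in the log."""
    src_device: str
    src_incarnation: str
    dst_device: str
    tensor_name: str
    frame_iter: str


def parse_rendezvous_key(key: str) -> RendezvousKey:
    """Split 'src_device;src_incarnation;dst_device;tensor_name;frame:iter'."""
    parts = key.split(";")
    if len(parts) != 5:
        raise ValueError(f"expected 5 ';'-separated fields, got {len(parts)}")
    return RendezvousKey(*parts)


def job_of(device: str) -> str:
    """Return the job name from a device string such as /job:ps/replica:0/..."""
    for component in device.strip("/").split("/"):
        if component.startswith("job:"):
            return component[len("job:"):]
    raise ValueError(f"no job component in {device!r}")


if __name__ == "__main__":
    # rendezvous_key copied from the error message above
    key = ("/job:ps/replica:0/task:0/device:GPU:0;9d0efc4e4612caec;"
           "/job:train/replica:0/task:0/device:GPU:0;"
           "edge_206_pred_0/d1/bias/read;0:0")
    k = parse_rendezvous_key(key)
    print(job_of(k.src_device), "->", job_of(k.dst_device), k.tensor_name)
    # prints: ps -> train edge_206_pred_0/d1/bias/read
```

For this log it shows the value of pred_0/d1/bias being sent from the ps task to the train task, so the duplicated request is on the ps to worker link.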

For the full architecture of the algorithm, see https://github.com/mrahtz/learning-from-human-preferences

Answer from Mebius · 2023-06-13 03:10
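This error usually indicates a communication problem between processes in the multi-process distributed training setup. Specifically, the error message says that the same RecvTensor request was received twice, and the step was aborted with an AbortedError. Note that the rendezvous_key in the log names /job:ps/replica:0/task:0/device:GPU:0 as the sender and /job:train/replica:0/task:0/device:GPU:0 as the receiver, so the failing transfer is between the parameter server (ps) and the worker, not between the two workers.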

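There are several possible causes:

Network communication problems: the error may be caused by a network fault or a dropped connection. Check that the network connection is working and that every node can reach the others.

TensorFlow/Python version incompatibility: TensorFlow 1.15 was officially tested only with Python versions up to 3.7, so running it under Python 3.8 is outside the supported range. For TensorFlow 1.15, use Python 3.7 or earlier rather than upgrading Python, following the combinations in TensorFlow's official tested build configurations. If the error also occurs under Python 3.7, the Python version alone is probably not the cause.

Errors in the training code: the error may originate in your own training code. Review it carefully to make sure the multi-process distributed training setup and the communication between processes are handled correctly.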
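For the network cause, one quick check is to try a TCP connection to every address in the cluster from each machine. This is a minimal standard-library sketch, assuming the cluster is written as a dict of job name to "host:port" strings; the function names are hypothetical, and a successful connection only shows that the port accepts connections, not that the gRPC server behind it is healthy.

```python
import socket


def reachable(host: str, port: int, timeout: float = 2.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds."""
    try:
        # create_connection raises OSError on refusal, timeout or an unresolvable host
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_cluster(cluster: dict, timeout: float = 2.0) -> dict:
    """Map every "host:port" in {job_name: [addresses]} to whether it is reachable."""
    results = {}
    for addresses in cluster.values():
        for addr in addresses:
            # rpartition keeps everything before the last ':' as the host
            host, _, port = addr.rpartition(":")
            results[addr] = reachable(host, int(port), timeout)
    return results
```

Run it on each machine while all ps and worker processes are up, for example check_cluster({"ps": ["10.0.0.1:2222"], "train": ["10.0.0.2:2222", "10.0.0.3:2222"]}) with your real (here hypothetical) addresses; any False entry points to a host or port that cannot be reached from that machine.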

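Some possible solutions:

Make sure the network connection is working, and try restarting the affected nodes and services.

Switch to an officially supported TensorFlow/Python combination; for TensorFlow 1.15 that means Python 3.7 or earlier.

Review the training code carefully to make sure the multi-process distributed training is set up correctly and that communication between processes is handled properly.

Try running the code with a single worker to rule out problems caused by distributed training.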
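For the last two suggestions, two small helpers can catch common cluster-definition mistakes (two tasks sharing one host:port, malformed addresses) and build a reduced cluster that keeps only one worker task. This is a sketch assuming the cluster is a plain dict of job name to "host:port" list, the form tf.train.ClusterSpec accepts; the job names ps and train come from the log above, while the helper names and addresses are hypothetical. How the training script consumes the reduced dict depends on its code and is not shown.

```python
from collections import Counter


def find_cluster_problems(cluster: dict) -> list:
    """Return readable problems in a cluster dict {job_name: ["host:port", ...]}."""
    problems = []
    every_addr = []
    for job, addresses in cluster.items():
        for index, addr in enumerate(addresses):
            host, sep, port = addr.rpartition(":")
            if not sep or not host or not port.isdigit():
                problems.append(f"{job}/task:{index}: malformed address {addr!r}")
            every_addr.append(addr)
    # Two tasks listed with the same host:port cannot both bind a server there.
    for addr, count in Counter(every_addr).items():
        if count > 1:
            problems.append(f"address {addr} is used by {count} tasks")
    return problems


def keep_one_task(cluster: dict, job: str) -> dict:
    """Return a copy of `cluster` in which `job` keeps only its first task."""
    reduced = {name: list(addrs) for name, addrs in cluster.items()}
    reduced[job] = reduced[job][:1]
    return reduced


if __name__ == "__main__":
    # Hypothetical cluster with a copy-paste mistake in the second train task.
    cluster = {"ps": ["localhost:2222"], "train": ["localhost:2223", "localhost:2223"]}
    print(find_cluster_problems(cluster))  # ['address localhost:2223 is used by 2 tasks']
    print(keep_one_task(cluster, "train"))  # {'ps': ['localhost:2222'], 'train': ['localhost:2223']}
```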

Question created: June 8
