头顶榴莲树 2020-06-24 21:09 Acceptance rate: 0%
Views: 368

What are the reasons a YARN ResourceManager sends a SHUTDOWN signal to a NodeManager?

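Here is what happened: during one cluster restart I found that one NodeManager node would not start. On inspection, the cause was that the node had been put on the exclude list node_exclude.txt. I removed the node from the exclude list, and the NodeManager then started normally. (However, at the next cluster restart the node was put back on the exclude list, and I have not found out why. The cluster was built with CDH. The machine behind the faulty node had once broken down, so I deleted the node directly in the CDH client, then added a replacement server to the cluster in place of the original node. I do not know whether deleting the node directly is the cause.)
This is an excerpt from the ResourceManager log: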

2020-05-12 07:16:27,848 INFO org.mortbay.log: Started HttpServer2$SelectChannelConnectorWithSafeStartup@hadoop2:8088
2020-05-12 07:16:27,849 INFO org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: Updating the current master key for generating delegation tokens
2020-05-12 07:16:27,852 INFO org.apache.hadoop.yarn.webapp.WebApps: Web app /cluster started at 8088
2020-05-12 07:16:28,539 INFO org.apache.hadoop.yarn.webapp.WebApps: Registered webapp guice modules
2020-05-12 07:16:28,551 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: hadoop1:8041 Node Transitioned from NEW to RUNNING
2020-05-12 07:16:28,554 INFO org.apache.hadoop.yarn.server.resourcemanager.scheduler.fair.FairScheduler: Added node hadoop1:8041 cluster capacity: __
2020-05-12 07:16:28,570 INFO org.apache.hadoop.ipc.CallQueueManager: Using callQueue: class java.util.concurrent.LinkedBlockingQueue queueCapacity: 100
2020-05-12 07:16:28,575 INFO org.apache.hadoop.ipc.Server: Starting Socket Reader #1 for port 8033
2020-05-12 07:16:28,578 INFO org.apache.hadoop.yarn.factories.impl.pb.RpcServerFactoryPBImpl: Adding protocol org.apache.hadoop.yarn.server.api.ResourceManagerAdministrationProtocolPB to the server
2020-05-12 07:16:28,590 INFO org.apache.hadoop.ipc.Server: IPC Server Responder: starting
2020-05-12 07:16:28,599 INFO org.apache.hadoop.ipc.Server: IPC Server listener on 8033: starting
2020-05-12 07:16:29,856 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Disallowed NodeManager from hadoop2, Sending SHUTDOWN signal to the NodeManager.
2020-05-12 07:16:34,624 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Disallowed NodeManager from hadoop2, Sending SHUTDOWN signal to the NodeManager.
2020-05-12 07:16:40,332 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Disallowed NodeManager from hadoop2, Sending SHUTDOWN signal to the NodeManager.
2020-05-12 07:16:51,052 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Disallowed NodeManager from hadoop2, Sending SHUTDOWN signal to the NodeManager.
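
The repeated rejections above can be tallied per host when the log is long. The following is a minimal sketch, not part of the original post: the function name `count_rejections` and the `sample` list are illustrative, and the regular expression only assumes the message format shown in the log lines above.

```python
import re
from collections import Counter

# Matches the rejection message format seen in the ResourceManager log above.
DISALLOWED = re.compile(r"Disallowed NodeManager from (\S+), Sending SHUTDOWN signal")


def count_rejections(lines):
    """Count 'Disallowed NodeManager' messages per host (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = DISALLOWED.search(line)
        if match:
            counts[match.group(1)] += 1
    return counts


# Three lines copied from the log excerpt above, used as sample input.
sample = [
    "2020-05-12 07:16:28,551 INFO org.apache.hadoop.yarn.server.resourcemanager.rmnode.RMNodeImpl: hadoop1:8041 Node Transitioned from NEW to RUNNING",
    "2020-05-12 07:16:29,856 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Disallowed NodeManager from hadoop2, Sending SHUTDOWN signal to the NodeManager.",
    "2020-05-12 07:16:34,624 INFO org.apache.hadoop.yarn.server.resourcemanager.ResourceTrackerService: Disallowed NodeManager from hadoop2, Sending SHUTDOWN signal to the NodeManager.",
]

rejections = count_rejections(sample)
print(dict(rejections))
```

In practice the list would be replaced by the lines of the real ResourceManager log file, which makes it easy to see whether only one host is being rejected.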

This is the log of the NodeManager that was shut down:

2020-05-12 07:16:51,182 INFO org.mortbay.log: Stopped HttpServer2$SelectChannelConnectorWithSafeStartup@hadoop2:8042
2020-05-12 07:16:51,283 INFO org.apache.hadoop.ipc.Server: Stopping server on 8041
2020-05-12 07:16:51,285 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8041
2020-05-12 07:16:51,285 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService: org.apache.hadoop.yarn.server.nodemanager.containermanager.logaggregation.LogAggregationService waiting for pending aggregation during exit
2020-05-12 07:16:51,285 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2020-05-12 07:16:51,286 WARN org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl: org.apache.hadoop.yarn.server.nodemanager.containermanager.monitor.ContainersMonitorImpl is interrupted. Exiting.
2020-05-12 07:16:51,299 INFO org.apache.hadoop.ipc.Server: Stopping server on 8040
2020-05-12 07:16:51,299 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server listener on 8040
2020-05-12 07:16:51,301 INFO org.apache.hadoop.ipc.Server: Stopping IPC Server Responder
2020-05-12 07:16:51,301 INFO org.apache.hadoop.yarn.server.nodemanager.containermanager.localizer.ResourceLocalizationService: Public cache exiting
2020-05-12 07:16:51,301 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: Stopping NodeManager metrics system...
2020-05-12 07:16:51,303 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system stopped.
2020-05-12 07:16:51,303 INFO org.apache.hadoop.metrics2.impl.MetricsSystemImpl: NodeManager metrics system shutdown complete.
2020-05-12 07:16:51,304 FATAL org.apache.hadoop.yarn.server.nodemanager.NodeManager: Error starting NodeManager
org.apache.hadoop.yarn.exceptions.YarnRuntimeException: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from hadoop2, Sending SHUTDOWN signal to the NodeManager.
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:215)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.service.CompositeService.serviceStart(CompositeService.java:120)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.serviceStart(NodeManager.java:329)
at org.apache.hadoop.service.AbstractService.start(AbstractService.java:193)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.initAndStartNodeManager(NodeManager.java:563)
at org.apache.hadoop.yarn.server.nodemanager.NodeManager.main(NodeManager.java:609)
Caused by: org.apache.hadoop.yarn.exceptions.YarnRuntimeException: Recieved SHUTDOWN signal from Resourcemanager ,Registration of NodeManager failed, Message from ResourceManager: Disallowed NodeManager from hadoop2, Sending SHUTDOWN signal to the NodeManager.
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.registerWithRM(NodeStatusUpdaterImpl.java:283)
at org.apache.hadoop.yarn.server.nodemanager.NodeStatusUpdaterImpl.serviceStart(NodeStatusUpdaterImpl.java:209)
... 6 more
2020-05-12 07:16:51,305 INFO org.apache.hadoop.yarn.server.nodemanager.NodeManager: SHUTDOWN_MSG:
/************************************************************
SHUTDOWN_MSG: Shutting down NodeManager at hadoop2/192.168.111.102
************************************************************/
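
As far as I understand it, the "Disallowed NodeManager" message is logged when a registering host fails the ResourceManager's include/exclude check, which is driven by the files named in `yarn.resourcemanager.nodes.include-path` and `yarn.resourcemanager.nodes.exclude-path`. The sketch below only imitates that rule so a host can be checked by hand. It is an approximation under stated assumptions (an empty include list admits every host, an exclude entry always wins, `#` starts a comment), and `parse_hosts` and `is_node_allowed` are hypothetical names, not Hadoop APIs.

```python
def parse_hosts(text):
    """Parse a hosts-file body: whitespace-separated names, '#' starts a comment."""
    hosts = set()
    for line in text.splitlines():
        for token in line.split():
            if token.startswith("#"):
                break  # ignore the rest of the line
            hosts.add(token)
    return hosts


def is_node_allowed(hostname, ip, include, exclude):
    """Approximate the admission rule: in include (or include empty) and not in exclude."""
    names = {hostname, ip}
    in_include = not include or bool(names & include)
    in_exclude = bool(names & exclude)
    return in_include and not in_exclude


# Assumed file content for illustration: hadoop2 is on the exclude list.
exclude = parse_hosts("# excluded nodes\nhadoop2\n")

# Host name and IP are taken from the NodeManager SHUTDOWN_MSG above.
before_fix = is_node_allowed("hadoop2", "192.168.111.102", set(), exclude)
after_fix = is_node_allowed("hadoop2", "192.168.111.102", set(), set())
print(before_fix, after_fix)
```

After editing the exclude file, `yarn rmadmin -refreshNodes` asks the ResourceManager to reload it without a restart. If the file is generated by a management tool such as Cloudera Manager, a manual edit may be overwritten at the next restart, which would be consistent with the node reappearing on the list. That is a guess, not something the logs confirm.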

1 answer

  • dabocaiqq 2020-08-14 15:48