beat_it 2017-11-21 11:51

What causes the Hadoop NameNode to die?

Case 1:
Remote journal 192.168.8.195:8485 failed to write txns 1698499-1698499. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 48 is less than the last promised epoch 49
at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:429)
at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:457)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:352)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:149)

Case 2:
2017-11-21 19:26:01,859 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 43505 milliseconds
2017-11-21 19:26:01,860 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 21624 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2017-11-21 19:26:01,861 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [192.168.8.191:8485, 192.168.8.192:8485, 192.168.8.193:8485, 192.168.8.194:8485, 192.168.8.195:8485], stream=QuorumOutputStream starting at txid 110343))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:659)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:593)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4070)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4053)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:845)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:308)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:603)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)
2017-11-21 19:26:01,867 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 110343
2017-11-21 19:26:01,863 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 4 millisecond(s).
2017-11-21 19:26:01,870 INFO BlockStateChange: BLOCK* BlockManager: ask 192.168.8.191:50010 to delete [blk_1073750549_9840]
2017-11-21 19:26:01,947 INFO BlockStateChange: BLOCK* BlockManager: ask 192.168.8.194:50010 to delete [blk_1073750549_9840]
2017-11-21 19:26:01,947 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 21713ms to send a batch of 1 edits (127 bytes) to remote journal 192.168.8.193:8485
2017-11-21 19:26:01,945 WARN org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 21487ms
GC pool 'ParNew' had collection(s): count=1 time=21603ms
2017-11-21 19:26:01,944 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 21709ms to send a batch of 1 edits (127 bytes) to remote journal 192.168.8.192:8485
2017-11-21 19:26:01,938 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 21703ms to send a batch of 1 edits (127 bytes) to remote journal 192.168.8.194:8485
2017-11-21 19:26:01,938 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 21703ms to send a batch of 1 edits (127 bytes) to remote journal 192.168.8.195:8485
2017-11-21 19:26:01,934 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 21700ms to send a batch of 1 edits (127 bytes) to remote journal 192.168.8.191:8485
2017-11-21 19:26:01,960 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 610 Total time for transactions(ms): 110 Number of transactions batched in Syncs: 179 Number of syncs: 0 SyncTimes(ms):
2017-11-21 19:26:01,997 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-11-21 19:26:02,121 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:


1 answer

  • lshen01 2023-03-16 09:53

    Based on GPT and my own reasoning:

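    In both cases the NameNode died because of a problem in its communication with the JournalNodes.

    In Case 1, the JournalNode at 192.168.8.195:8485 rejected the NameNode's write of transaction 1698499 because the request carried epoch 48, while that JournalNode had already promised epoch 49 to another writer. This is the fencing mechanism of the Quorum Journal Manager: a higher epoch means another NameNode (typically the standby, after a failover) has become the active writer, so edits from the old active NameNode are refused.

    In Case 2, the NameNode waited 21624 ms for a quorum of the five JournalNodes to acknowledge sendEdits, exceeded the 20000 ms timeout, and exited with status 1 because the required journal could not be flushed. JournalNode failures or network latency can cause this, but the same log also shows JvmPauseMonitor detecting a pause of about 21487 ms (ParNew GC, 21603 ms), and all five JournalNodes took roughly 21.7 s to answer, so a long garbage-collection pause on the NameNode itself is the more likely trigger.

    Therefore, check the state of the JournalNodes to pinpoint the problem, and check the network and the NameNode's GC behavior to make sure the environment is stable. In production, also keep the JournalNodes redundant (this cluster already runs five) and monitored, so that no single point of failure can take the NameNode down.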
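A minimal sketch of the epoch check behind Case 1, under a simplified model: `JournalSketch`, `new_epoch`, `journal` and `JournalFencedError` are hypothetical names, not Hadoop's API, and only the rejection message mirrors the log text (the real check lives in `Journal.checkRequest`, per the stack trace). A NameNode that becomes active first obtains a higher epoch from the JournalNodes; from then on, any write carrying an older epoch is refused.

```python
# Toy model of a JournalNode's epoch fencing; all names are hypothetical
# and the logic is simplified compared with Hadoop's Journal class.


class JournalFencedError(Exception):
    """Raised when a request comes from a fenced (older-epoch) writer."""


class JournalSketch:
    def __init__(self):
        self.last_promised_epoch = 0
        self.txns = []

    def new_epoch(self, epoch):
        # A NameNode becoming active must obtain a strictly higher epoch.
        if epoch <= self.last_promised_epoch:
            raise JournalFencedError(
                f"proposed epoch {epoch} is not above "
                f"last promised epoch {self.last_promised_epoch}")
        self.last_promised_epoch = epoch

    def journal(self, epoch, txid):
        # Edits from an older epoch are refused: that writer has been fenced.
        if epoch < self.last_promised_epoch:
            raise JournalFencedError(
                f"IPC's epoch {epoch} is less than the last promised "
                f"epoch {self.last_promised_epoch}")
        self.txns.append(txid)


if __name__ == "__main__":
    jn = JournalSketch()
    jn.new_epoch(48)             # old active NameNode
    jn.journal(48, 1698498)      # accepted
    jn.new_epoch(49)             # another NameNode takes over
    try:
        jn.journal(48, 1698499)  # old active NameNode tries to write again
    except JournalFencedError as err:
        print(err)  # IPC's epoch 48 is less than the last promised epoch 49
```

In this model the old NameNode is never told to stop; it simply finds every JournalNode refusing its edits, which is why the warning in Case 1 usually precedes the old active NameNode shutting itself down.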
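A minimal sketch of why Case 2 timed out even though every JournalNode eventually answered, assuming a toy model in which a JVM pause on the NameNode delays the handling of every acknowledgement by the length of the pause. `majority` and `quorum_write_ok` are hypothetical helpers, not Hadoop code, and the JournalNode latencies in the usage lines are made up; only the 20000 ms timeout and the 21487 ms pause come from the log.

```python
# Toy model of a majority (quorum) write with a timeout; not Hadoop code.
QJM_WRITE_TIMEOUT_MS = 20000  # "timeout=20000 ms" from the Case 2 log


def majority(n_journals):
    """Smallest number of JournalNodes that forms a quorum."""
    return n_journals // 2 + 1


def quorum_write_ok(jn_latencies_ms, nn_pause_ms,
                    timeout_ms=QJM_WRITE_TIMEOUT_MS):
    """Return True if a majority of acks are handled within timeout_ms.

    Assumption: a JVM pause on the NameNode delays the handling of every
    ack by nn_pause_ms, so each ack is seen at pause + latency.
    """
    observed = [nn_pause_ms + lat for lat in jn_latencies_ms]
    on_time = sum(1 for t in observed if t <= timeout_ms)
    return on_time >= majority(len(jn_latencies_ms))


if __name__ == "__main__":
    healthy_jns = [5, 5, 6, 7, 8]  # hypothetical per-JournalNode latencies (ms)
    print(majority(len(healthy_jns)))           # 3
    print(quorum_write_ok(healthy_jns, 0))      # True
    print(quorum_write_ok(healthy_jns, 21487))  # False: pause > 20000 ms
```

With five JournalNodes a write needs three on-time acknowledgements, so healthy JournalNodes do not help once the NameNode itself stalls past the timeout. The timeout is configured by `dfs.qjournal.write-txns.timeout.ms`; raising it may only mask a GC pause rather than remove it.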
