beat_it 2017-11-21 11:51

Why does the Hadoop NameNode die?

**Case 1:**
Remote journal 192.168.8.195:8485 failed to write txns 1698499-1698499. Will try to write to this JN again after the next log roll.
org.apache.hadoop.ipc.RemoteException(java.io.IOException): IPC's epoch 48 is less than the last promised epoch 49
at org.apache.hadoop.hdfs.qjournal.server.Journal.checkRequest(Journal.java:429)
at org.apache.hadoop.hdfs.qjournal.server.Journal.checkWriteRequest(Journal.java:457)
at org.apache.hadoop.hdfs.qjournal.server.Journal.journal(Journal.java:352)
at org.apache.hadoop.hdfs.qjournal.server.JournalNodeRpcServer.journal(JournalNodeRpcServer.java:149)

**Case 2:**
2017-11-21 19:26:01,859 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 43505 milliseconds
2017-11-21 19:26:01,860 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 21624 ms (timeout=20000 ms) for a response for sendEdits. No responses yet.
2017-11-21 19:26:01,861 FATAL org.apache.hadoop.hdfs.server.namenode.FSEditLog: Error: flush failed for required journal (JournalAndStream(mgr=QJM to [192.168.8.191:8485, 192.168.8.192:8485, 192.168.8.193:8485, 192.168.8.194:8485, 192.168.8.195:8485], stream=QuorumOutputStream starting at txid 110343))
java.io.IOException: Timed out waiting 20000ms for a quorum of nodes to respond.
at org.apache.hadoop.hdfs.qjournal.client.AsyncLoggerSet.waitForWriteQuorum(AsyncLoggerSet.java:137)
at org.apache.hadoop.hdfs.qjournal.client.QuorumOutputStream.flushAndSync(QuorumOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:113)
at org.apache.hadoop.hdfs.server.namenode.EditLogOutputStream.flush(EditLogOutputStream.java:107)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream$8.apply(JournalSet.java:533)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.mapJournalsAndReportErrors(JournalSet.java:393)
at org.apache.hadoop.hdfs.server.namenode.JournalSet.access$100(JournalSet.java:57)
at org.apache.hadoop.hdfs.server.namenode.JournalSet$JournalSetOutputStream.flush(JournalSet.java:529)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:659)
at org.apache.hadoop.hdfs.server.namenode.FSEditLog.logSync(FSEditLog.java:593)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.deleteInt(FSNamesystem.java:4070)
at org.apache.hadoop.hdfs.server.namenode.FSNamesystem.delete(FSNamesystem.java:4053)
at org.apache.hadoop.hdfs.server.namenode.NameNodeRpcServer.delete(NameNodeRpcServer.java:845)
at org.apache.hadoop.hdfs.server.namenode.AuthorizationProviderProxyClientProtocol.delete(AuthorizationProviderProxyClientProtocol.java:308)
at org.apache.hadoop.hdfs.protocolPB.ClientNamenodeProtocolServerSideTranslatorPB.delete(ClientNamenodeProtocolServerSideTranslatorPB.java:603)
at org.apache.hadoop.hdfs.protocol.proto.ClientNamenodeProtocolProtos$ClientNamenodeProtocol$2.callBlockingMethod(ClientNamenodeProtocolProtos.java)
at org.apache.hadoop.ipc.ProtobufRpcEngine$Server$ProtoBufRpcInvoker.call(ProtobufRpcEngine.java:617)
at org.apache.hadoop.ipc.RPC$Server.call(RPC.java:1073)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2216)
at org.apache.hadoop.ipc.Server$Handler$1.run(Server.java:2212)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:422)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1920)
at org.apache.hadoop.ipc.Server$Handler.run(Server.java:2210)
2017-11-21 19:26:01,867 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Aborting QuorumOutputStream starting at txid 110343
2017-11-21 19:26:01,863 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 4 millisecond(s).
2017-11-21 19:26:01,870 INFO BlockStateChange: BLOCK* BlockManager: ask 192.168.8.191:50010 to delete [blk_1073750549_9840]
2017-11-21 19:26:01,947 INFO BlockStateChange: BLOCK* BlockManager: ask 192.168.8.194:50010 to delete [blk_1073750549_9840]
2017-11-21 19:26:01,947 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 21713ms to send a batch of 1 edits (127 bytes) to remote journal 192.168.8.193:8485
2017-11-21 19:26:01,945 WARN org.apache.hadoop.util.JvmPauseMonitor: Detected pause in JVM or host machine (eg GC): pause of approximately 21487ms
GC pool 'ParNew' had collection(s): count=1 time=21603ms
2017-11-21 19:26:01,944 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 21709ms to send a batch of 1 edits (127 bytes) to remote journal 192.168.8.192:8485
2017-11-21 19:26:01,938 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 21703ms to send a batch of 1 edits (127 bytes) to remote journal 192.168.8.194:8485
2017-11-21 19:26:01,938 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 21703ms to send a batch of 1 edits (127 bytes) to remote journal 192.168.8.195:8485
2017-11-21 19:26:01,934 WARN org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Took 21700ms to send a batch of 1 edits (127 bytes) to remote journal 192.168.8.191:8485
2017-11-21 19:26:01,960 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 610 Total time for transactions(ms): 110 Number of transactions batched in Syncs: 179 Number of syncs: 0 SyncTimes(ms):
2017-11-21 19:26:01,997 INFO org.apache.hadoop.util.ExitUtil: Exiting with status 1
2017-11-21 19:26:02,121 INFO org.apache.hadoop.hdfs.server.namenode.NameNode: SHUTDOWN_MSG:


1 answer

  • lshen01 2023-03-16 09:53

    Based on GPT and my own reasoning:

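    In both cases the NameNode died because it could not get its edit-log writes acknowledged by the JournalNodes, but the two logs point to different causes.

    In case 1, JournalNode 192.168.8.195:8485 rejected txn 1698499 because the request carried epoch 48 while that JournalNode had already promised epoch 49. A higher promised epoch usually means another NameNode has since taken over as the active writer, so this NameNode has been fenced. Once a quorum of JournalNodes rejects its writes this way, it can no longer log edits and shuts down. The other NameNode's log around the same time should show why it took over.

    In case 2, none of the five JournalNodes acknowledged sendEdits within the 20000 ms timeout, so the flush to the required journal failed and the NameNode exited with status 1. A JournalNode fault or network latency could cause this, but here the JvmPauseMonitor line reports a pause of about 21487 ms on the NameNode itself, with a single ParNew collection taking 21603 ms. All five writes took roughly 21700 ms, which matches that pause, so a stall in the NameNode's own JVM or host (GC, or something like swapping) looks more likely than a JournalNode problem.

    So check the state of the JournalNodes and the network to confirm the environment is stable, and also look at the NameNode's heap size, GC settings, and host load. In production, keep the JournalNode quorum redundant and monitored so that a single failure cannot take the journal down.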
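
To tell the two failure modes apart in a longer log, something like the following could help. It is only a rough sketch: the function name `diagnose` and the regular expressions are my own, written against the log lines quoted in the question, and the 20000 ms default is taken from the `timeout=20000 ms` shown in the log (which I believe corresponds to `dfs.qjournal.write-txn.timeout.ms`, but check your own configuration).

```python
import re

# Assumption: journal write timeout in ms, copied from "timeout=20000 ms" in the log.
QJM_WRITE_TIMEOUT_MS = 20000

# Patterns written against the log lines quoted in the question (hypothetical helpers).
PAUSE_RE = re.compile(r"JvmPauseMonitor: .*pause of approximately (\d+)ms")
EPOCH_RE = re.compile(r"IPC's epoch (\d+) is less than the last promised epoch (\d+)")


def diagnose(log_lines, timeout_ms=QJM_WRITE_TIMEOUT_MS):
    """Return a list of findings for a NameNode log excerpt."""
    findings = []
    for line in log_lines:
        pause = PAUSE_RE.search(line)
        if pause:
            pause_ms = int(pause.group(1))
            # A pause at least as long as the timeout can by itself make every
            # JournalNode write look timed out from the NameNode's side.
            if pause_ms >= timeout_ms:
                findings.append(
                    f"JVM pause of {pause_ms} ms exceeds the "
                    f"{timeout_ms} ms journal write timeout"
                )
        epoch = EPOCH_RE.search(line)
        if epoch:
            ours, promised = int(epoch.group(1)), int(epoch.group(2))
            # A lower epoch means a newer writer has been promised: this NameNode is fenced.
            findings.append(
                f"fenced: writer epoch {ours} is behind promised epoch {promised}"
            )
    return findings


sample = [
    "2017-11-21 19:26:01,945 WARN org.apache.hadoop.util.JvmPauseMonitor: "
    "Detected pause in JVM or host machine (eg GC): pause of approximately 21487ms",
    "org.apache.hadoop.ipc.RemoteException(java.io.IOException): "
    "IPC's epoch 48 is less than the last promised epoch 49",
]

for finding in diagnose(sample):
    print(finding)
```

In practice you would pass the lines of the real NameNode log file instead of `sample`. If only the pause finding shows up, look at the NameNode JVM and host first; if only the epoch finding shows up, look at what caused the failover.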
