Problem description:
The NameNode fails to connect to the JournalNode, and ZKFC fails to connect to the NameNode; both report the same error.
NameNode error log:
2019-07-16 18:55:52,617 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hostname/ip:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-07-16 18:55:52,616 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hostname/ip:8485. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-07-16 18:55:53,438 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 6001 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
2019-07-16 18:55:53,618 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hostname/ip:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-07-16 18:55:53,618 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hostname/ip:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-07-16 18:55:53,619 INFO org.apache.hadoop.ipc.Client: Retrying connect to server: hostname/ip:8485. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2019-07-16 18:55:54,439 INFO org.apache.hadoop.hdfs.qjournal.client.QuorumJournalManager: Waited 7003 ms (timeout=20000 ms) for a response for selectInputStreams. No responses yet.
JournalNode error log:
2019-07-16 18:56:10,836 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ip:port:null (GSS initiate failed) with true cause: (GSS initiate failed)
2019-07-16 18:56:11,939 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ip:port:null (GSS initiate failed) with true cause: (GSS initiate failed)
2019-07-16 18:56:12,391 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ip:port:null (GSS initiate failed) with true cause: (GSS initiate failed)
2019-07-16 18:56:13,341 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ip:port:null (GSS initiate failed) with true cause: (GSS initiate failed)
2019-07-16 18:56:16,212 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ip:port:null (GSS initiate failed) with true cause: (GSS initiate failed)
2019-07-16 18:56:17,871 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ip:port:null (GSS initiate failed) with true cause: (GSS initiate failed)
2019-07-16 18:56:20,902 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ip:port:null (GSS initiate failed) with true cause: (GSS initiate failed)
2019-07-16 18:56:21,081 WARN SecurityLogger.org.apache.hadoop.ipc.Server: Auth failed for ip:port:null (GSS initiate failed) with true cause: (GSS initiate failed)
I checked the KDC log; the problem may be here:
Jul 16 17:03:50 hadoop01 krb5kdc[47](info): TGS_REQ (8 etypes {18 17 20 19 16 23 25 26}) 10.10.10.40: LOOKING_UP_SERVER: authtime 0, root/hadoop00@HADOOP.COM for host/hadoop01@HADOOP.COM, Server not found in Kerberos database
Jul 16 17:03:50 hadoop01 krb5kdc[47](info): TGS_REQ (8 etypes {18 17 20 19 16 23 25 26}) 10.10.10.40: LOOKING_UP_SERVER: authtime 0, root/hadoop00@HADOOP.COM for host/hadoop00@HADOOP.COM, Server not found in Kerberos database
Jul 16 17:03:52 hadoop01 krb5kdc[47](info): AS_REQ (3 etypes {17 16 23}) 10.10.10.40: ISSUE: authtime 1563267832, etypes {rep=17 tkt=18 ses=17}, root/hadoop00@HADOOP.COM for krbtgt/HADOOP.COM@HADOOP.COM
Jul 16 17:03:53 hadoop01 krb5kdc[47](info): TGS_REQ (3 etypes {17 16 23}) 10.10.10.40: ISSUE: authtime 1563267832 , etypes {rep=17 tkt=18 ses=17}, root/hadoop00@HADOOP.COM for root/hadoop01@HADOOP.COM
Jul 16 17:03:54 hadoop01 krb5kdc[47](info): TGS_REQ (8 etypes {18 17 20 19 16 23 25 26}) 10.10.10.40: LOOKING_UP_SERVER: authtime 0, root/hadoop00@HADOOP.COM for host/hadoop10@HADOOP.COM, Server not found in Kerberos database
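So I suspect the problem lies here. Locally, kinit works for both the root and HTTP principals. Under normal circumstances the request should be for HTTP/hadoop01@HADOOP.COM rather than host/hadoop01@HADOOP.COM, and I don't know why host shows up here. Could a Kerberos expert please advise?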
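To make the failing requests in the KDC log easier to see, here is a minimal Python sketch that pulls out the TGS_REQ lines whose status is LOOKING_UP_SERVER and reports which client asked for which server principal. The regular expression is only fitted to the log lines pasted above, and the names `failed_lookups`, `primary_of`, `count_failed_primaries` and `SAMPLE` are hypothetical helpers made up for this illustration, not part of Hadoop or Kerberos.

```python
import re
from collections import Counter

# Assumption: fitted to the krb5kdc TGS_REQ lines shown in the KDC log above;
# other KDC versions or log formats may need a different pattern.
TGS_LINE = re.compile(
    r"TGS_REQ \(.*?\) (?P<addr>\S+): (?P<status>[A-Z_]+): .*?"
    r"(?P<client>\S+@\S+) for (?P<server>\S+@[^,\s]+)"
)


def failed_lookups(lines):
    """Return (client, server) pairs for requests the KDC could not resolve."""
    pairs = []
    for line in lines:
        match = TGS_LINE.search(line)
        # LOOKING_UP_SERVER is the status on the "Server not found" lines.
        if match and match.group("status") == "LOOKING_UP_SERVER":
            pairs.append((match.group("client"), match.group("server")))
    return pairs


def primary_of(principal):
    """Return the first component of 'primary/instance@REALM'."""
    name = principal.partition("@")[0]
    return name.partition("/")[0]


def count_failed_primaries(lines):
    """Count failed lookups by the primary of the requested server principal."""
    return Counter(primary_of(server) for _, server in failed_lookups(lines))


# Two lines copied from the KDC log above: one failed lookup, one issued ticket.
SAMPLE = [
    "Jul 16 17:03:50 hadoop01 krb5kdc[47](info): TGS_REQ (8 etypes {18 17 20 19 16 23 25 26}) "
    "10.10.10.40: LOOKING_UP_SERVER: authtime 0, root/hadoop00@HADOOP.COM for "
    "host/hadoop01@HADOOP.COM, Server not found in Kerberos database",
    "Jul 16 17:03:53 hadoop01 krb5kdc[47](info): TGS_REQ (3 etypes {17 16 23}) "
    "10.10.10.40: ISSUE: authtime 1563267832 , etypes {rep=17 tkt=18 ses=17}, "
    "root/hadoop00@HADOOP.COM for root/hadoop01@HADOOP.COM",
]

if __name__ == "__main__":
    for client, server in failed_lookups(SAMPLE):
        print(f"{client} requested {server}: not in the KDC database")
    print(dict(count_failed_primaries(SAMPLE)))
```

Run against the full KDC log, this would show whether every failed lookup is for a host/ principal and which client principal sent it, which may help narrow down the process that is asking for host/ instead of HTTP/.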