NameNode keeps crashing during MapReduce execution when testing the 'dfs[a-z.]+' example from the Hadoop manual

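I set up a Hadoop high-availability cluster and am testing it with the example from the Hadoop manual. Uploading files works fine, but running this command always fails: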

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep /mr/hw/input /mr/hw/output 'dfs[a-z.]+'

Error output:

[root@node-2 hadoop]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep /mr/hw/input /mr/hw/output 'dfs[a-z.]+'
2020-04-07 13:32:48,240 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1586236965054_0001
2020-04-07 13:32:49,380 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-04-07 13:32:57,428 INFO input.FileInputFormat: Total input files to process : 9
2020-04-07 13:32:57,766 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-04-07 13:32:58,762 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-04-07 13:32:59,219 INFO mapreduce.JobSubmitter: number of splits:9
2020-04-07 13:33:00,319 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-04-07 13:33:01,025 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1586236965054_0001
2020-04-07 13:33:01,025 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-04-07 13:33:01,983 INFO conf.Configuration: resource-types.xml not found
2020-04-07 13:33:01,985 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-04-07 13:33:03,519 INFO impl.YarnClientImpl: Submitted application application_1586236965054_0001
2020-04-07 13:33:03,604 INFO mapreduce.Job: The url to track the job: http://node-1:8088/proxy/application_1586236965054_0001/
2020-04-07 13:33:03,604 INFO mapreduce.Job: Running job: job_1586236965054_0001
2020-04-07 13:33:21,507 INFO mapreduce.Job: Job job_1586236965054_0001 running in uber mode : false
2020-04-07 13:33:21,536 INFO mapreduce.Job:  map 0% reduce 0%
2020-04-07 13:33:38,721 INFO mapreduce.Job:  map 11% reduce 0%
2020-04-07 13:41:07,792 INFO mapreduce.Job:  map 11% reduce 4%
2020-04-07 13:43:20,887 INFO mapreduce.Job:  map 22% reduce 4%
2020-04-07 13:43:23,020 INFO mapreduce.Job:  map 33% reduce 4%
2020-04-07 13:43:25,509 INFO mapreduce.Job:  map 44% reduce 4%
2020-04-07 13:44:10,305 INFO mapreduce.Job:  map 56% reduce 4%
2020-04-07 13:46:18,587 INFO mapred.ClientServiceDelegate: Could not get Job info from RM for job job_1586236965054_0001. Redirecting to job history server.
2020-04-07 13:46:24,155 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:25,267 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:26,311 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:27,311 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:28,378 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:29,433 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:30,435 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:31,501 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:32,567 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:33,656 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:34,536 INFO mapred.ClientServiceDelegate: Could not get Job info from RM for job job_1586236965054_0001. Redirecting to job history server.
2020-04-07 13:46:35,584 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:36,722 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:37,756 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:38,778 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:39,823 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:40,848 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:41,892 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:42,894 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:43,959 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:45,017 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:45,806 INFO mapred.ClientServiceDelegate: Could not get Job info from RM for job job_1586236965054_0001. Redirecting to job history server.
2020-04-07 13:46:46,972 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:48,017 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:49,074 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:50,131 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:51,187 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:52,299 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:53,395 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:54,440 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:55,464 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:56,508 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
java.io.IOException: java.net.ConnectException: Your endpoint configuration is wrong; For more details see:  http://wiki.apache.org/hadoop/UnsetHostnameOrPort
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:345)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getTaskCompletionEvents(ClientServiceDelegate.java:398)
    at org.apache.hadoop.mapred.YARNRunner.getTaskCompletionEvents(YARNRunner.java:878)
    at org.apache.hadoop.mapreduce.Job$6.run(Job.java:732)
    at org.apache.hadoop.mapreduce.Job$6.run(Job.java:729)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:729)
    at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1652)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1591)
    at org.apache.hadoop.examples.Grep.run(Grep.java:78)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.examples.Grep.main(Grep.java:103)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.net.ConnectException: Your endpoint configuration is wrong; For more details see:  http://wiki.apache.org/hadoop/UnsetHostnameOrPort
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:833)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:753)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1549)
    at org.apache.hadoop.ipc.Client.call(Client.java:1491)
    at org.apache.hadoop.ipc.Client.call(Client.java:1388)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
    at com.sun.proxy.$Proxy17.getTaskAttemptCompletionEvents(Unknown Source)
    at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBClientImpl.java:177)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:326)
    ... 26 more
Caused by: java.net.ConnectException: 拒绝连接
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:700)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:804)
    at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:421)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1606)
    at org.apache.hadoop.ipc.Client.call(Client.java:1435)
    ... 35 more

Answer by 大大怪打LZR, 2023-08-13 21:46
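Based on the log you posted, the main error is:

java.io.IOException: java.net.ConnectException: Your endpoint configuration is wrong; For more details see: http://wiki.apache.org/hadoop/UnsetHostnameOrPort

This error usually means the Hadoop client could not reach the ResourceManager or the JobHistoryServer while the job was running. In your log, the client first reports "Could not get Job info from RM", then falls back to the job history server at 0.0.0.0:10020 (the default value of mapreduce.jobhistory.address), and that connection is refused. The link in the error message has more background, and you can try the suggestions there.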
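To see which endpoint the client was actually failing to reach, the retry lines in the log can be filtered. The following is only a minimal sketch: `retried_endpoints` is a hypothetical helper name, not a Hadoop command, and it assumes the log lines have the same shape as the ones posted in the question.

```shell
# Hypothetical helper (not part of Hadoop): reads job client output on stdin
# and prints each distinct endpoint from "Retrying connect to server" lines.
retried_endpoints() {
  grep -o 'Retrying connect to server: [^ ]*' |
    awk '{ print $5 }' |
    sed 's/\.$//' |
    sort -u
}

# Example input: one line copied from the log in the question.
printf '%s\n' '2020-04-07 13:46:24,155 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)' |
  retried_endpoints
```

On a real run, the client output could be saved first (for example by appending `2>&1 | tee job.log` to the job command) and then passed in with `retried_endpoints < job.log`. An address of 0.0.0.0 suggests the client is falling back to a default rather than a configured host.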

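Common causes include:

Hostname or port misconfiguration: Check your Hadoop configuration files. yarn.resourcemanager.hostname and yarn.resourcemanager.address are set in yarn-site.xml and must point to the ResourceManager's actual host and port. Because the client is retrying 0.0.0.0:10020, also check that mapreduce.jobhistory.address in mapred-site.xml names the host where the JobHistoryServer runs.

Firewall or network problems: Network blocking, firewalls, or other connectivity issues between cluster nodes can cause connections to fail. Make sure all nodes can communicate with each other.

Service not started: Make sure the ResourceManager and the JobHistoryServer are running on the cluster. You can verify the ResourceManager through its web UI (usually on port 8088) and the JobHistoryServer through its web UI (usually on port 19888).

Hostname resolution problems: Make sure every hostname resolves to the correct IP address on every node. You can run ping on each node to confirm that the hostnames resolve correctly.

Node health: An unhealthy node can disrupt communication with the others. Make sure all nodes report a healthy state.

Configuration file synchronization: If the Hadoop configuration files differ between nodes, communication problems can follow. Make sure the configuration files are identical on all nodes.

Insufficient node resources: If a node is short on resources, the services on it may not run properly. Make sure each node has enough resources to run the Hadoop services.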
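For the configuration checks in the list above, a small helper can print what a site file actually sets. This is a rough sketch rather than a real XML parser: `get_prop` is a hypothetical function, it assumes each `<value>` element sits on one line at or after its `<name>` line, and the sample file with `node-1:10020` is made up for illustration (node-1 appears in the question's log as the ResourceManager host, but where the JobHistoryServer runs is an assumption).

```shell
# Hypothetical helper: print the value of one property from a Hadoop
# *-site.xml file. $1 = file path, $2 = property name.
# Prints nothing if the property is absent (Hadoop then uses its default).
get_prop() {
  awk -v key="$2" '
    index($0, "<name>" key "</name>") { found = 1 }
    found && match($0, /<value>[^<]*<\/value>/) {
      # strip the 7-character <value> and 8-character </value> tags
      print substr($0, RSTART + 7, RLENGTH - 15)
      exit
    }
  ' "$1"
}

# Illustrative file only; the host name is an assumption.
example=$(mktemp)
cat > "$example" <<'EOF'
<configuration>
  <property>
    <name>mapreduce.jobhistory.address</name>
    <value>node-1:10020</value>
  </property>
</configuration>
EOF
get_prop "$example" mapreduce.jobhistory.address
rm -f "$example"
```

Running the same call against etc/hadoop/mapred-site.xml and etc/hadoop/yarn-site.xml on each node is one way to compare settings across the cluster. Empty output for mapreduce.jobhistory.address would be consistent with the 0.0.0.0:10020 default seen in the log.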

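In short, these are the common causes of this error. Work through them one at a time and fix whatever you find. If the problem persists, look more closely at the cluster's configuration and state to narrow down the cause.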
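The "service not started" check can also be done from the command line with the JDK's `jps` tool, which lists running Java processes by class name. The sketch below assumes that output format; `missing_daemons` is a hypothetical helper, and the process IDs in the example are invented.

```shell
# Hypothetical helper: reads `jps` output on stdin and prints every
# expected daemon name (given as arguments) that is NOT in the listing.
missing_daemons() {
  listing=$(cat)
  for d in "$@"; do
    # -w matches whole words, so NameNode does not match SecondaryNameNode
    printf '%s\n' "$listing" | grep -qw "$d" || printf '%s\n' "$d"
  done
}

# Example with made-up jps output; on a real node use:
#   jps | missing_daemons ResourceManager JobHistoryServer
printf '%s\n' '2101 NameNode' '2345 ResourceManager' |
  missing_daemons ResourceManager JobHistoryServer
```

If JobHistoryServer is reported missing on the node that should host it, in Hadoop 3.x it is typically started with `mapred --daemon start historyserver`. Since the question mentions the NameNode crashing, running the same check for NameNode on the NameNode hosts may also be worthwhile.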