返夏 2020-04-07 14:00

NameNode keeps crashing during MapReduce execution when testing the 'dfs[a-z.]+' example from the Hadoop manual

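I set up a Hadoop high-availability cluster and am testing it with the example from the Hadoop manual. Uploading files works fine, but running the following command always fails: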

bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep /mr/hw/input /mr/hw/output 'dfs[a-z.]+'

Error output:

[root@node-2 hadoop]# bin/hadoop jar share/hadoop/mapreduce/hadoop-mapreduce-examples-3.2.1.jar grep /mr/hw/input /mr/hw/output 'dfs[a-z.]+'
2020-04-07 13:32:48,240 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/root/.staging/job_1586236965054_0001
2020-04-07 13:32:49,380 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-04-07 13:32:57,428 INFO input.FileInputFormat: Total input files to process : 9
2020-04-07 13:32:57,766 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-04-07 13:32:58,762 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-04-07 13:32:59,219 INFO mapreduce.JobSubmitter: number of splits:9
2020-04-07 13:33:00,319 INFO sasl.SaslDataTransferClient: SASL encryption trust check: localHostTrusted = false, remoteHostTrusted = false
2020-04-07 13:33:01,025 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1586236965054_0001
2020-04-07 13:33:01,025 INFO mapreduce.JobSubmitter: Executing with tokens: []
2020-04-07 13:33:01,983 INFO conf.Configuration: resource-types.xml not found
2020-04-07 13:33:01,985 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2020-04-07 13:33:03,519 INFO impl.YarnClientImpl: Submitted application application_1586236965054_0001
2020-04-07 13:33:03,604 INFO mapreduce.Job: The url to track the job: http://node-1:8088/proxy/application_1586236965054_0001/
2020-04-07 13:33:03,604 INFO mapreduce.Job: Running job: job_1586236965054_0001
2020-04-07 13:33:21,507 INFO mapreduce.Job: Job job_1586236965054_0001 running in uber mode : false
2020-04-07 13:33:21,536 INFO mapreduce.Job:  map 0% reduce 0%
2020-04-07 13:33:38,721 INFO mapreduce.Job:  map 11% reduce 0%
2020-04-07 13:41:07,792 INFO mapreduce.Job:  map 11% reduce 4%
2020-04-07 13:43:20,887 INFO mapreduce.Job:  map 22% reduce 4%
2020-04-07 13:43:23,020 INFO mapreduce.Job:  map 33% reduce 4%
2020-04-07 13:43:25,509 INFO mapreduce.Job:  map 44% reduce 4%
2020-04-07 13:44:10,305 INFO mapreduce.Job:  map 56% reduce 4%
2020-04-07 13:46:18,587 INFO mapred.ClientServiceDelegate: Could not get Job info from RM for job job_1586236965054_0001. Redirecting to job history server.
2020-04-07 13:46:24,155 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:25,267 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:26,311 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:27,311 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:28,378 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:29,433 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:30,435 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:31,501 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:32,567 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:33,656 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:34,536 INFO mapred.ClientServiceDelegate: Could not get Job info from RM for job job_1586236965054_0001. Redirecting to job history server.
2020-04-07 13:46:35,584 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:36,722 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:37,756 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:38,778 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:39,823 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:40,848 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:41,892 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:42,894 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:43,959 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:45,017 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:45,806 INFO mapred.ClientServiceDelegate: Could not get Job info from RM for job job_1586236965054_0001. Redirecting to job history server.
2020-04-07 13:46:46,972 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:48,017 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:49,074 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 2 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:50,131 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 3 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:51,187 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 4 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:52,299 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 5 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:53,395 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 6 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:54,440 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 7 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:55,464 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 8 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
2020-04-07 13:46:56,508 INFO ipc.Client: Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 9 time(s); retry policy is RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)
java.io.IOException: java.net.ConnectException: Your endpoint configuration is wrong; For more details see:  http://wiki.apache.org/hadoop/UnsetHostnameOrPort
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:345)
    at org.apache.hadoop.mapred.ClientServiceDelegate.getTaskCompletionEvents(ClientServiceDelegate.java:398)
    at org.apache.hadoop.mapred.YARNRunner.getTaskCompletionEvents(YARNRunner.java:878)
    at org.apache.hadoop.mapreduce.Job$6.run(Job.java:732)
    at org.apache.hadoop.mapreduce.Job$6.run(Job.java:729)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1730)
    at org.apache.hadoop.mapreduce.Job.getTaskCompletionEvents(Job.java:729)
    at org.apache.hadoop.mapreduce.Job.monitorAndPrintJob(Job.java:1652)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:1591)
    at org.apache.hadoop.examples.Grep.run(Grep.java:78)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.hadoop.examples.Grep.main(Grep.java:103)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:71)
    at org.apache.hadoop.util.ProgramDriver.run(ProgramDriver.java:144)
    at org.apache.hadoop.examples.ExampleDriver.main(ExampleDriver.java:74)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:323)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:236)
Caused by: java.net.ConnectException: Your endpoint configuration is wrong; For more details see:  http://wiki.apache.org/hadoop/UnsetHostnameOrPort
    at sun.reflect.NativeConstructorAccessorImpl.newInstance0(Native Method)
    at sun.reflect.NativeConstructorAccessorImpl.newInstance(NativeConstructorAccessorImpl.java:62)
    at sun.reflect.DelegatingConstructorAccessorImpl.newInstance(DelegatingConstructorAccessorImpl.java:45)
    at java.lang.reflect.Constructor.newInstance(Constructor.java:423)
    at org.apache.hadoop.net.NetUtils.wrapWithMessage(NetUtils.java:833)
    at org.apache.hadoop.net.NetUtils.wrapException(NetUtils.java:753)
    at org.apache.hadoop.ipc.Client.getRpcResponse(Client.java:1549)
    at org.apache.hadoop.ipc.Client.call(Client.java:1491)
    at org.apache.hadoop.ipc.Client.call(Client.java:1388)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:233)
    at org.apache.hadoop.ipc.ProtobufRpcEngine$Invoker.invoke(ProtobufRpcEngine.java:118)
    at com.sun.proxy.$Proxy17.getTaskAttemptCompletionEvents(Unknown Source)
    at org.apache.hadoop.mapreduce.v2.api.impl.pb.client.MRClientProtocolPBClientImpl.getTaskAttemptCompletionEvents(MRClientProtocolPBClientImpl.java:177)
    at sun.reflect.GeneratedMethodAccessor5.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.mapred.ClientServiceDelegate.invoke(ClientServiceDelegate.java:326)
    ... 26 more
Caused by: java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:717)
    at org.apache.hadoop.net.SocketIOWithTimeout.connect(SocketIOWithTimeout.java:206)
    at org.apache.hadoop.net.NetUtils.connect(NetUtils.java:533)
    at org.apache.hadoop.ipc.Client$Connection.setupConnection(Client.java:700)
    at org.apache.hadoop.ipc.Client$Connection.setupIOstreams(Client.java:804)
    at org.apache.hadoop.ipc.Client$Connection.access$3800(Client.java:421)
    at org.apache.hadoop.ipc.Client.getConnection(Client.java:1606)
    at org.apache.hadoop.ipc.Client.call(Client.java:1435)
    ... 35 more

1 answer

  • 大大怪打LZR 2023-08-13 21:46

    Based on the log you provided, the main error is:

    java.io.IOException: java.net.ConnectException: Your endpoint configuration is wrong; For more details see:  http://wiki.apache.org/hadoop/UnsetHostnameOrPort
    

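    This error usually means that Hadoop could not connect to the ResourceManager or the JobHistoryServer while running the job. In this log, the client fails to get job info from the ResourceManager, is redirected to the job history server, and then retries 0.0.0.0:10020 (the default value of mapreduce.jobhistory.address) until the connection is refused. The link in the error message gives more background, and its suggestions are worth trying.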
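The address the client keeps dialing can be read straight from the client log. The following is a minimal sketch, not part of any Hadoop tooling: the function names `retry_targets` and `is_wildcard` are made up for illustration, and the regular expression assumes the `ipc.Client` retry line format shown in the question.

```python
import re
from collections import Counter

# Matches ipc.Client retry lines such as:
#   Retrying connect to server: 0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); ...
RETRY_RE = re.compile(r"Retrying connect to server: (\S+?)/(\S+?):(\d+)\.")


def retry_targets(log_text):
    """Count connection retries per (address, port) found in client log text."""
    counts = Counter()
    for match in RETRY_RE.finditer(log_text):
        _host, address, port = match.groups()
        counts[(address, int(port))] += 1
    return counts


def is_wildcard(address):
    """0.0.0.0 is a bind-all address, not a host that a client can dial."""
    return address == "0.0.0.0"


# Two retry lines copied from the log in the question.
sample = (
    "2020-04-07 13:46:24,155 INFO ipc.Client: Retrying connect to server: "
    "0.0.0.0/0.0.0.0:10020. Already tried 0 time(s); retry policy is "
    "RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)\n"
    "2020-04-07 13:46:25,267 INFO ipc.Client: Retrying connect to server: "
    "0.0.0.0/0.0.0.0:10020. Already tried 1 time(s); retry policy is "
    "RetryUpToMaximumCountWithFixedSleep(maxRetries=10, sleepTime=1000 MILLISECONDS)\n"
)

for (address, port), n in retry_targets(sample).items():
    note = "wildcard address, probably an unset property" if is_wildcard(address) else "concrete address"
    print(f"{address}:{port} retried {n} time(s): {note}")
```

A wildcard target on port 10020 suggests that the client is using the default job history address rather than a configured one, which narrows the search to the JobHistoryServer settings.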

    Common causes include:

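    1. Wrong hostname or port configuration: check your Hadoop configuration files, especially yarn-site.xml, and make sure yarn.resourcemanager.hostname and yarn.resourcemanager.address point to the ResourceManager's actual hostname and port. Because the log shows the client dialing 0.0.0.0:10020, also check mapreduce.jobhistory.address in mapred-site.xml.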

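    2. Firewall or network problems: blocked traffic, firewalls, or other connectivity problems between cluster nodes can cause connections to fail. Make sure all nodes can reach each other on the required ports.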

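    3. Services not started: make sure the ResourceManager and the JobHistoryServer are running on the cluster. You can verify the ResourceManager through its web UI (port 8088 by default) and the JobHistoryServer through its web UI (port 19888 by default). In Hadoop 3 the JobHistoryServer is started separately with mapred --daemon start historyserver.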

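    4. Hostname resolution problems: make sure every hostname resolves to the correct IP address on every node. You can check this by running ping against each hostname from each node.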

    5. Node health: unhealthy nodes can disrupt communication between services. Make sure all nodes report a healthy state.

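    6. Configuration file synchronization: inconsistent Hadoop configuration files across nodes can cause communication problems. Make sure the configuration files are identical on all nodes.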

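    7. Insufficient node resources: if nodes are short on resources, services may not run properly. Make sure every node has enough memory, CPU, and disk to run the Hadoop services.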
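The network, service, and hostname checks in the list can be scripted instead of done by hand. This is a rough sketch using only the Python standard library; the helper names and the script name `check_endpoints.py` are hypothetical, and which host runs each daemon depends on your cluster (only the ResourceManager on node-1:8088 is visible in the log).

```python
import socket
import sys


def resolve(hostname):
    """Return the IPv4 address a hostname resolves to, or None on failure."""
    try:
        return socket.gethostbyname(hostname)
    except OSError:
        return None


def port_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def check_endpoint(host, port):
    """Return (resolved_ip, reachable) for one host:port pair."""
    ip = resolve(host)
    reachable = ip is not None and port_open(host, port)
    return ip, reachable


if __name__ == "__main__":
    # Example (hypothetical hosts): python check_endpoints.py node-1:8088 node-1:10020
    for arg in sys.argv[1:]:
        host, _, port = arg.rpartition(":")
        ip, ok = check_endpoint(host, int(port))
        print(f"{host}:{port} resolved={ip} reachable={ok}")
```

Running it from every node against the ResourceManager port (8088) and the JobHistoryServer ports (10020 and 19888) separates three cases: a name that does not resolve, a port that nothing listens on, and a port that is blocked between nodes.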

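    In short, these are the common causes of this error. Work through them one by one and fix whatever applies to your setup. If the problem persists, inspect the cluster's configuration and state more closely to pin down the cause.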
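For the configuration check, one option is to read the effective job history address out of the configuration file. The sketch below assumes the standard Hadoop `<configuration>/<property>` XML layout and the default of 0.0.0.0:10020 for mapreduce.jobhistory.address; the two sample documents and the host node-1 in the second one are illustrative, not taken from the asker's cluster.

```python
import xml.etree.ElementTree as ET

# Default taken from mapred-default.xml; it matches the address in the log.
DEFAULT_JHS_ADDRESS = "0.0.0.0:10020"


def read_properties(xml_text):
    """Parse a Hadoop *-site.xml document into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.iter("property"):
        name = prop.findtext("name")
        value = prop.findtext("value")
        if name is not None:
            props[name.strip()] = (value or "").strip()
    return props


def jobhistory_address(props):
    """Return the configured job history address, or the default if unset."""
    return props.get("mapreduce.jobhistory.address", DEFAULT_JHS_ADDRESS)


def is_dialable(address):
    """A client cannot usefully connect to an empty or wildcard host."""
    host = address.rpartition(":")[0]
    return host not in ("", "0.0.0.0")


# Illustrative mapred-site.xml that leaves the address unset.
SAMPLE_UNSET = """
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
</configuration>
"""

# Illustrative mapred-site.xml that sets it (node-1 is a hypothetical choice).
SAMPLE_SET = """
<configuration>
  <property><name>mapreduce.framework.name</name><value>yarn</value></property>
  <property><name>mapreduce.jobhistory.address</name><value>node-1:10020</value></property>
</configuration>
"""

for label, text in (("unset", SAMPLE_UNSET), ("set", SAMPLE_SET)):
    address = jobhistory_address(read_properties(text))
    print(f"{label}: {address} dialable={is_dialable(address)}")
```

Running the same check against the mapred-site.xml on each node also covers the synchronization point: every node should report the same address.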
