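I just set up a Hadoop 3.3.6 test cluster and ran into something strange. Formatting went through without problems, the NameNode, DataNode, and YARN web pages all open normally, and the built-in commands such as hadoop work fine when I test them.
But when I submitted an MR job to check whether MapReduce jobs can be submitted normally, I found that the job reaches YARN fine, yet as soon as it leaves the queue and its ApplicationMaster is dispatched to a DataNode node and launched, the ApplicationMaster dies immediately. The full log is below: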
[root@hdp4 hadoop]# hadoop jar /opt/hadoop-3.3.6/share/hadoop/mapreduce/hadoop-mapreduce-examples-3.3.6.jar wordcount /input /output
2024-06-20 00:02:49 INFO JobResourceUploader:907 - Disabling Erasure Coding for path: /hisdata/staging/root/.staging/job_1718812910805_0001
2024-06-20 00:02:50 INFO FileInputFormat:300 - Total input files to process : 1
2024-06-20 00:02:50 INFO JobSubmitter:202 - number of splits:1
2024-06-20 00:02:50 INFO JobSubmitter:298 - Submitting tokens for job: job_1718812910805_0001
2024-06-20 00:02:50 INFO JobSubmitter:299 - Executing with tokens: []
2024-06-20 00:02:50 INFO Configuration:2854 - resource-types.xml not found
2024-06-20 00:02:50 INFO ResourceUtils:476 - Unable to find 'resource-types.xml'.
2024-06-20 00:02:51 INFO YarnClientImpl:338 - Submitted application application_1718812910805_0001
2024-06-20 00:02:51 INFO Job:1682 - The url to track the job: http://hdp5:8088/proxy/application_1718812910805_0001/
2024-06-20 00:02:51 INFO Job:1727 - Running job: job_1718812910805_0001
2024-06-20 00:03:17 INFO Job:1748 - Job job_1718812910805_0001 running in uber mode : false
2024-06-20 00:03:17 INFO Job:1755 - map 0% reduce 0%
2024-06-20 00:03:17 INFO Job:1768 - Job job_1718812910805_0001 failed with state FAILED due to: Application application_1718812910805_0001 failed 2 times due to AM Container for appattempt_1718812910805_0001_000002 exited with exitCode: 1
Failing this attempt.Diagnostics: [2024-06-20 00:03:17.049]Exception from container-launch.
Container id: container_e19_1718812910805_0001_02_000001
Exit code: 1
[2024-06-20 00:03:17.078]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
[2024-06-20 00:03:17.081]Container exited with a non-zero exit code 1. Error file: prelaunch.err.
Last 4096 bytes of prelaunch.err :
Last 4096 bytes of stderr :
log4j:WARN No appenders could be found for logger (org.apache.hadoop.mapreduce.v2.app.MRAppMaster).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.
For more detailed output, check the application tracking page: http://hdp5:8088/cluster/app/application_1718812910805_0001 Then click on links to logs of each attempt.
. Failing the application.
2024-06-20 00:03:17 INFO Job:1773 - Counters: 0
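I checked the NameNode state and it is normal, and it is not in safe mode. I also went through the configuration files and found no discrepancies (otherwise formatting would not have passed either), and I checked network connectivity: the nodes can ping each other. Finally I wondered whether it was a resource shortage, but I tried giving the cluster 96G/45C and even disabled the checks, and the result is the same: right as the MapReduce job is about to start running, the ApplicationMaster dies. I can't figure out what is going on. Does anyone have suggestions? The Hadoop version is 3.3.6.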
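
The diagnostics in the log above only include the tail of stderr (the log4j warnings), so the actual reason for exit code 1 is probably in the AM container's other log files. Below is a minimal sketch for pulling them with `yarn logs`, assuming log aggregation is enabled (`yarn.log-aggregation-enable`); if it is not, the files should still be in the NodeManager's local log directory on the node that ran the container. `job_to_app_id` is a hypothetical helper written for this post, not part of Hadoop, and the grep pattern is only a guess at what to look for.

```shell
#!/bin/sh
# Hypothetical helper: a MapReduce job ID and its YARN application ID share
# the same cluster timestamp and sequence number, only the prefix differs.
job_to_app_id() {
  printf '%s\n' "$1" | sed 's/^job_/application_/'
}

app_id="$(job_to_app_id job_1718812910805_0001)"
echo "application id: $app_id"

# Only call the YARN CLI where it is installed (e.g. on a cluster node).
if command -v yarn >/dev/null 2>&1; then
  # Dump the aggregated container logs and show the first suspicious lines.
  # The pattern is an assumption; adjust it to whatever the logs contain.
  yarn logs -applicationId "$app_id" 2>&1 \
    | grep -n -i -E 'exception|error' \
    | head -n 20 || true
fi
```

The first lines this prints from the AM container's logs are usually more informative than the client-side summary, and posting them here would make the failure much easier to narrow down.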