Exception with a custom partitioner and multiple ReduceTasks in Hadoop

I have a three-node cluster. The program is written on Windows and the Job is submitted to the cluster's YARN, where it fails with the exception below. Running the same job on Linux with `hadoop jar` works fine. WordCount and other small programs ran without errors before, so I suspect the problem is related to the ReduceTasks. Could someone more experienced please advise? Many thanks.

 2015-12-04 15:33:43,100 INFO  [main] client.RMProxy (RMProxy.java:createRMProxy(92)) - Connecting to ResourceManager at hadoop01/10.5.110.250:8032
2015-12-04 15:33:43,458 WARN  [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(150)) - Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
2015-12-04 15:33:43,478 WARN  [main] mapreduce.JobSubmitter (JobSubmitter.java:copyAndConfigureFiles(259)) - No job jar file set.  User classes may not be found. See Job or Job#setJar(String).
2015-12-04 15:33:43,525 INFO  [main] input.FileInputFormat (FileInputFormat.java:listStatus(280)) - Total input paths to process : 1
2015-12-04 15:33:43,573 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:submitJobInternal(396)) - number of splits:1
2015-12-04 15:33:43,655 INFO  [main] mapreduce.JobSubmitter (JobSubmitter.java:printTokens(479)) - Submitting tokens for job: job_1449213919153_0002
2015-12-04 15:33:43,744 INFO  [main] mapred.YARNRunner (YARNRunner.java:createApplicationSubmissionContext(369)) - Job jar is not present. Not adding any jar to the list of resources.
2015-12-04 15:33:43,778 INFO  [main] impl.YarnClientImpl (YarnClientImpl.java:submitApplication(204)) - Submitted application application_1449213919153_0002
2015-12-04 15:33:43,807 INFO  [main] mapreduce.Job (Job.java:submit(1289)) - The url to track the job: http://hadoop01:8088/proxy/application_1449213919153_0002/
2015-12-04 15:33:43,808 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1334)) - Running job: job_1449213919153_0002
2015-12-04 15:33:46,823 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1355)) - Job job_1449213919153_0002 running in uber mode : false
2015-12-04 15:33:46,825 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1362)) -  map 0% reduce 0%
2015-12-04 15:33:46,833 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1375)) - Job job_1449213919153_0002 failed with state FAILED due to: Application application_1449213919153_0002 failed 2 times due to AM Container for appattempt_1449213919153_0002_000002 exited with  exitCode: -1000 due to: File file:/tmp/hadoop-yarn/staging/lixiwei/.staging/job_1449213919153_0002/job.splitmetainfo does not exist
.Failing this attempt.. Failing the application.
2015-12-04 15:33:46,861 INFO  [main] mapreduce.Job (Job.java:monitorAndPrintJob(1380)) - Counters: 0

The program is as follows:

 public class FlowSumArea
{
    public static class FlowSumAreaMapper
            extends Mapper<LongWritable, Text, Text, FlowBean>
    {
        @Override
        protected void map(LongWritable key, Text value,
                Mapper<LongWritable, Text, Text, FlowBean>.Context context)
                        throws IOException, InterruptedException
        {
            String line = value.toString();
            String[] fields = StringUtils.split(line, "\t");
            String phoneNo = fields[1];
            long upFlow = Long.parseLong(fields[7]);
            long downFlow = Long.parseLong(fields[8]);

            context.write(new Text(phoneNo),
                    new FlowBean(phoneNo, upFlow, downFlow));
        }
    }

    public static class FlowSumAreaReducer
            extends Reducer<Text, FlowBean, Text, FlowBean>
    {
        @Override
        protected void reduce(Text key, Iterable<FlowBean> values,
                Reducer<Text, FlowBean, Text, FlowBean>.Context context)
                        throws IOException, InterruptedException
        {
            long upFlowCounter = 0;
            long downFlowCounter = 0;
            for (FlowBean bean : values)
            {
                upFlowCounter += bean.getUpFlow();
                downFlowCounter += bean.getDownFlow();
            }

            context.write(key, new FlowBean(key.toString(), upFlowCounter,
                    downFlowCounter));
        }
    }

    public static void main(String[] args)
            throws IOException, ClassNotFoundException, InterruptedException
    {
        // 1. Get the configuration
        Configuration conf = new Configuration();
        // 2. Set up the Job (note: pass conf here, otherwise the
        // Configuration created above is never used)
        Job job = Job.getInstance(conf);
        job.setJarByClass(FlowSumArea.class);
        job.setMapperClass(FlowSumAreaMapper.class);
        job.setReducerClass(FlowSumAreaReducer.class);

        job.setPartitionerClass(AreaPartitioner.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FlowBean.class);

        // Set the number of reduce tasks; it should match the number of partitions
        job.setNumReduceTasks(6);
        // 3. Set input and output paths
        FileInputFormat.setInputPaths(job, new Path("C:\\Users\\51195\\Desktop\\flow\\flowarea\\srcdata"));
        FileOutputFormat.setOutputPath(job, new Path("C:\\Users\\51195\\Desktop\\flow\\flowarea\\outputdata6"));
//      FileInputFormat.setInputPaths(job, new Path(args[0]));
//      FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
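The warnings in the log ("No job jar file set", "Job jar is not present") together with the `file:/tmp/hadoop-yarn/staging/...` path in the error suggest the client staged the job on its local filesystem rather than on HDFS, which is what typically happens when submitting from a Windows IDE without the cluster configuration on the classpath. A minimal sketch of a driver configured for cross-platform submission, assuming Hadoop 2.4+ and with the `fs.defaultFS` port and the jar path being assumptions (only the ResourceManager address `hadoop01:8032` appears in the log):

```java
// Sketch only, not the OP's confirmed fix. Assumed: Hadoop 2.4+,
// NameNode port 9000, and a jar exported to the path below.
Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://hadoop01:9000");            // stage job files to HDFS, not file:/
conf.set("mapreduce.framework.name", "yarn");
conf.set("yarn.resourcemanager.hostname", "hadoop01");
conf.set("mapreduce.app-submission.cross-platform", "true"); // Windows client -> Linux cluster

Job job = Job.getInstance(conf);
// setJarByClass cannot find a jar when classes are loaded from an IDE's
// output folder; point the job at an exported jar explicitly (assumed path):
job.setJar("C:\\path\\to\\flowsum.jar");
```

With these settings the staging directory lives on HDFS, so the ApplicationMaster on the cluster can find `job.splitmetainfo`.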

This is the partitioner:

 public class AreaPartitioner<KEY, VALUE> extends Partitioner<KEY, VALUE>{

    private static HashMap<String,Integer> areaMap = new HashMap<>();

    static{
        areaMap.put("135", 0);
        areaMap.put("136", 1);
        areaMap.put("137", 2);
        areaMap.put("138", 3);
        areaMap.put("139", 4);
    }

    @Override
    public int getPartition(KEY key, VALUE value, int numPartitions) {
        // Extract the three-digit phone prefix from the key and look up its
        // area code in the dictionary; unknown prefixes fall into partition 5
        Integer areaCode = areaMap.get(key.toString().substring(0, 3));
        return areaCode == null ? 5 : areaCode;
    }

}
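The partitioning rule above, a three-digit phone prefix mapped to a fixed partition with everything else falling into partition 5, can be exercised outside Hadoop as plain Java. The class and method names below are illustrative, not part of the original code:

```java
import java.util.HashMap;
import java.util.Map;

// Standalone demo of the prefix -> partition mapping used by AreaPartitioner.
// PrefixPartitionDemo and partitionFor are illustrative names, not from the OP's code.
public class PrefixPartitionDemo {
    private static final Map<String, Integer> AREA_MAP = new HashMap<>();
    static {
        AREA_MAP.put("135", 0);
        AREA_MAP.put("136", 1);
        AREA_MAP.put("137", 2);
        AREA_MAP.put("138", 3);
        AREA_MAP.put("139", 4);
    }

    // Known prefixes map to partitions 0-4; anything else goes to the
    // catch-all partition 5 (which is why the job needs 6 reduce tasks).
    public static int partitionFor(String phoneNo) {
        Integer code = AREA_MAP.get(phoneNo.substring(0, 3));
        return code == null ? 5 : code;
    }

    public static void main(String[] args) {
        System.out.println(partitionFor("13512345678")); // known prefix -> 0
        System.out.println(partitionFor("15012345678")); // unknown prefix -> 5
    }
}
```

This also shows why `setNumReduceTasks(6)` matters: with fewer reducers than partition numbers, keys routed to partitions 5 and up would have no reducer to go to.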

1 Answer

I wonder whether the OP ever solved this.
