The role of the combiner in MapReduce

The code finds the maximum of these numbers:

[image]

Here is my code.

Mapper


import java.io.IOException;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

public class NumMapper extends Mapper<LongWritable, Text, LongWritable, LongWritable> {

    public void map(LongWritable ikey, Text ivalue, Context context) throws IOException, InterruptedException {
        String line=ivalue.toString();
        long num = Long.parseLong(line);
        context.write(new LongWritable(1), new LongWritable(num));
    }

}

Combiner

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.mapreduce.Reducer;


public class NumCombiner extends Reducer<LongWritable, LongWritable, LongWritable, LongWritable> {
    @Override
    protected void reduce(LongWritable key, Iterable<LongWritable> value,
            Reducer<LongWritable, LongWritable, LongWritable, LongWritable>.Context context)
            throws IOException, InterruptedException {
        Iterator<LongWritable> iter=value.iterator();
        long max=Long.MIN_VALUE;
        while(iter.hasNext()) {
            long tmp=iter.next().get();
            max =tmp>max?tmp:max;
        }
        context.write(new LongWritable(1), new LongWritable(max));
    }
}

Reducer

import java.io.IOException;
import java.util.Iterator;

import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;

public class NumReducer extends Reducer<LongWritable, LongWritable, LongWritable, NullWritable> {

    public void reduce(Text _key, Iterable<LongWritable> values, Context context) throws IOException, InterruptedException {

        Iterator<LongWritable> ite =values.iterator();

        long num=0;
        if(ite.hasNext()) {
            num=ite.next().get();
            long now =ite.next().get();
            num=now>num?now:num;
        }

        context.write(new LongWritable(num), NullWritable.get());
    }

}

Driver

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class NumDriver {

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "JobName");
        job.setJarByClass(MaxMin.NumDriver.class);
        job.setMapperClass(MaxMin.NumMapper.class);
        job.setReducerClass(MaxMin.NumReducer.class);
        job.setCombinerClass(NumCombiner.class);

        job.setMapOutputKeyClass(LongWritable.class);
        job.setMapOutputValueClass(LongWritable.class);
        job.setOutputKeyClass(LongWritable.class);
        job.setOutputValueClass(NullWritable.class);

        FileInputFormat.setInputPaths(job, new Path("hdfs://192.168.77.81:9000/park1/num.txt"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://192.168.77.81:9000/park2/MaxMin"));

        if (!job.waitForCompletion(true))
            return;
    }

}

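My question: if I comment out the Combiner line in the Driver, my output becomes this:
[image]

With the combiner set, the output is what I expect:
[image]

Why does this happen? As I understand it, a combiner is only a local aggregation step before reduce and should not change the output. Could someone explain?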
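The expectation stated above holds when the combined operation is associative and commutative, as max is: Hadoop may run a combiner zero, one, or several times per map task, so a correct job must give the same result either way. The following is a minimal plain-Java sketch of that property, not Hadoop code. The class `CombinerSketch`, its methods, and the sample numbers are all hypothetical, since the original input image is not available.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class CombinerSketch {
    // Correct max over one group of values; the same logic serves as combiner and reducer.
    static long max(List<Long> values) {
        long max = Long.MIN_VALUE;
        for (long v : values) {
            max = Math.max(max, v);
        }
        return max;
    }

    // Simulates the job: each inner list stands for the output of one map task.
    static long run(List<List<Long>> mapOutputs, boolean useCombiner) {
        List<Long> shuffled = new ArrayList<>();
        for (List<Long> split : mapOutputs) {
            if (useCombiner) {
                shuffled.add(max(split));   // local aggregation: one value per map task
            } else {
                shuffled.addAll(split);     // every value is sent to the reducer
            }
        }
        return max(shuffled);               // the single reduce call for key 1
    }

    public static void main(String[] args) {
        // Hypothetical sample data split across two map tasks.
        List<List<Long>> splits = Arrays.asList(
                Arrays.asList(3L, 9L, 4L),
                Arrays.asList(12L, 7L));
        System.out.println(run(splits, true));   // with combiner
        System.out.println(run(splits, false));  // without combiner
    }
}
```

Both calls print the same maximum. If a real job behaves differently with and without its combiner, that usually points to a bug in the reducer rather than in the combiner.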

2 answers

吃鸡王者 2019-05-22 14:28
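The first parameter of the reduce function in your Reducer class should be of type LongWritable. In addition, the maximum comparison in your reduce function is wrong; I suggest following the comparison used in the combiner.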
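
To spell out why the parameter type matters: a `reduce(Text, ...)` method in a `Reducer<LongWritable, ...>` subclass is an overload, not an override, so the framework keeps calling the inherited default `reduce`, which writes every key/value pair through unchanged. With the combiner enabled, the combiner most likely did the real work and the pass-through reducer simply forwarded its result, which would explain why the output looked right. The sketch below reproduces the effect in plain Java. `MiniReducer`, `BrokenMax`, `FixedMax`, and the sample values are hypothetical stand-ins, not Hadoop classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class OverrideSketch {
    // The framework only knows the base type, so it dispatches to reduce(K, Iterable<V>).
    static List<String> runJob(MiniReducer<Long, Long> reducer, List<Long> values) {
        reducer.reduce(1L, values);
        return reducer.output;
    }

    public static void main(String[] args) {
        List<Long> values = Arrays.asList(3L, 9L, 4L, 12L, 7L); // hypothetical input
        System.out.println(runJob(new BrokenMax(), values));
        System.out.println(runJob(new FixedMax(), values));
    }
}

// Hypothetical stand-in for Hadoop's Reducer: the default reduce() is an identity pass-through.
class MiniReducer<K, V> {
    final List<String> output = new ArrayList<>();

    void reduce(K key, Iterable<V> values) {
        for (V v : values) {
            output.add(key + "\t" + v);
        }
    }
}

// Mirrors the question's NumReducer: the key parameter is String instead of Long,
// so this method is an overload and is never called through the base type.
class BrokenMax extends MiniReducer<Long, Long> {
    void reduce(String key, Iterable<Long> values) {
        output.add("never reached through the framework");
    }
}

// Fixed: matching key type, @Override, and a loop that visits every value.
class FixedMax extends MiniReducer<Long, Long> {
    @Override
    void reduce(Long key, Iterable<Long> values) {
        long max = Long.MIN_VALUE;
        for (long v : values) {
            max = Math.max(max, v);
        }
        output.add(Long.toString(max));
    }
}
```

The broken variant emits one key/value line per input number, while the fixed variant emits only the maximum. In the Hadoop job, the equivalent fix is to declare `reduce(LongWritable key, Iterable<LongWritable> values, Context context)` with `@Override`, so the compiler rejects a mismatched signature, and to replace the `if` with a loop over all values. The original `if` block calls `next()` twice without checking, so it compares at most the first two values and can fail when a group holds only one.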
