MapReduce output error

Why doesn't the final output give me the per-class subject averages I expected? Any guidance would be appreciated!

Here is the raw data: [screenshot]

Here is the output: [screenshot]


package experiment.big101;


import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;
import java.io.IOException;
import java.util.Arrays;


public class SMapper extends Mapper<LongWritable, Text, Text, Text> {

    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {

        String line = value.toString();

        String[] str = line.split(",");
        int studentID = Integer.valueOf(str[0]);                // student ID


        String[] score = Arrays.copyOfRange(str, 2, 5);            // math, English and Chinese scores

        String scorestr = String.join(",",score);

        context.write(new Text(String.valueOf(studentID)), new Text(String.valueOf(scorestr)));

    }

}
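SMapper writes the student ID as the map output key. MapReduce groups map output by key before calling reduce(), so SReducer most likely receives one group per student, each holding a single record. Its "class averages" are then just that student's own three scores, which would explain the unexpected output. The plain-Java sketch below has no Hadoop dependency and mimics that grouping with a TreeMap. The sample rows and their id,class,math,english,chinese layout are hypothetical, and the method names are illustrative.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class ShuffleByStudentId {

    // Same key/value choice as SMapper: key = column 0 (student ID), value = columns 2..4.
    static String[] smapper(String line) {
        String[] str = line.split(",");
        String studentId = String.valueOf(Integer.parseInt(str[0].trim()));
        String scores = String.join(",", Arrays.copyOfRange(str, 2, 5));
        return new String[] { studentId, scores };
    }

    // Simulates the shuffle: gather every map output value under its key,
    // sorted by key, which is how reduce() later receives them.
    static Map<String, List<String>> shuffle(List<String[]> mapOutput) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String[] kv : mapOutput) {
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        return groups;
    }

    public static void main(String[] args) {
        // Hypothetical rows in the assumed layout: id,class,math,english,chinese
        List<String> lines = List.of("1,A,80,90,70", "2,A,60,70,80", "3,B,90,90,90");
        List<String[]> mapOutput = new ArrayList<>();
        for (String line : lines) {
            mapOutput.add(smapper(line));
        }
        // Prints one group per student: 1 -> [80,90,70], 2 -> [60,70,80], 3 -> [90,90,90]
        shuffle(mapOutput).forEach((key, values) -> System.out.println(key + " -> " + values));
    }
}
```

Because every group has exactly one record, studentCount is 1 in every reduce() call, and the part after " | " also holds just one value.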


package experiment.big101;

import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Reducer;
import java.io.IOException;


public class SReducer extends Reducer<Text, Text, Text, Text> {
    public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
        int subject1Total = 0;
        int subject2Total = 0;
        int subject3Total = 0;
        int studentCount = 0;
        StringBuilder studentAverages = new StringBuilder();
        for (Text value : values) {
            String[] scores = value.toString().split(",");
            int s1 = Integer.parseInt(scores[0].trim());
            int s2 = Integer.parseInt(scores[1].trim());
            int s3 = Integer.parseInt(scores[2].trim());
            int studentAvg = (s1 + s2 + s3) / 3;
            if (studentAverages.length() > 0) {
                studentAverages.append(",");
            }
            studentAverages.append(studentAvg);
            subject1Total += s1;
            subject2Total += s2;
            subject3Total += s3;
            studentCount++;
        }
        int classAvg1 = subject1Total / studentCount;
        int classAvg2 = subject2Total / studentCount;
        int classAvg3 = subject3Total / studentCount;
        String classAverages = classAvg1 + "," + classAvg2 + "," + classAvg3;
        String output = classAverages + " | " + studentAverages.toString();
        context.write(key, new Text(output));
    }
}
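SReducer computes both the per-student and the per-class averages with int arithmetic, so every result is truncated toward zero. For example, 266 / 3 gives 88 rather than 88.67. The following minimal plain-Java sketch shows the difference. The class and method names are illustrative only.

```java
import java.util.Locale;

public class IntegerAverage {

    // What SReducer does: int / int truncates toward zero.
    static int truncatedAverage(int s1, int s2, int s3) {
        return (s1 + s2 + s3) / 3;
    }

    // Dividing by 3.0 promotes the sum to double, so the fraction is kept.
    static double exactAverage(int s1, int s2, int s3) {
        return (s1 + s2 + s3) / 3.0;
    }

    public static void main(String[] args) {
        System.out.println(truncatedAverage(85, 90, 91)); // prints 88
        System.out.println(String.format(Locale.ROOT, "%.2f", exactAverage(85, 90, 91))); // prints 88.67
    }
}
```

The same applies to the class totals. For example, subject1Total / studentCount would become (double) subject1Total / studentCount if fractional averages are wanted.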


package experiment.big101;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class SMain {

    public static void main(String[] args) throws Exception {

        args = new String[] { "hdfs://master:9000/root/experiment/datas", "hdfs://master:9000/root/experiment/output" };

        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf);

        job.setJarByClass(SMain.class);
        job.setMapperClass(SMapper.class);
        job.setReducerClass(SReducer.class);

        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        job.waitForCompletion(true);

    }

}

阿里嘎多学长 2025-06-03 10:20
Compiled by 阿里嘎多学长 using AIGC (AI-generated content).

MapReduce output error

Your MapReduce job produces output that does not match the per-class subject averages you expected. Here are some possible fixes:

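Check the program logic: make sure the MapReduce program actually computes per-class subject averages. Incorrect logic or an incorrect algorithm will produce incorrect output.
Check the data handling: make sure the raw data is parsed correctly. Errors while processing the data will also produce incorrect output.
Check the job configuration: make sure the MapReduce job is configured correctly. A misconfigured job can produce incorrect output.
Check the output itself: if the output looks valid but does not match the expected class subject averages, the cause is probably an error in the data handling or in the program logic.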

Below is a simple MapReduce example that computes per-class averages:

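import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.FloatWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class AverageScore {

    // Input line: className,score -> emit (className, score)
    public static class MapClass extends Mapper<LongWritable, Text, Text, FloatWritable> {
        @Override
        public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
            String[] fields = value.toString().split(",");
            String className = fields[0].trim();
            float score = Float.parseFloat(fields[1].trim());
            context.write(new Text(className), new FloatWritable(score));
        }
    }

    // All scores of one class arrive in a single reduce() call; average them
    public static class ReduceClass extends Reducer<Text, FloatWritable, Text, FloatWritable> {
        @Override
        public void reduce(Text key, Iterable<FloatWritable> values, Context context) throws IOException, InterruptedException {
            float sum = 0;
            int count = 0;
            for (FloatWritable value : values) {
                sum += value.get();
                count++;
            }
            context.write(key, new FloatWritable(sum / count));
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "AverageScore");
        job.setJarByClass(AverageScore.class);
        job.setMapperClass(MapClass.class);
        job.setReducerClass(ReduceClass.class);
        // Hadoop keys and values must be Writable types, so use FloatWritable rather than java.lang.Float
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(FloatWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}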

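This example computes per-class averages with MapReduce. Each input line has the form className,score, and each output line has the form className<TAB>averageScore (the default TextOutputFormat separates key and value with a tab).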
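Applying the same pattern to the job in the question means keying the map output by class instead of by student ID. That way, all students of one class reach the same reduce() call. The sketch below simulates map, shuffle and reduce in plain Java rather than the Hadoop API. It assumes, hypothetically, that column 1 of each row is the class and columns 2..4 are math, English and Chinese. The raw data is only shown as a screenshot, so check this against the real file. The sample rows are made up. In SMapper, the corresponding change would be writing new Text(str[1]) as the key instead of the student ID.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;

public class ClassSubjectAverage {

    // Map step: key by class instead of student ID.
    // ASSUMPTION (hypothetical layout): column 1 = class, columns 2..4 = math, English, Chinese.
    static String[] map(String line) {
        String[] str = line.split(",");
        String classId = str[1].trim();
        String scores = str[2].trim() + "," + str[3].trim() + "," + str[4].trim();
        return new String[] { classId, scores };
    }

    // Reduce step: sum each subject over all students of one class, then divide as double.
    static double[] reduce(List<String> values) {
        double[] totals = new double[3];
        for (String v : values) {
            String[] s = v.split(",");
            for (int i = 0; i < 3; i++) {
                totals[i] += Integer.parseInt(s[i]);
            }
        }
        for (int i = 0; i < 3; i++) {
            totals[i] /= values.size();
        }
        return totals;
    }

    // Runs map, a simulated shuffle (group by key), then reduce once per class.
    static Map<String, double[]> run(List<String> lines) {
        Map<String, List<String>> groups = new TreeMap<>();
        for (String line : lines) {
            String[] kv = map(line);
            groups.computeIfAbsent(kv[0], k -> new ArrayList<>()).add(kv[1]);
        }
        Map<String, double[]> result = new TreeMap<>();
        groups.forEach((cls, vals) -> result.put(cls, reduce(vals)));
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical rows: id,class,math,english,chinese
        List<String> lines = List.of("1,A,80,90,70", "2,A,60,70,80", "3,B,90,90,90");
        // Prints "A<TAB>70.00,80.00,75.00" and "B<TAB>90.00,90.00,90.00"
        run(lines).forEach((cls, avg) ->
            System.out.printf(Locale.ROOT, "%s\t%.2f,%.2f,%.2f%n", cls, avg[0], avg[1], avg[2]));
    }
}
```

Dividing as double also avoids the truncation that int division causes in the original SReducer.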
Question created on June 3.
