package com.ctsig.cdn.log;
import com.ctsig.cdn.log.util.DealDate;
import com.ctsig.cdn.log.util.IP2CCUtil;
import com.ctsig.cdn.log.util.PropsUtil;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.Writable;
import org.apache.hadoop.mapreduce.InputSplit;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.FileSplit;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.LazyOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;
import org.slf4j.Logger;
import org.slf4j.LoggerFactory;
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.TimeZone;
public class CleanData extends Configured implements Tool {
private static Logger logger = LoggerFactory.getLogger(CleanData.class);
public static class CleanMapper extends Mapper<Object, Text, Text, CleanBean> {

    /**
     * Tags each raw log line with the vendor prefix taken from the input
     * file name (the part before the first '_') and emits it under an
     * empty key, so the reducer can dispatch on the vendor token.
     */
    @Override
    protected void map(Object key, Text value, Context context) throws IOException, InterruptedException {
        // The vendor/domain tag is encoded in the split's file name.
        FileSplit split = (FileSplit) context.getInputSplit();
        String vendorTag = split.getPath().getName().split("_")[0];
        // Re-join as "<vendor> <raw line>"; empty key keeps original grouping.
        context.write(new Text(), new CleanBean(vendorTag + " " + value.toString()));
    }
}
/**
 * Parses each tagged raw log line ("<vendor> <raw line>", built by CleanMapper),
 * normalizes it into a \001-delimited record, and writes it to a side output
 * file named after the log's date (yyyyMMdd) via MultipleOutputs.
 *
 * Three vendor formats are handled by positional field indexing:
 * "swiftserve", "tata", and a default branch (presumably level3 — the
 * timezone comments reference it; confirm with the upstream feed).
 * Any malformed line is logged and skipped; no exception escapes reduce().
 */
public static class CleanReducer extends Reducer<Text, CleanBean, Text, CleanBean> {
    // Date-partitioned side outputs; closed in cleanup().
    private MultipleOutputs<Text, CleanBean> multipleOutputs;
    // NOTE(review): static field re-assigned per task in setup(); shared across
    // reducer instances in a reused JVM — confirm IP2CCUtil is stateless.
    private static IP2CCUtil ip2CCUtil;

    @Override
    protected void setup(Context context) {
        multipleOutputs = new MultipleOutputs<Text, CleanBean>(context);
        ip2CCUtil = new IP2CCUtil();
    }

    @Override
    protected void reduce(Text key, Iterable<CleanBean> Values, Context context) {
        for (CleanBean value : Values) {
            // Mapper joined fields with spaces; split the tagged line back apart.
            // line[0] is the vendor tag added by CleanMapper.
            String[] line = value.getLine().split("\\s+");
            String ipvalue = "";
            String hit = "-1";          // cache hit flag: "HIT" / "MISS" / "-1" unknown
            String logInfo = "";        // the final \001-delimited output record
            String name = "";           // yyyyMMdd — used as the output file name
            String UserAgent = "-";
            String doamin = "-";        // (sic) domain extracted from the request URL
            String countryCode = "-";
            String countryName = "-";
            String timeTaken = "0";
            String referer = "-";
            if (line[0].equals("swiftserve")) {
                // Reject lines with too few fields for positional access below.
                if (line.length < 12 && line.length > 0) {
                    logger.info("错误的日志数据格式:" + value.getLine());
                } else {
                    // swiftserve cleaning path.
                    String datetime = line[4];
                    ipvalue = line[1];
                    try {
                        // Resolve country code and name from the client IP.
                        // Result format assumed "<code> <name>" — see split below.
                        String couStr = ip2CCUtil.getCountry(ipvalue);
                        if (couStr != null) {
                            countryCode = couStr.split(" ")[0];
                            countryName = couStr.split(" ")[1];
                        }
                        // Timezone handling: source timestamps are GMT+0; the
                        // un-zoned formatter below re-renders them in the JVM's
                        // default zone (intended to be GMT+8 per original notes).
                        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                        sdf.setTimeZone(TimeZone.getTimeZone("GMT+0000")); // parse as GMT+0
                        datetime = DealDate.Swiftservedate(datetime);
                        try {
                            Date d = sdf.parse(datetime.toString());
                            Date date = new Date(d.getTime());
                            // No explicit zone: formats in the JVM default timezone.
                            SimpleDateFormat newsdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                            String newdatetime = newsdf.format(date);
                            String d1[] = newdatetime.split(" ");
                            // Build the yyyyMMdd output-file name from the date part.
                            String namedata[] = newdatetime.split(" ")[0].split("-");
                            name = namedata[0] + namedata[1] + namedata[2];
                            // Quote-split the raw line to reach quoted fields
                            // (User-Agent, hit indicator).
                            String[] datavalue = value.toString().split("\"");
                            if (datavalue.length < 12 && datavalue.length > 0) {
                                logger.info("错误的日志数据格式:" + value.getLine());
                            } else {
                                // cs(User-Agent) field.
                                UserAgent = datavalue[5];
                                // Cache-hit indicator lives in the 12th quote segment.
                                String hitStr = datavalue[11];
                                if (hitStr.indexOf("HIT") != -1) {
                                    hit = "HIT";
                                } else {
                                    hit = "MISS";
                                }
                            }
                            // Domain is the host part of the URL in field 6.
                            String[] doStr = line[6].split("/");
                            if (doStr.length < 3 && doStr.length > 0) {
                                logger.info("错误的日志数据格式:" + value.getLine());
                            } else {
                                doamin = line[6].split("/")[2];
                                // Assemble the \001-delimited output record.
                                logInfo = d1[0] + "\001" + d1[1] + "\001" + line[1] + "\001" + countryName + "\001" + countryCode + "\001" + doamin +
                                        "\001" + line[5].replace("\"", "") + "\001" + line[6].replace("\"", "") + "\001" + line[7] +
                                        "\001" + line[10] + "\001" + hit + "\001" + line[8] + "\001" + line[11].replace("\"", "") +
                                        "\001" + UserAgent;
                                multipleOutputs.write(key, new CleanBean(logInfo), name);
                            }
                        } catch (ParseException e) {
                            logger.info("错误的日志数据格式:" + value.getLine());
                        }
                    } catch (Exception e) {
                        // Any other failure (bad index, IP lookup, ...) skips the line.
                        logger.info("错误的日志数据格式:" + value.getLine());
                    }
                }
            } else if (line[0].equals("tata")) {
                // Reject lines with too few fields.
                if (line.length > 0 && line.length < 12) {
                    logger.info("错误的日志数据格式:" + value.getLine());
                } else {
                    // tata cleaning path: timestamp carries its own zone offset
                    // in field 5 (e.g. "+0530]").
                    String datetime = line[4];
                    String timeZone = line[5].replace("]", "");
                    ipvalue = line[1];
                    try {
                        // Resolve country code and name from the client IP.
                        String couStr = ip2CCUtil.getCountry(ipvalue);
                        if (couStr != null) {
                            countryCode = couStr.split(" ")[0];
                            countryName = couStr.split(" ")[1];
                        }
                        // Parse in the log's own offset; format() below re-renders
                        // in the JVM default timezone.
                        SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                        sdf.setTimeZone(TimeZone.getTimeZone("GMT" + timeZone)); // source offset
                        datetime = DealDate.Tatadate(datetime);
                        try {
                            Date d = sdf.parse(datetime.toString());
                            Date date = new Date(d.getTime());
                            SimpleDateFormat newsdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                            String newdatetime = newsdf.format(date);
                            String d1[] = newdatetime.split(" ");
                            // yyyyMMdd output-file name.
                            String namedata[] = newdatetime.split(" ")[0].split("-");
                            name = namedata[0] + namedata[1] + namedata[2];
                            // Quote-split for the quoted User-Agent / time-taken fields.
                            String[] datavalue = value.toString().split("\"");
                            if (datavalue.length > 0 && datavalue.length < 7) {
                                logger.info("错误的日志数据格式:" + value.getLine());
                            } else {
                                // cs(User-Agent) and time-taken fields.
                                timeTaken = datavalue[6];
                                UserAgent = datavalue[5];
                                // Note: hit flag is emitted as the literal "-1"
                                // (not derivable from the tata format here).
                                logInfo = d1[0] + "\001" + d1[1] + "\001" + line[1] + "\001" + countryName + "\001" + countryCode + "\001" + line[2] + "\001"
                                        + line[6].replace("\"", "") + "\001" + line[7] + "\001" + line[9] + "\001"
                                        + line[10] + "\001" + "-1" + "\001" + timeTaken.replaceAll(" ", "") + "\001"
                                        + line[11].replace("\"", "") + "\001" + UserAgent;
                                multipleOutputs.write(key, new CleanBean(logInfo), name);
                            }
                        } catch (ParseException e) {
                            logger.info("错误的日志数据格式:" + value.getLine());
                        }
                    } catch (Exception e) {
                        logger.info("错误的日志数据格式:" + value.getLine());
                    }
                }
            } else {
                // Default vendor branch (presumably level3 — confirm upstream).
                if (line.length > 0 && line.length < 9) {
                    logger.info("错误的日志数据格式:" + value.getLine());
                } else {
                    // Positional field layout: date + time, then client IP.
                    String datetime = line[1] + " " + line[2];
                    ipvalue = line[3];
                    // Hit status is encoded in the last digit of the
                    // second-to-last field: 0/3 = MISS, 1/2 = HIT.
                    String hitStr = line[line.length - 2];
                    String hitlast = hitStr.substring(hitStr.length() - 1, hitStr.length());
                    if (hitlast.equals("0") || hitlast.equals("3")) {
                        hit = "MISS";
                    } else if (hitlast.equals("1") || hitlast.equals("2")) {
                        hit = "HIT";
                    } else {
                        hit = "-1";
                    }
                    try {
                        // Resolve country code and name from the client IP.
                        String couStr = ip2CCUtil.getCountry(ipvalue);
                        if (couStr != null) {
                            countryCode = couStr.split(" ")[0];
                            countryName = couStr.split(" ")[1];
                        }
                        if (ipvalue.length() > 0) {
                            // Source timestamps are GMT+0; format() re-renders in
                            // the JVM default timezone (intended GMT+8).
                            SimpleDateFormat sdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                            sdf.setTimeZone(TimeZone.getTimeZone("GMT+0000")); // parse as GMT+0
                            try {
                                Date d = sdf.parse(datetime.toString());
                                Date date = new Date(d.getTime());
                                SimpleDateFormat newsdf = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
                                String newdatetime = newsdf.format(date);
                                String d1[] = newdatetime.split(" ");
                                // yyyyMMdd output-file name.
                                String namedata[] = newdatetime.split(" ")[0].split("-");
                                name = namedata[0] + namedata[1] + namedata[2];
                                // Quote-split for referer / User-Agent.
                                String[] datavalue = value.toString().split("\"");
                                if (datavalue.length > 0 && datavalue.length < 4) {
                                    logger.info("错误的日志数据格式:" + value.getLine());
                                } else {
                                    UserAgent = datavalue[3];
                                    referer = datavalue[1];
                                    logInfo = d1[0] + "\001" + d1[1] + "\001" + line[3] + "\001" + countryName +
                                            "\001" + countryCode + "\001" + line[0] + "\001" + line[4] +
                                            "\001" + line[5] + "\001" + line[6] + "\001" + line[7] +
                                            "\001" + hit + "\001" + line[8] + "\001" + referer + "\001" + UserAgent;
                                    multipleOutputs.write(key, new CleanBean(logInfo), name);
                                }
                            } catch (ParseException e) {
                                logger.info("错误的日志数据格式:" + value.getLine());
                            }
                        }
                    } catch (Exception e) {
                        logger.info("错误的日志数据格式:" + value.getLine());
                    }
                }
            }
        }
    }

    @Override
    protected void cleanup(Context context) throws IOException, InterruptedException {
        // Flush and close all date-partitioned side outputs.
        multipleOutputs.close();
    }
}
@Override
public int run(String[] args) throws Exception {
    /**
     * Configures and submits the cleaning job.
     * args[0] = raw-input path, args[1] = cleaned-output path.
     * Returns 0 on success or when the day is already cleaned, 1 on job failure.
     */
    // Fix: use the configuration injected by ToolRunner via setConf() so that
    // generic options (-D key=value, -conf, ...) survive, instead of silently
    // discarding them with a fresh Configuration. Fall back when run standalone.
    Configuration conf = getConf();
    if (conf == null) {
        conf = new Configuration();
    }
    // Work around "java.io.IOException: No FileSystem for scheme: hdfs" when
    // the hdfs FileSystem service entry is missing from a shaded jar.
    conf.set("fs.hdfs.impl", org.apache.hadoop.hdfs.DistributedFileSystem.class.getName());

    // Skip the whole run if the day is already cleaned (_SUCCESS marker exists).
    Path fileSucc = new Path(args[1] + "/_SUCCESS");
    FileSystem fsSucc = fileSucc.getFileSystem(conf);
    if (fsSucc.exists(fileSucc)) {
        logger.info("该天的数据已经清洗完成!");
        return 0;
    } else {
        // A previous run failed part-way: remove the stale output directory
        // so FileOutputFormat does not reject it.
        Path output = new Path(args[1]);
        FileSystem fs = output.getFileSystem(conf);
        if (fs.exists(output)) {
            fs.delete(output, true);
        }
    }

    Job job = Job.getInstance(conf);
    job.setJarByClass(CleanData.class);
    FileInputFormat.addInputPath(job, new Path(args[0]));
    FileOutputFormat.setOutputPath(job, new Path(args[1]));
    job.setMapperClass(CleanMapper.class);
    job.setReducerClass(CleanReducer.class);
    job.setOutputKeyClass(Text.class);
    job.setOutputValueClass(CleanBean.class);
    // LazyOutputFormat suppresses empty part-r-* files; the real output is
    // written through MultipleOutputs in the reducer.
    LazyOutputFormat.setOutputFormatClass(job, TextOutputFormat.class);
    return job.waitForCompletion(true) ? 0 : 1;
}
/**
 * Hadoop Writable value carrying one (raw or cleaned) log line between
 * the mapper and reducer.
 */
public static class CleanBean implements Writable {

    // The log line payload; may be null on a default-constructed bean.
    private String line;

    public CleanBean() {
        super();
    }

    public CleanBean(String line) {
        super();
        this.line = line;
    }

    public String getLine() {
        return line;
    }

    public void setLine(String line) {
        this.line = line;
    }

    @Override
    public void write(DataOutput out) throws IOException {
        // Fix: writeUTF(null) throws NPE and would kill the task for a
        // half-initialized bean; serialize null as the empty string.
        out.writeUTF(line == null ? "" : line);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        line = in.readUTF();
    }

    @Override
    public String toString() {
        // Fix: return the line directly — `new String(this.line)` allocated a
        // pointless copy and threw NPE on a null line; mirror write()'s
        // null-as-empty convention instead.
        return line == null ? "" : line;
    }
}
public static void main(String[] args) throws Exception {
    // Fix: args[0] was dereferenced unguarded — running with no arguments
    // crashed with ArrayIndexOutOfBoundsException. Fail fast with usage text.
    if (args.length < 1) {
        System.err.println("Usage: CleanData <date-partition, e.g. 20180101>");
        System.exit(2);
    }
    // Resolve HDFS locations from config.properties.
    // (Dead commented-out PropertyReader variant removed; PropsUtil replaced it.)
    PropsUtil r = new PropsUtil("config.properties");
    String hdfsUrl = r.getString("hdfs.default.url");
    String originPath = r.getString("hdfs.default.origin.path");
    String cleanedPath = r.getString("hdfs.default.cleaned.path");
    // ag[0]: raw input directory for the day; ag[1]: cleaned output directory.
    String[] ag = {hdfsUrl + originPath + "/tata/" + args[0],
            hdfsUrl + cleanedPath + "/" + args[0]};
    int ec = ToolRunner.run(new Configuration(), new CleanData(), ag);
    System.exit(ec);
}
}