Eumenides-Z 2018-05-03 07:46 Acceptance rate: 100%
Views: 812
Closed

Training a machine learning classifier on Hadoop with Mahout's random forest algorithm

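I set up three CentOS 7 virtual machines and installed hadoop-3.0.0 and mahout-distribution-0.9 on them.
I want to build a model with the random forest algorithm. Step one, generating the descriptor file (/KDDdes.info), succeeded. Step two, building the model (/user/hadoop/nsl-forest), also succeeded. Step three, however, fails with an error.
This is what HDFS looks like: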

[hadoop@hadoop1 ~]$ hadoop fs -ls /
Found 5 items
-rw-r--r--   1 hadoop supergroup    3365886 2018-05-02 23:58 /KDDTest+.arff
-rw-r--r--   1 hadoop supergroup   18742306 2018-05-02 23:40 /KDDTrain+.arff
-rw-r--r--   1 hadoop supergroup       2795 2018-05-02 23:42 /KDDdes.info
drwx------   - hadoop supergroup          0 2018-04-29 20:05 /tmp
drwxr-xr-x   - hadoop supergroup          0 2018-04-29 20:05 /user

This is step three:

[hadoop@hadoop1 ~]$ hadoop jar /opt/mahout-distribution-0.9/mahout-examples-0.9-job.jar org.apache.mahout.classifier.df.mapreduce.TestForest -i /KDDTest+.arff -ds /KDDdes.info -m /user/hadoop/nsl-forest -a -mr -o nsl-prediction
2018-05-03 00:01:28,469 INFO mapreduce.Classifier: Adding the dataset to the DistributedCache
2018-05-03 00:01:28,471 INFO mapreduce.Classifier: Adding the decision forest to the DistributedCache
2018-05-03 00:01:28,475 INFO mapreduce.Classifier: Configuring the job...
2018-05-03 00:01:28,479 INFO mapreduce.Classifier: Running the job...
2018-05-03 00:01:28,568 INFO client.RMProxy: Connecting to ResourceManager at hadoop1/192.168.80.100:8032
2018-05-03 00:01:28,960 INFO mapreduce.JobResourceUploader: Disabling Erasure Coding for path: /tmp/hadoop-yarn/staging/hadoop/.staging/job_1525056498669_0005
2018-05-03 00:01:29,462 INFO input.FileInputFormat: Total input files to process : 1
2018-05-03 00:01:29,968 INFO mapreduce.JobSubmitter: number of splits:1
2018-05-03 00:01:30,080 INFO Configuration.deprecation: yarn.resourcemanager.system-metrics-publisher.enabled is deprecated. Instead, use yarn.system-metrics-publisher.enabled
2018-05-03 00:01:30,685 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1525056498669_0005
2018-05-03 00:01:30,686 INFO mapreduce.JobSubmitter: Executing with tokens: []
2018-05-03 00:01:31,277 INFO conf.Configuration: resource-types.xml not found
2018-05-03 00:01:31,278 INFO resource.ResourceUtils: Unable to find 'resource-types.xml'.
2018-05-03 00:01:31,330 INFO impl.YarnClientImpl: Submitted application application_1525056498669_0005
2018-05-03 00:01:31,358 INFO mapreduce.Job: The url to track the job: http://hadoop1:8088/proxy/application_1525056498669_0005/
2018-05-03 00:01:31,359 INFO mapreduce.Job: Running job: job_1525056498669_0005
2018-05-03 00:01:38,470 INFO mapreduce.Job: Job job_1525056498669_0005 running in uber mode : false
2018-05-03 00:01:38,470 INFO mapreduce.Job:  map 0% reduce 0%
2018-05-03 00:01:42,604 INFO mapreduce.Job: Task Id : attempt_1525056498669_0005_m_000000_0, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException: 946827879
    at org.apache.mahout.classifier.df.node.Node.read(Node.java:58)
    at org.apache.mahout.classifier.df.DecisionForest.readFields(DecisionForest.java:197)
    at org.apache.mahout.classifier.df.DecisionForest.read(DecisionForest.java:203)
    at org.apache.mahout.classifier.df.DecisionForest.load(DecisionForest.java:225)
    at org.apache.mahout.classifier.df.mapreduce.Classifier$CMapper.setup(Classifier.java:209)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

2018-05-03 00:01:46,641 INFO mapreduce.Job: Task Id : attempt_1525056498669_0005_m_000000_1, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException: 946827879
    at org.apache.mahout.classifier.df.node.Node.read(Node.java:58)
    at org.apache.mahout.classifier.df.DecisionForest.readFields(DecisionForest.java:197)
    at org.apache.mahout.classifier.df.DecisionForest.read(DecisionForest.java:203)
    at org.apache.mahout.classifier.df.DecisionForest.load(DecisionForest.java:225)
    at org.apache.mahout.classifier.df.mapreduce.Classifier$CMapper.setup(Classifier.java:209)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

2018-05-03 00:01:51,687 INFO mapreduce.Job: Task Id : attempt_1525056498669_0005_m_000000_2, Status : FAILED
Error: java.lang.ArrayIndexOutOfBoundsException: 946827879
    at org.apache.mahout.classifier.df.node.Node.read(Node.java:58)
    at org.apache.mahout.classifier.df.DecisionForest.readFields(DecisionForest.java:197)
    at org.apache.mahout.classifier.df.DecisionForest.read(DecisionForest.java:203)
    at org.apache.mahout.classifier.df.DecisionForest.load(DecisionForest.java:225)
    at org.apache.mahout.classifier.df.mapreduce.Classifier$CMapper.setup(Classifier.java:209)
    at org.apache.hadoop.mapreduce.Mapper.run(Mapper.java:143)
    at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:794)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:342)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:174)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1962)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:168)

2018-05-03 00:01:55,745 INFO mapreduce.Job:  map 100% reduce 0%
2018-05-03 00:01:57,766 INFO mapreduce.Job: Job job_1525056498669_0005 failed with state FAILED due to: Task failed task_1525056498669_0005_m_000000
Job failed as tasks failed. failedMaps:1 failedReduces:0 killedMaps:0 killedReduces: 0

2018-05-03 00:01:57,844 INFO mapreduce.Job: Counters: 9
    Job Counters 
        Failed map tasks=4
        Launched map tasks=4
        Other local map tasks=3
        Data-local map tasks=1
        Total time spent by all maps in occupied slots (ms)=10027
        Total time spent by all reduces in occupied slots (ms)=0
        Total time spent by all map tasks (ms)=10027
        Total vcore-milliseconds taken by all map tasks=10027
        Total megabyte-milliseconds taken by all map tasks=10267648
Exception in thread "main" java.lang.IllegalStateException: Job failed!
    at org.apache.mahout.classifier.df.mapreduce.Classifier.run(Classifier.java:127)
    at org.apache.mahout.classifier.df.mapreduce.TestForest.mapreduce(TestForest.java:188)
    at org.apache.mahout.classifier.df.mapreduce.TestForest.testForest(TestForest.java:174)
    at org.apache.mahout.classifier.df.mapreduce.TestForest.run(TestForest.java:146)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:76)
    at org.apache.mahout.classifier.df.mapreduce.TestForest.main(TestForest.java:315)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:239)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:153)

What is causing this? Any help would be much appreciated!

2 answers

  • Eumenides-Z 2018-05-14 05:14

    This is a version incompatibility between Hadoop and Mahout: Mahout 0.9 only runs on Hadoop 1.x.
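A side note on the exception itself. The index 946827879 is far too large to be a real array position, so it may help to see which bytes it is made of. The sketch below is only a diagnostic aid: `IndexBytes` and `decode` are hypothetical names, not part of Mahout or Hadoop. It assumes the value came from a 4-byte big-endian integer read off the model stream (the byte order `java.io.DataInput.readInt` uses).

```java
import java.nio.ByteBuffer;

// Hypothetical helper for inspecting a suspicious index value.
// Not part of Mahout or Hadoop.
class IndexBytes {

    // Returns the four big-endian bytes of value as ASCII text,
    // replacing non-printable bytes with '.'.
    static String decode(int value) {
        // ByteBuffer is big-endian by default, matching DataInput.readInt.
        byte[] bytes = ByteBuffer.allocate(4).putInt(value).array();
        StringBuilder sb = new StringBuilder();
        for (byte b : bytes) {
            int c = b & 0xFF;
            sb.append(c >= 0x20 && c < 0x7F ? (char) c : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int index = 946827879; // value reported by the ArrayIndexOutOfBoundsException
        System.out.printf("0x%08X -> \"%s\"%n", index, decode(index));
        // prints: 0x386F7267 -> "8org"
    }
}
```

The four bytes spell "8org", which looks like text (possibly a length byte followed by the start of a name beginning with "org") rather than a small index. That would be consistent with the mapper parsing a file whose layout is not what this Mahout build expects, which fits the version-mismatch explanation above, although it does not prove it.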
