Environment: Ubuntu 14.04 + Hadoop 2.6.1.
The cluster runs in VirtualBox VMs, with one master and three slave nodes.
Hadoop starts up successfully, with no problems at all.
I installed Eclipse on Ubuntu and wrote a word count program in Java; the source is as follows:
package wordcount;
import java.io.IOException;
import java.util.StringTokenizer;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Mapper;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
/**
 * @author
 * @version Created: 2017-09-09 08:50:51. Class description.
 */
public class Wordcount {

    public static class TokenizerMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        @Override
        protected void map(LongWritable key, Text value, Mapper<LongWritable, Text, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            StringTokenizer line = new StringTokenizer(value.toString());
            while (line.hasMoreTokens()) {
                word.set(line.nextToken());
                context.write(word, one);
            }
        }
    }

    public static class IntSumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
        private IntWritable result = new IntWritable();

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values,
                Reducer<Text, IntWritable, Text, IntWritable>.Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable obj : values) {
                sum += obj.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(Wordcount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path("hdfs://master:9000/user/hduser/demo/test.txt"));
        FileOutputFormat.setOutputPath(job, new Path("hdfs://master:9000/user/hduser/demo/wordcount"));
        // FileInputFormat.addInputPath(job, new Path(args[0]));
        // FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
After starting Hadoop, I ran the program above directly from Eclipse. It ran successfully and produced the wordcount folder, which contains a _SUCCESS file as well as the result files with the word counts.
Then I wanted to package the program into a jar file and run it that way, so first I changed these lines of the program:
FileInputFormat.addInputPath(job, new Path("hdfs://master:9000/user/hduser/demo/test.txt"));
FileOutputFormat.setOutputPath(job, new Path("hdfs://master:9000/user/hduser/demo/wordcount"));
to the following:
FileInputFormat.addInputPath(job, new Path(args[0]));
FileOutputFormat.setOutputPath(job, new Path(args[1]));
that is, the two paths are passed in as arguments from the terminal.
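For reference, here is a sketch of what main looks like after the change, with an argument-count check added in the style of the WordCount example that ships with Hadoop. The GenericOptionsParser part is my own addition (it needs the extra import org.apache.hadoop.util.GenericOptionsParser), and I have not tested this variant:

    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Strip generic Hadoop options (-D, -files, ...) so that only the
        // two path arguments remain, as the bundled WordCount example does.
        String[] otherArgs = new GenericOptionsParser(conf, args).getRemainingArgs();
        if (otherArgs.length < 2) {
            System.err.println("Usage: wordcount <input path> <output path>");
            System.exit(2);
        }
        Job job = Job.getInstance(conf, "word count");
        job.setJarByClass(Wordcount.class);
        job.setMapperClass(TokenizerMapper.class);
        job.setCombinerClass(IntSumReducer.class);
        job.setReducerClass(IntSumReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(otherArgs[0]));
        FileOutputFormat.setOutputPath(job, new Path(otherArgs[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }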
I packaged it into a jar file with Eclipse's Export function, then entered this in the terminal:
hadoop jar wordcount.jar wordcount.Wordcount hdfs://master:9000/user/hduser/demo/test.txt hdfs://master:9000/user/hduser/demo/wordcount
It failed with the following error:
17/09/09 11:18:53 INFO client.RMProxy: Connecting to ResourceManager at master/192.168.56.100:8050
17/09/09 11:18:54 WARN mapreduce.JobResourceUploader: Hadoop command-line option parsing not performed. Implement the Tool interface and execute your application with ToolRunner to remedy this.
17/09/09 11:18:55 INFO input.FileInputFormat: Total input paths to process : 1
17/09/09 11:18:55 INFO mapreduce.JobSubmitter: number of splits:1
17/09/09 11:18:55 INFO mapreduce.JobSubmitter: Submitting tokens for job: job_1504926710828_0001
17/09/09 11:18:56 INFO impl.YarnClientImpl: Submitted application application_1504926710828_0001
17/09/09 11:18:56 INFO mapreduce.Job: The url to track the job: http://master:8088/proxy/application_1504926710828_0001/
17/09/09 11:18:56 INFO mapreduce.Job: Running job: job_1504926710828_0001
17/09/09 11:19:14 INFO mapreduce.Job: Job job_1504926710828_0001 running in uber mode : false
17/09/09 11:19:14 INFO mapreduce.Job: map 0% reduce 0%
17/09/09 11:19:14 INFO mapreduce.Job: Job job_1504926710828_0001 failed with state FAILED due to: Application application_1504926710828_0001 failed 2 times due to AM Container for appattempt_1504926710828_0001_000002 exited with exitCode: 1
For more detailed output, check application tracking page:http://master:8088/proxy/application_1504926710828_0001/Then, click on links to logs of each attempt.
Diagnostics: Exception from container-launch.
Container id: container_1504926710828_0001_02_000001
Exit code: 1
Stack trace: ExitCodeException exitCode=1:
at org.apache.hadoop.util.Shell.runCommand(Shell.java:538)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.yarn.server.nodemanager.DefaultContainerExecutor.launchContainer(DefaultContainerExecutor.java:211)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:302)
at org.apache.hadoop.yarn.server.nodemanager.containermanager.launcher.ContainerLaunch.call(ContainerLaunch.java:82)
at java.util.concurrent.FutureTask.run(FutureTask.java:262)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1152)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:622)
at java.lang.Thread.run(Thread.java:748)
Container exited with a non-zero exit code 1
Failing this attempt. Failing the application.
17/09/09 11:19:14 INFO mapreduce.Job: Counters: 0
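One thing I noticed in the output above is the WARN line telling me to implement the Tool interface and run the application with ToolRunner. I don't know whether it is related to the failure, but for completeness, this is a rough, untested sketch of how I understand the driver would be restructured (it additionally needs org.apache.hadoop.conf.Configured, org.apache.hadoop.util.Tool, and org.apache.hadoop.util.ToolRunner imported; everything else is as in my class above):

    public class Wordcount extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            // getConf() returns the Configuration that ToolRunner populated,
            // including any generic options (-D, -files, ...) from the command line.
            Job job = Job.getInstance(getConf(), "word count");
            job.setJarByClass(Wordcount.class);
            job.setMapperClass(TokenizerMapper.class);
            job.setCombinerClass(IntSumReducer.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            // ToolRunner parses the generic options and passes the remaining
            // arguments (here, the input and output paths) to run().
            System.exit(ToolRunner.run(new Configuration(), new Wordcount(), args));
        }
    }

It would still be invoked the same way: hadoop jar wordcount.jar wordcount.Wordcount <input> <output>.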
I checked the log files and found the following:
2017-09-09 11:18:55,869 INFO org.apache.hadoop.hdfs.StateChange: DIR* completeFile: /tmp/hadoop-yarn/staging/hduser/.staging/job_1504926710828_0001/job.xml is closed by DFSClient_NONMAPREDUCE_-1306163227_1
2017-09-09 11:18:59,502 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2017-09-09 11:18:59,503 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 1 millisecond(s).
2017-09-09 11:19:12,241 INFO org.apache.hadoop.ipc.Server: IPC Server handler 2 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 192.168.56.102:53610 Call#7 Retry#0: java.io.FileNotFoundException: File does not exist: /tmp/hadoop-yarn/staging/hduser/.staging/job_1504926710828_0001/job_1504926710828_0001_1.jhist
2017-09-09 11:19:12,293 INFO org.apache.hadoop.ipc.Server: IPC Server handler 4 on 9000, call org.apache.hadoop.hdfs.protocol.ClientProtocol.getBlockLocations from 192.168.56.102:53610 Call#8 Retry#0: java.io.FileNotFoundException: File does not exist: /tmp/hadoop-yarn/staging/hduser/.staging/job_1504926710828_0001/job_1504926710828_0001_1.jhist
2017-09-09 11:19:29,502 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30000 milliseconds
2017-09-09 11:19:29,502 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2017-09-09 11:19:42,634 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 192.168.56.100
2017-09-09 11:19:42,634 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
2017-09-09 11:19:42,634 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 29
2017-09-09 11:19:42,635 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 40 Total time for transactions(ms): 6 Number of transactions batched in Syncs: 0 Number of syncs: 27 SyncTimes(ms): 545
2017-09-09 11:19:42,704 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 40 Total time for transactions(ms): 6 Number of transactions batched in Syncs: 0 Number of syncs: 28 SyncTimes(ms): 613
2017-09-09 11:19:42,704 INFO org.apache.hadoop.hdfs.server.namenode.FileJournalManager: Finalizing edits file /usr/local/hadoop/hadoop_data/hdfs/namenode/current/edits_inprogress_0000000000000000029 -> /usr/local/hadoop/hadoop_data/hdfs/namenode/current/edits_0000000000000000029-0000000000000000068
2017-09-09 11:19:42,704 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Starting log segment at 69
2017-09-09 11:19:59,503 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2017-09-09 11:19:59,503 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2017-09-09 11:20:29,504 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Rescanning after 30001 milliseconds
2017-09-09 11:20:29,504 INFO org.apache.hadoop.hdfs.server.blockmanagement.CacheReplicationMonitor: Scanned 0 directive(s) and 0 block(s) in 0 millisecond(s).
2017-09-09 11:20:42,759 INFO org.apache.hadoop.hdfs.server.namenode.FSNamesystem: Roll Edit Log from 192.168.56.100
2017-09-09 11:20:42,759 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Rolling edit logs
2017-09-09 11:20:42,759 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Ending log segment 69
2017-09-09 11:20:42,759 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 2 SyncTimes(ms): 24
2017-09-09 11:20:42,791 INFO org.apache.hadoop.hdfs.server.namenode.FSEditLog: Number of transactions: 2 Total time for transactions(ms): 0 Number of transactions batched in Syncs: 0 Number of syncs: 3 SyncTimes(ms): 56
The error reported in there is:
java.io.FileNotFoundException: File does not exist: /tmp/hadoop-yarn/staging/hduser/.staging/job_1504926710828_0001/job_1504926710828_0001_1.jhist
I'm new to this and don't know what to do next.