Details
Bug
Status: Open
Major
Resolution: Unresolved
1.0.3
None
None
ubuntu
Description
Hi,
I am new to Hadoop concepts.
While practicing with a custom MapReduce program, I found that the result is not as expected when I run the code on an HDFS-based file. Note that when I run the same program on a Unix-based (local file system) file, I get the expected result.
Below are the details of my code.
MapReduce in Java
==================
import java.io.IOException;
import java.util.*;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.conf.*;
import org.apache.hadoop.io.*;
import org.apache.hadoop.mapred.*;
import org.apache.hadoop.util.*;

public class WordCount1 {
    public static class Map extends MapReduceBase implements Mapper<LongWritable, Text, Text, IntWritable> {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();
        public void map(LongWritable key, Text value, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            String line = value.toString();
            String tokenedZone = null;
            StringTokenizer tokenizer = new StringTokenizer(line);
            // The loop body was lost when this report was posted; it is assumed
            // here to follow the standard WordCount example: emit (word, 1).
            while (tokenizer.hasMoreTokens()) {
                tokenedZone = tokenizer.nextToken();
                word.set(tokenedZone);
                output.collect(word, one);
            }
        }
    }

    public static class Reduce extends MapReduceBase implements Reducer<Text, IntWritable, Text, IntWritable> {
        public void reduce(Text key, Iterator<IntWritable> values, OutputCollector<Text, IntWritable> output, Reporter reporter) throws IOException {
            int sum = 0;
            int val = 0;
            // This loop body was lost as well; it is assumed to sum the counts.
            while (values.hasNext()) {
                val = values.next().get();
                sum += val;
            }
            if (sum > 1)
                output.collect(key, new IntWritable(sum));
        }
    }

    public static void main(String[] args) throws Exception {
        JobConf conf = new JobConf();
        conf.setJarByClass(WordCount1.class);
        conf.setJobName("wordcount1");
        conf.setOutputKeyClass(Text.class);
        conf.setOutputValueClass(IntWritable.class);
        conf.setMapperClass(Map.class);
        conf.setCombinerClass(Reduce.class);
        conf.setReducerClass(Reduce.class);
        conf.setInputFormat(TextInputFormat.class);
        conf.setOutputFormat(TextOutputFormat.class);
        Path inPath = new Path(args[0]);
        // The posted code used args[0] here too; the output path is assumed to be args[1].
        Path outPath = new Path(args[1]);
        FileInputFormat.setInputPaths(conf, inPath);
        FileOutputFormat.setOutputPath(conf, outPath);
        JobClient.runJob(conf);
    }
}
Input file
===========
test my program
during test and my hadoop
your during
get program
Hadoop-generated output file on the HDFS file system
=======================================
during 2
my 2
test 2
Hadoop-generated output file on the local file system
=======================================
during 2
my 2
program 2
test 2
Please help me with this issue.
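One possible explanation, offered as an assumption rather than a confirmed diagnosis: the job registers Reduce as the combiner too, and Reduce drops every key whose sum is not greater than 1. If the input is processed by more than one map task, a word that occurs once in each task's portion would be discarded by the combiner before the reducer ever sees it. The plain-Java sketch below has no Hadoop dependency; the class CombinerFilterDemo, its methods, and the split of the input into lines 1-3 and line 4 are all hypothetical and only simulate that behavior.

```java
import java.util.*;

public class CombinerFilterDemo {
    // The four input lines from the report.
    static final List<String> INPUT = Arrays.asList(
        "test my program",
        "during test and my hadoop",
        "your during",
        "get program");

    // Mimics the mapper: emit a (word, 1) pair for every token.
    static List<Map.Entry<String, Integer>> mapPhase(List<String> lines) {
        List<Map.Entry<String, Integer>> out = new ArrayList<>();
        for (String line : lines) {
            StringTokenizer tokenizer = new StringTokenizer(line);
            while (tokenizer.hasMoreTokens()) {
                out.add(new AbstractMap.SimpleEntry<>(tokenizer.nextToken(), 1));
            }
        }
        return out;
    }

    // Mimics Reduce: sum the counts per word; optionally keep only sums > 1.
    static TreeMap<String, Integer> sumCounts(List<Map.Entry<String, Integer>> pairs,
                                              boolean keepOnlyRepeated) {
        TreeMap<String, Integer> sums = new TreeMap<>();
        for (Map.Entry<String, Integer> p : pairs) {
            sums.merge(p.getKey(), p.getValue(), Integer::sum);
        }
        if (keepOnlyRepeated) {
            sums.values().removeIf(v -> v <= 1);
        }
        return sums;
    }

    // Simulated job: one combiner pass per split, then a single filtering reducer.
    static TreeMap<String, Integer> runJob(List<List<String>> splits, boolean combinerFilters) {
        List<Map.Entry<String, Integer>> shuffled = new ArrayList<>();
        for (List<String> split : splits) {
            shuffled.addAll(sumCounts(mapPhase(split), combinerFilters).entrySet());
        }
        return sumCounts(shuffled, true);
    }

    public static void main(String[] args) {
        List<List<String>> oneSplit = Arrays.asList(INPUT);
        // Hypothetical split: lines 1-3 go to one map task, line 4 to another.
        List<List<String>> twoSplits = Arrays.asList(INPUT.subList(0, 3), INPUT.subList(3, 4));

        System.out.println(runJob(oneSplit, true));   // {during=2, my=2, program=2, test=2}
        System.out.println(runJob(twoSplits, true));  // {during=2, my=2, test=2}
        System.out.println(runJob(twoSplits, false)); // {during=2, my=2, program=2, test=2}
    }
}
```

Under these assumptions, the single-split run matches the local file system output and the two-split run with a filtering combiner matches the HDFS output. If this is indeed the cause, a combiner that only sums the counts, with the sum > 1 check left to the reducer alone, should give the same result however the input is split.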