Details
-
Type: Bug
-
Status: Resolved
-
Priority: Major
-
Resolution: Not A Problem
-
3.0.0-alpha1
-
None
-
None
-
hadoop2.7.3
Description
Failed to traverse Iterable values the second time in reduce() method
The following code is a reduce() method (of WordCount):
WordCount.java
public static class WcReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
    @Override
    protected void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        // print some logs
        List<String> vals = new LinkedList<>();
        for (IntWritable i : values) {
            vals.add(i.toString());
        }
        System.out.println(String.format(">>>> reduce(%s, [%s])", key, String.join(", ", vals)));

        // sum of values
        int sum = 0;
        for (IntWritable i : values) {
            sum += i.get();
        }
        System.out.println(String.format(">>>> reduced(%s, %s)", key, sum));

        context.write(key, new IntWritable(sum));
    }
}
After running the job, every sum in the output was zero.
Debugging showed that the second for-each loop never executed. The root cause is the value returned by Iterable.iterator(): both for-each loops received the same Iterator instance, which was already exhausted after the first loop. In general, Iterable.iterator() is expected to return a new instance on each call, as ArrayList.iterator() does.
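The behavior described above can be reproduced without Hadoop. The following is a minimal sketch, not Hadoop's implementation: `SinglePassIterable` is a hypothetical stand-in for the `values` argument that hands out the same iterator on every call, and `naiveSecondPassSum` and `bufferedSum` are illustrative names. It shows why the second loop sees nothing, and one possible workaround that reads the values once and buffers them.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class SinglePassDemo {

    /** Hypothetical stand-in for the reducer's values: iterator() always returns the same instance. */
    public static final class SinglePassIterable<T> implements Iterable<T> {
        private final Iterator<T> shared;

        public SinglePassIterable(Iterable<T> source) {
            this.shared = source.iterator();
        }

        @Override
        public Iterator<T> iterator() {
            return shared; // same (possibly exhausted) iterator on every call
        }
    }

    /** Mirrors the reported reducer: one loop for logging, a second loop for summing. */
    public static int naiveSecondPassSum(Iterable<Integer> values) {
        List<String> logged = new ArrayList<>();
        for (Integer v : values) {
            logged.add(v.toString());
        }
        int sum = 0;
        for (Integer v : values) { // never entered when the iterator is shared
            sum += v;
        }
        return sum;
    }

    /** Workaround sketch: traverse once, keep a copy, then reuse the copy. */
    public static int bufferedSum(Iterable<Integer> values) {
        List<Integer> buffered = new ArrayList<>();
        for (Integer v : values) {
            buffered.add(v);
        }
        int sum = 0;
        for (Integer v : buffered) { // a List returns a fresh iterator each time
            sum += v;
        }
        return sum;
    }

    public static void main(String[] args) {
        List<Integer> ones = Arrays.asList(1, 1, 1);
        System.out.println(naiveSecondPassSum(new SinglePassIterable<>(ones)));
        System.out.println(bufferedSum(new SinglePassIterable<>(ones)));
    }
}
```

In an actual reducer the same idea would apply to `Iterable<IntWritable>`. Because Hadoop reuses the value object between iterations, the buffer should hold a copy of each value (for example the `int` from `i.get()`) and not the `IntWritable` reference itself. When the values are only needed for logging and summing, doing both in a single loop avoids the buffer entirely.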