Uploaded image for project: 'Hadoop Common'
  1. Hadoop Common
  2. HADOOP-3522

ValuesIterator.next() doesn't return a new object, thus failing many equals() tests.

    XMLWordPrintableJSON

Details

    • Bug
    • Status: Resolved
    • Major
    • Resolution: Fixed
    • 0.17.0
    • 0.17.1
    • None
    • None
    • Reviewed

    Description

      Problem

      The ValuesIterator.next() doesn't return a new object representing the next element.

      It is expected that this is the case. Example from java.lang.String source code:

      public boolean equals(Object anObject) {
          if (this == anObject) {
              return true;
          }
          if (anObject instanceof String) {
              // custom code here
          }
      }
      

      Changing the private fields of the object and returning it from next() will make this object fail the equals() test, because this==anObject will always be true.

      What is affected

      The reduce() method presents (a subclass of) ValuesIterator to the user. So every user-defined class can be affected.

      Manifestations of this bug in 0.17.0:

      • The Text class checks for equality similarly to String.equals(), which is shown above.
      • The contrib/data_join breaks because it stores tags in a Map. The behavior of the next() method makes Object.equals() be true for all tags.

      JUnit test

      Patch against hadoop-0.17.0:

      Index: src/test/org/apache/hadoop/mapred/TestReduceTask.java
      ===================================================================
      --- src/test/org/apache/hadoop/mapred/TestReduceTask.java       (revision 665935)
      +++ src/test/org/apache/hadoop/mapred/TestReduceTask.java       (working copy)
      @@ -122,4 +122,52 @@
             runValueIterator(tmpDir, testCase, conf);
           }
         }
      +
      +  public void runValueIterator2(Path tmpDir, Pair[] vals, 
      +                               Configuration conf) throws IOException {
      +    FileSystem fs = tmpDir.getFileSystem(conf);
      +    Path path = new Path(tmpDir, "data.in");
      +    SequenceFile.Writer writer = new SequenceFile.Writer(fs, conf, path,
      +                                                         Text.class, 
      +                                                         Text.class);
      +    for(Pair p: vals) {
      +      writer.append(new Text(p.key), new Text(p.value));
      +    }
      +    writer.close();
      +    SequenceFile.Sorter sorter = new SequenceFile.Sorter(fs, Text.class, 
      +                                                         Text.class, conf);
      +    SequenceFile.Sorter.RawKeyValueIterator rawItr = 
      +      sorter.merge(new Path[]{path}, false, tmpDir);
      +    @SuppressWarnings("unchecked") // WritableComparators are not generic
      +    ReduceTask.ValuesIterator<Text,Text> valItr = 
      +      new ReduceTask.ValuesIterator<Text,Text>(rawItr,
      +          WritableComparator.get(Text.class), Text.class, Text.class,
      +          conf, new NullProgress());
      +    while (valItr.more()) {
      +      Object key = valItr.getKey();
      +      String keyString = key.toString();
      +      // make sure it matches!
      +      assertEquals(vals[0].key, keyString);
      +      // must have at least 1 value!
      +      assertTrue(valItr.hasNext());
      +      Text value1 = valItr.next();
      +      // must have at least 2 values!
      +      assertTrue(valItr.hasNext());
      +      Text value2 = valItr.next();
      +      // do test
      +      assertNotSame(value1, value2);
      +      assertTrue(!value1.equals(value2));
      +      assertTrue(!value2.equals(value1));
      +      // make sure the key hasn't changed under the hood
      +      assertEquals(keyString, valItr.getKey().toString());
      +      valItr.nextKey();
      +    }
      +  }
      +
      +  public void testValueIterator2() throws Exception {
      +    Path tmpDir = new Path("build/test/test.reduce.task");
      +    Configuration conf = new Configuration();
      +    Pair[] test = new Pair[]{ new Pair("k1", "v1"), new Pair("k1", "v2") };
      +    runValueIterator2(tmpDir, test, conf);
      +  }
       }
      

      A program with a bug

      I would imagine that a programmer would be really confused when everything is equal in the example below, for any text input:

      package test;
      
      import java.io.*;
      import java.util.*;
      
      import org.apache.hadoop.fs.Path;
      import org.apache.hadoop.conf.*;
      import org.apache.hadoop.io.*;
      import org.apache.hadoop.util.*;
      import org.apache.hadoop.mapred.*;
      import org.apache.hadoop.mapred.lib.*;
      
      public class TestNewReducer extends Configured implements Tool {
          public int run(String[] args) throws Exception {
              if (args.length < 2) {
                  System.out.println("TestNewReducer <inDir> <outDir>");
                  return -1;
              }
      
              JobConf conf = new JobConf(getConf(), TestNewReducer.class);
              conf.setJobName("test-newreducer");
      
              conf.setInputFormat(TextInputFormat.class);
      
              conf.setMapperClass(Map.class);
              conf.setReducerClass(Reduce.class);
      
              conf.setOutputFormat(TextOutputFormat.class);
              conf.setOutputKeyClass(LongWritable.class);
              conf.setOutputValueClass(Text.class);
      
              FileInputFormat.setInputPaths(conf, new Path(args[0]));
              FileOutputFormat.setOutputPath(conf, new Path(args[1]));
      
              JobClient.runJob(conf);
              return 0;
          }
      
          public static void main(String[] args) throws Exception {
              int res = ToolRunner.run(new Configuration(), new TestNewReducer(), args);
              System.exit(res);
          }
      
          public static class Map extends MapReduceBase implements Mapper <LongWritable, Text, LongWritable, Text> {
              private final static LongWritable nill = new LongWritable(1);
      
              public void map (LongWritable key, Text value,
                      OutputCollector<LongWritable, Text> output,
                      Reporter reporter) throws IOException {
                  output.collect(nill, value);
              }
          }
      
          public static class Reduce extends MapReduceBase implements Reducer<LongWritable, Text, LongWritable, Text> {
              private final static LongWritable nill = new LongWritable(1);
      
              public void reduce(LongWritable key, Iterator<Text> values,
                      OutputCollector<LongWritable, Text> output,
                      Reporter reporter) throws IOException {
                  Text t1 = null;
                  if (values.hasNext())
                      t1 = values.next(); // t1 is the first element in values
      
                  while(values.hasNext()) {
                      // is this element equal to t1?
                      Text t = values.next();
                      output.collect(nill, t.equals(t1) ? new Text("Equal") : new Text("Not equal"));
                  }
              }
          }
      }
      

      Possible fixes

      1. Return a new object each time for next(). This might have significant overhead.
      2. Return a new object for unknown object types and reuse the same object for known types (like Text). Remove "if (this==anObject) return true;" check from all equals() methods for known objects.
      3. Document clearly that all user-defined classes must implement an equals() method, which doesn't do the "if (this==anObject) return true;" check (ie. push the problem to the user).

      Attachments

        1. reduce-doc.patch
          0.9 kB
          Owen O'Malley
        2. reduce-doc-19.patch
          0.9 kB
          Owen O'Malley

        Activity

          People

            omalley Owen O'Malley
            sblanas Spyros Blanas
            Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: