Hadoop Common / HADOOP-5571

TupleWritable can return incorrect results if it contains more than 32 values

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.19.1
    • Fix Version/s: 0.20.0
    • Component/s: None
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      When attempting an outer join on 45 files with CompositeInputFormat, I've been getting unexpected results in the TupleWritable returned by the record reader. On closer inspection, the cause appears to be that TupleWritable.setWritten(int) incorrectly marks some additional tuple positions as written, e.g. calling setWritten(42) also marks position 10 as written.

      The following JUnit test demonstrates the problem:

        public void testWideTuple() throws Exception {
          // Fill all 64 positions with a placeholder value.
          Text emptyText = new Text("Should be empty");
          Writable[] values = new Writable[64];
          Arrays.fill(values, emptyText);
          values[42] = new Text("Number 42");

          // Mark only position 42 as written.
          TupleWritable tuple = new TupleWritable(values);
          tuple.setWritten(42);

          // has(pos) should be true for position 42 and false everywhere else.
          for (int pos = 0; pos < tuple.size(); pos++) {
            boolean has = tuple.has(pos);
            if (pos == 42) {
              assertTrue(has);
            } else {
              assertFalse("Tuple position is incorrectly labelled as set: " + pos, has);
            }
          }
        }

      Similarly, TupleWritable.setWritten(9) also causes TupleWritable.has(41) to incorrectly return true.

        Activity

        Hudson added a comment - Integrated in Hadoop-trunk #796 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/796/ )
        Chris Douglas added a comment -

        I committed this. Thanks, Jingkei.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12403606/HADOOP-5571-1.patch
        against trunk revision 758593.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 3 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 Eclipse classpath. The patch retains Eclipse classpath integrity.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/142/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/142/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/142/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-vesta.apache.org/142/console

        This message is automatically generated.

        Chris Douglas added a comment -

        Sorry, I don't know what I was thinking about the unit test. +1 on the patch

        Jingkei Ly wrote: "I was also thinking of raising a separate JIRA on replacing the written field in TupleWritable with a java.util.BitSet so that you can do joins over 64 datasets - do you have an opinion on this?"

        It might motivate some long-deferred work on memory consumption as well, but I think that's a good idea.

        Jingkei Ly added a comment -

        I was also thinking of raising a separate JIRA on replacing the written field in TupleWritable with a java.util.BitSet so that you can do joins over 64 datasets - do you have an opinion on this?
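
        A rough sketch of what the BitSet idea could look like. This is an illustration only, not code from the patch or from TupleWritable; the class WrittenFlags and its methods are hypothetical names, and serialization is left out.

```java
import java.util.BitSet;

// Hypothetical stand-in for the "written" bookkeeping, backed by a BitSet
// instead of a single long, so the number of positions is not capped at 64.
public class WrittenFlags {
    private final BitSet written;

    public WrittenFlags(int size) {
        // The size is only an initial capacity hint; a BitSet grows on demand.
        this.written = new BitSet(size);
    }

    // Mark position i as written.
    public void setWritten(int i) {
        written.set(i);
    }

    // Mark position i as not written.
    public void clearWritten(int i) {
        written.clear(i);
    }

    // Report whether position i has been marked as written.
    public boolean has(int i) {
        return written.get(i);
    }

    public static void main(String[] args) {
        WrittenFlags flags = new WrittenFlags(100);
        flags.setWritten(90);
        System.out.println(flags.has(90)); // true
        System.out.println(flags.has(26)); // false: no aliasing with 90 - 64
    }
}
```

        Because BitSet indexes bits directly, no shift arithmetic is involved and positions above 63 need no special handling.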

        Jingkei Ly added a comment -

        There should be a unit test added to TestTupleWritable as part of the original patch.

        Chris Douglas added a comment -

        Would it be possible to add a unit test for this?

        Jingkei Ly added a comment -

        I think the problem is that some of the bit-shift operations on the long field, written, in TupleWritable are done with int operands. I've attached a patch that I think fixes this, along with unit tests for it.
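
        To illustrate the suspected cause: in Java, shifting an int uses only the low five bits of the shift distance, so 1 << 42 evaluates to 1 << 10 before it is widened to long. The following is a minimal sketch, not the TupleWritable source; the class and method names are hypothetical.

```java
// Hypothetical demo of int-width versus long-width shifts.
public class ShiftWidthDemo {

    // Builds the mask from an int literal. The shift is performed in 32-bit
    // arithmetic (distance taken modulo 32) and only then widened to long.
    public static long intShiftMask(int pos) {
        return 1 << pos;
    }

    // Builds the mask from a long literal. The shift is performed in 64-bit
    // arithmetic (distance taken modulo 64).
    public static long longShiftMask(int pos) {
        return 1L << pos;
    }

    public static void main(String[] args) {
        // Position 42 collides with position 10 when the shift is done on an int.
        System.out.println(intShiftMask(42) == intShiftMask(10));   // true
        // Position 41 collides with position 9 in the same way.
        System.out.println(intShiftMask(41) == intShiftMask(9));    // true
        // With a long literal the two positions map to distinct bits.
        System.out.println(longShiftMask(42) == longShiftMask(10)); // false
    }
}
```

        This matches the reported symptoms: position 42 aliases position 10, and position 41 aliases position 9.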


          People

          • Assignee: Jingkei Ly
          • Reporter: Jingkei Ly
          • Votes: 0
          • Watchers: 1
