Hadoop Common / HADOOP-6834

TFile.append compares initial key against null lastKey

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: 0.20.1, 0.20.2
    • Fix Version/s: 0.22.0
    • Component/s: io
    • Labels: None
    • Hadoop Flags: Reviewed

      Description

      The following code in TFile.KeyRegister.close:

      byte[] lastKey = lastKeyBufferOS.getBuffer();
      int lastLen = lastKeyBufferOS.size();
      if (tfileMeta.getComparator().compare(key, 0, len, lastKey, 0, lastLen) < 0) {
        throw new IOException("Keys are not added in sorted order");
      }

      compares the initial key (passed in via TFile.Writer.append) against an effectively null lastKey: lastKey is not initialized until after the first call to TFile.Writer.append. The underlying RawComparator interface used for the comparison does not stipulate the proper behavior when either length argument (l1 or l2) is zero. In the case of LongWritable, its WritableComparator implementation does an unchecked read of the passed-in byte arrays b1 and b2. Since TFile pre-allocates the buffer used for storing lastKey, the first append passes a valid buffer with a count of zero to LongWritable's comparator, which ignores the length and thus produces incorrect results.
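      (A minimal sketch, not from the original report: the misread can be reproduced with the stock LongWritable.Comparator alone. A fresh 8-byte array stands in for the pre-allocated lastKey buffer and is passed with a length of zero, just as the first append does.)

      import org.apache.hadoop.io.LongWritable;
      import org.apache.hadoop.io.WritableComparator;

      public class LastKeyCompareSketch {
        public static void main(String[] args) {
          WritableComparator cmp = new LongWritable.Comparator();

          // Serialized form of LongWritable(-1): eight 0xFF bytes (big-endian two's complement).
          byte[] firstKey = new byte[8];
          java.util.Arrays.fill(firstKey, (byte) 0xFF);

          // Stand-in for the pre-allocated lastKey buffer: a valid array, but logically empty.
          byte[] lastKey = new byte[8];
          int lastLen = 0;

          // LongWritable's raw comparator reads 8 bytes from each array and ignores the
          // length arguments, so the "empty" lastKey is treated as the value 0L.
          int result = cmp.compare(firstKey, 0, firstKey.length, lastKey, 0, lastLen);

          // Prints a negative number (-1 compares below 0), which is what makes TFile
          // throw "Keys are not added in sorted order" on the very first append.
          System.out.println("compare(firstKey, lastKey) = " + result);
        }
      }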

        Activity

        Hong Tang added a comment -

        In the first append, lastLen is 0 and lastKey refers to a valid byte array whose bytes are all zero, so the behavior is well defined. However, I do agree that for a custom comparator a zero-length byte array may not be a legal key, so it would be nice to do a first-key check.

        Ahad, would you be interested in providing a patch, or perhaps sharing with us a sample program with a customized comparator that fails TFile.Writer.append?
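        (For illustration only, a self-contained sketch of such a first-key check; KeyOrderChecker and its methods are hypothetical names, not the eventual TFile patch. The ordering comparison is simply skipped until a previous key exists.)

        import java.io.IOException;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.WritableComparator;

        // Illustrative helper, not the committed fix: it compares each key against the
        // previous one, but skips the ordering check entirely for the first appended key.
        class KeyOrderChecker {
          private final WritableComparator comparator;
          private byte[] lastKey = null;   // null until the first key has been recorded

          KeyOrderChecker(WritableComparator comparator) {
            this.comparator = comparator;
          }

          void append(byte[] key) throws IOException {
            if (lastKey != null
                && comparator.compare(key, 0, key.length, lastKey, 0, lastKey.length) < 0) {
              throw new IOException("Keys are not added in sorted order");
            }
            lastKey = key.clone();         // remember this key for the next comparison
          }

          public static void main(String[] args) throws IOException {
            KeyOrderChecker checker = new KeyOrderChecker(new LongWritable.Comparator());
            byte[] negativeKey = new byte[8];
            java.util.Arrays.fill(negativeKey, (byte) 0xFF);  // LongWritable(-1)
            checker.append(negativeKey);   // first key: no comparison, no spurious exception
          }
        }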

        Ahad Rana added a comment -

        Hi Hong,

        Yes, I would be happy to provide a patch. As far as triggering the failure: if you append a LongWritable with a negative value as the first key and use LongWritable's default comparator, this will trigger the bug. I would be happy to provide a code snippet if required. In the case of LongWritable's comparator, it interprets the uninitialized zero bytes of lastKey as the value zero (since it does not check the length), and this causes the out-of-order insertion exception. Either LongWritable's comparator is implemented improperly, or the contract implied by RawComparator is not clear about what happens when you compare a non-empty byte array to a zero-length one. LongWritable's RawComparator should probably be fixed, but either way, passing in an uninitialized lastKey is going to cause problems for someone else down the road, so it is probably best to fix this as well.
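        (For reference, a rough reproduction along the lines described above. This is a sketch under assumptions rather than a tested case: the local path is illustrative, and it relies on TFile's "jclass:" comparator syntax and the "none" compression name. Before the fix, the single append below was expected to fail with the spurious ordering exception.)

        import java.util.Arrays;
        import org.apache.hadoop.conf.Configuration;
        import org.apache.hadoop.fs.FSDataOutputStream;
        import org.apache.hadoop.fs.FileSystem;
        import org.apache.hadoop.fs.Path;
        import org.apache.hadoop.io.DataOutputBuffer;
        import org.apache.hadoop.io.LongWritable;
        import org.apache.hadoop.io.file.tfile.TFile;

        public class NegativeFirstKeyRepro {
          public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.getLocal(conf);
            Path path = new Path("/tmp/tfile-negative-first-key");  // illustrative path

            // Serialize LongWritable(-1) as the raw key bytes.
            DataOutputBuffer dob = new DataOutputBuffer();
            new LongWritable(-1).write(dob);
            byte[] key = Arrays.copyOf(dob.getData(), dob.getLength());

            FSDataOutputStream out = fs.create(path);
            TFile.Writer writer = new TFile.Writer(out, 64 * 1024, "none",
                "jclass:org.apache.hadoop.io.LongWritable$Comparator", conf);

            // Before HADOOP-6834, this very first append compared the key against the
            // empty lastKey buffer and threw "Keys are not added in sorted order".
            writer.append(key, "value".getBytes());
            writer.close();
            out.close();
          }
        }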

        Hong Tang added a comment -

        Yes, I guess LongWritable could be more defensive by throwing an exception indicating an illegal format. Either way, TFile needs to be fixed.

        I'd be happy to shepherd you through the patch process.

        Ahad Rana added a comment -

        Hi Hong,

        Sure, that would be great. What are the requirements these days? Which version of Hadoop should the patch target?

        Ahad.

        Hong Tang added a comment -

        You should always submit the patch for trunk. If you are using older releases, you may also backport the patch (attach here) and request it to be committed to older branches too.

        Hong Tang added a comment -

        Straightforward patch with a new unit test.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12449597/hadoop-6834-20100715.patch
        against trunk revision 964134.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        -1 javadoc. The javadoc tool appears to have generated 1 warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/617/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/617/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/617/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/617/console

        This message is automatically generated.

        Ahad Rana added a comment -

        I apologize, Hong; this was on my todo list, but obviously I did not get to it in time. Your patch and test case look good. One question: is a NULL key allowed in the TFile API? It seems that KeyRegister's close does accept a zero-length key.

        Ahad

        Hong Tang added a comment -

        The javadoc warning reported by test-patch seems to be a false positive. (Log: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch-h4.grid.sp2.yahoo.net/617/artifact/trunk/patchprocess/patchJavadocWarnings.txt)
        Hong Tang added a comment -

        @ahad, by "NULL" key, are you referring to a zero-length key? It is certainly allowed by TFile (and you can even have multiple of them).

        Mahadev konar added a comment -

        +1 the patch looks good.

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #327 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk-Commit/327/)
        HADOOP-6834. TFile.append compares initial key against null lastKey (hong tang via mahadev)

        Hudson added a comment -

        Integrated in Hadoop-Common-trunk #395 (See http://hudson.zones.apache.org/hudson/job/Hadoop-Common-trunk/395/)
        HADOOP-6834. TFile.append compares initial key against null lastKey (hong tang via mahadev)


          People

          • Assignee: Hong Tang
          • Reporter: Ahad Rana
          • Votes: 0
          • Watchers: 1
