Hadoop Common

HADOOP-6925: BZip2Codec incorrectly implements read()

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Critical
    • Resolution: Fixed
    • Affects Version/s: 0.21.0, 0.22.0
    • Fix Version/s: 0.21.1, 0.22.0
    • Component/s: io
    • Labels:
      None
    • Hadoop Flags:
      Reviewed

      Description

      HADOOP-4012 added an implementation of read() in BZip2InputStream that doesn't work correctly when reading bytes >= 0x80. This causes EOFExceptions when working with BZip2-compressed data inside sequence files in some datasets.
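The failure mode can be reproduced with plain java.io, without Hadoop. This is a hypothetical illustration, not the actual BZip2InputStream code: `BuggySingleByteStream` is an invented stand-in that assumes the buggy read() returns the signed byte directly. Because Java bytes are signed, any value from 0x80 through 0xFF sign-extends to a negative int, and DataInputStream.readByte() treats any negative return from read() as end of stream and throws EOFException.

```java
import java.io.ByteArrayInputStream;
import java.io.DataInputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class SignExtensionDemo {
    // Hypothetical stream whose single-byte read() forgets to mask with 0xff.
    static class BuggySingleByteStream extends InputStream {
        private final InputStream in;

        BuggySingleByteStream(InputStream in) { this.in = in; }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, len);
        }

        @Override
        public int read() throws IOException {
            byte[] one = new byte[1];
            int n = read(one, 0, 1);
            // BUG: one[0] is sign-extended, so 0x80..0xFF become negative ints.
            return (n < 0) ? -1 : one[0];
        }
    }

    // Returns true if DataInputStream.readByte() reports EOF for a one-byte stream.
    static boolean readByteThrowsEof(int value) {
        byte[] data = { (byte) value };
        try (DataInputStream dis = new DataInputStream(
                new BuggySingleByteStream(new ByteArrayInputStream(data)))) {
            dis.readByte();
            return false;
        } catch (EOFException e) {
            return true;
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(readByteThrowsEof(0x41)); // false
        System.out.println(readByteThrowsEof(0x80)); // true
        System.out.println(readByteThrowsEof(0xFF)); // true
    }
}
```

Note that 0xFF is returned as exactly -1, so even a caller that only checks for -1 would see a spurious end of stream; callers that check for any negative value, like DataInputStream, fail on every byte from 0x80 upward.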

        Issue Links

          Activity

          Todd Lipcon created issue -
          Todd Lipcon added a comment -

          Patch fixes the read() implementation to correctly mask with 0xff before upcasting to int. Also augments the unit test to check the single-byte read() function - the new test fails before this patch.
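A minimal sketch of the corrected pattern, assuming a stream whose bulk read(byte[], int, int) is already correct. The class name `MaskedSingleByteStream` is hypothetical and this is not the committed patch. The `& 0xff` keeps the result in 0..255, so -1 is returned only at a genuine end of stream. The check in `singleByteReadMatches()` follows the idea of the augmented unit test: read every byte value one at a time through read() and compare it with the original data.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.UncheckedIOException;

public class MaskedReadDemo {
    // Hypothetical wrapper whose single-byte read() masks the signed byte.
    static class MaskedSingleByteStream extends InputStream {
        private final InputStream in;
        private final byte[] one = new byte[1];

        MaskedSingleByteStream(InputStream in) { this.in = in; }

        @Override
        public int read(byte[] b, int off, int len) throws IOException {
            return in.read(b, off, len);
        }

        @Override
        public int read() throws IOException {
            int n = read(one, 0, 1);
            // Masking with 0xff maps the signed byte (-128..127) to 0..255,
            // so -1 is returned only when the stream is really exhausted.
            return (n < 0) ? -1 : (one[0] & 0xff);
        }
    }

    // Reads all 256 byte values through read() and checks each one round-trips.
    static boolean singleByteReadMatches() {
        byte[] data = new byte[256];
        for (int i = 0; i < data.length; i++) {
            data[i] = (byte) i;
        }
        try (InputStream s = new MaskedSingleByteStream(new ByteArrayInputStream(data))) {
            for (int i = 0; i < data.length; i++) {
                if (s.read() != i) {
                    return false;
                }
            }
            return s.read() == -1; // the stream must be exhausted afterwards
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(singleByteReadMatches()); // true
    }
}
```

Covering all 256 values matters: a test that only uses ASCII data (all below 0x80) would pass with or without the mask.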

          Todd Lipcon made changes -
          Field Original Value New Value
          Attachment hadoop-6925.txt [ 12452978 ]
          Todd Lipcon made changes -
          Status Open [ 1 ] Patch Available [ 10002 ]
          Greg Roelofs added a comment -

          This may fix HADOOP-6852, as well. (The unit test in question is available as a standalone patch in MAPREDUCE-1927, which is awaiting review.) If so, I can update the unit test to uncomment the relevant bzip2 test.

          I'll make a note to check this soon...

          Eli Collins added a comment -

          +1

          Looks good. Nice find.

          Eli Collins made changes -
          Hadoop Flags [Reviewed]
          Chris Douglas added a comment -

          +1

          Sigh. This was introduced between v9 and v11 of the patch. Good catch.

          Eli Collins added a comment -

          Committed to trunk and branch-0.20. Thanks Todd.

          Eli Collins made changes -
          Status Patch Available [ 10002 ] Resolved [ 5 ]
          Fix Version/s 0.21.1 [ 12315270 ]
          Fix Version/s 0.22.0 [ 12314296 ]
          Resolution Fixed [ 1 ]
          Eli Collins added a comment -

          Forgot to mention that I verified TestCodec passed on trunk and the merge to branch-0.21. Previous comment has a typo, meant branch-0.21 not branch-0.20.

          Todd Lipcon made changes -
          Link This issue is related to HADOOP-6852 [ HADOOP-6852 ]
          Hudson added a comment -

          Integrated in Hadoop-Common-trunk-Commit #365 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk-Commit/365/)
          HADOOP-6925. BZip2Codec incorrectly implements read(). Contributed by Todd Lipcon.

          Greg Roelofs added a comment -

          > This may fix HADOOP-6852, as well.

          Nope, concatenation is still broken. Ah, well.

          Hudson added a comment -

          Integrated in Hadoop-Common-trunk #433 (See https://hudson.apache.org/hudson/job/Hadoop-Common-trunk/433/)
          HADOOP-6925. BZip2Codec incorrectly implements read(). Contributed by Todd Lipcon.

          Konstantin Shvachko made changes -
          Status Resolved [ 5 ] Closed [ 6 ]

            People

            • Assignee: Todd Lipcon
            • Reporter: Todd Lipcon
            • Votes: 0
            • Watchers: 4
