Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-3993

Graceful handling of codec errors during decompression

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.23.1, 1.0.2
    • Fix Version/s: 1.2.0, 2.0.2-alpha
    • Component/s: mrv1, mrv2
    • Labels:
      None
    • Target Version/s:
    • Hadoop Flags:
      Reviewed

      Description

      When using a compression codec for intermediate compression, some cases of corrupt data can cause the codec to throw exceptions other than IOException (eg java.lang.InternalError). This will currently cause the whole reduce task to fail, instead of simply treating it like another case of a failed fetch.

      1. MR-3993_branch1.patch
        9 kB
        Karthik Kambatla
      2. MR-3993_trunk.patch
        6 kB
        Karthik Kambatla
      3. MR-3993_branch1.patch
        5 kB
        Karthik Kambatla
      4. MR-3993_trunk.patch
        4 kB
        Karthik Kambatla

        Activity

        Hide
        Matt Foley added a comment -

        No patches available in 1.0.3 timeframe. Please consider continuing contribution in 1.1 context. Thanks.

        Show
        Matt Foley added a comment - No patches available in 1.0.3 timeframe. Please consider continuing contribution in 1.1 context. Thanks.
        Hide
        Karthik Kambatla added a comment -

        Uploading a preliminary patch for MR1. Need to add tests to ensure correct behavior.

        Any pointers on what might be the best way to force reading corrupt data in tests?

        Show
        Karthik Kambatla added a comment - Uploading a preliminary patch for MR1. Need to add tests to ensure correct behavior. Any pointers on what might be the best way to force reading corrupt data in tests?
        Hide
        Karthik Kambatla added a comment -

        Updated patch for MR1.

        • With the codec being pluggable, we need to catch any possible errors/exceptions thrown by it.
        • Now, catching a Throwable as opposed to IOEx and RTEx.
        • Testing yet to be done.
        Show
        Karthik Kambatla added a comment - Updated patch for MR1. With the codec being pluggable, we need to catch any possible errors/exceptions thrown by it. Now, catching a Throwable as opposed to IOEx and RTEx. Testing yet to be done.
        Hide
        Karthik Kambatla added a comment -

        The updated patch with following changes:

        • Added CompressionUtil for utility methods
        • Implemented CompressionUtil.wrappedRead()
        • Reduce (in-memory shuffle, and on-disk merge) use the wrapper instead of direct read
        • Test for the wrapper method
        Show
        Karthik Kambatla added a comment - The updated patch with following changes: Added CompressionUtil for utility methods Implemented CompressionUtil.wrappedRead() Reduce (in-memory shuffle, and on-disk merge) use the wrapper instead of direct read Test for the wrapper method
        Hide
        Todd Lipcon added a comment -
        • Need license header on CompressionUtil.java and its unit test
        • You can use GenericTestUtils.assertExceptionContains for the test assertion
        • A few style nits:
          • remove the blank lines between @param entries
          • add a blank line before the javadoc for wrappedRead
          • we generally don't bother making constants like READ_ERROR for exception text - not a big deal, just seemed odd to me.
          • The formatting in the wrappedRead javadoc is a little strange. I would reformat something like:
            /**
             * Utility wrapper for reading from a {@link InputStream}. If the underlying
             * stream throws an unchecked exception, it is re-thrown as an IOException.
             */
            

            (i.e use full sentences rather than bullet points for easier readability)

          • The following wrapping is weird – move - bytesRead up a line:
            +            n = CompressionUtil.wrappedRead(input, shuffleData, bytesRead,
            +                shuffleData.length
            +                - bytesRead);
            
          • Spurious change for the "Failed to discard map-output" wrapping.
        • CompressionUtil can be abstract or have a private constructor, since it's just static methods

        Otherwise looks good.

        Show
        Todd Lipcon added a comment - Need license header on CompressionUtil.java and its unit test You can use GenericTestUtils.assertExceptionContains for the test assertion A few style nits: remove the blank lines between @param entries add a blank line before the javadoc for wrappedRead we generally don't bother making constants like READ_ERROR for exception text - not a big deal, just seemed odd to me. The formatting in the wrappedRead javadoc is a little strange. I would reformat something like: /** * Utility wrapper for reading from a {@link InputStream}. If the underlying * stream throws an unchecked exception, it is re-thrown as an IOException. */ (i.e use full sentences rather than bullet points for easier readability) The following wrapping is weird – move - bytesRead up a line: + n = CompressionUtil.wrappedRead(input, shuffleData, bytesRead, + shuffleData.length + - bytesRead); Spurious change for the "Failed to discard map-output" wrapping. CompressionUtil can be abstract or have a private constructor, since it's just static methods Otherwise looks good.
        Hide
        Karthik Kambatla added a comment -

        Thanks a lot for your feedback, Todd. I noticed we have IOUtils/TestIOUtils. So, I intend to move CompressionUtil.wrappedRead() to IOUtils.wrappedReadForCompressedData(). Will make other changes as suggested.

        Show
        Karthik Kambatla added a comment - Thanks a lot for your feedback, Todd. I noticed we have IOUtils/TestIOUtils. So, I intend to move CompressionUtil.wrappedRead() to IOUtils.wrappedReadForCompressedData(). Will make other changes as suggested.
        Hide
        Karthik Kambatla added a comment -

        Patches with Todd-suggested changes.

        Show
        Karthik Kambatla added a comment - Patches with Todd-suggested changes.
        Hide
        Karthik Kambatla added a comment -

        I used the GenericTestUtils in the trunk version, but couldn't find the same in branch-1.

        Show
        Karthik Kambatla added a comment - I used the GenericTestUtils in the trunk version, but couldn't find the same in branch-1.
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12535472/MR-3993_trunk.patch
        against trunk revision .

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 1 new or modified test files.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 eclipse:eclipse. The patch built with eclipse:eclipse.

        +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core:

        org.apache.hadoop.io.file.tfile.TestTFileJClassComparatorByteArrays
        org.apache.hadoop.io.file.tfile.TestTFileByteArrays

        +1 contrib tests. The patch passed contrib unit tests.

        Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2549//testReport/
        Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2549//console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12535472/MR-3993_trunk.patch against trunk revision . +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 1 new or modified test files. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 javadoc. The javadoc tool did not generate any warning messages. +1 eclipse:eclipse. The patch built with eclipse:eclipse. +1 findbugs. The patch does not introduce any new Findbugs (version 1.3.9) warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. -1 core tests. The patch failed these unit tests in hadoop-common-project/hadoop-common hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core: org.apache.hadoop.io.file.tfile.TestTFileJClassComparatorByteArrays org.apache.hadoop.io.file.tfile.TestTFileByteArrays +1 contrib tests. The patch passed contrib unit tests. Test results: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2549//testReport/ Console output: https://builds.apache.org/job/PreCommit-MAPREDUCE-Build/2549//console This message is automatically generated.
        Hide
        Ahmed Radwan added a comment -

        Looks good, thanks Karthik!

        Some minor comments:

        • The javadoc: "If the underlying stream throws an unchecked exception, it is re-thrown as an IOException." is inaccurate as the implementation catches any Throwable (not just unchecked exceptions). There is also a trailing white space in the javadoc.
        • Minor formatting for this same method: move "throws IOException {" in this signature to the line above.
        • The patch unnecessarily shuffles some imports, reformats lines, and creates additional diffs, see for example the diffs in ReduceTask.java.
        Show
        Ahmed Radwan added a comment - Looks good, thanks Karthik! Some minor comments: The javadoc: "If the underlying stream throws an unchecked exception, it is re-thrown as an IOException." is inaccurate as the implementation catches any Throwable (not just unchecked exceptions). There is also a trailing white space in the javadoc. Minor formatting for this same method: move "throws IOException {" in this signature to the line above. The patch unnecessarily shuffles some imports, reformats lines, and creates additional diffs, see for example the diffs in ReduceTask.java.
        Hide
        Karthik Kambatla added a comment -

        Uploading patches addressing Ahmed's comments -

        • Changed javadoc comment
        • Removed all extraneous changes (formatting)
        • Tested TestIOUtils and others that Jenkins reported.
        Show
        Karthik Kambatla added a comment - Uploading patches addressing Ahmed's comments - Changed javadoc comment Removed all extraneous changes (formatting) Tested TestIOUtils and others that Jenkins reported.
        Hide
        Alejandro Abdelnur added a comment -

        +1

        Show
        Alejandro Abdelnur added a comment - +1
        Hide
        Alejandro Abdelnur added a comment -

        Thanks Karthik. Committed to trunk, branch-1 & branch-2

        Show
        Alejandro Abdelnur added a comment - Thanks Karthik. Committed to trunk, branch-1 & branch-2
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2504 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2504/)
        MAPREDUCE-3993. Graceful handling of codec errors during decompression (kkambatl via tucu) (Revision 1359345)

        Result = SUCCESS
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359345
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/IOUtils.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestIOUtils.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #2504 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2504/ ) MAPREDUCE-3993 . Graceful handling of codec errors during decompression (kkambatl via tucu) (Revision 1359345) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359345 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/IOUtils.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestIOUtils.java /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #2436 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2436/)
        MAPREDUCE-3993. Graceful handling of codec errors during decompression (kkambatl via tucu) (Revision 1359345)

        Result = SUCCESS
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359345
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/IOUtils.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestIOUtils.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java
        Show
        Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #2436 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2436/ ) MAPREDUCE-3993 . Graceful handling of codec errors during decompression (kkambatl via tucu) (Revision 1359345) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359345 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/IOUtils.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestIOUtils.java /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk-Commit #2454 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2454/)
        MAPREDUCE-3993. Graceful handling of codec errors during decompression (kkambatl via tucu) (Revision 1359345)

        Result = FAILURE
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359345
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/IOUtils.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestIOUtils.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk-Commit #2454 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk-Commit/2454/ ) MAPREDUCE-3993 . Graceful handling of codec errors during decompression (kkambatl via tucu) (Revision 1359345) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359345 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/IOUtils.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestIOUtils.java /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk #1099 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1099/)
        MAPREDUCE-3993. Graceful handling of codec errors during decompression (kkambatl via tucu) (Revision 1359345)

        Result = FAILURE
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359345
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/IOUtils.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestIOUtils.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk #1099 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk/1099/ ) MAPREDUCE-3993 . Graceful handling of codec errors during decompression (kkambatl via tucu) (Revision 1359345) Result = FAILURE tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359345 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/IOUtils.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestIOUtils.java /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Mapreduce-trunk #1132 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1132/)
        MAPREDUCE-3993. Graceful handling of codec errors during decompression (kkambatl via tucu) (Revision 1359345)

        Result = SUCCESS
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359345
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/IOUtils.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestIOUtils.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java
        Show
        Hudson added a comment - Integrated in Hadoop-Mapreduce-trunk #1132 (See https://builds.apache.org/job/Hadoop-Mapreduce-trunk/1132/ ) MAPREDUCE-3993 . Graceful handling of codec errors during decompression (kkambatl via tucu) (Revision 1359345) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359345 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/IOUtils.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestIOUtils.java /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Hdfs-trunk-Commit #2508 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2508/)
        MAPREDUCE-3993. Graceful handling of codec errors during decompression (kkambatl via tucu) (Revision 1359345)

        Result = SUCCESS
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359345
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/IOUtils.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestIOUtils.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java
        Show
        Hudson added a comment - Integrated in Hadoop-Hdfs-trunk-Commit #2508 (See https://builds.apache.org/job/Hadoop-Hdfs-trunk-Commit/2508/ ) MAPREDUCE-3993 . Graceful handling of codec errors during decompression (kkambatl via tucu) (Revision 1359345) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359345 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/IOUtils.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestIOUtils.java /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java
        Hide
        Hudson added a comment -

        Integrated in Hadoop-Common-trunk-Commit #2441 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2441/)
        MAPREDUCE-3993. Graceful handling of codec errors during decompression (kkambatl via tucu) (Revision 1359345)

        Result = SUCCESS
        tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359345
        Files :

        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/IOUtils.java
        • /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestIOUtils.java
        • /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt
        • /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java
        Show
        Hudson added a comment - Integrated in Hadoop-Common-trunk-Commit #2441 (See https://builds.apache.org/job/Hadoop-Common-trunk-Commit/2441/ ) MAPREDUCE-3993 . Graceful handling of codec errors during decompression (kkambatl via tucu) (Revision 1359345) Result = SUCCESS tucu : http://svn.apache.org/viewcvs.cgi/?root=Apache-SVN&view=rev&rev=1359345 Files : /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/main/java/org/apache/hadoop/io/IOUtils.java /hadoop/common/trunk/hadoop-common-project/hadoop-common/src/test/java/org/apache/hadoop/io/TestIOUtils.java /hadoop/common/trunk/hadoop-mapreduce-project/CHANGES.txt /hadoop/common/trunk/hadoop-mapreduce-project/hadoop-mapreduce-client/hadoop-mapreduce-client-core/src/main/java/org/apache/hadoop/mapred/IFile.java
        Hide
        Matt Foley added a comment -

        corrected Target Versions to match Fix Versions.

        Show
        Matt Foley added a comment - corrected Target Versions to match Fix Versions.

          People

          • Assignee:
            Karthik Kambatla
            Reporter:
            Todd Lipcon
          • Votes:
            0 Vote for this issue
            Watchers:
            14 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development