Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Blocker Blocker
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 0.17.0
    • Component/s: None
    • Labels:
      None
    • Environment:

      Revision: 645751

    • Hadoop Flags:
      Reviewed
    • Release Note:
      Fixes LocalFSMerger in ReduceTask.java to handle errors/exceptions better. Prior to this all exceptions except IOException would be silently ignored.

      Description

      I haven't used trunk(0.17) so I don't know what would happen when FSError is thrown within LocalFSMerger thread.

      Does it have the same problem as HADOOP-3154?

      1. HADOOP-3204-v1.patch
        3 kB
        Amar Kamat
      2. HADOOP-3204-v2.1.patch
        2 kB
        Amar Kamat
      3. HADOOP-3204-v2.patch
        3 kB
        Amar Kamat
      4. HADOOP-3204-v2.patch
        3 kB
        Amar Kamat

        Activity

        Hide
        Hudson added a comment -
        Show
        Hudson added a comment - Integrated in Hadoop-trunk #458 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/458/ )
        Hide
        Devaraj Das added a comment -

        I just committed this. Thanks, Amar!

        Show
        Devaraj Das added a comment - I just committed this. Thanks, Amar!
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12379906/HADOOP-3204-v2.1.patch
        against trunk revision 645773.

        @author +1. The patch does not contain any @author tags.

        tests included -1. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2206/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2206/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2206/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2206/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12379906/HADOOP-3204-v2.1.patch against trunk revision 645773. @author +1. The patch does not contain any @author tags. tests included -1. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2206/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2206/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2206/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2206/console This message is automatically generated.
        Hide
        Amar Kamat added a comment -

        Attaching a patch that incorporates Devaraj's comments.

        Show
        Amar Kamat added a comment - Attaching a patch that incorporates Devaraj's comments.
        Hide
        Devaraj Das added a comment -

        Actually, you probably could improve the printing of the exception in the webUI/log file as a separate jira. In this patch just have the necessary changes required for handling Throwable (FSError).

        Show
        Devaraj Das added a comment - Actually, you probably could improve the printing of the exception in the webUI/log file as a separate jira. In this patch just have the necessary changes required for handling Throwable (FSError).
        Hide
        Devaraj Das added a comment -

        One style issue: instead of doing getMergeError, you should probably do getShuffleError. The code should look like

        if (getShuffleError() != null) {
          ioe.initCause(getShuffleError())
        }
        
        Throwable getShuffleError() {
          //returns only mergeThrowable for now
          return mergeThrowable;
        }
        
        Show
        Devaraj Das added a comment - One style issue: instead of doing getMergeError, you should probably do getShuffleError. The code should look like if (getShuffleError() != null) { ioe.initCause(getShuffleError()) } Throwable getShuffleError() { //returns only mergeThrowable for now return mergeThrowable; }
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12379872/HADOOP-3204-v2.patch
        against trunk revision 645773.

        @author +1. The patch does not contain any @author tags.

        tests included -1. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs +1. The patch does not introduce any new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2203/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2203/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2203/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2203/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12379872/HADOOP-3204-v2.patch against trunk revision 645773. @author +1. The patch does not contain any @author tags. tests included -1. The patch doesn't appear to include any new or modified tests. Please justify why no tests are needed for this patch. javadoc +1. The javadoc tool did not generate any warning messages. javac +1. The applied patch does not generate any new javac compiler warnings. release audit +1. The applied patch does not generate any new release audit warnings. findbugs +1. The patch does not introduce any new Findbugs warnings. core tests +1. The patch passed core unit tests. contrib tests +1. The patch passed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2203/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2203/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2203/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/2203/console This message is automatically generated.
        Hide
        Amar Kamat added a comment -

        Incorporated the changes mentioned above. Attaching a patch.

        Show
        Amar Kamat added a comment - Incorporated the changes mentioned above. Attaching a patch.
        Hide
        Devaraj Das added a comment -

        Also, why is the try-finally block in LocalFSMerger.run required? unless you have a reason you should leave that as it was.

        Show
        Devaraj Das added a comment - Also, why is the try-finally block in LocalFSMerger.run required? unless you have a reason you should leave that as it was.
        Hide
        Devaraj Das added a comment -

        Some nits - the method getFailureCause should be called something else. While the name sounds like it can tell us the failure cause whatsoever but in reality you just depend on the mergeThrowable field. Actually at this point of time, it might make sense to do something better - whenever you get a merge error (be it ramfs or localfs), you can probably invoke umbilical.shuffleError and die.

        Show
        Devaraj Das added a comment - Some nits - the method getFailureCause should be called something else. While the name sounds like it can tell us the failure cause whatsoever but in reality you just depend on the mergeThrowable field. Actually at this point of time, it might make sense to do something better - whenever you get a merge error (be it ramfs or localfs), you can probably invoke umbilical.shuffleError and die.
        Hide
        Chris Douglas added a comment -
            public synchronized Throwable getFailureCause() {
              return mergeThrowable == null ? null : mergeThrowable;
            }
        

        Is equivalent to "return mergeThrowable". The synchronization is probably unnecessary for the get.

        Show
        Chris Douglas added a comment - public synchronized Throwable getFailureCause() { return mergeThrowable == null ? null : mergeThrowable; } Is equivalent to "return mergeThrowable". The synchronization is probably unnecessary for the get.
        Hide
        Amar Kamat added a comment -

        Attaching a patch for review.

        Show
        Amar Kamat added a comment - Attaching a patch for review.
        Hide
        Chris Douglas added a comment -

        All the operations done in LocalFSMerger throw IOException and hence LocalFSMerger expects IOException. I think a better approach would be to make FSError extend IOException or something like that. Fixing the code to catch FSerror when IOException is expected is incorrect.

        I disagree. Per HADOOP-3154 and HADOOP-3166, it is critical that any worker threads whose death could result in data loss are handled by (or kill) the main thread, including (perhaps especially) Errors. Besides, FSError cannot extend both Error and IOException.

        Show
        Chris Douglas added a comment - All the operations done in LocalFSMerger throw IOException and hence LocalFSMerger expects IOException. I think a better approach would be to make FSError extend IOException or something like that. Fixing the code to catch FSerror when IOException is expected is incorrect. I disagree. Per HADOOP-3154 and HADOOP-3166 , it is critical that any worker threads whose death could result in data loss are handled by (or kill) the main thread, including (perhaps especially) Errors. Besides, FSError cannot extend both Error and IOException.
        Hide
        Amar Kamat added a comment -

        AFAIK nothing will happen if FSError is thrown in LocalFSMerger. All the operations done in LocalFSMerger throw IOException and hence LocalFSMerger expects IOException. I think a better approach would be to make FSError extend IOException or something like that. Fixing the code to catch FSerror when IOException is expected is incorrect. Thoughts?

        Show
        Amar Kamat added a comment - AFAIK nothing will happen if FSError is thrown in LocalFSMerger . All the operations done in LocalFSMerger throw IOException and hence LocalFSMerger expects IOException. I think a better approach would be to make FSError extend IOException or something like that. Fixing the code to catch FSerror when IOException is expected is incorrect. Thoughts?

          People

          • Assignee:
            Amar Kamat
            Reporter:
            Koji Noguchi
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development