Hadoop Map/Reduce
  1. Hadoop Map/Reduce
  2. MAPREDUCE-1921

IOExceptions should contain the filename of the broken input files

    Details

    • Type: Improvement Improvement
    • Status: Open
    • Priority: Major Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None

      Description

      If bzip or other decompression fails, the IOException does not contain the name of the broken file that caused the exception.

      It would be nice if such actions could be avoided in the future by having the name of the files that are broken spelled
      out in the exception.

      1. mapreduce-1921.patch
        9 kB
        Krishna Ramachandran
      2. mapred-1921-3.patch
        5 kB
        Krishna Ramachandran
      3. mapred-1921-1.patch
        9 kB
        Krishna Ramachandran

        Activity

        Hide
        Krishna Ramachandran added a comment -

        Tom
        You are right I will update the patch address the lapse. It has been a long time since I looked at this am glad to fix & revive

        regards
        Krishna

        Show
        Krishna Ramachandran added a comment - Tom You are right I will update the patch address the lapse. It has been a long time since I looked at this am glad to fix & revive regards Krishna
        Hide
        Tom White added a comment -

        Also, the current patch swallows exceptions for non-file InputSplits, which is not the desired behaviour.

        Show
        Tom White added a comment - Also, the current patch swallows exceptions for non-file InputSplits, which is not the desired behaviour.
        Hide
        Tom White added a comment -

        We could make this slightly more general by not restricting to FileSplit. By using InputSplit's toString() method we get more information for FileSplit (path plus offset and length), and it works for other InputSplit implementations too.

        Show
        Tom White added a comment - We could make this slightly more general by not restricting to FileSplit. By using InputSplit's toString() method we get more information for FileSplit (path plus offset and length), and it works for other InputSplit implementations too.
        Hide
        Krishna Ramachandran added a comment -

        Thx Dick I will take a look and revise suitably or explain
        Regards
        Krishna

        On 8/3/10 5:16 PM, "Dick King (JIRA)" <jira@apache.org> wrote:

        [ https://issues.apache.org/jira/browse/MAPREDUCE-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895096#action_12895096 ]

        Dick King commented on MAPREDUCE-1921:
        --------------------------------------

        MapTask.java

        When creating a new exception to re-throw, perhaps we should duplicate the original exception's type? I realize that there's a lot of code that doesn't do that, but swallowing this information feels wrong. Especially in the case of IO exceptions, I can envision a lot of code that wants to treat certain sub-exceptions a bit differently; some IO exceptions are the caller's fault but some represent oddities that happened in the file system.

        Perhaps something like

            throw (IOException)(ioe.getClass().getConstructor(String.class, ioe.getClass())
                                                   .newInstance("IO error ..." + ... , ioe));
        
        

        ?

        This cliche should occur in two places; perhaps we should pull it out and put it in one of the utilities classes? Perhaps IO utilities, since the main use case where I can see people caring is the large variety of IOException s.


        This message is automatically generated by JIRA.
        -
        You can reply to this email to add a comment to the issue online.

        Show
        Krishna Ramachandran added a comment - Thx Dick I will take a look and revise suitably or explain Regards Krishna On 8/3/10 5:16 PM, "Dick King (JIRA)" <jira@apache.org> wrote: [ https://issues.apache.org/jira/browse/MAPREDUCE-1921?page=com.atlassian.jira.plugin.system.issuetabpanels:comment-tabpanel&focusedCommentId=12895096#action_12895096 ] Dick King commented on MAPREDUCE-1921 : -------------------------------------- MapTask.java When creating a new exception to re-throw, perhaps we should duplicate the original exception's type? I realize that there's a lot of code that doesn't do that, but swallowing this information feels wrong. Especially in the case of IO exceptions, I can envision a lot of code that wants to treat certain sub-exceptions a bit differently; some IO exceptions are the caller's fault but some represent oddities that happened in the file system. Perhaps something like throw (IOException)(ioe.getClass().getConstructor(String.class, ioe.getClass()) .newInstance("IO error ..." + ... , ioe)); ? This cliche should occur in two places; perhaps we should pull it out and put it in one of the utilities classes? Perhaps IO utilities, since the main use case where I can see people caring is the large variety of IOException s. – This message is automatically generated by JIRA. - You can reply to this email to add a comment to the issue online.
        Hide
        Dick King added a comment -

        MapTask.java

        When creating a new exception to re-throw, perhaps we should duplicate the original exception's type? I realize that there's a lot of code that doesn't do that, but swallowing this information feels wrong. Especially in the case of IO exceptions, I can envision a lot of code that wants to treat certain sub-exceptions a bit differently; some IO exceptions are the caller's fault but some represent oddities that happened in the file system.

        Perhaps something like

            throw (IOException)(ioe.getClass().getConstructor(String.class, ioe.getClass())
                                                   .newInstance("IO error ..." + ... , ioe));
        
        

        ?

        This cliche should occur in two places; perhaps we should pull it out and put it in one of the utilities classes? Perhaps IO utilities, since the main use case where I can see people caring is the large variety of IOException s.

        Show
        Dick King added a comment - MapTask.java When creating a new exception to re-throw, perhaps we should duplicate the original exception's type? I realize that there's a lot of code that doesn't do that, but swallowing this information feels wrong. Especially in the case of IO exceptions, I can envision a lot of code that wants to treat certain sub-exceptions a bit differently; some IO exceptions are the caller's fault but some represent oddities that happened in the file system. Perhaps something like throw (IOException)(ioe.getClass().getConstructor(String.class, ioe.getClass()) .newInstance("IO error ..." + ... , ioe)); ? This cliche should occur in two places; perhaps we should pull it out and put it in one of the utilities classes? Perhaps IO utilities, since the main use case where I can see people caring is the large variety of IOException s.
        Hide
        Krishna Ramachandran added a comment -

        patch updated to include latest file changes in trunk

        Show
        Krishna Ramachandran added a comment - patch updated to include latest file changes in trunk
        Hide
        Krishna Ramachandran added a comment -

        Yes I will fix the error handling

        Test case - the exception was swallowed somewhere in jobclient - but file name is seen in test logs

        Also manual testing with a broken bz2 correctly displayed the file name

        Let me fix the test (or with a real a broken bz2 copied to dfs) and revise

        Show
        Krishna Ramachandran added a comment - Yes I will fix the error handling Test case - the exception was swallowed somewhere in jobclient - but file name is seen in test logs Also manual testing with a broken bz2 correctly displayed the file name Let me fix the test (or with a real a broken bz2 copied to dfs) and revise
        Hide
        Arun C Murthy added a comment -

        Krishna, some comments:

        1. Minor nit: The exception handling code could be slightly better in both moveToNext and nextKeyValue as:

        protected synchronized boolean moveToNext(K key, V value)
        throws IOException {

        • reporter.setProgress(getProgress());
        • beforePos = getPos();
        • boolean ret = rawIn.next(key, value);
        • afterPos = getPos();
          + boolean ret = false;
          + try { + reporter.setProgress(getProgress()); + beforePos = getPos(); + ret = rawIn.next(key, value); + afterPos = getPos(); + }

          catch (IOException ioe)

          Unknown macro: {+ if (split instanceof FileSplit) { + LOG.error("IO error in map input file " + conf.get("map.input.file")); + throw new IOException("IO error in map input file " + + conf.get("map.input.file"), ioe); + }+ }

          return ret;

        2. The test-case doesn't seem to check for the file name in the failure?

        Show
        Arun C Murthy added a comment - Krishna, some comments: 1. Minor nit: The exception handling code could be slightly better in both moveToNext and nextKeyValue as: protected synchronized boolean moveToNext(K key, V value) throws IOException { reporter.setProgress(getProgress()); beforePos = getPos(); boolean ret = rawIn.next(key, value); afterPos = getPos(); + boolean ret = false; + try { + reporter.setProgress(getProgress()); + beforePos = getPos(); + ret = rawIn.next(key, value); + afterPos = getPos(); + } catch (IOException ioe) Unknown macro: {+ if (split instanceof FileSplit) { + LOG.error("IO error in map input file " + conf.get("map.input.file")); + throw new IOException("IO error in map input file " + + conf.get("map.input.file"), ioe); + }+ } return ret; 2. The test-case doesn't seem to check for the file name in the failure?
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12448874/mapred-1921-1.patch
        against trunk revision 960808.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        +1 findbugs. The patch does not introduce any new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/291/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/291/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/291/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/291/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12448874/mapred-1921-1.patch against trunk revision 960808. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. +1 findbugs. The patch does not introduce any new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/291/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/291/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/291/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/291/console This message is automatically generated.
        Hide
        Krishna Ramachandran added a comment -

        revised - fix findbug warning

        Show
        Krishna Ramachandran added a comment - revised - fix findbug warning
        Hide
        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12448846/mapreduce-1921.patch
        against trunk revision 960808.

        +1 @author. The patch does not contain any @author tags.

        +1 tests included. The patch appears to include 2 new or modified tests.

        +1 javadoc. The javadoc tool did not generate any warning messages.

        +1 javac. The applied patch does not increase the total number of javac compiler warnings.

        -1 findbugs. The patch appears to introduce 2 new Findbugs warnings.

        +1 release audit. The applied patch does not increase the total number of release audit warnings.

        +1 core tests. The patch passed core unit tests.

        -1 contrib tests. The patch failed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/289/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/289/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/289/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/289/console

        This message is automatically generated.

        Show
        Hadoop QA added a comment - -1 overall. Here are the results of testing the latest attachment http://issues.apache.org/jira/secure/attachment/12448846/mapreduce-1921.patch against trunk revision 960808. +1 @author. The patch does not contain any @author tags. +1 tests included. The patch appears to include 2 new or modified tests. +1 javadoc. The javadoc tool did not generate any warning messages. +1 javac. The applied patch does not increase the total number of javac compiler warnings. -1 findbugs. The patch appears to introduce 2 new Findbugs warnings. +1 release audit. The applied patch does not increase the total number of release audit warnings. +1 core tests. The patch passed core unit tests. -1 contrib tests. The patch failed contrib unit tests. Test results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/289/testReport/ Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/289/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html Checkstyle results: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/289/artifact/trunk/build/test/checkstyle-errors.html Console output: http://hudson.zones.apache.org/hudson/job/Mapreduce-Patch-h4.grid.sp2.yahoo.net/289/console This message is automatically generated.
        Hide
        Krishna Ramachandran added a comment -

        patch to include filename in i/o exception

        Show
        Krishna Ramachandran added a comment - patch to include filename in i/o exception

          People

          • Assignee:
            Krishna Ramachandran
            Reporter:
            Krishna Ramachandran
          • Votes:
            0 Vote for this issue
            Watchers:
            8 Start watching this issue

            Dates

            • Created:
              Updated:

              Development