Details

    • Type: Bug
    • Status: Open
    • Priority: Critical
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels: None

      Description

      Currently in HDFS there are a lot of calls to IOUtils.closeStream made from finally blocks. I'm worried that this can lead to data corruption in the file system. Take the first instance, in DataNode.copyBlock: it writes the block and then calls closeStream on the output stream. If an error at the end of the file is detected during the close, it is completely ignored. Note that logging the error is not enough; the error should be thrown so that the client knows the failure happened.

         try {
           file1.write(...);
           file2.write(...);
         } finally {
           IOUtils.closeStream(file1);
           IOUtils.closeStream(file2);
         }
      

      is bad. It must be rewritten as:

         try {
           file1.write(...);
           file2.write(...);
           file1.close();
           file2.close();
         } catch (IOException ie) {
           IOUtils.closeStream(file1);
           IOUtils.closeStream(file2);
           throw ie;
         }
      

      I also think that IOUtils.closeStream should be renamed IOUtils.cleanupFailedStream or something to make it clear it can only be used after the write operation has failed and is being cleaned up.
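
      For reference, the helper being criticized is essentially a null-safe close that swallows the IOException. A minimal sketch of that behavior (not the exact Hadoop source) looks like this:

         import java.io.Closeable;
         import java.io.IOException;

         public class IOUtilsSketch {
           /**
            * Null-safe close that swallows any IOException. An error thrown
            * by close() -- for example, a failed flush of buffered data --
            * is silently lost, which is the problem described above.
            */
           public static void closeStream(Closeable stream) {
             if (stream != null) {
               try {
                 stream.close();
               } catch (IOException ignored) {
                 // swallowed: the caller never learns that the close failed
               }
             }
           }
         }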

        Attachments

        1. closeStream.patch (2 kB) by dhruba borthakur

        Activity

        dhruba borthakur added a comment -

        This would need some rework if we want it for 0.17.

        dhruba borthakur added a comment -

        Looking at the code more closely, it appears that there isn't a bug that this patch addresses. The DataNode correctly ignores exceptions. This issue is left open to address coding-style issues. One suggestion is to make the method log an error message when the close throws an exception. Another suggestion is to rename the method to closeIgnoreExceptions().

        Raghu Angadi added a comment -


        I don't think this is a fix for trunk either (this does not change even if streams need to be closed).

        dhruba borthakur added a comment -

        Now that you point out that streams need not be closed, I agree with you. I cannot see a case where this can cause data corruption. If that is the case, this patch should not go into 0.16.

        Raghu Angadi added a comment -

        Specifically, I am trying to see which part of the patch fixes the potential corruption.

        Raghu Angadi added a comment -

        > This bug has the potential that errors-writing-to-block file go undetected. This can (theoretically) result in a higher probability of data corruption.

        But every close() added in the patch is for a socket or socket stream. Strictly speaking, socket streams don't even need to be closed. Am I missing something?

        dhruba borthakur added a comment -

        This bug has the potential that errors-writing-to-block file go undetected. This can (theoretically) result in a higher probability of data corruption.

        Hadoop QA added a comment -

        -1 overall. Here are the results of testing the latest attachment
        http://issues.apache.org/jira/secure/attachment/12377211/closeStream.patch
        against trunk revision 619744.

        @author +1. The patch does not contain any @author tags.

        tests included -1. The patch doesn't appear to include any new or modified tests.
        Please justify why no tests are needed for this patch.

        javadoc +1. The javadoc tool did not generate any warning messages.

        javac +1. The applied patch does not generate any new javac compiler warnings.

        release audit +1. The applied patch does not generate any new release audit warnings.

        findbugs -1. The patch appears to introduce 2 new Findbugs warnings.

        core tests +1. The patch passed core unit tests.

        contrib tests +1. The patch passed contrib unit tests.

        Test results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1899/testReport/
        Findbugs warnings: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1899/artifact/trunk/build/test/findbugs/newPatchFindbugsWarnings.html
        Checkstyle results: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1899/artifact/trunk/build/test/checkstyle-errors.html
        Console output: http://hudson.zones.apache.org/hudson/job/Hadoop-Patch/1899/console

        This message is automatically generated.

        Raghu Angadi added a comment -


        Why is this a blocker? What is the specific bug that this patch fixes?

        dhruba borthakur added a comment -

        Triggering Hadoop QA tests.

        Tsz Wo Nicholas Sze added a comment -

        > The standard idiom for multiple streams, as mentioned above, is nested try blocks
        +1 Nested try blocks probably work in most situations.

        However, it does not work when the number of streams is unknown at compile time, e.g.

        OutputStream[] out = new OutputStream[n];
        ...
        

        Also, it makes the code ugly when the number is large, say > 3.
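
        A loop in a finally block handles the unknown-count case. A minimal sketch (closeAll is a hypothetical helper, not an existing IOUtils method) that closes every element and rethrows the first failure:

        import java.io.Closeable;
        import java.io.IOException;

        public class StreamArrays {
          /** Close every stream, then rethrow the first failure (if any). */
          public static void closeAll(Closeable[] streams) throws IOException {
            IOException first = null;
            for (Closeable c : streams) {
              if (c == null) continue;
              try {
                c.close();
              } catch (IOException e) {
                if (first == null) first = e; // remember it, keep closing the rest
              }
            }
            if (first != null) throw first;
          }
        }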

        Doug Cutting added a comment - edited

        The standard idiom for multiple streams, as mentioned above, is nested try blocks, e.g.:

        OutputStream out1 = fs.open(...);
        try {
          OutputStream out2 = fs.open(...);
          try {
            out1.write(...);
            out2.write(...);
          } finally {
            out2.close();
          }
        } finally {
          out1.close();
        }
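
        For comparison, the try-with-resources statement added later in Java 7 encodes exactly this nesting: resources are closed in reverse declaration order, and a close() failure either propagates or is attached as a suppressed exception. A self-contained sketch with plain file streams (the file names are hypothetical):

        import java.io.FileOutputStream;
        import java.io.IOException;
        import java.io.OutputStream;

        public class NestedCloseDemo {
          public static void write(byte[] data) throws IOException {
            // Closed in reverse declaration order; a close() failure is
            // thrown, or recorded as suppressed if the body already threw.
            try (OutputStream out1 = new FileOutputStream("file1.dat");
                 OutputStream out2 = new FileOutputStream("file2.dat")) {
              out1.write(data);
              out2.write(data);
            }
          }
        }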
        
        Tsz Wo Nicholas Sze added a comment -

        > Shouldn't we try to make this idiom work well with HDFS?

        This idiom does not extend obviously to multiple streams. For example, the following code does not handle exceptions correctly:

        OutputStream out1 = fs.open(...);
        OutputStream out2 = fs.open(...);
        try {
          out1.write(...);
          out2.write(...);
        } finally {
          out1.close();
          out2.close();  //not called if the previous line throws an exception
        }
        

        We still need something like IOUtils.closeStream to do the try-catch when there is more than one stream.
        I suggest we define the following method in IOUtils

        public static void closeIO(Closeable... io) throws IOException {
          List<IOException> ioexceptions = new ArrayList<IOException>();
          for (Closeable c : io) {
            try {c.close();}
            catch (IOException e) {ioexceptions.add(e);}
          }
          if (!ioexceptions.isEmpty()) {
            throw new IOException(...); //construct an IOException with the list
          }
        }
        

        Then, multiple IOs can be closed together

        OutputStream out1 = fs.open(...);
        OutputStream out2 = fs.open(...);
        try {
          out1.write(...);
          out2.write(...);
        } finally {
         IOUtils.closeIO(out1, out2);
        }
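
        A compilable version of the proposal (names hypothetical; the elided IOException construction above is one open design choice) might aggregate the failure messages like this:

        import java.io.Closeable;
        import java.io.IOException;
        import java.util.ArrayList;
        import java.util.List;

        public class CloseIOSketch {
          /** Close every argument, then throw one IOException naming all failures. */
          public static void closeIO(Closeable... io) throws IOException {
            List<IOException> failures = new ArrayList<IOException>();
            for (Closeable c : io) {
              try {
                if (c != null) c.close();
              } catch (IOException e) {
                failures.add(e);
              }
            }
            if (!failures.isEmpty()) {
              StringBuilder msg = new StringBuilder("failed to close "
                  + failures.size() + " stream(s):");
              for (IOException e : failures) {
                msg.append(' ').append(e.getMessage()).append(';');
              }
              throw new IOException(msg.toString());
            }
          }
        }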
        
        Raghu Angadi added a comment -

        > Then shouldn't we use it, instead of using IOUtils.closeStream() at all?

        Maybe. I am not sure we should enforce it everywhere. IMHO, readability becomes really bad when a lot of streams are involved, as in DataNode (two sockets and their input/output streams, and two file channels).

        Doug Cutting added a comment -

        > I am not sure why this would not work now..

        Then shouldn't we use it, instead of using IOUtils.closeStream() at all?

        Tsz Wo Nicholas Sze added a comment -
        • The catch block in IOUtils.closeStream is currently empty. We should at least log a message.
        • BTW, IOUtils.closeSocket has a similar problem.
        Raghu Angadi added a comment -

        > Shouldn't we try to make this idiom work well with HDFS?

        I am not sure why this would not work now..

        Doug Cutting added a comment -

        It would be good if folks can use standard Java i/o idioms with Hadoop.

        http://www.ibm.com/developerworks/java/library/j-jtp03216.html

        OutputStream out = fs.open(...);
        try {
          out.write(...);
        } finally {
          out.close();
        }
        

        When multiple files are involved the best thing is to nest the try blocks.

        Shouldn't we try to make this idiom work well with HDFS?

        Raghu Angadi added a comment -

        i.e. closeStream() should be used if and only if the caller wants to ignore the IOException and it is OK for the reference to be null.

        Raghu Angadi added a comment -

        You are right. The example above shows wrong use of closeStream().

        > it clear it can only be used after the write operation has failed and is being cleaned up.
        But this is not true. Many times it is used to close sockets where the socket needs to be closed because of an unrelated error (say, I/O failed on a different stream).
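
        In code, the pattern he describes looks roughly like this (names hypothetical): when I/O on one stream has already failed, closing the peer socket is best-effort cleanup, and a second exception from close() would only mask the real error:

        import java.io.IOException;
        import java.io.InputStream;
        import java.io.OutputStream;
        import java.net.Socket;

        public class MirrorCleanup {
          public static void copyToMirror(InputStream blockIn, Socket mirror)
              throws IOException {
            OutputStream mirrorOut = mirror.getOutputStream();
            try {
              byte[] buf = new byte[4096];
              int n;
              while ((n = blockIn.read(buf)) > 0) {
                mirrorOut.write(buf, 0, n);
              }
              mirrorOut.close(); // a failure on the success path must propagate
            } catch (IOException e) {
              // The transfer already failed for an unrelated reason; tearing
              // down the socket is cleanup, so its close() failure is ignored.
              try { mirror.close(); } catch (IOException ignored) { }
              throw e;
            }
          }
        }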

        dhruba borthakur added a comment -

        Yes, that is a good catch. Should be fixed for 0.16 as well.


          People

          • Assignee: dhruba borthakur
          • Reporter: Owen O'Malley
          • Votes: 2
          • Watchers: 6

            Dates

            • Created:
            • Updated:
