Hadoop Common
  1. Hadoop Common
  2. HADOOP-4760

HDFS streams should not throw exceptions when closed twice

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 0.19.1, 0.20.0, 0.21.0
    • Fix Version/s: 0.19.1
    • Component/s: fs, fs/s3
    • Labels:
      None
    • Environment:

      all

    • Hadoop Flags:
      Reviewed

      Description

      When adding an InputStream via addResource(InputStream) to a Configuration instance, if the stream is a HDFS stream the loadResource(..) method fails with IOException indicating that the stream has already been closed.

      1. closehdfsstream_v3.patch
        5 kB
        Enis Soztutar
      2. closehdfsstream_v2.patch
        11 kB
        Enis Soztutar
      3. closehdfsstream_v1.patch
        11 kB
        Enis Soztutar

        Issue Links

          Activity

          Hide
          Alejandro Abdelnur added a comment - - edited

          The problem is in the Configuration.loadResource(...) method at:

                } else if (name instanceof InputStream) {
                  try {
                    doc = builder.parse((InputStream)name);
                  } finally {
                    ((InputStream)name).close();
                  }
                }
          

          the DocumentBuilder (the

          {builder}

          variable) parse(...) method closes the stream, making the close() in finally to fail.

          Note that the failure does not happen with all stream classes, only with those that check that the stream is not closed before closing it (HDFS stream does that)

          Show
          Alejandro Abdelnur added a comment - - edited The problem is in the Configuration.loadResource(...) method at: } else if (name instanceof InputStream) { try { doc = builder.parse((InputStream)name); } finally { ((InputStream)name).close(); } } the DocumentBuilder (the {builder} variable) parse(...) method closes the stream, making the close() in finally to fail. Note that the failure does not happen with all stream classes, only with those that check that the stream is not closed before closing it (HDFS stream does that)
          Hide
          Doug Cutting added a comment -

          We should fix HDFS to not complain about twice-closed streams, to be compatible with other InputStream implementations.

          Show
          Doug Cutting added a comment - We should fix HDFS to not complain about twice-closed streams, to be compatible with other InputStream implementations.
          Hide
          Alejandro Abdelnur added a comment -

          Doug, you are correct, according to the Closeable API: Closes this stream and releases any system resources associated with it. If the stream is already closed then invoking this method has no effect.

          Show
          Alejandro Abdelnur added a comment - Doug, you are correct, according to the Closeable API: Closes this stream and releases any system resources associated with it. If the stream is already closed then invoking this method has no effect.
          Hide
          Enis Soztutar added a comment - - edited

          We should fix HDFS to not complain about twice-closed streams, to be compatible with other InputStream implementations.

          +1, I have come across several cases, where this was the problem.

          Show
          Enis Soztutar added a comment - - edited We should fix HDFS to not complain about twice-closed streams, to be compatible with other InputStream implementations. +1, I have come across several cases, where this was the problem.
          Hide
          Enis Soztutar added a comment -

          Here is a patch, which checks for the

          {input|output}

          streams to be closed and returns w/o throwing IOException. Test case ensures that HDFS and S3 file systems meet the contract.

          Show
          Enis Soztutar added a comment - Here is a patch, which checks for the {input|output} streams to be closed and returns w/o throwing IOException. Test case ensures that HDFS and S3 file systems meet the contract.
          Hide
          Raghu Angadi added a comment -

          +1. This should have been fixed a long time back.

          regd the patch :

          1. If both the file and DFSclient are closed, it would still throw an exception, checkopen() should probably called only if the stream is not closed.
          2. it has a lot of meta, formatting changes, are those intentional? I know @override is useful, but these changes are spread all over.. one big disadvantage is that it conflicts patches coming from other active branches.
          Show
          Raghu Angadi added a comment - +1. This should have been fixed a long time back. regd the patch : If both the file and DFSclient are closed, it would still throw an exception, checkopen() should probably called only if the stream is not closed. it has a lot of meta, formatting changes, are those intentional? I know @override is useful, but these changes are spread all over.. one big disadvantage is that it conflicts patches coming from other active branches.
          Hide
          Enis Soztutar added a comment -

          Incorporated Raghu's comments, now checkOpen() is called if closed is false. Also fixed the DFSClient.close() method to not throw exception.

          The @Override's are pretty useful (espcecially when refactoring) and they are introduced intentionally by my eclipse save actions. I am in favor of keeping them.

          import statements are reordered and * are converted to actual classes again by save actions. According to our guidelines, the import statements should only contain actual class references. As in this case here, the import statements are constantly switched between actual class names or * between patches. Maybe we should add checking for this to test-patch script.

          Show
          Enis Soztutar added a comment - Incorporated Raghu's comments, now checkOpen() is called if closed is false. Also fixed the DFSClient.close() method to not throw exception. The @Override's are pretty useful (espcecially when refactoring) and they are introduced intentionally by my eclipse save actions. I am in favor of keeping them. import statements are reordered and * are converted to actual classes again by save actions. According to our guidelines, the import statements should only contain actual class references. As in this case here, the import statements are constantly switched between actual class names or * between patches. Maybe we should add checking for this to test-patch script.
          Hide
          Enis Soztutar added a comment -

          ant test-patch results :

          [exec] +1 overall.
          [exec]
          [exec] +1 @author. The patch does not contain any @author tags.
          [exec]
          [exec] +1 tests included. The patch appears to include 6 new or modified tests.
          [exec]
          [exec] +1 javadoc. The javadoc tool did not generate any warning messages.
          [exec]
          [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings.
          [exec]
          [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings.
          [exec]
          [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity.
          [exec]
          [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings.

          All tests pass except TestSetupAndCleanupFailure, which seems unrelated.

          Show
          Enis Soztutar added a comment - ant test-patch results : [exec] +1 overall. [exec] [exec] +1 @author. The patch does not contain any @author tags. [exec] [exec] +1 tests included. The patch appears to include 6 new or modified tests. [exec] [exec] +1 javadoc. The javadoc tool did not generate any warning messages. [exec] [exec] +1 javac. The applied patch does not increase the total number of javac compiler warnings. [exec] [exec] +1 findbugs. The patch does not introduce any new Findbugs warnings. [exec] [exec] +1 Eclipse classpath. The patch retains Eclipse classpath integrity. [exec] [exec] +1 release audit. The applied patch does not increase the total number of release audit warnings. All tests pass except TestSetupAndCleanupFailure, which seems unrelated.
          Hide
          Raghu Angadi added a comment -

          import statements are reordered and * are converted to actual classes again by save actions. As in this case here, the import statements are constantly switched between actual class names or * between patches.

          hmm.. I am pretty sure eclipse can be configured not to do that (in fact, by default it may not do that). If every patch includes a lot of corrections like this, it would be pretty hard to track and maintain. There might even be constants flips committed due to minor variations in different eclipse configurations or JDKs used by eclipse environments. Pretty error prone as well.

          I am -0.5 on these. I might be biased in this since I make sure my patch is not polluted even by minor white space changes. At least two separate patches would be much better. Note that it should be ok to fix the code just around the actual code changes.

          Show
          Raghu Angadi added a comment - import statements are reordered and * are converted to actual classes again by save actions. As in this case here, the import statements are constantly switched between actual class names or * between patches. hmm.. I am pretty sure eclipse can be configured not to do that (in fact, by default it may not do that). If every patch includes a lot of corrections like this, it would be pretty hard to track and maintain. There might even be constants flips committed due to minor variations in different eclipse configurations or JDKs used by eclipse environments. Pretty error prone as well. I am -0.5 on these. I might be biased in this since I make sure my patch is not polluted even by minor white space changes. At least two separate patches would be much better. Note that it should be ok to fix the code just around the actual code changes.
          Hide
          Enis Soztutar added a comment -

          Yes my eclipse configuration is intentionally this way, since I believe we should have the @Override's and imports corrected for ALL of the code. Doing these in a separate issue is a logical option, however I cannot imagine anyone doing this for all the java classes, so I tend to be practical and fix the ones I change for a patch.

          I can remove the [import,@Override] changes, but I'm afraid I could not find time to prepare a patch for them.

          On a related issue I guess the eclipse configurations about import statements should be fixed for everybody who develops code for Hadoop, so that the statements will be correct in the first place.

          Show
          Enis Soztutar added a comment - Yes my eclipse configuration is intentionally this way, since I believe we should have the @Override's and imports corrected for ALL of the code. Doing these in a separate issue is a logical option, however I cannot imagine anyone doing this for all the java classes, so I tend to be practical and fix the ones I change for a patch. I can remove the [import,@Override] changes, but I'm afraid I could not find time to prepare a patch for them. On a related issue I guess the eclipse configurations about import statements should be fixed for everybody who develops code for Hadoop, so that the statements will be correct in the first place.
          Hide
          Enis Soztutar added a comment -

          Removed import and annotation changes as per discussion.

          Show
          Enis Soztutar added a comment - Removed import and annotation changes as per discussion.
          Hide
          Raghu Angadi added a comment -

          Thanks. +1 for v3.

          If you and/or others feel v2 is better, please go ahead, I would not -1 it.

          Show
          Raghu Angadi added a comment - Thanks. +1 for v3. If you and/or others feel v2 is better, please go ahead, I would not -1 it.
          Hide
          Enis Soztutar added a comment -

          I committed this to 0.19.1 and on. The fix for 0.18.4 is not straightforward, so I left that out, we can fix it in a new issue if need be.

          Show
          Enis Soztutar added a comment - I committed this to 0.19.1 and on. The fix for 0.18.4 is not straightforward, so I left that out, we can fix it in a new issue if need be.
          Hide
          Tsz Wo Nicholas Sze added a comment -

          Is this an incompatible change? Hope that there is no codes depend on the old behavior.

          Show
          Tsz Wo Nicholas Sze added a comment - Is this an incompatible change? Hope that there is no codes depend on the old behavior.
          Hide
          Hudson added a comment -
          Show
          Hudson added a comment - Integrated in Hadoop-trunk #756 (See http://hudson.zones.apache.org/hudson/job/Hadoop-trunk/756/ )
          Hide
          Enis Soztutar added a comment -

          Is this an incompatible change? Hope that there is no codes depend on the old behavior.

          I don't think so, Closeable.close() explicitly states that closing more than once should have no effect.

          Show
          Enis Soztutar added a comment - Is this an incompatible change? Hope that there is no codes depend on the old behavior. I don't think so, Closeable.close() explicitly states that closing more than once should have no effect.

            People

            • Assignee:
              Enis Soztutar
              Reporter:
              Alejandro Abdelnur
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development