Lucene - Core
  1. Lucene - Core
  2. LUCENE-669

finalize()-methods of FSDirectory.FSIndexInput and FSDirectory.FSIndexOutput try to close already closed file

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Trivial Trivial
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.1
    • Component/s: core/store
    • Labels:
      None
    • Lucene Fields:
      Patch Available

      Description

      Hi all,

      I found a small problem in FSDirectory: The finalize()-methods of FSDirectory.FSIndexInput and FSDirectory.FSIndexOutput try to close the underlying file. This is not a problem unless the file has been closed before by calling the close() method. If it has been closed before, the finalize method throws an IOException saying that the file is already closed. Usually this IOException would go unnoticed, because the GarbageCollector, which calls finalize(), just eats it. However, if I use the Eclipse debugger the execution of my code will always be suspended when this exception is thrown.

      Even though this exception probably won't cause problems during normal execution of Lucene, the code becomes cleaner if we apply this small patch. Might this IOException also have a performance impact, if it is thrown very frequently?

      I attached the patch which applies cleanly on the current svn HEAD. All testcases pass and I verfied with the Eclipse debugger that the IOException is not longer thrown.

        Activity

        Hide
        Michael McCandless added a comment -

        This patch looks good to me. It still applies cleanly to the current [Nov 10 2006] svn head, and I think there's very little risk. I think it makes sense to guard against double-closing.

        Show
        Michael McCandless added a comment - This patch looks good to me. It still applies cleanly to the current [Nov 10 2006] svn head, and I think there's very little risk. I think it makes sense to guard against double-closing.
        Hide
        Otis Gospodnetic added a comment -

        Looks fine to me.

        Maybe change this a bit:
        public void close() throws IOException {

        • if (!isClone)
        • file.close();
          + if (!isClone)
          Unknown macro: {+ if (file != null) { + file.close(); + file = null; + }+ }

          }

        That if (file != null) block could be replaced with closeFile() call, I think.

        Also, what was closing the file when you run this in Eclipse?

        Show
        Otis Gospodnetic added a comment - Looks fine to me. Maybe change this a bit: public void close() throws IOException { if (!isClone) file.close(); + if (!isClone) Unknown macro: {+ if (file != null) { + file.close(); + file = null; + }+ } } That if (file != null) block could be replaced with closeFile() call, I think. Also, what was closing the file when you run this in Eclipse?
        Hide
        Michael Busch added a comment -

        The method closeFile() belongs to FSDirectory.FSIndexOutput, so I can't call it in FSDirectory.FSIndexInput.close(). (This is hard to see if you just look at the patch file).

        I added the method closeFile() to FSDirectory.FSIndexOutput, because the behaviour of finalize() and close() is slightly different: finalize() simply closes the file, whereas close() calls super.close() first and closes the file then. I didn't want to change this behavior, thus I can't just call close() from finalize().

        But now I am actually wondering if this behavior is correct. super.close() triggers a flush of the buffer. So in the current Lucene code, FSDirectory.FSIndexOutput.close() triggers a flush, but FSDirectory.FSIndexOutput.finalize() doesn't. Shouldn't we call flush also inside finalize() surrounded by try/catch?

        Show
        Michael Busch added a comment - The method closeFile() belongs to FSDirectory.FSIndexOutput, so I can't call it in FSDirectory.FSIndexInput.close(). (This is hard to see if you just look at the patch file). I added the method closeFile() to FSDirectory.FSIndexOutput, because the behaviour of finalize() and close() is slightly different: finalize() simply closes the file, whereas close() calls super.close() first and closes the file then. I didn't want to change this behavior, thus I can't just call close() from finalize(). But now I am actually wondering if this behavior is correct. super.close() triggers a flush of the buffer. So in the current Lucene code, FSDirectory.FSIndexOutput.close() triggers a flush, but FSDirectory.FSIndexOutput.finalize() doesn't. Shouldn't we call flush also inside finalize() surrounded by try/catch?
        Hide
        Michael Busch added a comment -

        Since the new testcase TestStressIndexing has been added as part of the lockless commits I see this test failing regularly. In ten runs it failed four times with the following exception:

        [junit] Testsuite: org.apache.lucene.index.TestStressIndexing
        [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 33.338 sec
        [junit] ------------- Standard Output ---------------
        [junit] java.io.IOException: The handle is invalid.
        [junit] ------------- ---------------- ---------------
        [junit] ------------- Standard Error -----------------
        [junit] java.io.IOException: The handle is invalid.
        [junit] at java.io.RandomAccessFile.close0(Native Method)
        [junit] at java.io.RandomAccessFile.close(RandomAccessFile.java:573)
        [junit] at org.apache.lucene.store.FSIndexInput.close(FSDirectory.java:537)
        [junit] at org.apache.lucene.index.CompoundFileReader.close(CompoundFileReader.java:111)
        [junit] at org.apache.lucene.index.SegmentReader.doClose(SegmentReader.java:252)
        [junit] at org.apache.lucene.index.IndexReader.close(IndexReader.java:637)
        [junit] at org.apache.lucene.index.MultiReader.doClose(MultiReader.java:235)
        [junit] at org.apache.lucene.index.IndexReader.close(IndexReader.java:637)
        [junit] at org.apache.lucene.search.IndexSearcher.close(IndexSearcher.java:74)
        [junit] at org.apache.lucene.index.TestStressIndexing$SearcherThread.run(TestStressIndexing.java:101)
        [junit] ------------- ---------------- ---------------
        [junit] Testcase: testStressIndexAndSearching(org.apache.lucene.index.TestStressIndexing): FAILED
        [junit] hit unexpected exception in search2
        [junit] junit.framework.AssertionFailedError: hit unexpected exception in search2
        [junit] at org.apache.lucene.index.TestStressIndexing.runStressTest(TestStressIndexing.java:140)
        [junit] at org.apache.lucene.index.TestStressIndexing.testStressIndexAndSearching(TestStressIndexing.java:161)
        [junit] Test org.apache.lucene.index.TestStressIndexing FAILED

        It appears to be the same problem for which I opened this issue. After applying the patch I did not see the test failing anymore (I ran it about 20 times...)

        I attach a new version of this patch. Now FSDirectory.FSIndexOutput.finalize() simply calls close() which triggers a flush of the buffer. I don't see a reason not do that. Anybody does?

        All unit tests pass with the new patch.

        Show
        Michael Busch added a comment - Since the new testcase TestStressIndexing has been added as part of the lockless commits I see this test failing regularly. In ten runs it failed four times with the following exception: [junit] Testsuite: org.apache.lucene.index.TestStressIndexing [junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 33.338 sec [junit] ------------- Standard Output --------------- [junit] java.io.IOException: The handle is invalid. [junit] ------------- ---------------- --------------- [junit] ------------- Standard Error ----------------- [junit] java.io.IOException: The handle is invalid. [junit] at java.io.RandomAccessFile.close0(Native Method) [junit] at java.io.RandomAccessFile.close(RandomAccessFile.java:573) [junit] at org.apache.lucene.store.FSIndexInput.close(FSDirectory.java:537) [junit] at org.apache.lucene.index.CompoundFileReader.close(CompoundFileReader.java:111) [junit] at org.apache.lucene.index.SegmentReader.doClose(SegmentReader.java:252) [junit] at org.apache.lucene.index.IndexReader.close(IndexReader.java:637) [junit] at org.apache.lucene.index.MultiReader.doClose(MultiReader.java:235) [junit] at org.apache.lucene.index.IndexReader.close(IndexReader.java:637) [junit] at org.apache.lucene.search.IndexSearcher.close(IndexSearcher.java:74) [junit] at org.apache.lucene.index.TestStressIndexing$SearcherThread.run(TestStressIndexing.java:101) [junit] ------------- ---------------- --------------- [junit] Testcase: testStressIndexAndSearching(org.apache.lucene.index.TestStressIndexing): FAILED [junit] hit unexpected exception in search2 [junit] junit.framework.AssertionFailedError: hit unexpected exception in search2 [junit] at org.apache.lucene.index.TestStressIndexing.runStressTest(TestStressIndexing.java:140) [junit] at org.apache.lucene.index.TestStressIndexing.testStressIndexAndSearching(TestStressIndexing.java:161) [junit] Test org.apache.lucene.index.TestStressIndexing FAILED It appears to be the same problem for which I opened this issue. After applying the patch I did not see the test failing anymore (I ran it about 20 times...) I attach a new version of this patch. Now FSDirectory.FSIndexOutput.finalize() simply calls close() which triggers a flush of the buffer. I don't see a reason not do that. Anybody does? All unit tests pass with the new patch.
        Hide
        Michael McCandless added a comment -

        Hmmm. Michael, how does the exception in this unit test tie into this issue? Ie, I thought this issue was that only finalize would be doing a double-close? I'm confused how the two are connected (it's awesome that your patch fixes this, but I'd like to understand why!).

        Show
        Michael McCandless added a comment - Hmmm. Michael, how does the exception in this unit test tie into this issue? Ie, I thought this issue was that only finalize would be doing a double-close? I'm confused how the two are connected (it's awesome that your patch fixes this, but I'd like to understand why!).
        Hide
        Michael Busch added a comment -

        Mike,

        you are right, it is very weird that 1) this problem happens and 2) my patch fixes it. It took me quite a while to figure out what the real problem is but I think I have at least a guess now.

        The problem again only happens on an IBM JVM. It happens rarely with your unmodified TestStressIndexing. If I change this testcase so that is does not perform the stress test on a RAMDirectory but only on a FSDirectory it happens almost always. This is VERY weird, because I did not change the stress test at all... I just commented the lines

        // // First in a RAM directory:
        // Directory directory = new RAMDirectory();
        // runStressTest(directory);
        // directory.close();

        and suddenly every test fails with the IOException "The handle is invalid". So that makes me think it has something to do with garbage collection and the finalize() methods.

        Now if I just comment out the finalize() method of FSDirectory.FSIndexOutput the test passes. This is even stranger, because the IOException occurred in FSDirectory.FSIndexInput.close(), not in FSIndexOutput, so in a different class which uses hence a different file descriptor.

        So I checked how java.io.FileDescriptor is implemented: It simply wraps an int value. I can't see how these int values are computed, because the open() method of RandomAccessFile is native.

        So I believe the following happens:
        (1) FSIndexOutput uses a FileDescriptor with value x
        (2) FSIndexOutput.close() is called, so the underlying file is being closed
        (3) A new FSIndexInput instance is created by a searcher thread. This opens a RandomAccessFile. Because FileDescriptor x is not in use anymore, x is used as the value for the new FileDescriptor.
        (4) Now garbage collection kicks in. It removes the old instance of FSIndexOutput for which close() has been called already. So the garbage collector calls finalize() which calls RandomAccessFile.close() again which still uses the descriptor with value x. So this call of close() actually closes the file used by the IndexInput instance created in (3).
        (5) FSIndexInput.close() is called and tries to close the file which has been closed already in (4) and thus the IOException occurs.

        So it seems to me that the IBM JVM makes file descriptor values available after a file has been closed, whereas Sun waits until the FileDescriptor instance is destroyed. This might be a bug in the JVM, but since this patch is very simple we could just use it to be on the safe side.

        Do you think this makes sense? Or does anybody have a better idea why commenting out the finalize() method in FSIndexOutput prevents FSIndexInput.close() from throwing the IOException?

        Show
        Michael Busch added a comment - Mike, you are right, it is very weird that 1) this problem happens and 2) my patch fixes it. It took me quite a while to figure out what the real problem is but I think I have at least a guess now. The problem again only happens on an IBM JVM. It happens rarely with your unmodified TestStressIndexing. If I change this testcase so that is does not perform the stress test on a RAMDirectory but only on a FSDirectory it happens almost always. This is VERY weird, because I did not change the stress test at all... I just commented the lines // // First in a RAM directory: // Directory directory = new RAMDirectory(); // runStressTest(directory); // directory.close(); and suddenly every test fails with the IOException "The handle is invalid". So that makes me think it has something to do with garbage collection and the finalize() methods. Now if I just comment out the finalize() method of FSDirectory.FSIndexOutput the test passes. This is even stranger, because the IOException occurred in FSDirectory.FSIndexInput.close(), not in FSIndexOutput, so in a different class which uses hence a different file descriptor. So I checked how java.io.FileDescriptor is implemented: It simply wraps an int value. I can't see how these int values are computed, because the open() method of RandomAccessFile is native. So I believe the following happens: (1) FSIndexOutput uses a FileDescriptor with value x (2) FSIndexOutput.close() is called, so the underlying file is being closed (3) A new FSIndexInput instance is created by a searcher thread. This opens a RandomAccessFile. Because FileDescriptor x is not in use anymore, x is used as the value for the new FileDescriptor. (4) Now garbage collection kicks in. It removes the old instance of FSIndexOutput for which close() has been called already. So the garbage collector calls finalize() which calls RandomAccessFile.close() again which still uses the descriptor with value x. So this call of close() actually closes the file used by the IndexInput instance created in (3). (5) FSIndexInput.close() is called and tries to close the file which has been closed already in (4) and thus the IOException occurs. So it seems to me that the IBM JVM makes file descriptor values available after a file has been closed, whereas Sun waits until the FileDescriptor instance is destroyed. This might be a bug in the JVM, but since this patch is very simple we could just use it to be on the safe side. Do you think this makes sense? Or does anybody have a better idea why commenting out the finalize() method in FSIndexOutput prevents FSIndexInput.close() from throwing the IOException?
        Hide
        Michael McCandless added a comment -

        Michael, which OS are you seeing the exception on? I'm trying to repro on Linux w/ IBM's JVM 1.5.0 with no success.

        Show
        Michael McCandless added a comment - Michael, which OS are you seeing the exception on? I'm trying to repro on Linux w/ IBM's JVM 1.5.0 with no success.
        Hide
        Michael Busch added a comment -

        I'm seeing the problem on Windows XP SP2, IBM JVM 1.5 SR3. I'm running the tests in eclipse.

        Show
        Michael Busch added a comment - I'm seeing the problem on Windows XP SP2, IBM JVM 1.5 SR3. I'm running the tests in eclipse.
        Hide
        Michael McCandless added a comment -

        OK I will try to repro.

        In the meantime, I like your theory above! It seems very plausible that the 2nd close (during finalize) could [incorrectly] close what was in fact a newly opened descriptor (in use elsewhere). This also means this bug is more serious that I had thought (I thought it would just throw exceptions up to the GC).

        One way to be sure this theory is true is to instrument the finalize() to see that indeed it called close for the second time, and, the close succeeded (instead of throwing the original exception you saw). Ie, if this event occurs and then corresponds to the above exception in the TestStressIndexing unit test, then we've got this explained, and, it's quite serious since in production this could in theory result in errant IOExceptions like the one above.

        Show
        Michael McCandless added a comment - OK I will try to repro. In the meantime, I like your theory above! It seems very plausible that the 2nd close (during finalize) could [incorrectly] close what was in fact a newly opened descriptor (in use elsewhere). This also means this bug is more serious that I had thought (I thought it would just throw exceptions up to the GC). One way to be sure this theory is true is to instrument the finalize() to see that indeed it called close for the second time, and, the close succeeded (instead of throwing the original exception you saw). Ie, if this event occurs and then corresponds to the above exception in the TestStressIndexing unit test, then we've got this explained, and, it's quite serious since in production this could in theory result in errant IOExceptions like the one above.
        Hide
        Michael Busch added a comment -

        Mike,

        I tried to add some debug to FSIndexOutput and made another observation which makes this issue even odder! I added a boolean variable to FSIndexOutput with the name isOpen. I don't do anything with this variable, I just change the close() method from:

        public void close() throws IOException

        { super.close(); file.close(); }

        to

        public void close() throws IOException

        { super.close(); file.close(); isOpen = false; }

        and suddenly the problem disappears! Now I change close() to

        public void close() throws IOException

        { super.close(); isOpen = false; file.close(); }

        and the IOException occurs again. Notice that I don't use isOpen anywhere else in the code. So it seems that the problem only occurs if file.close() is the last instruction in close(). I bet this is a JVM bug, maybe the compiler makes some kind of optimization (maybe early freeing up the resources of the method's context that are on the stack while the last instruction of the method is being executed). So I'm not completely sure what the real problem is, but I'm pretty sure it is a JVM bug.

        Show
        Michael Busch added a comment - Mike, I tried to add some debug to FSIndexOutput and made another observation which makes this issue even odder! I added a boolean variable to FSIndexOutput with the name isOpen. I don't do anything with this variable, I just change the close() method from: public void close() throws IOException { super.close(); file.close(); } to public void close() throws IOException { super.close(); file.close(); isOpen = false; } and suddenly the problem disappears! Now I change close() to public void close() throws IOException { super.close(); isOpen = false; file.close(); } and the IOException occurs again. Notice that I don't use isOpen anywhere else in the code. So it seems that the problem only occurs if file.close() is the last instruction in close(). I bet this is a JVM bug, maybe the compiler makes some kind of optimization (maybe early freeing up the resources of the method's context that are on the stack while the last instruction of the method is being executed). So I'm not completely sure what the real problem is, but I'm pretty sure it is a JVM bug.
        Hide
        Michael McCandless added a comment -

        Ugh! This bug is clearly a heisenbug.

        OK, I can also reproduce this on Windows when I use the IBM 1.5.0 JRE.
        I can't repro with the Sun 1.5.0_07 JRE.

        When I apply your patch, the IOException goes away.

        Furthermore, my best efforts to get a standalone test to show the
        error have failed. I don't understand what precise tickling is
        required to get the IOException to happen.

        Finally, I found this spooky very recent thread on java-user that
        looks very much like this error (and was never resolved):

        http://www.gossamer-threads.com/lists/lucene/java-user/40357

        I think in this case it was on Sun's JRE. So I tried the test using
        "java -server" but it didn't fail. Sigh.

        My conclusion is: double-closing a RandomAccessFile is dangerous! And
        quite possibly can cause problems for "real" use cases (ie, not just
        when testing under Eclipse).

        I will commit this patch.

        I made a couple of tiny changes: changed the name to "isOpen" and
        moved up the "isOpen = true" to be right after the file is actually
        opened.

        I also like the change to flush the buffer on finalize (if the file is
        still open).

        Thanks Michael!

        Show
        Michael McCandless added a comment - Ugh! This bug is clearly a heisenbug. OK, I can also reproduce this on Windows when I use the IBM 1.5.0 JRE. I can't repro with the Sun 1.5.0_07 JRE. When I apply your patch, the IOException goes away. Furthermore, my best efforts to get a standalone test to show the error have failed. I don't understand what precise tickling is required to get the IOException to happen. Finally, I found this spooky very recent thread on java-user that looks very much like this error (and was never resolved): http://www.gossamer-threads.com/lists/lucene/java-user/40357 I think in this case it was on Sun's JRE. So I tried the test using "java -server" but it didn't fail. Sigh. My conclusion is: double-closing a RandomAccessFile is dangerous! And quite possibly can cause problems for "real" use cases (ie, not just when testing under Eclipse). I will commit this patch. I made a couple of tiny changes: changed the name to "isOpen" and moved up the "isOpen = true" to be right after the file is actually opened. I also like the change to flush the buffer on finalize (if the file is still open). Thanks Michael!
        Hide
        Michael Busch added a comment -

        Wow that was a tough one!

        Thanks for trying so hard to reproduce it, Mike. And thanks for committing, the small changes you made to my patch sound good to me!

        Show
        Michael Busch added a comment - Wow that was a tough one! Thanks for trying so hard to reproduce it, Mike. And thanks for committing, the small changes you made to my patch sound good to me!
        Hide
        Robert Newson added a comment -

        The close() method in RandomAccessFile is defined not to throw IOException if it's merely closed twice. The bug here is with the IBM JDK and not Lucene.

        This stanza;

        final RandomAccessFile raf = new RandomAccessFile("/tmp/raf", "rw");
        for (int i = 0; i < 1000; i++)

        { raf.close(); }

        should run fine everywhere, according to Javadocs;

        file:///home/rnewson/Documents/jdk-1.5/api/java/io/Closeable.html#close()
        "Closes this stream and releases any system resources associated with it. If the stream is already closed then invoking this method has no effect."

        This behavior was clarified with the introduction of the Closeable interface in 1.5, so perhaps IBM are not to blame for this.

        Show
        Robert Newson added a comment - The close() method in RandomAccessFile is defined not to throw IOException if it's merely closed twice. The bug here is with the IBM JDK and not Lucene. This stanza; final RandomAccessFile raf = new RandomAccessFile("/tmp/raf", "rw"); for (int i = 0; i < 1000; i++) { raf.close(); } should run fine everywhere, according to Javadocs; file:///home/rnewson/Documents/jdk-1.5/api/java/io/Closeable.html#close( ) "Closes this stream and releases any system resources associated with it. If the stream is already closed then invoking this method has no effect." This behavior was clarified with the introduction of the Closeable interface in 1.5, so perhaps IBM are not to blame for this.
        Hide
        Michael McCandless added a comment -

        Closing all issues that were resolved for 2.1.

        Show
        Michael McCandless added a comment - Closing all issues that were resolved for 2.1.

          People

          • Assignee:
            Michael Busch
            Reporter:
            Michael Busch
          • Votes:
            1 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development