Issue Details (XML | Word | Printable)

Key: LUCENE-669
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Trivial Trivial
Assignee: Michael Busch
Reporter: Michael Busch
Votes: 1
Watchers: 1
Operations

If you were logged in you would be able to see more operations.
Lucene - Java

finalize()-methods of FSDirectory.FSIndexInput and FSDirectory.FSIndexOutput try to close already closed file

Created: 07/Sep/06 07:46 AM   Updated: 27/Feb/07 06:10 PM
Return to search
Component/s: Store
Affects Version/s: None
Fix Version/s: 2.1

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works FSDirectory_close_file2.patch 2006-11-26 06:56 AM Michael Busch 2 kB

Lucene Fields: Patch Available
Resolution Date: 30/Nov/06 12:08 AM


 Description  « Hide
Hi all,

I found a small problem in FSDirectory: The finalize()-methods of FSDirectory.FSIndexInput and FSDirectory.FSIndexOutput try to close the underlying file. This is not a problem unless the file has been closed before by calling the close() method. If it has been closed before, the finalize method throws an IOException saying that the file is already closed. Usually this IOException would go unnoticed, because the GarbageCollector, which calls finalize(), just eats it. However, if I use the Eclipse debugger the execution of my code will always be suspended when this exception is thrown.

Even though this exception probably won't cause problems during normal execution of Lucene, the code becomes cleaner if we apply this small patch. Might this IOException also have a performance impact, if it is thrown very frequently?

I attached the patch which applies cleanly on the current svn HEAD. All testcases pass and I verfied with the Eclipse debugger that the IOException is not longer thrown.



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Michael McCandless added a comment - 10/Nov/06 04:07 PM
This patch looks good to me. It still applies cleanly to the current [Nov 10 2006] svn head, and I think there's very little risk. I think it makes sense to guard against double-closing.

Otis Gospodnetic added a comment - 10/Nov/06 11:31 PM
Looks fine to me.

Maybe change this a bit:
public void close() throws IOException {

  • if (!isClone)
  • file.close();
    + if (!isClone)
    Unknown macro: {+ if (file != null) { + file.close(); + file = null; + }+ }

    }

That if (file != null) block could be replaced with closeFile() call, I think.

Also, what was closing the file when you run this in Eclipse?


Otis Gospodnetic made changes - 10/Nov/06 11:31 PM
Field Original Value New Value
Lucene Fields [Patch Available]
Michael Busch added a comment - 11/Nov/06 12:35 AM
The method closeFile() belongs to FSDirectory.FSIndexOutput, so I can't call it in FSDirectory.FSIndexInput.close(). (This is hard to see if you just look at the patch file).

I added the method closeFile() to FSDirectory.FSIndexOutput, because the behaviour of finalize() and close() is slightly different: finalize() simply closes the file, whereas close() calls super.close() first and closes the file then. I didn't want to change this behavior, thus I can't just call close() from finalize().

But now I am actually wondering if this behavior is correct. super.close() triggers a flush of the buffer. So in the current Lucene code, FSDirectory.FSIndexOutput.close() triggers a flush, but FSDirectory.FSIndexOutput.finalize() doesn't. Shouldn't we call flush also inside finalize() surrounded by try/catch?


Michael Busch made changes - 11/Nov/06 12:35 AM
Assignee Michael Busch [ michaelbusch ]
Michael Busch added a comment - 26/Nov/06 06:56 AM
Since the new testcase TestStressIndexing has been added as part of the lockless commits I see this test failing regularly. In ten runs it failed four times with the following exception:

[junit] Testsuite: org.apache.lucene.index.TestStressIndexing
[junit] Tests run: 1, Failures: 1, Errors: 0, Time elapsed: 33.338 sec
[junit] ------------- Standard Output ---------------
[junit] java.io.IOException: The handle is invalid.
[junit] ------------- ---------------- ---------------
[junit] ------------- Standard Error -----------------
[junit] java.io.IOException: The handle is invalid.
[junit] at java.io.RandomAccessFile.close0(Native Method)
[junit] at java.io.RandomAccessFile.close(RandomAccessFile.java:573)
[junit] at org.apache.lucene.store.FSIndexInput.close(FSDirectory.java:537)
[junit] at org.apache.lucene.index.CompoundFileReader.close(CompoundFileReader.java:111)
[junit] at org.apache.lucene.index.SegmentReader.doClose(SegmentReader.java:252)
[junit] at org.apache.lucene.index.IndexReader.close(IndexReader.java:637)
[junit] at org.apache.lucene.index.MultiReader.doClose(MultiReader.java:235)
[junit] at org.apache.lucene.index.IndexReader.close(IndexReader.java:637)
[junit] at org.apache.lucene.search.IndexSearcher.close(IndexSearcher.java:74)
[junit] at org.apache.lucene.index.TestStressIndexing$SearcherThread.run(TestStressIndexing.java:101)
[junit] ------------- ---------------- ---------------
[junit] Testcase: testStressIndexAndSearching(org.apache.lucene.index.TestStressIndexing): FAILED
[junit] hit unexpected exception in search2
[junit] junit.framework.AssertionFailedError: hit unexpected exception in search2
[junit] at org.apache.lucene.index.TestStressIndexing.runStressTest(TestStressIndexing.java:140)
[junit] at org.apache.lucene.index.TestStressIndexing.testStressIndexAndSearching(TestStressIndexing.java:161)
[junit] Test org.apache.lucene.index.TestStressIndexing FAILED

It appears to be the same problem for which I opened this issue. After applying the patch I did not see the test failing anymore (I ran it about 20 times...)

I attach a new version of this patch. Now FSDirectory.FSIndexOutput.finalize() simply calls close() which triggers a flush of the buffer. I don't see a reason not do that. Anybody does?

All unit tests pass with the new patch.


Michael Busch made changes - 26/Nov/06 06:56 AM
Attachment FSDirectory_close_file2.patch [ 12345670 ]
Michael Busch made changes - 26/Nov/06 06:57 AM
Attachment FSDirectory_close_file_patch.patch [ 12340338 ]
Michael McCandless added a comment - 26/Nov/06 11:40 AM
Hmmm. Michael, how does the exception in this unit test tie into this issue? Ie, I thought this issue was that only finalize would be doing a double-close? I'm confused how the two are connected (it's awesome that your patch fixes this, but I'd like to understand why!).

Michael Busch added a comment - 28/Nov/06 08:25 AM
Mike,

you are right, it is very weird that 1) this problem happens and 2) my patch fixes it. It took me quite a while to figure out what the real problem is but I think I have at least a guess now.

The problem again only happens on an IBM JVM. It happens rarely with your unmodified TestStressIndexing. If I change this testcase so that is does not perform the stress test on a RAMDirectory but only on a FSDirectory it happens almost always. This is VERY weird, because I did not change the stress test at all... I just commented the lines

// // First in a RAM directory:
// Directory directory = new RAMDirectory();
// runStressTest(directory);
// directory.close();

and suddenly every test fails with the IOException "The handle is invalid". So that makes me think it has something to do with garbage collection and the finalize() methods.

Now if I just comment out the finalize() method of FSDirectory.FSIndexOutput the test passes. This is even stranger, because the IOException occurred in FSDirectory.FSIndexInput.close(), not in FSIndexOutput, so in a different class which uses hence a different file descriptor.

So I checked how java.io.FileDescriptor is implemented: It simply wraps an int value. I can't see how these int values are computed, because the open() method of RandomAccessFile is native.

So I believe the following happens:
(1) FSIndexOutput uses a FileDescriptor with value x
(2) FSIndexOutput.close() is called, so the underlying file is being closed
(3) A new FSIndexInput instance is created by a searcher thread. This opens a RandomAccessFile. Because FileDescriptor x is not in use anymore, x is used as the value for the new FileDescriptor.
(4) Now garbage collection kicks in. It removes the old instance of FSIndexOutput for which close() has been called already. So the garbage collector calls finalize() which calls RandomAccessFile.close() again which still uses the descriptor with value x. So this call of close() actually closes the file used by the IndexInput instance created in (3).
(5) FSIndexInput.close() is called and tries to close the file which has been closed already in (4) and thus the IOException occurs.

So it seems to me that the IBM JVM makes file descriptor values available after a file has been closed, whereas Sun waits until the FileDescriptor instance is destroyed. This might be a bug in the JVM, but since this patch is very simple we could just use it to be on the safe side.

Do you think this makes sense? Or does anybody have a better idea why commenting out the finalize() method in FSIndexOutput prevents FSIndexInput.close() from throwing the IOException?


Michael McCandless added a comment - 28/Nov/06 05:38 PM
Michael, which OS are you seeing the exception on? I'm trying to repro on Linux w/ IBM's JVM 1.5.0 with no success.

Michael Busch added a comment - 28/Nov/06 07:00 PM
I'm seeing the problem on Windows XP SP2, IBM JVM 1.5 SR3. I'm running the tests in eclipse.

Michael McCandless added a comment - 28/Nov/06 08:11 PM
OK I will try to repro.

In the meantime, I like your theory above! It seems very plausible that the 2nd close (during finalize) could [incorrectly] close what was in fact a newly opened descriptor (in use elsewhere). This also means this bug is more serious that I had thought (I thought it would just throw exceptions up to the GC).

One way to be sure this theory is true is to instrument the finalize() to see that indeed it called close for the second time, and, the close succeeded (instead of throwing the original exception you saw). Ie, if this event occurs and then corresponds to the above exception in the TestStressIndexing unit test, then we've got this explained, and, it's quite serious since in production this could in theory result in errant IOExceptions like the one above.


Michael Busch added a comment - 29/Nov/06 06:09 AM
Mike,

I tried to add some debug to FSIndexOutput and made another observation which makes this issue even odder! I added a boolean variable to FSIndexOutput with the name isOpen. I don't do anything with this variable, I just change the close() method from:

public void close() throws IOException { super.close(); file.close(); }

to

public void close() throws IOException { super.close(); file.close(); isOpen = false; }

and suddenly the problem disappears! Now I change close() to

public void close() throws IOException { super.close(); isOpen = false; file.close(); }

and the IOException occurs again. Notice that I don't use isOpen anywhere else in the code. So it seems that the problem only occurs if file.close() is the last instruction in close(). I bet this is a JVM bug, maybe the compiler makes some kind of optimization (maybe early freeing up the resources of the method's context that are on the stack while the last instruction of the method is being executed). So I'm not completely sure what the real problem is, but I'm pretty sure it is a JVM bug.


Michael McCandless added a comment - 29/Nov/06 11:28 PM
Ugh! This bug is clearly a heisenbug.

OK, I can also reproduce this on Windows when I use the IBM 1.5.0 JRE.
I can't repro with the Sun 1.5.0_07 JRE.

When I apply your patch, the IOException goes away.

Furthermore, my best efforts to get a standalone test to show the
error have failed. I don't understand what precise tickling is
required to get the IOException to happen.

Finally, I found this spooky very recent thread on java-user that
looks very much like this error (and was never resolved):

http://www.gossamer-threads.com/lists/lucene/java-user/40357

I think in this case it was on Sun's JRE. So I tried the test using
"java -server" but it didn't fail. Sigh.

My conclusion is: double-closing a RandomAccessFile is dangerous! And
quite possibly can cause problems for "real" use cases (ie, not just
when testing under Eclipse).

I will commit this patch.

I made a couple of tiny changes: changed the name to "isOpen" and
moved up the "isOpen = true" to be right after the file is actually
opened.

I also like the change to flush the buffer on finalize (if the file is
still open).

Thanks Michael!


Repository Revision Date User Message
ASF #480785 Thu Nov 30 00:07:46 UTC 2006 mikemccand LUCENE-669: don't double-close RandomAccessFile in finalize
Files Changed
MODIFY /lucene/java/trunk/CHANGES.txt
MODIFY /lucene/java/trunk/src/java/org/apache/lucene/store/FSDirectory.java

Michael McCandless made changes - 30/Nov/06 12:08 AM
Resolution Fixed [ 1 ]
Fix Version/s 2.1 [ 12310854 ]
Status Open [ 1 ] Resolved [ 5 ]
Michael Busch added a comment - 30/Nov/06 12:46 AM
Wow that was a tough one!

Thanks for trying so hard to reproduce it, Mike. And thanks for committing, the small changes you made to my patch sound good to me!


Repository Revision Date User Message
ASF #485686 Mon Dec 11 15:01:07 UTC 2006 mikemccand LUCENE-669: I forgot attribution to Michael Busch for this (thanks Michael!)
Files Changed
MODIFY /lucene/java/trunk/CHANGES.txt

Robert Newson added a comment - 06/Feb/07 04:23 PM

The close() method in RandomAccessFile is defined not to throw IOException if it's merely closed twice. The bug here is with the IBM JDK and not Lucene.

This stanza;

final RandomAccessFile raf = new RandomAccessFile("/tmp/raf", "rw");
for (int i = 0; i < 1000; i++) { raf.close(); }

should run fine everywhere, according to Javadocs;

file:///home/rnewson/Documents/jdk-1.5/api/java/io/Closeable.html#close()
"Closes this stream and releases any system resources associated with it. If the stream is already closed then invoking this method has no effect."

This behavior was clarified with the introduction of the Closeable interface in 1.5, so perhaps IBM are not to blame for this.


Michael McCandless added a comment - 27/Feb/07 06:10 PM
Closing all issues that were resolved for 2.1.

Michael McCandless made changes - 27/Feb/07 06:10 PM
Status Resolved [ 5 ] Closed [ 6 ]