|
[
Permlink
| « Hide
]
Michael McCandless added a comment - 10/Nov/06 04:07 PM
This patch looks good to me. It still applies cleanly to the current [Nov 10 2006] svn head, and I think there's very little risk. I think it makes sense to guard against double-closing.
Looks fine to me.
Maybe change this a bit:
That if (file != null) block could be replaced with closeFile() call, I think. Also, what was closing the file when you run this in Eclipse?
Otis Gospodnetic made changes - 10/Nov/06 11:31 PM
The method closeFile() belongs to FSDirectory.FSIndexOutput, so I can't call it in FSDirectory.FSIndexInput.close(). (This is hard to see if you just look at the patch file).
I added the method closeFile() to FSDirectory.FSIndexOutput, because the behaviour of finalize() and close() is slightly different: finalize() simply closes the file, whereas close() calls super.close() first and closes the file then. I didn't want to change this behavior, thus I can't just call close() from finalize(). But now I am actually wondering if this behavior is correct. super.close() triggers a flush of the buffer. So in the current Lucene code, FSDirectory.FSIndexOutput.close() triggers a flush, but FSDirectory.FSIndexOutput.finalize() doesn't. Shouldn't we call flush also inside finalize() surrounded by try/catch?
Michael Busch made changes - 11/Nov/06 12:35 AM
Since the new testcase TestStressIndexing has been added as part of the lockless commits I see this test failing regularly. In ten runs it failed four times with the following exception:
[junit] Testsuite: org.apache.lucene.index.TestStressIndexing It appears to be the same problem for which I opened this issue. After applying the patch I did not see the test failing anymore (I ran it about 20 times...) I attach a new version of this patch. Now FSDirectory.FSIndexOutput.finalize() simply calls close() which triggers a flush of the buffer. I don't see a reason not do that. Anybody does? All unit tests pass with the new patch.
Michael Busch made changes - 26/Nov/06 06:56 AM
Michael Busch made changes - 26/Nov/06 06:57 AM
Hmmm. Michael, how does the exception in this unit test tie into this issue? Ie, I thought this issue was that only finalize would be doing a double-close? I'm confused how the two are connected (it's awesome that your patch fixes this, but I'd like to understand why!).
Mike,
you are right, it is very weird that 1) this problem happens and 2) my patch fixes it. It took me quite a while to figure out what the real problem is but I think I have at least a guess now. The problem again only happens on an IBM JVM. It happens rarely with your unmodified TestStressIndexing. If I change this testcase so that is does not perform the stress test on a RAMDirectory but only on a FSDirectory it happens almost always. This is VERY weird, because I did not change the stress test at all... I just commented the lines // // First in a RAM directory: and suddenly every test fails with the IOException "The handle is invalid". So that makes me think it has something to do with garbage collection and the finalize() methods. Now if I just comment out the finalize() method of FSDirectory.FSIndexOutput the test passes. This is even stranger, because the IOException occurred in FSDirectory.FSIndexInput.close(), not in FSIndexOutput, so in a different class which uses hence a different file descriptor. So I checked how java.io.FileDescriptor is implemented: It simply wraps an int value. I can't see how these int values are computed, because the open() method of RandomAccessFile is native. So I believe the following happens: So it seems to me that the IBM JVM makes file descriptor values available after a file has been closed, whereas Sun waits until the FileDescriptor instance is destroyed. This might be a bug in the JVM, but since this patch is very simple we could just use it to be on the safe side. Do you think this makes sense? Or does anybody have a better idea why commenting out the finalize() method in FSIndexOutput prevents FSIndexInput.close() from throwing the IOException? Michael, which OS are you seeing the exception on? I'm trying to repro on Linux w/ IBM's JVM 1.5.0 with no success.
I'm seeing the problem on Windows XP SP2, IBM JVM 1.5 SR3. I'm running the tests in eclipse.
OK I will try to repro.
In the meantime, I like your theory above! It seems very plausible that the 2nd close (during finalize) could [incorrectly] close what was in fact a newly opened descriptor (in use elsewhere). This also means this bug is more serious that I had thought (I thought it would just throw exceptions up to the GC). One way to be sure this theory is true is to instrument the finalize() to see that indeed it called close for the second time, and, the close succeeded (instead of throwing the original exception you saw). Ie, if this event occurs and then corresponds to the above exception in the TestStressIndexing unit test, then we've got this explained, and, it's quite serious since in production this could in theory result in errant IOExceptions like the one above. Mike,
I tried to add some debug to FSIndexOutput and made another observation which makes this issue even odder! I added a boolean variable to FSIndexOutput with the name isOpen. I don't do anything with this variable, I just change the close() method from: public void close() throws IOException { super.close(); file.close(); } to public void close() throws IOException { super.close(); file.close(); isOpen = false; } and suddenly the problem disappears! Now I change close() to public void close() throws IOException { super.close(); isOpen = false; file.close(); } and the IOException occurs again. Notice that I don't use isOpen anywhere else in the code. So it seems that the problem only occurs if file.close() is the last instruction in close(). I bet this is a JVM bug, maybe the compiler makes some kind of optimization (maybe early freeing up the resources of the method's context that are on the stack while the last instruction of the method is being executed). So I'm not completely sure what the real problem is, but I'm pretty sure it is a JVM bug. Ugh! This bug is clearly a heisenbug.
OK, I can also reproduce this on Windows when I use the IBM 1.5.0 JRE. When I apply your patch, the IOException goes away. Furthermore, my best efforts to get a standalone test to show the Finally, I found this spooky very recent thread on java-user that http://www.gossamer-threads.com/lists/lucene/java-user/40357 I think in this case it was on Sun's JRE. So I tried the test using My conclusion is: double-closing a RandomAccessFile is dangerous! And I will commit this patch. I made a couple of tiny changes: changed the name to "isOpen" and I also like the change to flush the buffer on finalize (if the file is Thanks Michael!
Michael McCandless made changes - 30/Nov/06 12:08 AM
Wow that was a tough one!
Thanks for trying so hard to reproduce it, Mike. And thanks for committing, the small changes you made to my patch sound good to me!
The close() method in RandomAccessFile is defined not to throw IOException if it's merely closed twice. The bug here is with the IBM JDK and not Lucene. This stanza; final RandomAccessFile raf = new RandomAccessFile("/tmp/raf", "rw"); should run fine everywhere, according to Javadocs; file:///home/rnewson/Documents/jdk-1.5/api/java/io/Closeable.html#close( This behavior was clarified with the introduction of the Closeable interface in 1.5, so perhaps IBM are not to blame for this. Closing all issues that were resolved for 2.1.
Michael McCandless made changes - 27/Feb/07 06:10 PM
|
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||