Issue Details (XML | Word | Printable)

Key: LUCENE-1453
Type: Bug Bug
Status: Closed Closed
Resolution: Fixed
Priority: Minor Minor
Assignee: Uwe Schindler
Reporter: Mark Miller
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Lucene - Java

When reopen returns a new IndexReader, both IndexReaders may now control the lifecycle of the underlying Directory which is managed by reference counting

Created: 15/Nov/08 09:01 PM   Updated: 10/Jun/09 12:48 PM
Return to search
Component/s: None
Affects Version/s: 2.4
Fix Version/s: 2.4.1, 2.9

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works Failing-testcase-LUCENE-1453.patch 2009-06-07 10:30 PM Uwe Schindler 9 kB
Text File Licensed for inclusion in ASF works LUCENE-1453-fix-TestIndexReader.patch 2009-06-10 12:00 PM Uwe Schindler 3 kB
Text File Licensed for inclusion in ASF works LUCENE-1453-with-FSDir-open.patch 2009-06-08 09:30 PM Uwe Schindler 23 kB
Text File Licensed for inclusion in ASF works LUCENE-1453.patch 2009-06-10 09:39 AM Uwe Schindler 22 kB
Text File Licensed for inclusion in ASF works LUCENE-1453.patch 2009-06-09 09:36 PM Uwe Schindler 21 kB
Text File Licensed for inclusion in ASF works LUCENE-1453.patch 2009-06-08 09:28 PM Uwe Schindler 17 kB
Text File Licensed for inclusion in ASF works LUCENE-1453.patch 2008-11-17 06:23 PM Michael McCandless 6 kB
Text File Licensed for inclusion in ASF works LUCENE-1453.patch 2008-11-17 11:56 AM Michael McCandless 5 kB
Text File Licensed for inclusion in ASF works LUCENE-1453.patch 2008-11-15 09:33 PM Mark Miller 4 kB
Issue Links:
Incorporates
 
Reference
 

Lucene Fields: New
Resolution Date: 10/Jun/09 12:26 PM


 Description  « Hide
Rough summary. Basically, FSDirectory tracks references to FSDirectory and when IndexReader.reopen shares a Directory with a created IndexReader and closeDirectory is true, FSDirectory's ref management will see two decrements for one increment. You can end up getting an AlreadyClosed exception on the Directory when the IndexReader is open.

I have a test I'll put up. A solution seems fairly straightforward (at least in what needs to be accomplished).



 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Mark Miller added a comment - 15/Nov/08 09:33 PM
Test exposing the problem and one possible fix for it.

Mark Miller added a comment - 17/Nov/08 11:31 AM
So this is a bit of an ugly way to do this. Whats the best way? Some kind of clone support on Directory?

Michael McCandless added a comment - 17/Nov/08 11:45 AM
Hmm good catch Mark! With LUCENE-1451 we are moving away from sharing instances of FSDir, which will fix this problem in 3.0. But in the meantime I think this approach is good. Though, maybe conditionalize it on whether closeDirectory is true? We should add a comment explaining why are re-getting the FSDir in that case, also explaining that this logic can be removed in 3.0.

I'll make those changes & commit.


Michael McCandless added a comment - 17/Nov/08 11:56 AM
New patch attached. If this looks OK to you Mark, I'll commit. Thanks!

Mark Miller added a comment - 17/Nov/08 12:40 PM
Looks great to me. The only nitpick thing I'd change (its a left over from my first patch) is dropping mabyeCopy in SegmentReader into the else clause. Don't think its necessary, but I want to OCD it down.

Michael McCandless added a comment - 17/Nov/08 01:51 PM
Hmm actually it's not so simple, because we also need the directory to be cloned if you reopen the segment or get a whole new SegmentReader. Also, the case that LUCENE-1430 fixed (in IndexReader.open) also must be guarded against in reopen. I'll work out a new patch...

Mark Miller added a comment - 17/Nov/08 02:06 PM
Ahh, I see. The newReader variable name should have tipped me off <g> I had it in my mind that there could be more but obviously didn't retain the idea (that test actually only breaks on one of the two originals).

Mark Miller added a comment - 17/Nov/08 04:23 PM
Goes deeper than I would have guessed (not that I know reopen at all). Still have a test failing...

Michael McCandless added a comment - 17/Nov/08 06:23 PM
Attached another iteration...

I moved the logic into DirectoryIndexReader's reopen method, and fixed the case where an exception his hit during reopen (eg if a writer is committing, that's expected) to not incorrectly close the directory.

I also strengthened the test cases to verify we are not over-incRef'ing the FSDir.


Mark Miller added a comment - 18/Nov/08 01:12 AM - edited
Hmm...I've still got a test that fails. I'll post it tonight or in the morning after a triple check.

...trying to check my shoot from the hip modus operandi for a moment <g>...

edit

Ah, was optimizing without changing so getting back the same indexreader. Bad test. Patch holding up good so far...


Mark Miller added a comment - 18/Nov/08 03:20 AM
Okay, I've banged on it and I can't break it. Thats a brilliant way to do it by the way - not nearly as ugly (and is much shorter and actually works, so thats nice).

Michael Busch added a comment - 18/Nov/08 05:06 AM
Patch looks good... I guess this was my bug, so thanks for finding and fixing it Mark and Mike!

Michael McCandless added a comment - 18/Nov/08 09:46 AM
OK thanks for the review guys! I'll commit soon.

Michael McCandless added a comment - 18/Nov/08 09:58 AM
Committed revision 718540. Thanks Mark!

Mark Miller added a comment - 25/Nov/08 11:49 PM - edited
I think this bug should be called out in changes. Disagree? I've seen 2 to 3 people that have seen it, and it seems we should document that its fixed in this version. I'll add it before the next release if nobody disagrees.

Michael McCandless added a comment - 26/Nov/08 10:07 AM
I agree – I just committed an entry to CHANGES.txt.

Michael McCandless added a comment - 19/Feb/09 01:37 AM
Reopening for backport to 2.4.1.

Michael McCandless added a comment - 19/Feb/09 09:54 AM
Committed revision 745797 on 2.4 branch.

Uwe Schindler added a comment - 30/May/09 10:23 PM
Is this patch obsolete, if LUCENE-1658 is correctly resolved and the whole caching from FSDir is removed?

Uwe Schindler added a comment - 31/May/09 07:26 AM - edited
It was too late in the night yesterday: I think this is not really related to LUCENE-1658, but this case showed possibly a problem in my code.
As this issue is only interesting for IndexReaders opened without "Directory" but witch FS path, it was a problem with incomplete removal of the reference counting of FSDirs in an early test.
If the problem occurs again, I will reopen this issue.

Uwe Schindler added a comment - 07/Jun/09 10:27 PM
After some testing, I found out, that this issue is not fixed. The test "TestIndexReaderReopen" fails very often, if all occurences of FSDirectory.getDirectory() are replaced by FSDirectory.open() in IndexReader/IndexWriter and DirectoryReader.
open() returns non refcounting single-use directorys that can be closed only one time. If readers on top of this (using the now-deprecated File/String IndexReader.open()) are reopened the directories are sometimes closed false.
I was hoping, LUCENE-1651 fixes this, but this is not so.

There are two possibilities to fix this:
For both, the first step is to remove the whole closeDir stuff from all IndexReaders and do not close it from within indexReader.
Then there are two solutions:

  • As FSDir.open() only returns subclasses of FSDir that have a no-op close() method, no closing is required. So we can just leave them open
  • Another possibility is to wrap all readers opened by IR.open() with File/String param by an (deprecated, package private) FilterIndexReader that handles closing the directory in close() and reopen() in a proper way. This is much simplier than doing it inside the DirectoryReader reopen stuff. The small overhead in passing through FilterIndexReader only affects people using the deprecated File/String open() methods. Users, directly using FSDir & Co. have no slowdown.

Uwe Schindler added a comment - 07/Jun/09 10:30 PM
Attached is the patch, that replaces all occurences of FSDir.openDirectory() by open(). The test "TestIndexReaderReopen" fails almost every time with an AlreadyClosedException.

Earwin Burrfoot added a comment - 07/Jun/09 10:44 PM

There are two possibilities to fix this:

Vote for "leave them open". Yes, it breaches the contract, but the breach is controlled (and thus harmless) and we get rid of some weird code (=possible point of failure) without introducing new.
There is a way to notice change in DirectoryReader behaviour, but it is too unrealistic:

IndexReader r = IndexReader.open("/path/to/index");
.....
Directory d = r.directory(); // you have to get directory reference as you're not the one who created it
.....
r.close();
.....
d.doSomething(); // and EXPECT this call to fail with exception

Michael McCandless added a comment - 08/Jun/09 11:33 AM
Since we've deprecated all methods that are using FSDirectory.getDirectory under-the-hood, why do we even need to fix this? Ie why replace all these with the new FSDir.open, now, when we're just going to remove them in 3.0 anyway?

Uwe Schindler added a comment - 08/Jun/09 12:21 PM
Because the error sometimes also occurs with the refcounting directories, but more seldom (because of the refcounting helps to keep the directory open, even when it is closed one time too much).
And our problem: we want to really remove this ugly closeDir stuff from IndexReaders, the code is sometimes unreadable and its hard to find out whats going on.

Michael McCandless added a comment - 08/Jun/09 12:48 PM
But the refcounting is also deprecated? And, IndexReader will no longer track closeDir in 3.0, since that's only set to true in the deprecated methods?

Because the error sometimes also occurs with the refcounting directories,

Oh, you mean there is an intermittent failure on the current trunk? (Ie, when using FSDir.getDirectory under the hood).


Uwe Schindler added a comment - 08/Jun/09 09:28 PM
This is the solution using the FilterIndexReader, all tests now pass (with refcounting deprectated dirs as well as FSDir.open-dirs, see next Patch).
The solution consists of two parts:
  • All closeDirectory stuff is removed from DirectoryIndexReader (even the ugly FSDir cloning) and from ReadOnlyDirectoryIndexReader; the code is now simplier to understand. It is now on the status for 3.0, no deprecated helper stuff anymore in these internal classes. So they can be used in 3.0 without modifications.
  • As the DirectoryIndexReader is not closing the directory anymore, the deprectated IndexReader.open methods taking String/File would not work anymore correctly (because they miss to close the dir on closing). To fix this easily, a deprectated private class extends FIlterIndexReader was added, that wraps around the DirectoryIndexReader, when File/String opens are used. This class keeps a refcounter that is incremented on reopen/clone and decremented on doClose(). The last doClose, closes the directory. In 3.0 this class can be removed easily with all File/String open() methods. I could remove this class from IndexReader.java and put in a separate package-private file, if you like.

I would like to have this in 2.9, to get rid of these ugly closeDirectory hacks! All tests pass (I retried TestIndexReaderReopen about hundred times and no variant fails anymore). It also works, when replacing the refcounting FSDir.getDirectory by FSDir.open() calls (see next patch).


Uwe Schindler added a comment - 08/Jun/09 09:30 PM
This is a variant for testing the same with FSDir.open(). As you see, the reopening now also works correctly here and the underlying directory is not closed too often.

This patch is for demonstration only.


Uwe Schindler added a comment - 09/Jun/09 11:50 AM
I forgot to mention:

Oh, you mean there is an intermittent failure on the current trunk? (Ie, when using FSDir.getDirectory under the hood).

Yes, for example, if you clone() and index that owns a directory. In this case the directory was not cloned using the ugly hack currently in trunk (see patches before, which is also removed again by my patch) and was then not correctly refcounted. During reopen(), sometimes DirectoryReader uses clone() to create a new instance, when the readonly flag changed, but the index was not modified. The testcase leads sometimes to exactly this behaviour and fails then.

My patch now also handles clone correctly, so it fixes all these problems by just factoring out the deprecated Directory management to a FilterIndexReader. DirectoryReader just sees the directory and will never try to close it.


Earwin Burrfoot added a comment - 09/Jun/09 12:00 PM
Patch looks fine. I read the last one, LUCENE-1453-with-FSDir-open.patch.

Uwe Schindler added a comment - 09/Jun/09 12:11 PM
Both patches are the same, the second one is just using FSDir.open() instead of FSDir.getDirectory. For committing, I would use getDirectory(), the first patch. FSDir.open() would change behaviour for uses relying on the System property to specify the directory impl.

Michael McCandless added a comment - 09/Jun/09 03:31 PM
OK patch looks good Uwe! This is much cleaner, and it's great to be able to get the scary closeDirectory logic entirely out of DirectoryReader.

Uwe Schindler added a comment - 09/Jun/09 05:30 PM
Only two discussion points (the first one came up during an IRC chat with Earwin):
  • If DirectoryReader.close() throws an Exception, should the directory still be closed for cleanup? This is implemented exactly that way using a try...finally block in DirectoryOwningIndexReader.doClose(). The old DirectoryReader was different, with all its bugs.
  • Should I factor out DirectoryOwningIndexReader (any better name?) into a separate java file to not overload the already very big IndexReader.java? With 3.0, we could simply remove this class file (because already deprecated) together with String/File open() .

Michael McCandless added a comment - 09/Jun/09 05:48 PM

If DirectoryReader.close() throws an Exception, should the directory still be closed for cleanup? This is implemented exactly that way using a try...finally block in DirectoryOwningIndexReader.doClose().

I think it should (be closed in a finally clause).

Should I factor out DirectoryOwningIndexReader (any better name?) into a separate java file to not overload the already very big IndexReader.java?

That sounds good.


Earwin Burrfoot added a comment - 09/Jun/09 06:02 PM

I think it should (be closed in a finally clause).

Then there's the next question of the same sort, but probably belonging in a separate issue. If we close a DR and one of SR throws an exception - should we close the others (currently we don't)? What is the right way, in general, of handling IOExceptions on IR close? Can we retry the close? What does this exception mean?


Michael McCandless added a comment - 09/Jun/09 06:41 PM
Good question Exceptions in close are tricky...

In some places we try hard to keep closing stuff, and then throw the
first exception we hit. Probably that should be our preferred
approach, in general, for readers? (My guess is most apps simply
catch & log the IOException, so, we should try hard to close all that
we can on the first go.)

In other places (SR, DR) we don't.

In general you can retry a close – calling close more than once on
Lucene classes is allowed (or, we aim for that, but likely not all
classes achieve it).

For writers (IndexWriter, and DR with pending changes) we try hard not
to lose state on hitting an exception during close. Meaning, on
hitting an exception in close, either 1) your state already made it
into the index, or 2) if you fix the root cause (free up disk space,
fix permisisions, whatever, which of course hardly any app is going to
be doing) and retry the close, it would succeed, and your index would
be fine and your changes are committed.


Uwe Schindler added a comment - 09/Jun/09 09:36 PM
Here is an updated patch:
  • Factored out the FilterIndexReader
  • Rewrite doClose() in both readers to record the first exception, but still try to close cleanup everything.
  • Some more code cleanup, CHANGES.txt

It is ready to commit. Mike will you do this, or should I assign myself to this issue?


Uwe Schindler added a comment - 09/Jun/09 09:40 PM - edited
Do we need to backport to 2.4.2? It's not so easy, because DirectoryReader does not yet exist there (so you cannot simply apply the patch)...
Without backporting, clone() would not work correctly with shared FSDir IndexReaders.

And: Backporting would hurt performance, because 2.4 does not use per-segment search, so each call from the searcher is passed through the FilterIndexReader...


Earwin Burrfoot added a comment - 09/Jun/09 11:37 PM
Two suggestions:

Factor out RefCount class and use it everywhere through Lucene. I see at least one identical to yours in SegmentReader. Would be easier to replace all these uses with AtomicInteger later.

Looking at the new unsightly loop in doClose(), what if we change all Lucene closeable classes to implement java.io.Closeable and create a static utility method(-s) that receives a bunch of Closeables (an array, iterable, vararg in 1.5) and tries to close them all?
The method should be nullsafe (so you can skip != null checks) and will handle/rethrow exceptions. The most proper way to handle exceptions is probably this - rethrow original exception if it is the only one (be it Runtime or IO), if there's more - gather all exceptions and wrap them into a special IOException subclass that concatenates their messages and keeps them around, so they are inspectable at debug-time or if you implement special treatment for that exception in your code.
This method can be reused in a heap of places later, SR.doClose() comes first to mind.

I can do the latter one in a separate patch to close this issue faster.


Uwe Schindler added a comment - 10/Jun/09 09:39 AM - edited
Hi Earwin,
attached is a patch, that simply reuses SegmentReader.Ref. Factoring it out to o.a.l.util would be harder to do at the moment (some test-cases rely on this class). And SegmentReader seems to be the only class, that uses such a Ref construct. Other classes have their refCounter as Field.
As the Filter is just a deprecated wrapper, that is removed in 3.0, I think reusing SegmentReader.Ref for that is ok.

This patch also contains a new test for clone(), that does the same like the reopen test (checking the refcounts).

Closeable is a Java 1.5 interface only, so this refactoring must wait until 3.0, but the idea is good!


Michael McCandless added a comment - 10/Jun/09 09:47 AM

Mike will you do this, or should I assign myself to this issue?

Go ahead & assign to yourself!


Earwin Burrfoot added a comment - 10/Jun/09 09:48 AM

As the Filter is just a deprecated wrapper, that is removed in 3.0, I think reusing SegmentReader.Ref for that is ok.

Ok. Maybe you are right.

Closeable is a Java 1.5 interface only, so this refactoring must wait until 3.0, but the idea is good!

We can introduce our own Closeable, and replace it with java native in 3.0, thank gods the interface is simple


Uwe Schindler added a comment - 10/Jun/09 10:01 AM
Mike: OK, I commit the latest patch soon!

Earwin:

Closeable is a Java 1.5 interface only, so this refactoring must wait until 3.0, but the idea is good!

We can introduce our own Closeable, and replace it with java native in 3.0, thank gods the interface is simple

I think you should open an issue about that. But the own closeable should be declared as deprecated from the beginning with the note "will be replaced by java.io.Closeable" in 3.0.


Uwe Schindler added a comment - 10/Jun/09 10:11 AM
Committed revision 783280.

2.4 branch is untouched, if backporting is needed (because somebody has problems with reopen/clone), reopen this issue!


Michael McCandless added a comment - 10/Jun/09 10:54 AM
I'm seeing a failure in back-compat tests ("and test-tag -Dtestcase=TestIndexReader"):
[junit] Testcase: testFalseDirectoryAlreadyClosed(org.apache.lucene.index.TestIndexReader):	FAILED
    [junit] did not hit expected exception
    [junit] junit.framework.AssertionFailedError: did not hit expected exception
    [junit] 	at org.apache.lucene.index.TestIndexReader.testFalseDirectoryAlreadyClosed(TestIndexReader.java:1514)

(I assume, but I'm not certain, it's from this fix...).


Uwe Schindler added a comment - 10/Jun/09 11:45 AM - edited
Thanks Mike,
it is from this fix. The test should normally also fail with trunk, but it doesn't because we are using FSDir.open() in the trunk test. This is another test that relies on the refcounting of FSDir.getDirectory.

The problem:
If you do IndexReader.open() on a invalid index and IndexReader.open fails with an Exception, the Directory keeps open (because the wrapper has no chance to close it). I'll fix this and also enable FSDir.getDirectory for this test in trunk.


Uwe Schindler added a comment - 10/Jun/09 12:00 PM
This fixes this special case and the test on trunk to also hit this, I commit soon!

Uwe Schindler added a comment - 10/Jun/09 12:26 PM
Committed revision 783314. Thanks Mike! Next time I will also test-tag, sorry.

I will go through all other tests using FSDir.open() in trunk now and check, if there are more cases, that rely on refcounting. They can be found easily, because they catch AlreadyClosedException.


Michael McCandless added a comment - 10/Jun/09 12:48 PM
OK thanks Uwe!