|
Thanks for your feedback. I have committed my patch (but moved the testcase to TestMultiReader instead of TestMultiSearcher) and will try to address the other issues you found in the next few days. Thus I'm not closing this issue yet.
i haven't looked atthe patch, but i'm a little confused by the issue summary ... for the benefit of people who might encounter this NPE and find this bug when searching, can we clarify under what circumstances MultiReader has this problem.
after all: a MultiReader is returned by Indexreader.open anytime the index has more then one segment right? ... i can't imagine that no one using a multisegment index has ever tried calling isCurrent() before. is this specific to some special use case (or is the reason we're just now noticing the problem because it's a bug recently introduced in the trunk?) Hoss, you're right, this breaks MultiReader, I will revert the patch. Funny that the test cases didn't notice that. Maybe because they use such a small amount of documents that they never need a MultiReader? The NPE happens when one constructs the MultiReader with its only public constructor, MultiReader(IndexReader[] subReaders). This construtor is never used in Lucene, not even in the test cases.
I thought it would not break MultiReader, just do unnecessary work for that method...?
Same new test (using that (readers[]) constructor) would fail also in previous versions. I think main difference is that for the MultiReader created inside IndexReader, (1) all readers share the same directory, and (2) it maintains a SegmentsInfos read from that single directory. Now this is not the case for the other (but still valid So it seems a possible fix would be:
i wasn't suggesting that the patch was flawed – just trying to clarify what circumstances would cause an "NPE in MultiReader.isCurrent() and getVersion()"
it sounds like my comment has spawned a seperate issue... if there are currently no tests for a multi-directory MultiReader then there certainly should be – if the methods mentioned in this issue all currently throw an Exception on a multi-directory MultiReader we should either: a) define what the meaning of those methods is in that case, and implement them accordingly; or b) make those methods throw UnsupportedOperationException (or somethign similar) in thta case. ...either way, i'd still like clarification as to the orriginal point of this issue ... what was the bug? what would trigger the NPE? > ...either way, i'd still like clarification as to the orriginal point of this
> issue ... what was the bug? what would trigger the NPE? It is triggers by having two ways to construct a MultiReader: In (1) all readers use the same directory, and there is a single SegnentInfos. In (2) there is no single dir and no single SegmentInfos. BTW, after (being surprised by) your first comment on this, I checked in 1.9.1 - the test (of case (2)) fails there as well. so the fundamental issue is two radically different use cases of MultiReader – and these methods really only have meaning when talking about a single directory.
if getVersion, isCurrent and isOptimized, have never worked with a MultiReader constructed using "new MultiReader(IndexReader[])" then throwing UnsupportedOperationException definitely seems like the best course of action ... the semantics of those methods don't really make sense on a multi-directory index. for setNorm we should be able to loop over the sub readers and call setNorm on each right? the 50 thousand dollar question is should directory() be modified to throw UnsupportedOperationException even though it doesn't currently throw an NPE ? ... i think it should. I think the MultiReader(IndexReader[]) constructor should allways call super(null) – anyone currently relying on MultiReader.directory() it to return the directory of the "first" IndexReader should be able to easily change their code. if we want to make it really easy we could provide a MultiReader.getSubReader(int n) method. I agree, except for isCurrent() - why not iterating the readers only for case 2? After all it seems like a useful API also in this case.
let's say we do implement isCurrent for a multi directory MultiReader as a loop over teh sub readers that returns true if all of them return true. If a client calls MultiReader.isCurrent() and gets back "false" ... what do they do with that information?
That information only seems usefull if they know how the MultiReader was built - if they know it was built from multiple readers, then can allways do that loop themselves. if they don't know how the MultiReader was constructed then can't attempt to reopen it so what's the point of knowing wether it's up to date? (argueable the app may just want to provide monitoring info about the low level index: but if that's the case the app should probably get that info at the level where it knows it's open a Reader across multiple directories) In general: if it's never worked, then we aren't under any burden to make it work if there isnt' a clear meaning for the value. One could write an application that groups readers to multiReaders in more than 1 level, i.e. r1,r2,r3 grouped to rr1, r4,r5,r6 grouped to rr2, rr1,rr2 grouped to rrr. If rrr.isCurrent() throws unsupported, the application needs to question recursively.
I am not aware of such an application, so you could argue this is only theoretic, still it demonstrates a strength of Lucene. Also, here too, as argued above, even if the answer is false (not current), the application would need to apply the same recursive logic to reopen the non-current reader and reconstruct the multi-reader. So I agree it is valid to throw unsupported. Just that it feels a bit uncomfortable to throw unsupported for existing API of a method with well defined meaning that is quite easy to implement (relying on that anyhow it was never implemented correctly). Lemme put it this way: I'd rather write a new IndexReaderUtils class, with a static isMultiReaderCurrent(MultiReader) method that uses instanceOf to recursively walk all fo the sub indexes then to make MultiReader.isCurrent() do that ... because then people using hte method are clear about what the value of that boolean means.
> Just that it feels a bit uncomfortable to throw unsupported for existing API of a method with I agree, it feels dirty ... but it feels safer too. i certainly won't obejct if someone commits a recursive method for isCurrent – it's just not my prefrence. I would object to a recursive isOptimized ... that one really doens't make sense at all for a multi-directory MultiReader ... in theory it should allways return false since by definition the index is not a single segment, but if you do that, so code could try to optimize it. I've attached an updated patch that now throws an exception for isCurrent() and that does the same for the other methods which Doron noted not to work with the public constructor either. Except doSetNorm() which seems to work okay without any patch.
comments based on cursory read of latest patch(es)...
1) I still think the IndexReader[] constructor should allways call super(null) ... the current behavior could mask future problems if/when new methods get added to IndexReader. #2 is a good example of why i believe in #1 ... I think the cleanest solution here is it to separate MultiReader into two
classes: MultiSegmentReader (package-protected) and MultiReader (public) that extends MultiSegmentReader. This would also help to implement I'll attach a patch here soon. This patch:
All tests pass. > MultiSegmentReader (package-protected) and MultiReader (public) that extends MultiSegmentReader
Hmm. I've never much liked having a non-public class as a base class for a public class. But I can't think of a good reason, except that it makes the javadoc a bit odd, since the non-public class is named there, when normally everything shown in javadoc is public. > except that it makes the javadoc a bit odd, since the non-public class
> is named there, when normally everything shown in javadoc is public. Is it?? I looked into the javadocs built with this patch and I can't see If there are no objections against separating MultiReader into two classes
I would like to commit my patch in a day or two. I think the javadocs look fine, the class MultiSegmentReader does not appear there. One thing I'm not sure about: Should MultiReader throw an Returning false for isOptimized() seems fine.
> I looked into the javadocs built with this patch and I can't see the name MultiSegmentReader.
Great! They've improved javadoc since I last tried that. I remove my reservations. +1 Committed.
MultiSegmentReader.isOptimized() now always returns false. |
||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
While we are looking at this, there are a few more IndexReader methods
which are not implemented by MultiReader.
These 3 methods seems ok:
would work because IndexReader would send to document(int,FieldSelector)
which is implemented in MultiReader.
would both work because IndexReader implementations goes to termDocs() or
to termPositions(), which both are implemented in MultiReader.
These 3 methods should probably be fixed:
would fail - similar to isCurrent()
would fail too, similar reason.
would not fail, but fall to return the directory of reader[0],
is this a correct behavior?
this is because MultiReader() (constructor) calls super with reader[0] -
again, I am not sure, is this correct? (why allowing to create
a multi-reader with no readers at all?)