Issue Details (XML | Word | Printable)

Key: LUCENE-781
Type: Bug Bug
Status: Resolved Resolved
Resolution: Fixed
Priority: Major Major
Assignee: Michael Busch
Reporter: Daniel Naber
Votes: 0
Watchers: 0
Operations

If you were logged in you would be able to see more operations.
Lucene - Java

NPE in MultiReader.isCurrent() and getVersion()

Created: 22/Jan/07 09:37 PM   Updated: 26/Jul/07 10:52 PM
Return to search
Component/s: Index
Affects Version/s: None
Fix Version/s: None

Time Tracking:
Not Specified

File Attachments:
  Size
Text File Licensed for inclusion in ASF works lucene-781.patch 2007-07-24 04:01 AM Michael Busch 32 kB
File Licensed for inclusion in ASF works multireader.diff 2007-01-29 09:36 PM Daniel Naber 4 kB
File Licensed for inclusion in ASF works multireader.diff 2007-01-22 09:39 PM Daniel Naber 1 kB
File Licensed for inclusion in ASF works multireader_test.diff 2007-01-29 09:38 PM Daniel Naber 2 kB
File Licensed for inclusion in ASF works multireader_test.diff 2007-01-22 09:38 PM Daniel Naber 2 kB
Issue Links:
Reference
 

Lucene Fields: New, Patch Available
Resolution Date: 26/Jul/07 10:52 PM


 Description  « Hide
I'm attaching a fix for the NPE in MultiReader.isCurrent() plus a testcase. For getVersion(), we should throw a better exception that NPE. I will commit unless someone objects or has a better idea.

 All   Comments   Work Log   Change History   Subversion Commits      Sort Order: Ascending order - Click to sort in descending order
Doron Cohen added a comment - 23/Jan/07 03:35 AM
I checked - the fix is working and code seems right.

While we are looking at this, there are a few more IndexReader methods
which are not implemented by MultiReader.

These 3 methods seems ok:

  • document(int)
    would work because IndexReader would send to document(int,FieldSelector)
    which is implemented in MultiReader.
  • termDocs(Term),
  • termPositions(Term)
    would both work because IndexReader implementations goes to termDocs() or
    to termPositions(), which both are implemented in MultiReader.

These 3 methods should probably be fixed:

  • isOptimized()
    would fail - similar to isCurrent()
  • setNorm(int, String, float)
    would fail too, similar reason.
  • directory()
    would not fail, but fall to return the directory of reader[0],
    is this a correct behavior?
    this is because MultiReader() (constructor) calls super with reader[0] -
    again, I am not sure, is this correct? (why allowing to create
    a multi-reader with no readers at all?)

Daniel Naber added a comment - 23/Jan/07 10:04 PM
Thanks for your feedback. I have committed my patch (but moved the testcase to TestMultiReader instead of TestMultiSearcher) and will try to address the other issues you found in the next few days. Thus I'm not closing this issue yet.

Hoss Man added a comment - 23/Jan/07 10:11 PM
i haven't looked atthe patch, but i'm a little confused by the issue summary ... for the benefit of people who might encounter this NPE and find this bug when searching, can we clarify under what circumstances MultiReader has this problem.

after all: a MultiReader is returned by Indexreader.open anytime the index has more then one segment right? ... i can't imagine that no one using a multisegment index has ever tried calling isCurrent() before.

is this specific to some special use case (or is the reason we're just now noticing the problem because it's a bug recently introduced in the trunk?)


Daniel Naber added a comment - 23/Jan/07 10:43 PM
Hoss, you're right, this breaks MultiReader, I will revert the patch. Funny that the test cases didn't notice that. Maybe because they use such a small amount of documents that they never need a MultiReader? The NPE happens when one constructs the MultiReader with its only public constructor, MultiReader(IndexReader[] subReaders). This construtor is never used in Lucene, not even in the test cases.

Doron Cohen added a comment - 23/Jan/07 11:04 PM
I thought it would not break MultiReader, just do unnecessary work for that method...?

Same new test (using that (readers[]) constructor) would fail also in previous versions.

I think main difference is that for the MultiReader created inside IndexReader, (1) all readers share the same directory, and (2) it maintains a SegmentsInfos read from that single directory.

Now this is not the case for the other (but still valid ) usage of MultiReader - because there is no single directory (well, not necessarily) and hence no SegmentInfos for the MultiReader.

So it seems a possible fix would be:

  • define a boolean e.g. isWholeIndex predicate in MultiReader
  • would be true when constructed with a non null dir and a non null segmentInfos
  • base operation upon it:
  • if isWholeIndex call super.isCurrent() otherwise do the (multi) logic in current fix.

Hoss Man added a comment - 24/Jan/07 07:13 AM
i wasn't suggesting that the patch was flawed – just trying to clarify what circumstances would cause an "NPE in MultiReader.isCurrent() and getVersion()"

it sounds like my comment has spawned a seperate issue...

if there are currently no tests for a multi-directory MultiReader then there certainly should be – if the methods mentioned in this issue all currently throw an Exception on a multi-directory MultiReader we should either: a) define what the meaning of those methods is in that case, and implement them accordingly; or b) make those methods throw UnsupportedOperationException (or somethign similar) in thta case.

...either way, i'd still like clarification as to the orriginal point of this issue ... what was the bug? what would trigger the NPE?


Doron Cohen added a comment - 24/Jan/07 07:32 AM
> ...either way, i'd still like clarification as to the orriginal point of this
> issue ... what was the bug? what would trigger the NPE?

It is triggers by having two ways to construct a MultiReader:
(1) as IndexReader does it for the regular (multi segment) index
MultiReader(Directory directory, SegmentInfos sis, boolean closeDirectory, IndexReader[] subReaders)
(2) as anyone can use it, for aggregating results from any indexes:
MultiReader(IndexReader[] subReaders)

In (1) all readers use the same directory, and there is a single SegnentInfos.
This is the standard, tested way.

In (2) there is no single dir and no single SegmentInfos.
This is the "general", less tested way.
In this option, dir (of the multiReader) is initialized to that of subReader[0].
This seems spooky to me.
Also in this option, SegmentInfos in null.
It makes sense, since readers can be anything - but this is the cause for the NPE.

BTW, after (being surprised by) your first comment on this, I checked in 1.9.1 - the test (of case (2)) fails there as well.


Hoss Man added a comment - 25/Jan/07 10:08 PM
so the fundamental issue is two radically different use cases of MultiReader – and these methods really only have meaning when talking about a single directory.

if getVersion, isCurrent and isOptimized, have never worked with a MultiReader constructed using "new MultiReader(IndexReader[])" then throwing UnsupportedOperationException definitely seems like the best course of action ... the semantics of those methods don't really make sense on a multi-directory index.

for setNorm we should be able to loop over the sub readers and call setNorm on each right?

the 50 thousand dollar question is should directory() be modified to throw UnsupportedOperationException even though it doesn't currently throw an NPE ? ... i think it should. I think the MultiReader(IndexReader[]) constructor should allways call super(null) – anyone currently relying on MultiReader.directory() it to return the directory of the "first" IndexReader should be able to easily change their code. if we want to make it really easy we could provide a MultiReader.getSubReader(int n) method.


Doron Cohen added a comment - 25/Jan/07 10:36 PM
I agree, except for isCurrent() - why not iterating the readers only for case 2? After all it seems like a useful API also in this case.

Hoss Man added a comment - 25/Jan/07 10:56 PM
let's say we do implement isCurrent for a multi directory MultiReader as a loop over teh sub readers that returns true if all of them return true. If a client calls MultiReader.isCurrent() and gets back "false" ... what do they do with that information?

That information only seems usefull if they know how the MultiReader was built - if they know it was built from multiple readers, then can allways do that loop themselves. if they don't know how the MultiReader was constructed then can't attempt to reopen it so what's the point of knowing wether it's up to date?

(argueable the app may just want to provide monitoring info about the low level index: but if that's the case the app should probably get that info at the level where it knows it's open a Reader across multiple directories)

In general: if it's never worked, then we aren't under any burden to make it work if there isnt' a clear meaning for the value.


Doron Cohen added a comment - 26/Jan/07 11:03 PM
One could write an application that groups readers to multiReaders in more than 1 level, i.e. r1,r2,r3 grouped to rr1, r4,r5,r6 grouped to rr2, rr1,rr2 grouped to rrr. If rrr.isCurrent() throws unsupported, the application needs to question recursively.

I am not aware of such an application, so you could argue this is only theoretic, still it demonstrates a strength of Lucene. Also, here too, as argued above, even if the answer is false (not current), the application would need to apply the same recursive logic to reopen the non-current reader and reconstruct the multi-reader.

So I agree it is valid to throw unsupported.

Just that it feels a bit uncomfortable to throw unsupported for existing API of a method with well defined meaning that is quite easy to implement (relying on that anyhow it was never implemented correctly).


Hoss Man added a comment - 26/Jan/07 11:18 PM
Lemme put it this way: I'd rather write a new IndexReaderUtils class, with a static isMultiReaderCurrent(MultiReader) method that uses instanceOf to recursively walk all fo the sub indexes then to make MultiReader.isCurrent() do that ... because then people using hte method are clear about what the value of that boolean means.

> Just that it feels a bit uncomfortable to throw unsupported for existing API of a method with
> well defined meaning that is quite easy to implement (relying on that anyhow it was never
> implemented correctly).

I agree, it feels dirty ... but it feels safer too.

i certainly won't obejct if someone commits a recursive method for isCurrent – it's just not my prefrence. I would object to a recursive isOptimized ... that one really doens't make sense at all for a multi-directory MultiReader ... in theory it should allways return false since by definition the index is not a single segment, but if you do that, so code could try to optimize it.


Daniel Naber added a comment - 29/Jan/07 09:36 PM
updated patch

Daniel Naber added a comment - 29/Jan/07 09:38 PM
updated patch

Daniel Naber added a comment - 29/Jan/07 09:53 PM
I've attached an updated patch that now throws an exception for isCurrent() and that does the same for the other methods which Doron noted not to work with the public constructor either. Except doSetNorm() which seems to work okay without any patch.

Hoss Man added a comment - 29/Jan/07 11:11 PM
comments based on cursory read of latest patch(es)...

1) I still think the IndexReader[] constructor should allways call super(null) ... the current behavior could mask future problems if/when new methods get added to IndexReader.
2) what about MultiReader.directory() ? ... shoulnd't that throw Unsupported if false == hasSegmentInfos ?

#2 is a good example of why i believe in #1 ...


Michael Busch added a comment - 24/Jul/07 01:32 AM
I think the cleanest solution here is it to separate MultiReader into two
classes: MultiSegmentReader (package-protected) and MultiReader
(public) that extends MultiSegmentReader.
This would also help to implement LUCENE-743 cleaner.

I'll attach a patch here soon.


Michael Busch added a comment - 24/Jul/07 04:01 AM
This patch:
  • Adds the new class MultiSegmentReader which contains almost
    all code from MultiReader, except the public constructor.
  • Makes MultiTermEnum, MultiTermDocs and MultiTermPositions
    inner, static classes of MultiSegmentReader.
  • Adds the method isCurrent() to MultiReader, which recursively
    checks if all subreaders are up to date.
  • MultiReader now throws UnsupportedOperationException when
    isOptimized() or getVersion() is called.
  • Enables the isCurrent() test in TestMultiReader that was
    disabled due to this issue.

All tests pass.


Doug Cutting added a comment - 24/Jul/07 01:12 PM
> MultiSegmentReader (package-protected) and MultiReader (public) that extends MultiSegmentReader

Hmm. I've never much liked having a non-public class as a base class for a public class. But I can't think of a good reason, except that it makes the javadoc a bit odd, since the non-public class is named there, when normally everything shown in javadoc is public.


Michael Busch added a comment - 24/Jul/07 06:27 PM
> except that it makes the javadoc a bit odd, since the non-public class
> is named there, when normally everything shown in javadoc is public.

Is it?? I looked into the javadocs built with this patch and I can't see
the name MultiSegmentReader. It looks like before, as if MultiReader
would still extend IndexReader:
http://people.apache.org/~buschmi/lucene-781/api/org/apache/lucene/index/MultiReader.html


Michael Busch added a comment - 25/Jul/07 06:52 PM
If there are no objections against separating MultiReader into two classes
I would like to commit my patch in a day or two. I think the javadocs look
fine, the class MultiSegmentReader does not appear there.

One thing I'm not sure about: Should MultiReader throw an
UnsupportedOperationException or simply return false when isOptimized() is
called?


Yonik Seeley added a comment - 25/Jul/07 07:11 PM
Returning false for isOptimized() seems fine.

Doug Cutting added a comment - 26/Jul/07 05:33 PM
> I looked into the javadocs built with this patch and I can't see the name MultiSegmentReader.

Great! They've improved javadoc since I last tried that. I remove my reservations. +1


Michael Busch added a comment - 26/Jul/07 10:52 PM
Committed.

MultiSegmentReader.isOptimized() now always returns false.