Lucene - Core
  1. Lucene - Core
  2. LUCENE-781

NPE in MultiReader.isCurrent() and getVersion()

    Details

    • Type: Bug Bug
    • Status: Resolved
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      I'm attaching a fix for the NPE in MultiReader.isCurrent() plus a testcase. For getVersion(), we should throw a better exception that NPE. I will commit unless someone objects or has a better idea.

      1. lucene-781.patch
        32 kB
        Michael Busch
      2. multireader_test.diff
        2 kB
        Daniel Naber
      3. multireader.diff
        4 kB
        Daniel Naber
      4. multireader.diff
        1 kB
        Daniel Naber
      5. multireader_test.diff
        2 kB
        Daniel Naber

        Issue Links

          Activity

          Daniel Naber created issue -
          Daniel Naber made changes -
          Field Original Value New Value
          Attachment multireader_test.diff [ 12349400 ]
          Daniel Naber made changes -
          Attachment multireader.diff [ 12349401 ]
          Hide
          Doron Cohen added a comment -

          I checked - the fix is working and code seems right.

          While we are looking at this, there are a few more IndexReader methods
          which are not implemented by MultiReader.

          These 3 methods seems ok:

          • document(int)
            would work because IndexReader would send to document(int,FieldSelector)
            which is implemented in MultiReader.
          • termDocs(Term),
          • termPositions(Term)
            would both work because IndexReader implementations goes to termDocs() or
            to termPositions(), which both are implemented in MultiReader.

          These 3 methods should probably be fixed:

          • isOptimized()
            would fail - similar to isCurrent()
          • setNorm(int, String, float)
            would fail too, similar reason.
          • directory()
            would not fail, but fall to return the directory of reader[0],
            is this a correct behavior?
            this is because MultiReader() (constructor) calls super with reader[0] -
            again, I am not sure, is this correct? (why allowing to create
            a multi-reader with no readers at all?)
          Show
          Doron Cohen added a comment - I checked - the fix is working and code seems right. While we are looking at this, there are a few more IndexReader methods which are not implemented by MultiReader. These 3 methods seems ok: document(int) would work because IndexReader would send to document(int,FieldSelector) which is implemented in MultiReader. termDocs(Term), termPositions(Term) would both work because IndexReader implementations goes to termDocs() or to termPositions(), which both are implemented in MultiReader. These 3 methods should probably be fixed: isOptimized() would fail - similar to isCurrent() setNorm(int, String, float) would fail too, similar reason. directory() would not fail, but fall to return the directory of reader [0] , is this a correct behavior? this is because MultiReader() (constructor) calls super with reader [0] - again, I am not sure, is this correct? (why allowing to create a multi-reader with no readers at all?)
          Daniel Naber made changes -
          Summary NPe in MultiReader.isCurrent() and getVersion() NPE in MultiReader.isCurrent() and getVersion()
          Hide
          Daniel Naber added a comment -

          Thanks for your feedback. I have committed my patch (but moved the testcase to TestMultiReader instead of TestMultiSearcher) and will try to address the other issues you found in the next few days. Thus I'm not closing this issue yet.

          Show
          Daniel Naber added a comment - Thanks for your feedback. I have committed my patch (but moved the testcase to TestMultiReader instead of TestMultiSearcher) and will try to address the other issues you found in the next few days. Thus I'm not closing this issue yet.
          Hide
          Hoss Man added a comment -

          i haven't looked atthe patch, but i'm a little confused by the issue summary ... for the benefit of people who might encounter this NPE and find this bug when searching, can we clarify under what circumstances MultiReader has this problem.

          after all: a MultiReader is returned by Indexreader.open anytime the index has more then one segment right? ... i can't imagine that no one using a multisegment index has ever tried calling isCurrent() before.

          is this specific to some special use case (or is the reason we're just now noticing the problem because it's a bug recently introduced in the trunk?)

          Show
          Hoss Man added a comment - i haven't looked atthe patch, but i'm a little confused by the issue summary ... for the benefit of people who might encounter this NPE and find this bug when searching, can we clarify under what circumstances MultiReader has this problem. after all: a MultiReader is returned by Indexreader.open anytime the index has more then one segment right? ... i can't imagine that no one using a multisegment index has ever tried calling isCurrent() before. is this specific to some special use case (or is the reason we're just now noticing the problem because it's a bug recently introduced in the trunk?)
          Hide
          Daniel Naber added a comment -

          Hoss, you're right, this breaks MultiReader, I will revert the patch. Funny that the test cases didn't notice that. Maybe because they use such a small amount of documents that they never need a MultiReader? The NPE happens when one constructs the MultiReader with its only public constructor, MultiReader(IndexReader[] subReaders). This construtor is never used in Lucene, not even in the test cases.

          Show
          Daniel Naber added a comment - Hoss, you're right, this breaks MultiReader, I will revert the patch. Funny that the test cases didn't notice that. Maybe because they use such a small amount of documents that they never need a MultiReader? The NPE happens when one constructs the MultiReader with its only public constructor, MultiReader(IndexReader[] subReaders). This construtor is never used in Lucene, not even in the test cases.
          Hide
          Doron Cohen added a comment -

          I thought it would not break MultiReader, just do unnecessary work for that method...?

          Same new test (using that (readers[]) constructor) would fail also in previous versions.

          I think main difference is that for the MultiReader created inside IndexReader, (1) all readers share the same directory, and (2) it maintains a SegmentsInfos read from that single directory.

          Now this is not the case for the other (but still valid ) usage of MultiReader - because there is no single directory (well, not necessarily) and hence no SegmentInfos for the MultiReader.

          So it seems a possible fix would be:

          • define a boolean e.g. isWholeIndex predicate in MultiReader
          • would be true when constructed with a non null dir and a non null segmentInfos
          • base operation upon it:
          • if isWholeIndex call super.isCurrent() otherwise do the (multi) logic in current fix.
          Show
          Doron Cohen added a comment - I thought it would not break MultiReader, just do unnecessary work for that method...? Same new test (using that (readers[]) constructor) would fail also in previous versions. I think main difference is that for the MultiReader created inside IndexReader, (1) all readers share the same directory, and (2) it maintains a SegmentsInfos read from that single directory. Now this is not the case for the other (but still valid ) usage of MultiReader - because there is no single directory (well, not necessarily) and hence no SegmentInfos for the MultiReader. So it seems a possible fix would be: define a boolean e.g. isWholeIndex predicate in MultiReader would be true when constructed with a non null dir and a non null segmentInfos base operation upon it: if isWholeIndex call super.isCurrent() otherwise do the (multi) logic in current fix.
          Hide
          Hoss Man added a comment -

          i wasn't suggesting that the patch was flawed – just trying to clarify what circumstances would cause an "NPE in MultiReader.isCurrent() and getVersion()"

          it sounds like my comment has spawned a seperate issue...

          if there are currently no tests for a multi-directory MultiReader then there certainly should be – if the methods mentioned in this issue all currently throw an Exception on a multi-directory MultiReader we should either: a) define what the meaning of those methods is in that case, and implement them accordingly; or b) make those methods throw UnsupportedOperationException (or somethign similar) in thta case.

          ...either way, i'd still like clarification as to the orriginal point of this issue ... what was the bug? what would trigger the NPE?

          Show
          Hoss Man added a comment - i wasn't suggesting that the patch was flawed – just trying to clarify what circumstances would cause an "NPE in MultiReader.isCurrent() and getVersion()" it sounds like my comment has spawned a seperate issue... if there are currently no tests for a multi-directory MultiReader then there certainly should be – if the methods mentioned in this issue all currently throw an Exception on a multi-directory MultiReader we should either: a) define what the meaning of those methods is in that case, and implement them accordingly; or b) make those methods throw UnsupportedOperationException (or somethign similar) in thta case. ...either way, i'd still like clarification as to the orriginal point of this issue ... what was the bug? what would trigger the NPE?
          Hide
          Doron Cohen added a comment -

          > ...either way, i'd still like clarification as to the orriginal point of this
          > issue ... what was the bug? what would trigger the NPE?

          It is triggers by having two ways to construct a MultiReader:
          (1) as IndexReader does it for the regular (multi segment) index
          MultiReader(Directory directory, SegmentInfos sis, boolean closeDirectory, IndexReader[] subReaders)
          (2) as anyone can use it, for aggregating results from any indexes:
          MultiReader(IndexReader[] subReaders)

          In (1) all readers use the same directory, and there is a single SegnentInfos.
          This is the standard, tested way.

          In (2) there is no single dir and no single SegmentInfos.
          This is the "general", less tested way.
          In this option, dir (of the multiReader) is initialized to that of subReader[0].
          This seems spooky to me.
          Also in this option, SegmentInfos in null.
          It makes sense, since readers can be anything - but this is the cause for the NPE.

          BTW, after (being surprised by) your first comment on this, I checked in 1.9.1 - the test (of case (2)) fails there as well.

          Show
          Doron Cohen added a comment - > ...either way, i'd still like clarification as to the orriginal point of this > issue ... what was the bug? what would trigger the NPE? It is triggers by having two ways to construct a MultiReader: (1) as IndexReader does it for the regular (multi segment) index MultiReader(Directory directory, SegmentInfos sis, boolean closeDirectory, IndexReader[] subReaders) (2) as anyone can use it, for aggregating results from any indexes: MultiReader(IndexReader[] subReaders) In (1) all readers use the same directory, and there is a single SegnentInfos. This is the standard, tested way. In (2) there is no single dir and no single SegmentInfos. This is the "general", less tested way. In this option, dir (of the multiReader) is initialized to that of subReader [0] . This seems spooky to me. Also in this option, SegmentInfos in null. It makes sense, since readers can be anything - but this is the cause for the NPE. BTW, after (being surprised by) your first comment on this, I checked in 1.9.1 - the test (of case (2)) fails there as well.
          Hide
          Hoss Man added a comment -

          so the fundamental issue is two radically different use cases of MultiReader – and these methods really only have meaning when talking about a single directory.

          if getVersion, isCurrent and isOptimized, have never worked with a MultiReader constructed using "new MultiReader(IndexReader[])" then throwing UnsupportedOperationException definitely seems like the best course of action ... the semantics of those methods don't really make sense on a multi-directory index.

          for setNorm we should be able to loop over the sub readers and call setNorm on each right?

          the 50 thousand dollar question is should directory() be modified to throw UnsupportedOperationException even though it doesn't currently throw an NPE ? ... i think it should. I think the MultiReader(IndexReader[]) constructor should allways call super(null) – anyone currently relying on MultiReader.directory() it to return the directory of the "first" IndexReader should be able to easily change their code. if we want to make it really easy we could provide a MultiReader.getSubReader(int n) method.

          Show
          Hoss Man added a comment - so the fundamental issue is two radically different use cases of MultiReader – and these methods really only have meaning when talking about a single directory. if getVersion, isCurrent and isOptimized, have never worked with a MultiReader constructed using "new MultiReader(IndexReader[])" then throwing UnsupportedOperationException definitely seems like the best course of action ... the semantics of those methods don't really make sense on a multi-directory index. for setNorm we should be able to loop over the sub readers and call setNorm on each right? the 50 thousand dollar question is should directory() be modified to throw UnsupportedOperationException even though it doesn't currently throw an NPE ? ... i think it should. I think the MultiReader(IndexReader[]) constructor should allways call super(null) – anyone currently relying on MultiReader.directory() it to return the directory of the "first" IndexReader should be able to easily change their code. if we want to make it really easy we could provide a MultiReader.getSubReader(int n) method.
          Hide
          Doron Cohen added a comment -

          I agree, except for isCurrent() - why not iterating the readers only for case 2? After all it seems like a useful API also in this case.

          Show
          Doron Cohen added a comment - I agree, except for isCurrent() - why not iterating the readers only for case 2? After all it seems like a useful API also in this case.
          Hide
          Hoss Man added a comment -

          let's say we do implement isCurrent for a multi directory MultiReader as a loop over teh sub readers that returns true if all of them return true. If a client calls MultiReader.isCurrent() and gets back "false" ... what do they do with that information?

          That information only seems usefull if they know how the MultiReader was built - if they know it was built from multiple readers, then can allways do that loop themselves. if they don't know how the MultiReader was constructed then can't attempt to reopen it so what's the point of knowing wether it's up to date?

          (argueable the app may just want to provide monitoring info about the low level index: but if that's the case the app should probably get that info at the level where it knows it's open a Reader across multiple directories)

          In general: if it's never worked, then we aren't under any burden to make it work if there isnt' a clear meaning for the value.

          Show
          Hoss Man added a comment - let's say we do implement isCurrent for a multi directory MultiReader as a loop over teh sub readers that returns true if all of them return true. If a client calls MultiReader.isCurrent() and gets back "false" ... what do they do with that information? That information only seems usefull if they know how the MultiReader was built - if they know it was built from multiple readers, then can allways do that loop themselves. if they don't know how the MultiReader was constructed then can't attempt to reopen it so what's the point of knowing wether it's up to date? (argueable the app may just want to provide monitoring info about the low level index: but if that's the case the app should probably get that info at the level where it knows it's open a Reader across multiple directories) In general: if it's never worked, then we aren't under any burden to make it work if there isnt' a clear meaning for the value.
          Hide
          Doron Cohen added a comment -

          One could write an application that groups readers to multiReaders in more than 1 level, i.e. r1,r2,r3 grouped to rr1, r4,r5,r6 grouped to rr2, rr1,rr2 grouped to rrr. If rrr.isCurrent() throws unsupported, the application needs to question recursively.

          I am not aware of such an application, so you could argue this is only theoretic, still it demonstrates a strength of Lucene. Also, here too, as argued above, even if the answer is false (not current), the application would need to apply the same recursive logic to reopen the non-current reader and reconstruct the multi-reader.

          So I agree it is valid to throw unsupported.

          Just that it feels a bit uncomfortable to throw unsupported for existing API of a method with well defined meaning that is quite easy to implement (relying on that anyhow it was never implemented correctly).

          Show
          Doron Cohen added a comment - One could write an application that groups readers to multiReaders in more than 1 level, i.e. r1,r2,r3 grouped to rr1, r4,r5,r6 grouped to rr2, rr1,rr2 grouped to rrr. If rrr.isCurrent() throws unsupported, the application needs to question recursively. I am not aware of such an application, so you could argue this is only theoretic, still it demonstrates a strength of Lucene. Also, here too, as argued above, even if the answer is false (not current), the application would need to apply the same recursive logic to reopen the non-current reader and reconstruct the multi-reader. So I agree it is valid to throw unsupported. Just that it feels a bit uncomfortable to throw unsupported for existing API of a method with well defined meaning that is quite easy to implement (relying on that anyhow it was never implemented correctly).
          Hide
          Hoss Man added a comment -

          Lemme put it this way: I'd rather write a new IndexReaderUtils class, with a static isMultiReaderCurrent(MultiReader) method that uses instanceOf to recursively walk all fo the sub indexes then to make MultiReader.isCurrent() do that ... because then people using hte method are clear about what the value of that boolean means.

          > Just that it feels a bit uncomfortable to throw unsupported for existing API of a method with
          > well defined meaning that is quite easy to implement (relying on that anyhow it was never
          > implemented correctly).

          I agree, it feels dirty ... but it feels safer too.

          i certainly won't obejct if someone commits a recursive method for isCurrent – it's just not my prefrence. I would object to a recursive isOptimized ... that one really doens't make sense at all for a multi-directory MultiReader ... in theory it should allways return false since by definition the index is not a single segment, but if you do that, so code could try to optimize it.

          Show
          Hoss Man added a comment - Lemme put it this way: I'd rather write a new IndexReaderUtils class, with a static isMultiReaderCurrent(MultiReader) method that uses instanceOf to recursively walk all fo the sub indexes then to make MultiReader.isCurrent() do that ... because then people using hte method are clear about what the value of that boolean means. > Just that it feels a bit uncomfortable to throw unsupported for existing API of a method with > well defined meaning that is quite easy to implement (relying on that anyhow it was never > implemented correctly). I agree, it feels dirty ... but it feels safer too. i certainly won't obejct if someone commits a recursive method for isCurrent – it's just not my prefrence. I would object to a recursive isOptimized ... that one really doens't make sense at all for a multi-directory MultiReader ... in theory it should allways return false since by definition the index is not a single segment, but if you do that, so code could try to optimize it.
          Hide
          Daniel Naber added a comment -

          updated patch

          Show
          Daniel Naber added a comment - updated patch
          Daniel Naber made changes -
          Attachment multireader.diff [ 12349858 ]
          Hide
          Daniel Naber added a comment -

          updated patch

          Show
          Daniel Naber added a comment - updated patch
          Daniel Naber made changes -
          Attachment multireader_test.diff [ 12349859 ]
          Hide
          Daniel Naber added a comment -

          I've attached an updated patch that now throws an exception for isCurrent() and that does the same for the other methods which Doron noted not to work with the public constructor either. Except doSetNorm() which seems to work okay without any patch.

          Show
          Daniel Naber added a comment - I've attached an updated patch that now throws an exception for isCurrent() and that does the same for the other methods which Doron noted not to work with the public constructor either. Except doSetNorm() which seems to work okay without any patch.
          Hide
          Hoss Man added a comment -

          comments based on cursory read of latest patch(es)...

          1) I still think the IndexReader[] constructor should allways call super(null) ... the current behavior could mask future problems if/when new methods get added to IndexReader.
          2) what about MultiReader.directory() ? ... shoulnd't that throw Unsupported if false == hasSegmentInfos ?

          #2 is a good example of why i believe in #1 ...

          Show
          Hoss Man added a comment - comments based on cursory read of latest patch(es)... 1) I still think the IndexReader[] constructor should allways call super(null) ... the current behavior could mask future problems if/when new methods get added to IndexReader. 2) what about MultiReader.directory() ? ... shoulnd't that throw Unsupported if false == hasSegmentInfos ? #2 is a good example of why i believe in #1 ...
          Hoss Man made changes -
          Link This issue is related to LUCENE-832 [ LUCENE-832 ]
          Michael Busch made changes -
          Assignee Michael Busch [ michaelbusch ]
          Hide
          Michael Busch added a comment -

          I think the cleanest solution here is it to separate MultiReader into two
          classes: MultiSegmentReader (package-protected) and MultiReader
          (public) that extends MultiSegmentReader.
          This would also help to implement LUCENE-743 cleaner.

          I'll attach a patch here soon.

          Show
          Michael Busch added a comment - I think the cleanest solution here is it to separate MultiReader into two classes: MultiSegmentReader (package-protected) and MultiReader (public) that extends MultiSegmentReader. This would also help to implement LUCENE-743 cleaner. I'll attach a patch here soon.
          Hide
          Michael Busch added a comment -

          This patch:

          • Adds the new class MultiSegmentReader which contains almost
            all code from MultiReader, except the public constructor.
          • Makes MultiTermEnum, MultiTermDocs and MultiTermPositions
            inner, static classes of MultiSegmentReader.
          • Adds the method isCurrent() to MultiReader, which recursively
            checks if all subreaders are up to date.
          • MultiReader now throws UnsupportedOperationException when
            isOptimized() or getVersion() is called.
          • Enables the isCurrent() test in TestMultiReader that was
            disabled due to this issue.

          All tests pass.

          Show
          Michael Busch added a comment - This patch: Adds the new class MultiSegmentReader which contains almost all code from MultiReader, except the public constructor. Makes MultiTermEnum, MultiTermDocs and MultiTermPositions inner, static classes of MultiSegmentReader. Adds the method isCurrent() to MultiReader, which recursively checks if all subreaders are up to date. MultiReader now throws UnsupportedOperationException when isOptimized() or getVersion() is called. Enables the isCurrent() test in TestMultiReader that was disabled due to this issue. All tests pass.
          Michael Busch made changes -
          Attachment lucene-781.patch [ 12362394 ]
          Hide
          Doug Cutting added a comment -

          > MultiSegmentReader (package-protected) and MultiReader (public) that extends MultiSegmentReader

          Hmm. I've never much liked having a non-public class as a base class for a public class. But I can't think of a good reason, except that it makes the javadoc a bit odd, since the non-public class is named there, when normally everything shown in javadoc is public.

          Show
          Doug Cutting added a comment - > MultiSegmentReader (package-protected) and MultiReader (public) that extends MultiSegmentReader Hmm. I've never much liked having a non-public class as a base class for a public class. But I can't think of a good reason, except that it makes the javadoc a bit odd, since the non-public class is named there, when normally everything shown in javadoc is public.
          Hide
          Michael Busch added a comment -

          > except that it makes the javadoc a bit odd, since the non-public class
          > is named there, when normally everything shown in javadoc is public.

          Is it?? I looked into the javadocs built with this patch and I can't see
          the name MultiSegmentReader. It looks like before, as if MultiReader
          would still extend IndexReader:
          http://people.apache.org/~buschmi/lucene-781/api/org/apache/lucene/index/MultiReader.html

          Show
          Michael Busch added a comment - > except that it makes the javadoc a bit odd, since the non-public class > is named there, when normally everything shown in javadoc is public. Is it?? I looked into the javadocs built with this patch and I can't see the name MultiSegmentReader. It looks like before, as if MultiReader would still extend IndexReader: http://people.apache.org/~buschmi/lucene-781/api/org/apache/lucene/index/MultiReader.html
          Hide
          Michael Busch added a comment -

          If there are no objections against separating MultiReader into two classes
          I would like to commit my patch in a day or two. I think the javadocs look
          fine, the class MultiSegmentReader does not appear there.

          One thing I'm not sure about: Should MultiReader throw an
          UnsupportedOperationException or simply return false when isOptimized() is
          called?

          Show
          Michael Busch added a comment - If there are no objections against separating MultiReader into two classes I would like to commit my patch in a day or two. I think the javadocs look fine, the class MultiSegmentReader does not appear there. One thing I'm not sure about: Should MultiReader throw an UnsupportedOperationException or simply return false when isOptimized() is called?
          Hide
          Yonik Seeley added a comment -

          Returning false for isOptimized() seems fine.

          Show
          Yonik Seeley added a comment - Returning false for isOptimized() seems fine.
          Hide
          Doug Cutting added a comment -

          > I looked into the javadocs built with this patch and I can't see the name MultiSegmentReader.

          Great! They've improved javadoc since I last tried that. I remove my reservations. +1

          Show
          Doug Cutting added a comment - > I looked into the javadocs built with this patch and I can't see the name MultiSegmentReader. Great! They've improved javadoc since I last tried that. I remove my reservations. +1
          Hide
          Michael Busch added a comment -

          Committed.

          MultiSegmentReader.isOptimized() now always returns false.

          Show
          Michael Busch added a comment - Committed. MultiSegmentReader.isOptimized() now always returns false.
          Michael Busch made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Mark Thomas made changes -
          Workflow jira [ 12394810 ] Default workflow, editable Closed status [ 12562314 ]
          Mark Thomas made changes -
          Workflow Default workflow, editable Closed status [ 12562314 ] jira [ 12583314 ]

            People

            • Assignee:
              Michael Busch
              Reporter:
              Daniel Naber
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development