Lucene - Core
  1. Lucene - Core
  2. LUCENE-2144

InstantiatedIndexReader does not handle #termDocs(null) correct (AllTermDocs)

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 2.9, 2.9.1, 3.0
    • Fix Version/s: 2.9.2, 3.0.1, 4.0-ALPHA
    • Component/s: modules/other
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      This patch contains core changes so someone else needs to commit it.

      Due to the incompatible #termDocs(null) behaviour at least MatchAllDocsQuery, FieldCacheRangeFilter and ValueSourceQuery fails using II since 2.9.

      AllTermDocs now has a superclass, AbstractAllTermDocs that also InstantiatedAllTermDocs extend.

      Also:

      • II-tests made less plausable to pass on future incompatible changes to TermDocs and TermEnum
      • IITermDocs#skipTo and #next mimics the behaviour of document posisioning from SegmentTermDocs#dito when returning false
      • II now uses BitVector rather than sets for deleted documents
      1. LUCENE-2144.txt
        28 kB
        Karl Wettin
      2. LUCENE-2144-30.patch
        27 kB
        Michael McCandless

        Activity

        Hide
        Karl Wettin added a comment -

        BUILD SUCCESSFUL
        Total time: 36 minutes 4 seconds

        Show
        Karl Wettin added a comment - BUILD SUCCESSFUL Total time: 36 minutes 4 seconds
        Hide
        Michael McCandless added a comment -

        Patch looks good; tests pass. I'll commit shortly.

        The core change is to just make it possible to override isDeleted logic in AllTermDocs.

        The flex API makes this easier – you must provided a Bits (new interface), specifying the docs you want to skip, when you request the DocsEnum.

        We should fix this on at least 3.0 as well right?

        Show
        Michael McCandless added a comment - Patch looks good; tests pass. I'll commit shortly. The core change is to just make it possible to override isDeleted logic in AllTermDocs. The flex API makes this easier – you must provided a Bits (new interface), specifying the docs you want to skip, when you request the DocsEnum. We should fix this on at least 3.0 as well right?
        Hide
        Karl Wettin added a comment -

        We should fix this on at least 3.0 as well right?

        Would be great if you had the bandwidth to fix that.

        Show
        Karl Wettin added a comment - We should fix this on at least 3.0 as well right? Would be great if you had the bandwidth to fix that.
        Hide
        Michael McCandless added a comment -

        Hmm – I did the svn merge back to 3.0, but I hit these test failures:

            [junit] Testcase: testLoadIndexReader(org.apache.lucene.store.instantiated.TestIndicesEquals):	FAILED
            [junit] null
            [junit] junit.framework.AssertionFailedError
            [junit] 	at org.apache.lucene.store.instantiated.TestIndicesEquals.testTermDocsSomeMore(TestIndicesEquals.java:226)
            [junit] 	at org.apache.lucene.store.instantiated.TestIndicesEquals.testEquals(TestIndicesEquals.java:363)
            [junit] 	at org.apache.lucene.store.instantiated.TestIndicesEquals.testEqualBehaviour(TestIndicesEquals.java:306)
            [junit] 	at org.apache.lucene.store.instantiated.TestIndicesEquals.testLoadIndexReader(TestIndicesEquals.java:77)
            [junit] 
            [junit] 
            [junit] Testcase: testInstantiatedIndexWriter(org.apache.lucene.store.instantiated.TestIndicesEquals):	FAILED
            [junit] null
            [junit] junit.framework.AssertionFailedError
            [junit] 	at org.apache.lucene.store.instantiated.TestIndicesEquals.testTermDocsSomeMore(TestIndicesEquals.java:226)
            [junit] 	at org.apache.lucene.store.instantiated.TestIndicesEquals.testEquals(TestIndicesEquals.java:363)
            [junit] 	at org.apache.lucene.store.instantiated.TestIndicesEquals.testEqualBehaviour(TestIndicesEquals.java:306)
            [junit] 	at org.apache.lucene.store.instantiated.TestIndicesEquals.testInstantiatedIndexWriter(TestIndicesEquals.java:106)
            [junit] 
        

        Any ideas?

        Show
        Michael McCandless added a comment - Hmm – I did the svn merge back to 3.0, but I hit these test failures: [junit] Testcase: testLoadIndexReader(org.apache.lucene.store.instantiated.TestIndicesEquals): FAILED [junit] null [junit] junit.framework.AssertionFailedError [junit] at org.apache.lucene.store.instantiated.TestIndicesEquals.testTermDocsSomeMore(TestIndicesEquals.java:226) [junit] at org.apache.lucene.store.instantiated.TestIndicesEquals.testEquals(TestIndicesEquals.java:363) [junit] at org.apache.lucene.store.instantiated.TestIndicesEquals.testEqualBehaviour(TestIndicesEquals.java:306) [junit] at org.apache.lucene.store.instantiated.TestIndicesEquals.testLoadIndexReader(TestIndicesEquals.java:77) [junit] [junit] [junit] Testcase: testInstantiatedIndexWriter(org.apache.lucene.store.instantiated.TestIndicesEquals): FAILED [junit] null [junit] junit.framework.AssertionFailedError [junit] at org.apache.lucene.store.instantiated.TestIndicesEquals.testTermDocsSomeMore(TestIndicesEquals.java:226) [junit] at org.apache.lucene.store.instantiated.TestIndicesEquals.testEquals(TestIndicesEquals.java:363) [junit] at org.apache.lucene.store.instantiated.TestIndicesEquals.testEqualBehaviour(TestIndicesEquals.java:306) [junit] at org.apache.lucene.store.instantiated.TestIndicesEquals.testInstantiatedIndexWriter(TestIndicesEquals.java:106) [junit] Any ideas?
        Hide
        Karl Wettin added a comment -

        at org.apache.lucene.store.instantiated.TestIndicesEquals.testTermDocsSomeMore(TestIndicesEquals.java:226)

        I have no idea. How do I merge back locally so I can debug it?

        Show
        Karl Wettin added a comment - at org.apache.lucene.store.instantiated.TestIndicesEquals.testTermDocsSomeMore(TestIndicesEquals.java:226) I have no idea. How do I merge back locally so I can debug it?
        Hide
        Michael McCandless added a comment -

        Attached diffs from my merge back onto 3.0.

        Show
        Michael McCandless added a comment - Attached diffs from my merge back onto 3.0.
        Hide
        Karl Wettin added a comment -

        Committed change to trunk.

        In 3.0 comment out ~line 227 in TestIndicesEquals

        // this is invalid use of the API,
        // but if the response differs then it's an indication that something might have changed.
        // in 2.9 and 3.0 the two TermDocs-implementations returned different values at this point.
        assertEquals("Descripency during invalid use of the TermDocs API, see comments in test code for details.",
        aprioriTermDocs.next(), testTermDocs.next());

        Show
        Karl Wettin added a comment - Committed change to trunk. In 3.0 comment out ~line 227 in TestIndicesEquals // this is invalid use of the API, // but if the response differs then it's an indication that something might have changed. // in 2.9 and 3.0 the two TermDocs-implementations returned different values at this point. assertEquals("Descripency during invalid use of the TermDocs API, see comments in test code for details.", aprioriTermDocs.next(), testTermDocs.next());
        Hide
        Michael McCandless added a comment -

        I think we shouldn't test the invalid case? EG trunk could change [again] sometime, which'd cause a false failure here? Ie, we should just remove the test of the un-seek'd TermDocs.next() calls?

        Show
        Michael McCandless added a comment - I think we shouldn't test the invalid case? EG trunk could change [again] sometime, which'd cause a false failure here? Ie, we should just remove the test of the un-seek'd TermDocs.next() calls?
        Hide
        Karl Wettin added a comment -

        I don't have any strong feelings about this line of code, but let me at least explain it.

        I like the idea that IIFoo behaves the same way a SegementFoo, even during incorrect/undocumented use of the API.

        There are no real use cases for this in the Lucene distribution, there are however effects people might use even though caused by invalid use of the API and not recommened. E.g. a skipTo to a target greater than the greatest document associated with that term will position the enum at the greatest document number for that term. Even though I wouldn't do something like this others might.

        In this case, where an immediate #next() on IR#termDocs() is called, it's might look silly to compare the behaviour of II and Segment as it's such blatantly erroneous use of the API, but even I have been known to come up with some rather strange solution now and then when nobody else is looking.

        One alternative is that #next would produce an InvalidStateException or something instead of just accepting the call, but then there is of course the small extra cost associated with checking if the enum has been seeked yet and #next is a rather commonly used method.

        Show
        Karl Wettin added a comment - I don't have any strong feelings about this line of code, but let me at least explain it. I like the idea that IIFoo behaves the same way a SegementFoo, even during incorrect/undocumented use of the API. There are no real use cases for this in the Lucene distribution, there are however effects people might use even though caused by invalid use of the API and not recommened. E.g. a skipTo to a target greater than the greatest document associated with that term will position the enum at the greatest document number for that term. Even though I wouldn't do something like this others might. In this case, where an immediate #next() on IR#termDocs() is called, it's might look silly to compare the behaviour of II and Segment as it's such blatantly erroneous use of the API, but even I have been known to come up with some rather strange solution now and then when nobody else is looking. One alternative is that #next would produce an InvalidStateException or something instead of just accepting the call, but then there is of course the small extra cost associated with checking if the enum has been seeked yet and #next is a rather commonly used method.
        Hide
        Michael McCandless added a comment -

        Thanks Karl – I'll back port to 3.0 & commit.

        Show
        Michael McCandless added a comment - Thanks Karl – I'll back port to 3.0 & commit.
        Hide
        Michael McCandless added a comment -

        Thanks Karl!

        Show
        Michael McCandless added a comment - Thanks Karl!
        Hide
        Uwe Schindler added a comment -

        merge back also to 2.9.2

        Show
        Uwe Schindler added a comment - merge back also to 2.9.2

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Karl Wettin
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development