Lucene - Core
  1. Lucene - Core
  2. LUCENE-2891

IndexWriterConfig does not allow readerTermsIndexDivisor to be -1, while the latest indicates the terms index should not be loaded

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      While you can pass -1 to IR.open(), and it's documented, you cannot do the same for IndexWriter's readers (b/c IWC blocks it). Need to allow this setting as well as add support for it in our tests, e.g. we should randomly set it to -1. Robert also suggested RandomIW use -1 randomly when it opens readers.

      I'll work on a patch

      1. LUCENE-2891.patch
        5 kB
        Shai Erera
      2. LUCENE-2891.patch
        5 kB
        Shai Erera
      3. LUCENE-2891.patch
        4 kB
        Shai Erera

        Activity

        Hide
        Shai Erera added a comment -

        Patch supports -1 passed to IWC.setReaderTermIndexDivisor and randomly sets it in LuceneTestCase.newIndexWriterConfig() and RandomIndexWriter.

        Many tests fail with exceptions like this:

            [junit] java.lang.IllegalStateException: terms index was not loaded when this reader was createde=29178856,total=88990720
            [junit]     at org.apache.lucene.index.TermInfosReader.ensureIndexIsRead(TermInfosReader.java:309)
            [junit]     at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:215)
            [junit]     at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:208)
            [junit]     at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:57)        Caused an ERROR
            [junit]     at org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:301)
            [junit]     at org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:289)ted
            [junit]     at org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:191):309)
        

        So maybe IWC not supporting -1 is not a bug? I didn't see anywhere, even in IR.open() what are the consequences of not loading the terms index. Thought it just means slower term seeks, but apparently it's not supported?

        If so, I think we should mention IR.open() what does it mean to not load the terms index.

        Show
        Shai Erera added a comment - Patch supports -1 passed to IWC.setReaderTermIndexDivisor and randomly sets it in LuceneTestCase.newIndexWriterConfig() and RandomIndexWriter. Many tests fail with exceptions like this: [junit] java.lang.IllegalStateException: terms index was not loaded when this reader was createde=29178856,total=88990720 [junit] at org.apache.lucene.index.TermInfosReader.ensureIndexIsRead(TermInfosReader.java:309) [junit] at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:215) [junit] at org.apache.lucene.index.TermInfosReader.get(TermInfosReader.java:208) [junit] at org.apache.lucene.index.SegmentTermDocs.seek(SegmentTermDocs.java:57) Caused an ERROR [junit] at org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:301) [junit] at org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:289)ted [junit] at org.apache.lucene.index.BufferedDeletes.applyDeletes(BufferedDeletes.java:191):309) So maybe IWC not supporting -1 is not a bug? I didn't see anywhere, even in IR.open() what are the consequences of not loading the terms index. Thought it just means slower term seeks, but apparently it's not supported? If so, I think we should mention IR.open() what does it mean to not load the terms index.
        Hide
        Michael McCandless added a comment -

        -1 means "don't load terms index because I will never, ever seek-by-Term". As far as I know, it's only segment merging that does this...

        Maybe we shouldn't allow it for IR.open? Though, eg if you want to pass the reader to addIndexes (say), opening with -1 should work...

        Show
        Michael McCandless added a comment - -1 means "don't load terms index because I will never, ever seek-by-Term". As far as I know, it's only segment merging that does this... Maybe we shouldn't allow it for IR.open? Though, eg if you want to pass the reader to addIndexes (say), opening with -1 should work...
        Hide
        Shai Erera added a comment -

        Removed the random setting in tests and added a test TestIndexWriterReader.testNoTermsIndex() which ensure IR.open(IW) and seek-by-term throws an exception when the setting is set to -1 on IWC.

        Show
        Shai Erera added a comment - Removed the random setting in tests and added a test TestIndexWriterReader.testNoTermsIndex() which ensure IR.open(IW) and seek-by-term throws an exception when the setting is set to -1 on IWC.
        Hide
        Robert Muir added a comment -

        -1 means "don't load terms index because I will never, ever seek-by-Term". As far as I know, it's only segment merging that does this...

        but it allows next()'ing through all the terms only?

        I suggested the testing because I had seen some code for -1 but I didn't know how tested it is,
        its a tad confusing in the javadoc which just says "Set this to -1 to skip loading the terms index entirely."

        This makes me think (based on the language "skip loading"), that if you do this everything will still work,
        but seek() will just be brute-force or something aweful...

        Show
        Robert Muir added a comment - -1 means "don't load terms index because I will never, ever seek-by-Term". As far as I know, it's only segment merging that does this... but it allows next()'ing through all the terms only? I suggested the testing because I had seen some code for -1 but I didn't know how tested it is, its a tad confusing in the javadoc which just says "Set this to -1 to skip loading the terms index entirely." This makes me think (based on the language "skip loading"), that if you do this everything will still work, but seek() will just be brute-force or something aweful...
        Hide
        Michael McCandless added a comment -

        but it allows next()'ing through all the terms only?

        Right.

        I suggested the testing because I had seen some code for -1 but I didn't know how tested it is,

        It's tested inside IW – for merging, it opens all SRs w/ -1.

        its a tad confusing in the javadoc which just says "Set this to -1 to skip loading the terms index entirely."

        I agree – I think we should sharpen the jdocs to state that this is only useful in very narrow situations, ie when you are certain you just need to next() through all terms. (And probably mark this value as @lucene.experimental... maybe internal?).

        Show
        Michael McCandless added a comment - but it allows next()'ing through all the terms only? Right. I suggested the testing because I had seen some code for -1 but I didn't know how tested it is, It's tested inside IW – for merging, it opens all SRs w/ -1. its a tad confusing in the javadoc which just says "Set this to -1 to skip loading the terms index entirely." I agree – I think we should sharpen the jdocs to state that this is only useful in very narrow situations, ie when you are certain you just need to next() through all terms. (And probably mark this value as @lucene.experimental... maybe internal?).
        Hide
        Shai Erera added a comment -

        Did you guys review my latest patch, which includes a clarification in the jdocs?

        Show
        Shai Erera added a comment - Did you guys review my latest patch, which includes a clarification in the jdocs?
        Hide
        Shai Erera added a comment -

        Patch fixes javadocs and renames term infos to terms index. I plan to commit this later today or tomorrow morning.

        Show
        Shai Erera added a comment - Patch fixes javadocs and renames term infos to terms index. I plan to commit this later today or tomorrow morning.
        Hide
        Shai Erera added a comment -

        Fixed issue subject (replaced termInfos with terms index)

        Show
        Shai Erera added a comment - Fixed issue subject (replaced termInfos with terms index)
        Hide
        Shai Erera added a comment -

        Committed revision 1064255 (3x).
        Committed revision 1064285 (trunk).

        Merge to trunk was not so easy. testNoTermsIndex had to use different API than in 3x (which is ok), but the it turns out few Codecs don't honor the -1 setting, like PreFlex and SimpleText. Surprisingly, MockRandom also fails. I put all of them into an illegalCodecs Set and if that's what the test draws, the test does not continue and succeeds.

        I'll open a separate issue to check why MockRandomCodec fails the test (i.e., why doesn't it honor the -1 setting).

        Show
        Shai Erera added a comment - Committed revision 1064255 (3x). Committed revision 1064285 (trunk). Merge to trunk was not so easy. testNoTermsIndex had to use different API than in 3x (which is ok), but the it turns out few Codecs don't honor the -1 setting, like PreFlex and SimpleText. Surprisingly, MockRandom also fails. I put all of them into an illegalCodecs Set and if that's what the test draws, the test does not continue and succeeds. I'll open a separate issue to check why MockRandomCodec fails the test (i.e., why doesn't it honor the -1 setting).
        Hide
        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Show
        Grant Ingersoll added a comment - Bulk close for 3.1

          People

          • Assignee:
            Shai Erera
            Reporter:
            Shai Erera
          • Votes:
            0 Vote for this issue
            Watchers:
            1 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development