Lucene - Core / LUCENE-3403

Term vectors missing after addIndexes + optimize

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Blocker
    • Resolution: Fixed
    • Affects Version/s: 3.3
    • Fix Version/s: 3.4, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      I encountered a problem with addIndexes where term vectors disappeared following optimize(). I wrote a simple test case which demonstrates the problem. The bug appears with both addIndexes() versions, but does not appear if addDocument is called twice, committing changes in between.

      I think I tracked the problem down to IndexWriter.mergeMiddle(): it sets the term vectors flag (setHasVectors) before merger.merge() is called. In the addDocument case, merger.fieldInfos is already populated, while in the addIndexes case it is empty, hence fieldInfos.hasVectors returns false.

      I will post a patch shortly.
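
      The ordering problem described above can be illustrated with a small toy model. This is only a sketch under stated assumptions: Segment, Merger, and both helper methods are hypothetical stand-ins invented for illustration, not Lucene classes, and the model covers only the addIndexes case, where the merger's field infos start out empty.

```java
import java.util.Arrays;
import java.util.List;

/** Toy model of the ordering problem; none of these types are Lucene classes. */
public class MergeOrderSketch {

    /** Hypothetical stand-in for a segment: records whether any field stores term vectors. */
    public static final class Segment {
        final boolean hasVectors;
        public Segment(boolean hasVectors) { this.hasVectors = hasVectors; }
    }

    /** Hypothetical stand-in for the merger: its field infos are filled in only by merge(). */
    public static final class Merger {
        private final List<Segment> sources;
        private boolean fieldInfosHasVectors = false; // "empty" until merge() runs

        public Merger(List<Segment> sources) { this.sources = sources; }

        public void merge() {
            for (Segment s : sources) {
                fieldInfosHasVectors |= s.hasVectors;
            }
        }

        public boolean hasVectors() { return fieldInfosHasVectors; }
    }

    /** Problem order: the flag is copied before merge() has populated the field infos. */
    public static boolean mergeReadingFlagFirst(List<Segment> sources) {
        Merger merger = new Merger(sources);
        boolean mergedHasVectors = merger.hasVectors(); // still false at this point
        merger.merge();
        return mergedHasVectors;
    }

    /** Corrected order: merge first, then copy the flag. */
    public static boolean mergeReadingFlagLast(List<Segment> sources) {
        Merger merger = new Merger(sources);
        merger.merge();
        return merger.hasVectors();
    }

    public static void main(String[] args) {
        List<Segment> sources = Arrays.asList(new Segment(true), new Segment(false));
        System.out.println("flag read before merge: " + mergeReadingFlagFirst(sources));
        System.out.println("flag read after merge:  " + mergeReadingFlagLast(sources));
    }
}
```

      In this model, reading the flag before merge() reports no vectors even though one source segment has them, while reading it afterwards reports them correctly.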

        Activity

        Shai Erera added a comment -

        Committed revision 1162300 (3x).
        Committed revision 1162301 (trunk – tests only).

        Michael McCandless added a comment -

        Phew nice catch Shai!

        Simon Willnauer added a comment -

        > You're right, it does not happen on trunk. I still want to commit the test cases to trunk too, so that we've got that covered there as well. Therefore I think I should keep the 4.0 fix version?

        Don't get me wrong, I was just double-checking because 4.0 was not in the affected versions. I don't want to miss such a trap.

        > The problem is that SegmentMerger receives its FieldInfos from DocumentsWriter, and it knows whether to set hasVector according to what it receives. When you addDoc, DW has FieldInfos, but when you only addIndexes, DW doesn't.

        Maybe we should adopt what trunk does, checking all the FIs to see if one of them stores vectors, unless your FIs are read-only?

        > If it's ok, I'll commit the fix to 3x and the tests-only to trunk.

        +1, tests are great!

        Shai Erera added a comment -

        You're right, it does not happen on trunk. I still want to commit the test cases to trunk too, so that we've got that covered there as well. Therefore I think I should keep the 4.0 fix version?

        The problem is that SegmentMerger receives its FieldInfos from DocumentsWriter, and it knows whether to set hasVector according to what it receives. When you addDoc, DW has FieldInfos, but when you only addIndexes, DW doesn't.

        In fact, the field infos are read only when the IW is opened, so even if I call addIndexes(), commit(), addIndexes(), the field infos would still be missing. A workaround I see for now is addIndexes(), close(), new IW(), then continue with addIndexes() or optimize(). It is ugly, but it will do until we release a new version. I'll try that.

        If it's ok, I'll commit the fix to 3x and the tests-only to trunk.
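
        The close-and-reopen workaround proposed above can be sketched with a toy model. Everything here is an assumption made for illustration: Index, Writer, and their methods are hypothetical and only mimic a writer that reads its field infos once, at open time. They are not Lucene's API, and the thread does not confirm that the workaround was verified against a real index.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy model of "field infos are read only when the writer is opened"; not Lucene code. */
public class ReopenWorkaroundSketch {

    /** Hypothetical index: one "stores term vectors" flag per segment. */
    public static final class Index {
        final List<Boolean> segmentHasVectors = new ArrayList<>();
        public void addSegment(boolean hasVectors) { segmentHasVectors.add(hasVectors); }
        public boolean anyVectors() { return segmentHasVectors.contains(true); }
    }

    /** Hypothetical writer that snapshots the vectors flag once, in its constructor. */
    public static final class Writer {
        private final Index index;
        private final boolean knownHasVectors; // never refreshed after open

        public Writer(Index index) {
            this.index = index;
            this.knownHasVectors = index.anyVectors();
        }

        /** Appends the other index's segments without refreshing the snapshot. */
        public void addIndexes(Index other) {
            index.segmentHasVectors.addAll(other.segmentHasVectors);
        }

        /** Collapses to a single segment flagged from the (possibly stale) snapshot. */
        public void optimize() {
            index.segmentHasVectors.clear();
            index.segmentHasVectors.add(knownHasVectors);
        }
    }

    /** addIndexes() followed by optimize() on the same writer: the flag is lost. */
    public static boolean withoutReopen(Index source) {
        Index target = new Index();
        Writer writer = new Writer(target);
        writer.addIndexes(source);
        writer.optimize();
        return target.anyVectors();
    }

    /** addIndexes(), then a fresh writer (standing in for close() + new IW()), then optimize(). */
    public static boolean withReopen(Index source) {
        Index target = new Index();
        Writer writer = new Writer(target);
        writer.addIndexes(source);
        Writer reopened = new Writer(target); // snapshot now sees the added segments
        reopened.optimize();
        return target.anyVectors();
    }

    public static void main(String[] args) {
        Index source = new Index();
        source.addSegment(true);
        System.out.println("without reopen: " + withoutReopen(source));
        System.out.println("with reopen:    " + withReopen(source));
    }
}
```

        In this model the second writer picks up the vectors flag because its snapshot is taken after the segments were added, which is the effect the workaround relies on.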

        Simon Willnauer added a comment -

        Good catch, Shai. Does this happen on 4.0 too? I don't think we have setHasVectors there anymore. I am just wondering since you put 4.0 as a fix version.

        Shai Erera added a comment -

        Patch adds 3 test cases to TestTermVectors. If you don't apply the fix to IndexWriter, the tests which call addIndexes fail.

        It also moves the setHasVectors call after merger.merge() in IndexWriter.

        BTW, if you omit the optimize() call and the fix to IW, the tests pass.


          People

          • Assignee: Shai Erera
          • Reporter: Shai Erera
          • Votes: 0
          • Watchers: 2
