Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-2811

SegmentInfo should explicitly track whether that segment wrote term vectors

Details

    • Bug
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 3.1, 4.0-ALPHA
    • core/index
    • None
    • New

    Description

      Today SegmentInfo doesn't know if it has vectors, which means its files() method must check if the files exist.

      This leads to subtle bugs, because Si.files() caches the files but then we fail to invalidate that later when the term vectors files are created.

      It also leads to sloppy code, eg TermVectorsReader "gracefully" handles being opened when the files do not exist. I don't like that; it should only be opened if they exist.

      This also fixes these intermittent failures we've been seeing:

      junit.framework.AssertionFailedError: IndexFileDeleter doesn't know about file _1e.tvx
             at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:979)
             at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:917)
             at org.apache.lucene.index.IndexWriter.filesExist(IndexWriter.java:3633)
             at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3699)
             at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2407)
             at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2478)
             at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2460)
             at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2444)
             at org.apache.lucene.index.TestIndexWriterExceptions.testRandomExceptionsThreads(TestIndexWriterExceptions.java:213)
      

      Attachments

        1. LUCENE-2811.patch
          30 kB
          Michael McCandless

        Activity

          Patch.

          mikemccand Michael McCandless added a comment - Patch.

          Patch also has fix for intermittent failure I hit in TestAddIndexes testAddIndexesWithRollback.

          mikemccand Michael McCandless added a comment - Patch also has fix for intermittent failure I hit in TestAddIndexes testAddIndexesWithRollback.

          Mike good that you figured it out - other than that I think this part gets messier and messier each time we change something. Your patch is a good indicator that we need to push stuff into codecs and let codecs decide if a feature is present in a segment. BW code should be handled in PreFlexCodec and new stuff like hasVector should be something a codec holds or rather segmentCodecs encodes really. Accessing the filename extensions outside a codec seem to be very odd (I know TV and Stored fields are not yet exposed - just sayin)

          Also all the CFS and Compound Doc Store stuff should be pushed to codecs.

          I looked at the patch and it looks good to me though except of the one System.out.println:

              System.out.println("SI READ 2");
          

          simon

          simonw Simon Willnauer added a comment - Mike good that you figured it out - other than that I think this part gets messier and messier each time we change something. Your patch is a good indicator that we need to push stuff into codecs and let codecs decide if a feature is present in a segment. BW code should be handled in PreFlexCodec and new stuff like hasVector should be something a codec holds or rather segmentCodecs encodes really. Accessing the filename extensions outside a codec seem to be very odd (I know TV and Stored fields are not yet exposed - just sayin) Also all the CFS and Compound Doc Store stuff should be pushed to codecs. I looked at the patch and it looks good to me though except of the one System.out.println: System .out.println( "SI READ 2" ); simon

          Accessing the filename extensions outside a codec seem to be very odd (I know TV and Stored fields are not yet exposed - just sayin)

          I agree – we gotta get this stuff under codec control! No core code should be operating on file names.

          I looked at the patch and it looks good to me though except of the one System.out.println:

          Woops, thanks, I'll remove the sop!

          mikemccand Michael McCandless added a comment - Accessing the filename extensions outside a codec seem to be very odd (I know TV and Stored fields are not yet exposed - just sayin) I agree – we gotta get this stuff under codec control! No core code should be operating on file names. I looked at the patch and it looks good to me though except of the one System.out.println: Woops, thanks, I'll remove the sop!

          Committed to trunk...

          I'll let this age a bit before backporting to 3.x.

          mikemccand Michael McCandless added a comment - Committed to trunk... I'll let this age a bit before backporting to 3.x.

          I think SegmentInfo.hasVectors should be a boolean.

          If this is an old index, we can check the file presence in SegmentInfo constructor, set it properly, and on next write index is silently upgraded.

          earwin Earwin Burrfoot added a comment - I think SegmentInfo.hasVectors should be a boolean. If this is an old index, we can check the file presence in SegmentInfo constructor, set it properly, and on next write index is silently upgraded.

          Good idea, will do...

          mikemccand Michael McCandless added a comment - Good idea, will do...

          From IRC:
          SegmentMerger.hasVectors carries no new information compared to OneMerge.hasVectors, and can be dropped.
          OneMerge.hasVectors is initialized just near OneMerge.info, and is later used to set OneMerge.info.hasVectors, might as well do that from the get go and drop OM.hV.

          earwin Earwin Burrfoot added a comment - From IRC: SegmentMerger.hasVectors carries no new information compared to OneMerge.hasVectors, and can be dropped. OneMerge.hasVectors is initialized just near OneMerge.info, and is later used to set OneMerge.info.hasVectors, might as well do that from the get go and drop OM.hV.

          Bulk close for 3.1

          gsingers Grant Ingersoll added a comment - Bulk close for 3.1
          tomoko Tomoko Uchida added a comment -

          This issue was moved to GitHub issue: #3885.

          tomoko Tomoko Uchida added a comment - This issue was moved to GitHub issue: #3885 .

          People

            mikemccand Michael McCandless
            mikemccand Michael McCandless
            Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

              Created:
              Updated:
              Resolved: