Lucene - Core
  1. Lucene - Core
  2. LUCENE-2811

SegmentInfo should explicitly track whether that segment wrote term vectors

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Today SegmentInfo doesn't know if it has vectors, which means its files() method must check if the files exist.

      This leads to subtle bugs, because Si.files() caches the files but then we fail to invalidate that later when the term vectors files are created.

      It also leads to sloppy code, eg TermVectorsReader "gracefully" handles being opened when the files do not exist. I don't like that; it should only be opened if they exist.

      This also fixes these intermittent failures we've been seeing:

      junit.framework.AssertionFailedError: IndexFileDeleter doesn't know about file _1e.tvx
             at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:979)
             at org.apache.lucene.util.LuceneTestCase$LuceneTestCaseRunner.runChild(LuceneTestCase.java:917)
             at org.apache.lucene.index.IndexWriter.filesExist(IndexWriter.java:3633)
             at org.apache.lucene.index.IndexWriter.startCommit(IndexWriter.java:3699)
             at org.apache.lucene.index.IndexWriter.prepareCommit(IndexWriter.java:2407)
             at org.apache.lucene.index.IndexWriter.commitInternal(IndexWriter.java:2478)
             at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2460)
             at org.apache.lucene.index.IndexWriter.commit(IndexWriter.java:2444)
             at org.apache.lucene.index.TestIndexWriterExceptions.testRandomExceptionsThreads(TestIndexWriterExceptions.java:213)
      
      1. LUCENE-2811.patch
        30 kB
        Michael McCandless

        Activity

        Hide
        Michael McCandless added a comment -

        Patch.

        Show
        Michael McCandless added a comment - Patch.
        Hide
        Michael McCandless added a comment -

        Patch also has fix for intermittent failure I hit in TestAddIndexes testAddIndexesWithRollback.

        Show
        Michael McCandless added a comment - Patch also has fix for intermittent failure I hit in TestAddIndexes testAddIndexesWithRollback.
        Hide
        Simon Willnauer added a comment -

        Mike good that you figured it out - other than that I think this part gets messier and messier each time we change something. Your patch is a good indicator that we need to push stuff into codecs and let codecs decide if a feature is present in a segment. BW code should be handled in PreFlexCodec and new stuff like hasVector should be something a codec holds or rather segmentCodecs encodes really. Accessing the filename extensions outside a codec seem to be very odd (I know TV and Stored fields are not yet exposed - just sayin)

        Also all the CFS and Compound Doc Store stuff should be pushed to codecs.

        I looked at the patch and it looks good to me though except of the one System.out.println:

            System.out.println("SI READ 2");
        

        simon

        Show
        Simon Willnauer added a comment - Mike good that you figured it out - other than that I think this part gets messier and messier each time we change something. Your patch is a good indicator that we need to push stuff into codecs and let codecs decide if a feature is present in a segment. BW code should be handled in PreFlexCodec and new stuff like hasVector should be something a codec holds or rather segmentCodecs encodes really. Accessing the filename extensions outside a codec seem to be very odd (I know TV and Stored fields are not yet exposed - just sayin) Also all the CFS and Compound Doc Store stuff should be pushed to codecs. I looked at the patch and it looks good to me though except of the one System.out.println: System .out.println( "SI READ 2" ); simon
        Hide
        Michael McCandless added a comment -

        Accessing the filename extensions outside a codec seem to be very odd (I know TV and Stored fields are not yet exposed - just sayin)

        I agree – we gotta get this stuff under codec control! No core code should be operating on file names.

        I looked at the patch and it looks good to me though except of the one System.out.println:

        Woops, thanks, I'll remove the sop!

        Show
        Michael McCandless added a comment - Accessing the filename extensions outside a codec seem to be very odd (I know TV and Stored fields are not yet exposed - just sayin) I agree – we gotta get this stuff under codec control! No core code should be operating on file names. I looked at the patch and it looks good to me though except of the one System.out.println: Woops, thanks, I'll remove the sop!
        Hide
        Michael McCandless added a comment -

        Committed to trunk...

        I'll let this age a bit before backporting to 3.x.

        Show
        Michael McCandless added a comment - Committed to trunk... I'll let this age a bit before backporting to 3.x.
        Hide
        Earwin Burrfoot added a comment -

        I think SegmentInfo.hasVectors should be a boolean.

        If this is an old index, we can check the file presence in SegmentInfo constructor, set it properly, and on next write index is silently upgraded.

        Show
        Earwin Burrfoot added a comment - I think SegmentInfo.hasVectors should be a boolean. If this is an old index, we can check the file presence in SegmentInfo constructor, set it properly, and on next write index is silently upgraded.
        Hide
        Michael McCandless added a comment -

        Good idea, will do...

        Show
        Michael McCandless added a comment - Good idea, will do...
        Hide
        Earwin Burrfoot added a comment -

        From IRC:
        SegmentMerger.hasVectors carries no new information compared to OneMerge.hasVectors, and can be dropped.
        OneMerge.hasVectors is initialized just near OneMerge.info, and is later used to set OneMerge.info.hasVectors, might as well do that from the get go and drop OM.hV.

        Show
        Earwin Burrfoot added a comment - From IRC: SegmentMerger.hasVectors carries no new information compared to OneMerge.hasVectors, and can be dropped. OneMerge.hasVectors is initialized just near OneMerge.info, and is later used to set OneMerge.info.hasVectors, might as well do that from the get go and drop OM.hV.
        Hide
        Grant Ingersoll added a comment -

        Bulk close for 3.1

        Show
        Grant Ingersoll added a comment - Bulk close for 3.1

          People

          • Assignee:
            Michael McCandless
            Reporter:
            Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development