Lucene - Core
  1. Lucene - Core
  2. LUCENE-2720

IndexWriter should throw IndexFormatTooOldExc on open, not later during optimize/getReader/close

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Spinoff of LUCENE-2618 and also related to the original issue LUCENE-2523...

      If you open IW on a too-old index, you don't find out until much later that the index is too old.

      This is because IW does not go and open segment readers on all segments. It only does so when it's time to apply deletes, do merges, open an NRT reader, etc.

      This is a serious bug because you can in fact succeed in committing with the new major version of Lucene against your too-old index, which is catastrophic because suddenly the old Lucene version will no longer open the index, and so your index becomes unusable.

      1. LUCENE-2720-trunk.patch
        13 kB
        Shai Erera
      2. LUCENE-2720-trunk.patch
        17 kB
        Shai Erera
      3. LUCENE-2720-3x.patch
        8 kB
        Shai Erera

        Issue Links

          Activity

          Hide
          Michael McCandless added a comment -

          IndexReader should also throw an immediate exc on open.

          Show
          Michael McCandless added a comment - IndexReader should also throw an immediate exc on open.
          Hide
          Shai Erera added a comment -

          And if we can arrange for addIndexes(Directory...) to throw that too, that would complete the cycle I think. addIndexes(IndexReader) is not a problem, since if IR will detect that upon open(), you shouldn't reach addIndexes(IR) at all.

          To record here your proposal from LUCENE-2618 - one way to achieve that is to let segments_N record the oldest version it contains. This is fine for 3.1 and onwards indexes, however what do we do w/ 3.0 ones? We will need to distinguish >=3.1 and <3.1. However, <3.1 covers both 3.0 (which we should support by 4.0) and 2.9 (which we shouldn't). Maybe this isn't a problem because you cannot upgrade from 2.9 to 4.0 directly - you have to go through 3.x.

          BTW Mike, I think we should track the version per-segment because only when all segments of version X are gone, can the minimum version Y be recorded in segments_N. That is, if you have several segments from version 1, and the index is on version 2, and you merge some of the ver1 segments, the index's oldest version is still 1. However, if you record it only in segments_N, how would you know when all ver1 segments are gone?

          Show
          Shai Erera added a comment - And if we can arrange for addIndexes(Directory...) to throw that too, that would complete the cycle I think. addIndexes(IndexReader) is not a problem, since if IR will detect that upon open(), you shouldn't reach addIndexes(IR) at all. To record here your proposal from LUCENE-2618 - one way to achieve that is to let segments_N record the oldest version it contains. This is fine for 3.1 and onwards indexes, however what do we do w/ 3.0 ones? We will need to distinguish >=3.1 and <3.1. However, <3.1 covers both 3.0 (which we should support by 4.0) and 2.9 (which we shouldn't). Maybe this isn't a problem because you cannot upgrade from 2.9 to 4.0 directly - you have to go through 3.x. BTW Mike, I think we should track the version per-segment because only when all segments of version X are gone, can the minimum version Y be recorded in segments_N. That is, if you have several segments from version 1, and the index is on version 2, and you merge some of the ver1 segments, the index's oldest version is still 1. However, if you record it only in segments_N, how would you know when all ver1 segments are gone?
          Hide
          Michael McCandless added a comment -

          And if we can arrange for addIndexes(Directory...) to throw that too, that would complete the cycle I think.

          I agree.

          BTW Mike, I think we should track the version per-segment because only when all segments of version X are gone, can the minimum version Y be recorded in segments_N

          Yes, each SegmentInfo inside the segments_N file should track the util.Version when it was created.

          This is fine for 3.1 and onwards indexes, however what do we do w/ 3.0 ones? We will need to distinguish >=3.1 and <3.1.

          Actually we need only to distinguish 2.x vs 3.x right? I think we can just use the stored fields file, since it changed as of 3.0?

          Show
          Michael McCandless added a comment - And if we can arrange for addIndexes(Directory...) to throw that too, that would complete the cycle I think. I agree. BTW Mike, I think we should track the version per-segment because only when all segments of version X are gone, can the minimum version Y be recorded in segments_N Yes, each SegmentInfo inside the segments_N file should track the util.Version when it was created. This is fine for 3.1 and onwards indexes, however what do we do w/ 3.0 ones? We will need to distinguish >=3.1 and <3.1. Actually we need only to distinguish 2.x vs 3.x right? I think we can just use the stored fields file, since it changed as of 3.0?
          Hide
          Uwe Schindler added a comment -

          Actually we need only to distinguish 2.x vs 3.x right? I think we can just use the stored fields file, since it changed as of 3.0?

          We discussed this in IRC shortly before:
          To detect a pre-3.0 index, it's enough to read the StoredFields file (is it really always available?). So 3.x and 4.0 do the following in the ctors/open:

          • if segments file is 3.x format, it contains information about util.Version -> we are fine no more work. In 4.0 we are happy with the index
          • Only trunk: if segments file is older format, read the StoredFields file in ctor. This would throw IndexFormatTooOldEx in 4.0 (this is additional work, if stored fields are not needed, but neglectible). Once the index is modified, we know about

          Once an index is modified with 3.x or 4.0 we always update the version and write new segment information. For older segments, this code also has to inspect the stored fields file, to detect if segment is pre-3.x. This is done exatctly once during update of segments_N.

          Show
          Uwe Schindler added a comment - Actually we need only to distinguish 2.x vs 3.x right? I think we can just use the stored fields file, since it changed as of 3.0? We discussed this in IRC shortly before: To detect a pre-3.0 index, it's enough to read the StoredFields file (is it really always available?). So 3.x and 4.0 do the following in the ctors/open: if segments file is 3.x format, it contains information about util.Version -> we are fine no more work. In 4.0 we are happy with the index Only trunk: if segments file is older format, read the StoredFields file in ctor. This would throw IndexFormatTooOldEx in 4.0 (this is additional work, if stored fields are not needed, but neglectible). Once the index is modified, we know about Once an index is modified with 3.x or 4.0 we always update the version and write new segment information. For older segments, this code also has to inspect the stored fields file, to detect if segment is pre-3.x. This is done exatctly once during update of segments_N.
          Hide
          Shai Erera added a comment -

          From the comments I thought what's left to do is to take care of IndexWriter throwing IFTOE when it's opened, but I don't see the min/max versions recorded anywhere. I expected to find them in SegmentInfo(s), but they're not. So did I miss it, or what's left to do here is the whole nine yards?

          Show
          Shai Erera added a comment - From the comments I thought what's left to do is to take care of IndexWriter throwing IFTOE when it's opened, but I don't see the min/max versions recorded anywhere. I expected to find them in SegmentInfo(s), but they're not. So did I miss it, or what's left to do here is the whole nine yards?
          Hide
          Michael McCandless added a comment -

          I think it's the whole 9 yards still, and, I think we better do this before 3.1 (so we can distinguish 3.0 and 3.x)?

          At a minimum for 3.1 we just need to fix SegmentInfo writing to record the current Version. Throwing the exc from IW could then wait...

          Show
          Michael McCandless added a comment - I think it's the whole 9 yards still, and, I think we better do this before 3.1 (so we can distinguish 3.0 and 3.x)? At a minimum for 3.1 we just need to fix SegmentInfo writing to record the current Version. Throwing the exc from IW could then wait...
          Hide
          Shai Erera added a comment -

          Patch against trunk. I need to fix 3x to write the version and produce an index for TestBackCompat before committing this patch (even though the tests pass).

          Show
          Shai Erera added a comment - Patch against trunk. I need to fix 3x to write the version and produce an index for TestBackCompat before committing this patch (even though the tests pass).
          Hide
          Shai Erera added a comment -

          Patch against 3x:

          • Adds FieldsReader.detectCodeVersion - returns "2.x" for pre-3.0 indexes and "3.0" for 3.0 indexes. Not called for 3.1+ segments.
          • SegmentInfo records its code version (Constants.LUCENE_MAIN_VERSION).
          • SegmentInfos bumps up the format number and upgrades old segments ("2.x" or "3.0") to record their version too.

          I'll update the trunk patch to reflect those changes (i.e., now indexes touched by 3.1+ code will have their segments recording their version, whether they are pre-3.0 or not).

          Show
          Shai Erera added a comment - Patch against 3x: Adds FieldsReader.detectCodeVersion - returns "2.x" for pre-3.0 indexes and "3.0" for 3.0 indexes. Not called for 3.1+ segments. SegmentInfo records its code version (Constants.LUCENE_MAIN_VERSION). SegmentInfos bumps up the format number and upgrades old segments ("2.x" or "3.0") to record their version too. I'll update the trunk patch to reflect those changes (i.e., now indexes touched by 3.1+ code will have their segments recording their version, whether they are pre-3.0 or not).
          Hide
          Shai Erera added a comment -

          Modified patch against trunk:

          • Added FieldsReader.checkCodeVersion - throws TooOldEx if format is unsupported
          • Added ctor to IndexFormatTooOldException which takes a String, so that we can properly express it's a "2.x" index
          • DefaultSegmentInfosReader handles si.getVersion().equals("2.x") by throwing TooOldEx

          After it gets the "green light", I'll produce a 3.1 index and check it in trunk for TestBackCompat.

          Show
          Shai Erera added a comment - Modified patch against trunk: Added FieldsReader.checkCodeVersion - throws TooOldEx if format is unsupported Added ctor to IndexFormatTooOldException which takes a String, so that we can properly express it's a "2.x" index DefaultSegmentInfosReader handles si.getVersion().equals("2.x") by throwing TooOldEx After it gets the "green light", I'll produce a 3.1 index and check it in trunk for TestBackCompat.
          Hide
          Michael McCandless added a comment -

          Both 3.x & trunk patches look great!

          Maybe in the comment for SegmentInfo.version explain the "required"
          format of the version? (Since it's a String). And eg note that we
          don't track minor versions here? (Maybe just link to the same comment
          for Constants.LUCENE_MAIN_VERSION).

          Show
          Michael McCandless added a comment - Both 3.x & trunk patches look great! Maybe in the comment for SegmentInfo.version explain the "required" format of the version? (Since it's a String). And eg note that we don't track minor versions here? (Maybe just link to the same comment for Constants.LUCENE_MAIN_VERSION).
          Hide
          Shai Erera added a comment -

          Thanks Mike. I'll add the comments as you suggest and commit.

          Show
          Shai Erera added a comment - Thanks Mike. I'll add the comments as you suggest and commit.
          Hide
          Shai Erera added a comment -

          Thanks Mike and Uwe for your comments !

          Committed revision 1062324 (3x).
          Committed revision 1062325 (trunk).

          I'll send a reindexing note to java-user & dev.

          Show
          Shai Erera added a comment - Thanks Mike and Uwe for your comments ! Committed revision 1062324 (3x). Committed revision 1062325 (trunk). I'll send a reindexing note to java-user & dev.
          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1

            People

            • Assignee:
              Shai Erera
              Reporter:
              Michael McCandless
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development