Lucene - Core
  LUCENE-6213

Add test for IndexFormatTooOldException if a commit has a 3.x segment

    Details

    • Type: Bug
    • Status: Reopened
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We should add a 4.x index (4.x commit) with some 3.x segment(s) to our backwards tests.

      I don't think we throw IndexFormatTooOldException correctly in this case. I think instead the user will get a confusing SPI error about a missing codec "Lucene3x".

      1. LUCENE-6213.patch
        4 kB
        Ryan Ernst
      2. LUCENE-6213.patch
        4 kB
        Ryan Ernst
      3. LUCENE-6213.patch
        2 kB
        Ryan Ernst
      4. unsupported.4x-with-3x-segments.zip
        10 kB
        Ryan Ernst

        Activity

        Robert Muir added a comment -

        I don't want to cause delays for 5.0, but this might be one to fix, as it will confuse users. First we need the test to know if it's really a bug, but I think it is.

        Robert Muir added a comment -

        One way to fix this would be to have a TooOldCodec that throws IndexFormatTooOldException from every method. We could register it in SPI under the names of codecs we no longer support.

        So in trunk, it would be registered for all the 4.x codecs for example.

        When SegmentInfos asks the codec for its segmentInfoWriter() while decoding the commit, the user will then get the correct exception. Alternatively, we could just have a hardcoded list/map and conditional logic in SegmentInfos.
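
        The hardcoded-list alternative could be sketched roughly as below. This is only an illustration: `UNSUPPORTED_CODECS`, `readCodec`, and the plain `IllegalArgumentException` standing in for Lucene's `IndexFormatTooOldException` are assumptions, not the actual Lucene code.

```java
import java.util.Arrays;
import java.util.List;

// Sketch: intercept known-dead codec names before the SPI lookup, so the
// user sees a clear "too old" error instead of a confusing missing-codec
// SPI error.
public class ReadCodecSketch {
    // Hypothetical list of codec names no longer shipped.
    private static final List<String> UNSUPPORTED_CODECS = Arrays.asList("Lucene3x");

    static String readCodec(String name) {
        if (UNSUPPORTED_CODECS.contains(name)) {
            // Real code would throw IndexFormatTooOldException here.
            throw new IllegalArgumentException(
                "Codec '" + name + "' is too old and no longer supported");
        }
        return name; // real code would resolve the codec via SPI lookup
    }

    public static void main(String[] args) {
        System.out.println(readCodec("Lucene50"));
        try {
            readCodec("Lucene3x");
        } catch (IllegalArgumentException e) {
            System.out.println("caught: " + e.getMessage());
        }
    }
}
```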

        Ryan Ernst added a comment -

        Here is a quick and dirty patch (against branch_5x) that adds a bwc index test as you suggested, and a quick fix for the bug. I like the idea of a dummy codec, but didn't have time to try it.

        Robert Muir added a comment -

        Thanks Ryan. I thought of another related issue: backward-codecs.jar is new in 5.0, so I imagine it's common that someone will try to open a 4.x index with just the lucene-core JAR and get a similar SPI error. Maybe we don't need anything there, but we can think about the error message.

        Adrien Grand added a comment -

        I like the TooOldCodec idea!

        Maybe we don't need anything there, but we can think about the error message

        +1 on improving the error message to mention the backward-codecs jar.

        Ryan Ernst added a comment -

        I tried the codec idea, but it unfortunately required changing a lot of callers to handle IOException or to throw IOException themselves, and the change got very large.

        Here is a new patch (still against branch_5x) that adds an extra check to give a clear error message when the user probably needs to include backward-codecs.jar. It is also set up to be easily forward-ported to trunk (by just adding to the list of unsupportedCodecs).

        Robert Muir added a comment -

        I like the simple solution for now!

        When throwing the too-old exception, can we initCause() it with the original exception we got? I think it is not good to lose it.
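
        Combining the backward-codecs.jar hint with the initCause() suggestion might look roughly like this sketch (all names here, `forName`, the message text, and the plain `IllegalArgumentException`, are illustrative stand-ins, not Lucene's actual code):

```java
// Sketch: wrap a failed SPI codec lookup in a friendlier exception that
// suggests backward-codecs.jar, and chain the original via initCause()
// so the root SPI error is not lost.
public class CodecHintSketch {
    // Stand-in for an SPI registry that only knows current codecs.
    static String forName(String name) {
        if (name.equals("Lucene50")) {
            return name;
        }
        throw new IllegalArgumentException(
            "An SPI class of type Codec with name '" + name + "' does not exist.");
    }

    static String readCodec(String name) {
        try {
            return forName(name);
        } catch (IllegalArgumentException original) {
            IllegalArgumentException hint = new IllegalArgumentException(
                "Could not load codec '" + name
                + "'. Did you forget to add backward-codecs.jar?");
            hint.initCause(original); // keep the original exception
            throw hint;
        }
    }

    public static void main(String[] args) {
        try {
            readCodec("Lucene46");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
            System.out.println("cause: " + e.getCause().getMessage());
        }
    }
}
```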

        Ryan Ernst added a comment -

        Sure, new patch with initCause added.

        Robert Muir added a comment -

        +1. The only thing I might do is move unsupportedCodecs to sit right above readCodec() in the code.

        Ryan Ernst added a comment -

        Ok, I'll move unsupportedCodecs before committing.

        ASF subversion and git services added a comment -

        Commit 1656523 from Ryan Ernst in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1656523 ]

        LUCENE-6213: Add useful exception message when commit contains segments from legacy codecs

        Adrien Grand added a comment -

        +1

        ASF subversion and git services added a comment -

        Commit 1656528 from Ryan Ernst in branch 'dev/branches/lucene_solr_5_0'
        [ https://svn.apache.org/r1656528 ]

        LUCENE-6213: Add useful exception message when commit contains segments from legacy codecs (merged 1656523)

        Anshum Gupta added a comment -

        Thanks for getting this into 5.0.

        Uwe Schindler added a comment -

        Adrien Grand: Why is the list of unsupported codecs an array? A simple unmodifiable set would be better and would spare lots of code. If it's a singleton set it's even simpler; otherwise use Collections.unmodifiableSet(new HashSet<>(Arrays.asList(codecs))). You don't need to change that, but to me this looks simpler and more correct. Memory usage or speed is not a problem here; it is static and never changes after class init.

        Ryan Ernst added a comment -

        Uwe Schindler: While that might be "fewer lines of code" (but only by 1 or 2, I think), a set is a much more complicated data structure than an array. In fact, if Java just had an Arrays.find like the STL's find(), there would be no difference in the number of lines.

        Uwe Schindler added a comment -

        If you want to keep the array, use Arrays.asList(array).contains(element); that's the counterpart of the STL's find(). But I still think that a set is the "right" data structure.

        Ryan Ernst added a comment -

        Well, except that would convert the array to a list on each call. I would rather keep the current code.

        Uwe Schindler added a comment -

        Well, except that would convert the array to a list on each call. I would rather keep the current code.

        Well, I would declare it as a List. List and array are basically the same in Java; Arrays.asList() only "wraps", it does not "convert". In addition, the declaration of the static constant would be much more readable (sorry, I hate the "new String[] { ... }" syntax; the code looks like it was written by a C developer, not like Java):

        private static final List<String> unsupportedCodecs = Arrays.asList(
          "Lucene3x"
        );

        Then you can do the contains().
        (Because it's private, there is no need to make it explicitly "unmodifiable".)

        Uwe
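
        A small sketch of the point being made here, assuming the single-element list from the issue: Arrays.asList wraps the existing backing array in constant time rather than copying it, so a static List constant plus contains() costs essentially the same as a hand-rolled array loop.

```java
import java.util.Arrays;
import java.util.List;

public class UnsupportedCodecsSketch {
    // Declared as a List, per the suggestion above; Arrays.asList only
    // wraps the backing array, it does not copy it.
    private static final List<String> unsupportedCodecs = Arrays.asList(
        "Lucene3x"
    );

    public static void main(String[] args) {
        System.out.println(unsupportedCodecs.contains("Lucene3x"));
        System.out.println(unsupportedCodecs.contains("Lucene50"));
    }
}
```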

        ASF subversion and git services added a comment -

        Commit 1656581 from Ryan Ernst in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1656581 ]

        LUCENE-6213: Bikeshed the hell out of a 1 element list

        ASF subversion and git services added a comment -

        Commit 1656582 from Ryan Ernst in branch 'dev/branches/lucene_solr_5_0'
        [ https://svn.apache.org/r1656582 ]

        LUCENE-6213: Bikeshed the hell out of a 1 element list

        Uwe Schindler added a comment -

        Thanks for changing! Sometimes I have a bad day - I was dealing with C code a minute ago!

        Robert Muir added a comment -

        We need this logic in trunk for the same reasons (a 5.x commit containing 4.x segments). Somehow it only got committed to the 5.x branch.

        Ryan Ernst added a comment -

        When I added this logic, 5.0 was not yet released, so we didn't have an official release to build the mixed segments with. I'll take care of this.

        Hoss Man added a comment -

        When I added this logic, 5.0 was not yet released, so we didn't have an official release to build the mixed segments with.

        Perhaps this is something smokechecker should sanity-check for us moving forward? It already parses the output of TestBackwardsCompatibility to make sure we test all known previous releases, so why not also make it look for testIndexFormatTooOldExceptionX, where X is one less than whatever the major version number is?

        So smokechecker against trunk (where the version is 6.0) would fail unless/until a testIndexFormatTooOldException5 is run as part of TestBackwardsCompatibility. When we create a branch_6x and update the version on trunk to "7", it would start failing again until we change TestBackwardsCompatibility so that testIndexFormatTooOldException6 exists (a good and legitimate smoke test to fail: we don't want to release "trunk" until 6.0 is final and we have a mixed format to test against).

        ASF subversion and git services added a comment -

        Commit 1676648 from Ryan Ernst in branch 'dev/trunk'
        [ https://svn.apache.org/r1676648 ]

        LUCENE-6213: Add useful exception message when commit contains segments from legacy codecs (forward port from branch_5x)

        ASF subversion and git services added a comment -

        Commit 1676650 from Ryan Ernst in branch 'dev/trunk'
        [ https://svn.apache.org/r1676650 ]

        LUCENE-6213: Revert accidental removal of code from readCommit (bad merge)

        Ryan Ernst added a comment - - edited

        Hoss Man: We already check that unsupported indexes are tested. I do think we should check for this, but only once a version is released. For example, we didn't fail expecting a bwc test for 5.0.0 until that release was actually available (we list the Apache archives of Lucene releases to find all the versions we expect). In this case, we probably just need some extra logic for a major release: e.g. for M.0.0, expect a test index "(M)x-with-(M-1)x-segments". I think this is essentially what you were saying?

        We should also make this easy to create with addBackcompatIndexes.py, so that when it runs on a major release it also creates one of these mixed indexes. I created LUCENE-6457 to track this.
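
        The naming rule described here matches the attached unsupported.4x-with-3x-segments.zip and can be sketched as a tiny helper (the method name is made up for illustration):

```java
// For a major release M.0.0, the expected mixed backcompat index is
// named "(M)x-with-(M-1)x-segments".
public class MixedIndexNameSketch {
    static String expectedMixedIndexName(int major) {
        return major + "x-with-" + (major - 1) + "x-segments";
    }

    public static void main(String[] args) {
        System.out.println(expectedMixedIndexName(4));
        System.out.println(expectedMixedIndexName(5));
    }
}
```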

        Hoss Man added a comment -

        Eg. For M.0.0, expect a test index "(M)x-with-(M-1)x-segments". I think this is essentially what you were saying?

        My point was that, for a sanity check like this, where the focus is on major versions, we can go even further than just looking at the list of existing releases.

        In your example, I think that even if there is no official "M" release, the smoke tester for "M+1" should fail if a test for "(M)x-with-(M-1)x-segments" cannot be found.

        Example: if the trunk version property is set to "8.Y" (where the value of Y is irrelevant), then it should be expected that a "7x-with-6x-segments" index is tested, even if there is no 7.x release yet according to jira / archive.apache.org.

        That way, in the first few days after a branch_7x is created and trunk's version property is updated to "8.0", it will be clear from the smokeTester failures on trunk that trunk is not in a state suitable to be released; by definition, trunk won't be suitable for release until a 7.0 release is final and the 7x index format is used to create the "7x-with-6x-segments" index for the trunk test.

        Ryan Ernst added a comment - - edited

        I think that even if there is no official "M" release, the smoke tester for "M+1" should fail if a test for "(M)x-with-(M-1)x-segments" cannot be found.

        I disagree. We shouldn't fail when there is nothing we can do. The backcompat indexes are created with official releases, so we should not fail until there is an official release with which to build the backcompat index (whether that be a pure index or mixed).

        Hoss Man added a comment -

        We shouldn't fail when there is nothing we can do.

        -0

        In my opinion you are looking at the purpose of the smoke tester backwards: the smoke tester shouldn't pass unless the candidate it's smoking is ready for release. If we know that there is no "previous" index format being adequately tested in the candidate build, then the candidate is, by definition, not ready for release.

        Robert Muir added a comment -

        I disagree. We shouldn't fail when there is nothing we can do.

        Yeah, failing jenkins over and over for a potentially long period of time is not a solution here.

        We can release version X without tests that it can read X-1. In general, test coverage is never perfect.


          People

          • Assignee:
            Ryan Ernst
            Reporter:
            Robert Muir
          • Votes:
            0
            Watchers:
            4