Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 2.9, 2.9.1, 2.9.2, 2.9.3
    • Fix Version/s: 2.9.4
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      There was a lot of user requests to be able to read Lucene 3.0 indexes also with 2.9. This would make the migration easier. There is no problem in doing that, as the new stored fields version in Lucene 3.0 is only used to mark a segment's stored fields file as no longer containing compressed fields. But index format did not really change. This patch simply allows FieldsReader to pass a Lucene 3.0 version number, but still writes segments in 2.9 format (as you could suddenly turn on compression for added documents).

      I added ZIP files for 3.0 indexes for TestBackwards. Without the patch it does not pass, as FieldsReader complains about incorrect version number (although it could read the file easily). If we would release maybe a 2.9.4 release of Lucene we should include that patch.

      1. index.30.cfs.zip
        5 kB
        Uwe Schindler
      2. index.30.nocfs.zip
        9 kB
        Uwe Schindler
      3. LUCENE-2675.patch
        4 kB
        Uwe Schindler

        Activity

        Hide
        Robert Muir added a comment -

        in my opinion this would be very confusing and set a bad precedent.

        I don't understand how it would make migration easier... migration backwards?

        Personally I think we should let things be... e.g. someone will get confused and think they
        can open their 3.0-created index with 2.9/java 1.4 but it is a different version of Unicode, for example.

        Show
        Robert Muir added a comment - in my opinion this would be very confusing and set a bad precedent. I don't understand how it would make migration easier... migration backwards? Personally I think we should let things be... e.g. someone will get confused and think they can open their 3.0-created index with 2.9/java 1.4 but it is a different version of Unicode, for example.
        Hide
        Uwe Schindler added a comment - - edited

        Forget about Java versions. Almost everybody who migrates to 3.0 already uses Java 1.5 or 1.6. The problem is during the migration phase (when you undeprecate your code) you cannot switch between both versions easily as soon as you touch an index with 3.0, it will not open anymore in 2.9, but in reality its the same index version, there is no fileformat change at all! The version number is simply a marker for SegmentMerger in 3.0 that it can raw-copy documents because they do not contain compression anymore. If we would not have removed compression in 3.0, the file format would have been identical.

        As we declare 2.9 and 3.0 as feature-identical even in the latest version, it is not understandable to anyone why they cannot open an 3.0 index with 2.9 and vice versa. For unicode reasons you should then also disallow opening a 2.9 index with 3.0 I got requests (even on java-user, but also from my customers) quite often about that and one user that wants to migrate to 3.0 through 2.9 again asked me today.

        I just repeat: The index format is identical!

        Maybe we have other comments, I will only commit this if we have an agreement and only if we would release 2.9.4.

        Show
        Uwe Schindler added a comment - - edited Forget about Java versions. Almost everybody who migrates to 3.0 already uses Java 1.5 or 1.6. The problem is during the migration phase (when you undeprecate your code) you cannot switch between both versions easily as soon as you touch an index with 3.0, it will not open anymore in 2.9, but in reality its the same index version, there is no fileformat change at all! The version number is simply a marker for SegmentMerger in 3.0 that it can raw-copy documents because they do not contain compression anymore. If we would not have removed compression in 3.0, the file format would have been identical. As we declare 2.9 and 3.0 as feature-identical even in the latest version, it is not understandable to anyone why they cannot open an 3.0 index with 2.9 and vice versa. For unicode reasons you should then also disallow opening a 2.9 index with 3.0 I got requests (even on java-user, but also from my customers) quite often about that and one user that wants to migrate to 3.0 through 2.9 again asked me today. I just repeat: The index format is identical! Maybe we have other comments, I will only commit this if we have an agreement and only if we would release 2.9.4.
        Hide
        Uwe Schindler added a comment -

        Additionally I opened this issue, so I can point those people to this issue, so they can patch their Lucene 2.9 to do what they expect. EVen if this gets rejected somehow. I am simply tired of sending this patch out to everyone.

        Show
        Uwe Schindler added a comment - Additionally I opened this issue, so I can point those people to this issue, so they can patch their Lucene 2.9 to do what they expect. EVen if this gets rejected somehow. I am simply tired of sending this patch out to everyone.
        Hide
        Robert Muir added a comment -

        Forget about Java versions

        well its important to me, so I won't just forget about it. Especially to users that don't know how their analysis works,
        they do not know that java 1.4 is unicode 3.x and java 5 is unicode 4.x and harmony java5 is unicode 5.2 and java 6 is unicode 6.0.

        But this is even just part of the issue, i don't think we should do this. it adds too much confusion to be officially supported in any release.
        furthermore its not like it can be duplicated with 4.0, i would be against adding 4.x index support to 3.x also, forget about in a bugfix release.
        historically lucene has been held back by backwards compatibility, lets not throw forward compatibility into the mix.

        If we would not have removed compression in 3.0, the file format would have been identical.

        a great example of why major release shouldn't be just removal of deprecations.

        Show
        Robert Muir added a comment - Forget about Java versions well its important to me, so I won't just forget about it. Especially to users that don't know how their analysis works, they do not know that java 1.4 is unicode 3.x and java 5 is unicode 4.x and harmony java5 is unicode 5.2 and java 6 is unicode 6.0. But this is even just part of the issue, i don't think we should do this. it adds too much confusion to be officially supported in any release. furthermore its not like it can be duplicated with 4.0, i would be against adding 4.x index support to 3.x also, forget about in a bugfix release. historically lucene has been held back by backwards compatibility, lets not throw forward compatibility into the mix. If we would not have removed compression in 3.0, the file format would have been identical. a great example of why major release shouldn't be just removal of deprecations.
        Hide
        Michael McCandless added a comment -

        So, first off, Lucene's bw compat policy has never ensured this
        "reverse compat". Promising this would have been exceptionally
        costly, in the past. Once a new major release "kisses" your index, it
        cannot be used by older versions of Lucene.

        However, now with the switch to pluggable codecs in 4.0, it would be
        conceptually possible to make a codec that ensures no change to the
        index format, even on upgrade of the software. And I think such a
        codec would be reasonable to offer (we already basically have the
        "hard part" done, with the preflex write codec, but it's only exposed
        for testing). But... nobody has stepped up to create this for
        4.0... and it's not clear anybody will of course.

        That this is so trivial really is a reflection of our crazy major
        release criteria in the past (ie nothing of consequence changes going
        from X.9 -> (X+1).0!). Of course this is now changed, ie, 4.0 is
        changing tons from 3.x.

        I do appreciate the motivation for this, ie to allow an app to "try"
        updating the software (Lucene 2.x -> 3.x) fully independently of
        updating the index format. It's a valid use case.

        Net/net I'm fine with this change. But we should advertise very
        clearly that this is not in general promised by Lucene. It'll be on a
        case by case basis. EG even if someone steps up and we make a codec
        for 4.0 that can read/write the 3.x index format, that's still not
        ensured going forward (unless we make a change to our back-compat
        policy, which is a separate discussion).

        Show
        Michael McCandless added a comment - So, first off, Lucene's bw compat policy has never ensured this "reverse compat". Promising this would have been exceptionally costly, in the past. Once a new major release "kisses" your index, it cannot be used by older versions of Lucene. However, now with the switch to pluggable codecs in 4.0, it would be conceptually possible to make a codec that ensures no change to the index format, even on upgrade of the software. And I think such a codec would be reasonable to offer (we already basically have the "hard part" done, with the preflex write codec, but it's only exposed for testing). But... nobody has stepped up to create this for 4.0... and it's not clear anybody will of course. That this is so trivial really is a reflection of our crazy major release criteria in the past (ie nothing of consequence changes going from X.9 -> (X+1).0!). Of course this is now changed, ie, 4.0 is changing tons from 3.x. I do appreciate the motivation for this, ie to allow an app to "try" updating the software (Lucene 2.x -> 3.x) fully independently of updating the index format. It's a valid use case. Net/net I'm fine with this change. But we should advertise very clearly that this is not in general promised by Lucene. It'll be on a case by case basis. EG even if someone steps up and we make a codec for 4.0 that can read/write the 3.x index format, that's still not ensured going forward (unless we make a change to our back-compat policy, which is a separate discussion).
        Hide
        Uwe Schindler added a comment -

        So as Robert seems to also agree, I think I commit this and modify changes.txt to explicitely say that thisonly applys to 2.9/3.0, as it uses same codebase and same bugfix level.

        Show
        Uwe Schindler added a comment - So as Robert seems to also agree, I think I commit this and modify changes.txt to explicitely say that thisonly applys to 2.9/3.0, as it uses same codebase and same bugfix level.
        Hide
        Uwe Schindler added a comment -

        Committed revision: 1028723

        Show
        Uwe Schindler added a comment - Committed revision: 1028723

          People

          • Assignee:
            Uwe Schindler
            Reporter:
            Uwe Schindler
          • Votes:
            0 Vote for this issue
            Watchers:
            0 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development