Lucene - Core / LUCENE-5858

Move back compat codecs out of core/ into codecs/ jar

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

      Description

      These take significant space and bloat core Lucene. Not everyone needs the ability to read ancient indexes (especially when building a new app).

      We should move this cruft out of the core/ jar. codecs/ is the obvious place: it's already set up in the build system for tests and everything else.

        Activity

        Hoss Man added a comment -

        IIUC: This means that, moving forward, users who want to upgrade, and have indexes built with older versions of Lucene, will need to include the codecs.jar as well as the core.jar in their applications ... correct?

        Since a big part of the reason the codecs module already exists was so apps that didn't want/need those optional codecs wouldn't have to load them, would it make more sense to create a new backcompat-codecs module/jar instead of lumping them all in codecs.jar?

        Robert Muir added a comment -

        What is the problem with codecs/ ?

        It's not mandatory today. It's only mandatory for tests, which is the advantage: there is no risk to the build system in moving stuff there. It removes it from the core jar and makes it optional, whereas today it's mandatory.

        This is the conservative route: trying to add a new jar is more complex... I'm not against it, I just want to make an improvement step for 5.0. If someone else wants to do the build system work for some backcompat module, that's great, but it shouldn't be a requirement for getting this crap out of core.

        Especially since it's trunk, core/ shouldn't be cluttered with all this cruft.

        Hoss Man added a comment -

        "It's not mandatory today. It's only mandatory for tests, which is the advantage: there is no risk to the build system in moving stuff there. It removes it from the core jar and makes it optional, whereas today it's mandatory."

        I guess i'm misunderstanding something about your initial suggestion. (There are also a lot of instances of "It" in that paragraph that are confusing me about which "it" you are referring to at any given time ... sometimes it seems like "it" is the codecs module, other times i think "it" refers to the backcompat codecs?)

        Here's the point i was trying to make...

        Today: a simple-app.jar that wants extremely basic search functionality can depend exclusively on lucene-core.jar and build an index and search that index. If that simple-app.jar built an index with lucene-core-4.5.jar and then later upgraded to use lucene-core-4.9.jar, then simple-app.jar would continue to work just fine.

        If i understand your idea correctly: then starting in 4.10 the back compat codecs would now live in lucene-codecs.jar. So simple-app.jar would need to include both lucene-core-4.10.jar and lucene-codecs-4.10.jar in its classpath if it wanted to keep reading those older 4.5-4.9 indexes. There might however be some trivial-app.jar that doesn't care about index backcompat at all, and it only needs to load lucene-core-4.10.jar.

        Am i correct so far?

        The concern i have is that when it comes to upgrading, the "simple-app.jar" scenario seems more common to me than the "trivial-app.jar" situation, and for the simple-app.jar situation, moving the backcompat codecs into lucene-codecs.jar doesn't actually do anything to reduce the "bloat" of classes in the lucene jars it includes – it makes the bloat worse – because now in addition to the core classes, and the backcompat codecs, simple-app.jar also has to include all of the "optional" codecs (like simpletext) that already exist in the lucene-codecs.jar.

        The bloat of unnecessary classes has been reduced for trivial-app.jar, but is that really the situation we should be optimizing for?

        The key question i'm raising is:

        Does it really make sense to increase the size of jars needed for apps that want to read old indexes, in order to decrease the size of the jars needed for (in my opinion) "toy" apps that don't care about index compatibility?

        Moving the backcompat codecs into their own jar seems like a great idea – i'm just not sure if any "real" lucene users benefit from moving them into the existing codecs jar.

        Uwe Schindler added a comment -

        I would also suggest moving the backwards codecs into a separate JAR, not the experimental codecs module! We should also maybe add some useful text to the Codec.forName() and PostingsFormat.forName() APIs, so they tell the user that they might need to add lucene-backwards-codecs.jar to the classpath.
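
        A minimal sketch of the kind of lookup error suggested above, assuming a simplified name-based registry. `CodecLookupSketch`, its `register` method, and the codec names are hypothetical stand-ins, not Lucene's actual lookup code; only the jar name is taken from the comment.

```java
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical stand-in for a name-based codec registry (not Lucene's real implementation). */
final class CodecLookupSketch {
    // Codec name -> implementation class name. A real registry would discover
    // these from jars on the classpath; here they are registered by hand.
    private final Map<String, String> registered = new TreeMap<>();

    void register(String name, String implClass) {
        registered.put(name, implClass);
    }

    /** Returns the implementation for a name, or fails with a hint about the missing jar. */
    String forName(String name) {
        String impl = registered.get(name);
        if (impl == null) {
            throw new IllegalArgumentException(
                "A codec with name '" + name + "' does not exist. "
                + "If the index was written by an older version, you may need to add "
                + "lucene-backwards-codecs.jar to the classpath. "
                + "Currently available names: " + registered.keySet());
        }
        return impl;
    }

    public static void main(String[] args) {
        CodecLookupSketch lookup = new CodecLookupSketch();
        lookup.register("CurrentCodec", "example.CurrentCodec"); // hypothetical names
        System.out.println(lookup.forName("CurrentCodec"));
        try {
            lookup.forName("OldCodec"); // not registered: the old-codecs jar is "missing"
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```

        The point of the sketch is only that the failure message names the jar the user is likely missing and lists what is available, rather than reporting a bare unknown-name error.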

        Robert Muir added a comment -

        If people really want a separate jar, then fine.

        BUT

        we remove them from the normal "rotation" so core testing has a clean classpath and we can remove even more cruft (like packed ints). These days, we have a dedicated TestXXXFormat for every codec, so this is not really needed anymore; instead it is just an annoyance: wasted time debugging test failures that are just quirks in old behavior of ancient codecs (e.g. not supporting missing values), and false Jenkins failures because newer features aren't supported (causing tons of SuppressCodecs everywhere). I think it made sense for the initial 3.x->4.x cutover: we didn't have such a mechanism for testing at that point, nor did we have really so many new index features that various search functionality was trying to use, so blasting them through all the tests was our only choice. But I think these days it does more harm than good. Most of the old formats we are still testing (like 3.0) are years and years old and just don't support the modern features. We should be looking forwards instead of backwards.

        My motivation here is to make backwards compatibility simpler and less of a hassle for us as a project, not more difficult and more complex.

        Adrien Grand added a comment -

        +1 on these ideas:

        • creating a new module/jar that would contain old codecs, their tests (TestXXXFormat), TestBackwardCompatibility and potentially classes that only old codecs use
        • removing old codecs from rotation (but still use the codecs from the lucene/codecs modules, like SimpleText, etc. which have all codec features)
        • adding information about the backward-compat and codecs modules to the javadocs and exceptions of Codec/PostingsFormat/DocValuesFormat.forName

        On a side note, if we want to keep backward compatibility manageable, I think we should think about releasing Lucene 5.0 soon (4.x is almost 2 years old already).

        Uwe Schindler added a comment -

        "On a side note, if we want to keep backward compatibility manageable, I think we should think about releasing Lucene 5.0 soon (4.x is almost 2 years old already)."

        At least to get rid of the 3.x backwards codec. To do this, we just need to fix some minor problems with the new Stored/Indexed/DocValues field API. Robert has some problems with it. I am strongly in favor of fixing those.

        Also the Solr "Server" issues should be solved, so Solr no longer ships as a webapp, but as a separate server.

        These are low-hanging fruit for 5.0...

        Uwe Schindler added a comment -

        "Also the Solr "Server" issues should be solved, so Solr no longer ships as a webapp, but as a separate server."

        At least hide that there is a "servlet"-like container (embedded Jetty) behind it... It would suffice if it at least felt like a server (own port, own main() routine, so people can make a "Windows/Unix service" out of it)...

        Robert Muir added a comment -

        Can we discuss such things on another issue? This one is about simplifying backwards compatibility.

        David Smiley added a comment -

        "removing old codecs from rotation (but still use the codecs from the lucene/codecs modules, like SimpleText, etc. which have all codec features)"

        +1

        I don't have a strong opinion whether the old codecs move to some new jar or to the existing codecs.jar. The existing codecs.jar seems simple and sufficient.

        Independently of (but related to) this matter, I wish a test could articulate what capabilities (e.g. needs doc-values, needs missing-value, needs term vectors with payloads) it wants from the codec versus listing codec names it does not want. This issue of putting old codecs out of rotation ameliorates the need for such a feature, but one day there's going to be something new and a test using it is going to be blacklisting stuff again.
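
        A rough sketch of the capability-based selection wished for above, as opposed to suppressing codecs by name. Everything here is hypothetical: the `Capability` enum, the `declare`/`eligible` methods, and the codec names are illustrative assumptions and do not correspond to any existing Lucene test-framework API.

```java
import java.util.ArrayList;
import java.util.EnumSet;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Hypothetical sketch: tests state what they need, and codecs state what they support. */
final class CodecCapabilitySketch {
    // Capabilities named after the examples in the comment above.
    enum Capability { DOC_VALUES, MISSING_VALUES, TERM_VECTOR_PAYLOADS }

    // Codec name -> capabilities it supports, in declaration order.
    private final Map<String, EnumSet<Capability>> codecs = new LinkedHashMap<>();

    void declare(String codecName, EnumSet<Capability> supported) {
        codecs.put(codecName, EnumSet.copyOf(supported));
    }

    /** Returns the codecs that support every capability the test requires. */
    List<String> eligible(Set<Capability> required) {
        List<String> result = new ArrayList<>();
        for (Map.Entry<String, EnumSet<Capability>> entry : codecs.entrySet()) {
            if (entry.getValue().containsAll(required)) {
                result.add(entry.getKey());
            }
        }
        return result;
    }

    public static void main(String[] args) {
        CodecCapabilitySketch rotation = new CodecCapabilitySketch();
        rotation.declare("OldFormat", EnumSet.of(Capability.DOC_VALUES));   // hypothetical codec
        rotation.declare("NewFormat", EnumSet.allOf(Capability.class));     // hypothetical codec
        // A test that needs missing-value support never sees OldFormat,
        // without having to list OldFormat by name.
        System.out.println(rotation.eligible(
            EnumSet.of(Capability.DOC_VALUES, Capability.MISSING_VALUES)));
    }
}
```

        With this shape, a codec added later only has to declare its capabilities; existing tests would not need new name-based exclusions.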

        ASF subversion and git services added a comment -

        Commit 1621750 from Adrien Grand in branch 'dev/branches/lucene5858'
        [ https://svn.apache.org/r1621750 ]

        LUCENE-5858: Create branch.

        ASF subversion and git services added a comment -

        Commit 1621751 from Adrien Grand in branch 'dev/branches/lucene5858'
        [ https://svn.apache.org/r1621751 ]

        LUCENE-5858: First iteration.

        Tests pass but the rest (eg. javadocs) might be completely broken.

        ASF subversion and git services added a comment -

        Commit 1621756 from Adrien Grand in branch 'dev/branches/lucene5858'
        [ https://svn.apache.org/r1621756 ]

        LUCENE-5858: Add missing files.

        ASF subversion and git services added a comment -

        Commit 1621764 from Robert Muir in branch 'dev/branches/lucene5858'
        [ https://svn.apache.org/r1621764 ]

        LUCENE-5858: remove some test cruft

        ASF subversion and git services added a comment -

        Commit 1621770 from Robert Muir in branch 'dev/branches/lucene5858'
        [ https://svn.apache.org/r1621770 ]

        LUCENE-5858: fix javadocs

        ASF subversion and git services added a comment -

        Commit 1621790 from Robert Muir in branch 'dev/branches/lucene5858'
        [ https://svn.apache.org/r1621790 ]

        LUCENE-5858: clear bogus code/comment up

        ASF subversion and git services added a comment -

        Commit 1621805 from Robert Muir in branch 'dev/branches/lucene5858'
        [ https://svn.apache.org/r1621805 ]

        LUCENE-5858: remove impersonation

        ASF subversion and git services added a comment -

        Commit 1621807 from Robert Muir in branch 'dev/branches/lucene5858'
        [ https://svn.apache.org/r1621807 ]

        LUCENE-5858: don't even register RW test codecs, impersonation is removed

        ASF subversion and git services added a comment -

        Commit 1621816 from Robert Muir in branch 'dev/branches/lucene5858'
        [ https://svn.apache.org/r1621816 ]

        LUCENE-5858: remove remaining unnecessary SuppressCodecs

        ASF subversion and git services added a comment -

        Commit 1621828 from Robert Muir in branch 'dev/branches/lucene5858'
        [ https://svn.apache.org/r1621828 ]

        LUCENE-5858: remove conditionals around codec features

        ASF subversion and git services added a comment -

        Commit 1621832 from Robert Muir in branch 'dev/branches/lucene5858'
        [ https://svn.apache.org/r1621832 ]

        LUCENE-5858: add jar

        ASF subversion and git services added a comment -

        Commit 1621838 from Robert Muir in branch 'dev/branches/lucene5858'
        [ https://svn.apache.org/r1621838 ]

        LUCENE-5858: try to get nightly-smoke passing

        ASF subversion and git services added a comment -

        Commit 1621849 from Robert Muir in branch 'dev/branches/lucene5858'
        [ https://svn.apache.org/r1621849 ]

        LUCENE-5858: add missing references to backward-codecs

        Robert Muir added a comment -
           [smoker] SUCCESS! [0:28:53.961116]

        I think the branch is ready. Various conditionals and suppressions around codec features are cleaned up in tests, impersonation is removed, etc. I think this is much more manageable.

        Adrien Grand added a comment -

        +1 It looks great.

        ASF subversion and git services added a comment -

        Commit 1621957 from Robert Muir in branch 'dev/branches/lucene5858'
        [ https://svn.apache.org/r1621957 ]

        LUCENE-5858: trunk->branch

        ASF subversion and git services added a comment -

        Commit 1621960 from Robert Muir in branch 'dev/trunk'
        [ https://svn.apache.org/r1621960 ]

        LUCENE-5858: Move back compat codecs out of core to their own jar

        Anshum Gupta added a comment -

        Bulk close after 5.0 release.


          People

          • Assignee: Unassigned
          • Reporter: Robert Muir
          • Votes: 0
          • Watchers: 5
