Details

    • Type: Task Task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.0, 6.0
    • Component/s: core/codecs, general/build
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We should move all concrete postings formats and codecs but Lucene40 to a module.

      1. LUCENE-4340.patch
        32 kB
        Adrien Grand
      2. LUCENE-4340.patch
        32 kB
        Adrien Grand
      3. LUCENE-4340.sh
        0.9 kB
        Adrien Grand
      4. LUCENE-4340-bloom.patch
        11 kB
        Adrien Grand

        Activity

        Hide
        Adrien Grand added a comment -

        Maybe the easiest way to go would be to have this codec module depend on lucene-core with scope=compile and lucene-core depend on this codec module with scope=test (is it possible?) so that codecs remain randomly tested with lucene-core test suite.

        Show
        Adrien Grand added a comment - Maybe the easiest way to go would be to have this codec module depend on lucene-core with scope=compile and lucene-core depend on this codec module with scope=test (is it possible?) so that codecs remain randomly tested with lucene-core test suite.
        Hide
        Robert Muir added a comment -

        Sounds like a good plan. We can use ivy configuration for the dependency.

        I also think "concrete" things not used by 4.0 codec like VariableGapTerms/BlockTermsWriter etc should go to the module,
        e.g. underneath packages like blockterms/. But some trickiness here: test-framework uses some of this stuff (e.g. Lucene40WithOrds, NestedPulsing, etc).

        So we have to figure out how to also support these test-only codecs somehow (maybe also RandomCodec so we don't have to do a lot of craziness
        to pass random parameters to these other codecs).

        Show
        Robert Muir added a comment - Sounds like a good plan. We can use ivy configuration for the dependency. I also think "concrete" things not used by 4.0 codec like VariableGapTerms/BlockTermsWriter etc should go to the module, e.g. underneath packages like blockterms/. But some trickiness here: test-framework uses some of this stuff (e.g. Lucene40WithOrds, NestedPulsing, etc). So we have to figure out how to also support these test-only codecs somehow (maybe also RandomCodec so we don't have to do a lot of craziness to pass random parameters to these other codecs).
        Hide
        Adrien Grand added a comment -

        Why would it be a problem to have test-framework depend on this new 'codecs' module?

        Show
        Adrien Grand added a comment - Why would it be a problem to have test-framework depend on this new 'codecs' module?
        Hide
        Robert Muir added a comment -

        yeah, i think thats what we should do.

        Show
        Robert Muir added a comment - yeah, i think thats what we should do.
        Hide
        Uwe Schindler added a comment -

        How about the following:
        I don't think test-framework should depend hard on codec module, because if you do use LuceneTestCase in your own app and you only want to test with default codecs, leave it out. We could make it only "detect" codecs available and use them in randomization, but if the codecs package is missing, ignore them?

        Show
        Uwe Schindler added a comment - How about the following: I don't think test-framework should depend hard on codec module, because if you do use LuceneTestCase in your own app and you only want to test with default codecs, leave it out. We could make it only "detect" codecs available and use them in randomization, but if the codecs package is missing, ignore them?
        Hide
        Robert Muir added a comment -

        That won't work. see my comment above.

        Show
        Robert Muir added a comment - That won't work. see my comment above.
        Hide
        Robert Muir added a comment -

        Also i disagree with making our own stuff harder to make it easier for someone else using LuceneTestCase in their own app.

        Our test stuff is for us first, if it happens to be useful to other people fine.

        Its not a huge burden for them to add lucene-codecs.jar too.

        Show
        Robert Muir added a comment - Also i disagree with making our own stuff harder to make it easier for someone else using LuceneTestCase in their own app. Our test stuff is for us first, if it happens to be useful to other people fine. Its not a huge burden for them to add lucene-codecs.jar too.
        Hide
        Adrien Grand added a comment -

        I tried the proposed approach on my local copy (unfortunately, the diff is pretty meaningless because of the svn mvoves, does anyone know how to produce something both readable and applyable through svn?):

        • moved all sub packages of oal.codecs except perfield and lucene40 to a new codecs module,
        • also moved BlockTerms {Reader,Writer}, FixedGapTermsIndex{Reader, Writer}, TermsIndex{Reader,Writer}

          Base, VariableGapTermsIndex

          {Reader,Writer}

          ,

        • lucene-codecs has been added to the classpath of test-framework,
        • lucene-codecs has been added to the test classpath of lucene-core.

        Compilation and tests seem to work fine...

        Does it look good to you? I tried to take care of javadocs, eclipse, idea and Maven, is there anything else I should worry about?

        Show
        Adrien Grand added a comment - I tried the proposed approach on my local copy (unfortunately, the diff is pretty meaningless because of the svn mvoves, does anyone know how to produce something both readable and applyable through svn?): moved all sub packages of oal.codecs except perfield and lucene40 to a new codecs module, also moved BlockTerms {Reader,Writer}, FixedGapTermsIndex{Reader, Writer}, TermsIndex{Reader,Writer} Base, VariableGapTermsIndex {Reader,Writer} , lucene-codecs has been added to the classpath of test-framework, lucene-codecs has been added to the test classpath of lucene-core. Compilation and tests seem to work fine... Does it look good to you? I tried to take care of javadocs, eclipse, idea and Maven, is there anything else I should worry about?
        Hide
        Steve Rowe added a comment -

        does anyone know how to produce something both readable and applyable through svn?

        I post the svn (move/copy/etc.) script to the issue and generate the patch using svn diff --no-diff-deleted. Somebody wanting to apply the patch just runs the svn script first.

        Show
        Steve Rowe added a comment - does anyone know how to produce something both readable and applyable through svn? I post the svn (move/copy/etc.) script to the issue and generate the patch using svn diff --no-diff-deleted . Somebody wanting to apply the patch just runs the svn script first.
        Hide
        Adrien Grand added a comment -

        Thanks, Steven!

        First patch. To apply, first run the bash script and then apply the patch.

        There are some things I am still unsure about:

        • should the fact that lucene-codecs now is a test dependency of lucene-core be reflected in dev-tools/maven/lucene/core/pom.xml.template (I thought it should, but I was surprised test-framework is not listed here),
        • I modified the IDEA configuration files to add an entry for this new module but I didn't test,
        • Solr lets users configure postings formats on a per-field basis but now only the default Lucene40PostingsFormat will be available by default, should we add lucene-codecs to Solr WAR or do we expect expert users to drop this JAR in their lib directory?
        Show
        Adrien Grand added a comment - Thanks, Steven! First patch. To apply, first run the bash script and then apply the patch. There are some things I am still unsure about: should the fact that lucene-codecs now is a test dependency of lucene-core be reflected in dev-tools/maven/lucene/core/pom.xml.template (I thought it should, but I was surprised test-framework is not listed here), I modified the IDEA configuration files to add an entry for this new module but I didn't test, Solr lets users configure postings formats on a per-field basis but now only the default Lucene40PostingsFormat will be available by default, should we add lucene-codecs to Solr WAR or do we expect expert users to drop this JAR in their lib directory?
        Hide
        Adrien Grand added a comment - - edited

        Updated patch :

        • the codecs module needs a specific Eclipse output folder because its META-INF/services/org.apache.lucene.codecs.Codec and PostingsFormat files will conflict with core otherwise,
        • tested import in Idea,
        • added svn:ignore on lucene/codecs/*.iml.
        Show
        Adrien Grand added a comment - - edited Updated patch : the codecs module needs a specific Eclipse output folder because its META-INF/services/org.apache.lucene.codecs.Codec and PostingsFormat files will conflict with core otherwise, tested import in Idea, added svn:ignore on lucene/codecs/*.iml.
        Hide
        Michael McCandless added a comment -

        +1

        If/When we choose a new default codec parts (eg using BlockPostingsFormat) ... I guess we will svn mv those sources into core at that point.

        Show
        Michael McCandless added a comment - +1 If/When we choose a new default codec parts (eg using BlockPostingsFormat) ... I guess we will svn mv those sources into core at that point.
        Hide
        Adrien Grand added a comment -

        If/When we choose a new default codec parts (eg using BlockPostingsFormat) ... I guess we will svn mv those sources into core at that point.

        I think so.

        I'll commit tomorrow.

        Show
        Adrien Grand added a comment - If/When we choose a new default codec parts (eg using BlockPostingsFormat) ... I guess we will svn mv those sources into core at that point. I think so. I'll commit tomorrow.
        Hide
        Adrien Grand added a comment -

        One nice side effect of moving this stuff out of core is that it makes the core JAR 280kB smaller (now 1.87MB).

        Show
        Adrien Grand added a comment - One nice side effect of moving this stuff out of core is that it makes the core JAR 280kB smaller (now 1.87MB).
        Hide
        Robert Muir added a comment -

        Is there any special purpose stuff in .util used only by these codecs?

        I'm thinking any specialized code in o.a.l.util.packed, and probably o.a.l.util.hash?
        I think if its only used in one place we should just move it underneath the respective codecs/ package (not a .util)

        If it turns out we want it for other things then we can look at it then.

        Show
        Robert Muir added a comment - Is there any special purpose stuff in .util used only by these codecs? I'm thinking any specialized code in o.a.l.util.packed, and probably o.a.l.util.hash? I think if its only used in one place we should just move it underneath the respective codecs/ package (not a .util) If it turns out we want it for other things then we can look at it then.
        Hide
        Adrien Grand added a comment -

        I think all classes in o.a.l.util.packed can be used (either directly or transitively) by DocValues at least. However, o.a.l.util.hash can be moved to lucene/codecs.

        Show
        Adrien Grand added a comment - I think all classes in o.a.l.util.packed can be used (either directly or transitively) by DocValues at least. However, o.a.l.util.hash can be moved to lucene/codecs.
        Hide
        Adrien Grand added a comment -

        I was trying to move the bloom utilities (oal.util.hash and oal.util.FuzzySet) to lucene/codecs but the problem is that SolrResourceLoader imports HashFunction in order to be able to reload SPI implementations.

        Robert just suggested that a SPI is maybe a little over-engineered here so I'm thinking of hardwiring HashFunction to MurmurHash2 (if at some point we feel we would like to change the hash implementation, all we need to do is to bump the version number).

        Show
        Adrien Grand added a comment - I was trying to move the bloom utilities (oal.util.hash and oal.util.FuzzySet) to lucene/codecs but the problem is that SolrResourceLoader imports HashFunction in order to be able to reload SPI implementations. Robert just suggested that a SPI is maybe a little over-engineered here so I'm thinking of hardwiring HashFunction to MurmurHash2 (if at some point we feel we would like to change the hash implementation, all we need to do is to bump the version number).
        Hide
        Michael McCandless added a comment -

        I'm thinking of hardwiring HashFunction to MurmurHash2

        +1

        Show
        Michael McCandless added a comment - I'm thinking of hardwiring HashFunction to MurmurHash2 +1
        Hide
        Adrien Grand added a comment -

        Patch for moving bloom utilities to lucene/codecs. You need to run these svn commands first:

        svn mv lucene/core/src/java/org/apache/lucene/util/hash/HashFunction.java lucene/codecs/src/java/org/apache/lucene/codecs/bloom
        svn mv lucene/core/src/java/org/apache/lucene/util/hash/MurmurHash2.java lucene/codecs/src/java/org/apache/lucene/codecs/bloom
        svn del lucene/core/src/java/org/apache/lucene/util/hash/
        svn mv lucene/core/src/java/org/apache/lucene/util/FuzzySet.java lucene/codecs/src/java/org/apache/lucene/codecs/bloom
        svn del lucene/core/src/resources/META-INF/services/org.apache.lucene.util.hash.HashFunction
        
        Show
        Adrien Grand added a comment - Patch for moving bloom utilities to lucene/codecs. You need to run these svn commands first: svn mv lucene/core/src/java/org/apache/lucene/util/hash/HashFunction.java lucene/codecs/src/java/org/apache/lucene/codecs/bloom svn mv lucene/core/src/java/org/apache/lucene/util/hash/MurmurHash2.java lucene/codecs/src/java/org/apache/lucene/codecs/bloom svn del lucene/core/src/java/org/apache/lucene/util/hash/ svn mv lucene/core/src/java/org/apache/lucene/util/FuzzySet.java lucene/codecs/src/java/org/apache/lucene/codecs/bloom svn del lucene/core/src/resources/META-INF/services/org.apache.lucene.util.hash.HashFunction
        Hide
        Uwe Schindler added a comment -

        Can you remove the public ctor (default) one from MurmurHash (make it private) - it is a singleton (INSTANCE field)? And make it final then (if it has no public ctor anymore).

        Show
        Uwe Schindler added a comment - Can you remove the public ctor (default) one from MurmurHash (make it private) - it is a singleton (INSTANCE field)? And make it final then (if it has no public ctor anymore).
        Hide
        Adrien Grand added a comment -

        Thanks for your comments, Uwe. I just committed (r1381504 and r1381512 on trunk, r1381541 on branch 4.x).

        Show
        Adrien Grand added a comment - Thanks for your comments, Uwe. I just committed (r1381504 and r1381512 on trunk, r1381541 on branch 4.x).
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Adrien Grand
        http://svn.apache.org/viewvc?view=revision&revision=1381541

        LUCENE-4340: Move bloom PF utilities to lucene/codecs (merged from r1381504 and r1381512).

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Adrien Grand http://svn.apache.org/viewvc?view=revision&revision=1381541 LUCENE-4340 : Move bloom PF utilities to lucene/codecs (merged from r1381504 and r1381512).
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Adrien Grand
        http://svn.apache.org/viewvc?view=revision&revision=1381126

        LUCENE-4340: move non-default codec, postings format and terms dictionary implementations to lucene/codecs (manually merged from r1381071 and r1381086).

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Adrien Grand http://svn.apache.org/viewvc?view=revision&revision=1381126 LUCENE-4340 : move non-default codec, postings format and terms dictionary implementations to lucene/codecs (manually merged from r1381071 and r1381086).
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.

          People

          • Assignee:
            Adrien Grand
            Reporter:
            Adrien Grand
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development