Solr / SOLR-2259

Improve analyzer/version handling in Solr

    Details

    • Type: Task
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels: None

      Description

      We added Version to Lucene for backwards-compatibility support.
      We use it to invoke deprecated code paths that emulate an old version, which ensures index backwards compatibility.
      Related: we deprecate old analysis components and eventually remove them.

      When this was first hooked into Solr, it defaulted to Version 2.4 emulation everywhere, with only the example config specifying the latest version.
      If you don't specify a version in your solrconfig, it still defaults to 2.4.

      However, as of LUCENE-2781, 2.4 is removed. Users with old configs that don't specify a version should not be silently "upgraded" to Version 3.0 emulation; that would be bad.

      Additionally, users who rely on deprecated emulation or deprecated factories might not know it, and it might come as a surprise when they upgrade, especially if they aren't looking at Java APIs or Java code.

      I propose:

      1. In trunk: we make the solrconfig luceneMatchVersion mandatory.
        This is simple: Uwe already has a method that will error out if it's not present; we just use that.
      2. In 3.x: we warn if you don't specify luceneMatchVersion in solrconfig, telling you that it's going to be required in 4.0 and that you are defaulting to 2.4 emulation.
        For example: Warning: luceneMatchVersion is not specified in solrconfig.xml. Defaulting to 2.4 emulation. You should at some point declare and reindex to at least 3.0, because 2.4 emulation is deprecated in 3.x and will be removed in 4.0. This parameter will be mandatory in 4.0.
      3. In 3.x and trunk: we warn if you are using a deprecated matchVersion constant anywhere, even for a specific tokenizer, telling you that you need to reindex with a current version at some point before you can move to the next release.
        For example: Warning: you are using 2.4 emulation. At some point you need to bump and reindex to at least 3.0, because 2.4 emulation is deprecated in 3.x and will be removed in 4.0.
      4. In 3.x and trunk: we warn if you are using a deprecated TokenStreamFactory, so that you know it's going to be removed.
        For example: Warning: the ISOLatin1FilterFactory is deprecated and will be removed in the next release. You should migrate to ASCIIFoldingFilterFactory.
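
Proposals 1 and 2 differ only in how a missing luceneMatchVersion is handled. The following is a minimal sketch of that difference, not the committed patch: the class `MatchVersionResolver`, its two methods, and the small `Version` enum are hypothetical stand-ins rather than Solr or Lucene APIs, and warnings are collected in a list instead of being logged.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical stand-in for the config lookup; not Solr's actual API. */
final class MatchVersionResolver {

    /** Illustrative subset of versions, oldest first (assumed names). */
    enum Version { LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31 }

    /** Warnings are collected here instead of being sent to a logger. */
    final List<String> warnings = new ArrayList<>();

    /** Proposal 1 (trunk): a missing value is a hard error. */
    Version resolveMandatory(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            throw new IllegalStateException(
                "luceneMatchVersion is not specified in solrconfig.xml but is mandatory");
        }
        return Version.valueOf(configured.trim());
    }

    /** Proposal 2 (3.x): a missing value falls back to 2.4 emulation with a warning. */
    Version resolveWithDefault(String configured) {
        if (configured == null || configured.trim().isEmpty()) {
            warnings.add("luceneMatchVersion is not specified in solrconfig.xml. "
                + "Defaulting to 2.4 emulation. This parameter will be mandatory in 4.0.");
            return Version.LUCENE_24;
        }
        return Version.valueOf(configured.trim());
    }

    public static void main(String[] args) {
        MatchVersionResolver resolver = new MatchVersionResolver();
        System.out.println(resolver.resolveWithDefault(null));
        System.out.println(resolver.warnings);
    }
}
```

Keeping the two behaviors in separate methods mirrors the split between branches: 3.x keeps working with a warning, while trunk refuses to start.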

      Attachments

      1. SOLR-2259_part3.patch (1 kB, Robert Muir)
      2. SOLR-2259.patch (25 kB, Robert Muir)
      3. SOLR-2259.patch (22 kB, Robert Muir)
      4. SOLR-2259part2.patch (1 kB, Robert Muir)
      5. SOLR-2259part4.patch (4 kB, Robert Muir)

        Activity

        Grant Ingersoll added a comment -

        Bulk close for 3.1.0 release

        Robert Muir added a comment -

        Here is the patch for the last part, part 4.

        I added a warnDeprecated() helper method to the base class,
        and added messages for all deprecated classes in trunk.
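
A helper of this kind might look roughly like the sketch below. It is an assumption-laden illustration, not the contents of the part 4 patch: the class names `BaseFactorySketch` and `ISOLatin1FilterFactorySketch`, the `init()` hook, and the message wording are all hypothetical, and messages go into a list rather than a real logger.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical base class; stands in for the real factory superclass. */
abstract class BaseFactorySketch {

    /** Collected instead of logged so the sketch has no logging dependency. */
    static final List<String> LOG = new ArrayList<>();

    /** Prefixes the concrete class name so each factory only supplies the advice. */
    protected final void warnDeprecated(String message) {
        LOG.add(getClass().getSimpleName() + " is deprecated. " + message);
    }

    /** Assumed initialization hook, called once when the factory is configured. */
    abstract void init();
}

/** Hypothetical deprecated factory that warns when it is initialized. */
final class ISOLatin1FilterFactorySketch extends BaseFactorySketch {

    @Override
    void init() {
        warnDeprecated("Use ASCIIFoldingFilterFactory instead.");
    }

    public static void main(String[] args) {
        new ISOLatin1FilterFactorySketch().init();
        System.out.println(BaseFactorySketch.LOG.get(0));
    }
}
```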

        Robert Muir added a comment -

        Here's a patch for part 3.

        Robert Muir added a comment -

        I committed part 2 in revision 1050064.

        Robert Muir added a comment -

        Here is a patch for branch_3x for part 2.

        It warns if you are missing the luceneMatchVersion param in your config,
        informing you that it's emulating Lucene 2.4, that this emulation is deprecated,
        and that this parameter will be mandatory in 4.0.

        Robert Muir added a comment -

        I committed the patch for part 1 to trunk in revision 1040982: the luceneMatchVersion parameter is mandatory,
        and all test files have it (it's driven by the existing ant sysprop $tests.luceneMatchVersion).

        I backported just the versioning of the test files to branch_3x in rev 1040986: this is just for consistency, to make
        merging any changes to these files easier.

        Robert Muir added a comment -

        Here's the updated patch, which uses the ant property $tests.luceneMatchVersion
        in all of the configs.

        If this sysprop isn't set (e.g. in an IDE), it will use LUCENE_CURRENT,
        which will emit a warning, but that is probably fine for casually running tests from an IDE.
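
The fallback described here can be sketched in plain Java as below. How the test configs actually read the property is not shown in this thread, so the class `TestMatchVersion` and its `resolve()` method are hypothetical; only the property name and the LUCENE_CURRENT fallback come from the comment.

```java
/** Hypothetical helper illustrating the sysprop fallback; not part of the patch. */
final class TestMatchVersion {

    /** Property name taken from the comment above. */
    static final String PROPERTY = "tests.luceneMatchVersion";

    /** Value used when the property is unset, e.g. when running from an IDE. */
    static final String FALLBACK = "LUCENE_CURRENT";

    /** Returns the configured version name, or the fallback when the property is unset. */
    static String resolve() {
        return System.getProperty(PROPERTY, FALLBACK);
    }

    public static void main(String[] args) {
        System.out.println(resolve());
    }
}
```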

        Robert Muir added a comment -

        We should add a general framework for these warnings in 3.x

        For #2, I think we might have to add an arg to the config method, "warnIfDefaulting" or something...
        Regardless of whether it's missing or actually specified as 2.9.x, we should warn that it's using old emulation as a default too, so I think we should do an onOrAfter check.

        I think #3 can be addressed most easily with an onOrAfter check in BaseTokenStreamFactory.assureMatchVersion that prints the class name etc., since it's the superclass
        for all token stream factories.

        For #4, there are really not that many deprecated TokenStreamFactories, just a few, so I think
        it might be easiest to just add the log call to each for now...
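
The onOrAfter check suggested for #2 and #3 could be sketched as follows. This is an illustration under stated assumptions: the `Version` enum here is a local stand-in that only mimics an `onOrAfter` comparison, `MatchVersionCheck` and `check()` are hypothetical names, and the 3.0 threshold is taken from the example warnings in the description.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical check; a real version would live in the factory base class. */
final class MatchVersionCheck {

    /** Local stand-in enum, declared oldest first so ordinal order is version order. */
    enum Version {
        LUCENE_24, LUCENE_29, LUCENE_30, LUCENE_31;

        boolean onOrAfter(Version other) {
            return compareTo(other) >= 0;
        }
    }

    /** Oldest version that does not count as deprecated emulation (3.0, per the thread). */
    static final Version OLDEST_CURRENT = Version.LUCENE_30;

    /** Returns a warning naming the factory when its match version is older than 3.0. */
    static List<String> check(String factoryClassName, Version matchVersion) {
        List<String> warnings = new ArrayList<>();
        if (!matchVersion.onOrAfter(OLDEST_CURRENT)) {
            warnings.add(factoryClassName + " is using deprecated " + matchVersion
                + " emulation. You should at some point reindex with at least 3.0.");
        }
        return warnings;
    }

    public static void main(String[] args) {
        System.out.println(check("ExampleTokenizerFactory", Version.LUCENE_29));
        System.out.println(check("ExampleTokenizerFactory", Version.LUCENE_31));
    }
}
```

Because the comparison is against a threshold rather than a single constant, an explicit 2.9 setting triggers the same warning as the 2.4 default.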

        Uwe Schindler added a comment -

        +1, thanks for the work!

        Your ideas sound great. We should add a general framework for these warnings in 3.x, though I'm not sure what the best way to generate good messages would be. Ideally the code could print the @deprecated warnings, but those are not available at runtime.
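
To illustrate the limitation: the `@Deprecated` annotation itself is retained at runtime and can be detected reflectively, but the explanatory text of a Javadoc `@deprecated` tag is not part of the compiled class. The sketch below uses hypothetical class names (`DeprecationProbe`, `OldFilterFactory`, `NewFilterFactory`) and only shows that the marker, not the message, is recoverable.

```java
/** Hypothetical probe showing what reflection can see about deprecation. */
final class DeprecationProbe {

    /**
     * @deprecated Use NewFilterFactory instead. (This Javadoc text is not
     *             available through reflection.)
     */
    @Deprecated
    static class OldFilterFactory {}

    static class NewFilterFactory {}

    /** True if the class carries the runtime-visible @Deprecated annotation. */
    static boolean isDeprecated(Class<?> type) {
        return type.isAnnotationPresent(Deprecated.class);
    }

    public static void main(String[] args) {
        System.out.println(isDeprecated(OldFilterFactory.class));
        System.out.println(isDeprecated(NewFilterFactory.class));
    }
}
```

A generic framework could therefore detect that a factory is deprecated, but the migration advice would still have to be supplied by hand.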

        Robert Muir added a comment -

        Here is a patch for part 1. This patch is intended for trunk only.

        It adds the required matchVersion, where missing, to any test/example configs, and makes it mandatory.


          People

          • Assignee: Robert Muir
          • Reporter: Robert Muir