Lucene - Core
  1. Lucene - Core
  2. LUCENE-1987

Remove rest of analysis deprecations (Token, CharacterCache)

    Details

    • Type: Task Task
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.9.1, 3.0
    • Component/s: modules/analysis
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      These removes the rest of the deprecations in the analysis package:

      • Token's termText field- (DONE)
      • eventually un-deprecate ctors of Token taking Strings (they are still useful) -> if yes remove deprec in 2.9.1 (DONE)
      • remove CharacterCache and use Character.valueOf() from Java5 (DONE)
      • Stopwords lists
      • Remove the backwards settings from analyzers (acronym, posIncr,...). They are deprecated, but we still have the VERSION constants. Do not know, how to proceed. Keep the settings alive for index compatibility? Or remove it together with the version constants (which were undeprecated).
      1. LUCENE-1987.patch
        15 kB
        Uwe Schindler
      2. LUCENE-1987.patch
        14 kB
        Uwe Schindler
      3. LUCENE-1987.patch
        14 kB
        Uwe Schindler
      4. LUCENE-1987-StopFilter.patch
        192 kB
        Uwe Schindler
      5. LUCENE-1987-StopFilter.patch
        190 kB
        Uwe Schindler
      6. LUCENE-1987-StopFilter.patch
        182 kB
        Uwe Schindler
      7. LUCENE-1987-StopFilter.patch
        32 kB
        Uwe Schindler
      8. LUCENE-1987-StopFilter.patch
        32 kB
        Uwe Schindler
      9. LUCENE-1987-StopFilter.patch
        31 kB
        Uwe Schindler
      10. LUCENE-1987-StopFilter-backport29.patch
        2 kB
        Uwe Schindler
      11. LUCENE-1987-StopFilter-BW.patch
        79 kB
        Uwe Schindler

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          Pastch with the first three points. The three deprecated methods should stay alive in my opinion. Copying the string to the termbuffer in the ctor is the same linke copying the initial termbuffer. If we remove these ctors, we should also remove the setTermBuffer(String) method. This is no consistency.

          If the others agree to keep these three ctors alive I will apply an undeprecation in 2.9 branch.

          Show
          Uwe Schindler added a comment - Pastch with the first three points. The three deprecated methods should stay alive in my opinion. Copying the string to the termbuffer in the ctor is the same linke copying the initial termbuffer. If we remove these ctors, we should also remove the setTermBuffer(String) method. This is no consistency. If the others agree to keep these three ctors alive I will apply an undeprecation in 2.9 branch.
          Hide
          Uwe Schindler added a comment -

          Updated patch to last generics additions

          Show
          Uwe Schindler added a comment - Updated patch to last generics additions
          Hide
          Uwe Schindler added a comment -

          New patch. Will commit soon (+bw branch). I will also commit the un-depreactions to 2.9 branch.

          Still open are the StopFilter/StopAnalyzer patches (not sure, how to proceed here).

          Show
          Uwe Schindler added a comment - New patch. Will commit soon (+bw branch). I will also commit the un-depreactions to 2.9 branch. Still open are the StopFilter/StopAnalyzer patches (not sure, how to proceed here).
          Hide
          Uwe Schindler added a comment -

          Committed revision: 826404 and also in 2.9 branch.

          Show
          Uwe Schindler added a comment - Committed revision: 826404 and also in 2.9 branch.
          Hide
          Uwe Schindler added a comment - - edited

          Mike:
          You invented the VERSION constants for StandardAnalyzer and so on. How should we handle the deprecated settings inside? Just remove the get/setters and only use the version constant to enable the backwards settings? If I remove these settings internally in StopAnalyzer and StopFilter and Standard*, too, I also have to remove some version support.

          How would you proceed?

          Furthermore: The "deprecated" check for overridden tokenStream methods to fix this bug, if somebody subclasses a core-Analyzer, what to do with them. As the tokenStream methods is not deprecated in favour of reusableTokenStream, how to prevent this in future. Keep this reflection checks? Or make the Analyzers final?

          Show
          Uwe Schindler added a comment - - edited Mike: You invented the VERSION constants for StandardAnalyzer and so on. How should we handle the deprecated settings inside? Just remove the get/setters and only use the version constant to enable the backwards settings? If I remove these settings internally in StopAnalyzer and StopFilter and Standard*, too, I also have to remove some version support. How would you proceed? Furthermore: The "deprecated" check for overridden tokenStream methods to fix this bug, if somebody subclasses a core-Analyzer, what to do with them. As the tokenStream methods is not deprecated in favour of reusableTokenStream, how to prevent this in future. Keep this reflection checks? Or make the Analyzers final?
          Hide
          Michael McCandless added a comment -

          How should we handle the deprecated settings inside?

          For "replace invalid acronym" in StandardAnalyzer, there is no version setting that enables that currently, so I'm afraid we must keep those until 4.0 (since they impact your index by altering analysis)? Or, maybe we could add a Versions.LUCENE_23, which'd re-enable that bug (the default for that setting changed to "true" (= fix the bug) in 2.4). Hmm.

          For "stop position increments" in StandardAnalyzer, I think we should remove the direct getter/setters and expose only Versions.LUCENE_24 to get back to that behavior?

          StopFilter should keep its "stop position increments" setting (it's not deprecated).

          Show
          Michael McCandless added a comment - How should we handle the deprecated settings inside? For "replace invalid acronym" in StandardAnalyzer, there is no version setting that enables that currently, so I'm afraid we must keep those until 4.0 (since they impact your index by altering analysis)? Or, maybe we could add a Versions.LUCENE_23, which'd re-enable that bug (the default for that setting changed to "true" (= fix the bug) in 2.4). Hmm. For "stop position increments" in StandardAnalyzer, I think we should remove the direct getter/setters and expose only Versions.LUCENE_24 to get back to that behavior? StopFilter should keep its "stop position increments" setting (it's not deprecated).
          Hide
          Michael McCandless added a comment -

          Or make the Analyzers final?

          +1

          Show
          Michael McCandless added a comment - Or make the Analyzers final? +1
          Hide
          Uwe Schindler added a comment -

          Hallo Mike,

          attached is a patch with all deprecated methods removed (only the setOverridesTokenStream is still there, making Analyzers final is another thing to do).

          Also StopFilter and its stopWord ets were generified (to <?>, which is ok for every type of set, as CharArraySet uses toString() to convert everything to string when testing, so any set is fine)

          I only had the following problems and solution is here (StandardAnalyzer):

          enableStopPositionIncrements = matchVersion.onOrAfter(Version.LUCENE_29);
          replaceInvalidAcronym = matchVersion.onOrAfter(Version.LUCENE_23);
          

          The setting defaultPosIncr was removed (static method, so there is no default anymore). Because of that, the pre 2.9 default was false (which is now not changeable). So I set the posIncr to false for all older versions (this was the default before, but is now fixed as no static setter/sysprop anymore)

          For the invalid acronyms I added LUCENE_23 version constant, so for all versions >=2.3 it is enabled. If you want old behaviour, use LUCENE_22 or below.

          Mike: Can you review this?

          If you're ok with it I have to change 175 "new StandardAnalyzer()" occurences in tests

          Show
          Uwe Schindler added a comment - Hallo Mike, attached is a patch with all deprecated methods removed (only the setOverridesTokenStream is still there, making Analyzers final is another thing to do). Also StopFilter and its stopWord ets were generified (to <?>, which is ok for every type of set, as CharArraySet uses toString() to convert everything to string when testing, so any set is fine) I only had the following problems and solution is here (StandardAnalyzer): enableStopPositionIncrements = matchVersion.onOrAfter(Version.LUCENE_29); replaceInvalidAcronym = matchVersion.onOrAfter(Version.LUCENE_23); The setting defaultPosIncr was removed (static method, so there is no default anymore). Because of that, the pre 2.9 default was false (which is now not changeable). So I set the posIncr to false for all older versions (this was the default before, but is now fixed as no static setter/sysprop anymore) For the invalid acronyms I added LUCENE_23 version constant, so for all versions >=2.3 it is enabled. If you want old behaviour, use LUCENE_22 or below. Mike: Can you review this? If you're ok with it I have to change 175 "new StandardAnalyzer()" occurences in tests
          Hide
          Uwe Schindler added a comment -

          If we are fine with that, I would backport the version constants and the default setting to 2.9.x

          Show
          Uwe Schindler added a comment - If we are fine with that, I would backport the version constants and the default setting to 2.9.x
          Hide
          Uwe Schindler added a comment -

          Correct patch.

          Show
          Uwe Schindler added a comment - Correct patch.
          Hide
          Michael McCandless added a comment -

          I'll have a look, but one thing is invalid acronym replacement should be enabled if version >= 2.4, not >= 2.3. Ie, if version is 2.3, the bug is still present.

          Show
          Michael McCandless added a comment - I'll have a look, but one thing is invalid acronym replacement should be enabled if version >= 2.4, not >= 2.3. Ie, if version is 2.3, the bug is still present.
          Hide
          Uwe Schindler added a comment -

          LUCENE-1068 says: Fix version 2.3

          Show
          Uwe Schindler added a comment - LUCENE-1068 says: Fix version 2.3
          Hide
          Uwe Schindler added a comment -

          Javadocs fixes.

          Show
          Uwe Schindler added a comment - Javadocs fixes.
          Hide
          Michael McCandless added a comment -

          Why add 2.0, 2.1. 2.2 versions? We don't anywhere emulate bugs based on those, right? Otherwise, patch looks great! Thanks Uwe. Nice to see StandardAnalyzer clean again.

          Show
          Michael McCandless added a comment - Why add 2.0, 2.1. 2.2 versions? We don't anywhere emulate bugs based on those, right? Otherwise, patch looks great! Thanks Uwe. Nice to see StandardAnalyzer clean again.
          Hide
          Uwe Schindler added a comment -

          I just added also 20 and 21. I can remove them again (20 and 21).
          22 is needed because the invalidAcronym thing is there in 2.2 and fixed in 2.3 (according to LUCENE-1068).

          Show
          Uwe Schindler added a comment - I just added also 20 and 21. I can remove them again (20 and 21). 22 is needed because the invalidAcronym thing is there in 2.2 and fixed in 2.3 (according to LUCENE-1068 ).
          Hide
          Michael McCandless added a comment -

          LUCENE-1068 says: Fix version 2.3

          Right, that bug was fixed in 2.3, however with that fix the buggy behavior was kept by default. In 2.4 we then fixed the default to be true, ie, the bug would be fixed by default. So if I were to specify VERSION_23, I should get the buggy behavior, but if I specify VERSION_24, I should get the correct behavior.

          Going forward, when we fix a bug but need to conditionally preserve the bug for back compat, we should use the version switching so that by default for new users (VERSION_CURRENT or VERSION_XX if XX is the next release) the bug is fixed.

          Show
          Michael McCandless added a comment - LUCENE-1068 says: Fix version 2.3 Right, that bug was fixed in 2.3, however with that fix the buggy behavior was kept by default. In 2.4 we then fixed the default to be true, ie, the bug would be fixed by default. So if I were to specify VERSION_23, I should get the buggy behavior, but if I specify VERSION_24, I should get the correct behavior. Going forward, when we fix a bug but need to conditionally preserve the bug for back compat, we should use the version switching so that by default for new users (VERSION_CURRENT or VERSION_XX if XX is the next release) the bug is fixed.
          Hide
          Uwe Schindler added a comment -

          Updated patch with LUCENE_24. I did not remove the other version constants, because then we have them and can use them anywhere else. And a user coming from e.g. 2.2 to 3.0 can just use LUCENE_22 to match his old behaviour. The user should be free to give his version he used before for this backwards compatibility.

          Mike: Should I backport the setting for 2.4 to 2.9 to enable plugin-replacements from 2.9.1 to 3.0?

          Show
          Uwe Schindler added a comment - Updated patch with LUCENE_24. I did not remove the other version constants, because then we have them and can use them anywhere else. And a user coming from e.g. 2.2 to 3.0 can just use LUCENE_22 to match his old behaviour. The user should be free to give his version he used before for this backwards compatibility. Mike: Should I backport the setting for 2.4 to 2.9 to enable plugin-replacements from 2.9.1 to 3.0?
          Hide
          Uwe Schindler added a comment - - edited

          Going forward, when we fix a bug but need to conditionally preserve the bug for back compat, we should use the version switching so that by default for new users (VERSION_CURRENT or VERSION_XX if XX is the next release) the bug is fixed.

          Do you mean I should add the default ctor of StandardAnalyzer() and rewire it to LUCENE_CURRENT? We have to put this in the docs, that from 3.0 on, the standard analyzer's default ctor now no longer behaves like 2.4, but always uses the newest features.

          That would help me lot with the tests....

          Show
          Uwe Schindler added a comment - - edited Going forward, when we fix a bug but need to conditionally preserve the bug for back compat, we should use the version switching so that by default for new users (VERSION_CURRENT or VERSION_XX if XX is the next release) the bug is fixed. Do you mean I should add the default ctor of StandardAnalyzer() and rewire it to LUCENE_CURRENT? We have to put this in the docs, that from 3.0 on, the standard analyzer's default ctor now no longer behaves like 2.4, but always uses the newest features. That would help me lot with the tests....
          Hide
          Michael McCandless added a comment -

          I did not remove the other version constants, because then we have them and can use them anywhere else. And a user coming from e.g. 2.2 to 3.0 can just use LUCENE_22 to match his old behaviour. The user should be free to give his version he used before for this backwards compatibility.

          OK I think that's reasonable.

          Mike: Should I backport the setting for 2.4 to 2.9 to enable plugin-replacements from 2.9.1 to 3.0?

          +1

          Going forward, when we fix a bug but need to conditionally preserve the bug for back compat, we should use the version switching so that by default for new users (VERSION_CURRENT or VERSION_XX if XX is the next release) the bug is fixed.

          Do you mean I should add the default ctor of StandardAnalyzer() and rewire it to LUCENE_CURRENT?

          Sorry, I wasn't clear...

          No – I don't think we should ever have a ctor that defaults to LUCENE_CURRENT. That's a back compat trap (and it just gets us back to where we started when we had no explicit version). Users must be explicit about which version they want.

          What I meant was: when fixing some sneaky bug in the future, we should never set the default so that the bug is still present (as we did on the first go of "invalid acronyms"), expecting new users to realize they have to go out of their way to tell Lucene not to emulate the bug. Instead, the default going forward (if version >= next-release-version) should be "the bug is fixed".

          Show
          Michael McCandless added a comment - I did not remove the other version constants, because then we have them and can use them anywhere else. And a user coming from e.g. 2.2 to 3.0 can just use LUCENE_22 to match his old behaviour. The user should be free to give his version he used before for this backwards compatibility. OK I think that's reasonable. Mike: Should I backport the setting for 2.4 to 2.9 to enable plugin-replacements from 2.9.1 to 3.0? +1 Going forward, when we fix a bug but need to conditionally preserve the bug for back compat, we should use the version switching so that by default for new users (VERSION_CURRENT or VERSION_XX if XX is the next release) the bug is fixed. Do you mean I should add the default ctor of StandardAnalyzer() and rewire it to LUCENE_CURRENT? Sorry, I wasn't clear... No – I don't think we should ever have a ctor that defaults to LUCENE_CURRENT. That's a back compat trap (and it just gets us back to where we started when we had no explicit version). Users must be explicit about which version they want. What I meant was: when fixing some sneaky bug in the future, we should never set the default so that the bug is still present (as we did on the first go of "invalid acronyms"), expecting new users to realize they have to go out of their way to tell Lucene not to emulate the bug. Instead, the default going forward (if version >= next-release-version) should be "the bug is fixed".
          Hide
          Uwe Schindler added a comment -

          OK, I fix the tests using find/grep/sed

          Show
          Uwe Schindler added a comment - OK, I fix the tests using find/grep/sed
          Hide
          Uwe Schindler added a comment -

          Here 2 mega patches and one backport to 2.9 (want to get this in before 2.9.1):

          All core tests pass, all bw tests pass. Most contrib tests also pass, but we have the following problems and inconsistencies:

          • benchmark does not work any longer, because StandardAnalyzer has no default ctor anymore and cannot be instantiated by reflection, same with StopAnalyzer
          • Highlighter only works, if StandardAnalyzer is in 2.4 mde, in 2.9 mode (current) it fails because the position increments of stop words are not correctly respected. This fails in addition/combination with the following:
          • Very bad inconsistency: The default of QueryParser is to ignore position increments, but the current version of StandardAnalyzer uses posIncr for stop words -> bäng. We should change the default for QueryParser(+ contrib QP), too. There is march rework needed and much documentation. The tests in core now pass, as most parts use StandardAnalyzer in 2.9 mode but have no stop words. And the special tests explicitely set the posIncr flag. This is totally disturbed, it needs fixing! (it also affects 2.9.0, if somebody uses the new StandardAnalyzer with LUCENE_CURRENT).
          • XMLQueryParser also fails with latest StandardAnalyzer version, because it cannot set the flag in QueryParser. In my opinion, the query parser should take the flag from the analyzer, but this is not easy to fix.
          • All contrib analyzers have stopWordPosIncr turned off (backwards compatibility). Maybe we need a Version Parameter in all analyzers there too!

          What to do? After this StopFilter/StandardAnalyzer-hell-day Aspirin and Paracetamol and beer is not enough to think clear again...

          And please: next time when we deprecate APIs: remove all deprecated calls from tests and contrib and mark all deprecated-test as such!

          Show
          Uwe Schindler added a comment - Here 2 mega patches and one backport to 2.9 (want to get this in before 2.9.1): All core tests pass, all bw tests pass. Most contrib tests also pass, but we have the following problems and inconsistencies: benchmark does not work any longer, because StandardAnalyzer has no default ctor anymore and cannot be instantiated by reflection, same with StopAnalyzer Highlighter only works, if StandardAnalyzer is in 2.4 mde, in 2.9 mode (current) it fails because the position increments of stop words are not correctly respected. This fails in addition/combination with the following: Very bad inconsistency: The default of QueryParser is to ignore position increments, but the current version of StandardAnalyzer uses posIncr for stop words -> bäng. We should change the default for QueryParser(+ contrib QP), too. There is march rework needed and much documentation. The tests in core now pass, as most parts use StandardAnalyzer in 2.9 mode but have no stop words. And the special tests explicitely set the posIncr flag. This is totally disturbed, it needs fixing! (it also affects 2.9.0, if somebody uses the new StandardAnalyzer with LUCENE_CURRENT). XMLQueryParser also fails with latest StandardAnalyzer version, because it cannot set the flag in QueryParser. In my opinion, the query parser should take the flag from the analyzer, but this is not easy to fix. All contrib analyzers have stopWordPosIncr turned off (backwards compatibility). Maybe we need a Version Parameter in all analyzers there too! What to do? After this StopFilter/StandardAnalyzer-hell-day Aspirin and Paracetamol and beer is not enough to think clear again... And please: next time when we deprecate APIs: remove all deprecated calls from tests and contrib and mark all deprecated-test as such!
          Hide
          Robert Muir added a comment -

          All contrib analyzers have stopWordPosIncr turned off (backwards compatibility). Maybe we need a Version Parameter in all analyzers there too!

          Personally I would not be against this, not sure yet... downside would be more complexity and maintenance
          Upside would be that we could improve these analyzers in various ways, without annoying users

          benchmark does not work any longer, because StandardAnalyzer has no default ctor anymore and cannot be instantiated by reflection, same with StopAnalyzer

          I also personally like having default ctor... its convienient and nice to be able to look at what these analyzers do in Luke, etc
          But I think this goes against the version flag concept? (because if users just set it to LUCENE_CURRENT then its doing nothing?)
          But I wonder if users do this anyway... maybe the default should really be LUCENE_CURRENT, and if you want the back compat-buggy behavior, the onus is on you as the user to set the flag right if you don't want to reindex?

          Show
          Robert Muir added a comment - All contrib analyzers have stopWordPosIncr turned off (backwards compatibility). Maybe we need a Version Parameter in all analyzers there too! Personally I would not be against this, not sure yet... downside would be more complexity and maintenance Upside would be that we could improve these analyzers in various ways, without annoying users benchmark does not work any longer, because StandardAnalyzer has no default ctor anymore and cannot be instantiated by reflection, same with StopAnalyzer I also personally like having default ctor... its convienient and nice to be able to look at what these analyzers do in Luke, etc But I think this goes against the version flag concept? (because if users just set it to LUCENE_CURRENT then its doing nothing?) But I wonder if users do this anyway... maybe the default should really be LUCENE_CURRENT, and if you want the back compat-buggy behavior, the onus is on you as the user to set the flag right if you don't want to reindex?
          Hide
          Michael McCandless added a comment -

          All contrib analyzers have stopWordPosIncr turned off (backwards compatibility). Maybe we need a Version Parameter in all analyzers there too!

          Ugh, this is because they embed StopFilter, right? One option might be to simply keep StopFilter's deprecated static methods for setting the default? Though I think adding Version to them over time is the right thing to do (though more work, today).

          benchmark does not work any longer, because StandardAnalyzer has no default ctor anymore and cannot be instantiated by reflection, same with StopAnalyzer

          When the no-arg ctor is unavailable, can we fallback to looking for a ctor that takes Version? For now we should just pass LUCENE_CURRENT; a future enhancement to benchmark can allow specifying version compat.

          The default of QueryParser is to ignore position increments, but the current version of StandardAnalyzer uses posIncr for stop words

          Hmm. How about adding Version to QP ctor?

          And please: next time when we deprecate APIs: remove all deprecated calls from tests and contrib and mark all deprecated-test as such!

          OK, I agree. I'll try to do this in the future!

          Show
          Michael McCandless added a comment - All contrib analyzers have stopWordPosIncr turned off (backwards compatibility). Maybe we need a Version Parameter in all analyzers there too! Ugh, this is because they embed StopFilter, right? One option might be to simply keep StopFilter's deprecated static methods for setting the default? Though I think adding Version to them over time is the right thing to do (though more work, today). benchmark does not work any longer, because StandardAnalyzer has no default ctor anymore and cannot be instantiated by reflection, same with StopAnalyzer When the no-arg ctor is unavailable, can we fallback to looking for a ctor that takes Version? For now we should just pass LUCENE_CURRENT; a future enhancement to benchmark can allow specifying version compat. The default of QueryParser is to ignore position increments, but the current version of StandardAnalyzer uses posIncr for stop words Hmm. How about adding Version to QP ctor? And please: next time when we deprecate APIs: remove all deprecated calls from tests and contrib and mark all deprecated-test as such! OK, I agree. I'll try to do this in the future!
          Hide
          Michael McCandless added a comment -

          maybe the default should really be LUCENE_CURRENT, and if you want the back compat-buggy behavior, the onus is on you as the user to set the flag right if you don't want to reindex?

          The problem is that this is not very different from saying "the onus is on the user to call the setXYZ method to get back to the old buggy behavior", which at least last time we discussed back-compat was controversial (ie, it's a change to our drop-in back-compat policy).

          Show
          Michael McCandless added a comment - maybe the default should really be LUCENE_CURRENT, and if you want the back compat-buggy behavior, the onus is on you as the user to set the flag right if you don't want to reindex? The problem is that this is not very different from saying "the onus is on the user to call the setXYZ method to get back to the old buggy behavior", which at least last time we discussed back-compat was controversial (ie, it's a change to our drop-in back-compat policy).
          Hide
          Robert Muir added a comment -

          Ugh, this is because they embed StopFilter, right? One option might be to simply keep StopFilter's deprecated static methods for setting the default? Though I think adding Version to them over time is the right thing to do (though more work, today).

          not just this. Many use StandardTokenizer, so they have same invalid acronym, etc issues StandardAnalyzer has. But, this versioning/etc is all managed at StandardAnalyzer level (system properties, version numbers, etc)... when it also affects these other analyzers too.

          Show
          Robert Muir added a comment - Ugh, this is because they embed StopFilter, right? One option might be to simply keep StopFilter's deprecated static methods for setting the default? Though I think adding Version to them over time is the right thing to do (though more work, today). not just this. Many use StandardTokenizer, so they have same invalid acronym, etc issues StandardAnalyzer has. But, this versioning/etc is all managed at StandardAnalyzer level (system properties, version numbers, etc)... when it also affects these other analyzers too.
          Hide
          Robert Muir added a comment -

          The problem is that this is not very different from saying "the onus is on the user to call the setXYZ method to get back to the old buggy behavior", which at least last time we discussed back-compat was controversial (ie, it's a change to our drop-in back-compat policy).

          Michael, yes I agree with you. What I am wondering is: is it really working in practice/in spirit? Forcing the user to supply the version, well it does make them look at the warning in the Version class, which is good. But nothing stops them from just using CURRENT.

          Use this to get the latest & greatest settings, bug fixes, etc, for Lucene.
          

          followed by the big bold warning about backwards compatibility. just curious what most users are doing, sacrificing drop-in for "latest and greatest?"

          I do think we should do things to improve contrib analyzers that are still stuck with this buggy behavior at some point: i.e LUCENE-1373.
          But maybe we don't need the Version with contrib analyzers, since you should be able to use an older lucene-analyzers jar file with new lucene if you want the back compat????

          (sorry to stray somewhat off-topic)

          Show
          Robert Muir added a comment - The problem is that this is not very different from saying "the onus is on the user to call the setXYZ method to get back to the old buggy behavior", which at least last time we discussed back-compat was controversial (ie, it's a change to our drop-in back-compat policy). Michael, yes I agree with you. What I am wondering is: is it really working in practice/in spirit? Forcing the user to supply the version, well it does make them look at the warning in the Version class, which is good. But nothing stops them from just using CURRENT. Use this to get the latest & greatest settings, bug fixes, etc, for Lucene. followed by the big bold warning about backwards compatibility. just curious what most users are doing, sacrificing drop-in for "latest and greatest?" I do think we should do things to improve contrib analyzers that are still stuck with this buggy behavior at some point: i.e LUCENE-1373 . But maybe we don't need the Version with contrib analyzers, since you should be able to use an older lucene-analyzers jar file with new lucene if you want the back compat???? (sorry to stray somewhat off-topic)
          Hide
          Uwe Schindler added a comment -

          To move back to my other problem:
          How to handle the problem with LUCENE_29 setting and the posIncr of stopwords together with QueryParser that has a default setting of ignoring posIncr?:

          This leads to the problem, that a phrase query does not hit anything if you index with StandardAnalyzer=LUCENE_29 and QueryParser using the same analyzer but with setEnablePositionIncrements(false) [the current default for QueryParser].

          Show
          Uwe Schindler added a comment - To move back to my other problem: How to handle the problem with LUCENE_29 setting and the posIncr of stopwords together with QueryParser that has a default setting of ignoring posIncr?: This leads to the problem, that a phrase query does not hit anything if you index with StandardAnalyzer=LUCENE_29 and QueryParser using the same analyzer but with setEnablePositionIncrements(false) [the current default for QueryParser] .
          Hide
          Michael McCandless added a comment -

          How to handle the problem with LUCENE_29 setting and the posIncr of stopwords together with QueryParser that has a default setting of ignoring posIncr?

          How about adding required Version to QP ctor?

          Show
          Michael McCandless added a comment - How to handle the problem with LUCENE_29 setting and the posIncr of stopwords together with QueryParser that has a default setting of ignoring posIncr? How about adding required Version to QP ctor?
          Hide
          Uwe Schindler added a comment -

          A new patch which resolves the Benchmark problem by adding a static method in NewAnalyzerTask that loads an analyzer by class name:

          public static final Analyzer createAnalyzer(String className) throws Exception{
              final Class<? extends Analyzer> clazz = Class.forName(className).asSubclass(Analyzer.class);
              try {
                // first try to use a ctor with version parameter (needed for many new Analyzers that have no default one anymore
                Constructor<? extends Analyzer> cnstr = clazz.getConstructor(Version.class);
                return cnstr.newInstance(Version.LUCENE_CURRENT);
              } catch (NoSuchMethodException nsme) {
                // otherwise use default ctor
                return clazz.newInstance();
              }
          }
          

          This method is reused at other places where an Analyzer is created by a config property.

          This patch now passes all test. There are still the problems with Analyzer and QueryParsr with wrong default properties, but I would like to commit this first and then solve the problems, also in 2.9.1.

          Mike, are you OK with that?

          Show
          Uwe Schindler added a comment - A new patch which resolves the Benchmark problem by adding a static method in NewAnalyzerTask that loads an analyzer by class name: public static final Analyzer createAnalyzer( String className) throws Exception{ final Class <? extends Analyzer> clazz = Class .forName(className).asSubclass(Analyzer.class); try { // first try to use a ctor with version parameter (needed for many new Analyzers that have no default one anymore Constructor<? extends Analyzer> cnstr = clazz.getConstructor(Version.class); return cnstr.newInstance(Version.LUCENE_CURRENT); } catch (NoSuchMethodException nsme) { // otherwise use default ctor return clazz.newInstance(); } } This method is reused at other places where an Analyzer is created by a config property. This patch now passes all test. There are still the problems with Analyzer and QueryParsr with wrong default properties, but I would like to commit this first and then solve the problems, also in 2.9.1. Mike, are you OK with that?
          Hide
          Michael McCandless added a comment -

          Mike, are you OK with that?

          Looks great! Not only am I OK with it, it's exactly what I proposed (above – https://issues.apache.org/jira/browse/LUCENE-1987?focusedCommentId=12767449&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12767449). Maybe you missed my response there? (I also suggested adding Version to QP ctor).

          Show
          Michael McCandless added a comment - Mike, are you OK with that? Looks great! Not only am I OK with it, it's exactly what I proposed (above – https://issues.apache.org/jira/browse/LUCENE-1987?focusedCommentId=12767449&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12767449 ). Maybe you missed my response there? (I also suggested adding Version to QP ctor).
          Hide
          Uwe Schindler added a comment -

          I have seen your comment yesterday and implemented the benchmark thing that way.

          The QP ctor with Version param also looks good, but we have to add this to 2.9, too, to be able to remove the no-arg ctor, too.

          My patch still has a failed test int the ant task (missing no-arg ctor), will look into it, but fix is same like for benchmark.

          Show
          Uwe Schindler added a comment - I have seen your comment yesterday and implemented the benchmark thing that way. The QP ctor with Version param also looks good, but we have to add this to 2.9, too, to be able to remove the no-arg ctor, too. My patch still has a failed test int the ant task (missing no-arg ctor), will look into it, but fix is same like for benchmark.
          Hide
          Uwe Schindler added a comment -

          Fix ant task.

          Show
          Uwe Schindler added a comment - Fix ant task.
          Hide
          Uwe Schindler added a comment -

          Committed in 2.9, 3.0, backwards branch.

          For the QueryParser problems and other additions of version constants I will open another issue.

          Show
          Uwe Schindler added a comment - Committed in 2.9, 3.0, backwards branch. For the QueryParser problems and other additions of version constants I will open another issue.
          Hide
          Michael McCandless added a comment -

          Bulk close all 2.9.1 issues.

          Show
          Michael McCandless added a comment - Bulk close all 2.9.1 issues.

            People

            • Assignee:
              Uwe Schindler
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development