[LUCENE-1987] Remove rest of analysis deprecations (Token, CharacterCache) - ASF JIRA

Details

Type: Task
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 2.9.1, 3.0
Component/s: modules/analysis
Labels:
None

Lucene Fields:

New

Description

These removes the rest of the deprecations in the analysis package:

~~Token's termText field~~- (DONE)
~~eventually un-deprecate ctors of Token taking Strings (they are still useful) -> if yes remove deprec in 2.9.1~~ (DONE)
~~remove CharacterCache and use Character.valueOf() from Java5~~ (DONE)
Stopwords lists
Remove the backwards settings from analyzers (acronym, posIncr,...). They are deprecated, but we still have the VERSION constants. Do not know, how to proceed. Keep the settings alive for index compatibility? Or remove it together with the version constants (which were undeprecated).

Attachments

- Sort By Name
- Sort By Date
- Ascending
- Descending

LUCENE-1987.patch
18/Oct/09 10:26
15 kB
Uwe Schindler
LUCENE-1987.patch
17/Oct/09 10:01
14 kB
Uwe Schindler
LUCENE-1987.patch
16/Oct/09 21:36
14 kB
Uwe Schindler
LUCENE-1987-StopFilter.patch
21/Oct/09 11:33
192 kB
Uwe Schindler
LUCENE-1987-StopFilter.patch
21/Oct/09 10:26
190 kB
Uwe Schindler
LUCENE-1987-StopFilter.patch
19/Oct/09 17:52
182 kB
Uwe Schindler
LUCENE-1987-StopFilter.patch
19/Oct/09 10:09
32 kB
Uwe Schindler
LUCENE-1987-StopFilter.patch
19/Oct/09 09:54
32 kB
Uwe Schindler
LUCENE-1987-StopFilter.patch
19/Oct/09 09:44
31 kB
Uwe Schindler
LUCENE-1987-StopFilter-backport29.patch
19/Oct/09 17:52
2 kB
Uwe Schindler
LUCENE-1987-StopFilter-BW.patch
19/Oct/09 17:52
79 kB
Uwe Schindler

Issue Links

incorporates

LUCENE-1988 CharacterCache - references deleted

Resolved

is blocked by

LUCENE-1989 CharArraySet cannot be made generic, because it violates the Set<char[]> interface

Reopened

is related to

LUCENE-2002 Add oal.util.Version ctor to QueryParser

Closed

LUCENE-2003 Highlighter has problems when you use StandardAnalyzer with LUCENE_29 or simplier StopFilter with stopWordsPosIncr mode switched on

Closed

Activity

Ascending order - Click to sort in descending order

Uwe Schindler added a comment - 16/Oct/09 21:36

Pastch with the first three points. The three deprecated methods should stay alive in my opinion. Copying the string to the termbuffer in the ctor is the same linke copying the initial termbuffer. If we remove these ctors, we should also remove the setTermBuffer(String) method. This is no consistency.

If the others agree to keep these three ctors alive I will apply an undeprecation in 2.9 branch.

Uwe Schindler added a comment - 16/Oct/09 21:36 Pastch with the first three points. The three deprecated methods should stay alive in my opinion. Copying the string to the termbuffer in the ctor is the same linke copying the initial termbuffer. If we remove these ctors, we should also remove the setTermBuffer(String) method. This is no consistency. If the others agree to keep these three ctors alive I will apply an undeprecation in 2.9 branch.

Uwe Schindler added a comment - 17/Oct/09 10:01

Updated patch to last generics additions

Uwe Schindler added a comment - 17/Oct/09 10:01 Updated patch to last generics additions

Uwe Schindler added a comment - 18/Oct/09 10:26

New patch. Will commit soon (+bw branch). I will also commit the un-depreactions to 2.9 branch.

Still open are the StopFilter/StopAnalyzer patches (not sure, how to proceed here).

Uwe Schindler added a comment - 18/Oct/09 10:26 New patch. Will commit soon (+bw branch). I will also commit the un-depreactions to 2.9 branch. Still open are the StopFilter/StopAnalyzer patches (not sure, how to proceed here).

Uwe Schindler added a comment - 18/Oct/09 10:38

Committed revision: 826404 and also in 2.9 branch.

Uwe Schindler added a comment - 18/Oct/09 10:38 Committed revision: 826404 and also in 2.9 branch.

Uwe Schindler added a comment - 18/Oct/09 12:02 - edited

Mike:
You invented the VERSION constants for StandardAnalyzer and so on. How should we handle the deprecated settings inside? Just remove the get/setters and only use the version constant to enable the backwards settings? If I remove these settings internally in StopAnalyzer and StopFilter and Standard*, too, I also have to remove some version support.

How would you proceed?

Furthermore: The "deprecated" check for overridden tokenStream methods to fix this bug, if somebody subclasses a core-Analyzer, what to do with them. As the tokenStream methods is not deprecated in favour of reusableTokenStream, how to prevent this in future. Keep this reflection checks? Or make the Analyzers final?

Uwe Schindler added a comment - 18/Oct/09 12:02 - edited Mike: You invented the VERSION constants for StandardAnalyzer and so on. How should we handle the deprecated settings inside? Just remove the get/setters and only use the version constant to enable the backwards settings? If I remove these settings internally in StopAnalyzer and StopFilter and Standard*, too, I also have to remove some version support. How would you proceed? Furthermore: The "deprecated" check for overridden tokenStream methods to fix this bug, if somebody subclasses a core-Analyzer, what to do with them. As the tokenStream methods is not deprecated in favour of reusableTokenStream, how to prevent this in future. Keep this reflection checks? Or make the Analyzers final?

Michael McCandless added a comment - 18/Oct/09 13:28

How should we handle the deprecated settings inside?

For "replace invalid acronym" in StandardAnalyzer, there is no version setting that enables that currently, so I'm afraid we must keep those until 4.0 (since they impact your index by altering analysis)? Or, maybe we could add a Versions.LUCENE_23, which'd re-enable that bug (the default for that setting changed to "true" (= fix the bug) in 2.4). Hmm.

For "stop position increments" in StandardAnalyzer, I think we should remove the direct getter/setters and expose only Versions.LUCENE_24 to get back to that behavior?

StopFilter should keep its "stop position increments" setting (it's not deprecated).

Michael McCandless added a comment - 18/Oct/09 13:28 How should we handle the deprecated settings inside? For "replace invalid acronym" in StandardAnalyzer, there is no version setting that enables that currently, so I'm afraid we must keep those until 4.0 (since they impact your index by altering analysis)? Or, maybe we could add a Versions.LUCENE_23, which'd re-enable that bug (the default for that setting changed to "true" (= fix the bug) in 2.4). Hmm. For "stop position increments" in StandardAnalyzer, I think we should remove the direct getter/setters and expose only Versions.LUCENE_24 to get back to that behavior? StopFilter should keep its "stop position increments" setting (it's not deprecated).

Michael McCandless added a comment - 18/Oct/09 13:29

Or make the Analyzers final?

Michael McCandless added a comment - 18/Oct/09 13:29 Or make the Analyzers final? +1

Uwe Schindler added a comment - 19/Oct/09 09:40

Hallo Mike,

attached is a patch with all deprecated methods removed (only the setOverridesTokenStream is still there, making Analyzers final is another thing to do).

Also StopFilter and its stopWord ets were generified (to <?>, which is ok for every type of set, as CharArraySet uses toString() to convert everything to string when testing, so any set is fine)

I only had the following problems and solution is here (StandardAnalyzer):

enableStopPositionIncrements = matchVersion.onOrAfter(Version.LUCENE_29);
replaceInvalidAcronym = matchVersion.onOrAfter(Version.LUCENE_23);

The setting defaultPosIncr was removed (static method, so there is no default anymore). Because of that, the pre 2.9 default was false (which is now not changeable). So I set the posIncr to false for all older versions (this was the default before, but is now fixed as no static setter/sysprop anymore)

For the invalid acronyms I added LUCENE_23 version constant, so for all versions >=2.3 it is enabled. If you want old behaviour, use LUCENE_22 or below.

Mike: Can you review this?

If you're ok with it I have to change 175 "new StandardAnalyzer()" occurences in tests

Uwe Schindler added a comment - 19/Oct/09 09:40 Hallo Mike, attached is a patch with all deprecated methods removed (only the setOverridesTokenStream is still there, making Analyzers final is another thing to do). Also StopFilter and its stopWord ets were generified (to <?>, which is ok for every type of set, as CharArraySet uses toString() to convert everything to string when testing, so any set is fine) I only had the following problems and solution is here (StandardAnalyzer): enableStopPositionIncrements = matchVersion.onOrAfter(Version.LUCENE_29); replaceInvalidAcronym = matchVersion.onOrAfter(Version.LUCENE_23); The setting defaultPosIncr was removed (static method, so there is no default anymore). Because of that, the pre 2.9 default was false (which is now not changeable). So I set the posIncr to false for all older versions (this was the default before, but is now fixed as no static setter/sysprop anymore) For the invalid acronyms I added LUCENE_23 version constant, so for all versions >=2.3 it is enabled. If you want old behaviour, use LUCENE_22 or below. Mike: Can you review this? If you're ok with it I have to change 175 "new StandardAnalyzer()" occurences in tests

Uwe Schindler added a comment - 19/Oct/09 09:41

If we are fine with that, I would backport the version constants and the default setting to 2.9.x

Uwe Schindler added a comment - 19/Oct/09 09:41 If we are fine with that, I would backport the version constants and the default setting to 2.9.x

Uwe Schindler added a comment - 19/Oct/09 09:44

Correct patch.

Uwe Schindler added a comment - 19/Oct/09 09:44 Correct patch.

Michael McCandless added a comment - 19/Oct/09 09:46

I'll have a look, but one thing is invalid acronym replacement should be enabled if version >= 2.4, not >= 2.3. Ie, if version is 2.3, the bug is still present.

Michael McCandless added a comment - 19/Oct/09 09:46 I'll have a look, but one thing is invalid acronym replacement should be enabled if version >= 2.4, not >= 2.3. Ie, if version is 2.3, the bug is still present.

Uwe Schindler added a comment - 19/Oct/09 09:53

~~LUCENE-1068~~ says: Fix version 2.3

Uwe Schindler added a comment - 19/Oct/09 09:53 LUCENE-1068 says: Fix version 2.3

Uwe Schindler added a comment - 19/Oct/09 09:54

Javadocs fixes.

Uwe Schindler added a comment - 19/Oct/09 09:54 Javadocs fixes.

Michael McCandless added a comment - 19/Oct/09 09:54

Why add 2.0, 2.1. 2.2 versions? We don't anywhere emulate bugs based on those, right? Otherwise, patch looks great! Thanks Uwe. Nice to see StandardAnalyzer clean again.

Michael McCandless added a comment - 19/Oct/09 09:54 Why add 2.0, 2.1. 2.2 versions? We don't anywhere emulate bugs based on those, right? Otherwise, patch looks great! Thanks Uwe. Nice to see StandardAnalyzer clean again.

Uwe Schindler added a comment - 19/Oct/09 09:57

I just added also 20 and 21. I can remove them again (20 and 21).
22 is needed because the invalidAcronym thing is there in 2.2 and fixed in 2.3 (according to ~~LUCENE-1068~~).

Uwe Schindler added a comment - 19/Oct/09 09:57 I just added also 20 and 21. I can remove them again (20 and 21). 22 is needed because the invalidAcronym thing is there in 2.2 and fixed in 2.3 (according to LUCENE-1068 ).

Michael McCandless added a comment - 19/Oct/09 09:59

~~LUCENE-1068~~ says: Fix version 2.3

Right, that bug was fixed in 2.3, however with that fix the buggy behavior was kept by default. In 2.4 we then fixed the default to be true, ie, the bug would be fixed by default. So if I were to specify VERSION_23, I should get the buggy behavior, but if I specify VERSION_24, I should get the correct behavior.

Going forward, when we fix a bug but need to conditionally preserve the bug for back compat, we should use the version switching so that by default for new users (VERSION_CURRENT or VERSION_XX if XX is the next release) the bug is fixed.

Michael McCandless added a comment - 19/Oct/09 09:59 LUCENE-1068 says: Fix version 2.3 Right, that bug was fixed in 2.3, however with that fix the buggy behavior was kept by default. In 2.4 we then fixed the default to be true, ie, the bug would be fixed by default. So if I were to specify VERSION_23, I should get the buggy behavior, but if I specify VERSION_24, I should get the correct behavior. Going forward, when we fix a bug but need to conditionally preserve the bug for back compat, we should use the version switching so that by default for new users (VERSION_CURRENT or VERSION_XX if XX is the next release) the bug is fixed.

Uwe Schindler added a comment - 19/Oct/09 10:09

Updated patch with LUCENE_24. I did not remove the other version constants, because then we have them and can use them anywhere else. And a user coming from e.g. 2.2 to 3.0 can just use LUCENE_22 to match his old behaviour. The user should be free to give his version he used before for this backwards compatibility.

Mike: Should I backport the setting for 2.4 to 2.9 to enable plugin-replacements from 2.9.1 to 3.0?

Uwe Schindler added a comment - 19/Oct/09 10:09 Updated patch with LUCENE_24. I did not remove the other version constants, because then we have them and can use them anywhere else. And a user coming from e.g. 2.2 to 3.0 can just use LUCENE_22 to match his old behaviour. The user should be free to give his version he used before for this backwards compatibility. Mike: Should I backport the setting for 2.4 to 2.9 to enable plugin-replacements from 2.9.1 to 3.0?

Uwe Schindler added a comment - 19/Oct/09 10:13 - edited

Going forward, when we fix a bug but need to conditionally preserve the bug for back compat, we should use the version switching so that by default for new users (VERSION_CURRENT or VERSION_XX if XX is the next release) the bug is fixed.

Do you mean I should add the default ctor of StandardAnalyzer() and rewire it to LUCENE_CURRENT? We have to put this in the docs, that from 3.0 on, the standard analyzer's default ctor now no longer behaves like 2.4, but always uses the newest features.

That would help me lot with the tests....

Uwe Schindler added a comment - 19/Oct/09 10:13 - edited Going forward, when we fix a bug but need to conditionally preserve the bug for back compat, we should use the version switching so that by default for new users (VERSION_CURRENT or VERSION_XX if XX is the next release) the bug is fixed. Do you mean I should add the default ctor of StandardAnalyzer() and rewire it to LUCENE_CURRENT? We have to put this in the docs, that from 3.0 on, the standard analyzer's default ctor now no longer behaves like 2.4, but always uses the newest features. That would help me lot with the tests....

Michael McCandless added a comment - 19/Oct/09 11:44

I did not remove the other version constants, because then we have them and can use them anywhere else. And a user coming from e.g. 2.2 to 3.0 can just use LUCENE_22 to match his old behaviour. The user should be free to give his version he used before for this backwards compatibility.

OK I think that's reasonable.

Mike: Should I backport the setting for 2.4 to 2.9 to enable plugin-replacements from 2.9.1 to 3.0?

Going forward, when we fix a bug but need to conditionally preserve the bug for back compat, we should use the version switching so that by default for new users (VERSION_CURRENT or VERSION_XX if XX is the next release) the bug is fixed.

Do you mean I should add the default ctor of StandardAnalyzer() and rewire it to LUCENE_CURRENT?

Sorry, I wasn't clear...

No – I don't think we should ever have a ctor that defaults to LUCENE_CURRENT. That's a back compat trap (and it just gets us back to where we started when we had no explicit version). Users must be explicit about which version they want.

What I meant was: when fixing some sneaky bug in the future, we should never set the default so that the bug is still present (as we did on the first go of "invalid acronyms"), expecting new users to realize they have to go out of their way to tell Lucene not to emulate the bug. Instead, the default going forward (if version >= next-release-version) should be "the bug is fixed".

Michael McCandless added a comment - 19/Oct/09 11:44 I did not remove the other version constants, because then we have them and can use them anywhere else. And a user coming from e.g. 2.2 to 3.0 can just use LUCENE_22 to match his old behaviour. The user should be free to give his version he used before for this backwards compatibility. OK I think that's reasonable. Mike: Should I backport the setting for 2.4 to 2.9 to enable plugin-replacements from 2.9.1 to 3.0? +1 Going forward, when we fix a bug but need to conditionally preserve the bug for back compat, we should use the version switching so that by default for new users (VERSION_CURRENT or VERSION_XX if XX is the next release) the bug is fixed. Do you mean I should add the default ctor of StandardAnalyzer() and rewire it to LUCENE_CURRENT? Sorry, I wasn't clear... No – I don't think we should ever have a ctor that defaults to LUCENE_CURRENT. That's a back compat trap (and it just gets us back to where we started when we had no explicit version). Users must be explicit about which version they want. What I meant was: when fixing some sneaky bug in the future, we should never set the default so that the bug is still present (as we did on the first go of "invalid acronyms"), expecting new users to realize they have to go out of their way to tell Lucene not to emulate the bug. Instead, the default going forward (if version >= next-release-version) should be "the bug is fixed".

Uwe Schindler added a comment - 19/Oct/09 11:58

OK, I fix the tests using find/grep/sed

Uwe Schindler added a comment - 19/Oct/09 11:58 OK, I fix the tests using find/grep/sed

Uwe Schindler added a comment - 19/Oct/09 17:52

Here 2 mega patches and one backport to 2.9 (want to get this in before 2.9.1):

All core tests pass, all bw tests pass. Most contrib tests also pass, but we have the following problems and inconsistencies:

benchmark does not work any longer, because StandardAnalyzer has no default ctor anymore and cannot be instantiated by reflection, same with StopAnalyzer
Highlighter only works, if StandardAnalyzer is in 2.4 mde, in 2.9 mode (current) it fails because the position increments of stop words are not correctly respected. This fails in addition/combination with the following:
Very bad inconsistency: The default of QueryParser is to ignore position increments, but the current version of StandardAnalyzer uses posIncr for stop words -> bäng. We should change the default for QueryParser(+ contrib QP), too. There is march rework needed and much documentation. The tests in core now pass, as most parts use StandardAnalyzer in 2.9 mode but have no stop words. And the special tests explicitely set the posIncr flag. This is totally disturbed, it needs fixing! (it also affects 2.9.0, if somebody uses the new StandardAnalyzer with LUCENE_CURRENT).
XMLQueryParser also fails with latest StandardAnalyzer version, because it cannot set the flag in QueryParser. In my opinion, the query parser should take the flag from the analyzer, but this is not easy to fix.
All contrib analyzers have stopWordPosIncr turned off (backwards compatibility). Maybe we need a Version Parameter in all analyzers there too!

What to do? After this StopFilter/StandardAnalyzer-hell-day Aspirin and Paracetamol and beer is not enough to think clear again...

And please: next time when we deprecate APIs: remove all deprecated calls from tests and contrib and mark all deprecated-test as such!

Uwe Schindler added a comment - 19/Oct/09 17:52 Here 2 mega patches and one backport to 2.9 (want to get this in before 2.9.1): All core tests pass, all bw tests pass. Most contrib tests also pass, but we have the following problems and inconsistencies: benchmark does not work any longer, because StandardAnalyzer has no default ctor anymore and cannot be instantiated by reflection, same with StopAnalyzer Highlighter only works, if StandardAnalyzer is in 2.4 mde, in 2.9 mode (current) it fails because the position increments of stop words are not correctly respected. This fails in addition/combination with the following: Very bad inconsistency: The default of QueryParser is to ignore position increments, but the current version of StandardAnalyzer uses posIncr for stop words -> bäng. We should change the default for QueryParser(+ contrib QP), too. There is march rework needed and much documentation. The tests in core now pass, as most parts use StandardAnalyzer in 2.9 mode but have no stop words. And the special tests explicitely set the posIncr flag. This is totally disturbed, it needs fixing! (it also affects 2.9.0, if somebody uses the new StandardAnalyzer with LUCENE_CURRENT). XMLQueryParser also fails with latest StandardAnalyzer version, because it cannot set the flag in QueryParser. In my opinion, the query parser should take the flag from the analyzer, but this is not easy to fix. All contrib analyzers have stopWordPosIncr turned off (backwards compatibility). Maybe we need a Version Parameter in all analyzers there too! What to do? After this StopFilter/StandardAnalyzer-hell-day Aspirin and Paracetamol and beer is not enough to think clear again... And please: next time when we deprecate APIs: remove all deprecated calls from tests and contrib and mark all deprecated-test as such!

Robert Muir added a comment - 19/Oct/09 18:17

All contrib analyzers have stopWordPosIncr turned off (backwards compatibility). Maybe we need a Version Parameter in all analyzers there too!

Personally I would not be against this, not sure yet... downside would be more complexity and maintenance
Upside would be that we could improve these analyzers in various ways, without annoying users

benchmark does not work any longer, because StandardAnalyzer has no default ctor anymore and cannot be instantiated by reflection, same with StopAnalyzer

I also personally like having default ctor... its convienient and nice to be able to look at what these analyzers do in Luke, etc
But I think this goes against the version flag concept? (because if users just set it to LUCENE_CURRENT then its doing nothing?)
But I wonder if users do this anyway... maybe the default should really be LUCENE_CURRENT, and if you want the back compat-buggy behavior, the onus is on you as the user to set the flag right if you don't want to reindex?

Robert Muir added a comment - 19/Oct/09 18:17 All contrib analyzers have stopWordPosIncr turned off (backwards compatibility). Maybe we need a Version Parameter in all analyzers there too! Personally I would not be against this, not sure yet... downside would be more complexity and maintenance Upside would be that we could improve these analyzers in various ways, without annoying users benchmark does not work any longer, because StandardAnalyzer has no default ctor anymore and cannot be instantiated by reflection, same with StopAnalyzer I also personally like having default ctor... its convienient and nice to be able to look at what these analyzers do in Luke, etc But I think this goes against the version flag concept? (because if users just set it to LUCENE_CURRENT then its doing nothing?) But I wonder if users do this anyway... maybe the default should really be LUCENE_CURRENT, and if you want the back compat-buggy behavior, the onus is on you as the user to set the flag right if you don't want to reindex?

Michael McCandless added a comment - 19/Oct/09 18:53

All contrib analyzers have stopWordPosIncr turned off (backwards compatibility). Maybe we need a Version Parameter in all analyzers there too!

Ugh, this is because they embed StopFilter, right? One option might be to simply keep StopFilter's deprecated static methods for setting the default? Though I think adding Version to them over time is the right thing to do (though more work, today).

benchmark does not work any longer, because StandardAnalyzer has no default ctor anymore and cannot be instantiated by reflection, same with StopAnalyzer

When the no-arg ctor is unavailable, can we fallback to looking for a ctor that takes Version? For now we should just pass LUCENE_CURRENT; a future enhancement to benchmark can allow specifying version compat.

The default of QueryParser is to ignore position increments, but the current version of StandardAnalyzer uses posIncr for stop words

Hmm. How about adding Version to QP ctor?

And please: next time when we deprecate APIs: remove all deprecated calls from tests and contrib and mark all deprecated-test as such!

OK, I agree. I'll try to do this in the future!

Michael McCandless added a comment - 19/Oct/09 18:53 All contrib analyzers have stopWordPosIncr turned off (backwards compatibility). Maybe we need a Version Parameter in all analyzers there too! Ugh, this is because they embed StopFilter, right? One option might be to simply keep StopFilter's deprecated static methods for setting the default? Though I think adding Version to them over time is the right thing to do (though more work, today). benchmark does not work any longer, because StandardAnalyzer has no default ctor anymore and cannot be instantiated by reflection, same with StopAnalyzer When the no-arg ctor is unavailable, can we fallback to looking for a ctor that takes Version? For now we should just pass LUCENE_CURRENT; a future enhancement to benchmark can allow specifying version compat. The default of QueryParser is to ignore position increments, but the current version of StandardAnalyzer uses posIncr for stop words Hmm. How about adding Version to QP ctor? And please: next time when we deprecate APIs: remove all deprecated calls from tests and contrib and mark all deprecated-test as such! OK, I agree. I'll try to do this in the future!

Michael McCandless added a comment - 19/Oct/09 18:56

maybe the default should really be LUCENE_CURRENT, and if you want the back compat-buggy behavior, the onus is on you as the user to set the flag right if you don't want to reindex?

The problem is that this is not very different from saying "the onus is on the user to call the setXYZ method to get back to the old buggy behavior", which at least last time we discussed back-compat was controversial (ie, it's a change to our drop-in back-compat policy).

Michael McCandless added a comment - 19/Oct/09 18:56 maybe the default should really be LUCENE_CURRENT, and if you want the back compat-buggy behavior, the onus is on you as the user to set the flag right if you don't want to reindex? The problem is that this is not very different from saying "the onus is on the user to call the setXYZ method to get back to the old buggy behavior", which at least last time we discussed back-compat was controversial (ie, it's a change to our drop-in back-compat policy).

Robert Muir added a comment - 19/Oct/09 18:58

Ugh, this is because they embed StopFilter, right? One option might be to simply keep StopFilter's deprecated static methods for setting the default? Though I think adding Version to them over time is the right thing to do (though more work, today).

not just this. Many use StandardTokenizer, so they have same invalid acronym, etc issues StandardAnalyzer has. But, this versioning/etc is all managed at StandardAnalyzer level (system properties, version numbers, etc)... when it also affects these other analyzers too.

Robert Muir added a comment - 19/Oct/09 18:58 Ugh, this is because they embed StopFilter, right? One option might be to simply keep StopFilter's deprecated static methods for setting the default? Though I think adding Version to them over time is the right thing to do (though more work, today). not just this. Many use StandardTokenizer, so they have same invalid acronym, etc issues StandardAnalyzer has. But, this versioning/etc is all managed at StandardAnalyzer level (system properties, version numbers, etc)... when it also affects these other analyzers too.

Robert Muir added a comment - 19/Oct/09 19:23

The problem is that this is not very different from saying "the onus is on the user to call the setXYZ method to get back to the old buggy behavior", which at least last time we discussed back-compat was controversial (ie, it's a change to our drop-in back-compat policy).

Michael, yes I agree with you. What I am wondering is: is it really working in practice/in spirit? Forcing the user to supply the version, well it does make them look at the warning in the Version class, which is good. But nothing stops them from just using CURRENT.

Use this to get the latest & greatest settings, bug fixes, etc, for Lucene.

followed by the big bold warning about backwards compatibility. just curious what most users are doing, sacrificing drop-in for "latest and greatest?"

I do think we should do things to improve contrib analyzers that are still stuck with this buggy behavior at some point: i.e ~~LUCENE-1373~~.
But maybe we don't need the Version with contrib analyzers, since you should be able to use an older lucene-analyzers jar file with new lucene if you want the back compat????

(sorry to stray somewhat off-topic)

Robert Muir added a comment - 19/Oct/09 19:23 The problem is that this is not very different from saying "the onus is on the user to call the setXYZ method to get back to the old buggy behavior", which at least last time we discussed back-compat was controversial (ie, it's a change to our drop-in back-compat policy). Michael, yes I agree with you. What I am wondering is: is it really working in practice/in spirit? Forcing the user to supply the version, well it does make them look at the warning in the Version class, which is good. But nothing stops them from just using CURRENT. Use this to get the latest & greatest settings, bug fixes, etc, for Lucene. followed by the big bold warning about backwards compatibility. just curious what most users are doing, sacrificing drop-in for "latest and greatest?" I do think we should do things to improve contrib analyzers that are still stuck with this buggy behavior at some point: i.e LUCENE-1373 . But maybe we don't need the Version with contrib analyzers, since you should be able to use an older lucene-analyzers jar file with new lucene if you want the back compat???? (sorry to stray somewhat off-topic)

Uwe Schindler added a comment - 19/Oct/09 21:12

To move back to my other problem:
How to handle the problem with LUCENE_29 setting and the posIncr of stopwords together with QueryParser that has a default setting of ignoring posIncr?:

This leads to the problem, that a phrase query does not hit anything if you index with StandardAnalyzer=LUCENE_29 and QueryParser using the same analyzer but with setEnablePositionIncrements(false) [the current default for QueryParser].

Uwe Schindler added a comment - 19/Oct/09 21:12 To move back to my other problem: How to handle the problem with LUCENE_29 setting and the posIncr of stopwords together with QueryParser that has a default setting of ignoring posIncr?: This leads to the problem, that a phrase query does not hit anything if you index with StandardAnalyzer=LUCENE_29 and QueryParser using the same analyzer but with setEnablePositionIncrements(false) [the current default for QueryParser] .

Michael McCandless added a comment - 20/Oct/09 11:51

How to handle the problem with LUCENE_29 setting and the posIncr of stopwords together with QueryParser that has a default setting of ignoring posIncr?

How about adding required Version to QP ctor?

Michael McCandless added a comment - 20/Oct/09 11:51 How to handle the problem with LUCENE_29 setting and the posIncr of stopwords together with QueryParser that has a default setting of ignoring posIncr? How about adding required Version to QP ctor?

Uwe Schindler added a comment - 21/Oct/09 10:26

A new patch which resolves the Benchmark problem by adding a static method in NewAnalyzerTask that loads an analyzer by class name:

public static final Analyzer createAnalyzer(String className) throws Exception{
    final Class<? extends Analyzer> clazz = Class.forName(className).asSubclass(Analyzer.class);
    try {
      // first try to use a ctor with version parameter (needed for many new Analyzers that have no default one anymore
      Constructor<? extends Analyzer> cnstr = clazz.getConstructor(Version.class);
      return cnstr.newInstance(Version.LUCENE_CURRENT);
    } catch (NoSuchMethodException nsme) {
      // otherwise use default ctor
      return clazz.newInstance();
    }
}

This method is reused at other places where an Analyzer is created by a config property.

This patch now passes all test. There are still the problems with Analyzer and QueryParsr with wrong default properties, but I would like to commit this first and then solve the problems, also in 2.9.1.

Mike, are you OK with that?

Uwe Schindler added a comment - 21/Oct/09 10:26 A new patch which resolves the Benchmark problem by adding a static method in NewAnalyzerTask that loads an analyzer by class name: public static final Analyzer createAnalyzer( String className) throws Exception{ final Class <? extends Analyzer> clazz = Class .forName(className).asSubclass(Analyzer.class); try { // first try to use a ctor with version parameter (needed for many new Analyzers that have no default one anymore Constructor<? extends Analyzer> cnstr = clazz.getConstructor(Version.class); return cnstr.newInstance(Version.LUCENE_CURRENT); } catch (NoSuchMethodException nsme) { // otherwise use default ctor return clazz.newInstance(); } } This method is reused at other places where an Analyzer is created by a config property. This patch now passes all test. There are still the problems with Analyzer and QueryParsr with wrong default properties, but I would like to commit this first and then solve the problems, also in 2.9.1. Mike, are you OK with that?

Michael McCandless added a comment - 21/Oct/09 10:35

Mike, are you OK with that?

Looks great! Not only am I OK with it, it's exactly what I proposed (above – https://issues.apache.org/jira/browse/LUCENE-1987?focusedCommentId=12767449&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12767449). Maybe you missed my response there? (I also suggested adding Version to QP ctor).

Michael McCandless added a comment - 21/Oct/09 10:35 Mike, are you OK with that? Looks great! Not only am I OK with it, it's exactly what I proposed (above – https://issues.apache.org/jira/browse/LUCENE-1987?focusedCommentId=12767449&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12767449 ). Maybe you missed my response there? (I also suggested adding Version to QP ctor).

Uwe Schindler added a comment - 21/Oct/09 11:21

I have seen your comment yesterday and implemented the benchmark thing that way.

The QP ctor with Version param also looks good, but we have to add this to 2.9, too, to be able to remove the no-arg ctor, too.

My patch still has a failed test int the ant task (missing no-arg ctor), will look into it, but fix is same like for benchmark.

Uwe Schindler added a comment - 21/Oct/09 11:21 I have seen your comment yesterday and implemented the benchmark thing that way. The QP ctor with Version param also looks good, but we have to add this to 2.9, too, to be able to remove the no-arg ctor, too. My patch still has a failed test int the ant task (missing no-arg ctor), will look into it, but fix is same like for benchmark.

Uwe Schindler added a comment - 21/Oct/09 11:33

Fix ant task.

Uwe Schindler added a comment - 21/Oct/09 11:33 Fix ant task.

Uwe Schindler added a comment - 21/Oct/09 12:13

Committed in 2.9, 3.0, backwards branch.

For the QueryParser problems and other additions of version constants I will open another issue.

Uwe Schindler added a comment - 21/Oct/09 12:13 Committed in 2.9, 3.0, backwards branch. For the QueryParser problems and other additions of version constants I will open another issue.

Michael McCandless added a comment - 07/Nov/09 14:57

Bulk close all 2.9.1 issues.

Michael McCandless added a comment - 07/Nov/09 14:57 Bulk close all 2.9.1 issues.

Tomoko Uchida added a comment - 28/Aug/22 12:11

This issue was moved to GitHub issue: #3062.

Tomoko Uchida added a comment - 28/Aug/22 12:11 This issue was moved to GitHub issue: #3062 .

People

Assignee:: Uwe Schindler

Reporter:: Uwe Schindler

Votes:: 0 Vote for this issue

Watchers:: 0 Start watching this issue

Dates

Created:: 16/Oct/09 21:33

Updated:: 28/Aug/22 12:11

Resolved:: 21/Oct/09 12:13

Lucene - Core

Details

Description

Attachments

Attachments

Issue Links

Activity

People

Dates