Lucene - Core
  1. Lucene - Core
  2. LUCENE-2002

Add oal.util.Version ctor to QueryParser

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.9, 3.0
    • Fix Version/s: 2.9.1
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      This is a followup of LUCENE-1987:

      If somebody uses StandardAnalyzer with Version.LUCENE_CURRENT and then uses QueryParser, phrase queries will not work, because the StopFilter enables position Increments for stop words, but QueryParser ignores them per default. The user has to explicitely enable them.

      This issue would add a ctor taking the Version constant and automatically enable this setting. The same applies to the contrib queryparser. Eventually also StopAnalyzer should add this version ctor.

      To be able to remove the default ctor for 3.0 (to remove a possible trap for users of QueryParser), it must be deprecated and the new one also added to 2.9.1.

      1. LUCENE-2002-29-wrongdefault.patch
        2 kB
        Uwe Schindler
      2. LUCENE-2002.patch
        269 kB
        Michael McCandless
      3. LUCENE-2002-29.patch
        100 kB
        Michael McCandless
      4. LUCENE-2002-29.patch
        93 kB
        Michael McCandless
      5. LUCENE-2002-29.patch
        21 kB
        Michael McCandless

        Issue Links

          Activity

          Hide
          Michael McCandless added a comment -

          Bulk close all 2.9.1 issues.

          Show
          Michael McCandless added a comment - Bulk close all 2.9.1 issues.
          Hide
          Uwe Schindler added a comment -

          I reopened it -> I resolve it

          At revision: 830910

          Show
          Uwe Schindler added a comment - I reopened it -> I resolve it At revision: 830910
          Hide
          Michael McCandless added a comment -

          Patch looks good Uwe, thanks!

          Show
          Michael McCandless added a comment - Patch looks good Uwe, thanks!
          Hide
          Uwe Schindler added a comment -

          For Lucene 2.9, the deafult in StopFilter is not controlable, if you use the no-arg Analyzer ctor. There was the previous static setter in StopFilter which is now without any use.

          Attached patch fixes this, to return ENABLE_POSITION_INCREMENTS_DEFAULT in getEnablePositionIncrementsVersionDefault() for all versions<2.9

          Show
          Uwe Schindler added a comment - For Lucene 2.9, the deafult in StopFilter is not controlable, if you use the no-arg Analyzer ctor. There was the previous static setter in StopFilter which is now without any use. Attached patch fixes this, to return ENABLE_POSITION_INCREMENTS_DEFAULT in getEnablePositionIncrementsVersionDefault() for all versions<2.9
          Hide
          Michael McCandless added a comment -

          So the QueryParser ctors should also use the TEST_VERSION constant, so that it can be easily updated to do the testing for various versions.

          Good idea – I'll fix!

          Show
          Michael McCandless added a comment - So the QueryParser ctors should also use the TEST_VERSION constant, so that it can be easily updated to do the testing for various versions. Good idea – I'll fix!
          Hide
          Uwe Schindler added a comment -

          A mega patch, one thing:

          The highlighter testcase uses a separate constant for the version. My idea was to iterate over all version constants and run the test several times to test all combinations.

          So the QueryParser ctors should also use the TEST_VERSION constant, so that it can be easily updated to do the testing for various versions.

          Show
          Uwe Schindler added a comment - A mega patch, one thing: The highlighter testcase uses a separate constant for the version. My idea was to iterate over all version constants and run the test several times to test all combinations. So the QueryParser ctors should also use the TEST_VERSION constant, so that it can be easily updated to do the testing for various versions.
          Hide
          Michael McCandless added a comment -

          Patch for trunk; I plan to commit soon...

          Show
          Michael McCandless added a comment - Patch for trunk; I plan to commit soon...
          Hide
          Michael McCandless added a comment -

          Actually to port to trunk I was going svn merge, remove conflicts, remove merged in but deprecated methods, then fix all resulting compilation/test errors. I'll try to do this myself... wish me luck I'm only using emacs over here!!

          Show
          Michael McCandless added a comment - Actually to port to trunk I was going svn merge, remove conflicts, remove merged in but deprecated methods, then fix all resulting compilation/test errors. I'll try to do this myself... wish me luck I'm only using emacs over here!!
          Hide
          Uwe Schindler added a comment -

          I know this problem of trunk. But the first step is to merge the changes in and then remove the deprecated parts again. Then fixing of tests, which may be many as QueryParser is used almost everywhere. Maybe we can split contrib/core.

          Show
          Uwe Schindler added a comment - I know this problem of trunk. But the first step is to merge the changes in and then remove the deprecated parts again. Then fixing of tests, which may be many as QueryParser is used almost everywhere. Maybe we can split contrib/core.
          Hide
          Michael McCandless added a comment -

          I am happy to then use the merge operations in SVN tru apply this in trunk!

          Are you offering to port this to trunk? That'd be nice I wasn't looking forward to that part!

          Note that on trunk it'll be fairly difference since we'll remove (not deprecate) the old methods, and, we have to go fix all usage of the deprecated APIs (eg in tests, contrib) which I haven't done (which I haven't done for 2.9).

          Show
          Michael McCandless added a comment - I am happy to then use the merge operations in SVN tru apply this in trunk! Are you offering to port this to trunk? That'd be nice I wasn't looking forward to that part! Note that on trunk it'll be fairly difference since we'll remove (not deprecate) the old methods, and, we have to go fix all usage of the deprecated APIs (eg in tests, contrib) which I haven't done (which I haven't done for 2.9).
          Hide
          Michael McCandless added a comment -

          New patch, adding Version to StopAnalyzer as well.

          Thanks for reviewing – I'll commit in a bit!

          Show
          Michael McCandless added a comment - New patch, adding Version to StopAnalyzer as well. Thanks for reviewing – I'll commit in a bit!
          Hide
          Uwe Schindler added a comment -

          I am happy to then use the merge operations in SVN tru apply this in trunk!

          +1 from my side!

          Show
          Uwe Schindler added a comment - I am happy to then use the merge operations in SVN tru apply this in trunk! +1 from my side!
          Hide
          Grant Ingersoll added a comment -

          +1 on this patch.

          Show
          Grant Ingersoll added a comment - +1 on this patch.
          Hide
          Uwe Schindler added a comment -

          They only appear with native patch. All higher-level svn related patch impls "know" this. I think bekause oth this they were removed in trunk.

          Show
          Uwe Schindler added a comment - They only appear with native patch. All higher-level svn related patch impls "know" this. I think bekause oth this they were removed in trunk.
          Hide
          Grant Ingersoll added a comment -

          Yes, they are near the $Id tags. That's kind of an odd error, though. At any rate, I'm running the tests now.

          Show
          Grant Ingersoll added a comment - Yes, they are near the $Id tags. That's kind of an odd error, though. At any rate, I'm running the tests now.
          Hide
          Michael McCandless added a comment -

          I'm getting errors applying this, but they look like they are just in the javadoc comments..

          Are the patch errors near the $Id$ tags? (Which we've removed from trunk, for this reason).

          I would add it to StopAnalyzer

          OK I'll add.

          Show
          Michael McCandless added a comment - I'm getting errors applying this, but they look like they are just in the javadoc comments.. Are the patch errors near the $Id$ tags? (Which we've removed from trunk, for this reason). I would add it to StopAnalyzer OK I'll add.
          Hide
          Grant Ingersoll added a comment -

          I'm getting errors applying this, but they look like they are just in the javadoc comments..

          Show
          Grant Ingersoll added a comment - I'm getting errors applying this, but they look like they are just in the javadoc comments..
          Hide
          Uwe Schindler added a comment -

          Looks good.

          I didn't add Version to StopFilter nor StopAnalyzer; I think it's better to up-front require the "enablePositionIncrements" to their ctors.

          I would add it to StopAnalyzer, StopFilter is not so important (because low-level). But that's my opinion.

          Show
          Uwe Schindler added a comment - Looks good. I didn't add Version to StopFilter nor StopAnalyzer; I think it's better to up-front require the "enablePositionIncrements" to their ctors. I would add it to StopAnalyzer, StopFilter is not so important (because low-level). But that's my opinion.
          Hide
          Michael McCandless added a comment -

          New patch attached. All tests pass. Changes:

          • Fixed the "patch" -> "match" typo
          • Fixed build.xml to make 2 autogen'd (by JavaCC) public QueryParser
            ctors protected, and added unit test to assert this
          • Added Version matchVersion param to all (I think!) contrib
            analyzers that instantiate either StandardTokenizer (to manage
            changing the "fix invalid acronym" setting across versions), or
            StopFilter (to manage "enable pos incr" setting across versions),
            or, both, and threaded it down to StandardTokenizer & StopFilter

          I didn't add Version to StopFilter nor StopAnalyzer; I think it's
          better to up-front require the "enablePositionIncrements" to their
          ctors.

          Show
          Michael McCandless added a comment - New patch attached. All tests pass. Changes: Fixed the "patch" -> "match" typo Fixed build.xml to make 2 autogen'd (by JavaCC) public QueryParser ctors protected, and added unit test to assert this Added Version matchVersion param to all (I think!) contrib analyzers that instantiate either StandardTokenizer (to manage changing the "fix invalid acronym" setting across versions), or StopFilter (to manage "enable pos incr" setting across versions), or, both, and threaded it down to StandardTokenizer & StopFilter I didn't add Version to StopFilter nor StopAnalyzer; I think it's better to up-front require the "enablePositionIncrements" to their ctors.
          Hide
          Michael McCandless added a comment -

          I think we are good: I just looked @ 1.6.3's javadocs (we specify ant 1.6.3 in BUILD.txt) and it's got the replaceregexp task.

          Show
          Michael McCandless added a comment - I think we are good: I just looked @ 1.6.3's javadocs (we specify ant 1.6.3 in BUILD.txt) and it's got the replaceregexp task.
          Hide
          Uwe Schindler added a comment -

          Cool. Did you check the minimum ANT version needed for this? If the current BUILD.txt minimum does not fit, we shoudl update the build, docs. My problem: I didn't found the minimum version for replaceregexp in the docs.

          Show
          Uwe Schindler added a comment - Cool. Did you check the minimum ANT version needed for this? If the current BUILD.txt minimum does not fit, we shoudl update the build, docs. My problem: I didn't found the minimum version for replaceregexp in the docs.
          Hide
          Michael McCandless added a comment -

          Maybe the search-replace with regex functionality can do it.

          Excellent! That worked like a charm. I'll still leave the unit test in place to catch us if this fails...

          Show
          Michael McCandless added a comment - Maybe the search-replace with regex functionality can do it. Excellent! That worked like a charm. I'll still leave the unit test in place to catch us if this fails...
          Hide
          Uwe Schindler added a comment - - edited

          Eric Hatcher

          Maybe the search-replace with regex functionality can do it.
          see: http://ant.apache.org/manual/OptionalTasks/replaceregexp.html

          Show
          Uwe Schindler added a comment - - edited Eric Hatcher Maybe the search-replace with regex functionality can do it. see: http://ant.apache.org/manual/OptionalTasks/replaceregexp.html
          Hide
          Michael McCandless added a comment -

          We may want to see if it can be automated in the ANT task so that we don't have to remember to do it by hand each time.

          That would be fabulous but is way beyond my ant skills Any ant pros out there want to try?

          Show
          Michael McCandless added a comment - We may want to see if it can be automated in the ANT task so that we don't have to remember to do it by hand each time. That would be fabulous but is way beyond my ant skills Any ant pros out there want to try?
          Hide
          Grant Ingersoll added a comment -

          OK I'll take that approach, and I guess make a unit test that peeks & confirms these methods are still protected (to catch us in the future).

          We may want to see if it can be automated in the ANT task so that we don't have to remember to do it by hand each time.

          Show
          Grant Ingersoll added a comment - OK I'll take that approach, and I guess make a unit test that peeks & confirms these methods are still protected (to catch us in the future). We may want to see if it can be automated in the ANT task so that we don't have to remember to do it by hand each time.
          Hide
          Michael McCandless added a comment -

          Michael, what about LUCENE-1258?

          Oh yeah, and look who opened that one I'll go resolve as a dup of this one.

          Show
          Michael McCandless added a comment - Michael, what about LUCENE-1258 ? Oh yeah, and look who opened that one I'll go resolve as a dup of this one.
          Hide
          Robert Muir added a comment -

          Ahh yes indeed. Is there a corresponding issue about not being able to control stop filter pos incr? Can't keep track of all these issues anymore!

          Michael, what about LUCENE-1258?

          Show
          Robert Muir added a comment - Ahh yes indeed. Is there a corresponding issue about not being able to control stop filter pos incr? Can't keep track of all these issues anymore! Michael, what about LUCENE-1258 ?
          Hide
          Michael McCandless added a comment -

          Michael, if you do this, can you mark LUCENE-1373 as resolved?

          Ahh yes indeed. Is there a corresponding issue about not being able to control stop filter pos incr? Can't keep track of all these issues anymore!

          Show
          Michael McCandless added a comment - Michael, if you do this, can you mark LUCENE-1373 as resolved? Ahh yes indeed. Is there a corresponding issue about not being able to control stop filter pos incr? Can't keep track of all these issues anymore!
          Hide
          Robert Muir added a comment -

          Or, with this issue I could add Version to all contrib analyzers that embed StopFilter. I think I like that solution better (we shouldn't be using static defaults). I'll go forward w/ that shortly unless any objections come up... this'd also take care of analyzers that use StandardTokenizer (ie, we'll control fixing the acronym bug with Version as well).

          Michael, if you do this, can you mark LUCENE-1373 as resolved?

          Show
          Robert Muir added a comment - Or, with this issue I could add Version to all contrib analyzers that embed StopFilter. I think I like that solution better (we shouldn't be using static defaults). I'll go forward w/ that shortly unless any objections come up... this'd also take care of analyzers that use StandardTokenizer (ie, we'll control fixing the acronym bug with Version as well). Michael, if you do this, can you mark LUCENE-1373 as resolved?
          Hide
          Michael McCandless added a comment -

          Many analyzers can use a stopfilter and one of the stopfilters params is to enable or disable this setting.

          In fact, I think we may have to un-deprecate StopFilter.get/setEnablePositionIncrementsDefault for this reason? Many analyzers do embed StopFilter without exposing control over this setting, and so the only way (up to & including 2.9) to change the setting is to set the static default with StopFilter. If we remove that then we've taken that control away.

          Or, with this issue I could add Version to all contrib analyzers that embed StopFilter. I think I like that solution better (we shouldn't be using static defaults). I'll go forward w/ that shortly unless any objections come up... this'd also take care of analyzers that use StandardTokenizer (ie, we'll control fixing the acronym bug with Version as well).

          Show
          Michael McCandless added a comment - Many analyzers can use a stopfilter and one of the stopfilters params is to enable or disable this setting. In fact, I think we may have to un-deprecate StopFilter.get/setEnablePositionIncrementsDefault for this reason? Many analyzers do embed StopFilter without exposing control over this setting, and so the only way (up to & including 2.9) to change the setting is to set the static default with StopFilter. If we remove that then we've taken that control away. Or, with this issue I could add Version to all contrib analyzers that embed StopFilter. I think I like that solution better (we shouldn't be using static defaults). I'll go forward w/ that shortly unless any objections come up... this'd also take care of analyzers that use StandardTokenizer (ie, we'll control fixing the acronym bug with Version as well).
          Hide
          Michael McCandless added a comment -

          Thus, I think, unfortunately, the answer just might be to edit the generated Java file by hand and make them be protected.

          OK I'll take that approach, and I guess make a unit test that peeks & confirms these methods are still protected (to catch us in the future).

          Show
          Michael McCandless added a comment - Thus, I think, unfortunately, the answer just might be to edit the generated Java file by hand and make them be protected. OK I'll take that approach, and I guess make a unit test that peeks & confirms these methods are still protected (to catch us in the future).
          Hide
          Grant Ingersoll added a comment -

          Unfortunately, JavaCC generates two public ctors for QueryParser (one taking
          CharStream, another taking QueryParserTokenManager) that I don't know
          how to override to take a Version param.

          Those two constructors are bad anyway b/c if anyone calls them, it won't set the Analyzer, etc. Thus, I think, unfortunately, the answer just might be to edit the generated Java file by hand and make them be protected. I've looked through the JavaCC docs and I don't see any other way. Of course, the big down side to this is we now need to do this going forward.

          Show
          Grant Ingersoll added a comment - Unfortunately, JavaCC generates two public ctors for QueryParser (one taking CharStream, another taking QueryParserTokenManager) that I don't know how to override to take a Version param. Those two constructors are bad anyway b/c if anyone calls them, it won't set the Analyzer, etc. Thus, I think, unfortunately, the answer just might be to edit the generated Java file by hand and make them be protected. I've looked through the JavaCC docs and I don't see any other way. Of course, the big down side to this is we now need to do this going forward.
          Hide
          Mark Miller added a comment -

          I think we need more doc as well - stopfilter is not just tied to standardanalyzer - standardanalyzer just happens to use it. Many analyzers can use a stopfilter and one of the stopfilters params is to enable or disable this setting.

          Show
          Mark Miller added a comment - I think we need more doc as well - stopfilter is not just tied to standardanalyzer - standardanalyzer just happens to use it. Many analyzers can use a stopfilter and one of the stopfilters params is to enable or disable this setting.
          Hide
          Michael McCandless added a comment -

          Eek! My fingers are doing the thinking, apparently Been typing that word a bit too much!! I'll fix. Thanks.

          Show
          Michael McCandless added a comment - Eek! My fingers are doing the thinking, apparently Been typing that word a bit too much!! I'll fix. Thanks.
          Hide
          Robert Muir added a comment -

          Mike, saw a couple of these and laughed a little

          @param matchVersion Lucene version to patch; this is passed through to QueryParser.

          Show
          Robert Muir added a comment - Mike, saw a couple of these and laughed a little @param matchVersion Lucene version to patch ; this is passed through to QueryParser.
          Hide
          Michael McCandless added a comment -

          Attached patch, for 2.9..x

          I added required Version param to QueryParser, MultiFieldQueryParser
          and ComplexPhraseQueryParser (contrib), which enable position
          increments when matchVersion >= LUCENE_19.

          For the deprecated ctors it defaults to Version.LUCENE_24 for back
          compat.

          Unfortunately, JavaCC generates two public ctors for QueryParser (one taking
          CharStream, another taking QueryParserTokenManager) that I don't know
          how to override to take a Version param.

          Show
          Michael McCandless added a comment - Attached patch, for 2.9..x I added required Version param to QueryParser, MultiFieldQueryParser and ComplexPhraseQueryParser (contrib), which enable position increments when matchVersion >= LUCENE_19. For the deprecated ctors it defaults to Version.LUCENE_24 for back compat. Unfortunately, JavaCC generates two public ctors for QueryParser (one taking CharStream, another taking QueryParserTokenManager) that I don't know how to override to take a Version param.
          Hide
          Uwe Schindler added a comment -

          Issue created!

          Show
          Uwe Schindler added a comment - Issue created!
          Hide
          Michael McCandless added a comment -

          Add another issue?

          +1!

          Show
          Michael McCandless added a comment - Add another issue? +1!
          Hide
          Uwe Schindler added a comment -

          During 1987, I also found a bug in Highlighter, which is also not able to handle the posIncr of stopwords correctly. Add another issue?

          Show
          Uwe Schindler added a comment - During 1987, I also found a bug in Highlighter, which is also not able to handle the posIncr of stopwords correctly. Add another issue?
          Hide
          Uwe Schindler added a comment -

          Take it! I haven't started.

          Show
          Uwe Schindler added a comment - Take it! I haven't started.
          Hide
          Michael McCandless added a comment -

          Uwe I can take this if you want? Have you started?

          Show
          Michael McCandless added a comment - Uwe I can take this if you want? Have you started?

            People

            • Assignee:
              Michael McCandless
              Reporter:
              Uwe Schindler
            • Votes:
              0 Vote for this issue
              Watchers:
              1 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development