Lucene - Core
  1. Lucene - Core
  2. LUCENE-4963

Deprecate broken TokenFilter constructors

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.4
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      We have some TokenFilters which are only broken with specific options. This includes:

      • TrimFilter when updateOffsets=true
      • StopFilter, JapanesePartOfSpeechStopFilter, KeepWordFilter, LengthFilter, TypeTokenFilter when enablePositionIncrements=false

      I think we should deprecate these behaviors in 4.4 and remove them in trunk.

      1. LUCENE-4963.patch
        127 kB
        Adrien Grand

        Issue Links

          Activity

          Hide
          Uwe Schindler added a comment -

          Hi,
          +1 to deprecate the constuctors taking those arguments and make sure the alternatives already implement the correct behaviour.
          Unfortunately we may not be able to completely remove that code in 5.x (we can remove the ctors, yes), because matchVersion sometimes implies the wrong behaviour. We nuked almost all of them, we should just check this.

          Show
          Uwe Schindler added a comment - Hi, +1 to deprecate the constuctors taking those arguments and make sure the alternatives already implement the correct behaviour. Unfortunately we may not be able to completely remove that code in 5.x (we can remove the ctors, yes), because matchVersion sometimes implies the wrong behaviour. We nuked almost all of them, we should just check this.
          Hide
          Adrien Grand added a comment -

          Thanks Uwe for the advice. Here is a first patch:

          • Deprecate constructors that expose broken options and make them throw an IllegalArgumentException when the lucene match version is >= 4.4
          • Remove the same constructors from TestRandomChains' exclusion list.
          • Since enablePositionIncrements=true was used by the Analyzing and Fuzzy suggesters to ignore position holes, I had to make it an option in the suggesters themselves instead of the token streams.
          • More documentation in the oal.analysis package: PositionLengthAttribute and guidelines on writing non-corrupt token streams.
          Show
          Adrien Grand added a comment - Thanks Uwe for the advice. Here is a first patch: Deprecate constructors that expose broken options and make them throw an IllegalArgumentException when the lucene match version is >= 4.4 Remove the same constructors from TestRandomChains' exclusion list. Since enablePositionIncrements=true was used by the Analyzing and Fuzzy suggesters to ignore position holes, I had to make it an option in the suggesters themselves instead of the token streams. More documentation in the oal.analysis package: PositionLengthAttribute and guidelines on writing non-corrupt token streams.
          Hide
          Uwe Schindler added a comment -

          Hi Adrien,
          thanks for doing the work! I think on the first look its fine! I will now verify the changes for each TokenFilter... The patch is for 4.x? In trunk you will remove the deprecations in a second step to enable merging without extra work?

          Show
          Uwe Schindler added a comment - Hi Adrien, thanks for doing the work! I think on the first look its fine! I will now verify the changes for each TokenFilter... The patch is for 4.x? In trunk you will remove the deprecations in a second step to enable merging without extra work?
          Hide
          Adrien Grand added a comment -

          Hi Uwe, thanks for doing the review! The patch applies to trunk and I plan to remove deprecations in a second step. Is it OK with you?

          Show
          Adrien Grand added a comment - Hi Uwe, thanks for doing the review! The patch applies to trunk and I plan to remove deprecations in a second step. Is it OK with you?
          Hide
          Uwe Schindler added a comment -

          Yeah, I would do it the same way. First deprecate and commit to both branches. Then remove stuff in trunk only.

          Show
          Uwe Schindler added a comment - Yeah, I would do it the same way. First deprecate and commit to both branches. Then remove stuff in trunk only.
          Hide
          Adrien Grand added a comment -

          I'll commit this soon unless someone objects.

          Show
          Adrien Grand added a comment - I'll commit this soon unless someone objects.
          Hide
          Commit Tag Bot added a comment -

          [trunk commit] jpountz
          http://svn.apache.org/viewvc?view=revision&revision=1479148

          LUCENE-4963: Deprecate broken TokenFilter options.

          Show
          Commit Tag Bot added a comment - [trunk commit] jpountz http://svn.apache.org/viewvc?view=revision&revision=1479148 LUCENE-4963 : Deprecate broken TokenFilter options.
          Hide
          Commit Tag Bot added a comment -

          [branch_4x commit] jpountz
          http://svn.apache.org/viewvc?view=revision&revision=1479151

          LUCENE-4963: Deprecate broken TokenFilter options (merged from r1479148).

          Show
          Commit Tag Bot added a comment - [branch_4x commit] jpountz http://svn.apache.org/viewvc?view=revision&revision=1479151 LUCENE-4963 : Deprecate broken TokenFilter options (merged from r1479148).
          Hide
          Commit Tag Bot added a comment -

          [trunk commit] jpountz
          http://svn.apache.org/viewvc?view=revision&revision=1479171

          LUCENE-4963: Completely remove deprecated options in 5.0.

          Show
          Commit Tag Bot added a comment - [trunk commit] jpountz http://svn.apache.org/viewvc?view=revision&revision=1479171 LUCENE-4963 : Completely remove deprecated options in 5.0.
          Hide
          Adrien Grand added a comment -

          Thank you Uwe!

          Show
          Adrien Grand added a comment - Thank you Uwe!
          Hide
          Steve Rowe added a comment -

          Bulk close resolved 4.4 issues

          Show
          Steve Rowe added a comment - Bulk close resolved 4.4 issues
          Hide
          Artem Lukanin added a comment -

          There should be an option in AnalyzingLookupFactory and FuzzyLookupFactory to setPreservePositionIncrements then.

          Show
          Artem Lukanin added a comment - There should be an option in AnalyzingLookupFactory and FuzzyLookupFactory to setPreservePositionIncrements then.
          Hide
          Artem Lukanin added a comment -

          There should be an option in AnalyzingLookupFactory and FuzzyLookupFactory to setPreservePositionIncrements then.

          Show
          Artem Lukanin added a comment - There should be an option in AnalyzingLookupFactory and FuzzyLookupFactory to setPreservePositionIncrements then.
          Show
          Alexander S. added a comment - Hi, how we're now supposed to fix this? http://stackoverflow.com/questions/18668376/solr-4-4-stopfilterfactory-and-enablepositionincrements
          Hide
          Alexander S. added a comment -

          I do index
          twitter.com/testuser
          then search for
          http://www.twitter.com/testuser

          These are in stopwords filter:
          http
          https
          www

          No results.

          Show
          Alexander S. added a comment - I do index twitter.com/testuser then search for http://www.twitter.com/testuser These are in stopwords filter: http https www No results.
          Hide
          Adrien Grand added a comment -

          The way to generate a query based on the token stream of the query string is the responsibility of the query parser. For example, QueryParserBase has the setEnablePositionIncrements method http://lucene.apache.org/core/4_4_0/queryparser/org/apache/lucene/queryparser/classic/QueryParserBase.html#setEnablePositionIncrements(boolean) which can be used to ignore holes in the token stream (which seems to be what you are looking for).

          Show
          Adrien Grand added a comment - The way to generate a query based on the token stream of the query string is the responsibility of the query parser. For example, QueryParserBase has the setEnablePositionIncrements method http://lucene.apache.org/core/4_4_0/queryparser/org/apache/lucene/queryparser/classic/QueryParserBase.html#setEnablePositionIncrements(boolean ) which can be used to ignore holes in the token stream (which seems to be what you are looking for).

            People

            • Assignee:
              Adrien Grand
              Reporter:
              Adrien Grand
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development