Solr / SOLR-521

Allow StopFilterFactory to use StopFilter setEnablePositionIncrementsDefault function

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 1.3
    • Fix Version/s: 1.3
    • Component/s: None
    • Labels:
      None

      Description

      Lucene's StopFilter has a static method, setEnablePositionIncrementsDefault. When it is enabled, "when a token is stopped (omitted), the position increment of the following token is incremented". Solr, however, has no setting in schema.xml to activate this.

      1. stopfilter.patch
        1.0 kB
        Walter Ferrara
      2. stopfilter.patch
        1.0 kB
        Walter Ferrara

        Activity

        Walter Ferrara added a comment -

        This patch adds a boolean flag (enablePositionIncrements) to StopFilterFactory so that it calls setEnablePositionIncrementsDefault; after applying the patch you can modify your schema.xml as follows:

        <filter class="solr.StopFilterFactory" ignoreCase="true" words="stopwords.txt" enablePositionIncrements="true"/>

        to use the new function. The default value is false. Note that the flag acts through the static method StopFilter.setEnablePositionIncrementsDefault.

        Hoss Man added a comment -

        it would probably be better to use StopFilter.setEnablePositionIncrements instead of the version that changes the global variable.

        Walter Ferrara added a comment -

        This version uses setEnablePositionIncrements on the newly created StopFilter object instead of the static method.
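
The difference between the two approaches discussed in this thread (a static default versus a per-instance setter) can be illustrated with a toy model. This is only a sketch: `ToyStopFilter`, `setDefault`, and `setEnabled` are hypothetical names invented for illustration, and the class is not Lucene's StopFilter or the code in the attached patch.

```java
/** Toy model of a static default versus a per-instance flag; not Lucene code. */
public class FlagScopeSketch {

    /** Hypothetical stand-in for a filter that carries a position-increment flag. */
    static class ToyStopFilter {
        // Shared by every instance created afterwards (the "global variable").
        private static boolean defaultEnabled = false;

        // Each new instance copies the current default at construction time.
        private boolean enabled = defaultEnabled;

        static void setDefault(boolean value) {
            defaultEnabled = value;
        }

        void setEnabled(boolean value) {
            enabled = value;
        }

        boolean isEnabled() {
            return enabled;
        }
    }

    public static void main(String[] args) {
        // Per-instance setter: only the configured filter changes.
        ToyStopFilter configured = new ToyStopFilter();
        configured.setEnabled(true);
        ToyStopFilter untouched = new ToyStopFilter();
        System.out.println(configured.isEnabled()); // true
        System.out.println(untouched.isEnabled());  // false

        // Static default: every filter created later is affected,
        // including ones that belong to unrelated field types.
        ToyStopFilter.setDefault(true);
        ToyStopFilter later = new ToyStopFilter();
        System.out.println(later.isEnabled());      // true
    }
}
```

In this model, a factory that calls the per-instance setter affects only the filter it just built, which is why that form is preferable when different field types need different settings.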

        Hoss Man added a comment -

        looks good ... i've committed the patch as well as added its use to schema.xml as i think it should be the "recommended" setting.

        Committed revision 648433.

        Leaving this issue open pending discussion: I think we should actually change the default to "true" (with a note in the CHANGES.txt section that people should set it to "false" in existing schemas when upgrading).

        Any objections?

        Otis Gospodnetic added a comment -

        I think using enablePI=true would be a fine default.
        I think this issue can be marked Fixed now, Hoss.

        Hoss Man added a comment -

        I was going to change the default, and i'd even already written up the CHANGES.txt verbiage to include, when i noticed that it caused 2 tests to fail: one for DisMax and one in ConvertedLegacyTest.

        This wasn't a huge surprise, i figured the tests were just expecting "broken" behavior, but when i looked at the exact failures they were by no means obvious failures. In both cases doing "the right thing" had some subtle impacts on the matching/scoring of docs that made me realize changing the default is probably not in the best interests of existing users (if it caused problems like this in our simple unit tests, it could have some pretty serious impacts on real world cases).

        FWIW, here's the verbiage i was going to add...

        A new "enablePositionIncrements" option has been added to the
        StopFilterFactory. The default value is "true", indicating that a
        "gap" should be left when a stop word is removed, which will affect
        how much slop is required in order for Phrase Queries to match. Users
        who wish to preserve previous behavior should add
        'enablePositionIncrements="false"' to usages of StopFilterFactory in
        their schema.xml. Other users should consider reindexing to ensure
        consistency in behavior for all documents.
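
The "gap" described in the draft note above can be sketched with a small self-contained model. It is an illustration only: `StopPositionsSketch` and `filter` are hypothetical names, the whitespace tokenization is a simplification, and the code does not use Lucene or Solr classes.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Toy model of stop-word removal and token positions; not Lucene's StopFilter. */
public class StopPositionsSketch {

    /**
     * Splits text on whitespace, drops stop words, and returns the surviving
     * terms as "term@position" strings. When enablePositionIncrements is true,
     * each dropped stop word leaves a one-position gap before the next term.
     */
    static List<String> filter(String text, Set<String> stopWords,
                               boolean enablePositionIncrements) {
        List<String> out = new ArrayList<>();
        int position = -1; // position of the last emitted term
        int skipped = 0;   // stop words dropped since the last emitted term
        for (String term : text.toLowerCase().split("\\s+")) {
            if (stopWords.contains(term)) {
                skipped++;
                continue;
            }
            int increment = 1 + (enablePositionIncrements ? skipped : 0);
            position += increment;
            skipped = 0;
            out.add(term + "@" + position);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stops = new HashSet<>(Arrays.asList("in", "the"));
        // Without gaps the surviving terms become adjacent.
        System.out.println(filter("fox in the box", stops, false)); // [fox@0, box@1]
        // With gaps the original distance between terms is preserved.
        System.out.println(filter("fox in the box", stops, true));  // [fox@0, box@3]
    }
}
```

In this model, "fox" and "box" are adjacent when the flag is false but two positions further apart when it is true, so a phrase query that expects them side by side would need extra slop to match. That shift is the kind of change in matching and scoring that makes flipping the default risky for existing indexes.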


          People

          • Assignee:
            Hoss Man
            Reporter:
            Walter Ferrara