Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.5, 6.0
    • Component/s: modules/spellchecker
    • Labels: None
    • Lucene Fields: New

      Description

      SuggestStopFilter is like StopFilter, except that it keeps a stop word
      when it is the very last token and no non-token characters follow it.

      This is useful with analyzing suggesters (AnalyzingSuggester,
      AnalyzingInfixSuggester, FuzzySuggester), where you often want to
      remove stop words, but not if it's the last word and the user hasn't
      finished typing it.

      E.g. "fast a" might complete to "fast amoeba", but if you simply use
      StopFilter then the "a" is removed.
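
The rule above can be sketched in plain Java, without any Lucene classes. This is only an illustration under stated assumptions: `SuggestStopSketch` and its `filter` method are hypothetical names, tokens are split on whitespace, and matching is case-sensitive. It is not the code from the patch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of the rule, not Lucene's SuggestStopFilter.
class SuggestStopSketch {

    // Splits the input on whitespace and drops stop words, except a stop word
    // that ends exactly at the end of the input (the user may still be typing it).
    static List<String> filter(String input, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        int n = input.length();
        int i = 0;
        while (i < n) {
            // Skip non-token characters (whitespace in this sketch).
            while (i < n && Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            int start = i;
            while (i < n && !Character.isWhitespace(input.charAt(i))) {
                i++;
            }
            if (start == i) {
                break; // only trailing whitespace was left
            }
            String token = input.substring(start, i);
            // True only for the last token when nothing follows it.
            boolean lastAndUnfinished = (i == n);
            if (!stopWords.contains(token) || lastAndUnfinished) {
                kept.add(token);
            }
        }
        return kept;
    }
}
```

With the stop set {"a", "the"}, the input "fast a" keeps both tokens, while "fast a " (trailing space) keeps only "fast", because the space signals that the user has finished typing the word.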

      Really, our analysis APIs aren't quite designed to handle the "partial"
      tokens that suggesters need to work with.

      1. LUCENE-5165.patch
        31 kB
        Michael McCandless

          Activity

          mikemccand Michael McCandless added a comment -

          Patch, I think it's ready... it [sneakily] calls end() from its
          incrementToken and then looks at the final endOffset to decide whether
          to filter the stopword or not.
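
The end-offset comparison described here can be modelled roughly as follows. This is a sketch under assumptions, not the patch itself: `Tok` is a hypothetical stand-in for a term plus its end offset, and `finalOffset` stands for the value the stream reports after end() has been called.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Hypothetical model of the decision; the real filter works on a TokenStream.
class EndOffsetSketch {

    // Stand-in for a token's term text and end offset.
    static final class Tok {
        final String term;
        final int endOffset;

        Tok(String term, int endOffset) {
            this.term = term;
            this.endOffset = endOffset;
        }
    }

    // Drops stop words unless the stop word is the last token and its end
    // offset equals the final offset, meaning no characters followed it.
    static List<String> filter(List<Tok> tokens, int finalOffset, Set<String> stopWords) {
        List<String> kept = new ArrayList<>();
        for (int i = 0; i < tokens.size(); i++) {
            Tok t = tokens.get(i);
            boolean isLast = (i == tokens.size() - 1);
            boolean touchesEnd = (t.endOffset == finalOffset);
            if (stopWords.contains(t.term) && !(isLast && touchesEnd)) {
                continue; // ordinary stop word: filter it out
            }
            kept.add(t.term);
        }
        return kept;
    }
}
```

For "fast a" the last token ends at offset 6 and the final offset is also 6, so "a" survives; for "fast a " the final offset is 7, so "a" is filtered like any other stop word.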

          I've pushed it to http://jirasearch.mikemccandless.com and now "fail
          if byte" gets the right suggestion (before it got no suggestions,
          because I was previously keeping stop words at lookup time to
          work around the issue).

          rcmuir Robert Muir added a comment -

          This looks good, I like the BaseTokenStreamTestCase improvements especially.

          jira-bot ASF subversion and git services added a comment -

          Commit 1513940 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1513940 ]

          LUCENE-5165: add SuggestStopFilter

          jira-bot ASF subversion and git services added a comment -

          Commit 1513942 from Michael McCandless in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1513942 ]

          LUCENE-5165: add SuggestStopFilter

          jpountz Adrien Grand added a comment -

          4.5 release -> bulk close


            People

            • Assignee:
              mikemccand Michael McCandless
              Reporter:
              mikemccand Michael McCandless
            • Votes:
              0
              Watchers:
              3
