  1. Lucene - Core
  2. LUCENE-5180

ShingleFilter should make shingles from trailing holes


Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.6, 6.0
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

    Description

      When ShingleFilter hits a hole (a position where an earlier filter removed a token), it uses _ as the filler token. For example, if you have a StopFilter removing "the", the bigrams for "the dog barked" would be "_ dog" and "dog barked".

      But if the input ends with a stopword, e.g. "wizard of", ShingleFilter fails to produce "wizard _" because of LUCENE-3849. Once we fix that, I think we should fix ShingleFilter to make shingles for trailing holes too.
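
The behavior requested above can be illustrated with a small standalone sketch. It is not ShingleFilter's implementation and uses no Lucene classes: the `ShingleSketch` class, the `bigrams` helper, and the rule of skipping filler-only shingles are assumptions made purely for illustration. It models stopword removal as leaving a `_` filler in each removed position and then builds bigrams over every position, including a trailing hole.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Set;

// Hypothetical illustration only; this is not Lucene's ShingleFilter.
public class ShingleSketch {
    static final String FILLER = "_";

    // Builds bigram shingles, treating each removed stopword as a hole.
    static List<String> bigrams(List<String> words, Set<String> stopwords) {
        // Keep every position; a removed stopword becomes the filler token.
        List<String> slots = new ArrayList<>();
        for (String w : words) {
            slots.add(stopwords.contains(w) ? FILLER : w);
        }

        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < slots.size(); i++) {
            String first = slots.get(i);
            String second = slots.get(i + 1);
            // Assumption of this sketch: skip shingles made only of fillers.
            if (first.equals(FILLER) && second.equals(FILLER)) {
                continue;
            }
            out.add(first + " " + second);
        }
        return out;
    }

    public static void main(String[] args) {
        Set<String> stop = Set.of("the", "of");
        // Leading hole: prints [_ dog, dog barked]
        System.out.println(bigrams(Arrays.asList("the", "dog", "barked"), stop));
        // Trailing hole: prints [wizard _]
        System.out.println(bigrams(Arrays.asList("wizard", "of"), stop));
    }
}
```

In a real analysis chain the hole is not a literal token. It is conveyed through position information on the token stream, which is why the trailing case depends on the fix tracked in LUCENE-3849.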

      Attachments

        1. LUCENE-5180.patch
          13 kB
          Michael McCandless
        2. LUCENE-5180.patch
          10 kB
          Michael McCandless


          People

            mikemccand Michael McCandless