LUCENE-5180 (Lucene - Core)

ShingleFilter should make shingles from trailing holes

Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.6, 6.0
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

    Description

      When ShingleFilter hits a hole, it uses _ as the token. For example, if a StopFilter removes "the", the bigrams for "the dog barked" are "_ dog" and "dog barked".

      But if the input ends with a stopword, e.g. "wizard of", ShingleFilter fails to produce "wizard _" because of LUCENE-3849. Once that is fixed, I think we should fix ShingleFilter to make shingles for trailing holes too.
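
The requested behavior can be illustrated with a small standalone model. This is only a sketch and not Lucene's ShingleFilter code: the class ShingleHoles, the method bigrams and the constant FILLER are hypothetical names, tokenization is assumed to be a plain whitespace split, and skipping shingles that consist only of fillers is an assumption of this sketch rather than something stated in the issue.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

// Simplified model of bigram shingling over a stopword-filtered stream.
// Hypothetical illustration only; it does not use or reproduce Lucene classes.
class ShingleHoles {
    // Placeholder emitted where a stopword was removed (a "hole").
    static final String FILLER = "_";

    static List<String> bigrams(String text, Set<String> stopwords) {
        // Assumption: whitespace tokenization stands in for a real tokenizer.
        String[] words = text.trim().split("\\s+");

        // Replace each removed stopword with the filler so its position survives,
        // including a hole at the very end of the input.
        List<String> slots = new ArrayList<>();
        for (String w : words) {
            slots.add(stopwords.contains(w) ? FILLER : w);
        }

        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < slots.size(); i++) {
            String first = slots.get(i);
            String second = slots.get(i + 1);
            // Assumption of this sketch: a shingle made only of fillers is skipped.
            if (first.equals(FILLER) && second.equals(FILLER)) {
                continue;
            }
            out.add(first + " " + second);
        }
        return out;
    }
}
```

With the stopword set {"the", "of"}, calling bigrams on "the dog barked" gives "_ dog" and "dog barked", and calling it on "wizard of" gives "wizard _", which is the trailing-hole shingle this issue asks for.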

      Attachments

        1. LUCENE-5180.patch
          13 kB
          Michael McCandless
        2. LUCENE-5180.patch
          10 kB
          Michael McCandless

            People

              Assignee: Michael McCandless (mikemccand)
              Reporter: Michael McCandless (mikemccand)