Lucene - Core / LUCENE-5180

ShingleFilter should make shingles from trailing holes

    Details

    • Type: Improvement
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.6, 6.0
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

      Description

      When ShingleFilter hits a hole (a position left empty by a removed token), it uses _ as the filler token. For example, if a StopFilter removes "the", the bigrams for "the dog barked" would be "_ dog" and "dog barked".

      But if the input ends with a stopword, e.g. "wizard of", ShingleFilter fails to produce "wizard _" due to LUCENE-3849. Once we fix that, I think we should fix ShingleFilter to make shingles for trailing holes too.
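      The behavior described above can be illustrated with a small standalone model. This is only a sketch and does not use Lucene's API: the class TrailingHoleShingles, its methods, and the trailingHoles parameter are hypothetical names. It assumes each token carries a position increment (2 means one hole before it) and that the number of holes after the last token is known. Skipping shingles made only of fillers is a choice of this sketch, not something stated in this issue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical standalone model of bigram shingling with holes; not Lucene code.
class TrailingHoleShingles {
    static final String FILLER = "_";

    // Lay tokens out by position, putting FILLER wherever a token was removed.
    // posIncs[i] is the position increment of terms[i]; trailingHoles is the
    // number of removed tokens after the last surviving token (an assumption
    // of this sketch about how the end of the input is reported).
    static List<String> expand(String[] terms, int[] posIncs, int trailingHoles) {
        List<String> slots = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            for (int h = 1; h < posIncs[i]; h++) {
                slots.add(FILLER); // hole before this token
            }
            slots.add(terms[i]);
        }
        for (int h = 0; h < trailingHoles; h++) {
            slots.add(FILLER); // hole after the last token
        }
        return slots;
    }

    // Build bigrams from adjacent slots, skipping pairs that are all filler.
    static List<String> bigrams(List<String> slots) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < slots.size(); i++) {
            String a = slots.get(i);
            String b = slots.get(i + 1);
            if (a.equals(FILLER) && b.equals(FILLER)) {
                continue;
            }
            out.add(a + " " + b);
        }
        return out;
    }

    public static void main(String[] args) {
        // "the dog barked" with "the" removed: "dog" has position increment 2.
        System.out.println(bigrams(expand(new String[] {"dog", "barked"}, new int[] {2, 1}, 0)));
        // "wizard of" with "of" removed: one trailing hole.
        System.out.println(bigrams(expand(new String[] {"wizard"}, new int[] {1}, 1)));
    }
}
```

      In this model the first call yields "_ dog" and "dog barked", and the second yields "wizard _", which is the shingle this issue asks ShingleFilter to produce.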

      1. LUCENE-5180.patch
        13 kB
        Michael McCandless
      2. LUCENE-5180.patch
        10 kB
        Michael McCandless

          Activity

          Michael McCandless added a comment -

          Patch; it turned out to be easier than I expected: I just tapped into the existing logic that ShingleFilter has for handling holes between tokens.

          Steve Rowe added a comment -

          +1, patch looks good.

          +1 to your suggestion about ShingleFilterTest.TestTokenStream:

          // TODO: merge w/ CannedTokenStream?

          Michael McCandless added a comment -

          Thanks Steve!

          Here's a new patch w/ that TODO done ... I think it's ready.

          ASF subversion and git services added a comment -

          Commit 1524117 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1524117 ]

          LUCENE-5180: ShingleFilter creates shingles from trailing holes

          ASF subversion and git services added a comment -

          Commit 1524120 from Michael McCandless in branch 'dev/branches/branch_4x'
          [ https://svn.apache.org/r1524120 ]

          LUCENE-5180: ShingleFilter creates shingles from trailing holes

          ASF subversion and git services added a comment -

          Commit 1524122 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1524122 ]

          LUCENE-5180: move CHANGES entry


            People

            • Assignee: Michael McCandless
            • Reporter: Michael McCandless
            • Votes: 0
            • Watchers: 2
