  1. Lucene - Core
  2. LUCENE-400

NGramFilter -- construct n-grams from a TokenStream

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4
    • Component/s: modules/analysis
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

    • Lucene Fields:
      Patch Available
    • Bugzilla Id:
      35456

      Description

      This filter constructs n-grams (token combinations up to a fixed size, sometimes
      called "shingles") from a token stream.
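      To illustrate what "token combinations up to a fixed size" can mean, here is
      a minimal sketch that is independent of the attached filter. The class name
      ShingleSketch, the single-space separator, the inclusion of single tokens,
      and the output order are all assumptions made for illustration; the attached
      NGramFilter may differ on each point.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the attached NGramFilter.
public class ShingleSketch {

    /**
     * Returns every run of 1..maxSize consecutive tokens, grouped by the
     * position of the first token in the run. Joining with a single space
     * is an assumption made for this sketch.
     */
    static List<String> shingles(List<String> tokens, int maxSize) {
        List<String> out = new ArrayList<>();
        for (int start = 0; start < tokens.size(); start++) {
            StringBuilder sb = new StringBuilder();
            // Grow the n-gram one token at a time until maxSize or end of input.
            for (int len = 1; len <= maxSize && start + len <= tokens.size(); len++) {
                if (len > 1) {
                    sb.append(' ');
                }
                sb.append(tokens.get(start + len - 1));
                out.add(sb.toString());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        // prints [please, please divide, divide, divide this, this]
        System.out.println(shingles(Arrays.asList("please", "divide", "this"), 2));
    }
}
```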

      The filter sets start offsets, end offsets and position increments, so
      highlighting and phrase queries should work.

      Position increments > 1 in the input stream are replaced by filler tokens in
      the output n-grams. A filler token has termText "_" and zero length
      (endOffset - startOffset = 0). Position increments > 1 are usually caused by
      removing some tokens, e.g. stopwords, from the stream.
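      The filler behaviour can be sketched as a pre-pass over the tokens. This is
      a hedged illustration, not the attached code: the Tok class and the
      insertFillers method are hypothetical names, and placing each filler at the
      start offset of the following real token is an assumption. The description
      only states that fillers have termText "_" and zero length.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical illustration only; not the attached NGramFilter.
public class FillerSketch {

    /** Minimal stand-in for a token: term text, offsets, position increment. */
    static final class Tok {
        final String term;
        final int start;
        final int end;
        final int posInc;

        Tok(String term, int start, int end, int posInc) {
            this.term = term;
            this.start = start;
            this.end = end;
            this.posInc = posInc;
        }
    }

    /**
     * Replaces each position increment > 1 with (posInc - 1) filler tokens
     * followed by the real token, so every output token has increment 1.
     */
    static List<Tok> insertFillers(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            for (int gap = 1; gap < t.posInc; gap++) {
                // Zero-length filler; its offset placement is an assumption.
                out.add(new Tok("_", t.start, t.start, 1));
            }
            out.add(new Tok(t.term, t.start, t.end, 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // "wizard of oz" with the stopword "of" removed upstream:
        // "oz" arrives with a position increment of 2.
        List<Tok> in = Arrays.asList(new Tok("wizard", 0, 6, 1), new Tok("oz", 10, 12, 2));
        for (Tok t : insertFillers(in)) {
            System.out.println(t.term + " [" + t.start + "," + t.end + "]");
        }
    }
}
```

      With fillers in place, a 2-gram built across the gap would contain "_"
      where the removed token used to be, rather than joining "wizard" and "oz"
      as if they were adjacent.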

      The filter uses CircularFifoBuffer and UnboundedFifoBuffer from Apache
      Commons-Collections.

      Filter, test case and an analyzer are attached.

        Attachments

        1. ASF.LICENSE.NOT.GRANTED--NGramFilter.java
          6 kB
          Sebastian Kirsch
        2. ASF.LICENSE.NOT.GRANTED--NGramAnalyzerWrapper.java
          2 kB
          Sebastian Kirsch
        3. ASF.LICENSE.NOT.GRANTED--NGramFilterTest.java
          6 kB
          Sebastian Kirsch
        4. ASF.LICENSE.NOT.GRANTED--NGramAnalyzerWrapperTest.java
          5 kB
          Sebastian Kirsch
        5. LUCENE-400.patch
          26 kB
          Steve Rowe

            People

            • Assignee:
              gsingers Grant Ingersoll
              Reporter:
              apache-bugzilla@sebastian-kirsch.org Sebastian Kirsch
            • Votes:
              5
              Watchers:
              2