Lucene - Core / LUCENE-400

NGramFilter -- construct n-grams from a TokenStream

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 2.4
    • Component/s: modules/analysis
    • Labels:
      None
    • Environment:

      Operating System: All
      Platform: All

      Description

      This filter constructs n-grams (token combinations up to a fixed size, sometimes
      called "shingles") from a token stream.

      The filter sets start offsets, end offsets and position increments, so
      highlighting and phrase queries should work.
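
      As an illustration of the behaviour described above, here is a minimal
      sketch in plain Java. It does not use the Lucene API and is not the
      attached NGramFilter; the names ShingleSketch, Tok and shingles are
      hypothetical. It assumes that each n-gram spans from the start offset of
      its first token to the end offset of its last token, that single tokens
      are emitted alongside the longer n-grams, and that longer n-grams get a
      position increment of 0 so they stack on the position of their first
      token.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class ShingleSketch {
    /** Hypothetical stand-in for a token: text, character offsets, position increment. */
    static final class Tok {
        final String text;
        final int start;
        final int end;
        final int posInc;

        Tok(String text, int start, int end, int posInc) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.posInc = posInc;
        }

        @Override
        public String toString() {
            return text + "[" + start + "," + end + ",+" + posInc + "]";
        }
    }

    /** For each input position, emits every n-gram of 1..maxSize tokens starting there. */
    static List<Tok> shingles(List<Tok> input, int maxSize) {
        List<Tok> out = new ArrayList<>();
        for (int i = 0; i < input.size(); i++) {
            StringBuilder text = new StringBuilder();
            for (int n = 1; n <= maxSize && i + n <= input.size(); n++) {
                Tok last = input.get(i + n - 1);
                if (n > 1) {
                    text.append(' ');
                }
                text.append(last.text);
                // Assumption: the first gram advances one position, longer grams stay on it.
                int posInc = (n == 1) ? 1 : 0;
                out.add(new Tok(text.toString(), input.get(i).start, last.end, posInc));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Tok> in = Arrays.asList(
                new Tok("please", 0, 6, 1),
                new Tok("divide", 7, 13, 1),
                new Tok("this", 14, 18, 1));
        for (Tok t : shingles(in, 2)) {
            System.out.println(t);
        }
    }
}
```

      With a maximum size of 2, the three input tokens yield five output
      tokens, for example "please divide" covering offsets 0 to 13 at the same
      position as "please".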

      Position increments > 1 in the input stream are replaced by filler tokens
      (tokens with termText "_" and endOffset - startOffset = 0) in the output
      n-grams. (Position increments > 1 in the input stream are usually caused by
      removing some tokens, e.g. stopwords, from the stream.)
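
      The filler-token rule can be sketched as follows, again in plain Java
      rather than against the Lucene API. FillerSketch, fillGaps and bigrams
      are hypothetical names, and offsets are left out to keep the example
      short. The sketch assumes a position increment of k > 1 expands to k - 1
      filler terms before the token; whether the attached filter also pads a
      gap before the very first token is not stated in this issue.

```java
import java.util.ArrayList;
import java.util.List;

class FillerSketch {
    static final String FILLER = "_";

    /** Expands gaps: a position increment of k > 1 becomes k - 1 filler terms before the token. */
    static List<String> fillGaps(String[] terms, int[] posIncs) {
        if (terms.length != posIncs.length) {
            throw new IllegalArgumentException("terms and posIncs must have the same length");
        }
        List<String> out = new ArrayList<>();
        for (int i = 0; i < terms.length; i++) {
            for (int gap = 1; gap < posIncs[i]; gap++) {
                out.add(FILLER);
            }
            out.add(terms[i]);
        }
        return out;
    }

    /** Joins each pair of adjacent terms into a two-token n-gram. */
    static List<String> bigrams(List<String> terms) {
        List<String> out = new ArrayList<>();
        for (int i = 0; i + 1 < terms.size(); i++) {
            out.add(terms.get(i) + " " + terms.get(i + 1));
        }
        return out;
    }

    public static void main(String[] args) {
        // "wizard of oz" after a stop filter removed "of": "oz" has position increment 2.
        List<String> filled = fillGaps(new String[] {"wizard", "oz"}, new int[] {1, 2});
        System.out.println(filled);          // [wizard, _, oz]
        System.out.println(bigrams(filled)); // [wizard _, _ oz]
    }
}
```

      Keeping the filler in the n-grams means "wizard oz" is not produced, so
      a phrase that skips the removed word does not match by accident.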

      The filter uses CircularFifoBuffer and UnboundedFifoBuffer from Apache
      Commons-Collections.
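
      How the attached filter uses these buffers is not described here. As a
      rough illustration of why a circular FIFO suits n-gram construction, the
      sketch below uses java.util.ArrayDeque as a stand-in for a fixed-size
      buffer that drops its oldest element when full. WindowSketch, Window and
      ngrams are hypothetical names, and the code is not taken from the
      attachment.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Deque;
import java.util.List;

class WindowSketch {
    /** Fixed-capacity FIFO that evicts its oldest element when a new one arrives. */
    static final class Window {
        private final Deque<String> items = new ArrayDeque<>();
        private final int capacity;

        Window(int capacity) {
            if (capacity < 1) {
                throw new IllegalArgumentException("capacity must be at least 1");
            }
            this.capacity = capacity;
        }

        void add(String term) {
            if (items.size() == capacity) {
                items.removeFirst();
            }
            items.addLast(term);
        }

        boolean isFull() {
            return items.size() == capacity;
        }

        String joined() {
            return String.join(" ", items);
        }
    }

    /** Streams terms through the window and collects one n-gram each time it is full. */
    static List<String> ngrams(Iterable<String> terms, int n) {
        Window window = new Window(n);
        List<String> out = new ArrayList<>();
        for (String term : terms) {
            window.add(term);
            if (window.isFull()) {
                out.add(window.joined());
            }
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(ngrams(Arrays.asList("a", "b", "c", "d"), 3)); // [a b c, b c d]
    }
}
```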

      Filter, test case and an analyzer are attached.

      1. ASF.LICENSE.NOT.GRANTED--NGramAnalyzerWrapper.java
        2 kB
        Sebastian Kirsch
      2. ASF.LICENSE.NOT.GRANTED--NGramAnalyzerWrapperTest.java
        5 kB
        Sebastian Kirsch
      3. ASF.LICENSE.NOT.GRANTED--NGramFilter.java
        6 kB
        Sebastian Kirsch
      4. ASF.LICENSE.NOT.GRANTED--NGramFilterTest.java
        6 kB
        Sebastian Kirsch
      5. LUCENE-400.patch
        26 kB
        Steve Rowe

        Activity

        Sebastian Kirsch created issue -
        Jeff Turner made changes -
        Field Original Value New Value
        issue.field.bugzillaimportkey 35456 12314550
        Grant Ingersoll made changes -
        Resolution Won't Fix [ 2 ]
        Status Open [ 1 ] Closed [ 6 ]
        Assignee Lucene Developers [ java-dev@lucene.apache.org ]
        Grant Ingersoll made changes -
        Link This issue duplicates LUCENE-759 [ LUCENE-759 ]
        Grant Ingersoll made changes -
        Status Closed [ 6 ] Reopened [ 4 ]
        Resolution Won't Fix [ 2 ]
        Steve Rowe made changes -
        Attachment LUCENE-400.patch [ 12373074 ]
        Grant Ingersoll made changes -
        Fix Version/s 2.4 [ 12312681 ]
        Lucene Fields [Patch Available]
        Steve Rowe made changes -
        Link This issue duplicates LUCENE-759 [ LUCENE-759 ]
        Otis Gospodnetic made changes -
        Assignee Otis Gospodnetic [ otis ]
        Otis Gospodnetic made changes -
        Assignee Otis Gospodnetic [ otis ] Grant Ingersoll [ gsingers ]
        Grant Ingersoll made changes -
        Resolution Fixed [ 1 ]
        Status Reopened [ 4 ] Resolved [ 5 ]
        Michael McCandless made changes -
        Status Resolved [ 5 ] Closed [ 6 ]
        Mark Thomas made changes -
        Workflow jira [ 12324555 ] Default workflow, editable Closed status [ 12564484 ]
        Mark Thomas made changes -
        Workflow Default workflow, editable Closed status [ 12564484 ] jira [ 12585745 ]
        Steve Rowe made changes -
        Affects Version/s unspecified [ 12310280 ]

          People

          • Assignee:
            Grant Ingersoll
            Reporter:
            Sebastian Kirsch
          • Votes:
            5
            Watchers:
            3

            Dates

            • Created:
              Updated:
              Resolved:
