Issue Details

Key: LUCENE-400
Type: Improvement
Status: Closed
Resolution: Fixed
Priority: Minor
Assignee: Grant Ingersoll
Reporter: Sebastian Kirsch
Votes: 5
Watchers: 3

Lucene - Java

NGramFilter -- construct n-grams from a TokenStream

Created: 22/Jun/05 06:08 AM   Updated: 11/Oct/08 12:49 PM
Component/s: Analysis
Affects Version/s: unspecified
Fix Version/s: 2.4

Time Tracking:
Not Specified

File Attachments:
  LUCENE-400.patch (Text File, 26 kB), 2008-01-14 04:15 AM, Steven Rowe, licensed for inclusion in ASF works
  NGramAnalyzerWrapper.java (Java Source File, 2 kB), 2005-06-22 06:10 AM, Sebastian Kirsch
  NGramAnalyzerWrapperTest.java (Java Source File, 5 kB), 2005-07-29 09:56 PM, Sebastian Kirsch
  NGramFilter.java (Java Source File, 6 kB), 2005-06-22 06:09 AM, Sebastian Kirsch
  NGramFilterTest.java (Java Source File, 6 kB), 2005-06-22 06:12 AM, Sebastian Kirsch
Environment:
Operating System: All
Platform: All

Bugzilla Id: 35456
Lucene Fields: Patch Available
Resolution Date: 29/Mar/08 09:09 PM


Description
This filter constructs n-grams (token combinations up to a fixed size, sometimes
called "shingles") from a token stream.

The filter sets start offsets, end offsets and position increments, so
highlighting and phrase queries should work.

Position increments > 1 in the input stream are replaced by filler tokens
(tokens with termText "_" and endOffset - startOffset = 0) in the output
n-grams. (Such gaps are usually caused by removing some tokens, e.g.
stopwords, from the stream.)
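
To illustrate the behaviour described above, here is a minimal sketch of token n-gram construction with filler tokens. It does not reproduce the attached NGramFilter and does not use the Lucene API. The Tok class, the fillGaps and shingles methods, and the "text[start,end]" output format are hypothetical names chosen for this example. Two details are assumptions: each filler is placed at the start offset of the token that follows the gap, and a filler is never emitted on its own as a unigram. Position increments of the output tokens are left out.

```java
import java.util.ArrayList;
import java.util.List;

class ShingleSketch {

    /** Hypothetical stand-in for a Lucene token; not the real Token class. */
    static final class Tok {
        final String text;
        final int start;   // start offset in the original text
        final int end;     // end offset in the original text
        final int posInc;  // position increment relative to the previous token

        Tok(String text, int start, int end, int posInc) {
            this.text = text;
            this.start = start;
            this.end = end;
            this.posInc = posInc;
        }
    }

    static final String FILLER = "_";

    /** Replaces every position gap (posInc > 1) with zero-width filler tokens. */
    static List<Tok> fillGaps(List<Tok> in) {
        List<Tok> out = new ArrayList<>();
        for (Tok t : in) {
            for (int i = 1; i < t.posInc; i++) {
                // Assumption: the filler sits at the start offset of the next real token.
                out.add(new Tok(FILLER, t.start, t.start, 1));
            }
            out.add(new Tok(t.text, t.start, t.end, 1));
        }
        return out;
    }

    /** Emits every run of 1..maxSize consecutive tokens as "text[start,end]". */
    static List<String> shingles(List<Tok> in, int maxSize) {
        List<Tok> toks = fillGaps(in);
        List<String> out = new ArrayList<>();
        for (int i = 0; i < toks.size(); i++) {
            Tok first = toks.get(i);
            StringBuilder sb = new StringBuilder();
            for (int n = 0; n < maxSize && i + n < toks.size(); n++) {
                Tok last = toks.get(i + n);
                if (n > 0) {
                    sb.append(' ');
                }
                sb.append(last.text);
                // Assumption: a lone filler is not a useful unigram, so skip it.
                if (n == 0 && FILLER.equals(first.text)) {
                    continue;
                }
                // The n-gram spans from the first token's start to the last token's end.
                out.add(sb + "[" + first.start + "," + last.end + "]");
            }
        }
        return out;
    }
}
```

For the text "quick the fox" with the stopword "the" removed, the input is quick (offsets 0-5, increment 1) and fox (offsets 10-13, increment 2). Calling shingles with maxSize 2 then yields quick[0,5], quick _[0,10], _ fox[10,13] and fox[10,13].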

The filter uses CircularFifoBuffer and UnboundedFifoBuffer from Apache
Commons-Collections.

The filter, a test case, and an analyzer wrapper (NGramAnalyzerWrapper) are attached.


