Lucene - Core / LUCENE-8323

New ConcatenateFilter, a TokenFilter to concat/join tokens

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

    Description

      Here I introduce the ConcatenateFilter (with a factory), which concatenates/joins tokens with a provided separator to produce one final token. It's similar to FingerprintFilter but doesn't deduplicate or sort. It's useful for exact-ish search on short text (think names or titles) with simple analysis. For this task, it's faster than an equivalent PhraseQuery, and it solves the problem of requiring a match on the complete text rather than on just a portion of the tokens. It's also useful when using Lucene to hold a dictionary of short names/phrases for entity extraction (aka text tagging). The OpenSextant SolrTextTagger uses it for this purpose, which is where I'm taking it from.
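
To illustrate the difference described above, here is a minimal plain-Java sketch. It is not the code from the attached patch and does not use the Lucene API; the class and method names (`ConcatenateSketch`, `concatenate`, `fingerprint`) are hypothetical, and the token list stands in for the output of an upstream tokenizer. It only shows the assumed behavior: concatenation keeps token order and duplicates, whereas a fingerprint-style join deduplicates and sorts first.

```java
import java.util.List;
import java.util.TreeSet;

// Hypothetical illustration only: models the joining behavior on plain strings,
// not on a Lucene TokenStream.
class ConcatenateSketch {

    /** Joins tokens in their original order, keeping duplicates. */
    static String concatenate(List<String> tokens, char separator) {
        return String.join(String.valueOf(separator), tokens);
    }

    /** Fingerprint-style join: deduplicates and sorts the tokens, then joins them. */
    static String fingerprint(List<String> tokens, char separator) {
        // TreeSet removes duplicates and iterates in natural (sorted) order.
        return String.join(String.valueOf(separator), new TreeSet<>(tokens));
    }

    public static void main(String[] args) {
        List<String> tokens = List.of("new", "york", "new", "jersey");
        System.out.println(concatenate(tokens, ' ')); // prints "new york new jersey"
        System.out.println(fingerprint(tokens, ' ')); // prints "jersey new york"
    }
}
```

Because the concatenated form is a single token, an exact lookup on it only succeeds when every token matches in order, which is the whole-value matching described above.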

      Attachments

        1. LUCENE-8323.patch
          17 kB
          David Smiley

            People

              Assignee: David Smiley (dsmiley)
              Reporter: David Smiley (dsmiley)
              Votes: 0
              Watchers: 5