  1. Lucene - Core
  2. LUCENE-5012

Make graph-based TokenFilters easier

Details

    • Type: Improvement
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: modules/analysis
    • Labels: None
    • Lucene Fields: New

    Description

      SynonymFilter has two limitations today:

      • It cannot create positions, so e.g. dns -> domain name service
        creates blatantly wrong highlights (SOLR-3390, LUCENE-4499 and
        others).
      • It cannot consume a graph, so e.g. if you try to apply synonyms
        after the Kuromoji tokenizer I'm not sure what will happen.

      I've thought about how to fix these issues but it's really quite
      difficult with the current PosInc/PosLen graph representation, so I'd
      like to explore an alternative approach.
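
      To make the first limitation concrete, here is a minimal sketch in
      plain Java (no Lucene dependency). The names PosGraphSketch, Tok,
      arcs, flattened and graph are hypothetical and exist only for this
      illustration; Tok merely mimics the values carried by Lucene's
      PositionIncrementAttribute and PositionLengthAttribute. The
      "flattened" stream is an assumption about roughly what a filter that
      cannot create positions ends up emitting for "dns is up"; the
      "graph" stream is what a correct expansion would need.

```java
import java.util.ArrayList;
import java.util.List;

class PosGraphSketch {
    /** Hypothetical stand-in for one token: term, position increment, position length. */
    static final class Tok {
        final String term;
        final int posInc;
        final int posLen;

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** Resolves each token into an arc "from->to:term" between position nodes. */
    static List<String> arcs(List<Tok> tokens) {
        List<String> out = new ArrayList<>();
        int pos = -1; // the first token's increment of 1 moves this to position 0
        for (Tok t : tokens) {
            pos += t.posInc;
            out.add(pos + "->" + (pos + t.posLen) + ":" + t.term);
        }
        return out;
    }

    /** Assumed output when no new positions can be created: synonym words are stacked on later tokens. */
    static List<Tok> flattened() {
        return List.of(
            new Tok("dns", 1, 1), new Tok("domain", 0, 1),
            new Tok("is", 1, 1), new Tok("name", 0, 1),
            new Tok("up", 1, 1), new Tok("service", 0, 1));
    }

    /** A correct graph: "dns" spans three positions, so "is" and "up" must move to new positions. */
    static List<Tok> graph() {
        return List.of(
            new Tok("dns", 1, 3), new Tok("domain", 0, 1),
            new Tok("name", 1, 1), new Tok("service", 1, 1),
            new Tok("is", 1, 1), new Tok("up", 1, 1));
    }

    public static void main(String[] args) {
        System.out.println(arcs(flattened()));
        // [0->1:dns, 0->1:domain, 1->2:is, 1->2:name, 2->3:up, 2->3:service]
        System.out.println(arcs(graph()));
        // [0->3:dns, 0->1:domain, 1->2:name, 2->3:service, 3->4:is, 4->5:up]
    }
}
```

      In the flattened stream "name" shares a position with "is" and
      "service" with "up", which is the kind of overlap that can lead to
      wrong highlights. The graph stream avoids this only by moving "is"
      from position 1 to position 3, i.e. by creating positions.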

      Attachments

        1. LUCENE-5012.patch
          79 kB
          Matt Weber
        2. LUCENE-5012.patch
          79 kB
          Michael McCandless

          People

            Assignee: mikemccand Michael McCandless
            Reporter: mikemccand Michael McCandless
