Lucene - Core / LUCENE-6638

Factor graph flattening out of SynonymFilter

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Duplicate
    • None
    • 6.0
    • None
    • None
    • New

    Description

      Spinoff from LUCENE-6582.

      SynonymFilter is very hairy and has known, nearly-impossible-to-fix bugs: it produces the wrong graph, accepting some phrases it should not and rejecting others it should accept, because it never creates new positions.

      This makes improvements like LUCENE-6582, to fix some of its bugs, unnecessarily hard.

      I'd like to pull out the graph flattening into its own token filter, and I think I have a starting patch that works. I started with the "sausagizer" stage on the branch from LUCENE-5012, but changed the approach so that it should not have so many adversarial cases.

      I think this should make SynonymFilter quite a bit simpler, hopefully to the point where we can just fix its bugs already.
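
      To make the "never creates new positions" problem concrete, here is a minimal sketch in plain Java. It does not use the Lucene API: the Tok class, the example synonym (dns -> "domain name service"), and the helper names are all hypothetical, and the stacked layout is an assumption about how a multi-word synonym ends up overlaid on the following input tokens. It compares the two-word phrases accepted by the stacked stream with those accepted by a graph in which the synonym gets its own positions.

```java
import java.util.List;
import java.util.Set;
import java.util.TreeSet;

public class FlattenedGraphDemo {
    /** Hypothetical token: term, absolute start position, and position length. */
    static final class Tok {
        final String term;
        final int pos;
        final int posLen;
        Tok(String term, int pos, int posLen) {
            this.term = term;
            this.pos = pos;
            this.posLen = posLen;
        }
    }

    /** "dns is down" with the synonym stacked onto the positions that already exist (assumed layout). */
    static List<Tok> stacked() {
        return List.of(
            new Tok("dns", 0, 1), new Tok("domain", 0, 1),
            new Tok("is", 1, 1), new Tok("name", 1, 1),
            new Tok("down", 2, 1), new Tok("service", 2, 1));
    }

    /** The same input as a graph where the synonym gets new positions and "dns" spans all three. */
    static List<Tok> correct() {
        return List.of(
            new Tok("dns", 0, 3), new Tok("domain", 0, 1),
            new Tok("name", 1, 1), new Tok("service", 2, 1),
            new Tok("is", 3, 1), new Tok("down", 4, 1));
    }

    /** Two-word phrases accepted: the second token must start where the first one ends. */
    static Set<String> pairs(List<Tok> toks) {
        Set<String> out = new TreeSet<>();
        for (Tok a : toks) {
            for (Tok b : toks) {
                if (b.pos == a.pos + a.posLen) {
                    out.add(a.term + " " + b.term);
                }
            }
        }
        return out;
    }

    /** Elements of a that are not in b, in sorted order. */
    static Set<String> minus(Set<String> a, Set<String> b) {
        Set<String> out = new TreeSet<>(a);
        out.removeAll(b);
        return out;
    }

    public static void main(String[] args) {
        Set<String> s = pairs(stacked());
        Set<String> g = pairs(correct());
        System.out.println("wrongly accepted: " + minus(s, g));
        System.out.println("wrongly rejected: " + minus(g, s));
    }
}
```

      In this toy example the stacked stream wrongly accepts "dns name", "domain is", "is service" and "name down", and wrongly rejects "service is", which illustrates both failure modes from the description.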

      Attachments

        1. LUCENE-6638.patch
          17 kB
          Michael McCandless
        2. LUCENE-6638.patch
          17 kB
          Michael McCandless

            People

              Assignee: Michael McCandless (mikemccand)
              Reporter: Michael McCandless (mikemccand)
              Votes: 0
              Watchers: 5