Lucene - Core / LUCENE-6664

Replace SynonymFilter with SynonymGraphFilter

    Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.4, 7.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Spinoff from LUCENE-6582.

      I created a new SynonymGraphFilter to replace the current buggy
      SynonymFilter. It produces correct graphs and does no "graph
      flattening" itself, which I think makes it simpler.

      This means that if you apply synonyms during indexing, you must add
      the FlattenGraphFilter yourself after the SynonymGraphFilter.

      Index-time synonym expansion is necessarily a "lossy" graph
      transformation when multi-token (input or output) synonyms are
      applied: because the index does not store posLength, there will
      always be phrase queries that should match but do not, and phrase
      queries that should not match but do.
      http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
      goes into detail about this.
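
To make the lossiness concrete, here is a minimal, Lucene-free sketch. The class `PositionOnlyIndex` and the synonym pair "dns" / "domain name service" are hypothetical, chosen only for illustration; they are not part of the patch. The sketch assumes an index that keeps each term's position but drops posLength, and a phrase matcher that requires strictly consecutive positions.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical toy index that stores term positions only (no posLength). */
final class PositionOnlyIndex {
  private final Map<String, List<Integer>> postings = new HashMap<>();

  /** Records a term at a position; posLength is accepted but deliberately dropped. */
  void add(String term, int position, int posLength) {
    postings.computeIfAbsent(term, k -> new ArrayList<>()).add(position);
  }

  /** True if the terms occur at consecutive positions, starting anywhere. */
  boolean matchesPhrase(String... terms) {
    if (terms.length == 0) {
      return false;
    }
    for (int start : postings.getOrDefault(terms[0], List.of())) {
      boolean ok = true;
      for (int i = 1; i < terms.length && ok; i++) {
        ok = postings.getOrDefault(terms[i], List.of()).contains(start + i);
      }
      if (ok) {
        return true;
      }
    }
    return false;
  }

  /** Indexes "dns is up" with "dns" expanded to "domain name service". */
  static PositionOnlyIndex example() {
    PositionOnlyIndex index = new PositionOnlyIndex();
    index.add("dns", 0, 3);     // spans three positions in the graph
    index.add("domain", 0, 1);
    index.add("name", 1, 1);
    index.add("service", 2, 1);
    index.add("is", 3, 1);
    index.add("up", 4, 1);
    return index;
  }

  public static void main(String[] args) {
    PositionOnlyIndex index = example();
    System.out.println(index.matchesPhrase("dns", "is"));   // should match, but does not
    System.out.println(index.matchesPhrase("dns", "name")); // should not match, but does
  }
}
```

Under these assumptions the first query is a false negative and the second a false positive, which are the two kinds of error described above.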

      However, with this new SynonymGraphFilter, if you instead do synonym
      expansion at query time (without flattening) and use
      TermAutomatonQuery (in the future, somehow integrated into a query
      parser), or perhaps just enumerate all paths through the graph and
      take the union of one PhraseQuery per path, you should get 100%
      correct matches (though I am not sure about "proper" scoring).
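
The "enumerate all paths" idea can be sketched without any Lucene dependency. The following is a hedged illustration only: `TokenGraphPaths`, its `Token` class, and the "dns" example graph are hypothetical names introduced here, and a real implementation would build one PhraseQuery per path and combine them as optional clauses. Each token starts at a node (its position) and ends at node position + posLength.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical, Lucene-free model of a token graph; not part of the patch. */
final class TokenGraphPaths {

  /** One token: its term, the node it starts at, and how many positions it spans. */
  static final class Token {
    final String term;
    final int position;
    final int posLength;

    Token(String term, int position, int posLength) {
      this.term = term;
      this.position = position;
      this.posLength = posLength;
    }
  }

  /** Returns the term sequence of every path from startNode to endNode. */
  static List<List<String>> enumeratePaths(List<Token> tokens, int startNode, int endNode) {
    List<List<String>> paths = new ArrayList<>();
    walk(tokens, startNode, endNode, new ArrayList<>(), paths);
    return paths;
  }

  /** Depth-first walk: follow each token leaving the current node to its end node. */
  private static void walk(List<Token> tokens, int node, int endNode,
                           List<String> prefix, List<List<String>> out) {
    if (node == endNode) {
      out.add(new ArrayList<>(prefix));
      return;
    }
    for (Token t : tokens) {
      if (t.position == node) {
        prefix.add(t.term);
        walk(tokens, node + t.posLength, endNode, prefix, out);
        prefix.remove(prefix.size() - 1);
      }
    }
  }

  /** Query "dns is up" with "dns" expanded to "domain name service", unflattened. */
  static List<Token> exampleGraph() {
    return List.of(
        new Token("dns", 0, 3),
        new Token("domain", 0, 1),
        new Token("name", 1, 1),
        new Token("service", 2, 1),
        new Token("is", 3, 1),
        new Token("up", 4, 1));
  }

  public static void main(String[] args) {
    // Each printed line would become one phrase in the union.
    for (List<String> path : enumeratePaths(exampleGraph(), 0, 5)) {
      System.out.println(String.join(" ", path));
    }
  }
}
```

For this graph the walk yields two phrases, "dns is up" and "domain name service is up". Because posLength is kept at query time, neither phrase mixes tokens from the two alternatives.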

      This new synonym filter still cannot consume an arbitrary graph.

        Attachments

        1. LUCENE-6664.patch
          71 kB
          Michael McCandless
        2. usa_flat.png
          32 kB
          Michael McCandless
        3. usa.png
          36 kB
          Michael McCandless
        4. LUCENE-6664.patch
          154 kB
          Michael McCandless
        5. LUCENE-6664.patch
          169 kB
          Michael McCandless
        6. LUCENE-6664.patch
          156 kB
          Michael McCandless
        7. LUCENE-6664.patch
          153 kB
          Michael McCandless

              People

              • Assignee: Michael McCandless (mikemccand)
              • Reporter: Michael McCandless (mikemccand)
              • Votes: 4
              • Watchers: 22
