Lucene - Core / LUCENE-6664

Replace SynonymFilter with SynonymGraphFilter

Details

    • Type: New Feature
    • Status: Resolved
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.4, 7.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New

    Description

      Spinoff from LUCENE-6582.

      I created a new SynonymGraphFilter to replace the current buggy
      SynonymFilter. It produces correct graphs and does no "graph
      flattening" itself, which I think makes it simpler.

      This means you must add the FlattenGraphFilter yourself, if you are
      applying synonyms during indexing.

      Index-time synonym expansion is necessarily a "lossy" graph
      transformation when multi-token (input or output) synonyms are
      applied: because the index does not store posLength, there will
      always be phrase queries that should match but do not, and phrase
      queries that should not match but do.
      http://blog.mikemccandless.com/2012/04/lucenes-tokenstreams-are-actually.html
      goes into detail about this.
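
      To make the lossiness concrete, here is a toy model (not Lucene's
      indexing code) of an index that records only term positions and
      drops posLength. The class PositionOnlyIndex, its Tok type, and the
      "dns" / "domain name service" token graph are hypothetical names and
      data chosen for illustration, and the phrase matcher is a simplified
      exact-position check.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Toy index that keeps term positions but, like a real index, no posLength. */
final class PositionOnlyIndex {

    /** Hypothetical stand-in for a token: term, position increment, position length. */
    static final class Tok {
        final String term;
        final int posInc;
        final int posLen;

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    private final Map<String, Set<Integer>> positions = new HashMap<>();

    PositionOnlyIndex(List<Tok> tokens) {
        int pos = -1;
        for (Tok t : tokens) {
            pos += t.posInc;
            // t.posLen is deliberately ignored: this is the information that is lost.
            positions.computeIfAbsent(t.term, k -> new HashSet<>()).add(pos);
        }
    }

    /** True if the terms occur at consecutive positions, as an exact phrase match would require. */
    boolean matchesPhrase(String... terms) {
        if (terms.length == 0) {
            return false;
        }
        for (int start : positions.getOrDefault(terms[0], Collections.<Integer>emptySet())) {
            boolean ok = true;
            for (int i = 1; i < terms.length && ok; i++) {
                ok = positions.getOrDefault(terms[i], Collections.<Integer>emptySet())
                        .contains(start + i);
            }
            if (ok) {
                return true;
            }
        }
        return false;
    }
}
```

      Indexing the graph for "dns is up", where "dns" (posLength 3) sits
      alongside "domain name service", this model fails to match the phrase
      "dns is" (a false negative) and does match "dns name service" (a
      false positive).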

      However, with this new SynonymGraphFilter, if you instead do synonym
      expansion at query time (without the flattening) and use
      TermAutomatonQuery (future: somehow integrated into a query parser),
      or maybe just "enumerate all paths and make a union of PhraseQuery",
      you should get 100% correct matches (I'm not sure about "proper"
      scoring though...).
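
      The "enumerate all paths" idea can be sketched without any Lucene
      classes. The following is a minimal illustration, assuming tokens
      carry a position increment and a position length and that the graph
      has no holes; TokenGraphPaths, Tok, and Arc are hypothetical names,
      not part of the patch. Each returned phrase would become one
      PhraseQuery in the union.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Enumerates every start-to-end path through a token graph. */
final class TokenGraphPaths {

    /** Hypothetical stand-in for a token: term, position increment, position length. */
    static final class Tok {
        final String term;
        final int posInc;
        final int posLen;

        Tok(String term, int posInc, int posLen) {
            this.term = term;
            this.posInc = posInc;
            this.posLen = posLen;
        }
    }

    /** An arc labeled with a term, leading to node {@code to}. */
    private static final class Arc {
        final String term;
        final int to;

        Arc(String term, int to) {
            this.term = term;
            this.to = to;
        }
    }

    /** Returns each path as a space-joined phrase, in depth-first order. */
    static List<String> enumeratePaths(List<Tok> tokens) {
        Map<Integer, List<Arc>> arcsFrom = new TreeMap<>();
        int pos = -1;                       // position of the previous token
        int firstNode = Integer.MAX_VALUE;  // lowest start node seen
        int lastNode = 0;                   // highest end node seen
        for (Tok t : tokens) {
            pos += t.posInc;                // start node of this token
            int end = pos + t.posLen;       // end node of this token
            arcsFrom.computeIfAbsent(pos, k -> new ArrayList<>()).add(new Arc(t.term, end));
            firstNode = Math.min(firstNode, pos);
            lastNode = Math.max(lastNode, end);
        }
        List<String> paths = new ArrayList<>();
        if (!tokens.isEmpty()) {
            walk(firstNode, lastNode, arcsFrom, new ArrayList<>(), paths);
        }
        return paths;
    }

    private static void walk(int node, int lastNode, Map<Integer, List<Arc>> arcsFrom,
                             List<String> prefix, List<String> out) {
        if (node == lastNode) {
            out.add(String.join(" ", prefix));
            return;
        }
        for (Arc a : arcsFrom.getOrDefault(node, Collections.<Arc>emptyList())) {
            prefix.add(a.term);
            walk(a.to, lastNode, arcsFrom, prefix, out);
            prefix.remove(prefix.size() - 1);  // backtrack
        }
    }
}
```

      For the unflattened graph of "dns is up" with "dns" expanded to
      "domain name service", this yields the two phrases "dns is up" and
      "domain name service is up". The number of paths can grow
      exponentially when many synonyms overlap, which is one reason an
      automaton-based query may be preferable.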

      This new synonym filter still cannot consume an arbitrary graph.

      Attachments

        1. LUCENE-6664.patch
          153 kB
          Michael McCandless
        2. LUCENE-6664.patch
          156 kB
          Michael McCandless
        3. LUCENE-6664.patch
          169 kB
          Michael McCandless
        4. LUCENE-6664.patch
          154 kB
          Michael McCandless
        5. usa.png
          36 kB
          Michael McCandless
        6. usa_flat.png
          32 kB
          Michael McCandless
        7. LUCENE-6664.patch
          71 kB
          Michael McCandless

            People

              Assignee: mikemccand Michael McCandless
              Reporter: mikemccand Michael McCandless
              Votes: 4
              Watchers: 22