Lucene - Core / LUCENE-7824

Multi-word synonym rules with common terms at the same position are buggy

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Resolved
    • Affects Version/s: 6.5.1, 7.0
    • Fix Version/s: 6.6, 7.0
    • Component/s: None
    • Labels: None
    • Lucene Fields: New, Patch Available

      Description

      The automaton built from the graph token stream tries to pack together common terms of multi-word synonyms that appear at the same position. This means that some states inside a multi-word synonym path can have multiple transitions.
      As a result, the intersection points of the graph are not computed correctly.

      For example, the synonym rule "ny, new york city, new york" is not applied correctly to the query "ny police".
      In this case "police" is detected as part of the multi-word synonym paths, and we create a disjunction between:
      "ny police", "new york police", ...

      I pushed a patch that removes this optimization (and creates a single transition from each state) to ensure that the intersection points of the graph always show up at the end of the multi-word synonym paths.
      Matt Weber, can you take a look?
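
The graph shape the patch aims for can be illustrated with a minimal Java sketch. This is not Lucene's implementation and does not use any Lucene API: the class `TokenGraphSketch` and its methods `addAlternatives`, `intersectionPoints`, and `segments` are hypothetical names invented for this illustration. The sketch assumes that every synonym alternative gets its own chain of states (so "new york city" and "new york" do not share the "new" transition), and it treats an intersection point as a state that every path from start to end passes through.

```java
import java.util.ArrayList;
import java.util.List;

/** Illustrative token graph; hypothetical, not Lucene's GraphTokenStreamFiniteStrings. */
final class TokenGraphSketch {
    /** One transition: consuming `term` moves from state `from` to state `to`. */
    static final class Edge {
        final int from, to;
        final String term;
        Edge(int from, int to, String term) { this.from = from; this.to = to; this.term = term; }
    }

    private final List<Edge> edges = new ArrayList<>();
    private int nextState = 1; // state 0 is the start state

    /**
     * Adds alternative phrases that all lead from `from` to one fresh end state.
     * Each phrase gets its own chain of intermediate states: common terms are NOT shared.
     */
    int addAlternatives(int from, String... phrases) {
        int end = nextState++;
        for (String phrase : phrases) {
            String[] terms = phrase.split(" ");
            int state = from;
            for (int i = 0; i < terms.length; i++) {
                int to = (i == terms.length - 1) ? end : nextState++;
                edges.add(new Edge(state, to, terms[i]));
                state = to;
            }
        }
        return end;
    }

    /** States that every path from `start` to `end` passes through, in path order. */
    List<Integer> intersectionPoints(int start, int end) {
        List<List<Integer>> paths = new ArrayList<>();
        List<Integer> current = new ArrayList<>();
        current.add(start);
        walk(start, end, current, paths);
        if (paths.isEmpty()) return new ArrayList<>();
        List<Integer> common = new ArrayList<>(paths.get(0));
        for (List<Integer> path : paths) common.retainAll(path); // keeps first-path order
        return common;
    }

    /** For each pair of consecutive intersection points, the phrases that connect them. */
    List<List<String>> segments(int start, int end) {
        List<Integer> points = intersectionPoints(start, end);
        List<List<String>> result = new ArrayList<>();
        for (int i = 0; i + 1 < points.size(); i++) {
            List<String> phrases = new ArrayList<>();
            collect(points.get(i), points.get(i + 1), "", phrases);
            result.add(phrases);
        }
        return result;
    }

    // Depth-first enumeration of all state sequences from `state` to `end`.
    private void walk(int state, int end, List<Integer> path, List<List<Integer>> out) {
        if (state == end) { out.add(new ArrayList<>(path)); return; }
        for (Edge e : edges) {
            if (e.from == state) {
                path.add(e.to);
                walk(e.to, end, path, out);
                path.remove(path.size() - 1); // remove by index (backtrack)
            }
        }
    }

    // Depth-first enumeration of all term sequences from `state` to `target`.
    private void collect(int state, int target, String prefix, List<String> out) {
        if (state == target) { out.add(prefix); return; }
        for (Edge e : edges) {
            if (e.from == state) {
                collect(e.to, target, prefix.isEmpty() ? e.term : prefix + " " + e.term, out);
            }
        }
    }

    public static void main(String[] args) {
        TokenGraphSketch g = new TokenGraphSketch();
        int afterSynonyms = g.addAlternatives(0, "ny", "new york city", "new york");
        int end = g.addAlternatives(afterSynonyms, "police");
        System.out.println(g.intersectionPoints(0, end)); // [0, 1, 5]
        System.out.println(g.segments(0, end)); // [[ny, new york city, new york], [police]]
    }
}
```

With one chain per alternative, the state after the synonyms is an intersection point, so "police" ends up in its own segment rather than being folded into each synonym path as in "ny police", "new york police", ...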

            People

            • Assignee: Unassigned
            • Reporter: Jim Ferenczi (jimczi)