Lucene - Core / LUCENE-4991

QueryParser doesn't handle synonyms correctly for Chinese

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3.1, 6.0
    • Component/s: modules/queryparser
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      As reported multiple times on the user list:
      http://find.searchhub.org/document/eaf0e88a6a0d4d1f
      http://find.searchhub.org/document/abf28043c52b6efc
      http://find.searchhub.org/document/1313794632c90826

      The logic here does not form the right query structures and ignores the PositionIncrementAttribute from the TokenStream.

      • When the default operator is AND, the problem is easiest to see, because synonyms are wrongly inserted as additional MUST terms:
        expected:<+field:中 +(field:国 field:國)>
        but was:<+field:中 +field:国 +field:國>
      • Even when the default operator is OR, it's still wrong: because we ignore posInc, the coord computation is not correct (so scoring is wrong).
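To make the expected structure concrete, here is a minimal, self-contained sketch in plain Java. It does not use Lucene and is not the patch attached to this issue: the `Token` class and the `buildFlat`/`buildGrouped` helpers are hypothetical names invented for illustration, and queries are rendered as strings rather than real query objects. It only shows the idea of treating a token with a position increment of 0 as an alternative at the previous token's position.

```java
import java.util.ArrayList;
import java.util.List;

public class SynonymGroupingSketch {

    /** Hypothetical stand-in for one analyzed token: its text and position increment. */
    public static final class Token {
        public final String term;
        public final int posInc;

        public Token(String term, int posInc) {
            this.term = term;
            this.posInc = posInc;
        }
    }

    /** Flat form (the reported behaviour): one top-level clause per token, posInc ignored. */
    public static String buildFlat(String field, List<Token> tokens, boolean and) {
        StringBuilder sb = new StringBuilder();
        for (Token t : tokens) {
            if (sb.length() > 0) sb.append(' ');
            if (and) sb.append('+');
            sb.append(field).append(':').append(t.term);
        }
        return sb.toString();
    }

    /** Grouped form (the expected behaviour): tokens with posInc == 0 join the previous position. */
    public static String buildGrouped(String field, List<Token> tokens, boolean and) {
        // Collect the terms that share each position.
        List<List<String>> positions = new ArrayList<>();
        for (Token t : tokens) {
            if (t.posInc == 0 && !positions.isEmpty()) {
                positions.get(positions.size() - 1).add(t.term);
            } else {
                List<String> group = new ArrayList<>();
                group.add(t.term);
                positions.add(group);
            }
        }
        // Emit one top-level clause per position; alternatives become an optional sub-group.
        StringBuilder sb = new StringBuilder();
        for (List<String> group : positions) {
            if (sb.length() > 0) sb.append(' ');
            if (and) sb.append('+');
            if (group.size() == 1) {
                sb.append(field).append(':').append(group.get(0));
            } else {
                sb.append('(');
                for (int i = 0; i < group.size(); i++) {
                    if (i > 0) sb.append(' ');
                    sb.append(field).append(':').append(group.get(i));
                }
                sb.append(')');
            }
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The three tokens from the description, written as Unicode escapes:
        // U+4E2D (posInc 1), U+56FD (posInc 1), U+570B (posInc 0, a synonym of the previous token).
        List<Token> tokens = List.of(
                new Token("\u4e2d", 1),
                new Token("\u56fd", 1),
                new Token("\u570b", 0));
        System.out.println("flat:    " + buildFlat("field", tokens, true));
        System.out.println("grouped: " + buildGrouped("field", tokens, true));
    }
}
```

In the flat form each synonym is its own top-level clause, so the query has three top-level clauses instead of two. A document containing only one variant of the second character then matches two of three clauses rather than two of two, which is presumably the coord distortion the second bullet refers to.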

      This also screws up scoring and queries for decompounding (because decompounding filters go through this exact situation if they add the original compound as a synonym).

        Activity

        Robert Muir added a comment -

        Here's a patch. I broke out the current logic into two cases (more code, but simpler).

        I also added a lot of tests (most of them actually pass today; they are there to ensure we aren't breaking other things).

        I fixed only the classic QP because it's not obvious to me how to fix the flexible QP. Ideally, in the future we'd factor these tests back into QueryParserTestBase.

        Michael McCandless added a comment -

        +1

        Commit Tag Bot added a comment -

        [trunk commit] rmuir
        http://svn.apache.org/viewvc?view=revision&revision=1481100

        LUCENE-4991: QueryParser doesnt handle synonyms correctly for chinese

        Commit Tag Bot added a comment -

        [branch_4x commit] rmuir
        http://svn.apache.org/viewvc?view=revision&revision=1481116

        LUCENE-4991: QueryParser doesnt handle synonyms correctly for chinese

        Steve Rowe added a comment -

        If there are no objections, I'd like to backport this to 4.3.1.

        Shalin Shekhar Mangar added a comment -

        Backported to 4.3.1 in r1483364.

        Artem Lukanin added a comment -

        I guess this should fix my issue SOLR-4533 as well? I have to check it out.

        Shalin Shekhar Mangar added a comment -

        Bulk closing after 4.3.1 release


          People

          • Assignee:
            Unassigned
            Reporter:
            Robert Muir
          • Votes:
            0
            Watchers:
            3
