Lucene - Core / LUCENE-4991

QueryParser doesn't handle synonyms correctly for Chinese


Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.3.1, 6.0
    • Component/s: modules/queryparser
    • Labels: None
    • Lucene Fields: New

    Description

      As reported multiple times on the user list:
      http://find.searchhub.org/document/eaf0e88a6a0d4d1f
      http://find.searchhub.org/document/abf28043c52b6efc
      http://find.searchhub.org/document/1313794632c90826

      The logic here does not form the right query structures, and it ignores the PositionIncrementAttribute from the TokenStream.

      • when the default operator is AND, the problem is easier to see, because synonyms are wrongly inserted as additional MUST terms:
        expected:<+field:中 +(field:国 field:國)>
        but was:<+field:中 +field:国 +field:國>
      • even when the default operator is OR, it's still wrong: because posInc is ignored, the coord computation is not correct (so scoring is wrong)
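The expected structure can be sketched outside Lucene. The following is a minimal Python model, not the QueryParser code: `build_query` and the `(term, position_increment)` tuples are hypothetical stand-ins for the token stream, and the output is only the printed query form. It assumes a position increment of 0 means "same position as the previous token", so such tokens are grouped into one optional clause instead of each becoming its own MUST term.

```python
from itertools import groupby


def build_query(field, tokens, default_and=True):
    """Build a printed query from (term, position_increment) pairs.

    Hypothetical helper: it models the grouping only, not Lucene's API.
    """
    # Turn increments into absolute positions. An increment of 0 keeps the
    # token at the previous position, which is how synonyms are emitted.
    positioned = []
    pos = -1
    for term, pos_inc in tokens:
        pos += pos_inc
        positioned.append((pos, term))

    clauses = []
    for _, group in groupby(positioned, key=lambda pt: pt[0]):
        terms = [f"{field}:{term}" for _, term in group]
        # Several terms at one position form a single nested optional group.
        clause = terms[0] if len(terms) == 1 else "(" + " ".join(terms) + ")"
        clauses.append(("+" if default_and else "") + clause)
    return " ".join(clauses)


# 國 is a synonym of 国, so it arrives with a position increment of 0.
tokens = [("中", 1), ("国", 1), ("國", 0)]
print(build_query("field", tokens))                     # +field:中 +(field:国 field:國)
print(build_query("field", tokens, default_and=False))  # field:中 (field:国 field:國)
```

Grouping by position rather than by token is the design point: the default operator then applies once per position, and the synonyms inside a position stay optional alternatives.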

      This also breaks scoring and queries for decompounding (because decompounders go through this exact situation if they add the original compound as a synonym).
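The scoring effect can be illustrated with a simplified model of the coordination factor used by classic TF-IDF scoring, where coord is the fraction of top-level clauses a document matches. This is a sketch under stated assumptions: `top_level_coord` is a hypothetical helper, each clause is modeled as a set of terms, and the nested synonym group is assumed to count as one clause. The same arithmetic would apply to a decompounder that adds the original compound at the same position.

```python
def coord(overlap, max_overlap):
    # Coordination factor: matched clauses divided by total clauses.
    return overlap / max_overlap


def top_level_coord(clauses, doc_terms):
    """clauses: list of sets, one set per top-level optional clause.

    A set with several terms models a synonym group at one position.
    Hypothetical helper for illustration only.
    """
    matched = sum(1 for clause in clauses if clause & doc_terms)
    return coord(matched, len(clauses))


doc = {"中", "国"}                    # document contains 中国, not the variant 國
flat = [{"中"}, {"国"}, {"國"}]       # posInc ignored: three separate clauses
grouped = [{"中"}, {"国", "國"}]      # posInc respected: two clauses

print(top_level_coord(flat, doc))     # 2 of 3 clauses matched
print(top_level_coord(grouped, doc))  # 2 of 2 clauses matched
```

In the flat form, a document that matches every query position is still penalized for not containing the synonym variant; in the grouped form it is not.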

          People

            Assignee: Unassigned
            Reporter: Robert Muir (rcmuir)
            Votes: 0
            Watchers: 3
