Lucene - Core / LUCENE-7472

MultiFieldQueryParser.getFieldQuery() drops queries that are neither BooleanQuery nor TermQuery

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.3, 6.2.2, 7.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      From http://mail-archives.apache.org/mod_mbox/lucene-java-user/201609.mbox/%3c944985a6ac27425681bd27abe9d90602@ska-wn-e132.ptvag.ptv.de%3e, Oliver Kaleske reports:

      Hi,

      While updating Lucene from 6.1.0 to 6.2.0, I came across the following:

      We have a subclass of MultiFieldQueryParser (MFQP) for creating a custom type of Query, which calls getFieldQuery() on its base class (MFQP).
      For each of its search fields, this method has a Query created by calling getFieldQuery() on QueryParserBase.
      Ultimately, we wind up in QueryBuilder's createFieldQuery() method, which depending on the number of tokens (etc.) decides what type of Query to return: a TermQuery, BooleanQuery, PhraseQuery, or MultiPhraseQuery.

      Back in MFQP.getFieldQuery(), a variable maxTerms is determined depending on the type of Query returned: for a TermQuery or a BooleanQuery, its value will in general be nonzero, clauses are created, and a non-null Query is returned.
      However, other Query subclasses result in maxTerms=0, an empty list of clauses, and finally null is returned.

      To me, this seems like a bug, but I may well be missing something. The comment "// happens for stopwords" on the return null statement, however, seems to suggest that Query types other than TermQuery and BooleanQuery were not considered properly here.
      I should point out that our custom MFQP subclass so far does some rather unsophisticated tokenization before calling getFieldQuery() on each token, so characters like '*' may still slip through. So perhaps with proper tokenization, it is guaranteed that only TermQuery and BooleanQuery can come out of the chain of getFieldQuery() calls, and not handling (Multi)PhraseQuery in MFQP.getFieldQuery() can never cause trouble?

      The code in MFQP.getFieldQuery dates back to
      LUCENE-2605: Add classic QueryParser option setSplitOnWhitespace() to control whether to split on whitespace prior to text analysis. Default behavior remains unchanged: split-on-whitespace=true.
      (06 Jul 2016), when it was substantially expanded.

      Best regards,
      Oliver

          Activity

          shalinmangar Shalin Shekhar Mangar added a comment -

          Closing after 6.3.0 release.

          jira-bot ASF subversion and git services added a comment -

          Commit f9e915b3dac62b101ae7b4be343dbf918ccd0389 in lucene-solr's branch refs/heads/branch_6x from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=f9e915b ]

          LUCENE-7472: remove unused import

          jira-bot ASF subversion and git services added a comment -

          Commit 4e7c6141a2afaff454cfc364dd02c8abb838c218 in lucene-solr's branch refs/heads/branch_6_2 from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4e7c614 ]

          LUCENE-7472: remove unused import

          jira-bot ASF subversion and git services added a comment -

          Commit 09e03c47c2c1842cbbd2b35bb698248737ba330d in lucene-solr's branch refs/heads/branch_6x from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=09e03c4 ]

          LUCENE-7472: remove unused import

          jira-bot ASF subversion and git services added a comment -

          Commit 64ed2b6492f9d9218ab26550127c5c206f3e25b1 in lucene-solr's branch refs/heads/branch_6_2 from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=64ed2b6 ]

          LUCENE-7472: remove unused import

          steve_rowe Steve Rowe added a comment -

          Pushed to master, branch_6x and branch_6_2. The testing on master differs slightly from the other two branches, because the default value of the split-on-whitespace query parser option, which affects the multi-term synonyms used in the added test, will change on master/7.0.

          On the java-user mailing list, Oliver Kaleske reported:

          I locally applied the patch on branch_6_2 (because that is closest to my current 6.2.1 dependency) and built Lucene from there.
          Using the outcome in my application, the problem observed there is fixed.

          jira-bot ASF subversion and git services added a comment -

          Commit 6739e075b4c1dedab3b49b1d299cd713135c1ec3 in lucene-solr's branch refs/heads/master from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=6739e07 ]

          LUCENE-7472: MultiFieldQueryParser.getFieldQuery() drops queries that are neither BooleanQuery nor TermQuery.

          jira-bot ASF subversion and git services added a comment -

          Commit 1963b1701d2c331daa452ae6d16fc754c3e84bc4 in lucene-solr's branch refs/heads/branch_6x from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1963b17 ]

          LUCENE-7472: switch TestMultiFieldQueryParser.testSynonyms default split-on-whitespace to true (it's false on master/7.0)

          jira-bot ASF subversion and git services added a comment -

          Commit c398949d3788676bbf8b3f1ae7e819f851d20767 in lucene-solr's branch refs/heads/branch_6x from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c398949 ]

          LUCENE-7472: MultiFieldQueryParser.getFieldQuery() drops queries that are neither BooleanQuery nor TermQuery.

          jira-bot ASF subversion and git services added a comment -

          Commit 12e7384b35a92a366e74af5fd4aed4f555ffd2da in lucene-solr's branch refs/heads/branch_6_2 from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=12e7384 ]

          LUCENE-7472: switch TestMultiFieldQueryParser.testSynonyms default split-on-whitespace to true (it's false on master/7.0)

          jira-bot ASF subversion and git services added a comment -

          Commit 4ecc9d8eeac781ecb5f141491057a57226f61c6a in lucene-solr's branch refs/heads/branch_6_2 from Steve Rowe
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=4ecc9d8 ]

          LUCENE-7472: move CHANGES.txt entry under 6.2.2 section

          jpountz Adrien Grand added a comment -

          +1

          steve_rowe Steve Rowe added a comment -

          Patch with a fix that treats all non-BooleanQuery queries opaquely (like TermQuery), and adds a test for the SynonymQuery case that fails without the patch and succeeds with it.
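
As a rough illustration of the change described in this comment (not the committed patch itself), the sketch below models how maxTerms could be computed before and after the fix. The classes TermQuery, PhraseQuery and BooleanQuery here are hypothetical stand-ins written for this example, not Lucene's real classes, and the method names maxTermsBeforeFix and maxTermsAfterFix are likewise assumptions. Per the report, a maxTerms of 0 leaves the clause list empty, so null is returned.

```java
public class MaxTermsSketch {
    // Hypothetical stand-ins for Lucene's query classes; NOT the real API.
    interface Query {}

    static final class TermQuery implements Query {
        final String term;
        TermQuery(String term) { this.term = term; }
    }

    static final class PhraseQuery implements Query {
        final String[] terms;
        PhraseQuery(String... terms) { this.terms = terms; }
    }

    static final class BooleanQuery implements Query {
        final Query[] clauses;
        BooleanQuery(Query... clauses) { this.clauses = clauses; }
    }

    // Behaviour as described in the report: only TermQuery and BooleanQuery
    // contribute, so any other query type leaves maxTerms at 0.
    static int maxTermsBeforeFix(Query... perFieldQueries) {
        int maxTerms = 0;
        for (Query q : perFieldQueries) {
            if (q instanceof TermQuery) {
                maxTerms = Math.max(1, maxTerms);
            } else if (q instanceof BooleanQuery) {
                maxTerms = Math.max(maxTerms, ((BooleanQuery) q).clauses.length);
            }
        }
        return maxTerms;
    }

    // Behaviour as described in the fix: every non-null, non-BooleanQuery
    // query is treated opaquely and counts as a single term.
    static int maxTermsAfterFix(Query... perFieldQueries) {
        int maxTerms = 0;
        for (Query q : perFieldQueries) {
            if (q == null) {
                continue; // a field may produce no query at all
            }
            if (q instanceof BooleanQuery) {
                maxTerms = Math.max(maxTerms, ((BooleanQuery) q).clauses.length);
            } else {
                maxTerms = Math.max(1, maxTerms);
            }
        }
        return maxTerms;
    }

    public static void main(String[] args) {
        Query phrase = new PhraseQuery("quick", "fox");
        Query bool = new BooleanQuery(new TermQuery("quick"), new TermQuery("fox"));
        System.out.println(maxTermsBeforeFix(phrase)); // prints 0
        System.out.println(maxTermsAfterFix(phrase));  // prints 1
        System.out.println(maxTermsBeforeFix(bool));   // prints 2
        System.out.println(maxTermsAfterFix(bool));    // prints 2
    }
}
```

In this model only the non-BooleanQuery, non-TermQuery case changes: a PhraseQuery goes from contributing nothing to contributing one opaque clause, while BooleanQuery handling stays the same.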


            People

            • Assignee:
              steve_rowe Steve Rowe
              Reporter:
              steve_rowe Steve Rowe
            • Votes:
              0
              Watchers:
              3
