[LUCENE-7472] MultiFieldQueryParser.getFieldQuery() drops queries that are neither BooleanQuery nor TermQuery - ASF JIRA

Details

Type: Bug
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 6.2.2, 6.3, 7.0
Component/s: None
Labels: None
Lucene Fields: New, Patch Available

Description

From http://mail-archives.apache.org/mod_mbox/lucene-java-user/201609.mbox/%3c944985a6ac27425681bd27abe9d90602@ska-wn-e132.ptvag.ptv.de%3e, Oliver Kaleske reports:

Hi,

in updating Lucene from 6.1.0 to 6.2.0 I came across the following:

We have a subclass of MultiFieldQueryParser (MFQP) for creating a custom type of Query, which calls getFieldQuery() on its base class (MFQP).
For each of its search fields, this method has a Query created by calling getFieldQuery() on QueryParserBase.
Ultimately, we wind up in QueryBuilder's createFieldQuery() method, which depending on the number of tokens (etc.) decides what type of Query to return: a TermQuery, BooleanQuery, PhraseQuery, or MultiPhraseQuery.

Back in MFQP.getFieldQuery(), a variable maxTerms is determined depending on the type of Query returned: for a TermQuery or a BooleanQuery, its value will in general be nonzero, clauses are created, and a non-null Query is returned.
However, other Query subclasses result in maxTerms=0, an empty list of clauses, and finally null is returned.

To me, this seems like a bug, but I may well be missing something. The comment "// happens for stopwords" on the return null statement, however, seems to suggest that Query types other than TermQuery and BooleanQuery were not considered properly here.
I should point out that our custom MFQP subclass so far does some rather unsophisticated tokenization before calling getFieldQuery() on each token, so characters like '*' may still slip through. So perhaps with proper tokenization, it is guaranteed that only TermQuery and BooleanQuery can come out of the chain of getFieldQuery() calls, and not handling (Multi)PhraseQuery in MFQP.getFieldQuery() can never cause trouble?

The code in MFQP.getFieldQuery dates back to "LUCENE-2605: Add classic QueryParser option setSplitOnWhitespace() to control whether to split on whitespace prior to text analysis. Default behavior remains unchanged: split-on-whitespace=true." (06 Jul 2016), when it was substantially expanded.

Best regards,
Oliver
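The control flow described in the report can be modeled with a small, self-contained Java sketch. The types `Q`, `Term`, `Bool`, and `Phrase` and the method `combine` are hypothetical stand-ins, not Lucene classes, and the way per-position clauses are grouped is a simplifying assumption. With `keepOtherTypes=false` the sketch follows the reported behavior (only term-like and boolean queries raise `maxTerms`); with `keepOtherTypes=true` it shows one possible way to keep other query types, which is not necessarily what the attached patch does.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Toy model of the reported behavior. None of these types are Lucene classes.
final class MaxTermsSketch {
    interface Q {}

    static final class Term implements Q {
        final String text;
        Term(String text) { this.text = text; }
    }

    static final class Bool implements Q {
        final List<Q> clauses;
        Bool(List<Q> clauses) { this.clauses = clauses; }
    }

    static final class Phrase implements Q {
        final List<String> terms;
        Phrase(List<String> terms) { this.terms = terms; }
    }

    // perField holds one query per search field.
    // keepOtherTypes=false: only Term and Bool contribute to maxTerms (as reported).
    // keepOtherTypes=true: any other non-null query counts as a single term (assumed variant).
    static Q combine(List<Q> perField, boolean keepOtherTypes) {
        int maxTerms = 0;
        for (Q q : perField) {
            if (q instanceof Bool) {
                maxTerms = Math.max(maxTerms, ((Bool) q).clauses.size());
            } else if (q instanceof Term || (keepOtherTypes && q != null)) {
                maxTerms = Math.max(maxTerms, 1);
            }
        }

        List<Q> positions = new ArrayList<>();
        for (int termNum = 0; termNum < maxTerms; termNum++) {
            List<Q> perPosition = new ArrayList<>();
            for (Q q : perField) {
                if (q instanceof Bool) {
                    List<Q> c = ((Bool) q).clauses;
                    if (termNum < c.size()) {
                        perPosition.add(c.get(termNum));
                    }
                } else if (termNum == 0
                        && (q instanceof Term || (keepOtherTypes && q != null))) {
                    perPosition.add(q);
                }
            }
            positions.add(new Bool(perPosition));
        }
        // With maxTerms == 0 nothing was collected, so the query is dropped.
        return positions.isEmpty() ? null : new Bool(positions);
    }

    public static void main(String[] args) {
        List<Q> phrases = Arrays.asList(
                new Phrase(Arrays.asList("quick", "fox")),
                new Phrase(Arrays.asList("quick", "fox")));
        System.out.println(combine(phrases, false) == null); // true: phrase queries dropped
        System.out.println(combine(phrases, true) == null);  // false: phrase queries kept
    }
}
```

In this model, a phrase-like query for every field leaves `maxTerms` at 0, so the clause list stays empty and `null` comes back, which matches the symptom in the report.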

Attachments

LUCENE-7472.patch (3 kB), attached by Steven Rowe, 30/Sep/16 19:45

Issue Links

is broken by: LUCENE-2605 "queryparser parses on whitespace" (Closed)

People

Assignee: Steven Rowe
Reporter: Steven Rowe
Votes: 0
Watchers: 3

Dates

Created: 30/Sep/16 19:42
Updated: 28/Aug/22 15:03
Resolved: 04/Oct/16 15:16