Details
Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Description
The edismax parser and some other query parsers treat whitespace-only queries as empty queries, but they rely on Java's String.trim() method to normalise the query string. That method strips only characters with code points 0-32 (U+0000 to U+0020). Other whitespace characters, such as U+3000 IDEOGRAPHIC SPACE, survive the trim, bypass the empty-query test and lead to a 400 Bad Request response. Compare /solr/mycollection/select?q=%E3%80%80&defType=edismax with /solr/mycollection/select?q=%20&defType=edismax. The first fails with the exception:
org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "<EOF>" at line 1, column 0. Was expecting one of: <NOT> ... "+" ... "-" ... <BAREOPER> ... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... <REGEXPTERM> ... "[" ... "{" ... <LPARAMS> ... "filter(" ... <NUMBER> ... <TERM> ...
PR 1172 updates the dismax, edismax and rerank query parsers to use StringUtils.isWhitespace(), which also recognises Unicode whitespace characters such as U+3000.
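The difference between the two checks can be illustrated with a minimal JDK-only sketch. The class and method names here (WhitespaceCheck, isEmptyAfterTrim, isAllWhitespace) are hypothetical and are not taken from the Solr code or the PR. isAllWhitespace is only a stand-in built on Character.isWhitespace, and its null handling is an assumption made for the example.

```java
// Sketch only: hypothetical names, not the Solr implementation.
final class WhitespaceCheck {

    /** Mirrors a trim()-based emptiness test: trim() strips only chars <= U+0020. */
    static boolean isEmptyAfterTrim(String q) {
        return q == null || q.trim().isEmpty();
    }

    /**
     * JDK-only stand-in for a Unicode-aware check, based on Character.isWhitespace.
     * Treating null as "empty" is an assumption of this sketch.
     */
    static boolean isAllWhitespace(String q) {
        if (q == null) {
            return true;
        }
        // allMatch on an empty stream is true, so "" also counts as whitespace-only.
        return q.codePoints().allMatch(Character::isWhitespace);
    }

    public static void main(String[] args) {
        String ascii = " ";            // U+0020, sent as %20
        String ideographic = "\u3000"; // IDEOGRAPHIC SPACE, sent as %E3%80%80

        System.out.println("trim, U+0020: " + isEmptyAfterTrim(ascii));
        System.out.println("trim, U+3000: " + isEmptyAfterTrim(ideographic));
        System.out.println("isWhitespace, U+0020: " + isAllWhitespace(ascii));
        System.out.println("isWhitespace, U+3000: " + isAllWhitespace(ideographic));
    }
}
```

With the trim-based test the U+3000 query is not detected as empty, so it reaches the parser; the Character.isWhitespace-based test treats both inputs as whitespace-only.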
Prior to the change, rerank behaves differently for U+3000 and U+0020, as the two requests below show. With the change, both give the "reRankQuery parameter is mandatory" message:
q=greetings&rq={!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80 - generic 400 Bad Request
q=greetings&rq={!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20 - 400 reporting "reRankQuery parameter is mandatory"