  Solr / SOLR-14189

Some whitespace characters bypass zero-length test in query parsers leading to 400 Bad Request

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: master (9.0), 8.5
    • Component/s: query parsers
    • Labels: None

      Description

      The edismax query parser, and some others, treat whitespace-only queries as empty, but they normalise the query with Java's String.trim(), which strips only characters with code points 0-32 (U+0000 to U+0020). Other whitespace characters, such as U+3000 IDEOGRAPHIC SPACE, survive the trim, bypass the zero-length test and lead to a 400 Bad Request response. Compare /solr/mycollection/select?q=%E3%80%80&defType=edismax with /solr/mycollection/select?q=%20&defType=edismax. The first fails with the exception:

      org.apache.solr.search.SyntaxError: Cannot parse '': Encountered "<EOF>" at line 1, column 0. Was expecting one of: <NOT> ... "+" ... "-" ... <BAREOPER> ... "(" ... "*" ... <QUOTED> ... <TERM> ... <PREFIXTERM> ... <WILDTERM> ... <REGEXPTERM> ... "[" ... "{" ... <LPARAMS> ... "filter(" ... <NUMBER> ... <TERM> ...
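      The difference between the two requests can be reproduced with plain JDK calls. The sketch below is illustrative only: the class and method names are hypothetical and it is not the parsers' actual code. It decodes both query values and applies a trim-then-check-empty test of the kind described above.

```java
import java.net.URLDecoder;
import java.nio.charset.StandardCharsets;

// Hypothetical demo class, not part of Solr.
final class TrimVersusUnicodeWhitespace {

    /** Decodes a percent-encoded query value as UTF-8. */
    static String decode(String encoded) {
        return URLDecoder.decode(encoded, StandardCharsets.UTF_8);
    }

    /** Trim-based emptiness test: String.trim() strips only code points <= U+0020. */
    static boolean emptyAfterTrim(String q) {
        return q.trim().isEmpty();
    }

    public static void main(String[] args) {
        String ascii = decode("%20");              // U+0020 SPACE
        String ideographic = decode("%E3%80%80");  // U+3000 IDEOGRAPHIC SPACE

        System.out.println(emptyAfterTrim(ascii));        // true: treated as an empty query
        System.out.println(emptyAfterTrim(ideographic));  // false: reaches the parser
        // The JDK itself does classify U+3000 as whitespace:
        System.out.println(Character.isWhitespace(ideographic.charAt(0))); // true
    }
}
```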
      

      PR 1172 updates the dismax, edismax and rerank query parsers to use StringUtils.isWhitespace(), which also recognises whitespace characters beyond code point 32, such as U+3000.
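      As a rough illustration of a whitespace-aware emptiness test, here is a JDK-only stand-in. It is a sketch under assumptions, not the code from the PR: the helper name isBlankQuery is hypothetical, and it relies on Character.isWhitespace() rather than on StringUtils.

```java
// Hypothetical helper, not part of Solr or of PR 1172.
final class BlankQueryCheck {

    /**
     * Returns true when q is null, empty, or made up entirely of code points
     * that Character.isWhitespace(int) accepts.
     */
    static boolean isBlankQuery(String q) {
        if (q == null) {
            return true;
        }
        // allMatch on an empty stream is true, so "" counts as blank.
        return q.codePoints().allMatch(Character::isWhitespace);
    }

    public static void main(String[] args) {
        System.out.println(isBlankQuery(" "));           // true
        System.out.println(isBlankQuery("\u3000"));      // true
        System.out.println(isBlankQuery("greetings"));   // false
        // Character.isWhitespace() excludes non-breaking spaces such as U+00A0.
        System.out.println(isBlankQuery("\u00A0"));      // false
    }
}
```

      One caveat of this approach: because Character.isWhitespace() does not count non-breaking spaces (U+00A0, U+2007, U+202F) as whitespace, a query made only of those characters would still not be treated as empty by this sketch.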

      Before the change, rerank behaves differently for U+3000 and U+0020. With the change, both of the requests below return the "reRankQuery parameter is mandatory" message:

      q=greetings&rq={!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%E3%80%80 (before the change: generic 400 Bad Request)

      q=greetings&rq={!rerank%20reRankQuery=$rqq%20reRankDocs=1000%20reRankWeight=3}&rqq=%20 (before and after the change: 400 reporting "reRankQuery parameter is mandatory")

              People

              • Assignee: Uwe Schindler (uschindler)
              • Reporter: Andy Webb (andywebb1975)
              • Votes: 0
              • Watchers: 3

                  Time Tracking

                  • Original Estimate: Not Specified
                  • Remaining Estimate: 0h
                  • Time Spent: 2h