Solr
  1. Solr
  2. SOLR-4480

EDisMax parser blows up with query containing single plus or minus

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.2, 6.0
    • Component/s: query parsers
    • Labels:
      None

      Description

      We are running solr with sunspot and when we set up a query containing a single plus, Solr blows up with the following error:
      SOLR Request (5.0ms) [ path=#<RSolr::Client:0x4c7464ac> parameters={data: fq=type%3A%28Attachment+OR+User+OR+GpdbDataSource+OR+HadoopInstance+OR+GnipInstance+OR+Workspace+OR+Workfile+OR+Tag+OR+Dataset+OR+HdfsEntry%29&fq=type_name_s%3A%28Attachment+OR+User+OR+Instance+OR+Workspace+OR+Workfile+OR+Tag+OR+Dataset+OR+HdfsEntry%29&fq=%28security_type_name_sm%3A%28Dataset%29+AND+-instance_account_ids_im%3A%282+OR+1%29%29&fq=%28security_type_name_sm%3AChorusView+AND+member_ids_im%3A1+AND+-public_b%3Atrue%29&fq=%28security_type_name_sm%3A%28Dataset%29+AND+instance_account_ids_im%3A%282+OR+1%29%29&fq=%28security_type_name_sm%3AChorusView+AND+-member_ids_im%3A1+AND+-public_b%3Atrue%29&q=%2B&fl=%2A+score&qf=name_texts+first_name_texts+last_name_texts+file_name_texts&defType=edismax&hl=on&hl.simple.pre=%40%40%40hl%40%40%40&hl.simple.post=%40%40%40endhl%40%40%40&start=0&rows=3, method: post, params: {:wt=>:ruby}, query: wt=ruby, headers:

      {"Content-Type"=>"application/x-www-form-urlencoded; charset=UTF-8"}

      , path: select, uri: http://localhost:8982/solr/select?wt=ruby, open_timeout: , read_timeout: } ]

      RSolr::Error::Http (RSolr::Error::Http - 400 Bad Request
      Error: org.apache.lucene.queryParser.ParseException: Cannot parse '': Encountered "<EOF>" at line 1, column 0.
      Was expecting one of:
      <NOT> ...
      "+" ...
      "-" ...
      "(" ...
      "*" ...
      <QUOTED> ...
      <TERM> ...
      <PREFIXTERM> ...
      <WILDTERM> ...

      1. SOLR-4480.patch
        5 kB
        Yonik Seeley
      2. SOLR-4480.patch
        4 kB
        Jan Høydahl
      3. SOLR-4480.patch
        4 kB
        Jan Høydahl

        Activity

        Hide
        Jan Høydahl added a comment -

        Thanks for reporting this. As EDismax is all about being robust and never crash, this must be fixed.

        Show
        Jan Høydahl added a comment - Thanks for reporting this. As EDismax is all about being robust and never crash, this must be fixed.
        Hide
        Jan Høydahl added a comment -

        What should be the logical behavior of a single "+" or "-"? If eDisMax discovers it as one of many words, it is treated as whitespace (I think by mistake since it is caught by the MUST[+]/NOT[-] logic).

        So the least surprising behavior would perhaps be to interpret it the same way - leadning to an empty "q" and fallback to "q.alt".

        Another option would be to detect and escape with "\" to match literally. However for most eDisMax uses you would not expect or want this. It would also be inconsistent with behavior with multi term search. If you like literal match you can always escape yourself.

        Show
        Jan Høydahl added a comment - What should be the logical behavior of a single "+" or "-"? If eDisMax discovers it as one of many words, it is treated as whitespace (I think by mistake since it is caught by the MUST [+] /NOT [-] logic). So the least surprising behavior would perhaps be to interpret it the same way - leadning to an empty "q" and fallback to "q.alt". Another option would be to detect and escape with "\" to match literally. However for most eDisMax uses you would not expect or want this. It would also be inconsistent with behavior with multi term search. If you like literal match you can always escape yourself.
        Hide
        Fiona Tay added a comment -

        > So the least surprising behavior would perhaps be to interpret it the same way - leadning to an empty "q" and fallback to "q.alt".
        This seems like a reasonable solution.

        Show
        Fiona Tay added a comment - > So the least surprising behavior would perhaps be to interpret it the same way - leadning to an empty "q" and fallback to "q.alt". This seems like a reasonable solution.
        Hide
        Jan Høydahl added a comment -

        Patch against trunk which treats single + and single - as empty q

        Show
        Jan Høydahl added a comment - Patch against trunk which treats single + and single - as empty q
        Hide
        Jan Høydahl added a comment -

        Updated patch, also catching exceptions in the case where the +/- comes after a leading whitespace.

        What do people think about the solution? Plan to commit soon.

        Show
        Jan Høydahl added a comment - Updated patch, also catching exceptions in the case where the +/- comes after a leading whitespace. What do people think about the solution? Plan to commit soon.
        Hide
        Yonik Seeley added a comment -

        What should be the logical behavior of a single "+" or "-"? If eDisMax discovers it as one of many words, it is treated as whitespace

        It shouldn't be, and a quick test shows that it is treated as a literal.
        Are you forgetting to URL-encode the "+" when trying it from a browser, or perhaps the analysis of the default field is removing the character because it's not alphanumeric?

        Try this:
        http://localhost:8983/solr/select?debug=query&defType=edismax&df=foo_s&q=hello+%2b+there

        Show
        Yonik Seeley added a comment - What should be the logical behavior of a single "+" or "-"? If eDisMax discovers it as one of many words, it is treated as whitespace It shouldn't be, and a quick test shows that it is treated as a literal. Are you forgetting to URL-encode the "+" when trying it from a browser, or perhaps the analysis of the default field is removing the character because it's not alphanumeric? Try this: http://localhost:8983/solr/select?debug=query&defType=edismax&df=foo_s&q=hello+%2b+there
        Hide
        Jan Høydahl added a comment - - edited

        So let's take the String field example. A single %2B crashes the Lucene query parser, and since we just pass it straight through it crashes eDisMax too.

        For the Lucene parser, it crashes for all query strings ending in a single "+"
        http://localhost:8983/solr/select?debug=query&q=foo%20%2B
        but not for queries where there is a whitespace after the "+"
        http://localhost:8983/solr/select?debug=query&q=%2B%20foo

        eDismax is a bit different. It does not crash on ending "+" but it swallows it:
        http://localhost:8983/solr/select?debug=query&defType=edismax&df=foo_s&q=%2B%20hello%20%2B

        This is probably due to line 700-703 being too quick at guessing that the + or - means MUST or NOT

              if (ch=='+' || ch=='-') {
                clause.must = ch;
                pos++;
              }
        

        I'm ok with saying that a single plus or minus should mean literal matching (given that field type supports it), and thus we add escaping. But then we should do the same at the end of a query string.

        Show
        Jan Høydahl added a comment - - edited So let's take the String field example. A single %2B crashes the Lucene query parser, and since we just pass it straight through it crashes eDisMax too. For the Lucene parser, it crashes for all query strings ending in a single "+" http://localhost:8983/solr/select?debug=query&q=foo%20%2B but not for queries where there is a whitespace after the "+" http://localhost:8983/solr/select?debug=query&q=%2B%20foo eDismax is a bit different. It does not crash on ending "+" but it swallows it: http://localhost:8983/solr/select?debug=query&defType=edismax&df=foo_s&q=%2B%20hello%20%2B This is probably due to line 700-703 being too quick at guessing that the + or - means MUST or NOT if (ch=='+' || ch=='-') { clause.must = ch; pos++; } I'm ok with saying that a single plus or minus should mean literal matching (given that field type supports it), and thus we add escaping. But then we should do the same at the end of a query string.
        Hide
        Yonik Seeley added a comment -

        I'm ok with saying that a single plus or minus should mean literal matching (given that field type supports it), and thus we add escaping. But then we should do the same at the end of a query string.

        Correct.

        Show
        Yonik Seeley added a comment - I'm ok with saying that a single plus or minus should mean literal matching (given that field type supports it), and thus we add escaping. But then we should do the same at the end of a query string. Correct.
        Hide
        Fiona Tay added a comment -

        I'm the person who originally reported the bug, and I just want to commend y'all for the speedy response! You guys are amazing!

        Show
        Fiona Tay added a comment - I'm the person who originally reported the bug, and I just want to commend y'all for the speedy response! You guys are amazing!
        Hide
        Yonik Seeley added a comment -

        I started off trying to modify the parser grammar, but it quickly went down a rat hole...

        Instead, here's a patch that lets the JavaCC parser fail, and makes the escaping better for edismax.

        Show
        Yonik Seeley added a comment - I started off trying to modify the parser grammar, but it quickly went down a rat hole... Instead, here's a patch that lets the JavaCC parser fail, and makes the escaping better for edismax.
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] Yonik Seeley
        http://svn.apache.org/viewvc?view=revision&revision=1450369

        SOLR-4480: edismax - fix trailing +-

        Show
        Commit Tag Bot added a comment - [trunk commit] Yonik Seeley http://svn.apache.org/viewvc?view=revision&revision=1450369 SOLR-4480 : edismax - fix trailing +-
        Hide
        Commit Tag Bot added a comment -

        [branch_4x commit] Yonik Seeley
        http://svn.apache.org/viewvc?view=revision&revision=1450371

        SOLR-4480: edismax - fix trailing +-

        Show
        Commit Tag Bot added a comment - [branch_4x commit] Yonik Seeley http://svn.apache.org/viewvc?view=revision&revision=1450371 SOLR-4480 : edismax - fix trailing +-
        Hide
        Uwe Schindler added a comment -

        Closed after release.

        Show
        Uwe Schindler added a comment - Closed after release.

          People

          • Assignee:
            Unassigned
            Reporter:
            Fiona Tay
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development