Solr
  1. Solr
  2. SOLR-4333

edismax query with escaped colon ignores AND and OR

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: 4.0
    • Fix Version/s: 4.4, 6.0
    • Component/s: query parsers
    • Labels:
      None
    • Environment:

      tomcat 7.34
      java 7u11
      windows 2008R2

      Description

      When I use the edismax query handler with qf=samenvatting and have a query of the form

      a\:b AND cde
      

      then the parsedquery ends up as:

      (+(DisjunctionMaxQuery((samenvatting:"a b")) DisjunctionMaxQuery((samenvatting:and)) DisjunctionMaxQuery((samenvatting:cde))))/no_coord
      

      note that the AND operator is ignored, and a search for the word AND is performed.
      As far as I've seen it doesn't matter if the part before the \: is a real field or not.

      1. SOLR-4333.patch
        2 kB
        James Dyer
      2. SOLR-4333.patch
        0.8 kB
        James Dyer
      3. SOLR-4333.patch
        0.8 kB
        James Dyer

        Activity

        Hide
        James Dyer added a comment -

        Here's a failing unit test. If your client escapes the colon, any boolean operators (AND|OR|NOT) after the escaped colon get interpreted as query operators. This leads to incorrect logic. If the default mm=100%, the query likely will return no results.

        Show
        James Dyer added a comment - Here's a failing unit test. If your client escapes the colon, any boolean operators (AND|OR|NOT) after the escaped colon get interpreted as query operators. This leads to incorrect logic. If the default mm=100%, the query likely will return no results.
        Hide
        James Dyer added a comment -

        Here is an improved test that shows off the problem a little better. This happens when you are using boolean syntax with edismax. Either the default query field or the singular field listed in "qf" need to be non-analyzed.

        Given:

        q=text_sw:(theos OR hasa\:colon OR theou)&qf=id

        Get back:

        <str name="parsedquery_toString">+(text_sw:theo (id:OR) (id:hasa:colon) (id:OR) (id:theou)))</str>
        

        It fails a little differently if you use an analyzed field in "qf" (or as default)...With:
        q=text_sw:(theos OR hasa\:colon OR theou)&qf=text

        Get back:

        <str name="parsedquery_toString">+(text_sw:theo (text:"hasa colon") (text:theou))</str>
        

        note the switch from "text_sw" to "text". In either case, removing the colon, or not escaping it resolves the issue. But it would be best if clients could send escaped colons like this so as to prevent users from using colons with real field names (but allowing application code to do so).

        Show
        James Dyer added a comment - Here is an improved test that shows off the problem a little better. This happens when you are using boolean syntax with edismax. Either the default query field or the singular field listed in "qf" need to be non-analyzed. Given: q=text_sw:(theos OR hasa\:colon OR theou)&qf=id Get back: <str name= "parsedquery_toString" > +(text_sw:theo (id:OR) (id:hasa:colon) (id:OR) (id:theou))) </str> It fails a little differently if you use an analyzed field in "qf" (or as default)...With: q=text_sw:(theos OR hasa\:colon OR theou)&qf=text Get back: <str name= "parsedquery_toString" > +(text_sw:theo (text: "hasa colon" ) (text:theou)) </str> note the switch from "text_sw" to "text". In either case, removing the colon, or not escaping it resolves the issue. But it would be best if clients could send escaped colons like this so as to prevent users from using colons with real field names (but allowing application code to do so).
        Hide
        James Dyer added a comment -

        Here's a patch with a fix.

        The "splitIntoClauses" logic escapes colons in a special after-the-fact bit of code, but this does not take into account that some colons might already have been escaped. This results in two backslashes followed by a colon which in turn produces a query parser exception. This causes edismax to go into full-escape mode resulting in the surprise described already here.

        The patch changes this to not escape a colon again if it has already been escaped.

        Show
        James Dyer added a comment - Here's a patch with a fix. The "splitIntoClauses" logic escapes colons in a special after-the-fact bit of code, but this does not take into account that some colons might already have been escaped. This results in two backslashes followed by a colon which in turn produces a query parser exception. This causes edismax to go into full-escape mode resulting in the surprise described already here. The patch changes this to not escape a colon again if it has already been escaped.
        Hide
        James Dyer added a comment -

        All tests pass with this patch. I will commit in a few days.

        Show
        James Dyer added a comment - All tests pass with this patch. I will commit in a few days.
        Hide
        Commit Tag Bot added a comment -

        [trunk commit] jdyer
        http://svn.apache.org/viewvc?view=revision&revision=1470577

        SOLR-4333: edismax to not double-escape colons if already escaped by client application

        Show
        Commit Tag Bot added a comment - [trunk commit] jdyer http://svn.apache.org/viewvc?view=revision&revision=1470577 SOLR-4333 : edismax to not double-escape colons if already escaped by client application
        Hide
        James Dyer added a comment -

        committed

        Trunk: r1470577
        Branch_4x: r1470578 (sorry! did not enter a svn comment!)

        Show
        James Dyer added a comment - committed Trunk: r1470577 Branch_4x: r1470578 (sorry! did not enter a svn comment!)
        Hide
        Steve Rowe added a comment -

        Bulk close resolved 4.4 issues

        Show
        Steve Rowe added a comment - Bulk close resolved 4.4 issues

          People

          • Assignee:
            James Dyer
            Reporter:
            Robert J. van der Boon
          • Votes:
            0 Vote for this issue
            Watchers:
            2 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development