Solr / SOLR-874

Dismax parser exceptions on trailing OPERATOR

    Details

    • Type: Bug
    • Status: Open
    • Priority: Major
    • Resolution: Unresolved
    • Affects Version/s: 1.3
    • Fix Version/s: 5.0
    • Component/s: query parsers
    • Labels:
      None

      Description

      Dismax is supposed to be immune to parse exceptions, but alas it's not:

      http://localhost:8983/solr/select?defType=dismax&qf=name&q=ipod+AND

      kaboom!

      Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse 'ipod AND': Encountered "<EOF>" at line 1, column 8.
      Was expecting one of:
      <NOT> ...
      "+" ...
      "-" ...
      "(" ...
      "*" ...
      <QUOTED> ...
      <TERM> ...
      <PREFIXTERM> ...
      <WILDTERM> ...
      "[" ...
      "{" ...
      <NUMBER> ...
      <TERM> ...
      "*" ...

      at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:175)
      at org.apache.solr.search.DismaxQParser.parse(DisMaxQParserPlugin.java:138)
      at org.apache.solr.search.QParser.getQuery(QParser.java:88)

      1. SOLR-874-1.4.1.patch
        14 kB
        Johannes
      2. SOLR-874-1.3.patch
        10 kB
        Chris Darroch
      3. SOLR-874.patch
        0.8 kB
        Peter Wolanin

          Activity

          Mark Miller added a comment -

          Support for AND and OR escaping needed - only I hate to see a scan for AND and OR on every term for every query just to support this...but to quote Erik: "dismax is not to generate a parse error", so I guess it can't be helped?

          My real dream would be to get those darn unprecedent working AND and OR oddities out of Lucene syntax...

          Peter Wolanin added a comment -

          I get the same sort of exception with a leading operator and the dismax handler.

          Jul 13, 2009 1:47:06 PM org.apache.solr.common.SolrException log
          SEVERE: org.apache.solr.common.SolrException:
          org.apache.lucene.queryParser.ParseException: Cannot parse 'OR vti OR bin OR vti OR aut OR author OR dll': Encountered " <OR> "OR "" at line
          1, column 0.
          Was expecting one of:
          <NOT> ...
          "+" ...
          "-" ...
          "(" ...
          "*" ...
          <QUOTED> ...
          <TERM> ...
          <PREFIXTERM> ...
          <WILDTERM> ...
          "[" ...
          "{" ...
          <NUMBER> ...
          <TERM> ...
          "*" ...

          at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:110)
          at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174)
          at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)

          Peter Wolanin added a comment -

          Possibly a fix could be rolled into this existing method in SolrPluginUtils.java?

            /**
             * Strips operators that are used illegally, otherwise returns its
             * input.  Some examples of illegal user queries are: "chocolate +-
             * chip", "chocolate - - chip", and "chocolate chip -".
             */
            public static CharSequence stripIllegalOperators(CharSequence s) {
              String temp = CONSECUTIVE_OP_PATTERN.matcher( s ).replaceAll( " " );
              return DANGLING_OP_PATTERN.matcher( temp ).replaceAll( "" );
            }
          

          This seems only to be called from:

          org/apache/solr/search/DisMaxQParser.java:156: userQuery = SolrPluginUtils.stripIllegalOperators(userQuery).toString();
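
A minimal, self-contained sketch of why this method does not cover the reported queries. The two pattern definitions are assumptions written to agree with the javadoc examples quoted above; they are not copied from a particular Solr release, and the class name is hypothetical. Under those assumptions the method only looks at `+` and `-`, so `ipod AND` and `--john grisham` pass through untouched.

```java
import java.util.regex.Pattern;

/** Hypothetical stand-in for the quoted SolrPluginUtils method. */
final class StripIllegalOperatorsSketch {
    // Assumed: whitespace, then a +/- operator followed by at least one more +/-.
    private static final Pattern CONSECUTIVE_OP_PATTERN =
            Pattern.compile("\\s+[+-](?:\\s*[+-]+)+");
    // Assumed: whitespace followed only by +, - or whitespace up to the end of input.
    private static final Pattern DANGLING_OP_PATTERN =
            Pattern.compile("\\s+[-+\\s]+$");

    static CharSequence stripIllegalOperators(CharSequence s) {
        String temp = CONSECUTIVE_OP_PATTERN.matcher(s).replaceAll(" ");
        return DANGLING_OP_PATTERN.matcher(temp).replaceAll("");
    }

    public static void main(String[] args) {
        String[] queries = {"chocolate +- chip", "chocolate chip -", "ipod AND", "--john grisham"};
        for (String q : queries) {
            // Brackets make leading and trailing whitespace visible.
            System.out.println("[" + q + "] -> [" + stripIllegalOperators(q) + "]");
        }
    }
}
```

Neither pattern mentions AND, OR, NOT, `&&`, `||` or quotes, and the consecutive-operator pattern requires leading whitespace, which is consistent with the failures reported in this issue.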

          Peter Wolanin added a comment -

          Here's a simple patch that escapes with a \. It prevents the exception; however, it fails to match and/or/not (even after removing those from the stopwords file), so it's clearly not quite right.

          Michael Haag added a comment -

          Peter, thanks for keeping our support group in the loop on this
          issue. Just to make sure I understand: your patch below would work
          ok for Acquia hosted search since our dismax handler config doesn't
          make use of boolean expressions anyway. Correct?

          -m

          On Jul 14, 2009, at 5:27 PM, Peter Wolanin (JIRA) wrote:

          [ https://issues.apache.org/jira/browse/SOLR-874?page=com.atlassian.jira.plugin.system.issuetabpanels:all-tabpanel
          ]

          Peter Wolanin updated SOLR-874:
          -------------------------------

          Attachment: SOLR-874.patch

          Here's a simple patch that escapes with a \. It prevents the
          exception, however, this fails to match and/or/not (after removing
          those from the stopwords file) so it's clearly not quite right.



          Mark Miller added a comment -

          There is also a problem with && and ||

          Peter Wolanin added a comment -

          Anyone have an approach for this bug so we can get it fixed before 1.4 is done?

          Jake Brownell added a comment -

          I've also observed dismax blow up if the query starts with more than a single dash, e.g. --john grisham. It doesn't appear to mind multiple leading dashes elsewhere in the query string.

          Chris Darroch added a comment -

          Hi, I'm one of the httpd devs but I thought I'd throw in this patch for Solr 1.3 (I'll try to make one for trunk later) which handles a number of the issues raised in this report for us.

          First, & and | are escaped, and the dismax logic is changed a little so that if the various query-munging methods return a blank string, we fall back to using the configured default query.

          Next, consecutive + or - chars are flattened to a single char; this handles cases where a user might accidentally type --foo when they just mean -foo.

          Strings of mixed + and - chars are removed, since we have no way of knowing the user's intent with something like +-foo or similar.

          Together these two steps handle one of the reported cases where the query starts with multiple + or - operators.

          Any remaining + or - chars which trail the last term, or which have whitespace on their right side, are removed. Our users found it puzzling in the extreme that a search on "questions 1 - 10" explicitly excluded results with "10" in them, because "- 10" is treated as -10. So we just remove any + or - operators which aren't right up against the following term.

          Finally, we escape AND, OR, and NOT when they appear outside of quotes, and remove any trailing unmatched quote. This changes the previous behaviour which removes all quotes if they aren't perfectly balanced; we felt this was more in line with what users expect if they mistype and enter an extra quote char.

          So far I haven't been able to generate any Lucene query parser exceptions with this code, but it doesn't mean it's perfect, obviously – there may still be some way to slip an invalid Lucene query past it. But I'm cautiously optimistic that it covers all or most of the issues raised so far in the thread.
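
The steps described in this comment can be sketched roughly as follows. This is an illustration written from the prose above, not the code in the attached patches: the class, method, and pattern names are hypothetical, the fallback is passed in as a plain `defaultQuery` argument, and escaped quotes (`\"`) are not treated specially. Note that the `+`/`-` handling here also applies inside quoted phrases, which the comment does not specify.

```java
import java.util.regex.Pattern;

/** Hypothetical pre-parse clean-up following the steps listed in the comment. */
final class DismaxSanitizerSketch {
    private static final Pattern AMP_OR_PIPE = Pattern.compile("([&|])");
    private static final Pattern REPEATED_PLUS = Pattern.compile("\\+{2,}");
    private static final Pattern REPEATED_MINUS = Pattern.compile("-{2,}");
    private static final Pattern MIXED_OPS = Pattern.compile("[+-]{2,}");
    private static final Pattern DETACHED_OP = Pattern.compile("[+-](?=\\s|$)");
    private static final Pattern BOOLEAN_WORD = Pattern.compile("\\b(AND|OR|NOT)\\b");

    static String sanitize(String userQuery, String defaultQuery) {
        // 1. Backslash-escape & and |.
        String q = AMP_OR_PIPE.matcher(userQuery).replaceAll("\\\\$1");
        // 2. Flatten runs of the same operator: "--foo" becomes "-foo".
        q = REPEATED_PLUS.matcher(q).replaceAll("+");
        q = REPEATED_MINUS.matcher(q).replaceAll("-");
        // 3. After flattening, any run of two or more operators is mixed; remove it.
        q = MIXED_OPS.matcher(q).replaceAll("");
        // 4. Remove operators followed by whitespace or the end of the query.
        q = DETACHED_OP.matcher(q).replaceAll("");
        // 5. Drop a trailing unmatched quote, then escape AND/OR/NOT outside quotes.
        q = dropTrailingUnmatchedQuote(q);
        q = escapeBooleanWordsOutsideQuotes(q).trim();
        // 6. Fall back to the configured default query when nothing is left.
        return q.isEmpty() ? defaultQuery : q;
    }

    private static String dropTrailingUnmatchedQuote(String q) {
        long quotes = q.chars().filter(c -> c == '"').count();
        if (quotes % 2 == 0) {
            return q;
        }
        int last = q.lastIndexOf('"');
        return q.substring(0, last) + q.substring(last + 1);
    }

    private static String escapeBooleanWordsOutsideQuotes(String q) {
        // Quotes are balanced here, so even-indexed parts lie outside quotes.
        String[] parts = q.split("\"", -1);
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                out.append('"');
            }
            out.append(i % 2 == 0
                    ? BOOLEAN_WORD.matcher(parts[i]).replaceAll("\\\\$1")
                    : parts[i]);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        String[] queries = {"ipod AND", "--john grisham", "questions 1 - 10", "\""};
        for (String q : queries) {
            System.out.println("[" + q + "] -> [" + sanitize(q, "*:*") + "]");
        }
    }
}
```

Whether the escaped words then match anything still depends on the field's analysis chain and stopword list, so a sketch like this only addresses the parse exception, not relevance.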

          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria were "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. Email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Geoffrey Young added a comment -

          I stumbled on this bug while researching something else, but we've hit the "trailing AND" condition as well...

          I just want to add the following use case for this fix:

          Portland, OR

          Whatever fix is implemented should properly account for and handle cases where the trailing operator isn't an operator at all.
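
To make the "Portland, OR" concern concrete, here is a small sketch contrasting two possible treatments of a trailing operator word. Both methods and the class name are hypothetical and are not taken from any attached patch. Stripping the word silently discards the state abbreviation, while backslash-escaping keeps it in the query as an ordinary term.

```java
import java.util.regex.Pattern;

/** Hypothetical comparison of stripping versus escaping a trailing operator word. */
final class TrailingOperatorSketch {
    private static final Pattern TRAILING_OP = Pattern.compile("\\s+(AND|OR|NOT)\\s*$");

    /** Drops a trailing AND/OR/NOT entirely. */
    static String strip(String q) {
        return TRAILING_OP.matcher(q).replaceAll("");
    }

    /** Keeps the trailing word but prefixes it with a backslash. */
    static String escape(String q) {
        return TRAILING_OP.matcher(q).replaceAll(" \\\\$1");
    }

    public static void main(String[] args) {
        System.out.println(strip("Portland, OR"));
        System.out.println(escape("Portland, OR"));
    }
}
```

Escaping preserves the user's input, but as noted earlier in this issue, an escaped and/or/not did not match in at least one attempt, so the analyzer and stopword configuration would need checking too.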

          Peter Wolanin added a comment -

          It seems that a bare double quote mark also causes an exception.

          Johannes added a comment -

          This is a version of the patch that works against the 1.4.1 branch.
          All our local tests indicate that it works as intended.

          Erik Hatcher added a comment -

          Johannes - thanks! Test cases look thorough at a glance. Kinda hairy stuff in there, so give me a few days to scratch my head and review this, but it's something worthwhile to finally get fixed.

          Many other commenters on this issue - maybe we can get a few more folks to try this out and confirm it fixes their cases too.

          James Gilliland added a comment -

          I don't know if it's directly related to this issue, but I found the same error with people searching for "foo AND - AND bar".

          Robert Muir added a comment -

          Bulk move 3.2 -> 3.3

          Robert Muir added a comment -

          3.4 -> 3.5

          Erik Hatcher added a comment -

          Bump. Apologies for letting this issue collect dust. Any +1's or -1's to the patches? I'll aim to commit within a week after a deeper review barring any objections.

          Hoss Man added a comment -

          Bulk fixing the version info for 4.0-ALPHA and 4.0; all affected issues have "hoss20120711-bulk-40-change" in a comment.

          Robert Muir added a comment -

          rmuir20120906-bulk-40-change

          Alexander S. added a comment - - edited

          Hi, sorry for asking this here, but is the following error related to this issue?

          Aug 26, 2012 8:36:24 AM org.apache.solr.common.SolrException log
          SEVERE: org.apache.solr.common.SolrException
                  at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:134)
                  at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:165)
                  at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:129)
                  at org.apache.solr.core.SolrCore.execute(SolrCore.java:1376)
                  at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:365)
                  at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:260)
                  at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:243)
                  at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:210)
                  at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:224)
                  at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:175)
                  at org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:472)
                  at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:164)
                  at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:100)
                  at org.apache.catalina.valves.AccessLogValve.invoke(AccessLogValve.java:929)
                  at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:118)
                  at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:405)
                  at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:279)
                  at org.apache.coyote.AbstractProtocol$AbstractConnectionHandler.process(AbstractProtocol.java:515)
                  at org.apache.tomcat.util.net.JIoEndpoint$SocketProcessor.run(JIoEndpoint.java:300)
                  at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1110)
                  at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:603)
                  at java.lang.Thread.run(Thread.java:679)
          Caused by: org.apache.lucene.queryParser.ParseException: Cannot parse '"hgps" "hhho" "hhrh"  ...truncated...  "kidney stones" "kidney transplant" "kidney trafq=type:Tweet': Lexical error at line 1, column 6783.  Encountered: <EOF> after : "\"kidney trafq=type:Tweet"
                  at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:216)
                  at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:79)
                  at org.apache.solr.search.QParser.getQuery(QParser.java:143)
                  at org.apache.solr.handler.component.QueryComponent.prepare(QueryComponent.java:105)
                  ... 21 more
          Caused by: org.apache.lucene.queryParser.TokenMgrError: Lexical error at line 1, column 6783.  Encountered: <EOF> after : "\"kidney trafq=type:Tweet"
                  at org.apache.lucene.queryParser.QueryParserTokenManager.getNextToken(QueryParserTokenManager.java:1229)
                  at org.apache.lucene.queryParser.QueryParser.jj_ntk(QueryParser.java:1772)
                  at org.apache.lucene.queryParser.QueryParser.Term(QueryParser.java:1555)
                  at org.apache.lucene.queryParser.QueryParser.Clause(QueryParser.java:1317)
                  at org.apache.lucene.queryParser.QueryParser.Query(QueryParser.java:1274)
                  at org.apache.lucene.queryParser.QueryParser.TopLevelQuery(QueryParser.java:1234)
                  at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:206)
                  ... 24 more
          

          "kidney trafq=" should be "kidney transplantation" fq='type:Tweet', so it looks like the query string was truncated.

          And this one also looks very similar:
          http://mail-archives.apache.org/mod_mbox/lucene-solr-user/201203.mbox/%3C007b01ccf78e$9171c1f0$b45545d0$@gmail.com%3E

          Best,
          Alex

          Erik Hatcher added a comment - - edited

          I started to dig into this for 4.1, but it's hairier than I thought with edge cases that need to be accounted for. Moving this to 5.0 since I won't have time to deal with this for 4.1, sorry.


            People

            • Assignee: Unassigned
            • Reporter: Erik Hatcher
            • Votes: 9
            • Watchers: 16