Lucene - Core / LUCENE-881

QueryParser escaping/parsing issue with strings starting/ending with ||

    Details

    • Type: Bug
    • Status: Resolved
    • Priority: Trivial
    • Resolution: Fixed
    • Affects Version/s: 2.1, 2.2
    • Fix Version/s: None
    • Component/s: core/queryparser
    • Labels:
      None
    • Environment:

      MAC OS X 10.4.7, J2SE 5.0 Release 4

    • Lucene Fields:
      New

      Description

      There is a problem with the query parser when the search string starts or ends with ||. When the string contains || in the middle, as in 'something || something', everything runs without a problem.

      Part of code:
      searchText = QueryParser.escape(searchText);
      QueryParser parser = new QueryParser(fieldName, new CustomAnalyser());
      parser.parse(searchText);

      The CustomAnalyser class extends Analyzer. Here is the only overridden method:

      @Override
      public TokenStream tokenStream(String fieldName, Reader reader) {
          return new PorterStemFilter(new StopAnalyzer().tokenStream(fieldName, reader));
      }

      I have tested this on Lucene 2.1 and on the latest source checked out from SVN (Revision 538867); in both cases a parse exception was thrown.

      Part of Stack Trace (Lucene - SVN checkout - Revision 538867):
      Cannot parse 'someting ||': Encountered "<EOF>" at line 1, column 11.
      Was expecting one of:
      <NOT> ...
      "+" ...
      "-" ...
      "(" ...
      "*" ...
      <QUOTED> ...
      <TERM> ...
      <PREFIXTERM> ...
      <WILDTERM> ...
      "[" ...
      "{" ...
      <NUMBER> ...

      org.apache.lucene.queryParser.ParseException: Cannot parse 'someting ||': Encountered "<EOF>" at line 1, column 11.
      Was expecting one of:
      <NOT> ...
      "+" ...
      "-" ...
      "(" ...
      "*" ...
      <QUOTED> ...
      <TERM> ...
      <PREFIXTERM> ...
      <WILDTERM> ...
      "[" ...
      "{" ...
      <NUMBER> ...

      at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:150)

      Part of Stack Trace (Lucene 2.1):
      Cannot parse 'something ||': Encountered "<EOF>" at line 1, column 12.
      Was expecting one of:
      <NOT> ...
      "+" ...
      "-" ...
      "(" ...
      "*" ...
      <QUOTED> ...
      <TERM> ...
      <PREFIXTERM> ...
      <WILDTERM> ...
      "[" ...
      "{" ...
      <NUMBER> ...

      org.apache.lucene.queryParser.ParseException: Cannot parse 'something ||': Encountered "<EOF>" at line 1, column 12.
      Was expecting one of:
      <NOT> ...
      "+" ...
      "-" ...
      "(" ...
      "*" ...
      <QUOTED> ...
      <TERM> ...
      <PREFIXTERM> ...
      <WILDTERM> ...
      "[" ...
      "{" ...
      <NUMBER> ...

      at org.apache.lucene.queryParser.QueryParser.parse(QueryParser.java:149)

      1. lucene-881.patch
        2 kB
        Michael Busch

        Activity

        Michael Busch added a comment -

        The problem here is that QueryParser.escape() does not escape | and &. This should be easy to fix, I'll submit a patch soon.

        Yonik Seeley added a comment -

        Sorry, I don't quite understand the problem. Could someone provide an actual query string that should work but doesn't? "||" is reserved since it means OR, AFAIK.

        Michael Busch added a comment -

        You are right Yonik, || is reserved.

        The QueryParser itself works correctly:

        "|| test ||" yields a ParseException, which is correct because in this case || means OR
        "\|\| test \|\|" yields "|| test ||"; this is correct, too, because the two | are escaped

        The problem here is the escape() method:

        /**
         * Returns a String where those characters that QueryParser
         * expects to be escaped are escaped by a preceding <code>\</code>.
         */
        public static String escape(String s);

        It escapes chars like +, -, ! and so on. Example:

        escape("++ test ++") yields "\+\+ test \+\+"

        but

        escape("|| test ||") yields "|| test ||".

        I believe to be consistent escape() should escape the two chars | and & as well, no?
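
        The consistency argument above can be illustrated with a small standalone sketch. This is not the committed patch and not Lucene's implementation: the class name EscapeSketch and the exact list of special characters are assumptions chosen for illustration. It only shows the proposed behaviour, where | and & receive a preceding backslash just as + and - already do.

```java
public class EscapeSketch {

    // Characters treated as special in this sketch. The list is an
    // assumption for illustration; the point is that '|' and '&' are
    // included alongside characters such as '+', '-' and '!'.
    private static final String SPECIAL = "\\+-!():^[]\"{}~*?|&";

    // Returns a copy of s in which every special character is preceded
    // by a backslash.
    public static String escape(String s) {
        StringBuilder sb = new StringBuilder(s.length() + 8);
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (SPECIAL.indexOf(c) >= 0) {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(escape("++ test ++")); // prints \+\+ test \+\+
        System.out.println(escape("|| test ||")); // prints \|\| test \|\|
    }
}
```

        With an escape() that behaves like this, a string such as 'something ||' would reach the parser as 'something \|\|', so the trailing || would no longer be read as a dangling OR operator.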

        Yonik Seeley added a comment -

        > escape() should escape the two chars | and & as well, no?

        Agree.

        Michael Busch added a comment -

        Patch with additional unit tests.

        All tests pass.

        Michael Busch added a comment -

        I just committed this patch. Thank you for finding this bug, Slobodan!


          People

          • Assignee: Michael Busch
          • Reporter: Slobodan Marjanovic
          • Votes: 0
          • Watchers: 0