Jackrabbit Content Repository / JCR-1248

Helper Method to escape illegal XPath Search Term

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5
    • Component/s: jackrabbit-jcr-commons
    • Labels:
      None

      Description

      If you try to perform a search like this

      //element(*, nt:base)[jcr:contains(., 'test!')]

      you get this exception

      javax.jcr.RepositoryException: Exception building query: org.apache.jackrabbit.core.query.lucene.fulltext.ParseException: Encountered "<EOF>" at line 1, column 6.

      1. patch.txt
        1 kB
        Claus Köll

        Issue Links

          Activity

          Claus Köll created issue -
          Ard Schrijvers added a comment -

          Repeated from user-list:

          It seems that in LuceneQueryBuilder at

          Object visit(TextsearchQueryNode node, Object data) {

          it breaks at

          Query context = parser.parse(query.toString());

          where the parser is o.a.j.core.query.lucene.fulltext.QueryParser. It seems to break on a string ending with a "!". Unfortunately, I do not have insight into how the QueryParser works. Perhaps somebody else knows where to look in the QueryParser.

          Marcel Reutegger added a comment -

          In addition to the set of special characters already specified in JSR 170, Jackrabbit uses more such characters for extended functionality.

          This set of characters should be limited to the ones really required (e.g. ! is equivalent to -) and clearly documented. It would be nice to also have a utility class that automatically escapes the special characters used in Jackrabbit.

          Jukka Zitting made changes -
          Field Original Value New Value
          Component/s jackrabbit-core [ 12310114 ]
          Claus Köll added a comment -

          I added a helper method in org.apache.jackrabbit.util.Text to escape illegal XPath characters.
          It checks for illegal characters at the end of an XPath search term.

          Claus Köll made changes -
          Attachment patch.txt [ 12392463 ]
          Claus Köll made changes -
          Summary changed from "ParseException if search string ends with '!'" to "Helper Method to escape illegal XPath Search Term"
          Claus Köll made changes -
          Assignee Claus Köll [ c_koell ]
          Claus Köll added a comment -

          Committed in Rev: 706242

          Claus Köll made changes -
          Status Open [ 1 ] Resolved [ 5 ]
          Resolution Fixed [ 1 ]
          Claus Köll made changes -
          Fix Version/s 1.5.0 [ 12312920 ]
          Jukka Zitting added a comment -

          This turned out to be implemented as a new feature in jcr-commons, changing issue metadata accordingly.

          I guess the original problem (ParseException) is the expected (though undocumented) behavior, so there's no need to fix this for clients that don't use the new helper method.

          Jukka Zitting made changes -
          Affects Version/s 1.3.3 [ 12312770 ]
          Component/s jackrabbit-jcr-commons [ 12312057 ]
          Component/s jackrabbit-core [ 12310114 ]
          Priority Major [ 3 ] Minor [ 4 ]
          Component/s query [ 11656 ]
          Issue Type Bug [ 1 ] New Feature [ 2 ]
          Jukka Zitting made changes -
          Status Resolved [ 5 ] Closed [ 6 ]
          Jukka Zitting made changes -
          Workflow jira [ 12418251 ] no-reopen-closed, patch-avail [ 12468343 ]
          Paco Avila added a comment - - edited

          A query like this will fail:

          //element(*, nt:base)[jcr:contains(., 'test \ done')]

          Section 6.6.5.2 of the JSR-170 specification says that literal instances of single quote ( ' ), double quote ( " ) and hyphen ( - ) must be escaped with a backslash ( \ ), and that backslash itself should be escaped as a double backslash ( \\ ). I have also noticed that some characters such as [ and ] need to be escaped.
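
The escaping rule quoted from JSR-170 6.6.5.2 can be sketched roughly as follows. This is only an illustration of the rule, not Jackrabbit's implementation: the class name `SearchTermEscaper` and the method `escapeLiteral` are hypothetical, and the sketch handles only the four characters named in the comment (single quote, double quote, hyphen, backslash).

```java
public final class SearchTermEscaper {

    private SearchTermEscaper() {}

    /**
     * Prefixes each single quote, double quote, hyphen and backslash
     * in the term with a backslash. Hypothetical helper, for illustration only.
     */
    public static String escapeLiteral(String term) {
        StringBuilder sb = new StringBuilder(term.length() + 8);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '\'' || c == '"' || c == '-' || c == '\\') {
                sb.append('\\'); // escape character goes in front of the literal
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The Java literal "test \\ done" is the term: test \ done
        System.out.println(escapeLiteral("test \\ done")); // prints: test \\ done
    }
}
```

The result would still need to be quoted as an XPath string literal (apostrophes doubled) before being placed inside jcr:contains(...).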

          Alexander Klimetschek added a comment -

          > A query like this will fail:
          > //element(*, nt:base)[jcr:contains(., 'test \ done')]

          Did you use org.apache.jackrabbit.util.Text.escapeIllegalXpathSearchChars() ? In case that one has a bug, please file a new issue.

          See also http://wiki.apache.org/jackrabbit/EncodingAndEscaping

          Paco Avila added a comment -

          By the way, this sample code at http://wiki.apache.org/jackrabbit/EncodingAndEscaping is recursive:

          String q =
          "/jcr:root/foo/element(*, foo)" +
          "[jcr:contains(@title, '" + Text.escapeIllegalXpathSearchChars(q).replaceAll("'", "''") + "')]" +
          "[@itemID = '" + itemID.replaceAll("'", "''") + "']";
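
The sample quoted above uses `q` on the right-hand side of its own declaration. One way to avoid that, sketched here under assumptions, is to keep the raw search term in a separate variable. The names `ContainsQueryBuilder`, `build`, `quoteXPathLiteral` and `userInput` are hypothetical, and the escaper is passed in as a parameter only so that the sketch compiles without Jackrabbit on the classpath.

```java
import java.util.function.UnaryOperator;

public final class ContainsQueryBuilder {

    private ContainsQueryBuilder() {}

    /** Doubles apostrophes so the value can sit inside an XPath string literal. */
    static String quoteXPathLiteral(String value) {
        return value.replace("'", "''");
    }

    /**
     * Builds the query from the raw user input. The search term is escaped
     * first and quoted second; the item ID only needs quoting.
     */
    public static String build(String userInput, String itemID, UnaryOperator<String> escaper) {
        String escapedTerm = quoteXPathLiteral(escaper.apply(userInput));
        return "/jcr:root/foo/element(*, foo)"
                + "[jcr:contains(@title, '" + escapedTerm + "')]"
                + "[@itemID = '" + quoteXPathLiteral(itemID) + "']";
    }

    public static void main(String[] args) {
        // Identity escaper used as a stand-in so the example runs on its own.
        System.out.println(build("it's", "42", UnaryOperator.identity()));
    }
}
```

With jackrabbit-jcr-commons available, the call would presumably pass `Text::escapeIllegalXpathSearchChars` as the escaper instead of the identity stand-in.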

          Paco Avila added a comment -

          I'm not sure whether this is a bug or a "feature". The query

          String term = "pe[]pe";
          String escapedTerm = Text.escapeIllegalXpathSearchChars(term).replaceAll("'", "''");
          String query = "/jcr:root//*[jcr:contains(okm:content,'" + escapedTerm + "')]";

          should fail or the term "pe[]pe" should be escaped as "pe[]pe"?

          Marcel Reutegger made changes -
          Link This issue is related to JCR-3800 [ JCR-3800 ]
          Transition            Time In Source Status   Execution Times   Last Executer    Last Execution Date
          Open -> Resolved      325d 2h 26m             1                 Claus Köll       20/Oct/08 12:38
          Resolved -> Closed    48d 23h 31m             1                 Jukka Zitting    08/Dec/08 11:09

            People

            • Assignee: Claus Köll
            • Reporter: Claus Köll
            • Votes: 0
            • Watchers: 0
