Jackrabbit Content Repository
  JCR-1248

Helper Method to escape illegal XPath Search Term

    Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5
    • Component/s: jackrabbit-jcr-commons
    • Labels: None

      Description

      If you try to perform a search like this

      //element(*, nt:base)[jcr:contains(., 'test!')]

      you get this exception

      javax.jcr.RepositoryException: Exception building query: org.apache.jackrabbit.core.query.lucene.fulltext.ParseException: Encountered "<EOF>" at line 1, column 6.

      1. patch.txt
        1 kB
        Claus Köll

        Activity

        Ard Schrijvers added a comment -

        Repeated from user-list:

        It seems that in LuceneQueryBuilder at

        Object visit(TextsearchQueryNode node, Object data) {

        it breaks at

        Query context = parser.parse(query.toString());

        where the parser is o.a.j.core.query.lucene.fulltext.QueryParser. It seems to break on strings ending with a "!". Unfortunately, I do not have insight into how the QueryParser works. Perhaps somebody else knows where to look in the QueryParser.

        Marcel Reutegger added a comment -

        In addition to the set of special characters already specified in JSR 170, Jackrabbit uses more of those characters for extended functionality.

        This set of characters should be limited to the ones really required (e.g. ! is equivalent to -) and clearly documented. It would be nice to also have a utility class that automatically escapes the special characters used in Jackrabbit.

        Claus Köll added a comment -

        I added a helper method in org.apache.jackrabbit.util.Text to escape illegal XPath characters.
        It checks for illegal characters at the end of an XPath search term.
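
The idea of escaping only a trailing special character can be sketched as follows. This is a minimal, self-contained illustration and not the code from patch.txt: the class name `TrailingCharEscaper`, the method name `escapeTrailing`, and the exact set of characters in `SPECIAL` are assumptions made for the example.

```java
// Hypothetical stand-in for illustration; not the Jackrabbit source.
final class TrailingCharEscaper {

    // Assumed set of characters that upset the full-text parser
    // when they appear as the last character of a search term.
    private static final String SPECIAL = "!(:^[]{}?";

    // Prefixes the last character with a backslash if it is in SPECIAL.
    static String escapeTrailing(String term) {
        if (term == null || term.isEmpty()) {
            return term;
        }
        int last = term.length() - 1;
        char c = term.charAt(last);
        if (SPECIAL.indexOf(c) >= 0) {
            return term.substring(0, last) + '\\' + c;
        }
        return term;
    }

    public static void main(String[] args) {
        // Build the query from the issue description with the escaped term.
        String term = escapeTrailing("test!");
        System.out.println("//element(*, nt:base)[jcr:contains(., '" + term + "')]");
    }
}
```

Only the final character is inspected here, which matches the description above; special characters elsewhere in the term are left untouched.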

        Claus Köll added a comment -

        Committed in Rev: 706242

        Jukka Zitting added a comment -

        This turned out to be implemented as a new feature in jcr-commons, changing issue metadata accordingly.

        I guess the original problem (ParseException) is the expected (though undocumented) behavior, so there's no need to fix this for clients that don't use the new helper method.

        Paco Avila added a comment (edited) -

        A query like this will fail:

        //element(*, nt:base)[jcr:contains(., 'test \ done')]

        Section 6.6.5.2 of the JSR-170 specification says that literal instances of single quote ( ' ), double quote ( " ) and hyphen ( - ) must be escaped with a backslash ( \ ), and that a backslash itself should be escaped as a double backslash ( \\ ). I have also noticed that some characters, such as [ and ], need to be escaped as well.
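
The escaping rule quoted above can be sketched as a small standalone helper. This is an illustration under stated assumptions, not Jackrabbit code: the class name `Jsr170LiteralEscaper` and the method `escape` are hypothetical, and the sketch handles only the four characters named in the comment (single quote, double quote, hyphen, backslash).

```java
// Hypothetical helper for illustration; not part of Jackrabbit.
final class Jsr170LiteralEscaper {

    // Prefixes each literal ', ", - and \ with a backslash.
    static String escape(String term) {
        StringBuilder sb = new StringBuilder(term.length() + 8);
        for (int i = 0; i < term.length(); i++) {
            char c = term.charAt(i);
            if (c == '\'' || c == '"' || c == '-' || c == '\\') {
                sb.append('\\');
            }
            sb.append(c);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        // The failing term from the comment above, with its backslash escaped.
        System.out.println(escape("test \\ done"));
    }
}
```

Note that this treats every hyphen and quote as a literal; a caller that wants to keep a hyphen as an exclusion operator or quotes as phrase delimiters would have to skip those occurrences.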

        Alexander Klimetschek added a comment -

        > A query like this will fail:
        > //element(*, nt:base)[jcr:contains(., 'test \ done')]

        Did you use org.apache.jackrabbit.util.Text.escapeIllegalXpathSearchChars() ? In case that one has a bug, please file a new issue.

        See also http://wiki.apache.org/jackrabbit/EncodingAndEscaping

        Paco Avila added a comment -

        By the way, the sample code at http://wiki.apache.org/jackrabbit/EncodingAndEscaping is recursive (q is used in its own initializer):

        String q =
        "/jcr:root/foo/element(*, foo)" +
        "[jcr:contains(@title, '" + Text.escapeIllegalXpathSearchChars(q).replaceAll("'", "''") + "')]" +
        "[@itemID = '" + itemID.replaceAll("'", "''") + "']";
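
One way to avoid the self-reference is to keep the user input and the resulting query in separate variables. The sketch below is an assumption-laden illustration: the names `ContainsQueryBuilder`, `quote`, `build` and `searchTerm` are hypothetical, and the call to Text.escapeIllegalXpathSearchChars is left out so that the example compiles without Jackrabbit on the classpath; in real code it would presumably be applied to `searchTerm` before quoting.

```java
// Hypothetical builder for illustration; names are not from the wiki page.
final class ContainsQueryBuilder {

    // Doubles single quotes so the value can sit inside an XPath string literal.
    static String quote(String value) {
        return value.replace("'", "''");
    }

    // Builds the query from a separate search term instead of reusing q.
    static String build(String searchTerm, String itemID) {
        return "/jcr:root/foo/element(*, foo)"
            + "[jcr:contains(@title, '" + quote(searchTerm) + "')]"
            + "[@itemID = '" + quote(itemID) + "']";
    }

    public static void main(String[] args) {
        System.out.println(build("it's a test", "42"));
    }
}
```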

        Paco Avila added a comment -

        I'm not sure if this is a bug or a "feature". Should the query

        String term = "pe[]pe";
        String escapedTerm = Text.escapeIllegalXpathSearchChars(term).replaceAll("'", "''");
        String query = "/jcr:root//*[jcr:contains(okm:content,'" + escapedTerm + "')]";

        fail, or should the term "pe[]pe" be escaped as "pe\[\]pe"?


          People

          • Assignee:
            Claus Köll
            Reporter:
            Claus Köll