Chemistry / CMIS-511

Full text search still is incomplete

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: None
    • Component/s: opencmis-server-jcr
    • Labels:
      None

      Description

      Full text search is still incomplete: it is not possible to query by content.

      In org.apache.chemistry.opencmis.jcr.query.ParseTreeWalker you can find:

      private T walkTextAnd(Evaluator<T> evaluator2, Tree node) {
          // TODO Auto-generated method stub
          return null;
      }

      private T walkTextOr(Evaluator<T> evaluator2, Tree node) {
          // TODO Auto-generated method stub
          return null;
      }

      private T walkTextMinus(Evaluator<T> evaluator2, Tree node) {
          // TODO Auto-generated method stub
          return null;
      }

      private T walkTextWord(Evaluator<T> evaluator2, Tree node) {
          // TODO Auto-generated method stub
          return null;
      }

      private T walkTextPhrase(Evaluator<T> evaluator2, Tree node) {
          // TODO Auto-generated method stub
          return null;
      }

      1. XPathBuilderTest.java
        11 kB
        Jose Carlos Campanero
      2. QueryTranslatorTest.java
        14 kB
        Jose Carlos Campanero
      3. CMIS-511.patch
        10 kB
        Jose Carlos Campanero
      4. CMIS-511.patch
        10 kB
        Michael Dürig

        Activity

        Jose Carlos Campanero added a comment -

        I think I have found a possible implementation of this functionality. It involves replacing the method walkExprTextSearch(Evaluator<T> evaluator, Tree node) with the following:

            private T walkExprTextSearch(Evaluator<T> evaluator, Tree node) {
                String value = walkExprTextSearch(node);
                return evaluator.value(value);
            }
        
            private String walkExprTextSearch(Tree node) {
                switch (node.getType()) {
                    case TextSearchLexer.TEXT_AND:
                        return walkTextAnd(node);
                    case TextSearchLexer.TEXT_OR:
                        return walkTextOr(node);
                    case TextSearchLexer.TEXT_MINUS:
                        return walkTextMinus(node);
                    case TextSearchLexer.TEXT_SEARCH_WORD_LIT:
                        return walkTextWord(node);
                    case TextSearchLexer.TEXT_SEARCH_PHRASE_STRING_LIT:
                        return walkTextPhrase(node);
                    default:
                        throw new CmisRuntimeException("Unknown node type: " + node.getType() + " (" + node.getText() + ")");
                }
            }
        

        Then drop the existing walkTextXXX(Evaluator<T> evaluator, Tree node) methods and use the following implementations instead:


            private String walkTextAnd(Tree node) {
                List<Tree> terms = getChildrenAsList(node);
                StringBuilder sb = new StringBuilder();
                String sep = "";
                for (Tree term : terms) {
                    sb.append(sep).append(walkExprTextSearch(term));
                    sep = " ";
                }

                return sb.toString();
            }

            private String walkTextOr(Tree node) {
                List<Tree> terms = getChildrenAsList(node);
                StringBuilder sb = new StringBuilder();
                String sep = "";
                for (Tree term : terms) {
                    sb.append(sep).append(walkExprTextSearch(term));
                    sep = " OR ";
                }

                return sb.toString();
            }

            private String walkTextMinus(Tree node) {
                return "-" + escape(node.getChild(0).getText());
            }

            private String walkTextWord(Tree node) {
                String text = node.getText();
                return escape(text);
            }

            private String walkTextPhrase(Tree node) {
                // Strip the surrounding quote characters before escaping the phrase body.
                String phrase = node.getText();
                return "\"" + escape(phrase.substring(1, phrase.length() - 1)) + "\"";
            }

            private List<Tree> getChildrenAsList(Tree node) {
                List<Tree> res = new ArrayList<Tree>(node.getChildCount());
                for (int i = 0; i < node.getChildCount(); i++) {
                    Tree childNode = node.getChild(i);
                    res.add(childNode);
                }
                return res;
            }
            
            // Within the searchexp literal instances of single quote (“'”), double quote (“"”) 
            // and hyphen (“-”) must be escaped with a backslash (“\”). Backslash itself must 
            // therefore also be escaped, ending up as double backslash (“\\”). 
            private String escape(String s) {
                if (s == null) {
                    return "";
                }

                // Escape backslashes first so the backslashes added below are not doubled.
                // String.replace is literal; with replaceAll, a replacement such as "\\'"
                // is read as an escaped quote and inserts no backslash.
                s = s.replace("\\", "\\\\");
                s = s.replace("'", "\\'");
                s = s.replace("\"", "\\\"");
                s = s.replace("-", "\\-");
                return s;
            }
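
        As a rough, self-contained illustration of the translation described above, the sketch below walks a tiny tree and produces the search expression string. The Node class and its Kind enum are hypothetical stand-ins for the ANTLR Tree and the TextSearchLexer token types, not part of OpenCMIS, and the escaping uses the literal String.replace rather than regex replacement.

```java
import java.util.Arrays;
import java.util.List;

public class TextSearchSketch {

    /** Hypothetical stand-in for an ANTLR tree node; not an OpenCMIS class. */
    static final class Node {
        enum Kind { AND, OR, MINUS, WORD, PHRASE }

        final Kind kind;
        final String text;          // word or phrase text (a phrase keeps its surrounding quotes)
        final List<Node> children;

        Node(Kind kind, String text, Node... children) {
            this.kind = kind;
            this.text = text;
            this.children = Arrays.asList(children);
        }
    }

    /** Translates a node into a search expression: AND joins with a space, OR with " OR ". */
    static String walk(Node node) {
        switch (node.kind) {
            case AND:
                return join(node.children, " ");
            case OR:
                return join(node.children, " OR ");
            case MINUS:
                return "-" + escape(node.children.get(0).text);
            case WORD:
                return escape(node.text);
            case PHRASE:
                // Drop the first and last character (the quotes), then re-quote.
                return "\"" + escape(node.text.substring(1, node.text.length() - 1)) + "\"";
            default:
                throw new IllegalArgumentException("Unknown node kind: " + node.kind);
        }
    }

    static String join(List<Node> terms, String separator) {
        StringBuilder sb = new StringBuilder();
        String sep = "";
        for (Node term : terms) {
            sb.append(sep).append(walk(term));
            sep = separator;
        }
        return sb.toString();
    }

    /** Escapes backslash first, then single quote, double quote and hyphen. */
    static String escape(String s) {
        if (s == null) {
            return "";
        }
        return s.replace("\\", "\\\\")
                .replace("'", "\\'")
                .replace("\"", "\\\"")
                .replace("-", "\\-");
    }

    public static void main(String[] args) {
        Node tree = new Node(Node.Kind.AND, null,
                new Node(Node.Kind.WORD, "full-text"),
                new Node(Node.Kind.OR, null,
                        new Node(Node.Kind.WORD, "search"),
                        new Node(Node.Kind.PHRASE, "'by content'")),
                new Node(Node.Kind.MINUS, null, new Node(Node.Kind.WORD, "draft")));
        System.out.println(walk(tree));
    }
}
```

        Running main prints the flattened expression for a small example tree; the hyphen inside a word is escaped, while the leading hyphen of an excluded term is not.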
        

        Finally, the contains method in EvaluatorXPath must set the search axis to "jcr:content" instead of self ("."):

            @Override
            public XPathBuilder contains(XPathBuilder op1, XPathBuilder op2) {
                return new FunctionBuilder("jcr:contains", "jcr:content", op2);
            }
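
        To make the difference between the two scopes concrete, here is a minimal sketch that formats the predicate as a string. The contains helper is hypothetical and only mimics what a jcr:contains call looks like in a JCR XPath query; how FunctionBuilder actually renders its arguments is not shown in this thread, and the nt:file element test is an assumed example.

```java
public class ContainsPredicateSketch {

    /**
     * Hypothetical helper: formats a jcr:contains predicate.
     * scope is "." (the node itself) or a child node name such as "jcr:content".
     */
    static String contains(String scope, String searchExpression) {
        return "jcr:contains(" + scope + ", '" + searchExpression + "')";
    }

    public static void main(String[] args) {
        // Searches the jcr:content child, where an nt:file node keeps its binary content.
        System.out.println("//element(*, nt:file)[" + contains("jcr:content", "foo") + "]");
        // Searches the properties of the matched node itself.
        System.out.println("//element(*, nt:file)[" + contains(".", "foo") + "]");
    }
}
```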
        
        Michael Dürig added a comment -

        Thanks for the implementation sketch. I'll come up with a patch which fits this into the overall design.

        Michael Dürig added a comment -

        Proposed patch attached. Could you please verify it on your side and add some unit tests for it in QueryTranslatorTest and XPathBuilderTest? I will then commit it.

        Jose Carlos Campanero added a comment -

        Of course, let me test it.

        Jose Carlos Campanero added a comment -

        Michael, I have attached a modified version of the patch. Almost everything works fine, but for phrase searches the provided implementation alters the phrase itself.

        I changed line 50 of the patch so that "OR" is surrounded by blank spaces, in the following method of EvaluatorXPath:

            public XPathBuilder textOr(List<XPathBuilder> ops) {
                return new TextOpBuilder(ops, " OR ");
            }
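
        The effect of the separator can be seen in a small sketch. The joinTerms helper below is hypothetical and only joins already-rendered operands; the internals of TextOpBuilder are not shown in this thread.

```java
import java.util.Arrays;
import java.util.List;

public class TextOrSeparatorSketch {

    /** Hypothetical helper: joins rendered operands with the given separator. */
    static String joinTerms(List<String> terms, String separator) {
        return String.join(separator, terms);
    }

    public static void main(String[] args) {
        List<String> terms = Arrays.asList("foo", "bar");
        // With surrounding spaces, OR stays a separate token between the two terms.
        System.out.println(joinTerms(terms, " OR "));
        // Without them, the operands and the operator run together into one word.
        System.out.println(joinTerms(terms, "OR"));
    }
}
```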
        

        I also changed line 138, in the implementation of the method xPath() in TextOpBuilder:

        sep = "" + RelOp + "" -> sep = RelOp;

        I have also removed some strange characters that appeared in the comments of the method escape().

        I have also attached the test cases, modified to include some tests for the new functionality.

        Michael, I hope this solves the problem. If you need anything else, please tell me.

        Thank you.

        Michael Dürig added a comment -

        Fixed at revision 1325411:

        • applied modified patch with some javadoc added
        • added test cases

        Thanks for providing the test cases and patches.

        Jose Carlos Campanero added a comment -

        Thank you very much, Michael. The issue is closed.


          People

          • Assignee: Michael Dürig
          • Reporter: Jose Carlos Campanero
          • Votes: 0
          • Watchers: 0
