Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0-ALPHA
    • Component/s: core/queryparser
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      patch that adds RegexpQuery if you /enter an expression between slashes like this/

      i didnt do the contrib ones but could add it there too if it seems like a good idea.

      1. LUCENE-2604.patch
        32 kB
        Robert Muir
      2. LUCENE-2604.patch
        34 kB
        Robert Muir
      3. LUCENE-2604.patch
        114 kB
        Simon Willnauer

        Issue Links

          Activity

          Hide
          Robert Muir added a comment -

          This issue has been committed for a while.

          Show
          Robert Muir added a comment - This issue has been committed for a while.
          Hide
          Robert Muir added a comment -

          Lance, its true there is a precedence bug, but your example is not correct.

          There are no implicit ".*" in the regexp syntax: /2541/ only matches 2541 exactly.

          Lets open a new issue for this though. This issue needs to be closed.

          Show
          Robert Muir added a comment - Lance, its true there is a precedence bug, but your example is not correct. There are no implicit ".*" in the regexp syntax: /2541/ only matches 2541 exactly. Lets open a new issue for this though. This issue needs to be closed.
          Hide
          Lance Norskog added a comment -

          This feature does not work as of the trunk today.

          Run the example.
          Post the exampledocs files.
          The first document in hd.xml is id=SP2514N.
          This should return that document but does not:

          http://localhost:8983/solr/select?q=id:/2541/&start=0&rows=10&wt=xml
          

          From mail with Robert Muir:

          This looks like a queryparser precedence bug to me. REGEXPTERM should
          be moved up above PREFIXTERM and WILDTERM or queries with * will be
          treated as those first, even if they are inside /..../. Open an issue
          and I'll take care of it.

          Show
          Lance Norskog added a comment - This feature does not work as of the trunk today. Run the example. Post the exampledocs files. The first document in hd.xml is id=SP2514N. This should return that document but does not: http: //localhost:8983/solr/select?q=id:/2541/&start=0&rows=10&wt=xml From mail with Robert Muir: This looks like a queryparser precedence bug to me. REGEXPTERM should be moved up above PREFIXTERM and WILDTERM or queries with * will be treated as those first, even if they are inside /..../. Open an issue and I'll take care of it.
          Hide
          Simon Willnauer added a comment -

          +1 thanks hoss those are valuable and important additions... feel free to assign this issue to you...

          simon

          Show
          Simon Willnauer added a comment - +1 thanks hoss those are valuable and important additions... feel free to assign this issue to you... simon
          Hide
          Robert Muir added a comment -

          +1, thanks for cleaning up.

          Show
          Robert Muir added a comment - +1, thanks for cleaning up.
          Hide
          Hoss Man added a comment -

          Re-opening ... some issues were pointed out in a recent mailing list thread that definitely seem like they should be addressed before this is officially released...

          • queryparsersyntax.xml doesn't mention this feature at all – as major new syntax is should really get it's own section with an example showing the syntax
          • queryparsersyntax.xml's section on "Escaping Special Characters" needs to mention that '/' is a special character
          • QueryParser.escape needs to include '/' in the characters it escapes
            • ditto for Solr's ClientUtils.escapeQueryChars and SolrPluginUtils.partialEscape
            • i've got a patch for this i'll commit as soon as the tests finish.

          Also: Given that Yury encountered some real world situations in which the new syntax caused problems with existing queries, it seems like we should definitely make a note about this possibility more promonient ... i'm not sure if it makes sense in MIGRATE.txt but at a minimum it seems like the existing CHANGES.txt entry should mention it, maybe something like...

          * LUCENE-2604: Added RegexpQuery support to QueryParser. Regular expressions
            are now directly supported by the standard queryparser using the syntax... 
               fieldName:/expression/ OR /expression against default field/
            Users who wish to search for literal "/" characters are advised to 
            backslash-escape or quote those characters as needed. 
            (Simon Willnauer, Robert Muir)
          

          ...thoughts?

          Show
          Hoss Man added a comment - Re-opening ... some issues were pointed out in a recent mailing list thread that definitely seem like they should be addressed before this is officially released... queryparsersyntax.xml doesn't mention this feature at all – as major new syntax is should really get it's own section with an example showing the syntax queryparsersyntax.xml's section on "Escaping Special Characters" needs to mention that '/' is a special character QueryParser.escape needs to include '/' in the characters it escapes ditto for Solr's ClientUtils.escapeQueryChars and SolrPluginUtils.partialEscape i've got a patch for this i'll commit as soon as the tests finish. Also: Given that Yury encountered some real world situations in which the new syntax caused problems with existing queries, it seems like we should definitely make a note about this possibility more promonient ... i'm not sure if it makes sense in MIGRATE.txt but at a minimum it seems like the existing CHANGES.txt entry should mention it, maybe something like... * LUCENE-2604: Added RegexpQuery support to QueryParser. Regular expressions are now directly supported by the standard queryparser using the syntax... fieldName:/expression/ OR /expression against default field/ Users who wish to search for literal "/" characters are advised to backslash-escape or quote those characters as needed. (Simon Willnauer, Robert Muir) ...thoughts?
          Hide
          Robert Muir added a comment -

          Its no bug: the regexp syntax is optionalField:/stuff/

          Show
          Robert Muir added a comment - Its no bug: the regexp syntax is optionalField:/stuff/
          Hide
          Yury Kats added a comment -

          I just came across a problem with this, where regular expressions are being parsed across fields. Feels like a bug to me.

          Queries like these fail to parse:
          fieldName:/a fieldName:/a
          SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no defaultSearchField defined in schema.xml
          at org.apache.solr.search.SolrQueryParser.checkNullField(SolrQueryParser.java:106)
          at org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:124)
          at org.apache.lucene.queryparser.classic.QueryParserBase.handleBareTokenQuery(QueryParserBase.java:1058)
          at org.apache.lucene.queryparser.classic.QueryParser.Term(QueryParser.java:358)
          at org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:257)
          at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:212)
          at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:170)
          at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:118)
          at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:74)
          at org.apache.solr.search.QParser.getQuery(QParser.java:143)

          fieldName:/* fieldName:/*
          java.lang.NullPointerException
          at org.apache.solr.schema.IndexSchema$DynamicReplacement.matches(IndexSchema.java:747)
          at org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1026)
          at org.apache.solr.schema.IndexSchema.getFieldType(IndexSchema.java:980)
          at org.apache.solr.search.SolrQueryParser.getWildcardQuery(SolrQueryParser.java:172)
          at org.apache.lucene.queryparser.classic.QueryParserBase.handleBareTokenQuery(QueryParserBase.java:1039)
          at org.apache.lucene.queryparser.classic.QueryParser.Term(QueryParser.java:358)
          at org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:257)
          at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:212)
          at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:170)
          at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:118)
          at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:74)
          at org.apache.solr.search.QParser.getQuery(QParser.java:143)

          Show
          Yury Kats added a comment - I just came across a problem with this, where regular expressions are being parsed across fields. Feels like a bug to me. Queries like these fail to parse: fieldName:/a fieldName:/a SEVERE: org.apache.solr.common.SolrException: no field name specified in query and no defaultSearchField defined in schema.xml at org.apache.solr.search.SolrQueryParser.checkNullField(SolrQueryParser.java:106) at org.apache.solr.search.SolrQueryParser.getFieldQuery(SolrQueryParser.java:124) at org.apache.lucene.queryparser.classic.QueryParserBase.handleBareTokenQuery(QueryParserBase.java:1058) at org.apache.lucene.queryparser.classic.QueryParser.Term(QueryParser.java:358) at org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:257) at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:212) at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:170) at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:118) at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:74) at org.apache.solr.search.QParser.getQuery(QParser.java:143) fieldName:/* fieldName:/* java.lang.NullPointerException at org.apache.solr.schema.IndexSchema$DynamicReplacement.matches(IndexSchema.java:747) at org.apache.solr.schema.IndexSchema.getDynamicFieldType(IndexSchema.java:1026) at org.apache.solr.schema.IndexSchema.getFieldType(IndexSchema.java:980) at org.apache.solr.search.SolrQueryParser.getWildcardQuery(SolrQueryParser.java:172) at org.apache.lucene.queryparser.classic.QueryParserBase.handleBareTokenQuery(QueryParserBase.java:1039) at org.apache.lucene.queryparser.classic.QueryParser.Term(QueryParser.java:358) at org.apache.lucene.queryparser.classic.QueryParser.Clause(QueryParser.java:257) at org.apache.lucene.queryparser.classic.QueryParser.Query(QueryParser.java:212) at org.apache.lucene.queryparser.classic.QueryParser.TopLevelQuery(QueryParser.java:170) at org.apache.lucene.queryparser.classic.QueryParserBase.parse(QueryParserBase.java:118) at org.apache.solr.search.LuceneQParser.parse(LuceneQParserPlugin.java:74) at org.apache.solr.search.QParser.getQuery(QParser.java:143)
          Hide
          Simon Willnauer added a comment -

          Committed revision 990836.

          Show
          Simon Willnauer added a comment - Committed revision 990836.
          Hide
          Simon Willnauer added a comment -

          It seems nobody who is more familiar with that contrib query parser code is around to take a look into that. I plan to commit this soon... Any objections?

          Show
          Simon Willnauer added a comment - It seems nobody who is more familiar with that contrib query parser code is around to take a look into that. I plan to commit this soon... Any objections?
          Hide
          Robert Muir added a comment -

          Simon, I took a quick look and the changes look good to me, but i am no expert on that contrib queryparser either

          Thanks for your work here.

          Show
          Robert Muir added a comment - Simon, I took a quick look and the changes look good to me, but i am no expert on that contrib queryparser either Thanks for your work here.
          Hide
          Simon Willnauer added a comment -

          This patch adds RegexpQuery to remaining contrib queryparsers. It also extends the original testcases to cache several other cases.

          I would appreciate if somebody more familiar with the contrib queryparser (the IBM one ) could review that code. Buschmi, would you take a look?

          simon

          Show
          Simon Willnauer added a comment - This patch adds RegexpQuery to remaining contrib queryparsers. It also extends the original testcases to cache several other cases. I would appreciate if somebody more familiar with the contrib queryparser (the IBM one ) could review that code. Buschmi, would you take a look? simon
          Hide
          Simon Willnauer added a comment -

          Simon, thats great if you would like to, only thing is i would like to try to do contrib ones too.

          will do - no problem...

          simon

          Show
          Simon Willnauer added a comment - Simon, thats great if you would like to, only thing is i would like to try to do contrib ones too. will do - no problem... simon
          Hide
          Robert Muir added a comment -

          Simon, thats great if you would like to, only thing is i would like to try to do contrib ones too.

          if you don't have the time to do this part also, i'll get around to it eventually, but in general i think its best to try to keep all the qp's in sync.

          Show
          Robert Muir added a comment - Simon, thats great if you would like to, only thing is i would like to try to do contrib ones too. if you don't have the time to do this part also, i'll get around to it eventually, but in general i think its best to try to keep all the qp's in sync.
          Hide
          Simon Willnauer added a comment -

          Looks good to me robert, want me to take the issue?

          simon

          Show
          Simon Willnauer added a comment - Looks good to me robert, want me to take the issue? simon
          Hide
          Robert Muir added a comment -

          Thanks Simon: you are right! I updated the patch with your test

          I modified the grammar slightly to allow "/" when escaped:

          <REGEXPTERM: "/" (~[ "/" ] | "\\/" )* "/" >
          

          Additionally i removed any un-escaping in QP itself so that it passes
          through unchanged to RegexpQuery: this way you don't have to
          double-escape operators.

          Show
          Robert Muir added a comment - Thanks Simon: you are right! I updated the patch with your test I modified the grammar slightly to allow "/" when escaped: <REGEXPTERM: "/" (~[ "/" ] | "\\/" )* "/" > Additionally i removed any un-escaping in QP itself so that it passes through unchanged to RegexpQuery: this way you don't have to double-escape operators.
          Hide
          Simon Willnauer added a comment -

          hehe - since regexquery is now in core this can be easily integrated. By skimming through your patch I figured that you can not run parse a query like "/[A-Z]
          /[123]/" (already a valid java string) because of the slash in the regex. I had similar issues in LUCENE-2039 which I can't really remember how I solved them but it would be nice if you could escape the slashes if you wanna search for regexp with those chars.

          simon

          Show
          Simon Willnauer added a comment - hehe - since regexquery is now in core this can be easily integrated. By skimming through your patch I figured that you can not run parse a query like "/ [A-Z] / [123] /" (already a valid java string) because of the slash in the regex. I had similar issues in LUCENE-2039 which I can't really remember how I solved them but it would be nice if you could escape the slashes if you wanna search for regexp with those chars. simon

            People

            • Assignee:
              Simon Willnauer
              Reporter:
              Robert Muir
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development