Lucene - Core / LUCENE-995

Add open ended range query syntax to QueryParser

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: 1.9, 2.0.0, 2.1, 2.2
    • Fix Version/s: 3.6, 4.0-ALPHA
    • Component/s: core/queryparser
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      The QueryParser fails to generate open ended range queries.
      Parsing e.g. "date:[1990 TO *]" gives zero results,
      but
      ConstantRangeQuery("date","1990",null,true,true)
      does produce the expected results.

      "date:[* TO 1990]" gives the same results as ConstantRangeQuery("date",null,"1990",true,true).

      1. LUCENE-995-backport-3x.patch
        9 kB
        Ingo Renner
      2. LUCENE-995.patch
        10 kB
        Yonik Seeley
      3. LUCENE-995.patch
        11 kB
        Yonik Seeley
      4. LUCENE-995.patch
        17 kB
        Michael McCandless
      5. LUCENE-995_09_21_2009.patch
        94 kB
        Adriano Crestani

          Activity

          Yonik Seeley added a comment -

          That looks like Solr syntax. The Lucene QueryParser doesn't support open-ended range queries.

          Jonas Gorski added a comment -

          But if it doesn't support them, shouldn't it throw a ParseException? And why does "[* TO x]" work?
          That looks like inconsistent behaviour to me.

          Yonik Seeley added a comment -

          It doesn't support them in that it has no special syntax to specify an open endpoint. So * is a literal *, and is a valid endpoint for a range query (hence no exception).

          For your date strings, you could use something like "0" and "z" for your endpoints.
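
A minimal sketch of why the literal asterisk behaves this way, assuming endpoints are compared as plain strings. It uses `String.compareTo` only; the class and method names are hypothetical and this is not Lucene's range query code.

```java
// Illustrative only: plain string comparison, not Lucene's range query code.
class LiteralAsteriskDemo {
    // True when lower <= value <= upper in lexicographic order.
    static boolean inRange(String value, String lower, String upper) {
        return lower.compareTo(value) <= 0 && value.compareTo(upper) <= 0;
    }

    public static void main(String[] args) {
        // '*' is U+002A, which sorts before every ASCII digit (U+0030 to U+0039).
        // [1990 TO *]: the upper bound sorts before the lower bound, so nothing matches.
        System.out.println(inRange("2005", "1990", "*"));
        // [* TO 1990]: the literal '*' happens to sort below all digit strings.
        System.out.println(inRange("1985", "*", "1990"));
        // [1990 TO z]: the suggested workaround, since 'z' sorts after all digits.
        System.out.println(inRange("2005", "1990", "z"));
    }
}
```

With a literal `*` as the upper endpoint the range is empty for digit strings, which is consistent with the zero results reported in the description.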

          Luis Alves added a comment -

          I propose to improve the syntax to allow the following

          "date>1990"
          "date<1990"
          "date=1990"
          "date>=1990"
          "date<=1990"

          What do you guys think?

          We could probably do it at the same time we implement LUCENE-1823.
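
One way to read the proposed operators is as shorthand for ranges with one open end. The sketch below is hypothetical: the `Range` type, the regular expression, and the choice to treat `=` as a range with identical inclusive bounds are assumptions for illustration, not part of any Lucene patch.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical sketch of the proposed operator syntax; not part of any Lucene patch.
class ComparisonSyntaxSketch {
    /** Range description; a null bound means that end is open. */
    static final class Range {
        final String field, lower, upper;
        final boolean includeLower, includeUpper;

        Range(String field, String lower, String upper,
              boolean includeLower, boolean includeUpper) {
            this.field = field;
            this.lower = lower;
            this.upper = upper;
            this.includeLower = includeLower;
            this.includeUpper = includeUpper;
        }
    }

    // Field name, operator, value; two-character operators are listed first.
    private static final Pattern COMPARISON =
            Pattern.compile("(\\w+)(>=|<=|>|<|=)(\\w+)");

    static Range toRange(String expr) {
        Matcher m = COMPARISON.matcher(expr);
        if (!m.matches()) {
            throw new IllegalArgumentException("Not a comparison: " + expr);
        }
        String field = m.group(1), op = m.group(2), value = m.group(3);
        switch (op) {
            case ">":  return new Range(field, value, null, false, true);
            case ">=": return new Range(field, value, null, true, true);
            case "<":  return new Range(field, null, value, true, false);
            case "<=": return new Range(field, null, value, true, true);
            default:   return new Range(field, value, value, true, true); // "="
        }
    }

    public static void main(String[] args) {
        Range r = toRange("date>=1990");
        System.out.println(r.field + " lower=" + r.lower + " upper=" + r.upper);
    }
}
```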

          Luis Alves added a comment -

          The fix I propose will only be implemented in the new queryparser.

          Adriano Crestani added a comment -

          The patch adds open-ended range queries to oal.queryParser.QueryParser. They are disabled by default and can be enabled by invoking setEnableOpenEndedRangeQueries. I also added the ability to redefine the open-ended token; the default is '*'.

          I also added some test cases.

          Yonik Seeley added a comment -

          FYI, this last patch has the same issue as Solr does, described in SOLR-2189.
          I'm going to take a quick crack at fixing it in the QP, before getRangeQuery() is called.

          Yonik Seeley added a comment -

          Here's a draft patch (no tests yet, so I don't know if it works).
          But the proposal is simple: * in a range query would be an open end.
          "*" or \* would be ways to represent a literal asterisk instead of an open end.
          The QP will now pass null for an open end to getRangeQuery(). This will match
          the constructors on our actual range query objects.
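
The endpoint rule in this proposal can be sketched roughly as follows. This is a hypothetical helper, not the code from the patch: `parseEndpoint` and its handling of quotes and backslashes are assumptions, with the literal asterisk taken to be written either quoted or backslash-escaped.

```java
// Hypothetical sketch; the names below are not taken from the patch.
class RangeEndpointSketch {
    /** Returns null for an open end, otherwise the literal endpoint text. */
    static String parseEndpoint(String token) {
        if (token.equals("*")) {
            return null; // bare asterisk: open end
        }
        if (token.length() >= 2 && token.startsWith("\"") && token.endsWith("\"")) {
            return token.substring(1, token.length() - 1); // quoted: literal text
        }
        // Drop each escaping backslash and keep the character that follows it.
        StringBuilder out = new StringBuilder();
        for (int i = 0; i < token.length(); i++) {
            char c = token.charAt(i);
            if (c == '\\' && i + 1 < token.length()) {
                c = token.charAt(++i);
            }
            out.append(c);
        }
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(parseEndpoint("*") == null);  // open end
        System.out.println(parseEndpoint("\"*\""));      // quoted literal asterisk
        System.out.println(parseEndpoint("\\*"));        // escaped literal asterisk
        System.out.println(parseEndpoint("1990"));       // ordinary endpoint
    }
}
```

Returning null for the open end mirrors the description above, where the range query constructors already accept null for a missing bound.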

          Yonik Seeley added a comment -

          OK, here's the final patch. It adds a few tests and changes TermRangeQuery.toString() to output \* for a literal asterisk.
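
A rough sketch of the printing side, assuming an open end is printed as a bare asterisk and a literal asterisk is printed in its backslash-escaped form so that the output can be parsed back unambiguously. The class and method names are hypothetical and this is not the actual TermRangeQuery code.

```java
// Hypothetical sketch of range printing; not the actual TermRangeQuery.toString().
class RangeToStringSketch {
    static String endpointToString(String endpoint) {
        if (endpoint == null) {
            return "*";    // open end
        }
        if (endpoint.equals("*")) {
            return "\\*";  // literal asterisk, escaped (assumed form)
        }
        return endpoint;
    }

    static String rangeToString(String field, String lower, String upper,
                                boolean includeLower, boolean includeUpper) {
        return field + ":"
                + (includeLower ? "[" : "{")
                + endpointToString(lower) + " TO " + endpointToString(upper)
                + (includeUpper ? "]" : "}");
    }

    public static void main(String[] args) {
        System.out.println(rangeToString("date", "1990", null, true, true));
        System.out.println(rangeToString("date", "*", "1990", true, true));
    }
}
```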

          Masoud added a comment -

          Is this fix just for date fields, or does it work for all fields? For example, will Author:[* TO *] return all records whose Author field has a value, and will a query like -Author:[* TO *] return all records that are missing the Author field?

          Any idea which version of Lucene this will be included in, and when it will be released?

          Ingo Renner added a comment -

          Would it be possible to backport this to 3.x?

          Ingo Renner added a comment -

          BTW, I applied the patch to 3.x, testing now...

          Ingo Renner added a comment -

          Here's the 3.x back port. Unit Test for TestQueryParser is green, but I'd still like someone else to have a look at it as Java is not my primary language.

          Michael McCandless added a comment -

          3.x patch looks great!

          Only thing I noticed is that QueryParser.java has changes not present in QueryParser.jj

          Also, I see a Solr test failure with this patch; just run:

            ant test -Dtestcase=ConvertedLegacyTest -Dtestmethod=testABunchOfConvertedStuff -Dtests.seed=-217957745e3f4a9e:-1e9dfcec76c86042:439d76b910f98fee -Dargs="-Dfile.encoding=UTF-8"
          

          Looks like the test has a TermRangeQuery whose * needs to be escaped as \*

          Ingo Renner added a comment -

          As I said, I'm not yet that familiar with Java, so I had to look up what that .jj file is about. Now I understand that QueryParser.java is generated from the .jj file using javacc.

          I don't know where the other changes come from; all I did was apply the 4.x patch and fix incompatibilities manually.

          Regarding the asterisk, could it be that this is related to SOLR-2189?

          Ingo Renner added a comment -

          I would guess, though, that someone changed QueryParser.java and forgot to apply those changes to the .jj file as well.

          Yonik Seeley added a comment -

          "Regarding the asterisk, could it be that this is related to SOLR-2189?"

          Hmmm, I guess I opened SOLR-2189 before I took a crack at fixing it the right way (so Solr no longer needs to check for "*" since the QP grammar does).
          I'll close it.

          Michael McCandless added a comment -

          OK I see what's causing the test failure ... the 3.x QP has separate code for the inclusive vs exclusive range cases (in trunk they were merged at some point I guess...), so, I'll fix the patch to also fix the exclusive case.

          I'll also sync up QueryParser.jj so that when gen'd it matches the QueryParser.java from the patch.

          Michael McCandless added a comment -

          Patch, fixing the above issues.

          I also fixed QueryParser.jj to include the [untested] changes to QueryParser.java, from SOLR-2348.

          Michael McCandless added a comment -

          Thanks Ingo!

          Ingo Renner added a comment -

          Thanks for committing, Michael!


            People

            • Assignee:
              Michael McCandless
              Reporter:
              Jonas Gorski
            • Votes:
              0
              Watchers:
              2
