Lucene - Core
  1. Lucene - Core
  2. LUCENE-3338

Flexible query parser does not support open ranges and range queries with mixed inclusive and exclusive ranges

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 3.3
    • Fix Version/s: 4.0-ALPHA
    • Component/s: modules/queryparser
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      Flexible query parser does not support open ranges and range queries with mixed inclusive and exclusive ranges.

      These two problems were found while developing LUCENE-1768.

      1. week9-merged-nosurround.patch
        74 kB
        Uwe Schindler
      2. week9-merged-nosurround_with_failing_junit.patch
        74 kB
        Vinicius Barros
      3. week9-merged.patch
        78 kB
        Uwe Schindler
      4. week9.patch
        104 kB
        Vinicius Barros
      5. LUCENE_3338_and_3343_2011_07_30.patch
        104 kB
        Adriano Crestani

        Issue Links

          Activity

          Hide
          Vinicius Barros added a comment -

          The patch available includes week-8.patch attached to LUCENE-1768.

          Open ranges now work on flexible standard query parser.

          Mixed include and exclude bounds are now supported on flexible standard query parser.

          Added junits to cover both problems.

          Added the same junit tests to classic query parser.

          I found something while creating junits for the mixed include and exclude bounds. [1 TO } will always return nothing. It seems the exclusive operator there will take precedence over the inclusive operator. I double checked, and the NumericRangeQuery is correct, so I suppose this is the expected behavour of NumericRangeQuery.

          Show
          Vinicius Barros added a comment - The patch available includes week-8.patch attached to LUCENE-1768 . Open ranges now work on flexible standard query parser. Mixed include and exclude bounds are now supported on flexible standard query parser. Added junits to cover both problems. Added the same junit tests to classic query parser. I found something while creating junits for the mixed include and exclude bounds. [1 TO } will always return nothing. It seems the exclusive operator there will take precedence over the inclusive operator. I double checked, and the NumericRangeQuery is correct, so I suppose this is the expected behavour of NumericRangeQuery.
          Hide
          Uwe Schindler added a comment -

          I found something while creating junits for the mixed include and exclude bounds. [1 TO } will always return nothing. It seems the exclusive operator there will take precedence over the inclusive operator. I double checked, and the NumericRangeQuery is correct, so I suppose this is the expected behavour of NumericRangeQuery.

          There is no such limitation in NRQ, can you show the issue with a NRQ-specific test case? Maybe its a problem of the test setup?

          I will for now commit the javadoc improvements and post a delta patch.

          Show
          Uwe Schindler added a comment - I found something while creating junits for the mixed include and exclude bounds. [1 TO } will always return nothing. It seems the exclusive operator there will take precedence over the inclusive operator. I double checked, and the NumericRangeQuery is correct, so I suppose this is the expected behavour of NumericRangeQuery. There is no such limitation in NRQ, can you show the issue with a NRQ-specific test case? Maybe its a problem of the test setup? I will for now commit the javadoc improvements and post a delta patch.
          Hide
          Uwe Schindler added a comment -

          Here the merged patch.

          Show
          Uwe Schindler added a comment - Here the merged patch.
          Hide
          Uwe Schindler added a comment -

          Hi Vinicius, can you check, that the merged patch is fine?

          I have one question about the regened files: surround was not touched, so maybe it can be reverted? Which version of javacc did you use, because the files look different and seem to produce more warnings.

          Maybe we should check which versions are available and used and document this somewhere.

          Show
          Uwe Schindler added a comment - Hi Vinicius, can you check, that the merged patch is fine? I have one question about the regened files: surround was not touched, so maybe it can be reverted? Which version of javacc did you use, because the files look different and seem to produce more warnings. Maybe we should check which versions are available and used and document this somewhere.
          Hide
          Uwe Schindler added a comment -

          Hi Vinicius,

          I attached the same patch, with changes in surround QP reverted.

          Show
          Uwe Schindler added a comment - Hi Vinicius, I attached the same patch, with changes in surround QP reverted.
          Hide
          Vinicius Barros added a comment -

          Hi Uwe,

          I used javacc-4.1. When I ran ant javacc by the first time and it didn't find it, it told me I should use javacc-4.1, so I installed it.

          Show
          Vinicius Barros added a comment - Hi Uwe, I used javacc-4.1. When I ran ant javacc by the first time and it didn't find it, it told me I should use javacc-4.1, so I installed it.
          Hide
          Vinicius Barros added a comment -

          hi Uwe,

          I changed the patch to make the junit fail and you see the problem with [x TO x} matching nothing.

          Show
          Vinicius Barros added a comment - hi Uwe, I changed the patch to make the junit fail and you see the problem with [x TO x} matching nothing.
          Hide
          Uwe Schindler added a comment -

          OK, now I understand:

          I changed the patch to make the junit fail and you see the problem with [x TO x} matching nothing.

          It seems that somehow the formatting of your original comment in JIRA was broken, I misunderstood and thought you mean half open ranges like: [x TO *] behaving different like [x TO *}, which is not the case (i am glad).

          This of course returns no results, because the upper bound explicitely excludes the lower bound, so its in fact a query where upper<lower bound, so returns nothing, this is not bug or limitation, its in my opinion correct:

          assertRangeQuery(NumberType.NEGATIVE, NumberType.NEGATIVE, false, true, 1);

          I would change that line to assert 0 results like in the next test with (true, false).

          Show
          Uwe Schindler added a comment - OK, now I understand: I changed the patch to make the junit fail and you see the problem with [x TO x} matching nothing. It seems that somehow the formatting of your original comment in JIRA was broken, I misunderstood and thought you mean half open ranges like: [x TO *] behaving different like [x TO *}, which is not the case (i am glad). This of course returns no results, because the upper bound explicitely excludes the lower bound, so its in fact a query where upper<lower bound, so returns nothing, this is not bug or limitation, its in my opinion correct: assertRangeQuery(NumberType.NEGATIVE, NumberType.NEGATIVE, false, true, 1); I would change that line to assert 0 results like in the next test with (true, false).
          Hide
          Uwe Schindler added a comment -

          I used javacc-4.1. When I ran ant javacc by the first time and it didn't find it, it told me I should use javacc-4.1, so I installed it.

          Thats fine, I just was confused about some changes, but this was caused by regenerating surround QP which i reverted.

          We may update once to a later version of javacc, but it still is not full Java5 generics compatible, so the warnings are expected, so for full compatibility we shoudl stay with 4.1 for now. I am thinking about a way to disable those for generated classes (using the regexes to add SuppressWarnings(

          {"unchecked","rawtypes"}

          ) in the build.xml).

          Show
          Uwe Schindler added a comment - I used javacc-4.1. When I ran ant javacc by the first time and it didn't find it, it told me I should use javacc-4.1, so I installed it. Thats fine, I just was confused about some changes, but this was caused by regenerating surround QP which i reverted. We may update once to a later version of javacc, but it still is not full Java5 generics compatible, so the warnings are expected, so for full compatibility we shoudl stay with 4.1 for now. I am thinking about a way to disable those for generated classes (using the regexes to add SuppressWarnings( {"unchecked","rawtypes"} ) in the build.xml).
          Hide
          Adriano Crestani added a comment -

          Hi, I just reviewed the patch and it looks good.

          However, I have one question. It seems (haven't tested yet) that the user can enter [* TO *], should we support it? If we support it, it basically means match all terms for that field. Isn't there any other query we could use to match all fields like (field:*)?

          I personally vote for disallowing this kind of range query.

          Thoughts?

          Show
          Adriano Crestani added a comment - Hi, I just reviewed the patch and it looks good. However, I have one question. It seems (haven't tested yet) that the user can enter [* TO *] , should we support it? If we support it, it basically means match all terms for that field. Isn't there any other query we could use to match all fields like (field:*)? I personally vote for disallowing this kind of range query. Thoughts?
          Hide
          Uwe Schindler added a comment -

          Solr uses exactly this range query to achieve this effect. And its different than MatchAllDocsQuery!

          For numeric queries its no problem at all, for TermRange of course it is slow

          Show
          Uwe Schindler added a comment - Solr uses exactly this range query to achieve this effect. And its different than MatchAllDocsQuery! For numeric queries its no problem at all, for TermRange of course it is slow
          Hide
          Adriano Crestani added a comment -

          Yes, it's different than MatchAllDocsQuery.

          Does WildcardQuery("field", "*",...) work the same way as TermRangeQuery("field", null, null,...)? I mean, do they have the same performance?

          I just tested classic qp and it does support [* TO *], so I think it's fine if we support the same on standard query parser. Just wondering weather we could somehow optimize it.

          Show
          Adriano Crestani added a comment - Yes, it's different than MatchAllDocsQuery. Does WildcardQuery("field", "*",...) work the same way as TermRangeQuery("field", null, null,...)? I mean, do they have the same performance? I just tested classic qp and it does support [* TO *] , so I think it's fine if we support the same on standard query parser. Just wondering weather we could somehow optimize it.
          Hide
          Mike Sokolov added a comment -

          I wondered that too, so I checked. Both AutomatonQuery and TermRangeQuery special-case these to generate a simple TermsEnum, so the same. It makes sense to optimize: this query could easily be generated by some app.

          Show
          Mike Sokolov added a comment - I wondered that too, so I checked. Both AutomatonQuery and TermRangeQuery special-case these to generate a simple TermsEnum, so the same. It makes sense to optimize: this query could easily be generated by some app.
          Hide
          Adriano Crestani added a comment -

          Thanks for checking that Mike. I took a look at the code as well, makes more sense now. So no more extra optimization seems to be required then.

          Show
          Adriano Crestani added a comment - Thanks for checking that Mike. I took a look at the code as well, makes more sense now. So no more extra optimization seems to be required then.
          Hide
          Adriano Crestani added a comment -

          I merged this patch with LUCENE-3343, since this last one had a bug that the first one fixes.

          • LUCENE-3343 only was making TestQPHelper fail, since it was passing null to ParametricQueryNode. Instead, it's now passing "*", which is later being treated by OpenRangeQueryNodeProcessor (from LUCENE-3338).
          • I added extra tests to TestNumericQueryParser and TestQPHelper to make sure we cover all possible cases.

          I haven't changed changes.txt yet, since we are not sure whether these changes will be pushed to 3.x, it's my next task, to investigate that.

          I will wait one or two days before committing this patch.

          Uwe, raise your hand if you want me to hold the commit and sorry for taking over your JIRA

          Show
          Adriano Crestani added a comment - I merged this patch with LUCENE-3343 , since this last one had a bug that the first one fixes. LUCENE-3343 only was making TestQPHelper fail, since it was passing null to ParametricQueryNode. Instead, it's now passing "*", which is later being treated by OpenRangeQueryNodeProcessor (from LUCENE-3338 ). I added extra tests to TestNumericQueryParser and TestQPHelper to make sure we cover all possible cases. I haven't changed changes.txt yet, since we are not sure whether these changes will be pushed to 3.x, it's my next task, to investigate that. I will wait one or two days before committing this patch. Uwe, raise your hand if you want me to hold the commit and sorry for taking over your JIRA
          Hide
          Uwe Schindler added a comment -

          I wondered that too, so I checked. Both AutomatonQuery and TermRangeQuery special-case these to generate a simple TermsEnum, so the same. It makes sense to optimize: this query could easily be generated by some app.

          And for NumericRangeQuery this is even faster, as it will generate a query that only enumerates the lowest precision terms (all of them). For e.g. precisionStep 4 this would be 16 terms only So no such optimization is needed for NRQ.

          Uwe, raise your hand if you want me to hold the commit and sorry for taking over your JIRA

          It's all fine to combine both commits! I had not looked in 3343 because I thought it's for the good old default QP... We should add changes at the end!

          Show
          Uwe Schindler added a comment - I wondered that too, so I checked. Both AutomatonQuery and TermRangeQuery special-case these to generate a simple TermsEnum, so the same. It makes sense to optimize: this query could easily be generated by some app. And for NumericRangeQuery this is even faster, as it will generate a query that only enumerates the lowest precision terms (all of them). For e.g. precisionStep 4 this would be 16 terms only So no such optimization is needed for NRQ. Uwe, raise your hand if you want me to hold the commit and sorry for taking over your JIRA It's all fine to combine both commits! I had not looked in 3343 because I thought it's for the good old default QP... We should add changes at the end!
          Hide
          Adriano Crestani added a comment -

          The last patch was just committed to the repository (rev 1152892)

          Show
          Adriano Crestani added a comment - The last patch was just committed to the repository (rev 1152892)

            People

            • Assignee:
              Adriano Crestani
              Reporter:
              Vinicius Barros
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development