Lucene - Core
  1. Lucene - Core
  2. LUCENE-2541

NumericRangeQuery errors with endpoints near long min and max values

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 2.9
    • Fix Version/s: 2.9.4, 3.0.3, 3.1, 4.0-ALPHA
    • Component/s: core/search
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available
    1. LUCENE-2541.patch
      7 kB
      Yonik Seeley
    2. LUCENE-2541.patch
      0.9 kB
      Yonik Seeley
    3. LUCENE-2541-29.patch
      17 kB
      Uwe Schindler
    4. LUCENE-2541-30.patch
      17 kB
      Uwe Schindler
    5. LUCENE-2541-trunk+3x.patch
      18 kB
      Uwe Schindler
    6. LUCENE-2541-uwe.patch
      7 kB
      Uwe Schindler
    7. LUCENE-2541-uwe.patch
      3 kB
      Uwe Schindler
    8. LUCENE-2541-uwe-BigDecimal.patch
      21 kB
      Uwe Schindler
    9. LUCENE-2541-uwe-longs.patch
      17 kB
      Uwe Schindler
    10. TestNumericRangeQuery.java
      2 kB
      Koji Sekiguchi

      Activity

      Koji Sekiguchi created issue -
      Hide
      Koji Sekiguchi added a comment -

      I attached the test case. I expect totalHits is 1, but it is 3.

      Show
      Koji Sekiguchi added a comment - I attached the test case. I expect totalHits is 1, but it is 3.
      Koji Sekiguchi made changes -
      Field Original Value New Value
      Attachment TestNumericRangeQuery.java [ 12449654 ]
      Hide
      Yonik Seeley added a comment -

      Ouch. I just tried Solr 1.4.1, and this shows a bug there too... so this bug would seem to go back to the 2.9 line

      Show
      Yonik Seeley added a comment - Ouch. I just tried Solr 1.4.1, and this shows a bug there too... so this bug would seem to go back to the 2.9 line
      Hide
      Yonik Seeley added a comment -

      The bug is due to an overflow in splitRange.
      This patch seems to fix the issue (it would happen with very small or very large numbers).

      Show
      Yonik Seeley added a comment - The bug is due to an overflow in splitRange. This patch seems to fix the issue (it would happen with very small or very large numbers).
      Yonik Seeley made changes -
      Attachment LUCENE-2541.patch [ 12449665 ]
      Hide
      Yonik Seeley added a comment -

      It's difficult to nail down exactly when things fail (in order to describe it in CHANGES.txt) since overflow can occur on the last shift, but the code stops on the last shift regardless (the "shift+precisionStep>=valSize" test). So a lower bound of 1 with an upper bound of MAX_VALUE will overflow but still work, but a lower bound greater than MAX_VALUE-(1L<<60)+2 will fail for example.

      Show
      Yonik Seeley added a comment - It's difficult to nail down exactly when things fail (in order to describe it in CHANGES.txt) since overflow can occur on the last shift, but the code stops on the last shift regardless (the "shift+precisionStep>=valSize" test). So a lower bound of 1 with an upper bound of MAX_VALUE will overflow but still work, but a lower bound greater than MAX_VALUE-(1L<<60)+2 will fail for example.
      Hide
      Yonik Seeley added a comment -

      I'm currently working on putting together a random test to catch this.

      Show
      Yonik Seeley added a comment - I'm currently working on putting together a random test to catch this.
      Hide
      Uwe Schindler added a comment -

      When I am awake from my Tokyo trip, I will take care... Give me a few hours.

      Show
      Uwe Schindler added a comment - When I am awake from my Tokyo trip, I will take care... Give me a few hours.
      Hide
      Yonik Seeley added a comment -

      OK, here's an updated patch with random tests that consistently fail w/o checking for both underflow and overflow.

      Show
      Yonik Seeley added a comment - OK, here's an updated patch with random tests that consistently fail w/o checking for both underflow and overflow.
      Yonik Seeley made changes -
      Attachment LUCENE-2541.patch [ 12449695 ]
      Yonik Seeley made changes -
      Summary range query Long.MAX_VALUE to Long.MAX_VALUE on trie long field returns all docs in index NumericRangeQuery errors with endpoints near long min and max values
      Affects Version/s 2.9 [ 12312682 ]
      Hide
      Uwe Schindler added a comment -

      Yonik: there is a Test framework in TestNumericUtils, that asserts the correctness of ranges. It just tests no random ranges with large/low ranges. I will update the patch to use the framework.

      Else your test should be in TestNumericRangeQuery64

      Show
      Uwe Schindler added a comment - Yonik: there is a Test framework in TestNumericUtils, that asserts the correctness of ranges. It just tests no random ranges with large/low ranges. I will update the patch to use the framework. Else your test should be in TestNumericRangeQuery64
      Uwe Schindler made changes -
      Assignee Uwe Schindler [ thetaphi ]
      Hide
      Uwe Schindler added a comment -

      The explanation (for changes):
      The bug happens when the range covers abs(bounds)>2^31 and both bounds are close together in the same range bracket (as Mike calls it). The problem is, as yonik corrected, the exit condition, because the <-comparision is invalid for all these numbers, because their signed longs are compares signed not unsigned.

      Please let me also confirm that the solution is correct and embed the test in my splitRange tests in TestNumericUtils. The bug is not inside NRQ, its the underlying bit magic, so it should be added to TestNumericUtils.

      Show
      Uwe Schindler added a comment - The explanation (for changes): The bug happens when the range covers abs(bounds)>2^31 and both bounds are close together in the same range bracket (as Mike calls it). The problem is, as yonik corrected, the exit condition, because the <-comparision is invalid for all these numbers, because their signed longs are compares signed not unsigned. Please let me also confirm that the solution is correct and embed the test in my splitRange tests in TestNumericUtils. The bug is not inside NRQ, its the underlying bit magic, so it should be added to TestNumericUtils.
      Hide
      Yonik Seeley added a comment -

      The bug happens when the range covers abs(bounds)>2^31 and both bounds are close together in the same range bracket (as Mike calls it).

      Do you mean 2^63 (we're talking longs here)?

      Anyway, it's much more complex than that and depends on the precision step also. Numbers less than 2^63 (and with a distance between them less than 2^63) can still cause failures.

      Show
      Yonik Seeley added a comment - The bug happens when the range covers abs(bounds)>2^31 and both bounds are close together in the same range bracket (as Mike calls it). Do you mean 2^63 (we're talking longs here)? Anyway, it's much more complex than that and depends on the precision step also. Numbers less than 2^63 (and with a distance between them less than 2^63) can still cause failures.
      Hide
      Uwe Schindler added a comment -

      Here my solution (its yours), but different test. The tests in NumericRangeQuery also verify that the range is really covered completely, so I tend to use these tests.

      I just tried some values, we have to add randomness and other precsteps. Its just a start before going to sleep - will extend tests tomorrow, to test different precSteps.

      Yonik: The bug appears with 2^63 and, yes, it depends on precStep. What I meant is, that the outer bounds must be in the "bracket" that covers the 2^63 limit. So with larger precision steps, also smaller values overflow (but only for longs, ints are safe, as they internally also use longs).

      Show
      Uwe Schindler added a comment - Here my solution (its yours), but different test. The tests in NumericRangeQuery also verify that the range is really covered completely, so I tend to use these tests. I just tried some values, we have to add randomness and other precsteps. Its just a start before going to sleep - will extend tests tomorrow, to test different precSteps. Yonik: The bug appears with 2^63 and, yes, it depends on precStep. What I meant is, that the outer bounds must be in the "bracket" that covers the 2^63 limit. So with larger precision steps, also smaller values overflow (but only for longs, ints are safe, as they internally also use longs).
      Uwe Schindler made changes -
      Attachment LUCENE-2541-uwe.patch [ 12449707 ]
      Hide
      Uwe Schindler added a comment -

      After going to sleep I found out the following (in my dreams...):

      Your patch fixes the bug but it breaks NRQ for such bounds. Whenever the "+/- diff" operation overflows the long, it stops creating new sub ranges. This leads to the fact that NRQ will enumerate all terms and not use lower precisions! So as soon as the min or max bound is near +/- MAX_VALUE, the NRQ behaves like TRQ - bummer.

      The easy solution would be to use BigIntegers with 65 bits Maybe thats the easiest solution to correctly calculate this crazy stuff.

      Show
      Uwe Schindler added a comment - After going to sleep I found out the following (in my dreams...): Your patch fixes the bug but it breaks NRQ for such bounds. Whenever the "+/- diff" operation overflows the long, it stops creating new sub ranges. This leads to the fact that NRQ will enumerate all terms and not use lower precisions! So as soon as the min or max bound is near +/- MAX_VALUE, the NRQ behaves like TRQ - bummer. The easy solution would be to use BigIntegers with 65 bits Maybe thats the easiest solution to correctly calculate this crazy stuff.
      Hide
      Yonik Seeley added a comment -

      our patch fixes the bug but it breaks NRQ for such bounds. Whenever the "+/- diff" operation overflows the long, it stops creating new sub ranges.

      Hmmm, I had the same fear earlier, but then I discounted it.
      If nextMinBound wraps, isn't it the case that if we did use BigIntegers that nextMinBound would be greater than nextMaxBound anyway (and hence the "nextMinBound>nextMaxBound" clause would still stop creating subranges)?

      Show
      Yonik Seeley added a comment - our patch fixes the bug but it breaks NRQ for such bounds. Whenever the "+/- diff" operation overflows the long, it stops creating new sub ranges. Hmmm, I had the same fear earlier, but then I discounted it. If nextMinBound wraps, isn't it the case that if we did use BigIntegers that nextMinBound would be greater than nextMaxBound anyway (and hence the "nextMinBound>nextMaxBound" clause would still stop creating subranges)?
      Hide
      Uwe Schindler added a comment -

      Hi Yonik,

      here the biginteger variant that works exactly as the old long-based variant (without your extra checks). This is my favourite solution as it never overflows and we can maybe extend the whole NumericRange to BigInteger To reduce object creation, it uses a precalculated (1<<shift) array. I had to change the logic a little bit as it was broken with precStep=Integer.MAX_VALUE (OOME).

      The tests pass, I did not test your additional test, but I think it should too.

      I will test some range splits tomorrow on the solution you found but as far as i see without trying out, it seem to not work.

      Show
      Uwe Schindler added a comment - Hi Yonik, here the biginteger variant that works exactly as the old long-based variant (without your extra checks). This is my favourite solution as it never overflows and we can maybe extend the whole NumericRange to BigInteger To reduce object creation, it uses a precalculated (1<<shift) array. I had to change the logic a little bit as it was broken with precStep=Integer.MAX_VALUE (OOME). The tests pass, I did not test your additional test, but I think it should too. I will test some range splits tomorrow on the solution you found but as far as i see without trying out, it seem to not work.
      Uwe Schindler made changes -
      Attachment LUCENE-2541-uwe.patch [ 12449724 ]
      Hide
      Uwe Schindler added a comment -

      Missed one shift...

      Show
      Uwe Schindler added a comment - Missed one shift...
      Uwe Schindler made changes -
      Attachment LUCENE-2541-uwe.patch [ 12449727 ]
      Uwe Schindler made changes -
      Attachment LUCENE-2541-uwe.patch [ 12449724 ]
      Hide
      Yonik Seeley added a comment -

      Uwe I & discussed some on IRC, but for others following along, I believe the original patch (checking for overflow) will generate the same ranges as BigIneger use would.

      Here's the logic for the overflow case: nextMaxBound always either stays the same or decreases.
      If nextMinBound does overflow a long, then if we had been using BigInts, it would be bigger than MAX_LONG, and thus the "nextMinBound>nextMaxBound" condition would be satisfied to break out of the loop. Thus, "nextMinBound>nextMaxBound" using Bigints, is equivalent to "nextMinBound>nextMaxBound || lowerWrapped" using longs.

      The same logic applies to the lower bound.

      Show
      Yonik Seeley added a comment - Uwe I & discussed some on IRC, but for others following along, I believe the original patch (checking for overflow) will generate the same ranges as BigIneger use would. Here's the logic for the overflow case: nextMaxBound always either stays the same or decreases. If nextMinBound does overflow a long, then if we had been using BigInts, it would be bigger than MAX_LONG, and thus the "nextMinBound>nextMaxBound" condition would be satisfied to break out of the loop. Thus, "nextMinBound>nextMaxBound" using Bigints, is equivalent to "nextMinBound>nextMaxBound || lowerWrapped" using longs. The same logic applies to the lower bound.
      Hide
      Uwe Schindler added a comment -

      Here again the BigDecimal patch with additional tests. When writing the tests, I verified, that also Yoniks patch works correct, although it looks more "hackish". The Bigdecimal approach looks in my eyes much better, the overhead is neglectible (the range split is in no inner loop and object creation is minimized by precalculated 1<<shifts and corresponding masks. This makes the code look much more easy (and its also more easy to understand ant all - I also added more comments).

      I also added a test for these extreme values and also to verify that the range split works as exspected (maybe also check shift values vor each bound by also adding to list - will post updated patch later).

      The problem happens when the following is true:

      • the range contains a minimum bound that is > (MAX_VALUE - 1L<<precisionStep)
      • the range contains a maximum bound that is < (MIN_VALUE + 1L<<precisionStep)

      This affects dates not really and also for doubles its only happening for values near +/-infinity when the lower bound is close to the +infinity or the upper bound is close to -infinity. The same for longs (as noted before).

      Show
      Uwe Schindler added a comment - Here again the BigDecimal patch with additional tests. When writing the tests, I verified, that also Yoniks patch works correct, although it looks more "hackish". The Bigdecimal approach looks in my eyes much better, the overhead is neglectible (the range split is in no inner loop and object creation is minimized by precalculated 1<<shifts and corresponding masks. This makes the code look much more easy (and its also more easy to understand ant all - I also added more comments). I also added a test for these extreme values and also to verify that the range split works as exspected (maybe also check shift values vor each bound by also adding to list - will post updated patch later). The problem happens when the following is true: the range contains a minimum bound that is > (MAX_VALUE - 1L<<precisionStep) the range contains a maximum bound that is < (MIN_VALUE + 1L<<precisionStep) This affects dates not really and also for doubles its only happening for values near +/-infinity when the lower bound is close to the +infinity or the upper bound is close to -infinity. The same for longs (as noted before).
      Uwe Schindler made changes -
      Attachment LUCENE-2541-uwe- BigDecimal.patch [ 12449752 ]
      Hide
      Uwe Schindler added a comment -

      Here two final variants:

      • one with BigDecimals and improved range spliut logic (I like this patch more!)
      • one with Yoniks fix

      Both patches contain the same tests that pass. I converted Yoniks test and added to TestNumericUtils, which tests more intensive, if the spliutted range is correct: I ensures that all sub range bounds are inside the whole range and that there are no holes in the range. It also checks (important for yoniks fix) that one example range is splitted into the correct sub-ranges. I removed inverse ranges from yoniks patch, as they can never break (NRQ and also splitRange exits early if lower > upper). Also to enable the tests for ranges without holes, the bitset size cannot grow unlimited, so the maximum length of the range is limited to 8192*1024.

      I still did not use autoboxing and varargs in the TestNumericUtils, as now this patch should apply easy to 2.9 (with minor changes), so the initial merging works. After committing this to all branches, i would update 3.x and trunk to use varargs and autoboxing, which makes the test more readable.

      Without NumericUtils fix, the new testcases both fail.

      I just want to know: Which patch do you prefer? Else this is ready to commit with the cleanup work in 3.x and trunk later.

      Show
      Uwe Schindler added a comment - Here two final variants: one with BigDecimals and improved range spliut logic (I like this patch more!) one with Yoniks fix Both patches contain the same tests that pass. I converted Yoniks test and added to TestNumericUtils, which tests more intensive, if the spliutted range is correct: I ensures that all sub range bounds are inside the whole range and that there are no holes in the range. It also checks (important for yoniks fix) that one example range is splitted into the correct sub-ranges. I removed inverse ranges from yoniks patch, as they can never break (NRQ and also splitRange exits early if lower > upper). Also to enable the tests for ranges without holes, the bitset size cannot grow unlimited, so the maximum length of the range is limited to 8192*1024. I still did not use autoboxing and varargs in the TestNumericUtils, as now this patch should apply easy to 2.9 (with minor changes), so the initial merging works. After committing this to all branches, i would update 3.x and trunk to use varargs and autoboxing, which makes the test more readable. Without NumericUtils fix, the new testcases both fail. I just want to know: Which patch do you prefer? Else this is ready to commit with the cleanup work in 3.x and trunk later.
      Uwe Schindler made changes -
      Attachment LUCENE-2541-uwe-BigDecimal.patch [ 12449756 ]
      Attachment LUCENE-2541-uwe-longs.patch [ 12449757 ]
      Uwe Schindler made changes -
      Attachment LUCENE-2541-uwe- BigDecimal.patch [ 12449752 ]
      Uwe Schindler made changes -
      Fix Version/s 2.9.4 [ 12315148 ]
      Fix Version/s 3.0.3 [ 12315147 ]
      Fix Version/s 3.1 [ 12314822 ]
      Fix Version/s 4.0 [ 12314025 ]
      Lucene Fields [New] [New, Patch Available]
      Hide
      Yonik Seeley added a comment -

      Which patch do you prefer?

      Look at the size difference in code changes... it's no contest

      I also did a quick performance test of splitRange with 1M random bounds.
      patch with longs: 250ms (and no garbage creation)
      patch with BigInts: 11714ms (and a lot of garbage creation)

      BigInts was 46x slower.

      Show
      Yonik Seeley added a comment - Which patch do you prefer? Look at the size difference in code changes... it's no contest I also did a quick performance test of splitRange with 1M random bounds. patch with longs: 250ms (and no garbage creation) patch with BigInts: 11714ms (and a lot of garbage creation) BigInts was 46x slower.
      Hide
      Uwe Schindler added a comment -

      BigInts was 46x slower.

      You convinced me I go with your fix - its verified to work correct, thanks!

      Show
      Uwe Schindler added a comment - BigInts was 46x slower. You convinced me I go with your fix - its verified to work correct, thanks!
      Hide
      Uwe Schindler added a comment -

      I will commit the long variant soon to trunk and merge 3x, then backport to 3.0 and 2.9.

      After that i optimize the test in trunk/3x for varargs and autoboxing, which makes it more readable.

      Show
      Uwe Schindler added a comment - I will commit the long variant soon to trunk and merge 3x, then backport to 3.0 and 2.9. After that i optimize the test in trunk/3x for varargs and autoboxing, which makes it more readable.
      Uwe Schindler made changes -
      Attachment LUCENE-2541-trunk+3x.patch [ 12449759 ]
      Hide
      Uwe Schindler added a comment -

      Committed trunk revision: 965103
      Committed 3.x branch revision: 965104
      Committed 3.0 branch revision: 965105
      Committed 2.9 branch revision: 965107

      Thanks Yonik! This bug was really crazy and is severe (maybe rectifies a 2.9.4 / 3.0.3).

      Show
      Uwe Schindler added a comment - Committed trunk revision: 965103 Committed 3.x branch revision: 965104 Committed 3.0 branch revision: 965105 Committed 2.9 branch revision: 965107 Thanks Yonik! This bug was really crazy and is severe (maybe rectifies a 2.9.4 / 3.0.3).
      Hide
      Uwe Schindler added a comment -

      Committed test improvements in trunk revision: 965110
      Committed test improvements in branch 3x revision: 965111

      Show
      Uwe Schindler added a comment - Committed test improvements in trunk revision: 965110 Committed test improvements in branch 3x revision: 965111
      Uwe Schindler made changes -
      Status Open [ 1 ] Resolved [ 5 ]
      Resolution Fixed [ 1 ]
      Hide
      Uwe Schindler added a comment -

      For completeness I added the patch for 2.9 and 3.0, too. If we dont release a new bugfix version, somebody has the chance to patch his checkout!

      Show
      Uwe Schindler added a comment - For completeness I added the patch for 2.9 and 3.0, too. If we dont release a new bugfix version, somebody has the chance to patch his checkout!
      Uwe Schindler made changes -
      Attachment LUCENE-2541-29.patch [ 12449785 ]
      Attachment LUCENE-2541-30.patch [ 12449786 ]
      Uwe Schindler made changes -
      Status Resolved [ 5 ] Closed [ 6 ]
      Mark Thomas made changes -
      Workflow jira [ 12515845 ] Default workflow, editable Closed status [ 12564014 ]
      Mark Thomas made changes -
      Workflow Default workflow, editable Closed status [ 12564014 ] jira [ 12585487 ]

        People

        • Assignee:
          Uwe Schindler
          Reporter:
          Koji Sekiguchi
        • Votes:
          0 Vote for this issue
          Watchers:
          1 Start watching this issue

          Dates

          • Created:
            Updated:
            Resolved:

            Development