Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-6780

GeoPointDistanceQuery doesn't work with a large radius?

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I'm working on LUCENE-6698 but struggling with test failures ...

      Then I noticed that TestGeoPointQuery's test never tests on large distances, so I modified the test to sometimes do so (like TestBKDTree) and hit test failures.

      1. LUCENE-6780.patch
        258 kB
        Nicholas Knize
      2. LUCENE-6780.patch
        161 kB
        Nicholas Knize
      3. LUCENE-6780.patch
        8 kB
        Nicholas Knize
      4. LUCENE-6780.patch
        8 kB
        Nicholas Knize
      5. LUCENE-6780.patch
        21 kB
        Nicholas Knize
      6. LUCENE-6780.patch
        16 kB
        Nicholas Knize
      7. LUCENE-6780.patch
        7 kB
        Nicholas Knize
      8. LUCENE-6780.patch
        23 kB
        Nicholas Knize
      9. LUCENE-6780.patch
        14 kB
        Michael McCandless
      10. LUCENE-6780.patch
        12 kB
        Nicholas Knize
      11. LUCENE-6780.patch
        4 kB
        Michael McCandless
      12. LUCENE-6780-heap-used-hack.patch
        4 kB
        Michael McCandless

        Issue Links

          Activity

          Hide
          mikemccand Michael McCandless added a comment -

          Patch, putting back the smallBBox boolean ...

          First problem is the test runs very slowly sometimes. I thought LUCENE-6712 was supposed to fix this (so we don't need to do LUCENE-6685)?

          Second problem is it then hits failures:

          [junit4:pickseed] Seed property 'tests.seed' already defined: EE938F778B784339
             [junit4] <JUnit4> says hallo! Master seed: EE938F778B784339
             [junit4] Executing 1 suite with 1 JVM.
             [junit4] 
             [junit4] Started J0 PID(7661@localhost).
             [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
             [junit4]   1> T0: id=9 docID=9 lat=-76.34720742943381 lon=-36.15344492760056 deleted?=false expected=true but got false query=GeoPointDistanceQuery: field=geoField: Center: [-91.32459786462451,-62.68808104026591] Distance: 2652065.508673892 m Lower Left: [-135.1399256521194,-86.45202902120825] Upper Right: [-47.50927007712964,-38.847306191642886]
             [junit4]   2> 9月 03, 2015 11:39:46 下午 com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
             [junit4]   2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery]
             [junit4]   2> java.lang.AssertionError: wrong hit
             [junit4]   2> 	at __randomizedtesting.SeedInfo.seed([EE938F778B784339]:0)
             [junit4]   2> 	at org.junit.Assert.fail(Assert.java:93)
             [junit4]   2> 	at org.apache.lucene.search.TestGeoPointQuery$VerifyHits.test(TestGeoPointQuery.java:570)
             [junit4]   2> 	at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:511)
             [junit4]   2> 	at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:402)
             [junit4]   2> 
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=EE938F778B784339 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=zh_HK -Dtests.timezone=Portugal -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] ERROR   63.4s | TestGeoPointQuery.testRandomTiny <<<
          

          Is it known/expected that the math doesn't work for large distance queries?

          Finally, I noticed the randomized test is not testing "crosses dateline" cases .. is this supposed to work / known not to work? I can open a separate issue for that.

          Show
          mikemccand Michael McCandless added a comment - Patch, putting back the smallBBox boolean ... First problem is the test runs very slowly sometimes. I thought LUCENE-6712 was supposed to fix this (so we don't need to do LUCENE-6685 )? Second problem is it then hits failures: [junit4:pickseed] Seed property 'tests.seed' already defined: EE938F778B784339 [junit4] <JUnit4> says hallo! Master seed: EE938F778B784339 [junit4] Executing 1 suite with 1 JVM. [junit4] [junit4] Started J0 PID(7661@localhost). [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery [junit4] 1> T0: id=9 docID=9 lat=-76.34720742943381 lon=-36.15344492760056 deleted?=false expected=true but got false query=GeoPointDistanceQuery: field=geoField: Center: [-91.32459786462451,-62.68808104026591] Distance: 2652065.508673892 m Lower Left: [-135.1399256521194,-86.45202902120825] Upper Right: [-47.50927007712964,-38.847306191642886] [junit4] 2> 9月 03, 2015 11:39:46 下午 com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery] [junit4] 2> java.lang.AssertionError: wrong hit [junit4] 2> at __randomizedtesting.SeedInfo.seed([EE938F778B784339]:0) [junit4] 2> at org.junit.Assert.fail(Assert.java:93) [junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$VerifyHits.test(TestGeoPointQuery.java:570) [junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:511) [junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:402) [junit4] 2> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=EE938F778B784339 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=zh_HK -Dtests.timezone=Portugal -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 63.4s | TestGeoPointQuery.testRandomTiny <<< Is it known/expected that the math doesn't work for large distance queries? Finally, I noticed the randomized test is not testing "crosses dateline" cases .. is this supposed to work / known not to work? I can open a separate issue for that.
          Hide
          nknize Nicholas Knize added a comment -

          There's a small change to GeoPointTermsEnum in LUCENE-6777 that this depends upon.

          Apply LUCENE-6777.patch then LUCENE-6780.patch

          This patch also fixes an issue noted in LUCENE-6698 (a side effect of simplifying the cellCrossesCircle logic)

          Show
          nknize Nicholas Knize added a comment - There's a small change to GeoPointTermsEnum in LUCENE-6777 that this depends upon. Apply LUCENE-6777 .patch then LUCENE-6780 .patch This patch also fixes an issue noted in LUCENE-6698 (a side effect of simplifying the cellCrossesCircle logic)
          Hide
          mikemccand Michael McCandless added a comment -

          Can we fix closestPointOnBBox to return void, and requiring incoming arg is non-null? I think it makes the API confusing to both take as an argument, and return, the result.

          Can we expand this out to a full if statement?:

               @Override
          +    protected short computeMaxShift() {
          +      return (short)(GeoPointField.PRECISION_STEP * ((query.radius > 2000000) ? 5 : 4));
          +    }
          

          Maybe add javadoc to the new closestPointOnBBox and computeMaxShift?

          In closestPointOnBBox should you maybe use Double.NaN as the marker value instead of 0.0 since 0.0 can legitimately occur?

          A general geo API question: why do we sometimes use x/y (rMinX, rMinY) and other times use lon/lat (centerLon, centerLat)?

          It looks like your new patch lost my original patch (that just made the test more evil) ... when I apply both, I still see test failures, e.g.:

             [junit4] Started J0 PID(390@localhost).
             [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
             [junit4]   1> T0: id=3454 docID=3389 lat=16.958122319434622 lon=91.51666186085828 deleted?=false expected=true but got false query=GeoPointDistanceQuery: field=geoField: Center: [28.3906677508417,9.349277744701268] Distance: 6883636.144942287 m Lower Left: [-33.7590864185931,-52.77655846609843] Upper Right: [90.5404219202765,71.33132841950409]
             [junit4]   2> sep 10, 2015 4:07:41 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
             [junit4]   2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery]
             [junit4]   2> java.lang.AssertionError: wrong hit
             [junit4]   2> 	at __randomizedtesting.SeedInfo.seed([94245EC72F3A129]:0)
             [junit4]   2> 	at org.junit.Assert.fail(Assert.java:93)
             [junit4]   2> 	at org.apache.lucene.search.TestGeoPointQuery$VerifyHits.test(TestGeoPointQuery.java:572)
             [junit4]   2> 	at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:513)
             [junit4]   2> 	at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:406)
             [junit4]   2> 
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testRandom -Dtests.seed=94245EC72F3A129 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es_MX -Dtests.timezone=Canada/Atlantic -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] ERROR   0.78s | TestGeoPointQuery.testRandom <<<
          

          I'll attach the combined patch ...

          I think a simple random test could ferret out self-consistency bugs in these new geo APIs (similar to what we did on LUCENE-6699): make a random circle and random bbox, then ask the GeoUtil APIs what their relationship is, then randomly sample points from each and confirm no point ever violates the relationship ... I'll try to code this up.

          Show
          mikemccand Michael McCandless added a comment - Can we fix closestPointOnBBox to return void , and requiring incoming arg is non-null? I think it makes the API confusing to both take as an argument, and return, the result. Can we expand this out to a full if statement?: @Override + protected short computeMaxShift() { + return (short)(GeoPointField.PRECISION_STEP * ((query.radius > 2000000) ? 5 : 4)); + } Maybe add javadoc to the new closestPointOnBBox and computeMaxShift ? In closestPointOnBBox should you maybe use Double.NaN as the marker value instead of 0.0 since 0.0 can legitimately occur? A general geo API question: why do we sometimes use x/y ( rMinX , rMinY ) and other times use lon/lat ( centerLon , centerLat )? It looks like your new patch lost my original patch (that just made the test more evil) ... when I apply both, I still see test failures, e.g.: [junit4] Started J0 PID(390@localhost). [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery [junit4] 1> T0: id=3454 docID=3389 lat=16.958122319434622 lon=91.51666186085828 deleted?=false expected=true but got false query=GeoPointDistanceQuery: field=geoField: Center: [28.3906677508417,9.349277744701268] Distance: 6883636.144942287 m Lower Left: [-33.7590864185931,-52.77655846609843] Upper Right: [90.5404219202765,71.33132841950409] [junit4] 2> sep 10, 2015 4:07:41 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery] [junit4] 2> java.lang.AssertionError: wrong hit [junit4] 2> at __randomizedtesting.SeedInfo.seed([94245EC72F3A129]:0) [junit4] 2> at org.junit.Assert.fail(Assert.java:93) [junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$VerifyHits.test(TestGeoPointQuery.java:572) [junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:513) [junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:406) [junit4] 2> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandom -Dtests.seed=94245EC72F3A129 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es_MX -Dtests.timezone=Canada/Atlantic -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 0.78s | TestGeoPointQuery.testRandom <<< I'll attach the combined patch ... I think a simple random test could ferret out self-consistency bugs in these new geo APIs (similar to what we did on LUCENE-6699 ): make a random circle and random bbox, then ask the GeoUtil APIs what their relationship is, then randomly sample points from each and confirm no point ever violates the relationship ... I'll try to code this up.
          Hide
          mikemccand Michael McCandless added a comment -

          Patch combining Nicholas Knize's last patch and the original "make test more evil" patch.

          Show
          mikemccand Michael McCandless added a comment - Patch combining Nicholas Knize 's last patch and the original "make test more evil" patch.
          Hide
          nknize Nicholas Knize added a comment -

          Looks like one of my patches stepped on the evil test (blame -> git apply ). I'll start with the failure first:

          when I apply both, I still see test failures,

          A quick investigation reveals the bounding box for the point-distance slightly off (as in upwards of a degree in both directions). At present there are 2 approaches to computing that bbox:

          1. use vincenty distance to compute a location along a range (distance) bearing (azimuth). Since vincenty can fail to converge on nearly anti-podal points (hence the need for iteration thresholds and the fudge factor whack-a-mole game as seen before) this can obviously be problematic for large distance queries.
          2. inverse haversine. USGS claims an average error of 22km over large distances, and this error certainly falls within that threshold (its 15km if interested).

          This fix can come in a few ways, 1. dynamically expand the computed bbox based on computed error (using distance as the independent variable). Maybe overkill? 2. add a static "fudge factor" of 1 degree to min/max lon/lat. This would probably need to be verified through some beasting.

          In closestPointOnBBox should you maybe use Double.NaN as the marker value instead of 0.0 since 0.0 can legitimately occur?

          The logic handles it. 0, 0 means closest point is the same as centerLon, centerLat - which is what it gets set to in the method. Though thanks for getting me to look at that closer. There's a superfluous logic block.

          A general geo API question: why do we sometimes use x/y (rMinX, rMinY) and other times use lon/lat (centerLon, centerLat)?

          Short answer: lazy inconsistencies. Longer answer: I like to use x/y when I'm either going to swap out with cartesian logic or I'm lazy typing. Since neither are good answers I agree it would be a good idea to refactor to lon/lat for geodesic and x/y cartesian.

          Good comments on the code cleanup.

          Show
          nknize Nicholas Knize added a comment - Looks like one of my patches stepped on the evil test (blame -> git apply ). I'll start with the failure first: when I apply both, I still see test failures, A quick investigation reveals the bounding box for the point-distance slightly off (as in upwards of a degree in both directions). At present there are 2 approaches to computing that bbox: 1. use vincenty distance to compute a location along a range (distance) bearing (azimuth). Since vincenty can fail to converge on nearly anti-podal points (hence the need for iteration thresholds and the fudge factor whack-a-mole game as seen before) this can obviously be problematic for large distance queries. 2. inverse haversine. USGS claims an average error of 22km over large distances, and this error certainly falls within that threshold (its 15km if interested). This fix can come in a few ways, 1. dynamically expand the computed bbox based on computed error (using distance as the independent variable). Maybe overkill? 2. add a static "fudge factor" of 1 degree to min/max lon/lat. This would probably need to be verified through some beasting. In closestPointOnBBox should you maybe use Double.NaN as the marker value instead of 0.0 since 0.0 can legitimately occur? The logic handles it. 0, 0 means closest point is the same as centerLon, centerLat - which is what it gets set to in the method. Though thanks for getting me to look at that closer. There's a superfluous logic block. A general geo API question: why do we sometimes use x/y (rMinX, rMinY) and other times use lon/lat (centerLon, centerLat)? Short answer: lazy inconsistencies. Longer answer: I like to use x/y when I'm either going to swap out with cartesian logic or I'm lazy typing. Since neither are good answers I agree it would be a good idea to refactor to lon/lat for geodesic and x/y cartesian. Good comments on the code cleanup.
          Hide
          nknize Nicholas Knize added a comment -

          New patch that computes the BBox given the WGS84 ellipsoid. This skips all haversine and vincenty issues at the expense of a few extra trig functions (only used to compute bbox once).

          The patch also addresses other comments (e.g., exploding ternary operator, adding java doc). nocommits are left alone for time being.

          Show
          nknize Nicholas Knize added a comment - New patch that computes the BBox given the WGS84 ellipsoid. This skips all haversine and vincenty issues at the expense of a few extra trig functions (only used to compute bbox once). The patch also addresses other comments (e.g., exploding ternary operator, adding java doc). nocommits are left alone for time being.
          Hide
          mikemccand Michael McCandless added a comment -

          Thanks Nicholas Knize, but I'm having trouble applying the patch:

          patching file lucene/sandbox/src/java/org/apache/lucene/search/GeoPointTermsEnum.java
          Hunk #1 FAILED at 42.
          Hunk #2 FAILED at 102.
          Hunk #3 FAILED at 146.
          Hunk #4 FAILED at 157.
          Hunk #5 FAILED at 185.
          Hunk #6 FAILED at 206.
          6 out of 6 hunks FAILED -- saving rejects to file lucene/sandbox/src/java/org/apache/lucene/search/GeoPointTermsEnum.java.rej
          

          It looks like your dev area isn't up to date? (Missing LUCENE-6777 that I committed yesterday...)

          Show
          mikemccand Michael McCandless added a comment - Thanks Nicholas Knize , but I'm having trouble applying the patch: patching file lucene/sandbox/src/java/org/apache/lucene/search/GeoPointTermsEnum.java Hunk #1 FAILED at 42. Hunk #2 FAILED at 102. Hunk #3 FAILED at 146. Hunk #4 FAILED at 157. Hunk #5 FAILED at 185. Hunk #6 FAILED at 206. 6 out of 6 hunks FAILED -- saving rejects to file lucene/sandbox/src/java/org/apache/lucene/search/GeoPointTermsEnum.java.rej It looks like your dev area isn't up to date? (Missing LUCENE-6777 that I committed yesterday...)
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1702470 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1702470 ]

          LUCENE-6780: make branch

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1702470 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1702470 ] LUCENE-6780 : make branch
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1702471 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1702471 ]

          LUCENE-6780: commit current patch

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1702471 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1702471 ] LUCENE-6780 : commit current patch
          Hide
          mikemccand Michael McCandless added a comment -

          OK I attempted to sync up the patch (parts of the GeoPointTermsEnum.java changes were new, others were old) and made some other small test changes, and then committed to a branch here: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene6780

          Let's iterate on this branch?

          I'll work on the new randomized test ...

          Show
          mikemccand Michael McCandless added a comment - OK I attempted to sync up the patch (parts of the GeoPointTermsEnum.java changes were new, others were old) and made some other small test changes, and then committed to a branch here: https://svn.apache.org/repos/asf/lucene/dev/branches/lucene6780 Let's iterate on this branch? I'll work on the new randomized test ...
          Hide
          mikemccand Michael McCandless added a comment -

          I'm seeing very slow query execution times, e.g. ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandomMedium -Dtests.seed=A48E30E0280A6CAF takes ~60 seconds on my box, when the test normally takes 1-2 seconds with other seeds...

          Show
          mikemccand Michael McCandless added a comment - I'm seeing very slow query execution times, e.g. ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandomMedium -Dtests.seed=A48E30E0280A6CAF takes ~60 seconds on my box, when the test normally takes 1-2 seconds with other seeds...
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1702476 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1702476 ]

          LUCENE-6780: remove wasted annots; add failing test

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1702476 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1702476 ] LUCENE-6780 : remove wasted annots; add failing test
          Hide
          mikemccand Michael McCandless added a comment -

          In closestPointOnBBox should you maybe use Double.NaN as the marker value instead of 0.0 since 0.0 can legitimately occur?

          The logic handles it.

          I still think there's really a bug here I committed a failing test on the branch to TestGeoUtils.java showing it ... maybe my test case is buggy, but all I did was translate the bbox so that it's upper right corner landed on 0, 0 ...

          Show
          mikemccand Michael McCandless added a comment - In closestPointOnBBox should you maybe use Double.NaN as the marker value instead of 0.0 since 0.0 can legitimately occur? The logic handles it. I still think there's really a bug here I committed a failing test on the branch to TestGeoUtils.java showing it ... maybe my test case is buggy, but all I did was translate the bbox so that it's upper right corner landed on 0, 0 ...
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1702490 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1702490 ]

          LUCENE-6780: add randomized test; remove more annots; add some nocommits; rename some radius -> radiusMeters

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1702490 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1702490 ] LUCENE-6780 : add randomized test; remove more annots; add some nocommits; rename some radius -> radiusMeters
          Hide
          mikemccand Michael McCandless added a comment -

          OK I committed a new randomized test TestGeoUtils.testRandomRectsAndCircles ...

          It's a simple test: make a random rect and circle, use the GeoUtils API to see if they are disjoint or if the circle fully contains the rect (else, skip it), and then randomly pick points inside the rect and confirm they are either within or not within the circle.

          It sometimes fails, e.g.:

          ant test  -Dtestcase=TestGeoUtils -Dtests.method=testRandomRectsAndCircles -Dtests.seed=D4C2F5423D90BDD8 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=mt_MT -Dtests.timezone=Australia/Yancowinna -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.verbose=true
          

          But it's entirely possible we need to relax the test for boundary conditions, however when I look at the GeoUtils impls, we seem to consistently use haversine for all the checks ... but maybe even so the math can produce false errors?

          Show
          mikemccand Michael McCandless added a comment - OK I committed a new randomized test TestGeoUtils.testRandomRectsAndCircles ... It's a simple test: make a random rect and circle, use the GeoUtils API to see if they are disjoint or if the circle fully contains the rect (else, skip it), and then randomly pick points inside the rect and confirm they are either within or not within the circle. It sometimes fails, e.g.: ant test -Dtestcase=TestGeoUtils -Dtests.method=testRandomRectsAndCircles -Dtests.seed=D4C2F5423D90BDD8 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=mt_MT -Dtests.timezone=Australia/Yancowinna -Dtests.asserts=true -Dtests.file.encoding=UTF-8 -Dtests.verbose=true But it's entirely possible we need to relax the test for boundary conditions, however when I look at the GeoUtils impls, we seem to consistently use haversine for all the checks ... but maybe even so the math can produce false errors?
          Hide
          nknize Nicholas Knize added a comment -

          Next iteration patch off the feature branch.

          OK I committed a new randomized test

          The test was buggy using maxLon where it expected minLon. I also updated it to handle the case where the circle is fully contained by the rectangle. All beasting passed.

          I'm seeing very slow query execution times

          I had noticed this before posting the initial patch and isolated one test that took ~45 secs to complete. The slow times were related to recomputing the bbox and high resolution ranges for every segment on very large distance queries. The new patch relaxes the distance criteria for the range resolution. There are still some boundary outliers that take ~15 seconds (instead of 60+) to complete. I can further improve this by optimizing the GeoPointDistanceQuery.computeBBox method... or, what are your thoughts on computing the BBox and reusing across segments??

          I still think there's really a bug here I committed a failing test on the branch

          bleh...indeed!! Fixed it up.

          Show
          nknize Nicholas Knize added a comment - Next iteration patch off the feature branch. OK I committed a new randomized test The test was buggy using maxLon where it expected minLon. I also updated it to handle the case where the circle is fully contained by the rectangle. All beasting passed. I'm seeing very slow query execution times I had noticed this before posting the initial patch and isolated one test that took ~45 secs to complete. The slow times were related to recomputing the bbox and high resolution ranges for every segment on very large distance queries. The new patch relaxes the distance criteria for the range resolution. There are still some boundary outliers that take ~15 seconds (instead of 60+) to complete. I can further improve this by optimizing the GeoPointDistanceQuery.computeBBox method... or, what are your thoughts on computing the BBox and reusing across segments?? I still think there's really a bug here I committed a failing test on the branch bleh...indeed!! Fixed it up.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1703062 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1703062 ]

          LUCENE-6780: fix test bug; fix bug in closestPoinOnBBox; add some nocommits

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1703062 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1703062 ] LUCENE-6780 : fix test bug; fix bug in closestPoinOnBBox; add some nocommits
          Hide
          mikemccand Michael McCandless added a comment -

          The test was buggy using maxLon where it expected minLon

          Duh! Thanks for fixing

          I committed the patch w/ some minor code style changes, but added some nocommits, e.g. I'm not sure how circleFullInside testing helps since it seems to always assert to true in that case.

          I'll beast the new test!

          Show
          mikemccand Michael McCandless added a comment - The test was buggy using maxLon where it expected minLon Duh! Thanks for fixing I committed the patch w/ some minor code style changes, but added some nocommits, e.g. I'm not sure how circleFullInside testing helps since it seems to always assert to true in that case. I'll beast the new test!
          Hide
          mikemccand Michael McCandless added a comment -

          The slow times were related to recomputing the bbox

          Wait: we only compute this (call GeoPointDistanceQuery.computeBBox) once for the query, on init, right?

          Show
          mikemccand Michael McCandless added a comment - The slow times were related to recomputing the bbox Wait: we only compute this (call GeoPointDistanceQuery.computeBBox ) once for the query, on init, right?
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1703281 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1703281 ]

          LUCENE-6780: let random radius be big

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1703281 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1703281 ] LUCENE-6780 : let random radius be big
          Hide
          mikemccand Michael McCandless added a comment -

          I was working on getting BKDDistanceQuery working again, but hit failures with a large radius, so I also fixed TestGeoPointQuery to use a large radius (just committed) and it causes test failures, e.g.:

          [junit4:pickseed] Seed property 'tests.seed' already defined: 651AF6100346C910
             [junit4] <JUnit4> says g'day! Master seed: 651AF6100346C910
             [junit4] Executing 1 suite with 1 JVM.
             [junit4] 
             [junit4] Started J0 PID(28673@localhost).
             [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
             [junit4]   1> T0: id=5 docID=5 lat=-30.70619555695309 lon=-119.07908745059306 deleted?=false expected=false but got true query=GeoPointDistanceQuery: field=geoField: Center: [119.51155579512795,-29.46313019984308] Distance: 9353340.676650032 meters]
             [junit4]   2> Zář 15, 2015 10:00:15 ODP. com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
             [junit4]   2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery]
             [junit4]   2> java.lang.AssertionError: wrong hit
             [junit4]   2> 	at __randomizedtesting.SeedInfo.seed([651AF6100346C910]:0)
             [junit4]   2> 	at org.junit.Assert.fail(Assert.java:93)
             [junit4]   2> 	at org.apache.lucene.search.TestGeoPointQuery$VerifyHits.test(TestGeoPointQuery.java:560)
             [junit4]   2> 	at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:501)
             [junit4]   2> 	at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:396)
             [junit4]   2> 
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=651AF6100346C910 -Dtests.multiplier=5 -Dtests.locale=cs -Dtests.timezone=Africa/Windhoek -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] ERROR   0.24s | TestGeoPointQuery.testRandomTiny <<<
             [junit4]    > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=14, name=T0, state=RUNNABLE, group=TGRP-TestGeoPointQuery]
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([651AF6100346C910:2C5D28565D67F1BC]:0)
             [junit4]    > Caused by: java.lang.AssertionError: wrong hit
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([651AF6100346C910]:0)
             [junit4]    > 	at org.apache.lucene.search.TestGeoPointQuery$VerifyHits.test(TestGeoPointQuery.java:560)
             [junit4]    > 	at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:501)
             [junit4]    > 	at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:396)
             [junit4]   2> NOTE: test params are: codec=Asserting(Lucene53): {id=PostingsFormat(name=LuceneVarGapFixedInterval), geoField=FST50}, docValues:{id=DocValuesFormat(name=Direct), geoField=DocValuesFormat(name=Lucene50)}, sim=RandomSimilarityProvider(queryNorm=false,coord=yes): {}, locale=cs, timezone=Africa/Windhoek
             [junit4]   2> NOTE: Linux 3.13.0-61-generic amd64/Oracle Corporation 1.8.0_40 (64-bit)/cpus=8,threads=1,free=383358232,total=504889344
             [junit4]   2> NOTE: All tests run in this JVM: [TestGeoPointQuery]
             [junit4] Completed [1/1] in 0.57s, 1 test, 1 error <<< FAILURES!
             [junit4] 
             [junit4] 
             [junit4] Tests with failures:
             [junit4]   - org.apache.lucene.search.TestGeoPointQuery.testRandomTiny
          

          A very large radius (such that it can span the entire earth) should work?

          I'll also commit how far I got with the BKDDistanceQuery to the branch ...

          Show
          mikemccand Michael McCandless added a comment - I was working on getting BKDDistanceQuery working again, but hit failures with a large radius, so I also fixed TestGeoPointQuery to use a large radius (just committed) and it causes test failures, e.g.: [junit4:pickseed] Seed property 'tests.seed' already defined: 651AF6100346C910 [junit4] <JUnit4> says g'day! Master seed: 651AF6100346C910 [junit4] Executing 1 suite with 1 JVM. [junit4] [junit4] Started J0 PID(28673@localhost). [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery [junit4] 1> T0: id=5 docID=5 lat=-30.70619555695309 lon=-119.07908745059306 deleted?=false expected=false but got true query=GeoPointDistanceQuery: field=geoField: Center: [119.51155579512795,-29.46313019984308] Distance: 9353340.676650032 meters] [junit4] 2> Zář 15, 2015 10:00:15 ODP. com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery] [junit4] 2> java.lang.AssertionError: wrong hit [junit4] 2> at __randomizedtesting.SeedInfo.seed([651AF6100346C910]:0) [junit4] 2> at org.junit.Assert.fail(Assert.java:93) [junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$VerifyHits.test(TestGeoPointQuery.java:560) [junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:501) [junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:396) [junit4] 2> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=651AF6100346C910 -Dtests.multiplier=5 -Dtests.locale=cs -Dtests.timezone=Africa/Windhoek -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 0.24s | TestGeoPointQuery.testRandomTiny <<< [junit4] > Throwable #1: com.carrotsearch.randomizedtesting.UncaughtExceptionError: Captured an uncaught exception in thread: Thread[id=14, name=T0, state=RUNNABLE, group=TGRP-TestGeoPointQuery] [junit4] > at __randomizedtesting.SeedInfo.seed([651AF6100346C910:2C5D28565D67F1BC]:0) [junit4] > Caused by: java.lang.AssertionError: wrong hit [junit4] > at __randomizedtesting.SeedInfo.seed([651AF6100346C910]:0) [junit4] > at org.apache.lucene.search.TestGeoPointQuery$VerifyHits.test(TestGeoPointQuery.java:560) [junit4] > at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:501) [junit4] > at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:396) [junit4] 2> NOTE: test params are: codec=Asserting(Lucene53): {id=PostingsFormat(name=LuceneVarGapFixedInterval), geoField=FST50}, docValues:{id=DocValuesFormat(name=Direct), geoField=DocValuesFormat(name=Lucene50)}, sim=RandomSimilarityProvider(queryNorm=false,coord=yes): {}, locale=cs, timezone=Africa/Windhoek [junit4] 2> NOTE: Linux 3.13.0-61-generic amd64/Oracle Corporation 1.8.0_40 (64-bit)/cpus=8,threads=1,free=383358232,total=504889344 [junit4] 2> NOTE: All tests run in this JVM: [TestGeoPointQuery] [junit4] Completed [1/1] in 0.57s, 1 test, 1 error <<< FAILURES! [junit4] [junit4] [junit4] Tests with failures: [junit4] - org.apache.lucene.search.TestGeoPointQuery.testRandomTiny A very large radius (such that it can span the entire earth) should work? I'll also commit how far I got with the BKDDistanceQuery to the branch ...
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1703282 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1703282 ]

          LUCENE-6698, LUCENE-6780: add BKDDistanceQuery

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1703282 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1703282 ] LUCENE-6698 , LUCENE-6780 : add BKDDistanceQuery
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1703791 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1703791 ]

          LUCENE-6780: BKDDistanceQuery must take min/maxLon/Lat into account in hashCode and equals else we get illegal collisions in caching

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1703791 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1703791 ] LUCENE-6780 : BKDDistanceQuery must take min/maxLon/Lat into account in hashCode and equals else we get illegal collisions in caching
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1703792 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1703792 ]

          LUCENE-6780: make tests more evil

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1703792 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1703792 ] LUCENE-6780 : make tests more evil
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1703793 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1703793 ]

          LUCENE-6780: code style / nocommits

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1703793 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1703793 ] LUCENE-6780 : code style / nocommits
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1703794 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1703794 ]

          LUCENE-6780: remove log line

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1703794 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1703794 ] LUCENE-6780 : remove log line
          Hide
          mikemccand Michael McCandless added a comment -

          I'm still struggling to get BKDDistanceQuery working here, even with the good GeoUtil fixes on this branch.

          So I switched back to making the GeoUtilsTest more evil:

          First, I noticed that TestGeoUtils was always using a tiny lat/lon area (1-3 degrees) and fixed that to sometimes be the full range, and then I improved the "self consistency" test (TestGeoUtils.testGeoRelations) to stress the GeoUtils APIs like BKD does, and now there are some fun failures, e.g.:

             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoUtils -Dtests.method=testGeoRelations -Dtests.seed=48F4A913CBA1095 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es_NI -Dtests.timezone=America/Argentina/Salta -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] FAILURE 0.14s | TestGeoUtils.testGeoRelations <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: 243 incorrect hits (see above)
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([48F4A913CBA1095:C6AC5E24484D662B]:0)
             [junit4]    > 	at org.apache.lucene.util.TestGeoUtils.testGeoRelations(TestGeoUtils.java:509)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
          

          If you run with -Dtests.verbose=true you see lots of diagnostics explaining which GeoUtils API went wrong... in this case, GeoUtils.rectWithinCircle returned true when it should have returned false. Now I need Nicholas Knize's help!

          Show
          mikemccand Michael McCandless added a comment - I'm still struggling to get BKDDistanceQuery working here, even with the good GeoUtil fixes on this branch. So I switched back to making the GeoUtilsTest more evil: First, I noticed that TestGeoUtils was always using a tiny lat/lon area (1-3 degrees) and fixed that to sometimes be the full range, and then I improved the "self consistency" test ( TestGeoUtils.testGeoRelations ) to stress the GeoUtils APIs like BKD does, and now there are some fun failures, e.g.: [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoUtils -Dtests.method=testGeoRelations -Dtests.seed=48F4A913CBA1095 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es_NI -Dtests.timezone=America/Argentina/Salta -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 0.14s | TestGeoUtils.testGeoRelations <<< [junit4] > Throwable #1: java.lang.AssertionError: 243 incorrect hits (see above) [junit4] > at __randomizedtesting.SeedInfo.seed([48F4A913CBA1095:C6AC5E24484D662B]:0) [junit4] > at org.apache.lucene.util.TestGeoUtils.testGeoRelations(TestGeoUtils.java:509) [junit4] > at java.lang.Thread.run(Thread.java:745) If you run with -Dtests.verbose=true you see lots of diagnostics explaining which GeoUtils API went wrong... in this case, GeoUtils.rectWithinCircle returned true when it should have returned false. Now I need Nicholas Knize 's help!
          Hide
          mikemccand Michael McCandless added a comment -

          Also, the failures all seem related to using a big lon/lat area: if I hardwire the useSmallRanges to true, the test seems to pass for quite a while under beasting ...

          Show
          mikemccand Michael McCandless added a comment - Also, the failures all seem related to using a big lon/lat area: if I hardwire the useSmallRanges to true , the test seems to pass for quite a while under beasting ...
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1703876 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1703876 ]

          LUCENE-6780: remove dup code, cleanup nocommits

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1703876 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1703876 ] LUCENE-6780 : remove dup code, cleanup nocommits
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1703879 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1703879 ]

          LUCENE-6780: move GeoBBox -> GeoRect; turn off BKD verbosity

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1703879 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1703879 ] LUCENE-6780 : move GeoBBox -> GeoRect; turn off BKD verbosity
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1703881 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1703881 ]

          LUCENE-6780: merge trunk

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1703881 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1703881 ] LUCENE-6780 : merge trunk
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1704496 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1704496 ]

          LUCENE-6780: factor out base geo point test class

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1704496 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1704496 ] LUCENE-6780 : factor out base geo point test class
          Hide
          mikemccand Michael McCandless added a comment -

          I factored out a common test base class, for TestBKDTree and TestGeoPointQuery, and in the process found a test bug in TestGeoPointQuery.bboxQueryCanBeWrong where it was being way too lenient in returning true.

          I fixed that to instead check whether GeoUtils.compare returns 0.0 for any of the rect's boundaries, but now there are new test failures, e.g.:

             [junit4] Started J0 PID(11346@localhost).
             [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
             [junit4]   1> TEST-TestGeoPointQuery.testMultiValued-seed#[880EAE9C08D4DB54]: id=4591 docID=4591 should not match but did
             [junit4]   1>   rect=GeoRect(lon=-153.18749899012866 TO -152.61749242472143 lat=33.44312055800268 TO 33.95438073558091)
             [junit4]   1>   lat=32.553960682161375 lon=-154.2675789624302
             [junit4]   1>   lat=33.95438025524065 lon=-152.86846841320704
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testMultiValued -Dtests.seed=880EAE9C08D4DB54 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=nl_BE -Dtests.timezone=Europe/Tallinn -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] FAILURE 10.7s | TestGeoPointQuery.testMultiValued <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: some hits were wrong
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([880EAE9C08D4DB54:5C2ECAAEC6169B1C]:0)
             [junit4]    > 	at org.apache.lucene.util.BaseGeoPointTestCase.testMultiValued(BaseGeoPointTestCase.java:280)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
             [junit4]   2> NOTE: test params are: codec=Asserting(Lucene53): {id=PostingsFormat(name=LuceneVarGapFixedInterval), point=PostingsFormat(name=Asserting)}, docValues:{point=DocValuesFormat(name=Asserting)}, sim=ClassicSimilarity, locale=nl_BE, timezone=Europe/Tallinn
             [junit4]   2> NOTE: Linux 3.13.0-61-generic amd64/Oracle Corporation 1.8.0_40 (64-bit)/cpus=8,threads=1,free=366930072,total=501219328
             [junit4]   2> NOTE: All tests run in this JVM: [TestGeoPointQuery]
             [junit4] Completed [1/1] in 11.10s, 1 test, 1 failure <<< FAILURES!
          

          The failure is clearly a boundary case, so maybe we just need a better "is boundary case" check?

          It's also entirely possible I created new exciting test bugs in the refactoring

          And we still have the test failures for both BKD and GeoPoint distance queries when distance is biggish...

          Show
          mikemccand Michael McCandless added a comment - I factored out a common test base class, for TestBKDTree and TestGeoPointQuery, and in the process found a test bug in TestGeoPointQuery.bboxQueryCanBeWrong where it was being way too lenient in returning true . I fixed that to instead check whether GeoUtils.compare returns 0.0 for any of the rect's boundaries, but now there are new test failures, e.g.: [junit4] Started J0 PID(11346@localhost). [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery [junit4] 1> TEST-TestGeoPointQuery.testMultiValued-seed#[880EAE9C08D4DB54]: id=4591 docID=4591 should not match but did [junit4] 1> rect=GeoRect(lon=-153.18749899012866 TO -152.61749242472143 lat=33.44312055800268 TO 33.95438073558091) [junit4] 1> lat=32.553960682161375 lon=-154.2675789624302 [junit4] 1> lat=33.95438025524065 lon=-152.86846841320704 [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testMultiValued -Dtests.seed=880EAE9C08D4DB54 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=nl_BE -Dtests.timezone=Europe/Tallinn -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 10.7s | TestGeoPointQuery.testMultiValued <<< [junit4] > Throwable #1: java.lang.AssertionError: some hits were wrong [junit4] > at __randomizedtesting.SeedInfo.seed([880EAE9C08D4DB54:5C2ECAAEC6169B1C]:0) [junit4] > at org.apache.lucene.util.BaseGeoPointTestCase.testMultiValued(BaseGeoPointTestCase.java:280) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: test params are: codec=Asserting(Lucene53): {id=PostingsFormat(name=LuceneVarGapFixedInterval), point=PostingsFormat(name=Asserting)}, docValues:{point=DocValuesFormat(name=Asserting)}, sim=ClassicSimilarity, locale=nl_BE, timezone=Europe/Tallinn [junit4] 2> NOTE: Linux 3.13.0-61-generic amd64/Oracle Corporation 1.8.0_40 (64-bit)/cpus=8,threads=1,free=366930072,total=501219328 [junit4] 2> NOTE: All tests run in this JVM: [TestGeoPointQuery] [junit4] Completed [1/1] in 11.10s, 1 test, 1 failure <<< FAILURES! The failure is clearly a boundary case, so maybe we just need a better "is boundary case" check? It's also entirely possible I created new exciting test bugs in the refactoring And we still have the test failures for both BKD and GeoPoint distance queries when distance is biggish...
          Hide
          mikemccand Michael McCandless added a comment -

          I was concerned about the peak heap used for "worst case" queries here, so I hacked up a quick patch (attached) to test this:

          GeoPointInBBoxQuery: field=point: Lower Left: [-170.79577068315723,-88.3524701239041] Upper Right: [115.75692731020496,51.78004322487766]
            --> 940,015 terms = 34,065,696 bytes
          
          GeoPointDistanceQuery: field=point: Center: [-95.87683480747508,-83.99672364681616] Distance: 826889.911703281 meters]
            --> 179,562 terms = 12,446,344 bytes
          

          The patch just records over time the largest number of terms created by the query, and then I ran the test for many iterations.

          I think this is too high, e.g. too many of these queries in flight at once can mean an unexpected OOME.

          But I think before we address this we should address the correctness issues (the failing seeds for TestGeoUtils.testGeoRelations).

          It could be that to fix these, we place soft limits on how large each query is allowed to be? Meaning, a user who's willing to have more error, willing to use more heap, can increase the limit if they want, but by default the limit protects the more common use case with smaller shapes.

          Show
          mikemccand Michael McCandless added a comment - I was concerned about the peak heap used for "worst case" queries here, so I hacked up a quick patch (attached) to test this: GeoPointInBBoxQuery: field=point: Lower Left: [-170.79577068315723,-88.3524701239041] Upper Right: [115.75692731020496,51.78004322487766] --> 940,015 terms = 34,065,696 bytes GeoPointDistanceQuery: field=point: Center: [-95.87683480747508,-83.99672364681616] Distance: 826889.911703281 meters] --> 179,562 terms = 12,446,344 bytes The patch just records over time the largest number of terms created by the query, and then I ran the test for many iterations. I think this is too high, e.g. too many of these queries in flight at once can mean an unexpected OOME. But I think before we address this we should address the correctness issues (the failing seeds for TestGeoUtils.testGeoRelations ). It could be that to fix these, we place soft limits on how large each query is allowed to be? Meaning, a user who's willing to have more error, willing to use more heap, can increase the limit if they want, but by default the limit protects the more common use case with smaller shapes.
          Hide
          nknize Nicholas Knize added a comment -

          The failure is clearly a boundary case, so maybe we just need a better "is boundary case" check?

          Definitely the situation with this failure. The new GeoUtils.compare method was returning null for these boundary points so the test was always setting expected = false. I've changed it to ignore those boundary test points that are within the accepted error tolerance.

          Show
          nknize Nicholas Knize added a comment - The failure is clearly a boundary case, so maybe we just need a better "is boundary case" check? Definitely the situation with this failure. The new GeoUtils.compare method was returning null for these boundary points so the test was always setting expected = false. I've changed it to ignore those boundary test points that are within the accepted error tolerance.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1709065 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1709065 ]

          LUCENE-6780: fix a couple test bugs, remove nocommits, improve javadocs, cut all tests back to small==true

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1709065 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1709065 ] LUCENE-6780 : fix a couple test bugs, remove nocommits, improve javadocs, cut all tests back to small==true
          Hide
          mikemccand Michael McCandless added a comment -

          OK I found a couple test bugs, which explained some of the failures.

          I also cut all "small" booleans back to true always, so we are only testing a region b/w 1-3 degrees in lat and lon.

          There is still a lurking boundary case failure, where a doc's lat/lon is on the edge of the bbox rect we query.

          And then things are still sometimes very very slow, even when restricting queries to small shapes, e.g. when I run ant test -Dtestcase=TestGeoPointQuery -Dtests.seed=0, I get these times:

          -test:
          [junit4:pickseed] Seed property 'tests.seed' already defined: 0
             [junit4] <JUnit4> says hello! Master seed: 0
             [junit4] Executing 1 suite with 1 JVM.
             [junit4] 
             [junit4] Started J0 PID(30211@localhost).
             [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
             [junit4] OK      0.01s | TestGeoPointQuery.testInvalidGeoDistanceQuery
             [junit4] OK      0.02s | TestGeoPointQuery.testBBoxQuery
             [junit4] OK      9.10s | TestGeoPointQuery.testMultiValued
             [junit4] OK      2.07s | TestGeoPointQuery.testRandomMedium
             [junit4] OK      0.01s | TestGeoPointQuery.testGeoDistanceQueryHuge
             [junit4] OK      1.05s | TestGeoPointQuery.testRandomTiny
             [junit4] OK      0.27s | TestGeoPointQuery.testWholeMap
             [junit4] OK      0.00s | TestGeoPointQuery.testRectCrossesCircle
             [junit4] OK      2.48s | TestGeoPointQuery.testAllLonEqual
             [junit4] OK      0.00s | TestGeoPointQuery.testMortonEncoding
             [junit4] OK      13.2s | TestGeoPointQuery.testSamePointManyTimes
             [junit4] OK      0.01s | TestGeoPointQuery.testGeoDistanceQueryCrossDateline
             [junit4] IGNOR/A 0.03s | TestGeoPointQuery.testRandomBig
             [junit4]    > Assumption #1: 'nightly' test group is disabled (@Nightly())
             [junit4] OK      0.00s | TestGeoPointQuery.testPacManPolyQuery
             [junit4] OK      0.01s | TestGeoPointQuery.testMultiValuedQuery
             [junit4] OK      0.00s | TestGeoPointQuery.testPolyQuery
             [junit4] OK      0.00s | TestGeoPointQuery.testGeoDistanceQuery
             [junit4] HEARTBEAT J0 PID(30211@localhost): 2015-10-16T15:07:46, stalled for 66.9s at: TestGeoPointQuery.testAllLatEqual
             [junit4] OK      78.3s | TestGeoPointQuery.testAllLatEqual
             [junit4] OK      0.00s | TestGeoPointQuery.testInvalidBBox
             [junit4] OK      0.00s | TestGeoPointQuery.testBBoxCrossDateline
             [junit4] Completed [1/1] in 106.94s, 20 tests, 1 skipped
             [junit4] 
             [junit4] JVM J0:     0.36 ..   107.82 =   107.47s
             [junit4] Execution time total: 1 minute 47 seconds
             [junit4] Tests summary: 1 suite, 20 tests, 1 ignored (1 assumption)
               [echo] 5 slowest tests:
          

          Why is testAllLatEqual so slow? This test case came from BKD's test (since we refactored to a common base test class) ... I just don't understand why it's so slow for geo point query.

          Show
          mikemccand Michael McCandless added a comment - OK I found a couple test bugs, which explained some of the failures. I also cut all "small" booleans back to true always, so we are only testing a region b/w 1-3 degrees in lat and lon. There is still a lurking boundary case failure, where a doc's lat/lon is on the edge of the bbox rect we query. And then things are still sometimes very very slow, even when restricting queries to small shapes, e.g. when I run ant test -Dtestcase=TestGeoPointQuery -Dtests.seed=0 , I get these times: -test: [junit4:pickseed] Seed property 'tests.seed' already defined: 0 [junit4] <JUnit4> says hello! Master seed: 0 [junit4] Executing 1 suite with 1 JVM. [junit4] [junit4] Started J0 PID(30211@localhost). [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery [junit4] OK 0.01s | TestGeoPointQuery.testInvalidGeoDistanceQuery [junit4] OK 0.02s | TestGeoPointQuery.testBBoxQuery [junit4] OK 9.10s | TestGeoPointQuery.testMultiValued [junit4] OK 2.07s | TestGeoPointQuery.testRandomMedium [junit4] OK 0.01s | TestGeoPointQuery.testGeoDistanceQueryHuge [junit4] OK 1.05s | TestGeoPointQuery.testRandomTiny [junit4] OK 0.27s | TestGeoPointQuery.testWholeMap [junit4] OK 0.00s | TestGeoPointQuery.testRectCrossesCircle [junit4] OK 2.48s | TestGeoPointQuery.testAllLonEqual [junit4] OK 0.00s | TestGeoPointQuery.testMortonEncoding [junit4] OK 13.2s | TestGeoPointQuery.testSamePointManyTimes [junit4] OK 0.01s | TestGeoPointQuery.testGeoDistanceQueryCrossDateline [junit4] IGNOR/A 0.03s | TestGeoPointQuery.testRandomBig [junit4] > Assumption #1: 'nightly' test group is disabled (@Nightly()) [junit4] OK 0.00s | TestGeoPointQuery.testPacManPolyQuery [junit4] OK 0.01s | TestGeoPointQuery.testMultiValuedQuery [junit4] OK 0.00s | TestGeoPointQuery.testPolyQuery [junit4] OK 0.00s | TestGeoPointQuery.testGeoDistanceQuery [junit4] HEARTBEAT J0 PID(30211@localhost): 2015-10-16T15:07:46, stalled for 66.9s at: TestGeoPointQuery.testAllLatEqual [junit4] OK 78.3s | TestGeoPointQuery.testAllLatEqual [junit4] OK 0.00s | TestGeoPointQuery.testInvalidBBox [junit4] OK 0.00s | TestGeoPointQuery.testBBoxCrossDateline [junit4] Completed [1/1] in 106.94s, 20 tests, 1 skipped [junit4] [junit4] JVM J0: 0.36 .. 107.82 = 107.47s [junit4] Execution time total: 1 minute 47 seconds [junit4] Tests summary: 1 suite, 20 tests, 1 ignored (1 assumption) [echo] 5 slowest tests: Why is testAllLatEqual so slow? This test case came from BKD's test (since we refactored to a common base test class) ... I just don't understand why it's so slow for geo point query.
          Hide
          nknize Nicholas Knize added a comment -

          Next WIP patch...

          This patch includes the following:

          1. Throws an IllegalArgumentException for PointDistanceQuery when the radius exceeds the maximum allowable distance (based on location on the earth)
          2. Reduces resolution for extreme GeoPointInBBoxQuery cases (based on diagonal distance of the bbox).

          Performance is much better (need to quantify through benchmarks) but there are still a few Test bugs that need to be ironed out (PointInPolygon tolerance, and catching illegal radius values for PointDistanceQuery)

          Show
          nknize Nicholas Knize added a comment - Next WIP patch... This patch includes the following: 1. Throws an IllegalArgumentException for PointDistanceQuery when the radius exceeds the maximum allowable distance (based on location on the earth) 2. Reduces resolution for extreme GeoPointInBBoxQuery cases (based on diagonal distance of the bbox). Performance is much better (need to quantify through benchmarks) but there are still a few Test bugs that need to be ironed out (PointInPolygon tolerance, and catching illegal radius values for PointDistanceQuery)
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1709253 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1709253 ]

          LUCENE-6780: don't allow too-large radius in distance query (depending on origin); reduce resolution for large bbox

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1709253 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1709253 ] LUCENE-6780 : don't allow too-large radius in distance query (depending on origin); reduce resolution for large bbox
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1709257 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1709257 ]

          LUCENE-6780: fix bug in midLat/midLon; don't test dateline crossing when small == true

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1709257 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1709257 ] LUCENE-6780 : fix bug in midLat/midLon; don't test dateline crossing when small == true
          Hide
          mikemccand Michael McCandless added a comment -

          Thanks Nicholas Knize, I committed the patch, but put a nocommit on the AwaitsFix (we plan to fix that before landing this right?).

          I also found a bug in how midLat/Lon was computed, that was causing us to almost always use shiftFactor=5 even when the bbox was smallish ... I fix that, and also fixed the test not to do dateline crossing when small==true.

          However ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testAllLatEqual -Dtests.seed=0 is still quite slow (~5.6 seconds on my fastish haswell box), even though it's always using a small bbox ....

          Show
          mikemccand Michael McCandless added a comment - Thanks Nicholas Knize , I committed the patch, but put a nocommit on the AwaitsFix (we plan to fix that before landing this right?). I also found a bug in how midLat/Lon was computed, that was causing us to almost always use shiftFactor=5 even when the bbox was smallish ... I fix that, and also fixed the test not to do dateline crossing when small==true. However ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testAllLatEqual -Dtests.seed=0 is still quite slow (~5.6 seconds on my fastish haswell box), even though it's always using a small bbox ....
          Hide
          nknize Nicholas Knize added a comment -

          Updated patch.

          • Adds Nightly to a few of the tests that still seem to run slow during range creation (particularly on PointInPolygon queries). IMHO this shouldn't preclude the fixes and improvements added by this patch. Will open separate issues to a. investigate performance of range/polygon intersection (probably needs to be handled under the dateline crossing feature) and b. create ranges on-demand instead of using transient memory (improve overall performance)
          • Added consistency to rectContainsPoint to use same method as GeoUtils.bboxContains
          Show
          nknize Nicholas Knize added a comment - Updated patch. Adds Nightly to a few of the tests that still seem to run slow during range creation (particularly on PointInPolygon queries). IMHO this shouldn't preclude the fixes and improvements added by this patch. Will open separate issues to a. investigate performance of range/polygon intersection (probably needs to be handled under the dateline crossing feature) and b. create ranges on-demand instead of using transient memory (improve overall performance) Added consistency to rectContainsPoint to use same method as GeoUtils.bboxContains
          Hide
          nknize Nicholas Knize added a comment -

          git svn is behind by 3 days... here's the patch against the latest svn repo.

          Show
          nknize Nicholas Knize added a comment - git svn is behind by 3 days... here's the patch against the latest svn repo.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1709483 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1709483 ]

          LUCENE-6780: mark slow tests nightly, turn back on some big bbox/distances in tests, use GeoUtils.bboxContains

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1709483 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1709483 ] LUCENE-6780 : mark slow tests nightly, turn back on some big bbox/distances in tests, use GeoUtils.bboxContains
          Hide
          mikemccand Michael McCandless added a comment -

          git svn is behind by 3 days.

          Not good!

          Thanks Nicholas Knize, I committed the last patch, but I hit this test failure in the new "geo API self-consistency" test:

             [junit4] Started J0 PID(17316@localhost).
             [junit4] Suite: org.apache.lucene.util.TestGeoUtils
             [junit4]   1> doc=113 did not match but should
             [junit4]   1>   lon=19.101526554407 lat=-85.2135864567328 distanceMeters=220459.12109877667 vs radiusMeters=220827.9418903528
             [junit4]   1> doc=814 did not match but should
             [junit4]   1>   lon=18.946167779153978 lat=-85.21667574378826 distanceMeters=220122.387740399 vs radiusMeters=220827.9418903528
             [junit4]   1> doc=1340 did not match but should
             [junit4]   1>   lon=19.231482133213458 lat=-85.21195787282232 distanceMeters=220639.0006665368 vs radiusMeters=220827.9418903528
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoUtils -Dtests.method=testGeoRelations -Dtests.seed=A1A8548928708420 -Dtests.locale=en_PH -Dtests.timezone=Africa/Mogadishu -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] FAILURE 0.09s | TestGeoUtils.testGeoRelations <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: 3 incorrect hits (see above)
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([A1A8548928708420:638B403C5C87F29E]:0)
             [junit4]    > 	at org.apache.lucene.util.TestGeoUtils.testGeoRelations(TestGeoUtils.java:506)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
             [junit4]   2> NOTE: test params are: codec=FastCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST, chunkSize=25153, maxDocsPerChunk=7, blockSize=932), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST, chunkSize=25153, blockSize=932)), sim=ClassicSimilarity, locale=en_PH, timezone=Africa/Mogadishu
             [junit4]   2> NOTE: Linux 3.13.0-61-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=422996528,total=504889344
             [junit4]   2> NOTE: All tests run in this JVM: [TestGeoUtils]
             [junit4] Completed [1/1] in 0.24s, 1 test, 1 failure <<< FAILURES!
          

          Seems like boundary cases because the actual distance vs query distance are pretty close?

          Show
          mikemccand Michael McCandless added a comment - git svn is behind by 3 days. Not good! Thanks Nicholas Knize , I committed the last patch, but I hit this test failure in the new "geo API self-consistency" test: [junit4] Started J0 PID(17316@localhost). [junit4] Suite: org.apache.lucene.util.TestGeoUtils [junit4] 1> doc=113 did not match but should [junit4] 1> lon=19.101526554407 lat=-85.2135864567328 distanceMeters=220459.12109877667 vs radiusMeters=220827.9418903528 [junit4] 1> doc=814 did not match but should [junit4] 1> lon=18.946167779153978 lat=-85.21667574378826 distanceMeters=220122.387740399 vs radiusMeters=220827.9418903528 [junit4] 1> doc=1340 did not match but should [junit4] 1> lon=19.231482133213458 lat=-85.21195787282232 distanceMeters=220639.0006665368 vs radiusMeters=220827.9418903528 [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoUtils -Dtests.method=testGeoRelations -Dtests.seed=A1A8548928708420 -Dtests.locale=en_PH -Dtests.timezone=Africa/Mogadishu -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 0.09s | TestGeoUtils.testGeoRelations <<< [junit4] > Throwable #1: java.lang.AssertionError: 3 incorrect hits (see above) [junit4] > at __randomizedtesting.SeedInfo.seed([A1A8548928708420:638B403C5C87F29E]:0) [junit4] > at org.apache.lucene.util.TestGeoUtils.testGeoRelations(TestGeoUtils.java:506) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: test params are: codec=FastCompressingStoredFields(storedFieldsFormat=CompressingStoredFieldsFormat(compressionMode=FAST, chunkSize=25153, maxDocsPerChunk=7, blockSize=932), termVectorsFormat=CompressingTermVectorsFormat(compressionMode=FAST, chunkSize=25153, blockSize=932)), sim=ClassicSimilarity, locale=en_PH, timezone=Africa/Mogadishu [junit4] 2> NOTE: Linux 3.13.0-61-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=422996528,total=504889344 [junit4] 2> NOTE: All tests run in this JVM: [TestGeoUtils] [junit4] Completed [1/1] in 0.24s, 1 test, 1 failure <<< FAILURES! Seems like boundary cases because the actual distance vs query distance are pretty close?
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1709485 from Michael McCandless in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1709485 ]

          LUCENE-6780: disable test until we can fix pole crossing correctly

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1709485 from Michael McCandless in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1709485 ] LUCENE-6780 : disable test until we can fix pole crossing correctly
          Hide
          mikemccand Michael McCandless added a comment -

          OK I disabled the new test case until we can fix pole crossing.

          But I hit a new failure:

          ....I.I.......T2: id=2929 should not match but did
            small=true query=GeoPointInBBoxQuery: field=point: Lower Left: [-96.30716303968933,35.304955180599926] Upper Right: [-95.78797371671509,35.55537284046483] docID=2929
            lat=35.30495325717168 lon=-95.95683785493188
            deleted?=false
          okt 19, 2015 5:00:44 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
          WARNING: Uncaught exception in thread: Thread[T2,5,TGRP-TestGeoPointQuery]
          java.lang.AssertionError: some hits were wrong
          	at __randomizedtesting.SeedInfo.seed([F82368891A1F01C9]:0)
          	at org.junit.Assert.fail(Assert.java:93)
          	at org.apache.lucene.util.BaseGeoPointTestCase$VerifyHits.test(BaseGeoPointTestCase.java:509)
          	at org.apache.lucene.util.BaseGeoPointTestCase$2._run(BaseGeoPointTestCase.java:697)
          	at org.apache.lucene.util.BaseGeoPointTestCase$2.run(BaseGeoPointTestCase.java:580)
          
          ENOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomMedium -Dtests.seed=F82368891A1F01C9 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=sk_SK -Dtests.timezone=America/Knox_IN -Dtests.asserts=true -Dtests.file.encoding=UTF-8
          ....I.....NOTE: test params are: codec=Asserting(Lucene53): {id=PostingsFormat(name=Memory doPackFST= true), point=PostingsFormat(name=Asserting)}, docValues:{id=DocValuesFormat(name=Lucene50), point=DocValuesFormat(name=Lucene50)}, sim=ClassicSimilarity, locale=sk_SK, timezone=America/Knox_IN
          NOTE: Linux 3.13.0-61-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=488560232,total=514850816
          NOTE: All tests run in this JVM: [TestGeoPointQuery]
          

          Looks like a boundary case ...

          Show
          mikemccand Michael McCandless added a comment - OK I disabled the new test case until we can fix pole crossing. But I hit a new failure: ....I.I.......T2: id=2929 should not match but did small=true query=GeoPointInBBoxQuery: field=point: Lower Left: [-96.30716303968933,35.304955180599926] Upper Right: [-95.78797371671509,35.55537284046483] docID=2929 lat=35.30495325717168 lon=-95.95683785493188 deleted?=false okt 19, 2015 5:00:44 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException WARNING: Uncaught exception in thread: Thread[T2,5,TGRP-TestGeoPointQuery] java.lang.AssertionError: some hits were wrong at __randomizedtesting.SeedInfo.seed([F82368891A1F01C9]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.lucene.util.BaseGeoPointTestCase$VerifyHits.test(BaseGeoPointTestCase.java:509) at org.apache.lucene.util.BaseGeoPointTestCase$2._run(BaseGeoPointTestCase.java:697) at org.apache.lucene.util.BaseGeoPointTestCase$2.run(BaseGeoPointTestCase.java:580) ENOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomMedium -Dtests.seed=F82368891A1F01C9 -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=sk_SK -Dtests.timezone=America/Knox_IN -Dtests.asserts=true -Dtests.file.encoding=UTF-8 ....I.....NOTE: test params are: codec=Asserting(Lucene53): {id=PostingsFormat(name=Memory doPackFST= true), point=PostingsFormat(name=Asserting)}, docValues:{id=DocValuesFormat(name=Lucene50), point=DocValuesFormat(name=Lucene50)}, sim=ClassicSimilarity, locale=sk_SK, timezone=America/Knox_IN NOTE: Linux 3.13.0-61-generic amd64/Oracle Corporation 1.8.0_60 (64-bit)/cpus=8,threads=1,free=488560232,total=514850816 NOTE: All tests run in this JVM: [TestGeoPointQuery] Looks like a boundary case ...
          Hide
          nknize Nicholas Knize added a comment -

          Updated patch:

          • Quantizes random generated lat/lon values to hash scale
          • Reverts overly complicated compare method
          • Adds @Nightly to repeated adversarial tests
          • Passes all beast tests
          • Passes ant precommit

          I think this is ready and we can move on to fixing performance by removing some of these transient structures (e.g., rangeList)

          Show
          nknize Nicholas Knize added a comment - Updated patch: Quantizes random generated lat/lon values to hash scale Reverts overly complicated compare method Adds @Nightly to repeated adversarial tests Passes all beast tests Passes ant precommit I think this is ready and we can move on to fixing performance by removing some of these transient structures (e.g., rangeList)
          Hide
          mikemccand Michael McCandless added a comment -

          +1 to the last patch, thanks Nicholas Knize!

          Show
          mikemccand Michael McCandless added a comment - +1 to the last patch, thanks Nicholas Knize !
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1709892 from Nicholas Knize in branch 'dev/branches/lucene6780'
          [ https://svn.apache.org/r1709892 ]

          LUCENE-6780: fix quantization in random lat/lon generation, cut overly adversarial tests to @Nightly

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1709892 from Nicholas Knize in branch 'dev/branches/lucene6780' [ https://svn.apache.org/r1709892 ] LUCENE-6780 : fix quantization in random lat/lon generation, cut overly adversarial tests to @Nightly
          Hide
          nknize Nicholas Knize added a comment -

          Patch of lucene6780 feature branch against trunk. I think its ready to merge.

          Show
          nknize Nicholas Knize added a comment - Patch of lucene6780 feature branch against trunk. I think its ready to merge.
          Hide
          nknize Nicholas Knize added a comment -

          Updated patch to remove BKDDistanceQuery and make randomLat/Lon non-static

          Show
          nknize Nicholas Knize added a comment - Updated patch to remove BKDDistanceQuery and make randomLat/Lon non-static
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1709926 from Nicholas Knize in branch 'dev/trunk'
          [ https://svn.apache.org/r1709926 ]

          LUCENE-6780: Improves GeoPointDistanceQuery accuracy with large radius. Improves testing rigor to GeoPointField

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1709926 from Nicholas Knize in branch 'dev/trunk' [ https://svn.apache.org/r1709926 ] LUCENE-6780 : Improves GeoPointDistanceQuery accuracy with large radius. Improves testing rigor to GeoPointField
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1709927 from Nicholas Knize in branch 'dev/trunk'
          [ https://svn.apache.org/r1709927 ]

          LUCENE-6780: add missing classes

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1709927 from Nicholas Knize in branch 'dev/trunk' [ https://svn.apache.org/r1709927 ] LUCENE-6780 : add missing classes
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1709954 from shalin@apache.org in branch 'dev/trunk'
          [ https://svn.apache.org/r1709954 ]

          LUCENE-6780: Fix svn:eol-style to make precommit happy

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1709954 from shalin@apache.org in branch 'dev/trunk' [ https://svn.apache.org/r1709954 ] LUCENE-6780 : Fix svn:eol-style to make precommit happy
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1710027 from Nicholas Knize in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1710027 ]

          LUCENE-6780: Improves GeoPointDistanceQuery accuracy with large radius. Improves testing rigor to GeoPointField

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1710027 from Nicholas Knize in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1710027 ] LUCENE-6780 : Improves GeoPointDistanceQuery accuracy with large radius. Improves testing rigor to GeoPointField
          Hide
          mikemccand Michael McCandless added a comment -

          Nicholas Knize can this be resolved now?

          Show
          mikemccand Michael McCandless added a comment - Nicholas Knize can this be resolved now?
          Hide
          nknize Nicholas Knize added a comment -

          ++. Merged to trunk and 5.4

          Show
          nknize Nicholas Knize added a comment - ++. Merged to trunk and 5.4

            People

            • Assignee:
              Unassigned
              Reporter:
              mikemccand Michael McCandless
            • Votes:
              0 Vote for this issue
              Watchers:
              3 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development