Lucene - Core
  1. Lucene - Core
  2. LUCENE-6481

Improve GeoPointField type to only visit high precision boundary terms

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.3, 6.0
    • Component/s: core/index
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Current GeoPointField LUCENE-6450 computes a set of ranges along the space-filling curve that represent a provided bounding box. This determines which terms to visit in the terms dictionary and which to skip. This is suboptimal for large bounding boxes as we may end up visiting all terms (which could be quite large).

      This incremental improvement is to improve GeoPointField to only visit high precision terms in boundary ranges and use the postings list for ranges that are completely within the target bounding box.

      A separate improvement is to switch over to auto-prefix and build an Automaton representing the bounding box. That can be tracked in a separate issue.

      1. LUCENE-6481_WIP.patch
        37 kB
        Nicholas Knize
      2. LUCENE-6481.patch
        74 kB
        Nicholas Knize
      3. LUCENE-6481.patch
        30 kB
        Nicholas Knize
      4. LUCENE-6481.patch
        50 kB
        Nicholas Knize
      5. LUCENE-6481.patch
        49 kB
        Nicholas Knize
      6. LUCENE-6481.patch
        49 kB
        Nicholas Knize
      7. LUCENE-6481.patch
        50 kB
        Nicholas Knize
      8. LUCENE-6481.patch
        45 kB
        Nicholas Knize
      9. LUCENE-6481.patch
        46 kB
        Michael McCandless

        Issue Links

          Activity

          Hide
          Nicholas Knize added a comment -

          First cut WIP patch. LuceneUtil benchmark shows false negatives, though, so this is definitely not ready. So far I've been unable to reproduce the false negatives...I put it here for iterating improvements.

          GeoPointField

          Index Time: 640.24 sec
          Index Size: 4.4G
          Mean Query Time: 0.02 sec

          Show
          Nicholas Knize added a comment - First cut WIP patch. LuceneUtil benchmark shows false negatives, though, so this is definitely not ready. So far I've been unable to reproduce the false negatives...I put it here for iterating improvements. GeoPointField Index Time: 640.24 sec Index Size: 4.4G Mean Query Time: 0.02 sec
          Hide
          Michael McCandless added a comment -

          New patch, starting from Nicholas Knize's and then folding in the evilish random test I added for LUCENE-6477 ... maybe this can help debug why there are false negatives?

          E.g. with this patch when I run:

          ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandomTiny -Dtests.seed=F1E43F53709BFF82 -Dtests.verbose=true
          

          It fails with this:

             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=F1E43F53709BFF82 -Dtests.locale=en_US -Dtests.timezone=Africa/Lome -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] FAILURE 2.91s | TestGeoPointQuery.testRandomTiny <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: id=0 docID=0 lat=-27.180279388889545 lon=-167.14191331870592 expected true but got: false deleted?=false
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([F1E43F53709BFF82:B8A3E1152EBAC72E]:0)
             [junit4]    > 	at org.apache.lucene.search.TestGeoPointQuery.verify(TestGeoPointQuery.java:301)
             [junit4]    > 	at org.apache.lucene.search.TestGeoPointQuery.doTestRandom(TestGeoPointQuery.java:203)
             [junit4]    > 	at org.apache.lucene.search.TestGeoPointQuery.testRandomTiny(TestGeoPointQuery.java:125)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
          

          The test case should be easy-ish to debug: it only indexes at most a few 10s of points...

          Show
          Michael McCandless added a comment - New patch, starting from Nicholas Knize 's and then folding in the evilish random test I added for LUCENE-6477 ... maybe this can help debug why there are false negatives? E.g. with this patch when I run: ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandomTiny -Dtests.seed=F1E43F53709BFF82 -Dtests.verbose=true It fails with this: [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=F1E43F53709BFF82 -Dtests.locale=en_US -Dtests.timezone=Africa/Lome -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 2.91s | TestGeoPointQuery.testRandomTiny <<< [junit4] > Throwable #1: java.lang.AssertionError: id=0 docID=0 lat=-27.180279388889545 lon=-167.14191331870592 expected true but got: false deleted?=false [junit4] > at __randomizedtesting.SeedInfo.seed([F1E43F53709BFF82:B8A3E1152EBAC72E]:0) [junit4] > at org.apache.lucene.search.TestGeoPointQuery.verify(TestGeoPointQuery.java:301) [junit4] > at org.apache.lucene.search.TestGeoPointQuery.doTestRandom(TestGeoPointQuery.java:203) [junit4] > at org.apache.lucene.search.TestGeoPointQuery.testRandomTiny(TestGeoPointQuery.java:125) [junit4] > at java.lang.Thread.run(Thread.java:745) The test case should be easy-ish to debug: it only indexes at most a few 10s of points...
          Hide
          Nicholas Knize added a comment -

          The test had the lat and lon ordering incorrect for both GeoPointFieldType and the GeoPointInBBoxQuery. I've attached a new patch with the correction.

          testRandomTiny passes but there is one failure in testRandom with the following:

          ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandom -Dtests.seed=F1E43F53709BFF82 -Dtests.verbose=true
          
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testRandom -Dtests.seed=F1E43F53709BFF82 -Dtests.slow=true -Dtests.locale=en_US -Dtests.timezone=Africa/Lome -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] FAILURE 1.54s | TestGeoPointQuery.testRandom <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: id=632 docID=613 lat=46.19240875459866 lon=143.92476891121902 expected true but got: false deleted?=false
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([F1E43F53709BFF82:83A81A5CC1FB49F1]:0)
             [junit4]    > 	at org.apache.lucene.search.TestGeoPointQuery.verify(TestGeoPointQuery.java:302)
             [junit4]    > 	at org.apache.lucene.search.TestGeoPointQuery.doTestRandom(TestGeoPointQuery.java:204)
             [junit4]    > 	at org.apache.lucene.search.TestGeoPointQuery.testRandom(TestGeoPointQuery.java:130)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
          

          This should be enough to debug the issue. I expect to have a new patch sometime tomorrow or before weeks end.

          Show
          Nicholas Knize added a comment - The test had the lat and lon ordering incorrect for both GeoPointFieldType and the GeoPointInBBoxQuery. I've attached a new patch with the correction. testRandomTiny passes but there is one failure in testRandom with the following: ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandom -Dtests.seed=F1E43F53709BFF82 -Dtests.verbose=true [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandom -Dtests.seed=F1E43F53709BFF82 -Dtests.slow=true -Dtests.locale=en_US -Dtests.timezone=Africa/Lome -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 1.54s | TestGeoPointQuery.testRandom <<< [junit4] > Throwable #1: java.lang.AssertionError: id=632 docID=613 lat=46.19240875459866 lon=143.92476891121902 expected true but got: false deleted?=false [junit4] > at __randomizedtesting.SeedInfo.seed([F1E43F53709BFF82:83A81A5CC1FB49F1]:0) [junit4] > at org.apache.lucene.search.TestGeoPointQuery.verify(TestGeoPointQuery.java:302) [junit4] > at org.apache.lucene.search.TestGeoPointQuery.doTestRandom(TestGeoPointQuery.java:204) [junit4] > at org.apache.lucene.search.TestGeoPointQuery.testRandom(TestGeoPointQuery.java:130) [junit4] > at java.lang.Thread.run(Thread.java:745) This should be enough to debug the issue. I expect to have a new patch sometime tomorrow or before weeks end.
          Hide
          Michael McCandless added a comment -

          The test had the lat and lon ordering incorrect for both GeoPointFieldType and the GeoPointInBBoxQuery.

          Oh, woops Thanks for fixing1

          Show
          Michael McCandless added a comment - The test had the lat and lon ordering incorrect for both GeoPointFieldType and the GeoPointInBBoxQuery. Oh, woops Thanks for fixing1
          Hide
          Nicholas Knize added a comment -

          Minor patch update that adds geodesic to geodetic projection / reprojection methods to GeoUtils.

          Show
          Nicholas Knize added a comment - Minor patch update that adds geodesic to geodetic projection / reprojection methods to GeoUtils.
          Hide
          Nicholas Knize added a comment -

          Updated patch to fix false negatives. This now improves performance of LUCENE-6450 to 0.02 sec / query by using the postings list instead of visiting every term.

          Show
          Nicholas Knize added a comment - Updated patch to fix false negatives. This now improves performance of LUCENE-6450 to 0.02 sec / query by using the postings list instead of visiting every term.
          Hide
          Nicholas Knize added a comment -

          This patch is ready to go. In fact, this performance improvement should supersede the existing patch in LUCENE-6450.

          Show
          Nicholas Knize added a comment - This patch is ready to go. In fact, this performance improvement should supersede the existing patch in LUCENE-6450 .
          Hide
          Michael McCandless added a comment -

          Ooh, awesome progress! I'll look in more detail, but: I applied the patch and hit this test failure:

          ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testBBoxQuery -Dtests.seed=FC153607634373BA -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es -Dtests.timezone=Asia/Jerusalem -Dtests.asserts=true -Dtests.file.encoding=UTF-8
          
             [junit4] Started J0 PID(14260@localhost).
             [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testBBoxQuery -Dtests.seed=FC153607634373BA -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es -Dtests.timezone=Asia/Jerusalem -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] FAILURE 0.09s | TestGeoPointQuery.testBBoxQuery <<<
             [junit4]    > Throwable #1: java.lang.AssertionError: GeoBoundingBoxQuery failed expected:<1> but was:<2>
             [junit4]    > 	at __randomizedtesting.SeedInfo.seed([FC153607634373BA:5CE7EC6FB0DC9159]:0)
             [junit4]    > 	at org.apache.lucene.search.TestGeoPointQuery.testBBoxQuery(TestGeoPointQuery.java:113)
             [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
             [junit4]   2> NOTE: leaving temporary files on disk at: /l/nick/lucene/build/sandbox/test/J0/temp/lucene.search.TestGeoPointQuery FC153607634373BA-001
             [junit4]   2> NOTE: test params are: codec=Asserting(Lucene50): {geoField=PostingsFormat(name=MockRandom)}, docValues:{}, sim=DefaultSimilarity, locale=es, timezone=Asia/Jerusalem
             [junit4]   2> NOTE: Linux 3.13.0-46-generic amd64/Oracle Corporation 1.8.0_40 (64-bit)/cpus=8,threads=1,free=385990872,total=504889344
             [junit4]   2> NOTE: All tests run in this JVM: [TestGeoPointQuery]
             [junit4] Completed [1/1] in 0.48s, 1 test, 1 failure <<< FAILURES!
          
          Show
          Michael McCandless added a comment - Ooh, awesome progress! I'll look in more detail, but: I applied the patch and hit this test failure: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testBBoxQuery -Dtests.seed=FC153607634373BA -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es -Dtests.timezone=Asia/Jerusalem -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] Started J0 PID(14260@localhost). [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testBBoxQuery -Dtests.seed=FC153607634373BA -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es -Dtests.timezone=Asia/Jerusalem -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 0.09s | TestGeoPointQuery.testBBoxQuery <<< [junit4] > Throwable #1: java.lang.AssertionError: GeoBoundingBoxQuery failed expected:<1> but was:<2> [junit4] > at __randomizedtesting.SeedInfo.seed([FC153607634373BA:5CE7EC6FB0DC9159]:0) [junit4] > at org.apache.lucene.search.TestGeoPointQuery.testBBoxQuery(TestGeoPointQuery.java:113) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: leaving temporary files on disk at: /l/nick/lucene/build/sandbox/test/J0/temp/lucene.search.TestGeoPointQuery FC153607634373BA-001 [junit4] 2> NOTE: test params are: codec=Asserting(Lucene50): {geoField=PostingsFormat(name=MockRandom)}, docValues:{}, sim=DefaultSimilarity, locale=es, timezone=Asia/Jerusalem [junit4] 2> NOTE: Linux 3.13.0-46-generic amd64/Oracle Corporation 1.8.0_40 (64-bit)/cpus=8,threads=1,free=385990872,total=504889344 [junit4] 2> NOTE: All tests run in this JVM: [TestGeoPointQuery] [junit4] Completed [1/1] in 0.48s, 1 test, 1 failure <<< FAILURES!
          Hide
          Nicholas Knize added a comment -

          Thanks Mike! Not sure how I missed my own test. Trivial fix though, new patch added and all tests are passing on my end. Next iteration ready for review. This should be ready for sandbox commit blessings.

          Show
          Nicholas Knize added a comment - Thanks Mike! Not sure how I missed my own test. Trivial fix though, new patch added and all tests are passing on my end. Next iteration ready for review. This should be ready for sandbox commit blessings.
          Hide
          Michael McCandless added a comment -

          Thanks Nicholas Knize, tests pass for me now, and I ran the same "bboxes around London, UK" perf test and this is much faster than before (= LUCENE-6450): I now see ~17.8 msec avg per query (~2X faster than GeoHashPrefixTree I think).

          I'll have a closer look at the patch ... I really want to make a visualization here just like the BKD tree ones (https://plus.google.com/+MichaelMcCandless/posts/Daj9FgYPdtv) to see how the morton-shapes "work" in filling in a polygon...

          I think there are fun things we can explore (future!) to speed things up further, e.g. if we also index lat/lon into doc values, then we can use that for filtering, freeing us to also use prefix terms on boundary shapes, and also maybe freeing us to use encodings like Hilbert curves which should give better locality / able to use prefix terms more frequently since you would no longer need a fast decode from term -> lat/lon. But, it would use more disk space... we can also integrate with geo3d so we get shape intersection for faster polygon querying (now it must filter every point?). Later!

          Show
          Michael McCandless added a comment - Thanks Nicholas Knize , tests pass for me now, and I ran the same "bboxes around London, UK" perf test and this is much faster than before (= LUCENE-6450 ): I now see ~17.8 msec avg per query (~2X faster than GeoHashPrefixTree I think). I'll have a closer look at the patch ... I really want to make a visualization here just like the BKD tree ones ( https://plus.google.com/+MichaelMcCandless/posts/Daj9FgYPdtv ) to see how the morton-shapes "work" in filling in a polygon... I think there are fun things we can explore (future!) to speed things up further, e.g. if we also index lat/lon into doc values, then we can use that for filtering, freeing us to also use prefix terms on boundary shapes, and also maybe freeing us to use encodings like Hilbert curves which should give better locality / able to use prefix terms more frequently since you would no longer need a fast decode from term -> lat/lon. But, it would use more disk space... we can also integrate with geo3d so we get shape intersection for faster polygon querying (now it must filter every point?). Later!
          Hide
          Michael McCandless added a comment -

          I was curious about the polygon query performance, so I tweaked the "bboxes around London, UK" benchmark to just use a polygon query instead (with 5 points, first and last are the same to the polygon is close), and surprisingly the performance is only a bit slower than the bbox case (19.1 msec vs 17.8 msec) ... I expected much slower because in the polygon case we cannot use the prefix terms, I think?

          Show
          Michael McCandless added a comment - I was curious about the polygon query performance, so I tweaked the "bboxes around London, UK" benchmark to just use a polygon query instead (with 5 points, first and last are the same to the polygon is close), and surprisingly the performance is only a bit slower than the bbox case (19.1 msec vs 17.8 msec) ... I expected much slower because in the polygon case we cannot use the prefix terms, I think?
          Hide
          Nicholas Knize added a comment -

          You can use the postings when the cell is wholly contained within the polygon, which wasn't in that last patch. New patch attached to include this logic.

          Boundary cells are still computed as they relate to the bbox. This could be improved by computing boundary cells as they relate to the shape as long as computing the existence of an intersection of the bbox and shape is fast - this is usually the Achilles heel of spatial relations.

          Show
          Nicholas Knize added a comment - You can use the postings when the cell is wholly contained within the polygon, which wasn't in that last patch. New patch attached to include this logic. Boundary cells are still computed as they relate to the bbox. This could be improved by computing boundary cells as they relate to the shape as long as computing the existence of an intersection of the bbox and shape is fast - this is usually the Achilles heel of spatial relations.
          Hide
          ASF subversion and git services added a comment -

          Commit 1681820 from Michael McCandless in branch 'dev/branches/LUCENE-6481'
          [ https://svn.apache.org/r1681820 ]

          LUCENE-6481: make branch

          Show
          ASF subversion and git services added a comment - Commit 1681820 from Michael McCandless in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681820 ] LUCENE-6481 : make branch
          Hide
          ASF subversion and git services added a comment -

          Commit 1681821 from Michael McCandless in branch 'dev/branches/LUCENE-6481'
          [ https://svn.apache.org/r1681821 ]

          LUCENE-6481: commit latest patch

          Show
          ASF subversion and git services added a comment - Commit 1681821 from Michael McCandless in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681821 ] LUCENE-6481 : commit latest patch
          Hide
          Michael McCandless added a comment -

          I made a branch so we can iterate a bit: https://svn.apache.org/repos/asf/lucene/dev/branches/LUCENE-6481

          I'll make some small fixes to satisfy "ant precommit", and fold some of the improvements to the random test from LUCENE-6477

          Show
          Michael McCandless added a comment - I made a branch so we can iterate a bit: https://svn.apache.org/repos/asf/lucene/dev/branches/LUCENE-6481 I'll make some small fixes to satisfy "ant precommit", and fold some of the improvements to the random test from LUCENE-6477
          Hide
          ASF subversion and git services added a comment -

          Commit 1681827 from Michael McCandless in branch 'dev/branches/LUCENE-6481'
          [ https://svn.apache.org/r1681827 ]

          LUCENE-6481: woops

          Show
          ASF subversion and git services added a comment - Commit 1681827 from Michael McCandless in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681827 ] LUCENE-6481 : woops
          Hide
          ASF subversion and git services added a comment -

          Commit 1681828 from Michael McCandless in branch 'dev/branches/LUCENE-6481'
          [ https://svn.apache.org/r1681828 ]

          LUCENE-6481: fix some precommit issues

          Show
          ASF subversion and git services added a comment - Commit 1681828 from Michael McCandless in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681828 ] LUCENE-6481 : fix some precommit issues
          Hide
          ASF subversion and git services added a comment -

          Commit 1681836 from Michael McCandless in branch 'dev/branches/LUCENE-6481'
          [ https://svn.apache.org/r1681836 ]

          LUCENE-6481: fix some ant precommit issues; improve test (use threads; randomly test poly query too), but currently disabled; add validation of the incoming lat/lon to poly query; improve query toStrings a bit

          Show
          ASF subversion and git services added a comment - Commit 1681836 from Michael McCandless in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681836 ] LUCENE-6481 : fix some ant precommit issues; improve test (use threads; randomly test poly query too), but currently disabled; add validation of the incoming lat/lon to poly query; improve query toStrings a bit
          Hide
          ASF subversion and git services added a comment -

          Commit 1681842 from Michael McCandless in branch 'dev/branches/LUCENE-6481'
          [ https://svn.apache.org/r1681842 ]

          LUCENE-6481: make explicit separate methods for the lat vs lon cases

          Show
          ASF subversion and git services added a comment - Commit 1681842 from Michael McCandless in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681842 ] LUCENE-6481 : make explicit separate methods for the lat vs lon cases
          Hide
          ASF subversion and git services added a comment -

          Commit 1681849 from Michael McCandless in branch 'dev/branches/LUCENE-6481'
          [ https://svn.apache.org/r1681849 ]

          LUCENE-6481: enable threads and random poly query in test

          Show
          ASF subversion and git services added a comment - Commit 1681849 from Michael McCandless in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681849 ] LUCENE-6481 : enable threads and random poly query in test
          Hide
          Michael McCandless added a comment -

          I made test a bit more evil (use threads, sometimes randomly do poly query), but I hit this:

             [junit4] Started J0 PID(12760@localhost).
             [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
             [junit4] OK      0.04s | TestGeoPointQuery.testBBoxQuery
             [junit4]   2> de maig 26, 2015 3:30:16 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
             [junit4]   2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery]
             [junit4]   2> java.lang.OutOfMemoryError: Java heap space
             [junit4]   2> 	at __randomizedtesting.SeedInfo.seed([E57ABD1CCC7F4388]:0)
             [junit4]   2> 	at java.util.LinkedList.toArray(LinkedList.java:1050)
             [junit4]   2> 	at java.util.List.sort(List.java:477)
             [junit4]   2> 	at java.util.Collections.sort(Collections.java:141)
             [junit4]   2> 	at org.apache.lucene.search.GeoPointInBBoxQuery$GeoBBoxTermsEnum.<init>(GeoPointInBBoxQuery.java:157)
             [junit4]   2> 	at org.apache.lucene.search.GeoPointInBBoxQuery.getTermsEnum(GeoPointInBBoxQuery.java:86)
             [junit4]   2> 	at org.apache.lucene.search.MultiTermQuery.getTermsEnum(MultiTermQuery.java:345)
             [junit4]   2> 	at org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite(MultiTermQueryConstantScoreWrapper.java:146)
             [junit4]   2> 	at org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.scorer(MultiTermQueryConstantScoreWrapper.java:212)
             [junit4]   2> 	at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorer(LRUQueryCache.java:585)
             [junit4]   2> 	at org.apache.lucene.search.Weight.bulkScorer(Weight.java:137)
             [junit4]   2> 	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:72)
             [junit4]   2> 	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:72)
             [junit4]   2> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:560)
             [junit4]   2> 	at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:93)
             [junit4]   2> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:367)
             [junit4]   2> 	at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:337)
             [junit4]   2> 	at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:265)
             [junit4]   2> 
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=E57ABD1CCC7F4388 -Dtests.locale=ca_ES -Dtests.timezone=America/Rankin_Inlet -Dtests.asserts=true -Dtests.file.encoding=UTF-8
             [junit4] ERROR   49.3s | TestGeoPointQuery.testRandomTiny <<<
          

          It's curious because the testRandomTiny only indexes a few 10s of points...

          Show
          Michael McCandless added a comment - I made test a bit more evil (use threads, sometimes randomly do poly query), but I hit this: [junit4] Started J0 PID(12760@localhost). [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery [junit4] OK 0.04s | TestGeoPointQuery.testBBoxQuery [junit4] 2> de maig 26, 2015 3:30:16 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery] [junit4] 2> java.lang.OutOfMemoryError: Java heap space [junit4] 2> at __randomizedtesting.SeedInfo.seed([E57ABD1CCC7F4388]:0) [junit4] 2> at java.util.LinkedList.toArray(LinkedList.java:1050) [junit4] 2> at java.util.List.sort(List.java:477) [junit4] 2> at java.util.Collections.sort(Collections.java:141) [junit4] 2> at org.apache.lucene.search.GeoPointInBBoxQuery$GeoBBoxTermsEnum.<init>(GeoPointInBBoxQuery.java:157) [junit4] 2> at org.apache.lucene.search.GeoPointInBBoxQuery.getTermsEnum(GeoPointInBBoxQuery.java:86) [junit4] 2> at org.apache.lucene.search.MultiTermQuery.getTermsEnum(MultiTermQuery.java:345) [junit4] 2> at org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite(MultiTermQueryConstantScoreWrapper.java:146) [junit4] 2> at org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.scorer(MultiTermQueryConstantScoreWrapper.java:212) [junit4] 2> at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorer(LRUQueryCache.java:585) [junit4] 2> at org.apache.lucene.search.Weight.bulkScorer(Weight.java:137) [junit4] 2> at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:72) [junit4] 2> at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:72) [junit4] 2> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:560) [junit4] 2> at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:93) [junit4] 2> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:367) [junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:337) [junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:265) [junit4] 2> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=E57ABD1CCC7F4388 -Dtests.locale=ca_ES -Dtests.timezone=America/Rankin_Inlet -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 49.3s | TestGeoPointQuery.testRandomTiny <<< It's curious because the testRandomTiny only indexes a few 10s of points...
          Hide
          ASF subversion and git services added a comment -

          Commit 1681853 from Michael McCandless in branch 'dev/branches/LUCENE-6481'
          [ https://svn.apache.org/r1681853 ]

          LUCENE-6481: add back lost GeoPointInPolygonQuery.isWithin override

          Show
          ASF subversion and git services added a comment - Commit 1681853 from Michael McCandless in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681853 ] LUCENE-6481 : add back lost GeoPointInPolygonQuery.isWithin override
          Hide
          ASF subversion and git services added a comment -

          Commit 1681855 from Michael McCandless in branch 'dev/branches/LUCENE-6481'
          [ https://svn.apache.org/r1681855 ]

          LUCENE-6481: go back to 1 thread, no poly query, until we track down testRandomTiny failures

          Show
          ASF subversion and git services added a comment - Commit 1681855 from Michael McCandless in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681855 ] LUCENE-6481 : go back to 1 thread, no poly query, until we track down testRandomTiny failures
          Hide
          ASF subversion and git services added a comment -

          Commit 1681941 from Michael McCandless in branch 'dev/branches/LUCENE-6481'
          [ https://svn.apache.org/r1681941 ]

          LUCENE-6481: sometimes use smallish bbox for random test

          Show
          ASF subversion and git services added a comment - Commit 1681941 from Michael McCandless in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681941 ] LUCENE-6481 : sometimes use smallish bbox for random test
          Hide
          Michael McCandless added a comment -

          I committed a small improvement to the test, to sometimes test on a smallish random sized bbox, and it hits this new failure:

             [junit4] Started J0 PID(15031@localhost).
             [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
             [junit4]   1> smallBBox=true
             [junit4]   1> T0: iter=1 id=3 docID=3 lat=-90.2957562284163 lon=-179.83102465432302 (bbox: lat=-90.84003988710097 TO -89.86198542441215 lon=-180.5339936365967 TO -179.31722107698667) expected true but got: false deleted?=false query=GeoPointInBBoxQuery: field=geoField: Lower Left: [-180.5339936365967,-90.84003988710097] Upper Right: [-179.31722107698667,-89.86198542441215]
             [junit4]   2> ??? 27, 2015 12:37:04 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
             [junit4]   2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery]
             [junit4]   2> java.lang.AssertionError: wrong result
             [junit4]   2> 	at __randomizedtesting.SeedInfo.seed([5]:0)
             [junit4]   2> 	at org.junit.Assert.fail(Assert.java:93)
             [junit4]   2> 	at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:372)
             [junit4]   2> 	at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:271)
             [junit4]   2> 
             [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=5 -Dtests.locale=sr_ME -Dtests.timezone=Europe/Vilnius -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
             [junit4] ERROR   0.09s | TestGeoPointQuery.testRandomTiny <<<
          

          I'm not sure why testRandomTiny is hitting all these problems!

          Show
          Michael McCandless added a comment - I committed a small improvement to the test, to sometimes test on a smallish random sized bbox, and it hits this new failure: [junit4] Started J0 PID(15031@localhost). [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery [junit4] 1> smallBBox=true [junit4] 1> T0: iter=1 id=3 docID=3 lat=-90.2957562284163 lon=-179.83102465432302 (bbox: lat=-90.84003988710097 TO -89.86198542441215 lon=-180.5339936365967 TO -179.31722107698667) expected true but got: false deleted?=false query=GeoPointInBBoxQuery: field=geoField: Lower Left: [-180.5339936365967,-90.84003988710097] Upper Right: [-179.31722107698667,-89.86198542441215] [junit4] 2> ??? 27, 2015 12:37:04 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery] [junit4] 2> java.lang.AssertionError: wrong result [junit4] 2> at __randomizedtesting.SeedInfo.seed([5]:0) [junit4] 2> at org.junit.Assert.fail(Assert.java:93) [junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:372) [junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:271) [junit4] 2> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=5 -Dtests.locale=sr_ME -Dtests.timezone=Europe/Vilnius -Dtests.asserts=true -Dtests.file.encoding=US-ASCII [junit4] ERROR 0.09s | TestGeoPointQuery.testRandomTiny <<< I'm not sure why testRandomTiny is hitting all these problems!
          Hide
          ASF subversion and git services added a comment -

          Commit 1681942 from Michael McCandless in branch 'dev/branches/LUCENE-6481'
          [ https://svn.apache.org/r1681942 ]

          LUCENE-6481: add nocommit

          Show
          ASF subversion and git services added a comment - Commit 1681942 from Michael McCandless in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681942 ] LUCENE-6481 : add nocommit
          Hide
          ASF subversion and git services added a comment -

          Commit 1681943 from Michael McCandless in branch 'dev/branches/LUCENE-6481'
          [ https://svn.apache.org/r1681943 ]

          LUCENE-6481: woops

          Show
          ASF subversion and git services added a comment - Commit 1681943 from Michael McCandless in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681943 ] LUCENE-6481 : woops
          Hide
          Michael McCandless added a comment -

          I committed a small improvement to the test, to sometimes test on a smallish random sized bbox, and it hits this new failure:

          Woops, my bad: test bug. I committed a fix...

          Show
          Michael McCandless added a comment - I committed a small improvement to the test, to sometimes test on a smallish random sized bbox, and it hits this new failure: Woops, my bad: test bug. I committed a fix...
          Hide
          ASF subversion and git services added a comment -

          Commit 1682029 from Michael McCandless in branch 'dev/branches/LUCENE-6481'
          [ https://svn.apache.org/r1682029 ]

          LUCENE-6481: validate lat/lon passed to the queries

          Show
          ASF subversion and git services added a comment - Commit 1682029 from Michael McCandless in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1682029 ] LUCENE-6481 : validate lat/lon passed to the queries
          Hide
          ASF subversion and git services added a comment -

          Commit 1682036 from Michael McCandless in branch 'dev/branches/LUCENE-6481'
          [ https://svn.apache.org/r1682036 ]

          LUCENE-6481: restrict random test cases to 'smallish' bbox; switch static factory for polygon query to ctor

          Show
          ASF subversion and git services added a comment - Commit 1682036 from Michael McCandless in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1682036 ] LUCENE-6481 : restrict random test cases to 'smallish' bbox; switch static factory for polygon query to ctor
          Hide
          Michael McCandless added a comment -

          I fixed the randomized test case to only test a "smallish" global bbox (up to +/- 1.5 in lat and lon directions)... when testing the full space the test runs very very slowly, even testRandomTiny, because it can require 100s of K terms ... I'm not quite sure why but the query classes do advertise that they should be used on smallish rectangles.

          Also, the test fails because of boundary cases:

          T0: iter=57 id=6819 docID=6723 lat=-81.12143028547105 lon=-168.98618510239544 (bbox: lat=-81.22948018485512 TO -80.9313958433031 lon=-168.98618505380117 TO -168.77958782241174) expected true but got: false deleted?=false query=GeoPointInBBoxQuery: field=geoField: Lower Left: [-168.98618505380117,-81.22948018485512] Upper Right: [-168.77958782241174,-80.9313958433031]
          máj 27, 2015 2:16:52 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
          WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery]
          java.lang.AssertionError: wrong result
          	at __randomizedtesting.SeedInfo.seed([4B5245DED09AE592]:0)
          	at org.junit.Assert.fail(Assert.java:93)
          	at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:380)
          	at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:279)
          

          It's odd because in GeoUtils#bboxContains, we do try to take the quantization into account ...

          Show
          Michael McCandless added a comment - I fixed the randomized test case to only test a "smallish" global bbox (up to +/- 1.5 in lat and lon directions)... when testing the full space the test runs very very slowly, even testRandomTiny, because it can require 100s of K terms ... I'm not quite sure why but the query classes do advertise that they should be used on smallish rectangles. Also, the test fails because of boundary cases: T0: iter=57 id=6819 docID=6723 lat=-81.12143028547105 lon=-168.98618510239544 (bbox: lat=-81.22948018485512 TO -80.9313958433031 lon=-168.98618505380117 TO -168.77958782241174) expected true but got: false deleted?=false query=GeoPointInBBoxQuery: field=geoField: Lower Left: [-168.98618505380117,-81.22948018485512] Upper Right: [-168.77958782241174,-80.9313958433031] máj 27, 2015 2:16:52 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery] java.lang.AssertionError: wrong result at __randomizedtesting.SeedInfo.seed([4B5245DED09AE592]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:380) at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:279) It's odd because in GeoUtils#bboxContains, we do try to take the quantization into account ...
          Hide
          ASF subversion and git services added a comment -

          Commit 1682043 from Michael McCandless in branch 'dev/branches/LUCENE-6481'
          [ https://svn.apache.org/r1682043 ]

          LUCENE-6481: add nocommit about query slowness for large bboxes; improve javadocs

          Show
          ASF subversion and git services added a comment - Commit 1682043 from Michael McCandless in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1682043 ] LUCENE-6481 : add nocommit about query slowness for large bboxes; improve javadocs
          Hide
          Nicholas Knize added a comment -

          Updates:

          • cache ranges across segments
          • only add ranges that are either within or cross the boundary of the bbox or polygon

          In exotic cases this latter fix drastically reduces the number of ranges added since it avoids unnecessary exterior cells that only "touch" the boundary. The downside is since the random test doesn't currently use the TOLERANCE criteria it occasionally fails due computation error at 1e-7 precision. This can be tweaked in the next patch.

          Show
          Nicholas Knize added a comment - Updates: cache ranges across segments only add ranges that are either within or cross the boundary of the bbox or polygon In exotic cases this latter fix drastically reduces the number of ranges added since it avoids unnecessary exterior cells that only "touch" the boundary. The downside is since the random test doesn't currently use the TOLERANCE criteria it occasionally fails due computation error at 1e-7 precision. This can be tweaked in the next patch.
          Hide
          Nicholas Knize added a comment -

          Note: this is a diff off the LUCENE-6481 branch.

          Show
          Nicholas Knize added a comment - Note: this is a diff off the LUCENE-6481 branch.
          Hide
          Nicholas Knize added a comment -

          Note: this is a diff off the LUCENE-6481 branch.

          Show
          Nicholas Knize added a comment - Note: this is a diff off the LUCENE-6481 branch.
          Hide
          Nicholas Knize added a comment -

          Note: this is a diff off the LUCENE-6481 branch.

          Show
          Nicholas Knize added a comment - Note: this is a diff off the LUCENE-6481 branch.
          Hide
          ASF subversion and git services added a comment -

          Commit 1682453 from Michael McCandless in branch 'dev/branches/LUCENE-6481'
          [ https://svn.apache.org/r1682453 ]

          LUCENE-6481: Nick's latest patch: create range terms once per query, not per segment; check cell intersection against polygon not its bbox for more restrictive recursing

          Show
          ASF subversion and git services added a comment - Commit 1682453 from Michael McCandless in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1682453 ] LUCENE-6481 : Nick's latest patch: create range terms once per query, not per segment; check cell intersection against polygon not its bbox for more restrictive recursing
          Hide
          ASF subversion and git services added a comment -

          Commit 1683615 from Michael McCandless in branch 'dev/branches/LUCENE-6481'
          [ https://svn.apache.org/r1683615 ]

          LUCENE-6481: merge trunk

          Show
          ASF subversion and git services added a comment - Commit 1683615 from Michael McCandless in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1683615 ] LUCENE-6481 : merge trunk
          Hide
          Nicholas Knize added a comment -

          For some reason a diff with the latest branch introduced a lot of duplicate changes so this is the latest patch off trunk.

          This patch resolves all no commits, including:

          • random polygon testing
          • thread safety testing
          • added tolerance to expectation check in random test
          • beast tested w/ 500 iterations
          Show
          Nicholas Knize added a comment - For some reason a diff with the latest branch introduced a lot of duplicate changes so this is the latest patch off trunk. This patch resolves all no commits, including: random polygon testing thread safety testing added tolerance to expectation check in random test beast tested w/ 500 iterations
          Hide
          Michael McCandless added a comment -

          For some reason a diff with the latest branch introduced a lot of duplicate changes

          Oh, sorry ... the branch was supposed to make things easier. But a trunk patch is fine too.

          I like testPacManPolyQuery!

          We can remove that NOTE in GeoPointField javadocs right?

          Else, this looks great, I'll commit soon! Thanks Nicholas Knize.

          Show
          Michael McCandless added a comment - For some reason a diff with the latest branch introduced a lot of duplicate changes Oh, sorry ... the branch was supposed to make things easier. But a trunk patch is fine too. I like testPacManPolyQuery! We can remove that NOTE in GeoPointField javadocs right? Else, this looks great, I'll commit soon! Thanks Nicholas Knize .
          Hide
          ASF subversion and git services added a comment -

          Commit 1684285 from Michael McCandless in branch 'dev/trunk'
          [ https://svn.apache.org/r1684285 ]

          LUCENE-6481: add simple API to index geo lat/lon points and search by bounding box or polygon

          Show
          ASF subversion and git services added a comment - Commit 1684285 from Michael McCandless in branch 'dev/trunk' [ https://svn.apache.org/r1684285 ] LUCENE-6481 : add simple API to index geo lat/lon points and search by bounding box or polygon
          Hide
          ASF subversion and git services added a comment -

          Commit 1684286 from Michael McCandless in branch 'dev/branches/branch_5x'
          [ https://svn.apache.org/r1684286 ]

          LUCENE-6481: add simple API to index geo lat/lon points and search by bounding box or polygon

          Show
          ASF subversion and git services added a comment - Commit 1684286 from Michael McCandless in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1684286 ] LUCENE-6481 : add simple API to index geo lat/lon points and search by bounding box or polygon
          Hide
          Michael McCandless added a comment -
          Show
          Michael McCandless added a comment - Thanks Nicholas Knize !
          Hide
          Shalin Shekhar Mangar added a comment -

          Bulk close for 5.3.0 release

          Show
          Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

            People

            • Assignee:
              Unassigned
              Reporter:
              Nicholas Knize
            • Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development