Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-6481

Improve GeoPointField type to only visit high precision boundary terms

Details

    • New Feature
    • Status: Closed
    • Major
    • Resolution: Fixed
    • None
    • 5.3, 6.0
    • core/index
    • None
    • New

    Description

      Current GeoPointField LUCENE-6450 computes a set of ranges along the space-filling curve that represent a provided bounding box. This determines which terms to visit in the terms dictionary and which to skip. This is suboptimal for large bounding boxes as we may end up visiting all terms (which could be quite large).

      This incremental improvement is to improve GeoPointField to only visit high precision terms in boundary ranges and use the postings list for ranges that are completely within the target bounding box.

      A separate improvement is to switch over to auto-prefix and build an Automaton representing the bounding box. That can be tracked in a separate issue.

      Attachments

        1. LUCENE-6481.patch
          46 kB
          Michael McCandless
        2. LUCENE-6481.patch
          45 kB
          Nick Knize
        3. LUCENE-6481.patch
          50 kB
          Nick Knize
        4. LUCENE-6481.patch
          49 kB
          Nick Knize
        5. LUCENE-6481.patch
          49 kB
          Nick Knize
        6. LUCENE-6481.patch
          50 kB
          Nick Knize
        7. LUCENE-6481.patch
          30 kB
          Nick Knize
        8. LUCENE-6481.patch
          74 kB
          Nick Knize
        9. LUCENE-6481_WIP.patch
          37 kB
          Nick Knize

        Issue Links

          Activity

            tomoko Tomoko Uchida added a comment -

            This issue was moved to GitHub issue: #7540.

            tomoko Tomoko Uchida added a comment - This issue was moved to GitHub issue: #7540 .

            Bulk close for 5.3.0 release

            shalin Shalin Shekhar Mangar added a comment - Bulk close for 5.3.0 release

            Thanks nknize!

            mikemccand Michael McCandless added a comment - Thanks nknize !

            Commit 1684286 from mikemccand in branch 'dev/branches/branch_5x'
            [ https://svn.apache.org/r1684286 ]

            LUCENE-6481: add simple API to index geo lat/lon points and search by bounding box or polygon

            jira-bot ASF subversion and git services added a comment - Commit 1684286 from mikemccand in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1684286 ] LUCENE-6481 : add simple API to index geo lat/lon points and search by bounding box or polygon

            Commit 1684285 from mikemccand in branch 'dev/trunk'
            [ https://svn.apache.org/r1684285 ]

            LUCENE-6481: add simple API to index geo lat/lon points and search by bounding box or polygon

            jira-bot ASF subversion and git services added a comment - Commit 1684285 from mikemccand in branch 'dev/trunk' [ https://svn.apache.org/r1684285 ] LUCENE-6481 : add simple API to index geo lat/lon points and search by bounding box or polygon

            For some reason a diff with the latest branch introduced a lot of duplicate changes

            Oh, sorry ... the branch was supposed to make things easier. But a trunk patch is fine too.

            I like testPacManPolyQuery!

            We can remove that NOTE in GeoPointField javadocs right?

            Else, this looks great, I'll commit soon! Thanks nknize.

            mikemccand Michael McCandless added a comment - For some reason a diff with the latest branch introduced a lot of duplicate changes Oh, sorry ... the branch was supposed to make things easier. But a trunk patch is fine too. I like testPacManPolyQuery! We can remove that NOTE in GeoPointField javadocs right? Else, this looks great, I'll commit soon! Thanks nknize .
            nknize Nick Knize added a comment -

            For some reason a diff with the latest branch introduced a lot of duplicate changes so this is the latest patch off trunk.

            This patch resolves all no commits, including:

            • random polygon testing
            • thread safety testing
            • added tolerance to expectation check in random test
            • beast tested w/ 500 iterations
            nknize Nick Knize added a comment - For some reason a diff with the latest branch introduced a lot of duplicate changes so this is the latest patch off trunk. This patch resolves all no commits, including: random polygon testing thread safety testing added tolerance to expectation check in random test beast tested w/ 500 iterations

            Commit 1683615 from mikemccand in branch 'dev/branches/LUCENE-6481'
            [ https://svn.apache.org/r1683615 ]

            LUCENE-6481: merge trunk

            jira-bot ASF subversion and git services added a comment - Commit 1683615 from mikemccand in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1683615 ] LUCENE-6481 : merge trunk

            Commit 1682453 from mikemccand in branch 'dev/branches/LUCENE-6481'
            [ https://svn.apache.org/r1682453 ]

            LUCENE-6481: Nick's latest patch: create range terms once per query, not per segment; check cell intersection against polygon not its bbox for more restrictive recursing

            jira-bot ASF subversion and git services added a comment - Commit 1682453 from mikemccand in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1682453 ] LUCENE-6481 : Nick's latest patch: create range terms once per query, not per segment; check cell intersection against polygon not its bbox for more restrictive recursing
            nknize Nick Knize added a comment -

            Note: this is a diff off the LUCENE-6481 branch.

            nknize Nick Knize added a comment - Note: this is a diff off the LUCENE-6481 branch.
            nknize Nick Knize added a comment -

            Note: this is a diff off the LUCENE-6481 branch.

            nknize Nick Knize added a comment - Note: this is a diff off the LUCENE-6481 branch.
            nknize Nick Knize added a comment -

            Note: this is a diff off the LUCENE-6481 branch.

            nknize Nick Knize added a comment - Note: this is a diff off the LUCENE-6481 branch.
            nknize Nick Knize added a comment -

            Updates:

            • cache ranges across segments
            • only add ranges that are either within or cross the boundary of the bbox or polygon

            In exotic cases this latter fix drastically reduces the number of ranges added since it avoids unnecessary exterior cells that only "touch" the boundary. The downside is since the random test doesn't currently use the TOLERANCE criteria it occasionally fails due computation error at 1e-7 precision. This can be tweaked in the next patch.

            nknize Nick Knize added a comment - Updates: cache ranges across segments only add ranges that are either within or cross the boundary of the bbox or polygon In exotic cases this latter fix drastically reduces the number of ranges added since it avoids unnecessary exterior cells that only "touch" the boundary. The downside is since the random test doesn't currently use the TOLERANCE criteria it occasionally fails due computation error at 1e-7 precision. This can be tweaked in the next patch.

            Commit 1682043 from mikemccand in branch 'dev/branches/LUCENE-6481'
            [ https://svn.apache.org/r1682043 ]

            LUCENE-6481: add nocommit about query slowness for large bboxes; improve javadocs

            jira-bot ASF subversion and git services added a comment - Commit 1682043 from mikemccand in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1682043 ] LUCENE-6481 : add nocommit about query slowness for large bboxes; improve javadocs

            I fixed the randomized test case to only test a "smallish" global bbox (up to +/- 1.5 in lat and lon directions)... when testing the full space the test runs very very slowly, even testRandomTiny, because it can require 100s of K terms ... I'm not quite sure why but the query classes do advertise that they should be used on smallish rectangles.

            Also, the test fails because of boundary cases:

            T0: iter=57 id=6819 docID=6723 lat=-81.12143028547105 lon=-168.98618510239544 (bbox: lat=-81.22948018485512 TO -80.9313958433031 lon=-168.98618505380117 TO -168.77958782241174) expected true but got: false deleted?=false query=GeoPointInBBoxQuery: field=geoField: Lower Left: [-168.98618505380117,-81.22948018485512] Upper Right: [-168.77958782241174,-80.9313958433031]
            máj 27, 2015 2:16:52 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
            WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery]
            java.lang.AssertionError: wrong result
            	at __randomizedtesting.SeedInfo.seed([4B5245DED09AE592]:0)
            	at org.junit.Assert.fail(Assert.java:93)
            	at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:380)
            	at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:279)
            

            It's odd because in GeoUtils#bboxContains, we do try to take the quantization into account ...

            mikemccand Michael McCandless added a comment - I fixed the randomized test case to only test a "smallish" global bbox (up to +/- 1.5 in lat and lon directions)... when testing the full space the test runs very very slowly, even testRandomTiny, because it can require 100s of K terms ... I'm not quite sure why but the query classes do advertise that they should be used on smallish rectangles. Also, the test fails because of boundary cases: T0: iter=57 id=6819 docID=6723 lat=-81.12143028547105 lon=-168.98618510239544 (bbox: lat=-81.22948018485512 TO -80.9313958433031 lon=-168.98618505380117 TO -168.77958782241174) expected true but got: false deleted?=false query=GeoPointInBBoxQuery: field=geoField: Lower Left: [-168.98618505380117,-81.22948018485512] Upper Right: [-168.77958782241174,-80.9313958433031] máj 27, 2015 2:16:52 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery] java.lang.AssertionError: wrong result at __randomizedtesting.SeedInfo.seed([4B5245DED09AE592]:0) at org.junit.Assert.fail(Assert.java:93) at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:380) at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:279) It's odd because in GeoUtils#bboxContains, we do try to take the quantization into account ...

            Commit 1682036 from mikemccand in branch 'dev/branches/LUCENE-6481'
            [ https://svn.apache.org/r1682036 ]

            LUCENE-6481: restrict random test cases to 'smallish' bbox; switch static factory for polygon query to ctor

            jira-bot ASF subversion and git services added a comment - Commit 1682036 from mikemccand in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1682036 ] LUCENE-6481 : restrict random test cases to 'smallish' bbox; switch static factory for polygon query to ctor

            Commit 1682029 from mikemccand in branch 'dev/branches/LUCENE-6481'
            [ https://svn.apache.org/r1682029 ]

            LUCENE-6481: validate lat/lon passed to the queries

            jira-bot ASF subversion and git services added a comment - Commit 1682029 from mikemccand in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1682029 ] LUCENE-6481 : validate lat/lon passed to the queries

            I committed a small improvement to the test, to sometimes test on a smallish random sized bbox, and it hits this new failure:

            Woops, my bad: test bug. I committed a fix...

            mikemccand Michael McCandless added a comment - I committed a small improvement to the test, to sometimes test on a smallish random sized bbox, and it hits this new failure: Woops, my bad: test bug. I committed a fix...

            Commit 1681943 from mikemccand in branch 'dev/branches/LUCENE-6481'
            [ https://svn.apache.org/r1681943 ]

            LUCENE-6481: woops

            jira-bot ASF subversion and git services added a comment - Commit 1681943 from mikemccand in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681943 ] LUCENE-6481 : woops

            Commit 1681942 from mikemccand in branch 'dev/branches/LUCENE-6481'
            [ https://svn.apache.org/r1681942 ]

            LUCENE-6481: add nocommit

            jira-bot ASF subversion and git services added a comment - Commit 1681942 from mikemccand in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681942 ] LUCENE-6481 : add nocommit

            I committed a small improvement to the test, to sometimes test on a smallish random sized bbox, and it hits this new failure:

               [junit4] Started J0 PID(15031@localhost).
               [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
               [junit4]   1> smallBBox=true
               [junit4]   1> T0: iter=1 id=3 docID=3 lat=-90.2957562284163 lon=-179.83102465432302 (bbox: lat=-90.84003988710097 TO -89.86198542441215 lon=-180.5339936365967 TO -179.31722107698667) expected true but got: false deleted?=false query=GeoPointInBBoxQuery: field=geoField: Lower Left: [-180.5339936365967,-90.84003988710097] Upper Right: [-179.31722107698667,-89.86198542441215]
               [junit4]   2> ??? 27, 2015 12:37:04 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
               [junit4]   2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery]
               [junit4]   2> java.lang.AssertionError: wrong result
               [junit4]   2> 	at __randomizedtesting.SeedInfo.seed([5]:0)
               [junit4]   2> 	at org.junit.Assert.fail(Assert.java:93)
               [junit4]   2> 	at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:372)
               [junit4]   2> 	at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:271)
               [junit4]   2> 
               [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=5 -Dtests.locale=sr_ME -Dtests.timezone=Europe/Vilnius -Dtests.asserts=true -Dtests.file.encoding=US-ASCII
               [junit4] ERROR   0.09s | TestGeoPointQuery.testRandomTiny <<<
            

            I'm not sure why testRandomTiny is hitting all these problems!

            mikemccand Michael McCandless added a comment - I committed a small improvement to the test, to sometimes test on a smallish random sized bbox, and it hits this new failure: [junit4] Started J0 PID(15031@localhost). [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery [junit4] 1> smallBBox=true [junit4] 1> T0: iter=1 id=3 docID=3 lat=-90.2957562284163 lon=-179.83102465432302 (bbox: lat=-90.84003988710097 TO -89.86198542441215 lon=-180.5339936365967 TO -179.31722107698667) expected true but got: false deleted?=false query=GeoPointInBBoxQuery: field=geoField: Lower Left: [-180.5339936365967,-90.84003988710097] Upper Right: [-179.31722107698667,-89.86198542441215] [junit4] 2> ??? 27, 2015 12:37:04 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery] [junit4] 2> java.lang.AssertionError: wrong result [junit4] 2> at __randomizedtesting.SeedInfo.seed([5]:0) [junit4] 2> at org.junit.Assert.fail(Assert.java:93) [junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:372) [junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:271) [junit4] 2> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=5 -Dtests.locale=sr_ME -Dtests.timezone=Europe/Vilnius -Dtests.asserts=true -Dtests.file.encoding=US-ASCII [junit4] ERROR 0.09s | TestGeoPointQuery.testRandomTiny <<< I'm not sure why testRandomTiny is hitting all these problems!

            Commit 1681941 from mikemccand in branch 'dev/branches/LUCENE-6481'
            [ https://svn.apache.org/r1681941 ]

            LUCENE-6481: sometimes use smallish bbox for random test

            jira-bot ASF subversion and git services added a comment - Commit 1681941 from mikemccand in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681941 ] LUCENE-6481 : sometimes use smallish bbox for random test

            Commit 1681855 from mikemccand in branch 'dev/branches/LUCENE-6481'
            [ https://svn.apache.org/r1681855 ]

            LUCENE-6481: go back to 1 thread, no poly query, until we track down testRandomTiny failures

            jira-bot ASF subversion and git services added a comment - Commit 1681855 from mikemccand in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681855 ] LUCENE-6481 : go back to 1 thread, no poly query, until we track down testRandomTiny failures

            Commit 1681853 from mikemccand in branch 'dev/branches/LUCENE-6481'
            [ https://svn.apache.org/r1681853 ]

            LUCENE-6481: add back lost GeoPointInPolygonQuery.isWithin override

            jira-bot ASF subversion and git services added a comment - Commit 1681853 from mikemccand in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681853 ] LUCENE-6481 : add back lost GeoPointInPolygonQuery.isWithin override

            I made test a bit more evil (use threads, sometimes randomly do poly query), but I hit this:

               [junit4] Started J0 PID(12760@localhost).
               [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
               [junit4] OK      0.04s | TestGeoPointQuery.testBBoxQuery
               [junit4]   2> de maig 26, 2015 3:30:16 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException
               [junit4]   2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery]
               [junit4]   2> java.lang.OutOfMemoryError: Java heap space
               [junit4]   2> 	at __randomizedtesting.SeedInfo.seed([E57ABD1CCC7F4388]:0)
               [junit4]   2> 	at java.util.LinkedList.toArray(LinkedList.java:1050)
               [junit4]   2> 	at java.util.List.sort(List.java:477)
               [junit4]   2> 	at java.util.Collections.sort(Collections.java:141)
               [junit4]   2> 	at org.apache.lucene.search.GeoPointInBBoxQuery$GeoBBoxTermsEnum.<init>(GeoPointInBBoxQuery.java:157)
               [junit4]   2> 	at org.apache.lucene.search.GeoPointInBBoxQuery.getTermsEnum(GeoPointInBBoxQuery.java:86)
               [junit4]   2> 	at org.apache.lucene.search.MultiTermQuery.getTermsEnum(MultiTermQuery.java:345)
               [junit4]   2> 	at org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite(MultiTermQueryConstantScoreWrapper.java:146)
               [junit4]   2> 	at org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.scorer(MultiTermQueryConstantScoreWrapper.java:212)
               [junit4]   2> 	at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorer(LRUQueryCache.java:585)
               [junit4]   2> 	at org.apache.lucene.search.Weight.bulkScorer(Weight.java:137)
               [junit4]   2> 	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:72)
               [junit4]   2> 	at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:72)
               [junit4]   2> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:560)
               [junit4]   2> 	at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:93)
               [junit4]   2> 	at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:367)
               [junit4]   2> 	at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:337)
               [junit4]   2> 	at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:265)
               [junit4]   2> 
               [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=E57ABD1CCC7F4388 -Dtests.locale=ca_ES -Dtests.timezone=America/Rankin_Inlet -Dtests.asserts=true -Dtests.file.encoding=UTF-8
               [junit4] ERROR   49.3s | TestGeoPointQuery.testRandomTiny <<<
            

            It's curious because the testRandomTiny only indexes a few 10s of points...

            mikemccand Michael McCandless added a comment - I made test a bit more evil (use threads, sometimes randomly do poly query), but I hit this: [junit4] Started J0 PID(12760@localhost). [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery [junit4] OK 0.04s | TestGeoPointQuery.testBBoxQuery [junit4] 2> de maig 26, 2015 3:30:16 PM com.carrotsearch.randomizedtesting.RandomizedRunner$QueueUncaughtExceptionsHandler uncaughtException [junit4] 2> WARNING: Uncaught exception in thread: Thread[T0,5,TGRP-TestGeoPointQuery] [junit4] 2> java.lang.OutOfMemoryError: Java heap space [junit4] 2> at __randomizedtesting.SeedInfo.seed([E57ABD1CCC7F4388]:0) [junit4] 2> at java.util.LinkedList.toArray(LinkedList.java:1050) [junit4] 2> at java.util.List.sort(List.java:477) [junit4] 2> at java.util.Collections.sort(Collections.java:141) [junit4] 2> at org.apache.lucene.search.GeoPointInBBoxQuery$GeoBBoxTermsEnum.<init>(GeoPointInBBoxQuery.java:157) [junit4] 2> at org.apache.lucene.search.GeoPointInBBoxQuery.getTermsEnum(GeoPointInBBoxQuery.java:86) [junit4] 2> at org.apache.lucene.search.MultiTermQuery.getTermsEnum(MultiTermQuery.java:345) [junit4] 2> at org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.rewrite(MultiTermQueryConstantScoreWrapper.java:146) [junit4] 2> at org.apache.lucene.search.MultiTermQueryConstantScoreWrapper$1.scorer(MultiTermQueryConstantScoreWrapper.java:212) [junit4] 2> at org.apache.lucene.search.LRUQueryCache$CachingWrapperWeight.scorer(LRUQueryCache.java:585) [junit4] 2> at org.apache.lucene.search.Weight.bulkScorer(Weight.java:137) [junit4] 2> at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:72) [junit4] 2> at org.apache.lucene.search.AssertingWeight.bulkScorer(AssertingWeight.java:72) [junit4] 2> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:560) [junit4] 2> at org.apache.lucene.search.AssertingIndexSearcher.search(AssertingIndexSearcher.java:93) [junit4] 2> at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:367) [junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1._run(TestGeoPointQuery.java:337) [junit4] 2> at org.apache.lucene.search.TestGeoPointQuery$1.run(TestGeoPointQuery.java:265) [junit4] 2> [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=E57ABD1CCC7F4388 -Dtests.locale=ca_ES -Dtests.timezone=America/Rankin_Inlet -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] ERROR 49.3s | TestGeoPointQuery.testRandomTiny <<< It's curious because the testRandomTiny only indexes a few 10s of points...

            Commit 1681849 from mikemccand in branch 'dev/branches/LUCENE-6481'
            [ https://svn.apache.org/r1681849 ]

            LUCENE-6481: enable threads and random poly query in test

            jira-bot ASF subversion and git services added a comment - Commit 1681849 from mikemccand in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681849 ] LUCENE-6481 : enable threads and random poly query in test

            Commit 1681842 from mikemccand in branch 'dev/branches/LUCENE-6481'
            [ https://svn.apache.org/r1681842 ]

            LUCENE-6481: make explicit separate methods for the lat vs lon cases

            jira-bot ASF subversion and git services added a comment - Commit 1681842 from mikemccand in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681842 ] LUCENE-6481 : make explicit separate methods for the lat vs lon cases

            Commit 1681836 from mikemccand in branch 'dev/branches/LUCENE-6481'
            [ https://svn.apache.org/r1681836 ]

            LUCENE-6481: fix some ant precommit issues; improve test (use threads; randomly test poly query too), but currently disabled; add validation of the incoming lat/lon to poly query; improve query toStrings a bit

            jira-bot ASF subversion and git services added a comment - Commit 1681836 from mikemccand in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681836 ] LUCENE-6481 : fix some ant precommit issues; improve test (use threads; randomly test poly query too), but currently disabled; add validation of the incoming lat/lon to poly query; improve query toStrings a bit

            Commit 1681828 from mikemccand in branch 'dev/branches/LUCENE-6481'
            [ https://svn.apache.org/r1681828 ]

            LUCENE-6481: fix some precommit issues

            jira-bot ASF subversion and git services added a comment - Commit 1681828 from mikemccand in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681828 ] LUCENE-6481 : fix some precommit issues

            Commit 1681827 from mikemccand in branch 'dev/branches/LUCENE-6481'
            [ https://svn.apache.org/r1681827 ]

            LUCENE-6481: woops

            jira-bot ASF subversion and git services added a comment - Commit 1681827 from mikemccand in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681827 ] LUCENE-6481 : woops

            I made a branch so we can iterate a bit: https://svn.apache.org/repos/asf/lucene/dev/branches/LUCENE-6481

            I'll make some small fixes to satisfy "ant precommit", and fold some of the improvements to the random test from LUCENE-6477

            mikemccand Michael McCandless added a comment - I made a branch so we can iterate a bit: https://svn.apache.org/repos/asf/lucene/dev/branches/LUCENE-6481 I'll make some small fixes to satisfy "ant precommit", and fold some of the improvements to the random test from LUCENE-6477

            Commit 1681821 from mikemccand in branch 'dev/branches/LUCENE-6481'
            [ https://svn.apache.org/r1681821 ]

            LUCENE-6481: commit latest patch

            jira-bot ASF subversion and git services added a comment - Commit 1681821 from mikemccand in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681821 ] LUCENE-6481 : commit latest patch

            Commit 1681820 from mikemccand in branch 'dev/branches/LUCENE-6481'
            [ https://svn.apache.org/r1681820 ]

            LUCENE-6481: make branch

            jira-bot ASF subversion and git services added a comment - Commit 1681820 from mikemccand in branch 'dev/branches/ LUCENE-6481 ' [ https://svn.apache.org/r1681820 ] LUCENE-6481 : make branch
            nknize Nick Knize added a comment -

            You can use the postings when the cell is wholly contained within the polygon, which wasn't in that last patch. New patch attached to include this logic.

            Boundary cells are still computed as they relate to the bbox. This could be improved by computing boundary cells as they relate to the shape as long as computing the existence of an intersection of the bbox and shape is fast - this is usually the Achilles heel of spatial relations.

            nknize Nick Knize added a comment - You can use the postings when the cell is wholly contained within the polygon, which wasn't in that last patch. New patch attached to include this logic. Boundary cells are still computed as they relate to the bbox. This could be improved by computing boundary cells as they relate to the shape as long as computing the existence of an intersection of the bbox and shape is fast - this is usually the Achilles heel of spatial relations.

            I was curious about the polygon query performance, so I tweaked the "bboxes around London, UK" benchmark to just use a polygon query instead (with 5 points, first and last are the same to the polygon is close), and surprisingly the performance is only a bit slower than the bbox case (19.1 msec vs 17.8 msec) ... I expected much slower because in the polygon case we cannot use the prefix terms, I think?

            mikemccand Michael McCandless added a comment - I was curious about the polygon query performance, so I tweaked the "bboxes around London, UK" benchmark to just use a polygon query instead (with 5 points, first and last are the same to the polygon is close), and surprisingly the performance is only a bit slower than the bbox case (19.1 msec vs 17.8 msec) ... I expected much slower because in the polygon case we cannot use the prefix terms, I think?

            Thanks nknize, tests pass for me now, and I ran the same "bboxes around London, UK" perf test and this is much faster than before (= LUCENE-6450): I now see ~17.8 msec avg per query (~2X faster than GeoHashPrefixTree I think).

            I'll have a closer look at the patch ... I really want to make a visualization here just like the BKD tree ones (https://plus.google.com/+MichaelMcCandless/posts/Daj9FgYPdtv) to see how the morton-shapes "work" in filling in a polygon...

            I think there are fun things we can explore (future!) to speed things up further, e.g. if we also index lat/lon into doc values, then we can use that for filtering, freeing us to also use prefix terms on boundary shapes, and also maybe freeing us to use encodings like Hilbert curves which should give better locality / able to use prefix terms more frequently since you would no longer need a fast decode from term -> lat/lon. But, it would use more disk space... we can also integrate with geo3d so we get shape intersection for faster polygon querying (now it must filter every point?). Later!

            mikemccand Michael McCandless added a comment - Thanks nknize , tests pass for me now, and I ran the same "bboxes around London, UK" perf test and this is much faster than before (= LUCENE-6450 ): I now see ~17.8 msec avg per query (~2X faster than GeoHashPrefixTree I think). I'll have a closer look at the patch ... I really want to make a visualization here just like the BKD tree ones ( https://plus.google.com/+MichaelMcCandless/posts/Daj9FgYPdtv ) to see how the morton-shapes "work" in filling in a polygon... I think there are fun things we can explore (future!) to speed things up further, e.g. if we also index lat/lon into doc values, then we can use that for filtering, freeing us to also use prefix terms on boundary shapes, and also maybe freeing us to use encodings like Hilbert curves which should give better locality / able to use prefix terms more frequently since you would no longer need a fast decode from term -> lat/lon. But, it would use more disk space... we can also integrate with geo3d so we get shape intersection for faster polygon querying (now it must filter every point?). Later!
            nknize Nick Knize added a comment -

            Thanks Mike! Not sure how I missed my own test. Trivial fix though, new patch added and all tests are passing on my end. Next iteration ready for review. This should be ready for sandbox commit blessings.

            nknize Nick Knize added a comment - Thanks Mike! Not sure how I missed my own test. Trivial fix though, new patch added and all tests are passing on my end. Next iteration ready for review. This should be ready for sandbox commit blessings.

            Ooh, awesome progress! I'll look in more detail, but: I applied the patch and hit this test failure:

            ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testBBoxQuery -Dtests.seed=FC153607634373BA -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es -Dtests.timezone=Asia/Jerusalem -Dtests.asserts=true -Dtests.file.encoding=UTF-8
            
               [junit4] Started J0 PID(14260@localhost).
               [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery
               [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testBBoxQuery -Dtests.seed=FC153607634373BA -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es -Dtests.timezone=Asia/Jerusalem -Dtests.asserts=true -Dtests.file.encoding=UTF-8
               [junit4] FAILURE 0.09s | TestGeoPointQuery.testBBoxQuery <<<
               [junit4]    > Throwable #1: java.lang.AssertionError: GeoBoundingBoxQuery failed expected:<1> but was:<2>
               [junit4]    > 	at __randomizedtesting.SeedInfo.seed([FC153607634373BA:5CE7EC6FB0DC9159]:0)
               [junit4]    > 	at org.apache.lucene.search.TestGeoPointQuery.testBBoxQuery(TestGeoPointQuery.java:113)
               [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
               [junit4]   2> NOTE: leaving temporary files on disk at: /l/nick/lucene/build/sandbox/test/J0/temp/lucene.search.TestGeoPointQuery FC153607634373BA-001
               [junit4]   2> NOTE: test params are: codec=Asserting(Lucene50): {geoField=PostingsFormat(name=MockRandom)}, docValues:{}, sim=DefaultSimilarity, locale=es, timezone=Asia/Jerusalem
               [junit4]   2> NOTE: Linux 3.13.0-46-generic amd64/Oracle Corporation 1.8.0_40 (64-bit)/cpus=8,threads=1,free=385990872,total=504889344
               [junit4]   2> NOTE: All tests run in this JVM: [TestGeoPointQuery]
               [junit4] Completed [1/1] in 0.48s, 1 test, 1 failure <<< FAILURES!
            
            mikemccand Michael McCandless added a comment - Ooh, awesome progress! I'll look in more detail, but: I applied the patch and hit this test failure: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testBBoxQuery -Dtests.seed=FC153607634373BA -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es -Dtests.timezone=Asia/Jerusalem -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] Started J0 PID(14260@localhost). [junit4] Suite: org.apache.lucene.search.TestGeoPointQuery [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testBBoxQuery -Dtests.seed=FC153607634373BA -Dtests.nightly=true -Dtests.slow=true -Dtests.linedocsfile=/lucenedata/hudson.enwiki.random.lines.txt.fixed -Dtests.locale=es -Dtests.timezone=Asia/Jerusalem -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 0.09s | TestGeoPointQuery.testBBoxQuery <<< [junit4] > Throwable #1: java.lang.AssertionError: GeoBoundingBoxQuery failed expected:<1> but was:<2> [junit4] > at __randomizedtesting.SeedInfo.seed([FC153607634373BA:5CE7EC6FB0DC9159]:0) [junit4] > at org.apache.lucene.search.TestGeoPointQuery.testBBoxQuery(TestGeoPointQuery.java:113) [junit4] > at java.lang.Thread.run(Thread.java:745) [junit4] 2> NOTE: leaving temporary files on disk at: /l/nick/lucene/build/sandbox/test/J0/temp/lucene.search.TestGeoPointQuery FC153607634373BA-001 [junit4] 2> NOTE: test params are: codec=Asserting(Lucene50): {geoField=PostingsFormat(name=MockRandom)}, docValues:{}, sim=DefaultSimilarity, locale=es, timezone=Asia/Jerusalem [junit4] 2> NOTE: Linux 3.13.0-46-generic amd64/Oracle Corporation 1.8.0_40 (64-bit)/cpus=8,threads=1,free=385990872,total=504889344 [junit4] 2> NOTE: All tests run in this JVM: [TestGeoPointQuery] [junit4] Completed [1/1] in 0.48s, 1 test, 1 failure <<< FAILURES!
            nknize Nick Knize added a comment -

            This patch is ready to go. In fact, this performance improvement should supersede the existing patch in LUCENE-6450.

            nknize Nick Knize added a comment - This patch is ready to go. In fact, this performance improvement should supersede the existing patch in LUCENE-6450 .
            nknize Nick Knize added a comment -

            Updated patch to fix false negatives. This now improves performance of LUCENE-6450 to 0.02 sec / query by using the postings list instead of visiting every term.

            nknize Nick Knize added a comment - Updated patch to fix false negatives. This now improves performance of LUCENE-6450 to 0.02 sec / query by using the postings list instead of visiting every term.
            nknize Nick Knize added a comment -

            Minor patch update that adds geodesic to geodetic projection / reprojection methods to GeoUtils.

            nknize Nick Knize added a comment - Minor patch update that adds geodesic to geodetic projection / reprojection methods to GeoUtils.

            The test had the lat and lon ordering incorrect for both GeoPointFieldType and the GeoPointInBBoxQuery.

            Oh, woops Thanks for fixing1

            mikemccand Michael McCandless added a comment - The test had the lat and lon ordering incorrect for both GeoPointFieldType and the GeoPointInBBoxQuery. Oh, woops Thanks for fixing1
            nknize Nick Knize added a comment -

            The test had the lat and lon ordering incorrect for both GeoPointFieldType and the GeoPointInBBoxQuery. I've attached a new patch with the correction.

            testRandomTiny passes but there is one failure in testRandom with the following:

            ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandom -Dtests.seed=F1E43F53709BFF82 -Dtests.verbose=true
            
               [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testRandom -Dtests.seed=F1E43F53709BFF82 -Dtests.slow=true -Dtests.locale=en_US -Dtests.timezone=Africa/Lome -Dtests.asserts=true -Dtests.file.encoding=UTF-8
               [junit4] FAILURE 1.54s | TestGeoPointQuery.testRandom <<<
               [junit4]    > Throwable #1: java.lang.AssertionError: id=632 docID=613 lat=46.19240875459866 lon=143.92476891121902 expected true but got: false deleted?=false
               [junit4]    > 	at __randomizedtesting.SeedInfo.seed([F1E43F53709BFF82:83A81A5CC1FB49F1]:0)
               [junit4]    > 	at org.apache.lucene.search.TestGeoPointQuery.verify(TestGeoPointQuery.java:302)
               [junit4]    > 	at org.apache.lucene.search.TestGeoPointQuery.doTestRandom(TestGeoPointQuery.java:204)
               [junit4]    > 	at org.apache.lucene.search.TestGeoPointQuery.testRandom(TestGeoPointQuery.java:130)
               [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
            

            This should be enough to debug the issue. I expect to have a new patch sometime tomorrow or before weeks end.

            nknize Nick Knize added a comment - The test had the lat and lon ordering incorrect for both GeoPointFieldType and the GeoPointInBBoxQuery. I've attached a new patch with the correction. testRandomTiny passes but there is one failure in testRandom with the following: ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandom -Dtests.seed=F1E43F53709BFF82 -Dtests.verbose=true [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandom -Dtests.seed=F1E43F53709BFF82 -Dtests.slow=true -Dtests.locale=en_US -Dtests.timezone=Africa/Lome -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 1.54s | TestGeoPointQuery.testRandom <<< [junit4] > Throwable #1: java.lang.AssertionError: id=632 docID=613 lat=46.19240875459866 lon=143.92476891121902 expected true but got: false deleted?=false [junit4] > at __randomizedtesting.SeedInfo.seed([F1E43F53709BFF82:83A81A5CC1FB49F1]:0) [junit4] > at org.apache.lucene.search.TestGeoPointQuery.verify(TestGeoPointQuery.java:302) [junit4] > at org.apache.lucene.search.TestGeoPointQuery.doTestRandom(TestGeoPointQuery.java:204) [junit4] > at org.apache.lucene.search.TestGeoPointQuery.testRandom(TestGeoPointQuery.java:130) [junit4] > at java.lang.Thread.run(Thread.java:745) This should be enough to debug the issue. I expect to have a new patch sometime tomorrow or before weeks end.

            New patch, starting from nknize's and then folding in the evilish random test I added for LUCENE-6477 ... maybe this can help debug why there are false negatives?

            E.g. with this patch when I run:

            ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandomTiny -Dtests.seed=F1E43F53709BFF82 -Dtests.verbose=true
            

            It fails with this:

               [junit4]   2> NOTE: reproduce with: ant test  -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=F1E43F53709BFF82 -Dtests.locale=en_US -Dtests.timezone=Africa/Lome -Dtests.asserts=true -Dtests.file.encoding=UTF-8
               [junit4] FAILURE 2.91s | TestGeoPointQuery.testRandomTiny <<<
               [junit4]    > Throwable #1: java.lang.AssertionError: id=0 docID=0 lat=-27.180279388889545 lon=-167.14191331870592 expected true but got: false deleted?=false
               [junit4]    > 	at __randomizedtesting.SeedInfo.seed([F1E43F53709BFF82:B8A3E1152EBAC72E]:0)
               [junit4]    > 	at org.apache.lucene.search.TestGeoPointQuery.verify(TestGeoPointQuery.java:301)
               [junit4]    > 	at org.apache.lucene.search.TestGeoPointQuery.doTestRandom(TestGeoPointQuery.java:203)
               [junit4]    > 	at org.apache.lucene.search.TestGeoPointQuery.testRandomTiny(TestGeoPointQuery.java:125)
               [junit4]    > 	at java.lang.Thread.run(Thread.java:745)
            

            The test case should be easy-ish to debug: it only indexes at most a few 10s of points...

            mikemccand Michael McCandless added a comment - New patch, starting from nknize 's and then folding in the evilish random test I added for LUCENE-6477 ... maybe this can help debug why there are false negatives? E.g. with this patch when I run: ant test -Dtestcase=TestGeoPointQuery -Dtestmethod=testRandomTiny -Dtests.seed=F1E43F53709BFF82 -Dtests.verbose=true It fails with this: [junit4] 2> NOTE: reproduce with: ant test -Dtestcase=TestGeoPointQuery -Dtests.method=testRandomTiny -Dtests.seed=F1E43F53709BFF82 -Dtests.locale=en_US -Dtests.timezone=Africa/Lome -Dtests.asserts=true -Dtests.file.encoding=UTF-8 [junit4] FAILURE 2.91s | TestGeoPointQuery.testRandomTiny <<< [junit4] > Throwable #1: java.lang.AssertionError: id=0 docID=0 lat=-27.180279388889545 lon=-167.14191331870592 expected true but got: false deleted?=false [junit4] > at __randomizedtesting.SeedInfo.seed([F1E43F53709BFF82:B8A3E1152EBAC72E]:0) [junit4] > at org.apache.lucene.search.TestGeoPointQuery.verify(TestGeoPointQuery.java:301) [junit4] > at org.apache.lucene.search.TestGeoPointQuery.doTestRandom(TestGeoPointQuery.java:203) [junit4] > at org.apache.lucene.search.TestGeoPointQuery.testRandomTiny(TestGeoPointQuery.java:125) [junit4] > at java.lang.Thread.run(Thread.java:745) The test case should be easy-ish to debug: it only indexes at most a few 10s of points...
            nknize Nick Knize added a comment -

            First cut WIP patch. LuceneUtil benchmark shows false negatives, though, so this is definitely not ready. So far I've been unable to reproduce the false negatives...I put it here for iterating improvements.

            GeoPointField

            Index Time: 640.24 sec
            Index Size: 4.4G
            Mean Query Time: 0.02 sec

            nknize Nick Knize added a comment - First cut WIP patch. LuceneUtil benchmark shows false negatives, though, so this is definitely not ready. So far I've been unable to reproduce the false negatives...I put it here for iterating improvements. GeoPointField Index Time: 640.24 sec Index Size: 4.4G Mean Query Time: 0.02 sec

            People

              Unassigned Unassigned
              nknize Nick Knize
              Votes:
              0 Vote for this issue
              Watchers:
              5 Start watching this issue

              Dates

                Created:
                Updated:
                Resolved: