Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7153

give GeoPoint and LatLonPoint full polygon support

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.1, master (7.0)
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      These two geo impls have a very limited polygon support that does not support inner rings (holes) or multiple outer rings efficiently.

      Basically if you want to do this, you are left building crazy logic with booleanquery which will send memory into the gigabytes for a single query, needlessly.

      For example Russia polygon from geonames is 250KB of geojson and over a thousand outer rings.

      We should instead support this stuff with the queries themselves, especially it will allow us to implement things more efficiently in the future.

      I think instead of newPolygonQuery(double[], double[]) it should look like newPolygonQuery(Polygon...). A polygon can be a single outer ring (shape) with 0 or more inner rings (holes). No nesting, you just use multiply polygons if you e.g. have an island.

      See http://esri.github.io/geometry-api-java/doc/Polygon.html for visuals and examples. I indented their GeoJSON example:

      {
        "type":"MultiPolygon",
        "coordinates": [
           // first polygon (order does not matter could have been last instead)
           [
             // clockwise => outer ring
             [[0.0,0.0],[-0.5,0.5],[0.0,1.0],[0.5,1.0],[1.0,0.5],[0.5,0.0],[0.0,0.0]],
             // hole
             [[0.5,0.2],[0.6,0.5],[0.2,0.9],[-0.2,0.5],[0.1,0.2],[0.2,0.3],[0.5,0.2]]
           ],
           // second polygon (order does not matter, could have been first instead)
           [ 
             // island
             [[0.1,0.7],[0.3,0.7],[0.3,0.4],[0.1,0.4],[0.1,0.7]]
           ]
        ],
      }
      
      1. LUCENE-7153.patch
        78 kB
        Robert Muir
      2. LUCENE-7153.patch
        73 kB
        Robert Muir
      3. LUCENE-7153.patch
        62 kB
        Robert Muir

        Issue Links

          Activity

          Hide
          rcmuir Robert Muir added a comment -

          Here is a patch. I hate adding a public class, but I think its needed here. Its completely immutable like String and has some helper methods for multipolygon logic. I added "hole" support to the existing traversal operations and some simple tests.

          Also, GeoPoint moves from approximate to precise implementation. Today we have two different ones, for some reason it uses approximate. So this solves LUCENE-7145 too. Since GeoPoint is not in the sandbox i deprecated its current two ctors and added two new ones.

          I benchmarked performance of single polygons versus master with Mike's london benchmark (these are circle-like approximations of varying number of vertices n):

          LatLonPoint
          n=5   25.7 QPS -> 26.2 QPS
          n=50  17.8 QPS -> 18.3 QPS
          n=500 7.5 QPS  -> 8.0 QPS
          
          GeoPointField
          n=5   19.3 QPS -> 19.0 QPS
          n=50  11.3 QPS -> 12.0 QPS
          n=500 4.0 QPS  -> 3.8 QPS
          

          For LatLonPoint the patch is an improvement as the polygon complexity grows, thats because bounding box was not always used before.

          For GeoPoint, its faster in some and slower in other cases. But still very close to the same performance. And LatLonPoint is faster and the patch makes it even more so, so I think things are ok.

          Show
          rcmuir Robert Muir added a comment - Here is a patch. I hate adding a public class, but I think its needed here. Its completely immutable like String and has some helper methods for multipolygon logic. I added "hole" support to the existing traversal operations and some simple tests. Also, GeoPoint moves from approximate to precise implementation. Today we have two different ones, for some reason it uses approximate. So this solves LUCENE-7145 too. Since GeoPoint is not in the sandbox i deprecated its current two ctors and added two new ones. I benchmarked performance of single polygons versus master with Mike's london benchmark (these are circle-like approximations of varying number of vertices n ): LatLonPoint n=5 25.7 QPS -> 26.2 QPS n=50 17.8 QPS -> 18.3 QPS n=500 7.5 QPS -> 8.0 QPS GeoPointField n=5 19.3 QPS -> 19.0 QPS n=50 11.3 QPS -> 12.0 QPS n=500 4.0 QPS -> 3.8 QPS For LatLonPoint the patch is an improvement as the polygon complexity grows, thats because bounding box was not always used before. For GeoPoint, its faster in some and slower in other cases. But still very close to the same performance. And LatLonPoint is faster and the patch makes it even more so, so I think things are ok.
          Hide
          rcmuir Robert Muir added a comment -

          Also cleans up the old logic (sorry, i forgot this in the first patch) and ports over the pacman test.

          Show
          rcmuir Robert Muir added a comment - Also cleans up the old logic (sorry, i forgot this in the first patch) and ports over the pacman test.
          Hide
          mikemccand Michael McCandless added a comment -

          +1, patch looks very clean!

          And without this approach, it's essentially hopeless to run with "real world" polygons: I'm trying to add them to the London, UK benchmark now, and hitting OOME with 10 GB heap! I will test with this patch.

          Show
          mikemccand Michael McCandless added a comment - +1, patch looks very clean! And without this approach, it's essentially hopeless to run with "real world" polygons: I'm trying to add them to the London, UK benchmark now, and hitting OOME with 10 GB heap! I will test with this patch.
          Hide
          rcmuir Robert Muir added a comment -

          See discussion with karl on LUCENE-7154, I think there is a bug with holes. We just need simple unit tests for the possibilities in TestPolygon, I will add these until it fails and upload a new patch.

          Show
          rcmuir Robert Muir added a comment - See discussion with karl on LUCENE-7154 , I think there is a bug with holes. We just need simple unit tests for the possibilities in TestPolygon, I will add these until it fails and upload a new patch.
          Hide
          rcmuir Robert Muir added a comment -

          Also because of how crosses() depends on bounding box checks we need to move this logic out of e.g. LatLonPoint's KD intersector into crosses:

                                       if (cellMinLat <= minLat && cellMaxLat >= maxLat && cellMinLon <= minLon && cellMaxLon >= maxLon) {
                                         // Cell fully encloses the query
                                         return Relation.CELL_CROSSES_QUERY;
          
          Show
          rcmuir Robert Muir added a comment - Also because of how crosses() depends on bounding box checks we need to move this logic out of e.g. LatLonPoint's KD intersector into crosses: if (cellMinLat <= minLat && cellMaxLat >= maxLat && cellMinLon <= minLon && cellMaxLon >= maxLon) { // Cell fully encloses the query return Relation.CELL_CROSSES_QUERY;
          Hide
          rcmuir Robert Muir added a comment -

          I added unit tests for outer+hole+island and test the possibilities for crosses/contains. I fixed crosses() to itself have the "fully enclosed" check (trap waiting to happen), and for crosses/contains to properly work with holes...

          LatLonPoint encodes the bounding box of all outer rings into one big box (with binary encoding) for fast filtering. it doesn't matter much for a single outer ring (simple polygon) case, but this can be easy win for complex ones. It also makes the two-phase iterator have a slightly better approximation for edge cases (like distance does).

          I think this is ok as a start? We can improve javadocs and tests and performance on later issues.

          Show
          rcmuir Robert Muir added a comment - I added unit tests for outer+hole+island and test the possibilities for crosses/contains. I fixed crosses() to itself have the "fully enclosed" check (trap waiting to happen), and for crosses/contains to properly work with holes... LatLonPoint encodes the bounding box of all outer rings into one big box (with binary encoding) for fast filtering. it doesn't matter much for a single outer ring (simple polygon) case, but this can be easy win for complex ones. It also makes the two-phase iterator have a slightly better approximation for edge cases (like distance does). I think this is ok as a start? We can improve javadocs and tests and performance on later issues.
          Hide
          kwright@metacarta.com Karl Wright added a comment -

          Robert Muir: FWIW, since the API for polygon queries now uses Polygon instances, and those are local to spatial, I can't make a compatible API for Geo3DPoint without some repackaging of these spatial util classes.

          Show
          kwright@metacarta.com Karl Wright added a comment - Robert Muir : FWIW, since the API for polygon queries now uses Polygon instances, and those are local to spatial, I can't make a compatible API for Geo3DPoint without some repackaging of these spatial util classes.
          Hide
          rcmuir Robert Muir added a comment -

          I know, i saw the issue. We have to fix this situation somehow for other reasons too (like even simple lat/lon verification checks). Hopefully we can get this figured out soon. Otherwise it makes it too hard for us to benchmark,test,compare geo3d.

          Show
          rcmuir Robert Muir added a comment - I know, i saw the issue. We have to fix this situation somehow for other reasons too (like even simple lat/lon verification checks). Hopefully we can get this figured out soon. Otherwise it makes it too hard for us to benchmark,test,compare geo3d.
          Hide
          mikemccand Michael McCandless added a comment -

          FWIW, since the API for polygon queries now uses Polygon instances, and those are local to spatial, I can't make a compatible API for Geo3DPoint without some repackaging of these spatial util classes.

          I agree this is horrible

          Until we can sort this out, more cleanly, I think you (Karl Wright) should just poach/fork Polygon.java into geo3d with a TODO to share again (just like what I did with checkLatitude/Longitude)? It's by far the lesser evil imo: we (devs) should bear this burden, not our users. We are still discussing this in LUCENE-7152...

          I think common-to-all-spatial-implementations like Polygon should be promoted to lucene core, along with basics like earth's various non-controversial radii, methods to validate lat/lon/radians, etc.: these are fundamental non-in-dispute constants and methods and classes.

          I like the purity of the separate competing spatial implementations and I see adding a dep from spatial3d to spatial as dangerous: it sends the message that our 3D math somehow depends on 2D projected math. I think we should avoid that, and all of this competition between wildly different geo implementations is very healthy.

          Separately I think at some point soon we should promote the "backed by points" queries in sandbox (LatLonPoint, InetAddressPoint, BigIntegerPoint) to core: they really are no longer very sandy, they exposed juicy early bugs in points, they have no index format requirements since they build on points which has strong back compat guarantees.

          Show
          mikemccand Michael McCandless added a comment - FWIW, since the API for polygon queries now uses Polygon instances, and those are local to spatial, I can't make a compatible API for Geo3DPoint without some repackaging of these spatial util classes. I agree this is horrible Until we can sort this out, more cleanly, I think you ( Karl Wright ) should just poach/fork Polygon.java into geo3d with a TODO to share again (just like what I did with checkLatitude/Longitude )? It's by far the lesser evil imo: we (devs) should bear this burden, not our users. We are still discussing this in LUCENE-7152 ... I think common-to-all-spatial-implementations like Polygon should be promoted to lucene core, along with basics like earth's various non-controversial radii, methods to validate lat/lon/radians, etc.: these are fundamental non-in-dispute constants and methods and classes. I like the purity of the separate competing spatial implementations and I see adding a dep from spatial3d to spatial as dangerous: it sends the message that our 3D math somehow depends on 2D projected math. I think we should avoid that, and all of this competition between wildly different geo implementations is very healthy. Separately I think at some point soon we should promote the "backed by points" queries in sandbox ( LatLonPoint, InetAddressPoint, BigIntegerPoint ) to core: they really are no longer very sandy, they exposed juicy early bugs in points, they have no index format requirements since they build on points which has strong back compat guarantees.
          Hide
          dsmiley David Smiley added a comment -

          There has been some great leaps and bounds in spatial recently and that's awesome (thanks to Karl, Nick, and Mike) but I don't understand the desire for lucene core to have spatial stuff. Sigh So what if spatial3d might depend on spatial; it'd kind of obvious to me it would (or the other way around) given that there is now a general Polygon abstraction that the Geo3D impl wants to use. Heck, maybe spatial should be renamed spatial-core though I think spatial is just fine. Just because there is some Euclidean/Cartesian (2D) math in spatial shouldn't imply that anything that depends on this module uses it. The highlighter module has 3 highlighters (and they work quite differently); the suggester module has a bunch of those, etc. The join module... you get the idea. I think that's the main take-away from my perspective. Likewise, this could be said for the spatial3d module to an outsider. Should someone want to use the cool/novel surface-of-sphere (or ellipsoid) math that is so hard to come by out there, they need not use the 2% of the code that uses both the 3D math & Lucene.

          I think it's kinda crazy to fork Polygon and whatever else out of the spatial module into some other spatial module; surely that is the least favorable option above any sort of dependency. spatial is our core/main spatial module. Just add a dependency on it. Would it be easier for you to consider the idea that spatial might depend on spatial3d instead? Then the very small part of spatial3d could go to spatial, and spatial3d would be pure math and dependency-less – I would like that quite a bit. What do you think? The only thing I wouldn't like so much is what it says about the move of spatial-extras to where it is... it says spatial can have a dependency now too, but maybe spatial3d is an exception/okay. I'm good with it.

          Show
          dsmiley David Smiley added a comment - There has been some great leaps and bounds in spatial recently and that's awesome (thanks to Karl, Nick, and Mike) but I don't understand the desire for lucene core to have spatial stuff. Sigh So what if spatial3d might depend on spatial ; it'd kind of obvious to me it would (or the other way around) given that there is now a general Polygon abstraction that the Geo3D impl wants to use. Heck, maybe spatial should be renamed spatial-core though I think spatial is just fine. Just because there is some Euclidean/Cartesian (2D) math in spatial shouldn't imply that anything that depends on this module uses it. The highlighter module has 3 highlighters (and they work quite differently); the suggester module has a bunch of those, etc. The join module... you get the idea. I think that's the main take-away from my perspective. Likewise, this could be said for the spatial3d module to an outsider. Should someone want to use the cool/novel surface-of-sphere (or ellipsoid) math that is so hard to come by out there, they need not use the 2% of the code that uses both the 3D math & Lucene. I think it's kinda crazy to fork Polygon and whatever else out of the spatial module into some other spatial module; surely that is the least favorable option above any sort of dependency. spatial is our core/main spatial module. Just add a dependency on it. Would it be easier for you to consider the idea that spatial might depend on spatial3d instead? Then the very small part of spatial3d could go to spatial , and spatial3d would be pure math and dependency-less – I would like that quite a bit. What do you think? The only thing I wouldn't like so much is what it says about the move of spatial-extras to where it is... it says spatial can have a dependency now too, but maybe spatial3d is an exception/okay. I'm good with it.
          Hide
          rcmuir Robert Muir added a comment -

          Can we migrate this discussion to LUCENE-7152? I do think it is important, but I don't think it needs to hold up this change.

          To me this change is technical, we use 1 bitset of size O(maxdoc) instead of N bitsets (where N = number of polygon rings). It makes it feasible to query on complicated polygons without hitting OutOfMemoryError.

          Show
          rcmuir Robert Muir added a comment - Can we migrate this discussion to LUCENE-7152 ? I do think it is important, but I don't think it needs to hold up this change. To me this change is technical, we use 1 bitset of size O(maxdoc) instead of N bitsets (where N = number of polygon rings). It makes it feasible to query on complicated polygons without hitting OutOfMemoryError.
          Hide
          mikemccand Michael McCandless added a comment -

          +1 to commit the last patch: this is an immense step forward. Thanks Robert Muir!

          Show
          mikemccand Michael McCandless added a comment - +1 to commit the last patch: this is an immense step forward. Thanks Robert Muir !
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 81c83b443182cb5869079924637a4d43e9e7917e in lucene-solr's branch refs/heads/master from Robert Muir
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=81c83b4 ]

          LUCENE-7153: give GeoPointField and LatLonPoint full polygon support

          Show
          jira-bot ASF subversion and git services added a comment - Commit 81c83b443182cb5869079924637a4d43e9e7917e in lucene-solr's branch refs/heads/master from Robert Muir [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=81c83b4 ] LUCENE-7153 : give GeoPointField and LatLonPoint full polygon support
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 2c0a8ed418934732fafa4d2bca0c757cca1a42b0 in lucene-solr's branch refs/heads/branch_6x from Robert Muir
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2c0a8ed ]

          LUCENE-7153: give GeoPointField and LatLonPoint full polygon support

          Show
          jira-bot ASF subversion and git services added a comment - Commit 2c0a8ed418934732fafa4d2bca0c757cca1a42b0 in lucene-solr's branch refs/heads/branch_6x from Robert Muir [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=2c0a8ed ] LUCENE-7153 : give GeoPointField and LatLonPoint full polygon support
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit cc75be53f9b3b86ec59cb93896c4fd5a9a5926b2 in lucene-solr's branch refs/heads/master from Robert Muir
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cc75be5 ]

          LUCENE-7153: remove extra PIP copy, this is in Polygon.java now

          Show
          jira-bot ASF subversion and git services added a comment - Commit cc75be53f9b3b86ec59cb93896c4fd5a9a5926b2 in lucene-solr's branch refs/heads/master from Robert Muir [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=cc75be5 ] LUCENE-7153 : remove extra PIP copy, this is in Polygon.java now
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 50909527d038b7cff1f8dcb31d5343667956b790 in lucene-solr's branch refs/heads/branch_6x from Robert Muir
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5090952 ]

          LUCENE-7153: remove extra PIP copy, this is in Polygon.java now

          Show
          jira-bot ASF subversion and git services added a comment - Commit 50909527d038b7cff1f8dcb31d5343667956b790 in lucene-solr's branch refs/heads/branch_6x from Robert Muir [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=5090952 ] LUCENE-7153 : remove extra PIP copy, this is in Polygon.java now
          Hide
          hossman Hoss Man added a comment -

          Manually correcting fixVersion per Step #S5 of LUCENE-7271

          Show
          hossman Hoss Man added a comment - Manually correcting fixVersion per Step #S5 of LUCENE-7271

            People

            • Assignee:
              Unassigned
              Reporter:
              rcmuir Robert Muir
            • Votes:
              0 Vote for this issue
              Watchers:
              4 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development