Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7159

improve spatial point/rect vs. polygon performance

    Details

    • Type: Bug
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.1, master (7.0)
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      Now that we can query on complex polygons without going OOM (LUCENE-7153), we should do something to address the current 🐢 performance.

      Currently, we use a basic crossings test (O(n)) for boundary cases. We defer these expensive per-doc checks on boundary cases to a two phase iterator (LUCENE-7019, LUCENE-7109), so that it can be avoided if e.g. excluded by filters, conjunctions, deleted doc, and so on. This is currently important for performance, but basically its shoving the problem under the rug and hoping it goes away. At least for point in poly, there are a number of faster techniques described here: http://erich.realtimerendering.com/ptinpoly/

      Additionally I am not sure how costly our "tree traversal" (rectangle intersection algorithms). Maybe its nothing to be worried about, but likely it too gets bad if the thing gets complex enough. These don't need to be perfect but need to behave like java's Shape#contains (can conservatively return false), and Shape#intersects (can conservatively return true). Of course, if they are too inaccurate, then things can get slower.

      In cases of precomputed structures we should also consider memory usage: e.g. we shouldn't make a horrible tradeoff there.

      1. LUCENE-7159.patch
        31 kB
        Robert Muir
      2. LUCENE-7159.patch
        26 kB
        Robert Muir

        Issue Links

          Activity

          Hide
          rcmuir Robert Muir added a comment -

          Note: for the second "tree traversal" part we might already have better algorithms in git: when I added the Polygon class in LUCENE-7153 it used the exact algorithms (for some reason GeoPoint was using different approximate ones with less guarantees).

          But these guarantees might be compatible yet still with those approximate methods: the thing that is missing is edge case testing for these operations. If they are faster and still safe (or can be easily made safe), maybe we should restore them and cut the polygon class over to them!

          These tests don't need to be complicated, e.g. for multipolygon/hole support I added this one (https://github.com/apache/lucene-solr/blob/81c83b443182cb5869079924637a4d43e9e7917e/lucene/spatial/src/test/org/apache/lucene/spatial/util/TestPolygon.java#L63-L91). We can add randomized tests too: but we don't need to index lucene documents or perform searches to do this and it will be more efficient and easier to debug/improve things.

          Show
          rcmuir Robert Muir added a comment - Note: for the second "tree traversal" part we might already have better algorithms in git: when I added the Polygon class in LUCENE-7153 it used the exact algorithms (for some reason GeoPoint was using different approximate ones with less guarantees). But these guarantees might be compatible yet still with those approximate methods: the thing that is missing is edge case testing for these operations. If they are faster and still safe (or can be easily made safe), maybe we should restore them and cut the polygon class over to them! These tests don't need to be complicated, e.g. for multipolygon/hole support I added this one ( https://github.com/apache/lucene-solr/blob/81c83b443182cb5869079924637a4d43e9e7917e/lucene/spatial/src/test/org/apache/lucene/spatial/util/TestPolygon.java#L63-L91 ). We can add randomized tests too: but we don't need to index lucene documents or perform searches to do this and it will be more efficient and easier to debug/improve things.
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit ef6a0d001361068a37dc2256cd33c11d771c653a in lucene-solr's branch refs/heads/master from Robert Muir
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ef6a0d0 ]

          LUCENE-7159: improve testing of polygon tree methods

          Show
          jira-bot ASF subversion and git services added a comment - Commit ef6a0d001361068a37dc2256cd33c11d771c653a in lucene-solr's branch refs/heads/master from Robert Muir [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=ef6a0d0 ] LUCENE-7159 : improve testing of polygon tree methods
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit bb2e01c3d913894062fff3f18bd5d0256f884185 in lucene-solr's branch refs/heads/branch_6x from Robert Muir
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bb2e01c ]

          LUCENE-7159: improve testing of polygon tree methods

          Show
          jira-bot ASF subversion and git services added a comment - Commit bb2e01c3d913894062fff3f18bd5d0256f884185 in lucene-solr's branch refs/heads/branch_6x from Robert Muir [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bb2e01c ] LUCENE-7159 : improve testing of polygon tree methods
          Hide
          rcmuir Robert Muir added a comment -

          Here is a patch with an integer grid for LatLonPoint's polygon query, and some speedups for relation logic.

          The relation logic is still the bottleneck for humongous polygons... so this does not solve all perf issues.

          Show
          rcmuir Robert Muir added a comment - Here is a patch with an integer grid for LatLonPoint's polygon query, and some speedups for relation logic. The relation logic is still the bottleneck for humongous polygons... so this does not solve all perf issues.
          Hide
          mikemccand Michael McCandless added a comment -

          +1, cool!

          Show
          mikemccand Michael McCandless added a comment - +1, cool!
          Hide
          rcmuir Robert Muir added a comment -

          I added javadocs, asserts, a direct test, cleaned up the internal API, and re-ran benchmarks with mike's openstreetmap benchmark:

          Point in polygon only becomes a bottleneck for complex polygons, so it really only helps there, but doesn't hurt simple ones.

          More than half the gain here is unrelated to the grid, and only due to very minor optimizations in Polygon.relates(), which actually is the bigger bottleneck.

          I think we can do this as a start, and then as a followup implement relates() too? We should be able to speed those up in a similar way, most calls would then be rectangle <-> rectangle operations in integer space.

          And of course it would still be nice to speed up the relation operations in Polygon.java (e.g. compute if convex in ctor or whatever), but that stuff is complex.

          n=5 25.9 QPS -> 27.0 QPS
          n=50 18.2 QPS -> 18.1 QPS
          n=500 8.0 QPS -> 9.5 QPS
          n=1000 5.0 QPS -> 6.2 QPS
          
          Show
          rcmuir Robert Muir added a comment - I added javadocs, asserts, a direct test, cleaned up the internal API, and re-ran benchmarks with mike's openstreetmap benchmark: Point in polygon only becomes a bottleneck for complex polygons, so it really only helps there, but doesn't hurt simple ones. More than half the gain here is unrelated to the grid, and only due to very minor optimizations in Polygon.relates(), which actually is the bigger bottleneck. I think we can do this as a start, and then as a followup implement relates() too? We should be able to speed those up in a similar way, most calls would then be rectangle <-> rectangle operations in integer space. And of course it would still be nice to speed up the relation operations in Polygon.java (e.g. compute if convex in ctor or whatever), but that stuff is complex. n=5 25.9 QPS -> 27.0 QPS n=50 18.2 QPS -> 18.1 QPS n=500 8.0 QPS -> 9.5 QPS n=1000 5.0 QPS -> 6.2 QPS
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit c1a3e1b8d04ffc94e502b086e0544c0e0494d5a8 in lucene-solr's branch refs/heads/master from Robert Muir
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c1a3e1b ]

          LUCENE-7159: Speed up LatLonPoint point-in-polygon performance

          Show
          jira-bot ASF subversion and git services added a comment - Commit c1a3e1b8d04ffc94e502b086e0544c0e0494d5a8 in lucene-solr's branch refs/heads/master from Robert Muir [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=c1a3e1b ] LUCENE-7159 : Speed up LatLonPoint point-in-polygon performance
          Hide
          jira-bot ASF subversion and git services added a comment -

          Commit 1113443fc683d80720efd42706cb932b00f2cba8 in lucene-solr's branch refs/heads/branch_6x from Robert Muir
          [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1113443 ]

          LUCENE-7159: Speed up LatLonPoint point-in-polygon performance

          Show
          jira-bot ASF subversion and git services added a comment - Commit 1113443fc683d80720efd42706cb932b00f2cba8 in lucene-solr's branch refs/heads/branch_6x from Robert Muir [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=1113443 ] LUCENE-7159 : Speed up LatLonPoint point-in-polygon performance
          Hide
          mikemccand Michael McCandless added a comment -

          Can this be closed now @rcmuir?

          Show
          mikemccand Michael McCandless added a comment - Can this be closed now @rcmuir?
          Hide
          rcmuir Robert Muir added a comment -

          we are hobbling...

          Show
          rcmuir Robert Muir added a comment - we are hobbling...
          Hide
          hossman Hoss Man added a comment -

          Manually correcting fixVersion per Step #S5 of LUCENE-7271

          Show
          hossman Hoss Man added a comment - Manually correcting fixVersion per Step #S5 of LUCENE-7271

            People

            • Assignee:
              Unassigned
              Reporter:
              rcmuir Robert Muir
            • Votes:
              0 Vote for this issue
              Watchers:
              2 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development