Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7147

Improve disjoint check for geo distance query traversal

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.1, 7.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      When doing geo distance queries, it is important to avoid traversing subtrees which do not contain any relevant points. We currently have checks which compare the bbox of the query to the bounds of the subtree. However, it is possible for a subtree to overlap the bbox, but still not intersect the query. This issue is to improve that check to avoid unnecessary traversals.

      1. example-crosses-axis-not-center.html
        23 kB
        Ryan Ernst
      2. example-intersects-bbox-not-circle.html
        23 kB
        Ryan Ernst
      3. LUCENE-7147.patch
        27 kB
        Ryan Ernst

        Activity

        Hide
        hossman Hoss Man added a comment -

        Manually correcting fixVersion per Step #S5 of LUCENE-7271

        Show
        hossman Hoss Man added a comment - Manually correcting fixVersion per Step #S5 of LUCENE-7271
        Hide
        dweiss Dawid Weiss added a comment -

        This is awesome, thanks Ryan! Btw. the crosshair pointer in Poland nearly points at my house, lol.

        Show
        dweiss Dawid Weiss added a comment - This is awesome, thanks Ryan! Btw. the crosshair pointer in Poland nearly points at my house, lol.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 0cf26bf368f9e08346df80f025ad021d5b3dbfe1 in lucene-solr's branch refs/heads/branch_6x from Ryan Ernst
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0cf26bf ]

        LUCENE-7147: Improve disjoint check for geo distance query traversal

        Show
        jira-bot ASF subversion and git services added a comment - Commit 0cf26bf368f9e08346df80f025ad021d5b3dbfe1 in lucene-solr's branch refs/heads/branch_6x from Ryan Ernst [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0cf26bf ] LUCENE-7147 : Improve disjoint check for geo distance query traversal
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 045659533cdbcc7c57a38cd2aa0278312011da43 in lucene-solr's branch refs/heads/master from Ryan Ernst
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0456595 ]

        LUCENE-7147: Improve disjoint check for geo distance query traversal

        Show
        jira-bot ASF subversion and git services added a comment - Commit 045659533cdbcc7c57a38cd2aa0278312011da43 in lucene-solr's branch refs/heads/master from Ryan Ernst [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0456595 ] LUCENE-7147 : Improve disjoint check for geo distance query traversal
        Show
        mikemccand Michael McCandless added a comment - Those are great examples Ryan Ernst , thanks! I put them here: http://home.apache.org/~mikemccand/example-crosses-axis-not-center.html http://home.apache.org/~mikemccand/example-intersects-bbox-not-circle.html
        Hide
        rjernst Ryan Ernst added a comment - - edited

        I've attached 2 examples. They both use the same geo distance query. Imagine someone from London says "I want to find places to travel within 1170km". So they do a distance search. The first example shows how this would require us to compute the distances from london to all the points indexes in the box covering Italy. The second example shows how if we were to use the naive approach of using the latitude of the circle center, we would incorrectly exclude a chunk of Poland thinking it did not intersect the circle because all corners are outside the circle, and it lies completely above the circle center latitude, but in fact it crosses the axis of the bbox.

        Show
        rjernst Ryan Ernst added a comment - - edited I've attached 2 examples. They both use the same geo distance query. Imagine someone from London says "I want to find places to travel within 1170km". So they do a distance search. The first example shows how this would require us to compute the distances from london to all the points indexes in the box covering Italy. The second example shows how if we were to use the naive approach of using the latitude of the circle center, we would incorrectly exclude a chunk of Poland thinking it did not intersect the circle because all corners are outside the circle, and it lies completely above the circle center latitude, but in fact it crosses the axis of the bbox.
        Hide
        mikemccand Michael McCandless added a comment -

        For comparison, do you remember what is the equivalent geo3d timing?

        I'll re-test geo3d!

        Show
        mikemccand Michael McCandless added a comment - For comparison, do you remember what is the equivalent geo3d timing? I'll re-test geo3d!
        Hide
        rcmuir Robert Muir added a comment -

        Think of a circle and turn it into "crosshairs" like looking through the scope of a gun

        If the rectangle does not cross one of these "crosshair" lines (axes) then we can use its corners to exclude it. That is because that is the "fattest" part of the circle. If the rectangle crosses one of those, corners are not the closest point But on the earth, things look different, so we have to do a little more work to compute the latitude axis (horizontal crosshair line) to accomodate that: that is ryan's axisLat method.

        Show
        rcmuir Robert Muir added a comment - Think of a circle and turn it into "crosshairs" like looking through the scope of a gun If the rectangle does not cross one of these "crosshair" lines (axes) then we can use its corners to exclude it. That is because that is the "fattest" part of the circle. If the rectangle crosses one of those, corners are not the closest point But on the earth, things look different, so we have to do a little more work to compute the latitude axis (horizontal crosshair line) to accomodate that: that is ryan's axisLat method.
        Hide
        dweiss Dawid Weiss added a comment -

        Yep, this is exactly the kind of collision checking I had in mind (rect-rect vs. circle-rect), I just wanted to see how it works, visualized. That's fine – I see it in the patch, I'll take a look, thanks for the explanation though.

        Show
        dweiss Dawid Weiss added a comment - Yep, this is exactly the kind of collision checking I had in mind (rect-rect vs. circle-rect), I just wanted to see how it works, visualized. That's fine – I see it in the patch, I'll take a look, thanks for the explanation though.
        Hide
        rcmuir Robert Muir added a comment -

        I will rephrase it a bit.

        for these spatial queries we don't want to run lots of per-document haversin checks. so its important to categorize "rectangles" (subtrees) as "completely OUTSIDE of query range" so that we don't have to inspect any points in them.

        does not matter if its GeoPointField impl (which makes a tree out of the term dictionary) or LatLonPoint impl (which uses BKD tree). in either cases it acts like a 2D tree and subtrees are "rectangles".

        Today, we only return "completely OUTSIDE of query range" if the incoming "rectangle" is completely outside of circle's bounding box This is very coarse: it means all the points in "BOUNDING_BOX MINUS CIRCLE" (imagine the corner areas of the bounding box: these are large areas and not relevant), we are slowly inspecting them one by one. Now with the patch, generally speaking we only do the slow inspection for areas crossing the actual edge of the circle, versus a much larger area.

        Show
        rcmuir Robert Muir added a comment - I will rephrase it a bit. for these spatial queries we don't want to run lots of per-document haversin checks. so its important to categorize "rectangles" (subtrees) as "completely OUTSIDE of query range" so that we don't have to inspect any points in them. does not matter if its GeoPointField impl (which makes a tree out of the term dictionary) or LatLonPoint impl (which uses BKD tree). in either cases it acts like a 2D tree and subtrees are "rectangles". Today, we only return "completely OUTSIDE of query range" if the incoming "rectangle" is completely outside of circle's bounding box This is very coarse: it means all the points in "BOUNDING_BOX MINUS CIRCLE" (imagine the corner areas of the bounding box: these are large areas and not relevant), we are slowly inspecting them one by one. Now with the patch, generally speaking we only do the slow inspection for areas crossing the actual edge of the circle, versus a much larger area.
        Hide
        dweiss Dawid Weiss added a comment -

        Any spatial tree is typically to speed up collision/overlap checking – my experience was not with information retrieval but car navigation systems, but it's essentially similar underlying theory. Anyway, this is the part I have a hard time picturing in my head (note the "simple logic" part...). Since Ryan mentioned there is a visualization of that I thought I'd ask for a screenshot, that's all.

        Here is a patch which adds the following simple logic: if the rect of the subtree does not cross over any axis of the bbox, and none of the corners of the rect are in the circle, then the rect is disjoint from the circle. Note that while the meridian axis of the bbox is simple (it is just the longitude of the circle center), the latitude is a little more complicated. This is the latitude at which the bbox longitudes intersect the circle, which is not the latitude of the circle center (as it would be in normal 2D), except when the center is on the equator.

        Show
        dweiss Dawid Weiss added a comment - Any spatial tree is typically to speed up collision/overlap checking – my experience was not with information retrieval but car navigation systems, but it's essentially similar underlying theory. Anyway, this is the part I have a hard time picturing in my head (note the "simple logic" part...). Since Ryan mentioned there is a visualization of that I thought I'd ask for a screenshot, that's all. Here is a patch which adds the following simple logic: if the rect of the subtree does not cross over any axis of the bbox, and none of the corners of the rect are in the circle, then the rect is disjoint from the circle. Note that while the meridian axis of the bbox is simple (it is just the longitude of the circle center), the latitude is a little more complicated. This is the latitude at which the bbox longitudes intersect the circle, which is not the latitude of the circle center (as it would be in normal 2D), except when the center is on the equator.
        Hide
        rcmuir Robert Muir added a comment -

        Dawid I think you are confused. People tend to do more than one query against an index

        This optimization was here in spatial before: i removed it in LUCENE-7127 because it was broken. Ryan just adds it back.

        I see the following change:
        sandbox/LatLonPoint: 17.2 QPS -> 26.3 QPS
        spatial/GeoPointField: 12.8 QPS -> 16.3 QPS

        Thanks ryan, +1

        Show
        rcmuir Robert Muir added a comment - Dawid I think you are confused. People tend to do more than one query against an index This optimization was here in spatial before: i removed it in LUCENE-7127 because it was broken. Ryan just adds it back. I see the following change: sandbox/LatLonPoint: 17.2 QPS -> 26.3 QPS spatial/GeoPointField: 12.8 QPS -> 16.3 QPS Thanks ryan, +1
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Michael McCandless For comparison, do you remember what is the equivalent geo3d timing? I think this is a case where 2D is likely better than 3D. It's not because of the ability to properly calculate intersections, but rather because there are three dimensions to descend into, and because the intersection math is more expensive.

        Show
        kwright@metacarta.com Karl Wright added a comment - Michael McCandless For comparison, do you remember what is the equivalent geo3d timing? I think this is a case where 2D is likely better than 3D. It's not because of the ability to properly calculate intersections, but rather because there are three dimensions to descend into, and because the intersection math is more expensive.
        Hide
        mikemccand Michael McCandless added a comment -

        Wow, this is a big speedup for LatLonPoint.newDistanceQuery on the 6.1 M point London UK benchmark: trunk is 12.3 sec for 225 queries, and this patch brings it down to 7.7 sec!

        GeoPointDistanceQuery gets a small speedup, from 16.2 sec to 15.3 sec ... not sure why it's less.

        And the total hit counts of all 4 runs are (thank God!) identical: 382,961,953

        Show
        mikemccand Michael McCandless added a comment - Wow, this is a big speedup for LatLonPoint.newDistanceQuery on the 6.1 M point London UK benchmark: trunk is 12.3 sec for 225 queries, and this patch brings it down to 7.7 sec! GeoPointDistanceQuery gets a small speedup, from 16.2 sec to 15.3 sec ... not sure why it's less. And the total hit counts of all 4 runs are (thank God!) identical: 382,961,953
        Hide
        dweiss Dawid Weiss added a comment -

        I think a visualization of this concept might be more intuitive than the textual description?

        Besides, from my limited knowledge of spatial indexes – isn't it the tree structure that's worth optimizing rather than the conditions of skipping branches? If you have very odd cells in the tree (like very tall or very "flat" rectangles) shouldn't they be arranged better at index-construction time rather than detected at query execution time? Not nitpicking, just curious.

        Show
        dweiss Dawid Weiss added a comment - I think a visualization of this concept might be more intuitive than the textual description? Besides, from my limited knowledge of spatial indexes – isn't it the tree structure that's worth optimizing rather than the conditions of skipping branches? If you have very odd cells in the tree (like very tall or very "flat" rectangles) shouldn't they be arranged better at index-construction time rather than detected at query execution time? Not nitpicking, just curious.
        Hide
        rjernst Ryan Ernst added a comment - - edited

        Here is a patch which adds the following simple logic: if the rect of the subtree does not cross over any axis of the bbox, and none of the corners of the rect are in the circle, then the rect is disjoint from the circle. Note that while the meridian axis of the bbox is simple (it is just the longitude of the circle center), the latitude is a little more complicated. This is the latitude at which the bbox longitudes intersect the circle, which is not the latitude of the circle center (as it would be in normal 2D), except when the center is on the equator. The patch provides a utility method to calculate this latitude for a given circle. It also has a randomized test to check this disjoint logic in isolation, outside of the geo data structures. And finally it has a nice utility for visualizing these circles and rects, along with the bbox and its axis' (thanks Mike!).

        Show
        rjernst Ryan Ernst added a comment - - edited Here is a patch which adds the following simple logic: if the rect of the subtree does not cross over any axis of the bbox, and none of the corners of the rect are in the circle, then the rect is disjoint from the circle. Note that while the meridian axis of the bbox is simple (it is just the longitude of the circle center), the latitude is a little more complicated. This is the latitude at which the bbox longitudes intersect the circle, which is not the latitude of the circle center (as it would be in normal 2D), except when the center is on the equator. The patch provides a utility method to calculate this latitude for a given circle. It also has a randomized test to check this disjoint logic in isolation, outside of the geo data structures. And finally it has a nice utility for visualizing these circles and rects, along with the bbox and its axis' (thanks Mike!).

          People

          • Assignee:
            rjernst Ryan Ernst
            Reporter:
            rjernst Ryan Ernst
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development