Lucene - Core
  1. Lucene - Core
  2. LUCENE-6905

GeoPointDistanceQuery using wrapped lon for dateline crossing query

    Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.4, 6.0
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      GeoPointDistanceQuery handles dateline crossing by splitting the Minimum Bounding Rectangle (MBR) into east/west ranges and rewriting to a Boolean SHOULD. PostFiltering is accomplished by calculating the distance from the center point to the candidate point field. Unfortunately the center point is wrapped such that calculating the closest point on the "circle" from an eastern point to a western MBR provides incorrect results thus causing false negatives in the range creation. This was caught by a jenkins failure and reproduced in 2 places: GeoPointDistanceTermsEnum and TestGeoRelations

      1. LUCENE-6905.patch
        8 kB
        Nicholas Knize
      2. LUCENE-6905.patch
        6 kB
        Nicholas Knize
      3. LUCENE-6905.patch
        7 kB
        Nicholas Knize
      4. LUCENE-6905.patch
        7 kB
        Nicholas Knize
      5. LUCENE-6905.patch
        5 kB
        Nicholas Knize

        Activity

        Hide
        Nicholas Knize added a comment - - edited

        Simple patch that unwraps the center longitude such that distance is correctly computed. Also corrects maxRadialDistance for center location at the poles.
        This fixes build failure #14669 and passes a 200 iteration beast test..

        Show
        Nicholas Knize added a comment - - edited Simple patch that unwraps the center longitude such that distance is correctly computed. Also corrects maxRadialDistance for center location at the poles. This fixes build failure #14669 and passes a 200 iteration beast test..
        Hide
        Nicholas Knize added a comment -

        Updated patch that moves unwrapping to GeoUtils. TestGeoUtils.testGeoRelations is also reenabled and updated for BKD verification. /cc Michael McCandless

        Show
        Nicholas Knize added a comment - Updated patch that moves unwrapping to GeoUtils. TestGeoUtils.testGeoRelations is also reenabled and updated for BKD verification. /cc Michael McCandless
        Hide
        Michael McCandless added a comment -

        Updated patch that moves unwrapping to GeoUtils. TestGeoUtils.testGeoRelations is also reenabled and updated for BKD verification

        Whoa, wonderful! I will look at the patch!

        Show
        Michael McCandless added a comment - Updated patch that moves unwrapping to GeoUtils. TestGeoUtils.testGeoRelations is also reenabled and updated for BKD verification Whoa, wonderful! I will look at the patch!
        Hide
        Michael McCandless added a comment -

        For the record, MBR in geo speak means "minimum bounding rectangle"

        In the GeoDistanceUtils.maxRadialDistance javadocs can you say that the returned results is in meters?

        Is this comment in GeoUtils.unwrapLon stale?

        +    // if centerLon is within bbox
        

        (I see no centerLon nor a bbox).

        Can you add a javadoc to this method? Can you add an assert before returning that lon is now in-bounds? (I.e. that the incoming lon did not require more than one iteration of unwrapping). Can you make the {{ += 360}} also a double (or make the other one an int)?

        OK so the test failures were caused by 1) not having a tolerance for up to 0.5% error in the distance, and 2) not handling dateline crossovers correctly, and not pole crossing issues? We should fix this test to "behave" like the query does: rewrite up front into the two halves of the MBR, instead of unwrapping on each step of the recursion. But let's do that separately ... can you add a TODO?

        Can we now remove the true in TestGeoUtils.testGeoRelations?:

              // TODO: GeoUtils APIs are still buggy for large distances:
              if (true || useSmallRanges) {
                // Approx 3 degrees lon at the equator:
                radiusMeters = random().nextDouble() * 333000;
              } else {
                radiusMeters = random().nextDouble() * GeoProjectionUtils.SEMIMAJOR_AXIS * Math.PI / 2.0;
              }
        

        I.e. we can now test large distances?

        Show
        Michael McCandless added a comment - For the record, MBR in geo speak means "minimum bounding rectangle" In the GeoDistanceUtils.maxRadialDistance javadocs can you say that the returned results is in meters? Is this comment in GeoUtils.unwrapLon stale? + // if centerLon is within bbox (I see no centerLon nor a bbox). Can you add a javadoc to this method? Can you add an assert before returning that lon is now in-bounds? (I.e. that the incoming lon did not require more than one iteration of unwrapping). Can you make the {{ += 360}} also a double (or make the other one an int)? OK so the test failures were caused by 1) not having a tolerance for up to 0.5% error in the distance, and 2) not handling dateline crossovers correctly, and not pole crossing issues? We should fix this test to "behave" like the query does: rewrite up front into the two halves of the MBR, instead of unwrapping on each step of the recursion. But let's do that separately ... can you add a TODO? Can we now remove the true in TestGeoUtils.testGeoRelations ?: // TODO: GeoUtils APIs are still buggy for large distances: if (true || useSmallRanges) { // Approx 3 degrees lon at the equator: radiusMeters = random().nextDouble() * 333000; } else { radiusMeters = random().nextDouble() * GeoProjectionUtils.SEMIMAJOR_AXIS * Math.PI / 2.0; } I.e. we can now test large distances?
        Hide
        Michael McCandless added a comment -

        Also, why didn't our randomized tests (which I think do span the dateline sometimes) not tickle this bug?

        Show
        Michael McCandless added a comment - Also, why didn't our randomized tests (which I think do span the dateline sometimes) not tickle this bug?
        Hide
        Nicholas Knize added a comment - - edited

        For the record, MBR in geo speak means "minimum bounding rectangle"

        yesyes! Updated the comment to spell out the acronym.

        We should fix this test to "behave" like the query does: rewrite up front into the two halves of the MBR

        Are you referring to TestGeoUtils.testGeoRelations? unwrapLon is only called iff the bbox crosses the dateline. Which behaves like the GeoPointDistanceQueryImpl does.

        Also, after digging a little bit more I've decided to move the TestGeoUtils.testGeoRelations fixes to new issue, LUCENE-6908 (patch coming next). The .testGeoRelations method doesn't exactly test the behavior of GeoPoint*Query as its using the BKD split technique (instead of quad cell division) to divide the space on each pass. For "large" distance queries this can create a lot of irregular rectangles producing large radial distortion error when using the cartesian approximation methods provided by GeoUtils. I have a "rewrite" fix that I'll attach to LUCENE-6908 that further divides the space allowing us to use the fast cartesian approximation methods instead of converting to an expensive oblate geometry approach.

        Can we now remove the true in TestGeoUtils.testGeoRelations

        Also doing this in LUCENE-6908.

        Show
        Nicholas Knize added a comment - - edited For the record, MBR in geo speak means "minimum bounding rectangle" yesyes! Updated the comment to spell out the acronym. We should fix this test to "behave" like the query does: rewrite up front into the two halves of the MBR Are you referring to TestGeoUtils.testGeoRelations ? unwrapLon is only called iff the bbox crosses the dateline. Which behaves like the GeoPointDistanceQueryImpl does. Also, after digging a little bit more I've decided to move the TestGeoUtils.testGeoRelations fixes to new issue, LUCENE-6908 (patch coming next). The .testGeoRelations method doesn't exactly test the behavior of GeoPoint*Query as its using the BKD split technique (instead of quad cell division) to divide the space on each pass. For "large" distance queries this can create a lot of irregular rectangles producing large radial distortion error when using the cartesian approximation methods provided by GeoUtils . I have a "rewrite" fix that I'll attach to LUCENE-6908 that further divides the space allowing us to use the fast cartesian approximation methods instead of converting to an expensive oblate geometry approach. Can we now remove the true in TestGeoUtils.testGeoRelations Also doing this in LUCENE-6908 .
        Hide
        Nicholas Knize added a comment - - edited

        Jenkins did catch this, eventually. Looking at the randomization it looks like the probability of producing this case is quite low. The LUCENE-6908 patch will re-enable large distances and testGeoRelations which will produce more of these edge cases.

        Show
        Nicholas Knize added a comment - - edited Jenkins did catch this, eventually. Looking at the randomization it looks like the probability of producing this case is quite low. The LUCENE-6908 patch will re-enable large distances and testGeoRelations which will produce more of these edge cases.
        Hide
        Nicholas Knize added a comment -

        New patch to focus on unwrapping only.

        Show
        Nicholas Knize added a comment - New patch to focus on unwrapping only.
        Hide
        Nicholas Knize added a comment - - edited

        Updated patch to correct path and addresses review comments.

        Show
        Nicholas Knize added a comment - - edited Updated patch to correct path and addresses review comments.
        Hide
        Michael McCandless added a comment -

        Thank you for separating out the patches Nicholas Knize.

        I don't like that we are changing the error tolerance from 0.5% to 7% when USGS says it's supposed to be 0.5%. Is that intentional? I mean, is the error in these queries really supposed to be so high?

        Can we move the lon unwrapping up into GeoPointDistanceQuery.rewrite method, and add centerLon as a parameter to GeoPointDistanceQueryImpl, since it "knows" when it's making queries that have a boundary on the date line? In fact, it knows which sub-query is "to the left" and which is "to the right", so maybe we just inline the logic inside rewrite and remove GeoUtils.unwrapLon public method?

        Show
        Michael McCandless added a comment - Thank you for separating out the patches Nicholas Knize . I don't like that we are changing the error tolerance from 0.5% to 7% when USGS says it's supposed to be 0.5%. Is that intentional? I mean, is the error in these queries really supposed to be so high? Can we move the lon unwrapping up into GeoPointDistanceQuery.rewrite method, and add centerLon as a parameter to GeoPointDistanceQueryImpl , since it "knows" when it's making queries that have a boundary on the date line? In fact, it knows which sub-query is "to the left" and which is "to the right", so maybe we just inline the logic inside rewrite and remove GeoUtils.unwrapLon public method?
        Hide
        Nicholas Knize added a comment -

        Updated patch:

        • moves unwrapping into GeoPointDistanceQuery
        • removes 7% tolerance - will be addressed by LUCENE-6908
        Show
        Nicholas Knize added a comment - Updated patch: moves unwrapping into GeoPointDistanceQuery removes 7% tolerance - will be addressed by LUCENE-6908
        Hide
        Michael McCandless added a comment -

        +1, thanks Nicholas Knize, patch looks great, but can we add a test case showing the bug and then showing that we fixed it? (Just capture the jenkins failures)

        Show
        Michael McCandless added a comment - +1, thanks Nicholas Knize , patch looks great, but can we add a test case showing the bug and then showing that we fixed it? (Just capture the jenkins failures)
        Hide
        Nicholas Knize added a comment -

        thanks Michael McCandless. I went ahead and added the jenkins failure as an explicit test. I'll commit shortly.

        Show
        Nicholas Knize added a comment - thanks Michael McCandless . I went ahead and added the jenkins failure as an explicit test. I'll commit shortly.
        Hide
        ASF subversion and git services added a comment -

        Commit 1716462 from Nicholas Knize in branch 'dev/trunk'
        [ https://svn.apache.org/r1716462 ]

        LUCENE-6905: Unwrap center longitude for dateline crossing GeoPointDistanceQuery.

        Show
        ASF subversion and git services added a comment - Commit 1716462 from Nicholas Knize in branch 'dev/trunk' [ https://svn.apache.org/r1716462 ] LUCENE-6905 : Unwrap center longitude for dateline crossing GeoPointDistanceQuery.
        Hide
        ASF subversion and git services added a comment -

        Commit 1716465 from Nicholas Knize in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1716465 ]

        LUCENE-6905: Unwrap center longitude for dateline crossing GeoPointDistanceQuery.

        Show
        ASF subversion and git services added a comment - Commit 1716465 from Nicholas Knize in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1716465 ] LUCENE-6905 : Unwrap center longitude for dateline crossing GeoPointDistanceQuery.
        Hide
        ASF subversion and git services added a comment -

        Commit 1716478 from Nicholas Knize in branch 'dev/branches/lucene_solr_5_4'
        [ https://svn.apache.org/r1716478 ]

        LUCENE-6905: Unwrap center longitude for dateline crossing GeoPointDistanceQuery.

        Show
        ASF subversion and git services added a comment - Commit 1716478 from Nicholas Knize in branch 'dev/branches/lucene_solr_5_4' [ https://svn.apache.org/r1716478 ] LUCENE-6905 : Unwrap center longitude for dateline crossing GeoPointDistanceQuery.
        Hide
        ASF subversion and git services added a comment -

        Commit 1716483 from Nicholas Knize in branch 'dev/branches/lucene_solr_5_4'
        [ https://svn.apache.org/r1716483 ]

        LUCENE-6905: updating change log.

        Show
        ASF subversion and git services added a comment - Commit 1716483 from Nicholas Knize in branch 'dev/branches/lucene_solr_5_4' [ https://svn.apache.org/r1716483 ] LUCENE-6905 : updating change log.
        Hide
        ASF subversion and git services added a comment -

        Commit 1716486 from Nicholas Knize in branch 'dev/trunk'
        [ https://svn.apache.org/r1716486 ]

        LUCENE-6905: updating change log.

        Show
        ASF subversion and git services added a comment - Commit 1716486 from Nicholas Knize in branch 'dev/trunk' [ https://svn.apache.org/r1716486 ] LUCENE-6905 : updating change log.
        Hide
        ASF subversion and git services added a comment -

        Commit 1716488 from Nicholas Knize in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1716488 ]

        LUCENE-6905: updating change log.

        Show
        ASF subversion and git services added a comment - Commit 1716488 from Nicholas Knize in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1716488 ] LUCENE-6905 : updating change log.

          People

          • Assignee:
            Unassigned
            Reporter:
            Nicholas Knize
          • Votes:
            0 Vote for this issue
            Watchers:
            3 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development