Lucene - Core
  1. Lucene - Core
  2. LUCENE-5271

A slightly more accurate SloppyMath distance

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 4.7, 5.0
    • Component/s: modules/other
    • Labels:
      None
    • Lucene Fields:
      New, Patch Available

      Description

      SloppyMath, intriduced in LUCENE-5258, uses earth's avg. (according to WGS84) ellipsoid radius as an approximation for computing the "spherical" distance. (The TO_KILOMETERS constant).

      While this is pretty accurate for long distances (latitude wise) this may introduce some small errors while computing distances close to the equator (as the earth radius there is larger than the avg.)

      A more accurate approximation would be taking the avg. earth radius at the source and destination points. But computing an ellipsoid radius at any given point is a heavy function, and this distance should be used in a scoring function.. So two optimizations are optional -

      • Pre-compute a table with an earth radius per latitude (the longitude does not affect the radius)
      • Instead of using two point radius avg, figure out the avg. latitude (exactly between the src and dst points) and get its radius.
      1. LUCENE-5271.patch
        6 kB
        Gilad Barkai
      2. LUCENE-5271.patch
        5 kB
        Gilad Barkai
      3. LUCENE-5271.patch
        3 kB
        Gilad Barkai

        Activity

        Hide
        Gilad Barkai added a comment - - edited

        A proposed solution as per described.
        Please note it is not ready for commit as it breaks one of the tests derived for ES/Solr (as a result of the improved accuracy).

        Show
        Gilad Barkai added a comment - - edited A proposed solution as per described. Please note it is not ready for commit as it breaks one of the tests derived for ES/Solr (as a result of the improved accuracy).
        Hide
        Gilad Barkai added a comment -

        Adapted tests to the more accurate distance method.
        Patch is ready.

        Show
        Gilad Barkai added a comment - Adapted tests to the more accurate distance method. Patch is ready.
        Hide
        Ryan Ernst added a comment -

        Thanks for the patch Gilad. A couple comments:

        • If the lat/lon values are large, then the index would be out of bounds for the table. Today it will not error. Since these are just doubles being passed in, I think it should still work?
        • Why was this test removed? assertEquals(314.40338, haversin(1, 2, 3, 4), 10e-5);
        • Could you move the 2 * radius computation into the table?
        • I know this is an already existing problem, but could you move the division by 2 from h1/h2 to h?
        Show
        Ryan Ernst added a comment - Thanks for the patch Gilad. A couple comments: If the lat/lon values are large, then the index would be out of bounds for the table. Today it will not error. Since these are just doubles being passed in, I think it should still work? Why was this test removed? assertEquals(314.40338, haversin(1, 2, 3, 4), 10e-5); Could you move the 2 * radius computation into the table? I know this is an already existing problem, but could you move the division by 2 from h1/h2 to h?
        Hide
        Gilad Barkai added a comment -

        Ryan, thanks for looking at this.

        If the lat/lon values are large, then the index would be out of bounds for the table

        Nice catch! I did not check for values over 90 degs Lat. Added a % with the the table's size.

        Why was this test removed? assertEquals(314.40338, haversin(1, 2, 3, 4), 10e-5)

        Well the test's result are wrong The new more accurate method gets other results. I added other test instead:

            double earthRadiusKMs = 6378.137;
            double halfCircle = earthRadiusKMs * Math.PI;
            assertEquals(halfCircle, haversin(0, 0, 0, 180), 0D);
        

        Which computes half earth circle on the equator using both the harvestin and a simple circle equation using Earth's equator radius.
        It differs in over 20KMs from the old harvesin result btw.

        Could you move the 2 * radius computation into the table?

        Awesome! renamed the table to diameter rather than radius.

        I know this is an already existing problem, but could you move the division by 2 from h1/h2 to h?

        Done.

        Show
        Gilad Barkai added a comment - Ryan, thanks for looking at this. If the lat/lon values are large, then the index would be out of bounds for the table Nice catch! I did not check for values over 90 degs Lat. Added a % with the the table's size. Why was this test removed? assertEquals(314.40338, haversin(1, 2, 3, 4), 10e-5) Well the test's result are wrong The new more accurate method gets other results. I added other test instead: double earthRadiusKMs = 6378.137; double halfCircle = earthRadiusKMs * Math .PI; assertEquals(halfCircle, haversin(0, 0, 0, 180), 0D); Which computes half earth circle on the equator using both the harvestin and a simple circle equation using Earth's equator radius. It differs in over 20KMs from the old harvesin result btw. Could you move the 2 * radius computation into the table? Awesome! renamed the table to diameter rather than radius. I know this is an already existing problem, but could you move the division by 2 from h1/h2 to h? Done.
        Hide
        ASF subversion and git services added a comment -

        Commit 1558259 from Ryan Ernst in branch 'dev/trunk'
        [ https://svn.apache.org/r1558259 ]

        LUCENE-5271: A slightly more accurate SloppyMath distance

        Show
        ASF subversion and git services added a comment - Commit 1558259 from Ryan Ernst in branch 'dev/trunk' [ https://svn.apache.org/r1558259 ] LUCENE-5271 : A slightly more accurate SloppyMath distance
        Hide
        ASF subversion and git services added a comment -

        Commit 1558264 from Ryan Ernst in branch 'dev/branches/branch_4x'
        [ https://svn.apache.org/r1558264 ]

        LUCENE-5271: A slightly more accurate SloppyMath distance

        Show
        ASF subversion and git services added a comment - Commit 1558264 from Ryan Ernst in branch 'dev/branches/branch_4x' [ https://svn.apache.org/r1558264 ] LUCENE-5271 : A slightly more accurate SloppyMath distance
        Hide
        Ryan Ernst added a comment -

        Thanks Gilad!

        Show
        Ryan Ernst added a comment - Thanks Gilad!
        Hide
        David Smiley added a comment -

        Cool!

        Show
        David Smiley added a comment - Cool!

          People

          • Assignee:
            Ryan Ernst
            Reporter:
            Gilad Barkai
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development