Details

    • Type: Bug Bug
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: 1.5
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: Build
    • Labels:
      None

      Description

      The calculations of distance appears to be off.

      Note: "The radius of the sphere to be used when calculating distances on a sphere (i.e. haversine). Default is the Earth's mean radius in kilometers (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) which is set to 3,958.761458084784856. Most applications will not need to set this."

      The radius of the earth in KM is 6371.009 km (≈3958.761 mi).

      Also filtering distance appears to be off - example data:

      45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 miles = 220 kilometers

      http://....../solr/select?fl=*,score&start=0&rows=10&q=

      {!sfilt%20fl=store_lat_lon}

      &qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348)) asc

      Nothing shows. d=285 shows results. This is off by a lot.

      Bill

      1. Distance.diff
        2 kB
        Bill Bell
      2. SOLR-2125.patch
        16 kB
        Grant Ingersoll
      3. solrspatial.xlsx
        10 kB
        Bill Bell

        Issue Links

          Activity

          Hide
          Yonik Seeley added a comment -

          Just in time Bill!
          I just started looking at spatial stuff today since I'm planning on putting some of it in my Lucene Revolution presentation. I've seen some tweets about people having difficulties, and I've had some problems when I tried stuff myself.

          Anyway, I'm going to try and clean up some of this stuff over the next few days and make the wiki a bit more user oriented - an extra pair of eyeballs would be welcome!

          Show
          Yonik Seeley added a comment - Just in time Bill! I just started looking at spatial stuff today since I'm planning on putting some of it in my Lucene Revolution presentation. I've seen some tweets about people having difficulties, and I've had some problems when I tried stuff myself. Anyway, I'm going to try and clean up some of this stuff over the next few days and make the wiki a bit more user oriented - an extra pair of eyeballs would be welcome!
          Hide
          Yonik Seeley added a comment -

          Hmmm, well, I just corrected one bug that hard-coded the distance in miles, but it was just a check to see if we crossed the poles.
          I don't think that change alone will fix your issue.

          Earlier today, I switched around some fields/field-types in the example schema, so "store" is now of latlon type, and it's the only location type (having multiple is just confusing).

          So just looking at the bounding box now, here's the URL from your example:
          http://localhost:8983/solr/select?fl=*,score&start=0&rows=10&q=

          {!sfilt%20fl=store}

          &qt=standard&pt=44.9369054,-91.3929348&d=280&debugQuery=true

          And I can see that the generated bounding box is:
          +store_0_coordinate:[43.129843715965166 TO 46.688683890119314] +store_1_coordinate:[-93.83266208454557 TO -88.79716545231159]

          Which just misses the longitude of the point on the document of -93.87341.

          Can anyone point to an webapp for checking arbitrary distances between two lat/lon points?

          Show
          Yonik Seeley added a comment - Hmmm, well, I just corrected one bug that hard-coded the distance in miles, but it was just a check to see if we crossed the poles. I don't think that change alone will fix your issue. Earlier today, I switched around some fields/field-types in the example schema, so "store" is now of latlon type, and it's the only location type (having multiple is just confusing). So just looking at the bounding box now, here's the URL from your example: http://localhost:8983/solr/select?fl=*,score&start=0&rows=10&q= {!sfilt%20fl=store} &qt=standard&pt=44.9369054,-91.3929348&d=280&debugQuery=true And I can see that the generated bounding box is: +store_0_coordinate: [43.129843715965166 TO 46.688683890119314] +store_1_coordinate: [-93.83266208454557 TO -88.79716545231159] Which just misses the longitude of the point on the document of -93.87341. Can anyone point to an webapp for checking arbitrary distances between two lat/lon points?
          Hide
          Yonik Seeley added a comment -

          Bill, I found two online distance calculators that both give the distance between the points you provided as 196km.
          http://www.movable-type.co.uk/scripts/latlong.html
          http://www.es.flinders.edu.au/~mattom/Utilities/distance.html

          Now... the distance of 280km you provided should certainly still encompass that, so we still have a bug anyway.

          Show
          Yonik Seeley added a comment - Bill, I found two online distance calculators that both give the distance between the points you provided as 196km. http://www.movable-type.co.uk/scripts/latlong.html http://www.es.flinders.edu.au/~mattom/Utilities/distance.html Now... the distance of 280km you provided should certainly still encompass that, so we still have a bug anyway.
          Hide
          Bill Bell added a comment - - edited

          Yes there is still a bug.

          Most of what I was saying was right. I just did a quick maps.google.com - click directions - and then put the 2 lat,long in both fields.

          137 miles = 220.480128 kilometers (Google)
          196.6km using http://www.movable-type.co.uk/scripts/latlong.html

          See on map: http://www.movable-type.co.uk/scripts/latlong-map.html?lat1=45.176140&long1=-93.873410&lat2=44.936905&long2=-91.392935

          Distance: 196.6 km
          Initial bearing: 096°53′44″
          Final bearing: 098°39′05″
          Midpoint: 45°03′48″N, 092°37′50″W

          As the crow flies is less distance (which makes sense).

          I even used the JS function on http://www.movable-type.co.uk/scripts/latlong.html:

          function toRad(a) {
              return (a*Math.PI/180);
          };
          
          function hsin(lat1,lon1,lat2,lon2) {
          var R = 6371; // km
          var dLat = toRad(lat2-lat1);
          var dLon = toRad(lon2-lon1); 
          var a = Math.sin(dLat/2) * Math.sin(dLat/2) +
                  Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * 
                  Math.sin(dLon/2) * Math.sin(dLon/2); 
          var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1-a)); 
          var d = R * c;
          return d;
          };
          
          

          As a Javascript function - while looping through the results. Since I cannot find a way to output the distance automagically from the XML coming back from SOLR.

          <script>document.write(hsin(lat,lon,solr.lat,solr.lon));</script>
          

          I kept playing with d=<km> to see when the filter is not longer showing on the results at while value.

          &sort=dist(2,store,vector(44.9369054,-91.3929348)) asc

          d=285 shows.
          d=284 does not show.

          Show
          Bill Bell added a comment - - edited Yes there is still a bug. Most of what I was saying was right. I just did a quick maps.google.com - click directions - and then put the 2 lat,long in both fields. 137 miles = 220.480128 kilometers (Google) 196.6km using http://www.movable-type.co.uk/scripts/latlong.html See on map: http://www.movable-type.co.uk/scripts/latlong-map.html?lat1=45.176140&long1=-93.873410&lat2=44.936905&long2=-91.392935 Distance: 196.6 km Initial bearing: 096°53′44″ Final bearing: 098°39′05″ Midpoint: 45°03′48″N, 092°37′50″W As the crow flies is less distance (which makes sense). I even used the JS function on http://www.movable-type.co.uk/scripts/latlong.html: function toRad(a) { return (a* Math .PI/180); }; function hsin(lat1,lon1,lat2,lon2) { var R = 6371; // km var dLat = toRad(lat2-lat1); var dLon = toRad(lon2-lon1); var a = Math .sin(dLat/2) * Math .sin(dLat/2) + Math .cos(toRad(lat1)) * Math .cos(toRad(lat2)) * Math .sin(dLon/2) * Math .sin(dLon/2); var c = 2 * Math .atan2( Math .sqrt(a), Math .sqrt(1-a)); var d = R * c; return d; }; As a Javascript function - while looping through the results. Since I cannot find a way to output the distance automagically from the XML coming back from SOLR. <script>document.write(hsin(lat,lon,solr.lat,solr.lon));</script> I kept playing with d=<km> to see when the filter is not longer showing on the results at while value. &sort=dist(2,store,vector(44.9369054,-91.3929348)) asc d=285 shows. d=284 does not show.
          Hide
          Chris Male added a comment -

          The sorting won't be the issue though surely? The bug seems to be in the bounding box generation that Yonik pointed out. There will be some rounding issues at different places I can imagine, but nothing that would generate such a discrepancy.

          Show
          Chris Male added a comment - The sorting won't be the issue though surely? The bug seems to be in the bounding box generation that Yonik pointed out. There will be some rounding issues at different places I can imagine, but nothing that would generate such a discrepancy.
          Hide
          Chris A. Mattmann added a comment -

          Distance using the Haversine function is extremely sensitive to what spatial reference system the data was recorded in. WGS84 isn't particular great with long distances. The PostGIS in Action book has a really good explanation of this.

          Show
          Chris A. Mattmann added a comment - Distance using the Haversine function is extremely sensitive to what spatial reference system the data was recorded in. WGS84 isn't particular great with long distances. The PostGIS in Action book has a really good explanation of this.
          Hide
          Yonik Seeley added a comment -

          Distance using the Haversine function is extremely sensitive to what spatial reference system the data was recorded in. WGS84 isn't particular great with long distances.

          I know nothing on this topic, but an error of 45% at 200 km? I'm pretty certain that there is a bug not having to do with the accuracy of spatial reference systems here.

          Show
          Yonik Seeley added a comment - Distance using the Haversine function is extremely sensitive to what spatial reference system the data was recorded in. WGS84 isn't particular great with long distances. I know nothing on this topic, but an error of 45% at 200 km? I'm pretty certain that there is a bug not having to do with the accuracy of spatial reference systems here.
          Hide
          Chris A. Mattmann added a comment -

          Umm well if you know nothing then how are you pretty sure? And yes, the error bars are fairly high for the Great Circle distance.

          Show
          Chris A. Mattmann added a comment - Umm well if you know nothing then how are you pretty sure? And yes, the error bars are fairly high for the Great Circle distance.
          Hide
          Bill Bell added a comment -

          The calculation using hsin Javascript is more accurate than our algorithm ? Chris a few percentage points maybe - but not 45%.

          I will look into it some more tonight. It can't be that complicated.

          Bill

          Show
          Bill Bell added a comment - The calculation using hsin Javascript is more accurate than our algorithm ? Chris a few percentage points maybe - but not 45%. I will look into it some more tonight. It can't be that complicated. Bill
          Hide
          Chris A. Mattmann added a comment -

          Hey Bill:

          Interesting. usually these types of errors are due to errors in the coordinate reference system of the input data, but if you were able to successfully implement this in JS, then yep, it's likely some bug in the existing Solr implementation rather than the input data.

          Cheers,
          Chris

          Show
          Chris A. Mattmann added a comment - Hey Bill: Interesting. usually these types of errors are due to errors in the coordinate reference system of the input data, but if you were able to successfully implement this in JS, then yep, it's likely some bug in the existing Solr implementation rather than the input data. Cheers, Chris
          Hide
          Bill Bell added a comment - - edited

          /lucene/contrib/spatial/src/java/org/apache/lucene/spatial/DistanceUtils.java - hsin appears to be okay.

          Good print:

          System.out.println("x1 = " + x1 + " - " + x1*180/Math.PI + ",y1 = " + y1 + " - " + x2*180/Math.PI + ",x2 = " + x2 + " - " + x2*180/Math.PI + ",y2 = " + y2 + " - " + y2*180/Math.PI + ",radius = " + radius + "\n");
          System.out.println("result = " + result + " km\n");

          My guess: "src/java/org/apache/solr/schema/LatLonType.java

              double[] ur;
              double[] ll;
              if (options.measStr == null || options.measStr.equals("hsin")) {
                ur = DistanceUtils.latLonCornerDegs(point[LAT], point[LONG], options.distance, null, true, options.radius);
                ll = DistanceUtils.latLonCornerDegs(point[LAT], point[LONG], options.distance, null, false, options.radius);
              } else {
                ur = DistanceUtils.vectorBoxCorner(point, null, options.distance, true);
                ll = DistanceUtils.vectorBoxCorner(point, null, options.distance, false);
              }
          
          
          Show
          Bill Bell added a comment - - edited /lucene/contrib/spatial/src/java/org/apache/lucene/spatial/DistanceUtils.java - hsin appears to be okay. Good print: System.out.println("x1 = " + x1 + " - " + x1*180/Math.PI + ",y1 = " + y1 + " - " + x2*180/Math.PI + ",x2 = " + x2 + " - " + x2*180/Math.PI + ",y2 = " + y2 + " - " + y2*180/Math.PI + ",radius = " + radius + "\n"); System.out.println("result = " + result + " km\n"); My guess: "src/java/org/apache/solr/schema/LatLonType.java double [] ur; double [] ll; if (options.measStr == null || options.measStr.equals( "hsin" )) { ur = DistanceUtils.latLonCornerDegs(point[LAT], point[LONG], options.distance, null , true , options.radius); ll = DistanceUtils.latLonCornerDegs(point[LAT], point[LONG], options.distance, null , false , options.radius); } else { ur = DistanceUtils.vectorBoxCorner(point, null , options.distance, true ); ll = DistanceUtils.vectorBoxCorner(point, null , options.distance, false ); }
          Hide
          Grant Ingersoll added a comment -

          Hmm, I'm putting in some more exact tests into LatLonType and SpatialFilterTest, trying to work through this.

          clearIndex();
              assertU(adoc("id", "14", fieldName, "0,5"));
              assertU(adoc("id", "15", fieldName, "0,15"));
              //one at the upper right of the box, 3000KM from 0,0, see http://www.movable-type.co.uk/scripts/latlong.html
              assertU(adoc("id", "16", fieldName, "18.71111,19.79750"));
              assertU(commit());
          
              checkHits(fieldName, "0,0", 1000, 1, 14);
              checkHits(fieldName, "0,0", 2000, 1, 14);
              checkHits(fieldName, "0,0", 3000, 2, 14, 15);
              checkHits(fieldName, "0,0", 3001, 3, 14, 15, 16);
              checkHits(fieldName, "0,0", 3000.1, 3, 14, 15, 16);
          
          Show
          Grant Ingersoll added a comment - Hmm, I'm putting in some more exact tests into LatLonType and SpatialFilterTest, trying to work through this. clearIndex(); assertU(adoc( "id" , "14" , fieldName, "0,5" )); assertU(adoc( "id" , "15" , fieldName, "0,15" )); //one at the upper right of the box, 3000KM from 0,0, see http://www.movable-type.co.uk/scripts/latlong.html assertU(adoc( "id" , "16" , fieldName, "18.71111,19.79750" )); assertU(commit()); checkHits(fieldName, "0,0" , 1000, 1, 14); checkHits(fieldName, "0,0" , 2000, 1, 14); checkHits(fieldName, "0,0" , 3000, 2, 14, 15); checkHits(fieldName, "0,0" , 3001, 3, 14, 15, 16); checkHits(fieldName, "0,0" , 3000.1, 3, 14, 15, 16);
          Hide
          Grant Ingersoll added a comment -

          http://....../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348)) asc

          This URL has me a bit stumped. sfilt on a lat_lon is going to use Great Circle to calculate the distance, but then you are sorting using Euclidean distance? Not saying that's a problem, but it strikes me a bit weird.

          Show
          Grant Ingersoll added a comment - http://....../solr/select?fl=*,score&start=0&rows=10&q={!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348 )) asc This URL has me a bit stumped. sfilt on a lat_lon is going to use Great Circle to calculate the distance, but then you are sorting using Euclidean distance? Not saying that's a problem, but it strikes me a bit weird.
          Hide
          Yonik Seeley added a comment -

          but then you are sorting using Euclidean distance

          I imagine it's because of the wiki: http://wiki.apache.org/solr/SpatialSearch
          The first example uses "dist" and I imagine many are going to interpret that as geo distance (I made the same mistake trying to follow the wiki).

          Show
          Yonik Seeley added a comment - but then you are sorting using Euclidean distance I imagine it's because of the wiki: http://wiki.apache.org/solr/SpatialSearch The first example uses "dist" and I imagine many are going to interpret that as geo distance (I made the same mistake trying to follow the wiki).
          Hide
          Grant Ingersoll added a comment -

          Ah, good point, Yonik. I will fix that.

          Show
          Grant Ingersoll added a comment - Ah, good point, Yonik. I will fix that.
          Hide
          Grant Ingersoll added a comment -
           if (options.measStr == null || options.measStr.equals("hsin")) {
                ur = DistanceUtils.latLonCornerDegs(point[LAT], point[LONG], options.distance, null, true, options.radius);
                ll = DistanceUtils.latLonCornerDegs(point[LAT], point[LONG], options.distance, null, false, options.radius);
              }
          

          It could very well be in the determination of the two corners. That being said, in the test I just posted, the point 18.71111, 19.79750 is 3000KM away from 0,0 according to movable-type when traveling along a 45 degree bearing and the tests

          checkHits(fieldName, "0,0", 3000, 2, 14, 15);
          checkHits(fieldName, "0,0", 3001, 3, 14, 15, 16);
          checkHits(fieldName, "0,0", 3000.1, 3, 14, 15, 16);
          

          Pass for me now that Yonik's patch is applied. Please check my logic.

          Show
          Grant Ingersoll added a comment - if (options.measStr == null || options.measStr.equals( "hsin" )) { ur = DistanceUtils.latLonCornerDegs(point[LAT], point[LONG], options.distance, null , true , options.radius); ll = DistanceUtils.latLonCornerDegs(point[LAT], point[LONG], options.distance, null , false , options.radius); } It could very well be in the determination of the two corners. That being said, in the test I just posted, the point 18.71111, 19.79750 is 3000KM away from 0,0 according to movable-type when traveling along a 45 degree bearing and the tests checkHits(fieldName, "0,0" , 3000, 2, 14, 15); checkHits(fieldName, "0,0" , 3001, 3, 14, 15, 16); checkHits(fieldName, "0,0" , 3000.1, 3, 14, 15, 16); Pass for me now that Yonik's patch is applied. Please check my logic.
          Hide
          Yonik Seeley added a comment - - edited

          Ok Grant & I chatted and we figured out what's going wrong. We were calculating a box the size that would completely fit inside the circle rather than vice-versa. This was caused by taking the distance and projecting it out to calculate the corners of the box. But the distance given should really be to the side of the box... and the distance from the center to the corner of the box should be greater (if the box is to completely encompass the circle).

          The fix should be easy - the distance to the corner of the box is sqrt(2) * dist_to_side_of_box. So internally we just need to multiply the distance by sqrt(2) before finding the corners.

          Grant is coding up the fix and tests.

          Show
          Yonik Seeley added a comment - - edited Ok Grant & I chatted and we figured out what's going wrong. We were calculating a box the size that would completely fit inside the circle rather than vice-versa. This was caused by taking the distance and projecting it out to calculate the corners of the box. But the distance given should really be to the side of the box... and the distance from the center to the corner of the box should be greater (if the box is to completely encompass the circle). The fix should be easy - the distance to the corner of the box is sqrt(2) * dist_to_side_of_box. So internally we just need to multiply the distance by sqrt(2) before finding the corners. Grant is coding up the fix and tests.
          Hide
          Yonik Seeley added a comment -

          More chatting with Grant - sqrt(2) is correct if things are flat, but prob not correct for a bounding box on the surface of a sphere.
          Another thought is toproject left, right, up, and down to the sides of the box and use those lat ranges and lon ranges directly as the bounding box.

          The crazy thing is that this is basic geo code - isn't there some bounding box calculation code out there we can use or at least reference?

          Show
          Yonik Seeley added a comment - More chatting with Grant - sqrt(2) is correct if things are flat, but prob not correct for a bounding box on the surface of a sphere. Another thought is toproject left, right, up, and down to the sides of the box and use those lat ranges and lon ranges directly as the bounding box. The crazy thing is that this is basic geo code - isn't there some bounding box calculation code out there we can use or at least reference?
          Hide
          Bill Bell added a comment - - edited

          Grant: Can you give a good example using the same measurements for sorting and sfilt ?

          Will this work? The Spatial Wiki http://wiki.apache.org/solr/SpatialSearch shows:

          3.Using the "frange" QParser, as in fq=

          {!frange l=0 u=400}

          hsin(0.57, -1.3, lat_rad, lon_rad, 3963.205)

          The Function Wiki Shows:

          hsin(radius, true|false, x1,y1,x2,y2)

          Here is the correct solution: (UPDATED)

          
          http://localhost:8983/solr/select/??fl=*,score&start=0&rows=10&q={!sfilt%20fl=store}&qt=standard&pt=44.9369054,-91.3929348&d=285&sort=hsin(6371,true,store,vector(44.9369054,-91.3929348))%20asc
          
          

          The Local Solr from Patrick projected the ur, ll using the distance and did a range check to see if the points are in the box.It is a close approximation.

          Show
          Bill Bell added a comment - - edited Grant: Can you give a good example using the same measurements for sorting and sfilt ? Will this work? The Spatial Wiki http://wiki.apache.org/solr/SpatialSearch shows: 3.Using the "frange" QParser, as in fq= {!frange l=0 u=400} hsin(0.57, -1.3, lat_rad, lon_rad, 3963.205) The Function Wiki Shows: hsin(radius, true|false, x1,y1,x2,y2) Here is the correct solution: (UPDATED) http: //localhost:8983/solr/select/??fl=*,score&start=0&rows=10&q={!sfilt%20fl=store}&qt=standard&pt=44.9369054,-91.3929348&d=285&sort=hsin(6371, true ,store,vector(44.9369054,-91.3929348))%20asc The Local Solr from Patrick projected the ur, ll using the distance and did a range check to see if the points are in the box.It is a close approximation.
          Hide
          Bill Bell added a comment - - edited
          Show
          Bill Bell added a comment - - edited Two good articles: http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates Pole and good pictures http://stackoverflow.com/questions/238260/how-to-calculate-the-bounding-box-for-a-given-lat-lng-location Why Sphere is only good for 10KM distances.
          Hide
          Grant Ingersoll added a comment -

          The Local Solr from Patrick projected the ur, ll using the distance and did a range check to see if the points are in the box.It is a close approximation. An ellipses would be closer. Circle is not close enough.

          OK, so I went back to that code and looked and we are doing the same thing (in fact, I based it off of that). Namely, the only thing that is not working in the code is the interpretation of the results and not the math itself. To that point, Bill, if you look at the points you gave, it is right on the edge of a box that is centered on that point (roughly, Chippewa Falls, WI), but still outside, of the box that was created by Solr (at 280KM). I worked this out by viewing the maps at:
          1. http://www.movable-type.co.uk/scripts/latlong-map.html?lat1=44.936905&long1=-91.392935&lat2=43.09&long2=-93.87341
          2. http://www.movable-type.co.uk/scripts/latlong-map.html?lat1=44.936905&long1=-91.392935&lat2=45.17614&long2=-93.87341

          The reason is that we are calculating the box based on the fact that the upper right and lower left corners are 280KM away. The question is then, is this the intuitive thing app developers expect? If app developers think in terms of a radius of distance 280km and all points inside, then no, it isn't. But if they think in terms of the upper and lower box corners with no suggestion of "radius", hence implying a circle then it does.

          So, one fix for this could solely be a documentation fix. The other fix is to change the reasoning above to simply find the north, south, east, west points of a box that transcribes a radius of the distance given (since most users tend to think in terms of radius from where they are located and not upper and lower box corners. This will create a box that encloses said radius completely and might give a slight bit more fuzz up near the box corners due to curvature, but that should be fine. I'm working on a patch for this fix, as I think it is the better way.

          Show
          Grant Ingersoll added a comment - The Local Solr from Patrick projected the ur, ll using the distance and did a range check to see if the points are in the box.It is a close approximation. An ellipses would be closer. Circle is not close enough. OK, so I went back to that code and looked and we are doing the same thing (in fact, I based it off of that). Namely, the only thing that is not working in the code is the interpretation of the results and not the math itself. To that point, Bill, if you look at the points you gave, it is right on the edge of a box that is centered on that point (roughly, Chippewa Falls, WI), but still outside, of the box that was created by Solr (at 280KM). I worked this out by viewing the maps at: 1. http://www.movable-type.co.uk/scripts/latlong-map.html?lat1=44.936905&long1=-91.392935&lat2=43.09&long2=-93.87341 2. http://www.movable-type.co.uk/scripts/latlong-map.html?lat1=44.936905&long1=-91.392935&lat2=45.17614&long2=-93.87341 The reason is that we are calculating the box based on the fact that the upper right and lower left corners are 280KM away. The question is then, is this the intuitive thing app developers expect? If app developers think in terms of a radius of distance 280km and all points inside, then no, it isn't. But if they think in terms of the upper and lower box corners with no suggestion of "radius", hence implying a circle then it does. So, one fix for this could solely be a documentation fix. The other fix is to change the reasoning above to simply find the north, south, east, west points of a box that transcribes a radius of the distance given (since most users tend to think in terms of radius from where they are located and not upper and lower box corners. This will create a box that encloses said radius completely and might give a slight bit more fuzz up near the box corners due to curvature, but that should be fine. I'm working on a patch for this fix, as I think it is the better way.
          Hide
          Bill Bell added a comment - - edited

          OK so that makes sense. Youa re using distance at 45 degrees. So the east-west would not extend far enough.

          Using http://en.wikipedia.org/wiki/Pythagorean_theorem would help on the east-west case, but circle or ellipses is MUCH better.

          Although extending the 45 degree out would be a conservative estimate. And since we usually sort by distance asc, those extra points would be in the result set but at the end of the list. (this is an improvement - again not at good as ellipses).

          You need a quick function that tells you "is this lat,long in the circle / ellipses or not". A range [X to Y] will not get you that. You need to use hsin().

          On potential:

          1. Do range select using points http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
          (Lat => 1.2393 AND Lat <= 1.5532) AND (Lon >= -1.8184 AND Lon <= 0.4221)
          2. Check those points for distance "in the ellipses". http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
          acos(sin(1.3963) * sin(Lat) + cos(1.3963) * cos(Lat) * cos(Lon - (-0.6981))) <= 0.1570;

          That should make it fast and simplify the calculations.

          UPDATE - NOTE:

          Plugging all this into the web site, proves that Pythagorean is a good approximation...

          See Excel attached.

          hsin = 309 km from pt to max
          hsin = 314 km from pt to min
          Estimate using Pythagorean is 311 using sqrt(220km^2+220km^2)

          41.42% is the difference from west-east to 45 degree. sqrt(1^2+1^2)

          Yonik: sqrt(2) is right - but the spreadsheet is a bit better based on spheres.

          The #2 will then subselect the points to limit within that result set.

          Therefore, a user could take a distance from the user, sqrt(d^2+d^2) and use that to get a list - it is not exact but better than nothing.

          Show
          Bill Bell added a comment - - edited OK so that makes sense. Youa re using distance at 45 degrees. So the east-west would not extend far enough. Using http://en.wikipedia.org/wiki/Pythagorean_theorem would help on the east-west case, but circle or ellipses is MUCH better. Although extending the 45 degree out would be a conservative estimate. And since we usually sort by distance asc, those extra points would be in the result set but at the end of the list. (this is an improvement - again not at good as ellipses). You need a quick function that tells you "is this lat,long in the circle / ellipses or not". A range [X to Y] will not get you that. You need to use hsin(). On potential: 1. Do range select using points http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates (Lat => 1.2393 AND Lat <= 1.5532) AND (Lon >= -1.8184 AND Lon <= 0.4221) 2. Check those points for distance "in the ellipses". http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates acos(sin(1.3963) * sin(Lat) + cos(1.3963) * cos(Lat) * cos(Lon - (-0.6981))) <= 0.1570; That should make it fast and simplify the calculations. UPDATE - NOTE: Plugging all this into the web site, proves that Pythagorean is a good approximation... See Excel attached. hsin = 309 km from pt to max hsin = 314 km from pt to min Estimate using Pythagorean is 311 using sqrt(220km^2+220km^2) 41.42% is the difference from west-east to 45 degree. sqrt(1^2+1^2) Yonik: sqrt(2) is right - but the spreadsheet is a bit better based on spheres. The #2 will then subselect the points to limit within that result set. Therefore, a user could take a distance from the user, sqrt(d^2+d^2) and use that to get a list - it is not exact but better than nothing.
          Hide
          Bill Bell added a comment -

          Example calcs to get lat/long for distance of 220km from 44.93691,-91.3929

          Show
          Bill Bell added a comment - Example calcs to get lat/long for distance of 220km from 44.93691,-91.3929
          Hide
          Bill Bell added a comment - - edited

          There is another example...

          http://www.movable-type.co.uk/scripts/latlong-db.html not sure - but this is also a good first cut. Then if you want more accurate results calculate results that come back
          and through out ones that are not within. Cache results. The link above http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates appears a little more accurate. But this is simpler.

          
          // first-cut bounding box (in degrees)
          $maxLat = $lat + rad2deg($rad/$R);
          $minLat = $lat - rad2deg($rad/$R);
          // compensate for degrees longitude getting smaller with increasing latitude
          $maxLon = $lon + rad2deg($rad/$R/cos(deg2rad($lat)));
          $minLon = $lon - rad2deg($rad/$R/cos(deg2rad($lat)));
           
          $sql = "Select ID, Postcode, Lat, Lon
                  From MyTable
                  Where Lat > $minLat And Lat < $maxLat
                    And Lon > $minLon And Lon < $maxLon";
          
          Show
          Bill Bell added a comment - - edited There is another example... http://www.movable-type.co.uk/scripts/latlong-db.html not sure - but this is also a good first cut. Then if you want more accurate results calculate results that come back and through out ones that are not within. Cache results. The link above http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates appears a little more accurate. But this is simpler. // first-cut bounding box (in degrees) $maxLat = $lat + rad2deg($rad/$R); $minLat = $lat - rad2deg($rad/$R); // compensate for degrees longitude getting smaller with increasing latitude $maxLon = $lon + rad2deg($rad/$R/cos(deg2rad($lat))); $minLon = $lon - rad2deg($rad/$R/cos(deg2rad($lat))); $sql = "Select ID, Postcode, Lat, Lon From MyTable Where Lat > $minLat And Lat < $maxLat And Lon > $minLon And Lon < $maxLon";
          Hide
          Bill Bell added a comment - - edited

          Fixed the distance calc for box. We still need to take the results and do a hsin()

          
          this now returns 8 on example data:
          
          http://localhost:8983/solr/select/??fl=*,score&start=0&rows=10&q={!sfilt%20fl=store}&qt=standard&pt=44.9369054,-91.3929348&d=196&sort=hsin%286371,true,store,vector%2844.9369054,-91.3929348%29%29%20asc
          
          Note: 196 km is right
          
          
          Show
          Bill Bell added a comment - - edited Fixed the distance calc for box. We still need to take the results and do a hsin() this now returns 8 on example data: http: //localhost:8983/solr/select/??fl=*,score&start=0&rows=10&q={!sfilt%20fl=store}&qt=standard&pt=44.9369054,-91.3929348&d=196&sort=hsin%286371, true ,store,vector%2844.9369054,-91.3929348%29%29%20asc Note: 196 km is right
          Hide
          Grant Ingersoll added a comment -

          Couple of things:

          1. I did find some other errors around radians, etc. I will work to fix today.

          Although extending the 45 degree out would be a conservative estimate. And since we usually sort by distance asc, those extra points would be in the result set but at the end of the list. (this is an improvement - again not at good as ellipses).

          This FieldType is going to be implemented to be a bounding box around a circle of a radius specified by the passed in distance, which seems to be what most tools do, at least as one way of doing it. If you want, pick your distance based on the max distance of an ellipse or you can override LatLonType.createSpatialQuery to do so in your own FieldType. Naturally, this will overselect, but this bounding box stuff is meant to be a filter, like you said. You can then sort by distance later.

          Show
          Grant Ingersoll added a comment - Couple of things: 1. I did find some other errors around radians, etc. I will work to fix today. Although extending the 45 degree out would be a conservative estimate. And since we usually sort by distance asc, those extra points would be in the result set but at the end of the list. (this is an improvement - again not at good as ellipses). This FieldType is going to be implemented to be a bounding box around a circle of a radius specified by the passed in distance, which seems to be what most tools do, at least as one way of doing it. If you want, pick your distance based on the max distance of an ellipse or you can override LatLonType.createSpatialQuery to do so in your own FieldType. Naturally, this will overselect, but this bounding box stuff is meant to be a filter, like you said. You can then sort by distance later.
          Hide
          Bill Bell added a comment -

          Grant: Did you see my patch? It appears to work pretty well. I implemented the solution for you.

          One idea: it might be interesting to pass vector(lat,long) as an optional parameters to sfilt ur, ll if we continue using the box. I can see it pretty useful for google maps (to limit the results based on the map size).

          Bill

          Show
          Bill Bell added a comment - Grant: Did you see my patch? It appears to work pretty well. I implemented the solution for you. One idea: it might be interesting to pass vector(lat,long) as an optional parameters to sfilt ur, ll if we continue using the box. I can see it pretty useful for google maps (to limit the results based on the map size). Bill
          Hide
          Bill Bell added a comment -

          Another reference:

          I like the pointInPolygon() idea from Microsoft. I do not like that they don't have a short-cut to reduce the shapes like what we have implemented. But their looping to make sure the distance is being satisfied is good.... See below.

          http://msdn.microsoft.com/en-us/library/cc451895.aspx

          As follow-up, I would take my patch apply it, and then loop through the results to limit by hsin() distance to make sure the results are not on the 45 degree.... I also would make this 2nd step optional, since it slows things down, and for most applications they don't need to be that precise. (SInce you order by hsin() desc - the large distances if there are any will be at the bottom and inconsequential).

          Show
          Bill Bell added a comment - Another reference: I like the pointInPolygon() idea from Microsoft. I do not like that they don't have a short-cut to reduce the shapes like what we have implemented. But their looping to make sure the distance is being satisfied is good.... See below. http://msdn.microsoft.com/en-us/library/cc451895.aspx As follow-up, I would take my patch apply it, and then loop through the results to limit by hsin() distance to make sure the results are not on the 45 degree.... I also would make this 2nd step optional, since it slows things down, and for most applications they don't need to be that precise. (SInce you order by hsin() desc - the large distances if there are any will be at the bottom and inconsequential).
          Hide
          Grant Ingersoll added a comment -

          Here's a patch that puts in the east/west, north/south bounding box, which seems to be what PostGIS offers. It passes Bill's example at the top and I put in other tests. I refactored some of the DistanceUtils to have some other useful methods. I think I finally got all my degrees/radians straightened out, too.

          Show
          Grant Ingersoll added a comment - Here's a patch that puts in the east/west, north/south bounding box, which seems to be what PostGIS offers. It passes Bill's example at the top and I put in other tests. I refactored some of the DistanceUtils to have some other useful methods. I think I finally got all my degrees/radians straightened out, too.
          Hide
          Grant Ingersoll added a comment -

          Hi Bill,

          Sorry, yeah, I did see your patch, but was almost done w/ what I had at that point. Keep in mind that the approach is meant to be a rough cut for filtering, as it is not 100% accurate. It is expected that most people would filter based on these range queries, and then do other things to do exact distances later in terms of scoring and sorting. I will document this on the Wiki.

          Show
          Grant Ingersoll added a comment - Hi Bill, Sorry, yeah, I did see your patch, but was almost done w/ what I had at that point. Keep in mind that the approach is meant to be a rough cut for filtering, as it is not 100% accurate. It is expected that most people would filter based on these range queries, and then do other things to do exact distances later in terms of scoring and sorting. I will document this on the Wiki.
          Hide
          Grant Ingersoll added a comment -

          Committed to trunk and 3.x

          Show
          Grant Ingersoll added a comment - Committed to trunk and 3.x
          Hide
          Simon Rijnders added a comment -

          Is there possible now to use a current Solr build (nightly) to solve this problem: find all points (using index) in radius of R km on earth?

          If so, I'm more than willing to test this for you guys....

          Thanks!

          Show
          Simon Rijnders added a comment - Is there possible now to use a current Solr build (nightly) to solve this problem: find all points (using index) in radius of R km on earth? If so, I'm more than willing to test this for you guys.... Thanks!
          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1.0 release

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1.0 release

            People

            • Assignee:
              Grant Ingersoll
              Reporter:
              Bill Bell
            • Votes:
              0 Vote for this issue
              Watchers:
              0 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development