Details
Description
The calculations of distance appears to be off.
Note: "The radius of the sphere to be used when calculating distances on a sphere (i.e. haversine). Default is the Earth's mean radius in kilometers (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) which is set to 3,958.761458084784856. Most applications will not need to set this."
The radius of the earth in KM is 6371.009 km (≈3958.761 mi).
Also filtering distance appears to be off  example data:
45.17614,93.87341 to 44.9369054,91.3929348 Approx 137 miles Google. 169 miles = 220 kilometers
http://....../solr/select?fl=*,score&start=0&rows=10&q=
{!sfilt%20fl=store_lat_lon}&qt=standard&pt=44.9369054,91.3929348&d=280&sort=dist(2,store,vector(44.9369054,91.3929348)) asc
Nothing shows. d=285 shows results. This is off by a lot.
Bill

 Distance.diff
 2 kB
 Bill Bell

 SOLR2125.patch
 16 kB
 Grant Ingersoll

 solrspatial.xlsx
 10 kB
 Bill Bell
Issue Links
 blocks

SOLR1568 Implement Spatial Filter
 Closed
Activity
Is there possible now to use a current Solr build (nightly) to solve this problem: find all points (using index) in radius of R km on earth?
If so, I'm more than willing to test this for you guys....
Thanks!
Committed to trunk and 3.x
Hi Bill,
Sorry, yeah, I did see your patch, but was almost done w/ what I had at that point. Keep in mind that the approach is meant to be a rough cut for filtering, as it is not 100% accurate. It is expected that most people would filter based on these range queries, and then do other things to do exact distances later in terms of scoring and sorting. I will document this on the Wiki.
Here's a patch that puts in the east/west, north/south bounding box, which seems to be what PostGIS offers. It passes Bill's example at the top and I put in other tests. I refactored some of the DistanceUtils to have some other useful methods. I think I finally got all my degrees/radians straightened out, too.
Another reference:
I like the pointInPolygon() idea from Microsoft. I do not like that they don't have a shortcut to reduce the shapes like what we have implemented. But their looping to make sure the distance is being satisfied is good.... See below.
http://msdn.microsoft.com/enus/library/cc451895.aspx
As followup, I would take my patch apply it, and then loop through the results to limit by hsin() distance to make sure the results are not on the 45 degree.... I also would make this 2nd step optional, since it slows things down, and for most applications they don't need to be that precise. (SInce you order by hsin() desc  the large distances if there are any will be at the bottom and inconsequential).
Grant: Did you see my patch? It appears to work pretty well. I implemented the solution for you.
One idea: it might be interesting to pass vector(lat,long) as an optional parameters to sfilt ur, ll if we continue using the box. I can see it pretty useful for google maps (to limit the results based on the map size).
Bill
Couple of things:
1. I did find some other errors around radians, etc. I will work to fix today.
Although extending the 45 degree out would be a conservative estimate. And since we usually sort by distance asc, those extra points would be in the result set but at the end of the list. (this is an improvement  again not at good as ellipses).
This FieldType is going to be implemented to be a bounding box around a circle of a radius specified by the passed in distance, which seems to be what most tools do, at least as one way of doing it. If you want, pick your distance based on the max distance of an ellipse or you can override LatLonType.createSpatialQuery to do so in your own FieldType. Naturally, this will overselect, but this bounding box stuff is meant to be a filter, like you said. You can then sort by distance later.
Fixed the distance calc for box. We still need to take the results and do a hsin()
this now returns 8 on example data: http://localhost:8983/solr/select/??fl=*,score&start=0&rows=10&q={!sfilt%20fl=store}&qt=standard&pt=44.9369054,91.3929348&d=196&sort=hsin%286371,true,store,vector%2844.9369054,91.3929348%29%29%20asc Note: 196 km is right
There is another example...
http://www.movabletype.co.uk/scripts/latlongdb.html not sure  but this is also a good first cut. Then if you want more accurate results calculate results that come back
and through out ones that are not within. Cache results. The link above http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates appears a little more accurate. But this is simpler.
// firstcut bounding box (in degrees) $maxLat = $lat + rad2deg($rad/$R); $minLat = $lat  rad2deg($rad/$R); // compensate for degrees longitude getting smaller with increasing latitude $maxLon = $lon + rad2deg($rad/$R/cos(deg2rad($lat))); $minLon = $lon  rad2deg($rad/$R/cos(deg2rad($lat))); $sql = "Select ID, Postcode, Lat, Lon From MyTable Where Lat > $minLat And Lat < $maxLat And Lon > $minLon And Lon < $maxLon";
Example calcs to get lat/long for distance of 220km from 44.93691,91.3929
OK so that makes sense. Youa re using distance at 45 degrees. So the eastwest would not extend far enough.
Using http://en.wikipedia.org/wiki/Pythagorean_theorem would help on the eastwest case, but circle or ellipses is MUCH better.
Although extending the 45 degree out would be a conservative estimate. And since we usually sort by distance asc, those extra points would be in the result set but at the end of the list. (this is an improvement  again not at good as ellipses).
You need a quick function that tells you "is this lat,long in the circle / ellipses or not". A range [X to Y] will not get you that. You need to use hsin().
On potential:
1. Do range select using points http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
(Lat => 1.2393 AND Lat <= 1.5532) AND (Lon >= 1.8184 AND Lon <= 0.4221)
2. Check those points for distance "in the ellipses". http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
acos(sin(1.3963) * sin(Lat) + cos(1.3963) * cos(Lat) * cos(Lon  (0.6981))) <= 0.1570;
That should make it fast and simplify the calculations.
UPDATE  NOTE:
Plugging all this into the web site, proves that Pythagorean is a good approximation...
See Excel attached.
hsin = 309 km from pt to max
hsin = 314 km from pt to min
Estimate using Pythagorean is 311 using sqrt(220km^2+220km^2)
41.42% is the difference from westeast to 45 degree. sqrt(1^2+1^2)
Yonik: sqrt(2) is right  but the spreadsheet is a bit better based on spheres.
The #2 will then subselect the points to limit within that result set.
Therefore, a user could take a distance from the user, sqrt(d^2+d^2) and use that to get a list  it is not exact but better than nothing.
The Local Solr from Patrick projected the ur, ll using the distance and did a range check to see if the points are in the box.It is a close approximation. An ellipses would be closer. Circle is not close enough.
OK, so I went back to that code and looked and we are doing the same thing (in fact, I based it off of that). Namely, the only thing that is not working in the code is the interpretation of the results and not the math itself. To that point, Bill, if you look at the points you gave, it is right on the edge of a box that is centered on that point (roughly, Chippewa Falls, WI), but still outside, of the box that was created by Solr (at 280KM). I worked this out by viewing the maps at:
1. http://www.movabletype.co.uk/scripts/latlongmap.html?lat1=44.936905&long1=91.392935&lat2=43.09&long2=93.87341
2. http://www.movabletype.co.uk/scripts/latlongmap.html?lat1=44.936905&long1=91.392935&lat2=45.17614&long2=93.87341
The reason is that we are calculating the box based on the fact that the upper right and lower left corners are 280KM away. The question is then, is this the intuitive thing app developers expect? If app developers think in terms of a radius of distance 280km and all points inside, then no, it isn't. But if they think in terms of the upper and lower box corners with no suggestion of "radius", hence implying a circle then it does.
So, one fix for this could solely be a documentation fix. The other fix is to change the reasoning above to simply find the north, south, east, west points of a box that transcribes a radius of the distance given (since most users tend to think in terms of radius from where they are located and not upper and lower box corners. This will create a box that encloses said radius completely and might give a slight bit more fuzz up near the box corners due to curvature, but that should be fine. I'm working on a patch for this fix, as I think it is the better way.
Two good articles:
http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates
 Pole and good pictures
 Why Sphere is only good for 10KM distances.
Grant: Can you give a good example using the same measurements for sorting and sfilt ?
Will this work? The Spatial Wiki http://wiki.apache.org/solr/SpatialSearch shows:
3.Using the "frange" QParser, as in fq=
{!frange l=0 u=400}hsin(0.57, 1.3, lat_rad, lon_rad, 3963.205)
The Function Wiki Shows:
hsin(radius, truefalse, x1,y1,x2,y2)
Here is the correct solution: (UPDATED)
http://localhost:8983/solr/select/??fl=*,score&start=0&rows=10&q={!sfilt%20fl=store}&qt=standard&pt=44.9369054,91.3929348&d=285&sort=hsin(6371,true,store,vector(44.9369054,91.3929348))%20asc
The Local Solr from Patrick projected the ur, ll using the distance and did a range check to see if the points are in the box.It is a close approximation.
More chatting with Grant  sqrt(2) is correct if things are flat, but prob not correct for a bounding box on the surface of a sphere.
Another thought is toproject left, right, up, and down to the sides of the box and use those lat ranges and lon ranges directly as the bounding box.
The crazy thing is that this is basic geo code  isn't there some bounding box calculation code out there we can use or at least reference?
Ok Grant & I chatted and we figured out what's going wrong. We were calculating a box the size that would completely fit inside the circle rather than viceversa. This was caused by taking the distance and projecting it out to calculate the corners of the box. But the distance given should really be to the side of the box... and the distance from the center to the corner of the box should be greater (if the box is to completely encompass the circle).
The fix should be easy  the distance to the corner of the box is sqrt(2) * dist_to_side_of_box. So internally we just need to multiply the distance by sqrt(2) before finding the corners.
Grant is coding up the fix and tests.
if (options.measStr == null  options.measStr.equals("hsin")) { ur = DistanceUtils.latLonCornerDegs(point[LAT], point[LONG], options.distance, null, true, options.radius); ll = DistanceUtils.latLonCornerDegs(point[LAT], point[LONG], options.distance, null, false, options.radius); }
It could very well be in the determination of the two corners. That being said, in the test I just posted, the point 18.71111, 19.79750 is 3000KM away from 0,0 according to movabletype when traveling along a 45 degree bearing and the tests
checkHits(fieldName, "0,0", 3000, 2, 14, 15); checkHits(fieldName, "0,0", 3001, 3, 14, 15, 16); checkHits(fieldName, "0,0", 3000.1, 3, 14, 15, 16);
Pass for me now that Yonik's patch is applied. Please check my logic.
Ah, good point, Yonik. I will fix that.
but then you are sorting using Euclidean distance
I imagine it's because of the wiki: http://wiki.apache.org/solr/SpatialSearch
The first example uses "dist" and I imagine many are going to interpret that as geo distance (I made the same mistake trying to follow the wiki).
This URL has me a bit stumped. sfilt on a lat_lon is going to use Great Circle to calculate the distance, but then you are sorting using Euclidean distance? Not saying that's a problem, but it strikes me a bit weird.
Hmm, I'm putting in some more exact tests into LatLonType and SpatialFilterTest, trying to work through this.
clearIndex(); assertU(adoc("id", "14", fieldName, "0,5")); assertU(adoc("id", "15", fieldName, "0,15")); //one at the upper right of the box, 3000KM from 0,0, see http://www.movabletype.co.uk/scripts/latlong.html assertU(adoc("id", "16", fieldName, "18.71111,19.79750")); assertU(commit()); checkHits(fieldName, "0,0", 1000, 1, 14); checkHits(fieldName, "0,0", 2000, 1, 14); checkHits(fieldName, "0,0", 3000, 2, 14, 15); checkHits(fieldName, "0,0", 3001, 3, 14, 15, 16); checkHits(fieldName, "0,0", 3000.1, 3, 14, 15, 16);
/lucene/contrib/spatial/src/java/org/apache/lucene/spatial/DistanceUtils.java  hsin appears to be okay.
Good print:
System.out.println("x1 = " + x1 + "  " + x1*180/Math.PI + ",y1 = " + y1 + "  " + x2*180/Math.PI + ",x2 = " + x2 + "  " + x2*180/Math.PI + ",y2 = " + y2 + "  " + y2*180/Math.PI + ",radius = " + radius + "\n");
System.out.println("result = " + result + " km\n");
My guess: "src/java/org/apache/solr/schema/LatLonType.java
double[] ur; double[] ll; if (options.measStr == null  options.measStr.equals("hsin")) { ur = DistanceUtils.latLonCornerDegs(point[LAT], point[LONG], options.distance, null, true, options.radius); ll = DistanceUtils.latLonCornerDegs(point[LAT], point[LONG], options.distance, null, false, options.radius); } else { ur = DistanceUtils.vectorBoxCorner(point, null, options.distance, true); ll = DistanceUtils.vectorBoxCorner(point, null, options.distance, false); }
Hey Bill:
Interesting. usually these types of errors are due to errors in the coordinate reference system of the input data, but if you were able to successfully implement this in JS, then yep, it's likely some bug in the existing Solr implementation rather than the input data.
Cheers,
Chris
The calculation using hsin Javascript is more accurate than our algorithm ? Chris a few percentage points maybe  but not 45%.
I will look into it some more tonight. It can't be that complicated.
Bill
Umm well if you know nothing then how are you pretty sure? And yes, the error bars are fairly high for the Great Circle distance.
Distance using the Haversine function is extremely sensitive to what spatial reference system the data was recorded in. WGS84 isn't particular great with long distances.
I know nothing on this topic, but an error of 45% at 200 km? I'm pretty certain that there is a bug not having to do with the accuracy of spatial reference systems here.
Distance using the Haversine function is extremely sensitive to what spatial reference system the data was recorded in. WGS84 isn't particular great with long distances. The PostGIS in Action book has a really good explanation of this.
The sorting won't be the issue though surely? The bug seems to be in the bounding box generation that Yonik pointed out. There will be some rounding issues at different places I can imagine, but nothing that would generate such a discrepancy.
Yes there is still a bug.
Most of what I was saying was right. I just did a quick maps.google.com  click directions  and then put the 2 lat,long in both fields.
137 miles = 220.480128 kilometers (Google)
196.6km using http://www.movabletype.co.uk/scripts/latlong.html
Distance: 196.6 km
Initial bearing: 096°53′44″
Final bearing: 098°39′05″
Midpoint: 45°03′48″N, 092°37′50″W
As the crow flies is less distance (which makes sense).
I even used the JS function on http://www.movabletype.co.uk/scripts/latlong.html:
function toRad(a) { return (a*Math.PI/180); }; function hsin(lat1,lon1,lat2,lon2) { var R = 6371; // km var dLat = toRad(lat2lat1); var dLon = toRad(lon2lon1); var a = Math.sin(dLat/2) * Math.sin(dLat/2) + Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) * Math.sin(dLon/2) * Math.sin(dLon/2); var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1a)); var d = R * c; return d; };
As a Javascript function  while looping through the results. Since I cannot find a way to output the distance automagically from the XML coming back from SOLR.
<script>document.write(hsin(lat,lon,solr.lat,solr.lon));</script>
I kept playing with d=<km> to see when the filter is not longer showing on the results at while value.
&sort=dist(2,store,vector(44.9369054,91.3929348)) asc
d=285 shows.
d=284 does not show.
Bill, I found two online distance calculators that both give the distance between the points you provided as 196km.
http://www.movabletype.co.uk/scripts/latlong.html
http://www.es.flinders.edu.au/~mattom/Utilities/distance.html
Now... the distance of 280km you provided should certainly still encompass that, so we still have a bug anyway.
Hmmm, well, I just corrected one bug that hardcoded the distance in miles, but it was just a check to see if we crossed the poles.
I don't think that change alone will fix your issue.
Earlier today, I switched around some fields/fieldtypes in the example schema, so "store" is now of latlon type, and it's the only location type (having multiple is just confusing).
So just looking at the bounding box now, here's the URL from your example:
http://localhost:8983/solr/select?fl=*,score&start=0&rows=10&q=
&qt=standard&pt=44.9369054,91.3929348&d=280&debugQuery=true
And I can see that the generated bounding box is:
+store_0_coordinate:[43.129843715965166 TO 46.688683890119314] +store_1_coordinate:[93.83266208454557 TO 88.79716545231159]
Which just misses the longitude of the point on the document of 93.87341.
Can anyone point to an webapp for checking arbitrary distances between two lat/lon points?
Just in time Bill!
I just started looking at spatial stuff today since I'm planning on putting some of it in my Lucene Revolution presentation. I've seen some tweets about people having difficulties, and I've had some problems when I tried stuff myself.
Anyway, I'm going to try and clean up some of this stuff over the next few days and make the wiki a bit more user oriented  an extra pair of eyeballs would be welcome!
Bulk close for 3.1.0 release