Uploaded image for project: 'Lucene - Core'
  1. Lucene - Core
  2. LUCENE-7150

geo3d public APIs should match the 2D apis?

    Details

    • Type: Improvement
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 6.1, master (7.0)
    • Component/s: None
    • Labels:
      None
    • Lucene Fields:
      New

      Description

      I'm struggling to benchmark the equivalent to LatLonPoint.newDistanceQuery in the geo3d world.

      Ideally, I think we'd have a Geo3DPoint.newDistanceQuery? And it would take degrees, not radians, and radiusMeters, not an angle?

      And if I index and search using PlanetModel.SPHERE I think it should ideally give the same results as LatLonPoint.newDistanceQuery, which uses haversin.

      1. LUCENE-7150.patch
        14 kB
        Michael McCandless
      2. LUCENE-7150.patch
        10 kB
        Karl Wright
      3. LUCENE-7150.patch
        9 kB
        Karl Wright
      4. LUCENE-7150.patch
        9 kB
        Karl Wright
      5. LUCENE-7150.patch
        7 kB
        Karl Wright
      6. LUCENE-7150.patch
        6 kB
        Karl Wright
      7. LUCENE-7150-sphere.patch
        12 kB
        Michael McCandless

        Activity

        Hide
        kwright@metacarta.com Karl Wright added a comment -

        One problem at the moment is that all of geo3d works against a unit sphere/ellipsoid, and uses units of radians. The Geo3DPoint does the same. Argument compatibility would require that this be changed to degrees and meters, and a common measure of earth radius be introduced.

        Converting everything to degrees and meters is both unnecessary and unwise, in my opinion. It introduces constants all over the place that really don't add anything logically or numerically. But there are a number of shapes that one creates now using radians, e.g polygons, circles, xyz solids, rectangles, etc. Some of these use a builder-type metaphor which means multiple method invocations are needed to construct the object.

        So maybe it's possible, but non-trivial, to construct a wrapper around geo3d for constructing objects? Seems like a lot of work for not much gain.

        Show
        kwright@metacarta.com Karl Wright added a comment - One problem at the moment is that all of geo3d works against a unit sphere/ellipsoid, and uses units of radians. The Geo3DPoint does the same. Argument compatibility would require that this be changed to degrees and meters, and a common measure of earth radius be introduced. Converting everything to degrees and meters is both unnecessary and unwise, in my opinion. It introduces constants all over the place that really don't add anything logically or numerically. But there are a number of shapes that one creates now using radians, e.g polygons, circles, xyz solids, rectangles, etc. Some of these use a builder-type metaphor which means multiple method invocations are needed to construct the object. So maybe it's possible, but non-trivial, to construct a wrapper around geo3d for constructing objects? Seems like a lot of work for not much gain.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Another interesting question this begs is what exactly is the public API of geo3d?

        I would claim it consists of the following:

        • Construction of shapes
        • Construction of points
        • Evaluating intersection/overlap/etc between shapes (GeoShape interface)
        • Evaluating relationship between shapes and points (Membership interface)
        • Computing "inside" distances (as represented by the GeoDistance interface)
        • Computing "outside" distances (as represented by GeoOutsideDistance interface)

        Michael McCandless, would you agree?

        Show
        kwright@metacarta.com Karl Wright added a comment - Another interesting question this begs is what exactly is the public API of geo3d? I would claim it consists of the following: Construction of shapes Construction of points Evaluating intersection/overlap/etc between shapes (GeoShape interface) Evaluating relationship between shapes and points (Membership interface) Computing "inside" distances (as represented by the GeoDistance interface) Computing "outside" distances (as represented by GeoOutsideDistance interface) Michael McCandless , would you agree?
        Hide
        rcmuir Robert Muir added a comment -

        One problem at the moment is that all of geo3d works against a unit sphere/ellipsoid, and uses units of radians.

        But this is an implementation detail. Literally, no user cares about this.

        The api should be simple: it should take common stuff that users have like latitude, longitude, distance in meters. A user could care less what is happening behind the scenes!

        It does not matter about esoteric use cases, it does not matter if this geo3d can be used to model the earth the shape of an apple or to compute distance on jupiter’s rings.

        We should provide one field: one that works on planet earth, takes lat,lon,meters, stuff like that. Anything else can be a custom query.

        Show
        rcmuir Robert Muir added a comment - One problem at the moment is that all of geo3d works against a unit sphere/ellipsoid, and uses units of radians. But this is an implementation detail. Literally, no user cares about this. The api should be simple: it should take common stuff that users have like latitude, longitude, distance in meters. A user could care less what is happening behind the scenes! It does not matter about esoteric use cases, it does not matter if this geo3d can be used to model the earth the shape of an apple or to compute distance on jupiter’s rings. We should provide one field: one that works on planet earth, takes lat,lon,meters, stuff like that. Anything else can be a custom query.
        Hide
        rjernst Ryan Ernst added a comment -

        One problem at the moment is that all of geo3d works against a unit sphere/ellipsoid, and uses units of radians.

        But this is an implementation detail. Literally, no user cares about this.

        I agree this should be an implementation detail. Having all of these "solids" and stuff as part of the public api just means 0.00001% will ever use this, and that is a lot of code to keep around for such an esoteric use case. Advanced users like that can still use eg PointsFormat to index in multiple dimensions as their use case necessitates, but "geo" functionality should really be simple for 99.9999% of users. I'm happy that we have geo3d here, but I think the name still needs to be clearer (this is not 3d that almost any user would ever think about) and we need public apis (and those to be the only public apis) that operate the exact same as the other sphere based lat/lon geo support we have (as the issue title implies).

        Show
        rjernst Ryan Ernst added a comment - One problem at the moment is that all of geo3d works against a unit sphere/ellipsoid, and uses units of radians. But this is an implementation detail. Literally, no user cares about this. I agree this should be an implementation detail. Having all of these "solids" and stuff as part of the public api just means 0.00001% will ever use this, and that is a lot of code to keep around for such an esoteric use case. Advanced users like that can still use eg PointsFormat to index in multiple dimensions as their use case necessitates, but "geo" functionality should really be simple for 99.9999% of users. I'm happy that we have geo3d here, but I think the name still needs to be clearer (this is not 3d that almost any user would ever think about) and we need public apis (and those to be the only public apis) that operate the exact same as the other sphere based lat/lon geo support we have (as the issue title implies).
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        It escapes me why degrees and distance in meters is a "natural" measurement. To me, that's an arbitrary decision that somebody made (here). The distance in meters as being the way you specify radii of circles is the most problematic.

        But I fully understand that Lucene has already decided what units it intends to use and Geo3d in its current form is incompatible with that. And yet, "meters" implies a surface distance and geo3d doesn't even have a concept of surface distance. You could describe radii in degrees and that could match, but that's not compatible with the 2D implementation.

        So should I withdraw this contribution entirely? What is your suggestion?

        Show
        kwright@metacarta.com Karl Wright added a comment - It escapes me why degrees and distance in meters is a "natural" measurement. To me, that's an arbitrary decision that somebody made (here). The distance in meters as being the way you specify radii of circles is the most problematic. But I fully understand that Lucene has already decided what units it intends to use and Geo3d in its current form is incompatible with that. And yet, "meters" implies a surface distance and geo3d doesn't even have a concept of surface distance. You could describe radii in degrees and that could match, but that's not compatible with the 2D implementation. So should I withdraw this contribution entirely? What is your suggestion?
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        How do you understand that people would use geometric queries? If your notion of a geometric query is only all documents within circles with a center then I challenge that – it's very very limiting and would not handle polygons, paths, rectangles, or any of the other many variants of areas somebody may well want to search within. If you agree that searching for those kinds of things is important, then 99.9999% of the code that you view as useless becomes 10% rather quickly, because that is what geo3d is about.

        Show
        kwright@metacarta.com Karl Wright added a comment - How do you understand that people would use geometric queries? If your notion of a geometric query is only all documents within circles with a center then I challenge that – it's very very limiting and would not handle polygons, paths, rectangles, or any of the other many variants of areas somebody may well want to search within. If you agree that searching for those kinds of things is important, then 99.9999% of the code that you view as useless becomes 10% rather quickly, because that is what geo3d is about.
        Hide
        rcmuir Robert Muir added a comment -

        It escapes me why degrees and distance in meters is a "natural" measurement. To me, that's an arbitrary decision that somebody made (here).

        Its not arbitrary. Look up your home town on wikipedia, the location is given in latitude and longitude, not radians. This is just the way it is, its not any decision that was made here, and its not arbitrary at all. It is just the way the world works, that is what people have.

        The distance in meters as being the way you specify radii of circles is the most problematic.

        Well, this is how people measure distance in this world! How else should i find a pizza restaurant near my house? What units should i use if not meters? What does Geo3D want instead?

        But I fully understand that Lucene has already decided what units it intends to use and Geo3d in its current form is incompatible with that. And yet, "meters" implies a surface distance and geo3d doesn't even have a concept of surface distance. You could describe radii in degrees and that could match, but that's not compatible with the 2D implementation.
        So should I withdraw this contribution entirely? What is your suggestion?

        My suggestion is to do some "hiding" of the internal details. I think all mike wanted to do was benchmark the distance query, to see how 3D compares against 2D.

        So it would be better to provide a Geo3DPoint class that has less parameters (no PlanetModel), takes degrees (not radians), etc. It can internally do any conversion that is needed to work with Geo3D.

        Nobody is arguing about what geo3d geometry apis should take. But can we fix the API to be easy to use for a lucene user? If Mike cannot figure out how to do these things, I think nobody else stands a chance.

        Show
        rcmuir Robert Muir added a comment - It escapes me why degrees and distance in meters is a "natural" measurement. To me, that's an arbitrary decision that somebody made (here). Its not arbitrary. Look up your home town on wikipedia, the location is given in latitude and longitude, not radians. This is just the way it is, its not any decision that was made here, and its not arbitrary at all. It is just the way the world works, that is what people have. The distance in meters as being the way you specify radii of circles is the most problematic. Well, this is how people measure distance in this world! How else should i find a pizza restaurant near my house? What units should i use if not meters? What does Geo3D want instead? But I fully understand that Lucene has already decided what units it intends to use and Geo3d in its current form is incompatible with that. And yet, "meters" implies a surface distance and geo3d doesn't even have a concept of surface distance. You could describe radii in degrees and that could match, but that's not compatible with the 2D implementation. So should I withdraw this contribution entirely? What is your suggestion? My suggestion is to do some "hiding" of the internal details. I think all mike wanted to do was benchmark the distance query, to see how 3D compares against 2D. So it would be better to provide a Geo3DPoint class that has less parameters (no PlanetModel), takes degrees (not radians), etc. It can internally do any conversion that is needed to work with Geo3D. Nobody is arguing about what geo3d geometry apis should take. But can we fix the API to be easy to use for a lucene user? If Mike cannot figure out how to do these things, I think nobody else stands a chance.
        Hide
        rjernst Ryan Ernst added a comment -

        It escapes me why degrees and distance in meters is a "natural" measurement. To me, that's an arbitrary decision that somebody made (here).

        It is the way humans have viewed distance and location for hundreds of years.

        If your notion of a geometric query is only all documents within circles with a center then I challenge that – it's very very limiting and would not handle polygons, paths, rectangles, or any of the other many variants of areas somebody may well want to search within.

        Of course I think users should be able to express queries (and indexed values!) with these concepts. But for a user to have to use do the conversion themselves from the standard units we describe locations on earth with into a spheroid location is asking too much, they will never use it.

        Show
        rjernst Ryan Ernst added a comment - It escapes me why degrees and distance in meters is a "natural" measurement. To me, that's an arbitrary decision that somebody made (here). It is the way humans have viewed distance and location for hundreds of years. If your notion of a geometric query is only all documents within circles with a center then I challenge that – it's very very limiting and would not handle polygons, paths, rectangles, or any of the other many variants of areas somebody may well want to search within. Of course I think users should be able to express queries (and indexed values!) with these concepts. But for a user to have to use do the conversion themselves from the standard units we describe locations on earth with into a spheroid location is asking too much, they will never use it.
        Hide
        kwright@metacarta.com Karl Wright added a comment - - edited

        What does Geo3D want instead?

        It wants an angle. As I said, it currently uses radians for this. Conversion from degrees is straightforward. Conversion from meters, on the other hand, implies that you have to compute the inverse of the Vincenty distance (if you want to do it accurately), or if your "distance" is in fact really the spherical surface distance (what you guys are calling the "haversine distance"), then it is a simple division.

        So it would be better to provide a Geo3DPoint class that has less parameters (no PlanetModel), takes degrees (not radians), etc. It can internally do any conversion that is needed to work with Geo3D.

        That's fine as far as it goes but runs into two issues: (1) the distance metric, and (2) how you specify shapes to search against. I proposed possibly wrapping the shape factory methods too, but before that happens you will need to inform me what measurements are meters and what are degrees. It really is not obvious to me what the "obvious" choice is.

        Show
        kwright@metacarta.com Karl Wright added a comment - - edited What does Geo3D want instead? It wants an angle. As I said, it currently uses radians for this. Conversion from degrees is straightforward. Conversion from meters, on the other hand, implies that you have to compute the inverse of the Vincenty distance (if you want to do it accurately), or if your "distance" is in fact really the spherical surface distance (what you guys are calling the "haversine distance"), then it is a simple division. So it would be better to provide a Geo3DPoint class that has less parameters (no PlanetModel), takes degrees (not radians), etc. It can internally do any conversion that is needed to work with Geo3D. That's fine as far as it goes but runs into two issues: (1) the distance metric, and (2) how you specify shapes to search against. I proposed possibly wrapping the shape factory methods too, but before that happens you will need to inform me what measurements are meters and what are degrees. It really is not obvious to me what the "obvious" choice is.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        If Mike cannot figure out how to do these things, I think nobody else stands a chance.

        I don't actually know what problem Mike is having. He claims that when he halves the search radius he gets twice the points. I gave him extremely simple code to try so I don't understand what is going on either. I'm going to reserve judgement until we figure out what's actually happening.

        I am a bit concerned that you folks may be (a) making a lot out of what is probably a very simple bug, and (b) trying to oversimplify a rather complex problem. I don't mind working hard to make API's as consistent as possible, but there is a risk. Looking at distance measurements as a metaphor, there are lots of ways to measure distance in an ellipsoidal world. And I don't know if you've picked the best one.

        Show
        kwright@metacarta.com Karl Wright added a comment - If Mike cannot figure out how to do these things, I think nobody else stands a chance. I don't actually know what problem Mike is having. He claims that when he halves the search radius he gets twice the points. I gave him extremely simple code to try so I don't understand what is going on either. I'm going to reserve judgement until we figure out what's actually happening. I am a bit concerned that you folks may be (a) making a lot out of what is probably a very simple bug, and (b) trying to oversimplify a rather complex problem. I don't mind working hard to make API's as consistent as possible, but there is a risk. Looking at distance measurements as a metaphor, there are lots of ways to measure distance in an ellipsoidal world. And I don't know if you've picked the best one.
        Hide
        rcmuir Robert Muir added a comment -

        That's fine as far as it goes but runs into two issues: (1) the distance metric, and (2) how you specify shapes to search against.

        Its no a problem at all. For 1) pick something and be consistent. Its not pluggable, its not optional, it does not need to be flexible here. For 2) something dead simple like LatLonPoint.newPolygonQuery(String, double[], double[]).

        This *Point implementation we provide should be as simple and basic and natural as possible. For anything else, someone can just make their own *Point class to do something special.

        Show
        rcmuir Robert Muir added a comment - That's fine as far as it goes but runs into two issues: (1) the distance metric, and (2) how you specify shapes to search against. Its no a problem at all. For 1) pick something and be consistent. Its not pluggable, its not optional, it does not need to be flexible here. For 2) something dead simple like LatLonPoint.newPolygonQuery(String, double[], double[]). This *Point implementation we provide should be as simple and basic and natural as possible. For anything else, someone can just make their own *Point class to do something special.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        pick something and be consistent

        If I pick "haversine distance in meters" as being the input distance for this API, is that consistent with the 2D api everywhere?

        Show
        kwright@metacarta.com Karl Wright added a comment - pick something and be consistent If I pick "haversine distance in meters" as being the input distance for this API, is that consistent with the 2D api everywhere?
        Hide
        rcmuir Robert Muir added a comment -

        I think as a start haversin would be nice? It would at least allow us to do some nice comparisons with the other geo impls. I too am curious about performance of not just distance but also things like polygon queries.

        Remember, we can always provide 'alternatives' at a later point or more advanced apis. We can always add Geo3DAdvancedPoint in the future with wider apis (e.g. taking planetmodel or using an ellipsoid model/vincenty/whatever), or add additional methods. Its just as a start we need a sorta dumb-downed/very simple lucene integration to start exploring the differences e.g. in query performance and so on.

        So I think we should steer in the direction of simplicity, make it brain dead simple, and hard for someone to mess up (like us, trying to run benchmarks). We have to start somewhere...

        Show
        rcmuir Robert Muir added a comment - I think as a start haversin would be nice? It would at least allow us to do some nice comparisons with the other geo impls. I too am curious about performance of not just distance but also things like polygon queries. Remember, we can always provide 'alternatives' at a later point or more advanced apis. We can always add Geo3DAdvancedPoint in the future with wider apis (e.g. taking planetmodel or using an ellipsoid model/vincenty/whatever), or add additional methods. Its just as a start we need a sorta dumb-downed/very simple lucene integration to start exploring the differences e.g. in query performance and so on. So I think we should steer in the direction of simplicity, make it brain dead simple, and hard for someone to mess up (like us, trying to run benchmarks). We have to start somewhere...
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Initial patch. For Mike to test. More patches with more shape builders will be included later.

        Show
        kwright@metacarta.com Karl Wright added a comment - Initial patch. For Mike to test. More patches with more shape builders will be included later.
        Hide
        rcmuir Robert Muir added a comment -

        Nice, thanks Karl!

        Can we add GeoUtils.checkLatitude() and checkLongitude() to the public methods taking lat/lon? They are helpful at detecting mistakes.

        Show
        rcmuir Robert Muir added a comment - Nice, thanks Karl! Can we add GeoUtils.checkLatitude() and checkLongitude() to the public methods taking lat/lon? They are helpful at detecting mistakes.
        Hide
        mikemccand Michael McCandless added a comment -

        Whoa, with this patch, I'm getting close to the hit counts I see with the geo2d impls (383M), even when using WGS84! Thank you Karl Wright!

        It's 383,371,877 with geo3d and 382,961,953 with geopoint, which is much closer than I've ever seen.

        Maybe we can share this constant for the earth?: https://github.com/apache/lucene-solr/commit/9c98f9d95801fe6e64f7653667feb30ed80b9b8e#diff-ff4af375ef79ebff087ad91ce7778959R136

        Can we just use Math.toRadians? Clearly I should not be trusted to write my own toRadians

        Show
        mikemccand Michael McCandless added a comment - Whoa, with this patch, I'm getting close to the hit counts I see with the geo2d impls (383M), even when using WGS84! Thank you Karl Wright ! It's 383,371,877 with geo3d and 382,961,953 with geopoint, which is much closer than I've ever seen. Maybe we can share this constant for the earth?: https://github.com/apache/lucene-solr/commit/9c98f9d95801fe6e64f7653667feb30ed80b9b8e#diff-ff4af375ef79ebff087ad91ce7778959R136 Can we just use Math.toRadians ? Clearly I should not be trusted to write my own toRadians
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        lucene.spatial is not apparently a dependency for lucene.spatial3d at this time. Do you want me to add this?

        Show
        kwright@metacarta.com Karl Wright added a comment - lucene.spatial is not apparently a dependency for lucene.spatial3d at this time. Do you want me to add this?
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Sure, we can do that. I'm sure it's doing the same thing I am: a single multiplication.

        Show
        kwright@metacarta.com Karl Wright added a comment - Sure, we can do that. I'm sure it's doing the same thing I am: a single multiplication.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Second version of patch.

        Can't hook up Robert's check because I don't know how to set up a cross-project dependency in Lucene, but include commented out code for somebody else to enable. Also, FWIW, the constructors in Geo3D all have range checks as well.

        Show
        kwright@metacarta.com Karl Wright added a comment - Second version of patch. Can't hook up Robert's check because I don't know how to set up a cross-project dependency in Lucene, but include commented out code for somebody else to enable. Also, FWIW, the constructors in Geo3D all have range checks as well.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        This version of the patch has builders for all existing GeoShape types.

        Show
        kwright@metacarta.com Karl Wright added a comment - This version of the patch has builders for all existing GeoShape types.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Turns out that Mike's problem was that he was incorrectly converting degrees to radians.

        I didn't think this was hard – after all, there is a java.lang.Math method toRadians(double degrees) which just does it. But...

        Show
        kwright@metacarta.com Karl Wright added a comment - Turns out that Mike's problem was that he was incorrectly converting degrees to radians. I didn't think this was hard – after all, there is a java.lang.Math method toRadians(double degrees) which just does it. But...
        Hide
        rcmuir Robert Muir added a comment -

        Turns out that Mike's problem was that he was incorrectly converting degrees to radians.

        I didn't think this was hard – after all, there is a java.lang.Math method toRadians(double degrees) which just does it. But...

        That is one way to see it.

        Another way is: Mike has struggled to try to benchmark 2D vs 3D for a long time without success. As soon as we add sugar methods to take latitude/longitude/kilometers, things suddenly appear to be working!!!! So the less error-prone API is a great improvement...

        Show
        rcmuir Robert Muir added a comment - Turns out that Mike's problem was that he was incorrectly converting degrees to radians. I didn't think this was hard – after all, there is a java.lang.Math method toRadians(double degrees) which just does it. But... That is one way to see it. Another way is: Mike has struggled to try to benchmark 2D vs 3D for a long time without success. As soon as we add sugar methods to take latitude/longitude/kilometers, things suddenly appear to be working!!!! So the less error-prone API is a great improvement...
        Hide
        mikemccand Michael McCandless added a comment -

        I think we should move GeoUtils to core? It's a small amount of code, geo search is important, and now we have 3 modules that have diverse geo support.

        Show
        mikemccand Michael McCandless added a comment - I think we should move GeoUtils to core? It's a small amount of code, geo search is important, and now we have 3 modules that have diverse geo support.
        Hide
        mikemccand Michael McCandless added a comment -

        OK I re-ran the 61 M point London, UK benchmark, on trunk, with the patch here so geo3d works (curse you, toRadians!!!):

        • geo3d: 31.5 QPS, 383,987,542 hits
        • points: 28.5 QPS, 382,962,003 hits
        • geopoint: 15.9 QPS, 382,961,953 hits
        Show
        mikemccand Michael McCandless added a comment - OK I re-ran the 61 M point London, UK benchmark, on trunk, with the patch here so geo3d works (curse you, toRadians !!!): geo3d: 31.5 QPS, 383,987,542 hits points: 28.5 QPS, 382,962,003 hits geopoint: 15.9 QPS, 382,961,953 hits
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Thanks, Mike!
        I'm actually surprised that geo3D is still at the top of the list. But not complaining...

        Show
        kwright@metacarta.com Karl Wright added a comment - Thanks, Mike! I'm actually surprised that geo3D is still at the top of the list. But not complaining...
        Hide
        rjernst Ryan Ernst added a comment - - edited

        Thanks for the patch Karl Wright! I'm a little confused about the use of a constant for radians to meters, that assumes a sphere model right? Yet the patch uses the WGS84 planet model, which is a spheroid? If we are going to start with a this simple approach (which is perfect!) should the query use PlanetModel.SPHERE?

        I also think using the same constant that we use for haversin is important? We currently have this defined as 6_378_137 meters for the radius in GeoUtils (Mike, +1 to sharing this so we don't redefine this stuff multiple times). If we use that constant, and a sphere model, then presumably we should get the exact same hit counts right?

        Show
        rjernst Ryan Ernst added a comment - - edited Thanks for the patch Karl Wright ! I'm a little confused about the use of a constant for radians to meters, that assumes a sphere model right? Yet the patch uses the WGS84 planet model, which is a spheroid? If we are going to start with a this simple approach (which is perfect!) should the query use PlanetModel.SPHERE? I also think using the same constant that we use for haversin is important? We currently have this defined as 6_378_137 meters for the radius in GeoUtils (Mike, +1 to sharing this so we don't redefine this stuff multiple times). If we use that constant, and a sphere model, then presumably we should get the exact same hit counts right?
        Hide
        kwright@metacarta.com Karl Wright added a comment - - edited

        As I discussed at length with Robert above, going from precise meters to angles for anything other than a sphere is hard. For this reason, Geo3d uses angles exclusively when specifying its shapes. This was a key design decision and cannot realistically change. And the accurate conversion math is well-known to be challenging; it's an iterative convergence loop, in fact, much like computing the forward Vincenty distance.

        It so happens that this problem exists for the 2D stuff as well; for example, the test code Mike was using to exercise the 2D stuff uses a Haversine distance in computing a circle radius based on a bounding box. So I chose to use the great circle (haversine) distance as the basis of the "distance in meters" rather than expect people to an expensive calculation every time they create their query.

        The number I have in the last two versions of the patch for mean earth radius, by the way, comes from the constant Mike pointed me at elsewhere in the code. The number you state above is in fact incorrect; it is not the mean radius, but rather the equatorial radius.

        https://en.wikipedia.org/wiki/Earth_radius

        About WGS84 vs. SPHERE: Yes, if you changed all usages of "PlanetModel.WGS84" in Geo3DPoint.java to "PlanetModel.SPHERE", you should get numbers exactly like any other spherical model. However, please also noted that for ellipsoids, Geo3D has a unique way of approximating the actual surface distance. All Geo3d "circles" are in fact described by a plane that cuts through the world. Since the world in WGS84 is an ellipsoid, the shape actually described on the surface of the world is an ellipse, not a circle. The error this approach introduces has been calculated to be at most 0.5% for WGS84 vs. the Vincenty calculation.

        Show
        kwright@metacarta.com Karl Wright added a comment - - edited As I discussed at length with Robert above, going from precise meters to angles for anything other than a sphere is hard. For this reason, Geo3d uses angles exclusively when specifying its shapes. This was a key design decision and cannot realistically change. And the accurate conversion math is well-known to be challenging; it's an iterative convergence loop, in fact, much like computing the forward Vincenty distance. It so happens that this problem exists for the 2D stuff as well; for example, the test code Mike was using to exercise the 2D stuff uses a Haversine distance in computing a circle radius based on a bounding box. So I chose to use the great circle (haversine) distance as the basis of the "distance in meters" rather than expect people to an expensive calculation every time they create their query. The number I have in the last two versions of the patch for mean earth radius, by the way, comes from the constant Mike pointed me at elsewhere in the code. The number you state above is in fact incorrect; it is not the mean radius, but rather the equatorial radius. https://en.wikipedia.org/wiki/Earth_radius About WGS84 vs. SPHERE: Yes, if you changed all usages of "PlanetModel.WGS84" in Geo3DPoint.java to "PlanetModel.SPHERE", you should get numbers exactly like any other spherical model. However, please also noted that for ellipsoids, Geo3D has a unique way of approximating the actual surface distance. All Geo3d "circles" are in fact described by a plane that cuts through the world. Since the world in WGS84 is an ellipsoid, the shape actually described on the surface of the world is an ellipse, not a circle. The error this approach introduces has been calculated to be at most 0.5% for WGS84 vs. the Vincenty calculation.
        Hide
        mikemccand Michael McCandless added a comment -

        Can we name it newBoxQuery not newBBoxQuery, matching LatLonPoint, since a bbox (bounding box) is just one reason why one would be querying with a box?

        Show
        mikemccand Michael McCandless added a comment - Can we name it newBoxQuery not newBBoxQuery , matching LatLonPoint , since a bbox (bounding box) is just one reason why one would be querying with a box?
        Hide
        mikemccand Michael McCandless added a comment -

        I'm surprised as well: it's operating on 3 dimensions instead of 2! I thought this would be a penalty...

        Show
        mikemccand Michael McCandless added a comment - I'm surprised as well: it's operating on 3 dimensions instead of 2! I thought this would be a penalty...
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Rename newBBoxQuery to newBoxQuery.

        Show
        kwright@metacarta.com Karl Wright added a comment - Rename newBBoxQuery to newBoxQuery.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Me too.
        Having faster math matters more, I guess.

        Show
        kwright@metacarta.com Karl Wright added a comment - Me too. Having faster math matters more, I guess.
        Hide
        mikemccand Michael McCandless added a comment -

        I temporarily cutover from WGS84 to SPHERE in my checkout (see attached patch, is it correct ) and re-ran the 61 M London, UK benchmark for geo3d:

        • 31.2 QPS, 383,578,937 hits

        But ... I expected that hit count to be closer to what the 2g geo impls are giving us ...

        Show
        mikemccand Michael McCandless added a comment - I temporarily cutover from WGS84 to SPHERE in my checkout (see attached patch, is it correct ) and re-ran the 61 M London, UK benchmark for geo3d: 31.2 QPS, 383,578,937 hits But ... I expected that hit count to be closer to what the 2g geo impls are giving us ...
        Hide
        mikemccand Michael McCandless added a comment -

        Hmm I'm adding polygon testing to the 6.1 M London, UK benchmark, but geo3d is angry with this:

        POLY:
          lats=[51.14814672999999, 51.24200108694998, 51.22066806296021, 51.07567472594126, 51.05429962903539, 51.14814672999999]
          lons=[-0.1104487519433594, -0.23263688084967538, -0.3629490615349049, -0.36288116612674665, -0.23263923846224235, -0.1104487519433594]
        Exception in thread "main" java.lang.IllegalArgumentException: Degenerate/parallel vector constructed
        	at org.apache.lucene.spatial3d.geom.Vector.<init>(Vector.java:77)
        	at org.apache.lucene.spatial3d.geom.Plane.<init>(Plane.java:59)
        	at org.apache.lucene.spatial3d.geom.SidedPlane.<init>(SidedPlane.java:48)
        	at org.apache.lucene.spatial3d.geom.GeoPolygonFactory.buildPolygonShape(GeoPolygonFactory.java:84)
        	at org.apache.lucene.spatial3d.geom.GeoPolygonFactory.makeGeoPolygon(GeoPolygonFactory.java:43)
        	at org.apache.lucene.spatial3d.Geo3DPoint.newPolygonQuery(Geo3DPoint.java:119)
        	at perf.IndexAndSearchOpenStreetMaps.queryIndex(IndexAndSearchOpenStreetMaps.java:339)
        	at perf.IndexAndSearchOpenStreetMaps.main(IndexAndSearchOpenStreetMaps.java:525)
        

        What does this mean...?

        Show
        mikemccand Michael McCandless added a comment - Hmm I'm adding polygon testing to the 6.1 M London, UK benchmark, but geo3d is angry with this: POLY: lats=[51.14814672999999, 51.24200108694998, 51.22066806296021, 51.07567472594126, 51.05429962903539, 51.14814672999999] lons=[-0.1104487519433594, -0.23263688084967538, -0.3629490615349049, -0.36288116612674665, -0.23263923846224235, -0.1104487519433594] Exception in thread "main" java.lang.IllegalArgumentException: Degenerate/parallel vector constructed at org.apache.lucene.spatial3d.geom.Vector.<init>(Vector.java:77) at org.apache.lucene.spatial3d.geom.Plane.<init>(Plane.java:59) at org.apache.lucene.spatial3d.geom.SidedPlane.<init>(SidedPlane.java:48) at org.apache.lucene.spatial3d.geom.GeoPolygonFactory.buildPolygonShape(GeoPolygonFactory.java:84) at org.apache.lucene.spatial3d.geom.GeoPolygonFactory.makeGeoPolygon(GeoPolygonFactory.java:43) at org.apache.lucene.spatial3d.Geo3DPoint.newPolygonQuery(Geo3DPoint.java:119) at perf.IndexAndSearchOpenStreetMaps.queryIndex(IndexAndSearchOpenStreetMaps.java:339) at perf.IndexAndSearchOpenStreetMaps.main(IndexAndSearchOpenStreetMaps.java:525) What does this mean...?
        Hide
        kwright@metacarta.com Karl Wright added a comment - - edited

        What does this mean...?

        It means you have two adjacent points that are identical.

        Michael McCandless Looking at the values passed, it appears that the last point is the same as the first. In geo3d it is unnecessary to specify the last point; it's implied. The javadoc for polygons elsewhere did not make it clear that you needed to do this – but is this how the API works for 2D? If so it's easy to correct for.

        Show
        kwright@metacarta.com Karl Wright added a comment - - edited What does this mean...? It means you have two adjacent points that are identical. Michael McCandless Looking at the values passed, it appears that the last point is the same as the first. In geo3d it is unnecessary to specify the last point; it's implied. The javadoc for polygons elsewhere did not make it clear that you needed to do this – but is this how the API works for 2D? If so it's easy to correct for.
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        Are you sure that the 2d code is using a sphere as a world model?

        Also, after talking with Ryan I am concerned that the 2d implementation is using a wacky value for the radius of the earth. It should be using the mean earth radius, not the equatorial radius.

        Show
        kwright@metacarta.com Karl Wright added a comment - Are you sure that the 2d code is using a sphere as a world model? Also, after talking with Ryan I am concerned that the 2d implementation is using a wacky value for the radius of the earth. It should be using the mean earth radius, not the equatorial radius.
        Hide
        mikemccand Michael McCandless added a comment -

        Ahh OK phew

        Yes, I think the 2D APIs require this. Can you fix the geo3d api to also require it (check it, then I guess remove it when forwarding to the geom impls)?

        Show
        mikemccand Michael McCandless added a comment - Ahh OK phew Yes, I think the 2D APIs require this. Can you fix the geo3d api to also require it (check it, then I guess remove it when forwarding to the geom impls)?
        Hide
        kwright@metacarta.com Karl Wright added a comment - - edited

        Use the same convention as for 2D for polygons of repeating one point. Michael McCandless, this should get you unstuck.

        Show
        kwright@metacarta.com Karl Wright added a comment - - edited Use the same convention as for 2D for polygons of repeating one point. Michael McCandless , this should get you unstuck.
        Hide
        mikemccand Michael McCandless added a comment -

        Thanks Karl Wright, I'll test!

        Show
        mikemccand Michael McCandless added a comment - Thanks Karl Wright , I'll test!
        Hide
        mikemccand Michael McCandless added a comment -

        OK -poly 5 -geo3d now works! Thanks.

        Show
        mikemccand Michael McCandless added a comment - OK -poly 5 -geo3d now works! Thanks.
        Hide
        mikemccand Michael McCandless added a comment -

        Here're the results for querying 5-gons:

        • geo3d (WGS84): 30.3 QPS, 282,041,551 hits
        • points: 33.5 QPS, 281,983,277 hits
        • geopoint: 21.0 QPS, 281,983,217 hits
        Show
        mikemccand Michael McCandless added a comment - Here're the results for querying 5-gons: geo3d (WGS84): 30.3 QPS, 282,041,551 hits points: 33.5 QPS, 281,983,277 hits geopoint: 21.0 QPS, 281,983,217 hits
        Hide
        kwright@metacarta.com Karl Wright added a comment - - edited

        Michael McCandless Thanks!

        FWIW, polygons do not have the same issues with approximation in geo3d as circles. They are in fact exact. Are the 2D implementations doing real great-circle polygons, or a 2D linear interpolation?

        Show
        kwright@metacarta.com Karl Wright added a comment - - edited Michael McCandless Thanks! FWIW, polygons do not have the same issues with approximation in geo3d as circles. They are in fact exact. Are the 2D implementations doing real great-circle polygons, or a 2D linear interpolation?
        Hide
        mikemccand Michael McCandless added a comment -

        I would assume they are a 2D interpolation? They operate in lat/lon space?

        Show
        mikemccand Michael McCandless added a comment - I would assume they are a 2D interpolation? They operate in lat/lon space?
        Hide
        kwright@metacarta.com Karl Wright added a comment -

        If they are 2D linear interpolations, then it is not unreasonable for there to be a hit count difference between true 3D polygons and linearly interpolated polygons.

        Show
        kwright@metacarta.com Karl Wright added a comment - If they are 2D linear interpolations, then it is not unreasonable for there to be a hit count difference between true 3D polygons and linearly interpolated polygons.
        Hide
        mikemccand Michael McCandless added a comment -

        Thanks Karl Wright, I think the last patch looks good.

        This is a great step forward in usability for geo3d. I'll add TODO and reference LUCENE-7152 about validating lat/lon values, and add some javadocs for the new methods.

        I think for now we should leave the default planet model at WGS84, and just explain in the javadocs that there is up to 0.5% error.

        Show
        mikemccand Michael McCandless added a comment - Thanks Karl Wright , I think the last patch looks good. This is a great step forward in usability for geo3d. I'll add TODO and reference LUCENE-7152 about validating lat/lon values, and add some javadocs for the new methods. I think for now we should leave the default planet model at WGS84, and just explain in the javadocs that there is up to 0.5% error.
        Hide
        mikemccand Michael McCandless added a comment -

        New patch, I think it's ready.

        I just added some javadocs, named parameters consistently with LatLonPoint (lat -> latitude, etc.), and I poached checkLatitude/Longitude with a TOOD to share it again.

        Show
        mikemccand Michael McCandless added a comment - New patch, I think it's ready. I just added some javadocs, named parameters consistently with LatLonPoint (lat -> latitude, etc.), and I poached checkLatitude/Longitude with a TOOD to share it again.
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit bf0e59223d0fdf6de28f2b8a495331222e3232c8 in lucene-solr's branch refs/heads/master from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bf0e592 ]

        LUCENE-7150: add Geo3DPoint.newDistance/Box/PolygonQuery

        Show
        jira-bot ASF subversion and git services added a comment - Commit bf0e59223d0fdf6de28f2b8a495331222e3232c8 in lucene-solr's branch refs/heads/master from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=bf0e592 ] LUCENE-7150 : add Geo3DPoint.newDistance/Box/PolygonQuery
        Hide
        jira-bot ASF subversion and git services added a comment -

        Commit 0dcf822f40f685748525698e52f0030e88cb6de9 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless
        [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0dcf822 ]

        LUCENE-7150: add Geo3DPoint.newDistance/Box/PolygonQuery

        Show
        jira-bot ASF subversion and git services added a comment - Commit 0dcf822f40f685748525698e52f0030e88cb6de9 in lucene-solr's branch refs/heads/branch_6x from Mike McCandless [ https://git-wip-us.apache.org/repos/asf?p=lucene-solr.git;h=0dcf822 ] LUCENE-7150 : add Geo3DPoint.newDistance/Box/PolygonQuery
        Hide
        mikemccand Michael McCandless added a comment -

        Thanks Karl Wright!

        Show
        mikemccand Michael McCandless added a comment - Thanks Karl Wright !
        Hide
        hossman Hoss Man added a comment -

        Manually correcting fixVersion per Step #S5 of LUCENE-7271

        Show
        hossman Hoss Man added a comment - Manually correcting fixVersion per Step #S5 of LUCENE-7271

          People

          • Assignee:
            kwright@metacarta.com Karl Wright
            Reporter:
            mikemccand Michael McCandless
          • Votes:
            0 Vote for this issue
            Watchers:
            5 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development