Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Minor Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      Given an index with spatial information (either as a geohash, SpatialTileField (see SOLR-1586) or just two lat/lon pairs), we should be able to pass in a filter query that takes in the field name, lat, lon and distance and produces an appropriate Filter (i.e. one that is aware of the underlying field type for use by Solr.

      The interface could look like:

      &fq={!sfilt dist=20}location:49.32,-79.0
      

      or it could be:

      &fq={!sfilt lat=49.32 lat=-79.0 f=location dist=20}
      

      or:

      &fq={!sfilt p=49.32,-79.0 f=location dist=20}
      

      or:

      &fq={!sfilt lat=49.32,-79.0 fl=lat,lon dist=20}
      
      1. CartesianTierQParserPlugin.java
        5 kB
        Grant Ingersoll
      2. SOLR-1568.patch
        20 kB
        Grant Ingersoll
      3. SOLR-1568.patch
        52 kB
        Grant Ingersoll
      4. SOLR-1568.Mattmann.031010.patch.txt
        79 kB
        Chris A. Mattmann
      5. SOLR-1568.patch
        59 kB
        Grant Ingersoll
      6. SOLR-1568.patch
        68 kB
        Grant Ingersoll
      7. SOLR-1568.patch
        75 kB
        Grant Ingersoll
      8. SOLR-1568.patch
        75 kB
        Grant Ingersoll
      9. SOLR-1568.patch
        76 kB
        Grant Ingersoll
      10. SOLR-1568.patch
        80 kB
        Grant Ingersoll
      11. SOLR-1568.patch
        76 kB
        Grant Ingersoll
      12. SOLR-1568.patch
        112 kB
        Grant Ingersoll
      13. SOLR-1568.patch
        112 kB
        Grant Ingersoll
      14. SOLR-1568.patch
        112 kB
        Grant Ingersoll
      15. SOLR-1568.patch
        2 kB
        Yonik Seeley
      16. SOLR-1568.patch
        26 kB
        Yonik Seeley
      17. SOLR-1568.patch
        32 kB
        Yonik Seeley
      18. SOLR-1568.patch
        33 kB
        Yonik Seeley
      19. SOLR-1568.patch
        11 kB
        Yonik Seeley
      20. SOLR-1568.patch
        27 kB
        Yonik Seeley
      21. SOLR-1568.patch
        32 kB
        Yonik Seeley
      22. SOLR-1568.patch
        2 kB
        Yonik Seeley

        Issue Links

          Activity

          Hide
          Grant Ingersoll added a comment -

          This should be solved by having a tier field type.

          Show
          Grant Ingersoll added a comment - This should be solved by having a tier field type.
          Hide
          Grant Ingersoll added a comment -

          Here's the start of a QParserPlugin for this functionality. It's just the Java code and is not integrated with Solr yet.

          It works using the Lucene spatial stuff, but it should not be committed at this point, b/c I want to make it work with Tier based field types so that the end user need not even think about what the field name structure is (i.e. tier_).

          Can query with it as something like:

          {!tier x=32 y=-79 dist=20 prefix=tier_}

          . If you did want to use it, you would need to add it to your solrconfig.xml.

          Show
          Grant Ingersoll added a comment - Here's the start of a QParserPlugin for this functionality. It's just the Java code and is not integrated with Solr yet. It works using the Lucene spatial stuff, but it should not be committed at this point, b/c I want to make it work with Tier based field types so that the end user need not even think about what the field name structure is (i.e. tier_). Can query with it as something like: {!tier x=32 y=-79 dist=20 prefix=tier_} . If you did want to use it, you would need to add it to your solrconfig.xml.
          Hide
          Chris Male added a comment -

          Hi,

          I'm just wondering why you have created your own TierFilter instead of using the CartesianShapeFilter from the Lucene spatial contrib?

          Cheers

          Show
          Chris Male added a comment - Hi, I'm just wondering why you have created your own TierFilter instead of using the CartesianShapeFilter from the Lucene spatial contrib? Cheers
          Hide
          Grant Ingersoll added a comment -

          Honestly, it's a bit tricky and I'm not sure how best to resolve it (keep in mind that I'm not recommending the above piece be committed). There are a few pieces that can be improved in the CartesianShapeFilter to perform better (I still need to fix them in Lucene) but if I go that route, then I need to update the Lucene JARs associated with that in Solr. I can't really do that, b/c that likely means one of two things:

          1. Patching the 2.9.1 branch and packaging up that JAR for Solr or wait for a 2.9.2 release from Lucene which isn't likely.
          2. Patching trunk and including it. This would be a huge undertaking for Solr

          So, in the end, my decision was based on the fact that the code for it was pretty simple and wouldn't be a big deal to fix. Longer term, I will fix the issue in trunk of Lucene and then over time Solr can adapt to use that once we are on 3.x

          Show
          Grant Ingersoll added a comment - Honestly, it's a bit tricky and I'm not sure how best to resolve it (keep in mind that I'm not recommending the above piece be committed). There are a few pieces that can be improved in the CartesianShapeFilter to perform better (I still need to fix them in Lucene) but if I go that route, then I need to update the Lucene JARs associated with that in Solr. I can't really do that, b/c that likely means one of two things: 1. Patching the 2.9.1 branch and packaging up that JAR for Solr or wait for a 2.9.2 release from Lucene which isn't likely. 2. Patching trunk and including it. This would be a huge undertaking for Solr So, in the end, my decision was based on the fact that the code for it was pretty simple and wouldn't be a big deal to fix. Longer term, I will fix the issue in trunk of Lucene and then over time Solr can adapt to use that once we are on 3.x
          Hide
          Chris A. Mattmann added a comment -

          So, in the end, my decision was based on the fact that the code for it was pretty simple and wouldn't be a big deal to fix. Longer term, I will fix the issue in trunk of Lucene and then over time Solr can adapt to use that once we are on 3.x

          +1 on this. My group at JPL has gone the route of trying to co-develop Lucene and SOLR patches for this without much success. I think the best way is to maintain the code pieces in SOLR, try and flow them back into Lucene (those updates needed there), and then flow back into SOLR as releases in Lucene-ville are made.

          Show
          Chris A. Mattmann added a comment - So, in the end, my decision was based on the fact that the code for it was pretty simple and wouldn't be a big deal to fix. Longer term, I will fix the issue in trunk of Lucene and then over time Solr can adapt to use that once we are on 3.x +1 on this. My group at JPL has gone the route of trying to co-develop Lucene and SOLR patches for this without much success. I think the best way is to maintain the code pieces in SOLR, try and flow them back into Lucene (those updates needed there), and then flow back into SOLR as releases in Lucene-ville are made.
          Hide
          Ryan McKinley added a comment -
          I think the best way is to maintain the code pieces in SOLR, try and flow them back into Lucene (those updates needed there), and then flow back into SOLR as releases in Lucene-ville are made.

          +1

          Show
          Ryan McKinley added a comment - I think the best way is to maintain the code pieces in SOLR, try and flow them back into Lucene (those updates needed there), and then flow back into SOLR as releases in Lucene-ville are made. +1
          Hide
          patrick o'leary added a comment -

          I would be concerned that this starts making it more complex for users to implement.
          we're going from

          &lat=49.45&long=-77.33&radius=10
          

          to :

          &fq={!sfilt p=49.32,-79.0 f=location dist=20}
          

          What do you get out of it?
          I remember FAST tm, a subsidiary of microsoft (think I got all that right), used have

          geo_location:{-79.0,49.32,10)
          

          And that was unnecessarily awkward to explain to folks, again because different folks viewed GIS calculations in different manors

          Remember GIS is often like an Abbot and Costello sketch, who's on first, lat or long?
          Keep it simple, and please don't obscure it

          --1

          Show
          patrick o'leary added a comment - I would be concerned that this starts making it more complex for users to implement. we're going from &lat=49.45& long =-77.33&radius=10 to : &fq={!sfilt p=49.32,-79.0 f=location dist=20} What do you get out of it? I remember FAST tm, a subsidiary of microsoft (think I got all that right), used have geo_location:{-79.0,49.32,10) And that was unnecessarily awkward to explain to folks, again because different folks viewed GIS calculations in different manors Remember GIS is often like an Abbot and Costello sketch, who's on first, lat or long? Keep it simple, and please don't obscure it --1
          Hide
          Grant Ingersoll added a comment -

          Here's the thing I keep coming back to with this issue. For all the field types other than SpatialTile (Cartesian Tier), filters are already easily creatable. It seems odd to have to write this special QParserPlugin that has all this syntax support, etc. to create filters for all of the other cases beside the spatial tile. For instance, pure radial distance filtering is easily handled through the FunctionRangeQParserPlugin and the various distance functions. In fact, going this route is more convoluted for those cases b/c now you need a way to pass in what distance function you want to use.

          Show
          Grant Ingersoll added a comment - Here's the thing I keep coming back to with this issue. For all the field types other than SpatialTile (Cartesian Tier), filters are already easily creatable. It seems odd to have to write this special QParserPlugin that has all this syntax support, etc. to create filters for all of the other cases beside the spatial tile. For instance, pure radial distance filtering is easily handled through the FunctionRangeQParserPlugin and the various distance functions. In fact, going this route is more convoluted for those cases b/c now you need a way to pass in what distance function you want to use.
          Hide
          Ryan McKinley added a comment -

          I'm torn on this also...

          The function query syntax (and all local params really) are a bit cryptic. When I look into it though they are really powerful, and i see an advantage to having that level of control. (and it works in solr now)

          I wonder if there is a simple way to add query rewriting that would convert:
          &lat=49&long=-77&radius=10
          to:
          &fq=

          {!sfilt p=49,-77 f=location dist=10}

          Perhaps a RequestRewrite component?

          I doubt adding something like http://tuckey.org/urlrewrite/ would work off-the-shelf, but it could go a long way to hide the ugly innards for a simple case like this, yet still allow the power of the function query syntax.

          Show
          Ryan McKinley added a comment - I'm torn on this also... The function query syntax (and all local params really) are a bit cryptic. When I look into it though they are really powerful, and i see an advantage to having that level of control. (and it works in solr now) I wonder if there is a simple way to add query rewriting that would convert: &lat=49&long=-77&radius=10 to: &fq= {!sfilt p=49,-77 f=location dist=10} Perhaps a RequestRewrite component? I doubt adding something like http://tuckey.org/urlrewrite/ would work off-the-shelf, but it could go a long way to hide the ugly innards for a simple case like this, yet still allow the power of the function query syntax.
          Hide
          Grant Ingersoll added a comment -

          Keep in mind distance functions are about more than just lat/lon, although there is no reason why we can't support both.

          Show
          Grant Ingersoll added a comment - Keep in mind distance functions are about more than just lat/lon, although there is no reason why we can't support both.
          Hide
          Yonik Seeley added a comment -

          The examples are not of equivalent power. What field are we working off of? Are we filtering by bounding box or by distance, sorting by distance, or boosting by distance?

          Further, given that query parsers default to grabbing global params if local ones aren't specified this

          &fq={!sfilt p=49,-77 d=1000 f=location}
          

          could be written as

          &fq={!sfilt f=location}
          &p=49,-77
          &d=1000
          

          Which I assumed people would do anyway to make it easier for clients to construct queries and do filtering + sorting w/o repeating parameters.

          Show
          Yonik Seeley added a comment - The examples are not of equivalent power. What field are we working off of? Are we filtering by bounding box or by distance, sorting by distance, or boosting by distance? Further, given that query parsers default to grabbing global params if local ones aren't specified this &fq={!sfilt p=49,-77 d=1000 f=location} could be written as &fq={!sfilt f=location} &p=49,-77 &d=1000 Which I assumed people would do anyway to make it easier for clients to construct queries and do filtering + sorting w/o repeating parameters.
          Hide
          Grant Ingersoll added a comment -

          Are we filtering by bounding box or by distance, sorting by distance, or boosting by distance?

          I see this as a filtering only issue. To me, all the other capabilities other than filtering by a Spatial Tile Field are already covered through normal Solr capabilities, so I don't see much benefit to inventing new syntax and more code to maintain even if it is slightly easier for the app designer to consume.

          Show
          Grant Ingersoll added a comment - Are we filtering by bounding box or by distance, sorting by distance, or boosting by distance? I see this as a filtering only issue. To me, all the other capabilities other than filtering by a Spatial Tile Field are already covered through normal Solr capabilities, so I don't see much benefit to inventing new syntax and more code to maintain even if it is slightly easier for the app designer to consume.
          Hide
          Grant Ingersoll added a comment -

          Also, in all of these proposals, how do we intend to specify what distance calculation to use? Which is another reason for not going down the path of a generic QParser for this, since if we use the FunctionRangeQParser, it's inherent in the passing in of the function.

          Show
          Grant Ingersoll added a comment - Also, in all of these proposals, how do we intend to specify what distance calculation to use? Which is another reason for not going down the path of a generic QParser for this, since if we use the FunctionRangeQParser, it's inherent in the passing in of the function.
          Hide
          Yonik Seeley added a comment -

          Also, in all of these proposals, how do we intend to specify what distance calculation to use?

          IMO, the user shouldn't have to. The vast majority of users want "normal" spatial search... fast yet good enough. They won't know the details and won't want to know (except perhaps that it's accurate to x meters).

          For those super-advanced users that wish to specify their own distance function, we could allow an optional parameter on sfilt that specifies an arbitrary function.

          Show
          Yonik Seeley added a comment - Also, in all of these proposals, how do we intend to specify what distance calculation to use? IMO, the user shouldn't have to. The vast majority of users want "normal" spatial search... fast yet good enough. They won't know the details and won't want to know (except perhaps that it's accurate to x meters). For those super-advanced users that wish to specify their own distance function, we could allow an optional parameter on sfilt that specifies an arbitrary function.
          Hide
          Ryan McKinley added a comment -

          could be written as

          &fq={!sfilt f=location}
          &p=49,-77
          &d=1000
          

          Jeez – I had no idea... then I think (most?) cases are taken care of...

          With this, a handler could be defined with the default param: 'fq=

          {!sfilt f=location}

          ' and the 'end user' could simply call:
          &p=49,-77&d=1000
          never needing to grock the local-params/function syntax.

          Show
          Ryan McKinley added a comment - could be written as &fq={!sfilt f=location} &p=49,-77 &d=1000 Jeez – I had no idea... then I think (most?) cases are taken care of... With this, a handler could be defined with the default param: 'fq= {!sfilt f=location} ' and the 'end user' could simply call: &p=49,-77&d=1000 never needing to grock the local-params/function syntax.
          Hide
          Yonik Seeley added a comment -

          since if we use the FunctionRangeQParser, it's inherent in the passing in of the function.

          It's not clear exactly what you're proposing - can you give an example request so we can compare?

          Show
          Yonik Seeley added a comment - since if we use the FunctionRangeQParser, it's inherent in the passing in of the function. It's not clear exactly what you're proposing - can you give an example request so we can compare?
          Hide
          Grant Ingersoll added a comment -
          fq={!frange l=0 u=400}hsin(0.57, -1.3, lat_rad, lon_rad, 3963.205)
          
          Show
          Grant Ingersoll added a comment - fq={!frange l=0 u=400}hsin(0.57, -1.3, lat_rad, lon_rad, 3963.205)
          Hide
          Yonik Seeley added a comment -

          While it's nice that this works (to double-check results, etc), this isn't particularly scalable as it calculates the distance (the expensive part) for every document. Or are you suggesting some other kind of implementation in conjunction with this syntax?

          Show
          Yonik Seeley added a comment - While it's nice that this works (to double-check results, etc), this isn't particularly scalable as it calculates the distance (the expensive part) for every document. Or are you suggesting some other kind of implementation in conjunction with this syntax?
          Hide
          Grant Ingersoll added a comment -

          For those who only want pure radial search, it is the only way, scalable or not, unless I'm missing something.

          As I see, it there are several options for filtering that already work out of the box and that someone may want to use:
          1. Radial distance as above
          2. Range query (i.e. bounding box)

          The one remaining that isn't implemented is the tile stuff.

          Now, obviously, we can provide tools to make these slightly easier to consume. I've already prototyped most of it, but in doing so, I just couldn't shake the sense that it is a waste of time for anything but the tile stuff and that it hides too much in terms of what the tradeoffs are going to be. I just think it is easy enough for an app designer to use their existing toolbox (#1 and #2) while throwing in one more lightweight, easy to use new tool (tile support.)

          Show
          Grant Ingersoll added a comment - For those who only want pure radial search, it is the only way, scalable or not, unless I'm missing something. As I see, it there are several options for filtering that already work out of the box and that someone may want to use: 1. Radial distance as above 2. Range query (i.e. bounding box) The one remaining that isn't implemented is the tile stuff. Now, obviously, we can provide tools to make these slightly easier to consume. I've already prototyped most of it, but in doing so, I just couldn't shake the sense that it is a waste of time for anything but the tile stuff and that it hides too much in terms of what the tradeoffs are going to be. I just think it is easy enough for an app designer to use their existing toolbox (#1 and #2) while throwing in one more lightweight, easy to use new tool (tile support.)
          Hide
          Yonik Seeley added a comment -

          For those who only want pure radial search, it is the only way, scalable or not, unless I'm missing something.

          A smart spatial filter, at a minimum, only does the distance calculation for those documents that lie within the bounding box. That's the main point to the bounding box. I had assumed this is what Patrick's local-solr already did?
          Although I do think it's a good idea to also allow a user to specify only a bounding box if they don't want to pay the price for the distance calculation at all.

          2. Range query (i.e. bounding box)

          And how would that be specified out of the box? (example please?) You're not suggesting that be delegated to the user do you? That's very hard, very field specific, and will often result in multiple range queries, not one.

          Show
          Yonik Seeley added a comment - For those who only want pure radial search, it is the only way, scalable or not, unless I'm missing something. A smart spatial filter, at a minimum, only does the distance calculation for those documents that lie within the bounding box. That's the main point to the bounding box. I had assumed this is what Patrick's local-solr already did? Although I do think it's a good idea to also allow a user to specify only a bounding box if they don't want to pay the price for the distance calculation at all. 2. Range query (i.e. bounding box) And how would that be specified out of the box? (example please?) You're not suggesting that be delegated to the user do you? That's very hard, very field specific, and will often result in multiple range queries, not one.
          Hide
          patrick o'leary added a comment -

          A smart spatial filter, at a minimum, only does the distance calculation for those documents that lie within the bounding box. That's the main point to the bounding box. I had assumed this is what Patrick's local-solr already did?

          It's a little more contrived - I actually executed the lucene query as part of the filter,

          • Bounded box
          • Text query standard query
          • And bitset
          • Pass in that bitset to radial filter to only calculate distances of both text search, and bounded box

          Thus only doc's matching the query within the bounding box have their distances calculated.
          Check out
          DistanceQueryBuilder

          public Filter getFilter(Query query) {
              // Chain the Query (as filter) with our distance filter
              if (distanceFilter != null) {
                distanceFilter.reset();
              }
              QueryWrapperFilter qf = new QueryWrapperFilter(query);
              return new ChainedFilter(new Filter[] {qf, filter},
                                       ChainedFilter.AND);
            }
          
          public Query getQuery(Query query){
              return new ConstantScoreQuery(getFilter(query));
            }
          
          Show
          patrick o'leary added a comment - A smart spatial filter, at a minimum, only does the distance calculation for those documents that lie within the bounding box. That's the main point to the bounding box. I had assumed this is what Patrick's local-solr already did? It's a little more contrived - I actually executed the lucene query as part of the filter, Bounded box Text query standard query And bitset Pass in that bitset to radial filter to only calculate distances of both text search, and bounded box Thus only doc's matching the query within the bounding box have their distances calculated. Check out DistanceQueryBuilder public Filter getFilter(Query query) { // Chain the Query (as filter) with our distance filter if (distanceFilter != null ) { distanceFilter.reset(); } QueryWrapperFilter qf = new QueryWrapperFilter(query); return new ChainedFilter( new Filter[] {qf, filter}, ChainedFilter.AND); } public Query getQuery(Query query){ return new ConstantScoreQuery(getFilter(query)); }
          Hide
          Grant Ingersoll added a comment -

          And how would that be specified out of the box?

          It's just a simple range query: point:[33.5,-80 TO 35.0,-81] or some other similar thing. The point is that Solr already supports this, so I don't see a lot to be gained by actually doing it in Solr.

          (example please?) You're not suggesting that be delegated to the user do you? That's very hard, very field specific, and will often result in multiple range queries, not one.

          It doesn't strike me as that hard, but perhaps I'm missing something. Many times this stuff is automatically generated by a user clicking on a map or via other, application side calculations.

          I actually executed the lucene query as part of the filter

          That's interesting. So, you end up calculating it twice, once for scoring and once for filtering? I can see why that would speed things up.

          Show
          Grant Ingersoll added a comment - And how would that be specified out of the box? It's just a simple range query: point: [33.5,-80 TO 35.0,-81] or some other similar thing. The point is that Solr already supports this, so I don't see a lot to be gained by actually doing it in Solr. (example please?) You're not suggesting that be delegated to the user do you? That's very hard, very field specific, and will often result in multiple range queries, not one. It doesn't strike me as that hard, but perhaps I'm missing something. Many times this stuff is automatically generated by a user clicking on a map or via other, application side calculations. I actually executed the lucene query as part of the filter That's interesting. So, you end up calculating it twice, once for scoring and once for filtering? I can see why that would speed things up.
          Hide
          Grant Ingersoll added a comment -

          A totally NON WORKING patch. The only thing really defined is the interface, per earlier discussions. I doubt it even compiles. I'm putting it up here for something to talk about eventually.

          I'd like to achieve at least 2 things:

          1. Good defaults for the filters based on the FieldType
          2. Pluggable implementations for the filters based on the FieldType. I've seen enough of these now that just hardcoding bounding box to the PointType is not sufficient. I think we need a bit looser coupling. I have that now as an abstract class called SpatialFilterable (will rename to SpatialFilter) which my plan is to have the QParserPlugin load up from the Solr Config. This way, if people want to implement there own, they can

          As I said, this totally NON WORKING.

          Show
          Grant Ingersoll added a comment - A totally NON WORKING patch. The only thing really defined is the interface, per earlier discussions. I doubt it even compiles. I'm putting it up here for something to talk about eventually. I'd like to achieve at least 2 things: 1. Good defaults for the filters based on the FieldType 2. Pluggable implementations for the filters based on the FieldType. I've seen enough of these now that just hardcoding bounding box to the PointType is not sufficient. I think we need a bit looser coupling. I have that now as an abstract class called SpatialFilterable (will rename to SpatialFilter) which my plan is to have the QParserPlugin load up from the Solr Config. This way, if people want to implement there own, they can As I said, this totally NON WORKING.
          Hide
          Grant Ingersoll added a comment -

          So, one of the things I'm not sure on here is how best to associate the filtering information with the FieldType. On the one hand, we could have a base class or a small interface that defines the filter call back on the FieldType and then the PointType and other spatial FieldTypes could implement/extend that capability. Going this approach means that if someone wants to provide a different way of filtering for a FieldType, they would have to implement a derived class overriding the method. For instance, on the PointType, the base implementation may be to just generate a range query for each field based on distance. However, if someone wanted a different approach, they would then have to extend PointType and register a whole other FieldType, let's call it NewFilterPointType.

          An alternative approach would be to separate the filter calculation in a different class and then somehow associate it with the FieldType (maybe as a map). I've started this to some extent on the last NON WORKING patch, but don't feel great about the actual implementation just yet. In the case above, Solr would provide a default implementation (automatically registered) and then it could be overridden by configuring in solrconfig.xml.

          I'm also open to other suggestions. I still am pretty open to taking baby steps here by defining the API as Yonik described above (more or less, see my last patch) but only providing a single implementation right now for the Spatial Tile Field Type (Cartesian Tier).

          Thoughts and suggestions welcome? I'd like to get something in Solr pretty soon.

          Show
          Grant Ingersoll added a comment - So, one of the things I'm not sure on here is how best to associate the filtering information with the FieldType. On the one hand, we could have a base class or a small interface that defines the filter call back on the FieldType and then the PointType and other spatial FieldTypes could implement/extend that capability. Going this approach means that if someone wants to provide a different way of filtering for a FieldType, they would have to implement a derived class overriding the method. For instance, on the PointType, the base implementation may be to just generate a range query for each field based on distance. However, if someone wanted a different approach, they would then have to extend PointType and register a whole other FieldType, let's call it NewFilterPointType. An alternative approach would be to separate the filter calculation in a different class and then somehow associate it with the FieldType (maybe as a map). I've started this to some extent on the last NON WORKING patch, but don't feel great about the actual implementation just yet. In the case above, Solr would provide a default implementation (automatically registered) and then it could be overridden by configuring in solrconfig.xml. I'm also open to other suggestions. I still am pretty open to taking baby steps here by defining the API as Yonik described above (more or less, see my last patch) but only providing a single implementation right now for the Spatial Tile Field Type (Cartesian Tier). Thoughts and suggestions welcome? I'd like to get something in Solr pretty soon.
          Hide
          Grant Ingersoll added a comment -

          Actually, in thinking some more about this, it seems like it is just as easy to extend the FieldType to override a method as it is to invent new syntax to support configuring these things. I'm going to go that route, which is, of course, what Yonik suggested all along

          Show
          Grant Ingersoll added a comment - Actually, in thinking some more about this, it seems like it is just as easy to extend the FieldType to override a method as it is to invent new syntax to support configuring these things. I'm going to go that route, which is, of course, what Yonik suggested all along
          Hide
          Yonik Seeley added a comment -

          I'm not a fan of optional units... I think we should just standardize on something (meters perhaps) and stick with it.
          This is a programmatic API, not a user API, and it's not a hardship for a programmer to convert to whatever units are appropriate.

          Show
          Yonik Seeley added a comment - I'm not a fan of optional units... I think we should just standardize on something (meters perhaps) and stick with it. This is a programmatic API, not a user API, and it's not a hardship for a programmer to convert to whatever units are appropriate.
          Hide
          Grant Ingersoll added a comment -

          More progress. Don't think it even compiles, but putting it up for others to start to look at and for me to get it off of my hard drive. Just going down the FieldType route.

          Show
          Grant Ingersoll added a comment - More progress. Don't think it even compiles, but putting it up for others to start to look at and for me to get it off of my hard drive. Just going down the FieldType route.
          Hide
          Chris A. Mattmann added a comment -

          Hey Grant:

          I'll take a look at your latest patch and try to iterate on it (at least make sure it compiles, and take a pass with javadocs, run the unit tests, etc.). Should have something up in the next few hours.

          Cheers,
          Chris

          Show
          Chris A. Mattmann added a comment - Hey Grant: I'll take a look at your latest patch and try to iterate on it (at least make sure it compiles, and take a pass with javadocs, run the unit tests, etc.). Should have something up in the next few hours. Cheers, Chris
          Hide
          Chris A. Mattmann added a comment - - edited

          OK, this guy compiles, and I tried to guess in a couple areas (e.g., please look at Haversine) where variables were missing. One nice thing you can take out of this is the normalize functions for lat and lon in DistanceUtils – those will probably be generally useful.

          I'll also look to bring some of this over to SIS, as we start to flesh it out.

          I saw an error during unit tests in org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest, but it seems unrelated (so suspicious – is this a real bug?):

          [chipotle:solr/build/test-results] mattmann% more TEST-org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest.txt 
          Testsuite: org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest
          Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec
          
          Testcase: testContentStreamRequest took 0.003 sec
                  Caused an ERROR
          Forked Java VM exited abnormally. Please note the time in the report does not re
          flect the time until the VM exit.
          junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please n
          ote the time in the report does not reflect the time until the VM exit.
          
          [chipotle:solr/build/test-results] mattmann% 
          
          Show
          Chris A. Mattmann added a comment - - edited OK, this guy compiles, and I tried to guess in a couple areas (e.g., please look at Haversine) where variables were missing. One nice thing you can take out of this is the normalize functions for lat and lon in DistanceUtils – those will probably be generally useful. I'll also look to bring some of this over to SIS, as we start to flesh it out. I saw an error during unit tests in org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest, but it seems unrelated (so suspicious – is this a real bug?): [chipotle:solr/build/test-results] mattmann% more TEST-org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest.txt Testsuite: org.apache.solr.client.solrj.embedded.SolrExampleEmbeddedTest Tests run: 1, Failures: 0, Errors: 1, Time elapsed: 0 sec Testcase: testContentStreamRequest took 0.003 sec Caused an ERROR Forked Java VM exited abnormally. Please note the time in the report does not re flect the time until the VM exit. junit.framework.AssertionFailedError: Forked Java VM exited abnormally. Please n ote the time in the report does not reflect the time until the VM exit. [chipotle:solr/build/test-results] mattmann%
          Hide
          Grant Ingersoll added a comment -

          Here's what I have in my trunk now. Chris, haven't had a chance to work on yours, but am going to look today, as well as move over to the new trunk. Just putting this up here in case I do something stupid on my local drive.

          Show
          Grant Ingersoll added a comment - Here's what I have in my trunk now. Chris, haven't had a chance to work on yours, but am going to look today, as well as move over to the new trunk. Just putting this up here in case I do something stupid on my local drive.
          Hide
          Grant Ingersoll added a comment -

          Compiles and is much closer, but still doesn't work exactly right. Currently focusing on filtering for LatLonType. Haven't tested filter creation for the other types yet. Need to write unit tests.

          Show
          Grant Ingersoll added a comment - Compiles and is much closer, but still doesn't work exactly right. Currently focusing on filtering for LatLonType. Haven't tested filter creation for the other types yet. Need to write unit tests.
          Hide
          Yonik Seeley added a comment -

          Things are looking good! I like the SpatialQueryable interface and the SpatialOptions class, which should allow us to use an interface and slowly migrate by changing SpatialOptions w/o breaking back compat. I haven't looked much at the implementations of geohash or tier stuff yet - more the basic point type and user interfaces.

          High Level Interface:

          • units: if we do specify units, more standard abbreviations would probably be "km" and "mi" rather than "K" and "M"
          • just standardizing on units might be the best option, as I think it would be simplest to use (and client conversion is trivial)
          • Examples of confusion and complications that "units" can case:
            • When filtering and sorting, what happens if there is a units mismatch?
            • does "units" apply to the optional "radius" argument?
            • what are the units for calculated and returned distances? People might assume that because they filtered using "km", that a distance sort or other function queries would be in "km" or "mi"
            • if returned distances can be in any unit, it complicates client code that may not know the exact request, and hence the units
            • boosting or adding in a function of distance to the relevancy score: changing the units would unexpectedly change how this worked.
          • fl should probably just be "f" (fl stands for "field list")? Update: looking at the code again, it seems that the idea is to allow specifying lat,lon separately too?
            we have point and latlon fields for this though... shouldn't it always just be a single field?
          • "radius" parameter is likely to confuse people... (i.e. I specified radius=1 because I wanted to search within a 1 mile radius), while in reality this represents the radius of the earth. Since 99.99% of people should not use this parameter, perhaps rename to "planet_radius" or "sphere_radius"?
          • Seems like it would be really nice to be able to do both distance filtering and bounding box filtering.
            • Distance filtering would normally be implemented as a combination of a bounding box + distance calculations for points within that box.
            • Distance filtering could get even more efficient in the future... we could also calculate an "inner" box within the bounding box.
              Any points in the inner box would be guaranteed to be less than the distance, hence no need to calculate exact distance.
            • It looks like the current implementation(s) does bounding box only?

          Code:

          • SpatialFilterQParserPlugin should probably just be a standard parser - no need to explicitly register in solrconfig.xml
            • oops, I see that this is already done under "sfilt" currently - so it can just be removed from solrconfig.xml
            • is there a reason why it's ResourceLoaderAware?
          • for the implementation that creates a single range query for lattitude and a single one for longitude: we actually need multiple ranges (or multiple boxes) to handle boxes that cross the 180 meridian, as well as boxes that cross the poles.

          Tests:

          • SpatialQParserPluginTest - it might be nice (and easier to maintain) if this were slightly higher level - test that the produced filter successfully filters out points outside the box, rather than testing that a specific type of query is returned (currently the test does "query instanceof NumericRangeQuery").

          Example schema:

          • before release, we'll prob want to cut down the spatial field types to just one recommended one that people can use w/o having to figure out the difference between all the types. OK to keep multiple in during development though... makes for easier ad hoc testing.

          With an eye toward the end game, here's an example of what we might want to shoot for:

          // use case: filter by distance, sort by distance
          &point=49,-77
          &dist=1000
          &fq={!sfilt fl=location}
          &sort=dist($point,location) asc
            // we can't quite do this yet with function query syntax, but it seems nice?
          
          // use case: filter by distance, boost relevance score by a function of distance
          &point=49,-77
          &dist=1000
          &fq={!sfilt fl=location}
          &boost=recip(dist($point,location),1,$dist,$dist)) 
            // "boost" is an edismax param.
            // function will yield values between .5 and 1
          
          Show
          Yonik Seeley added a comment - Things are looking good! I like the SpatialQueryable interface and the SpatialOptions class, which should allow us to use an interface and slowly migrate by changing SpatialOptions w/o breaking back compat. I haven't looked much at the implementations of geohash or tier stuff yet - more the basic point type and user interfaces. High Level Interface: units: if we do specify units, more standard abbreviations would probably be "km" and "mi" rather than "K" and "M" just standardizing on units might be the best option, as I think it would be simplest to use (and client conversion is trivial) Examples of confusion and complications that "units" can case: When filtering and sorting, what happens if there is a units mismatch? does "units" apply to the optional "radius" argument? what are the units for calculated and returned distances? People might assume that because they filtered using "km", that a distance sort or other function queries would be in "km" or "mi" if returned distances can be in any unit, it complicates client code that may not know the exact request, and hence the units boosting or adding in a function of distance to the relevancy score: changing the units would unexpectedly change how this worked. fl should probably just be "f" (fl stands for "field list")? Update: looking at the code again, it seems that the idea is to allow specifying lat,lon separately too? we have point and latlon fields for this though... shouldn't it always just be a single field? "radius" parameter is likely to confuse people... (i.e. I specified radius=1 because I wanted to search within a 1 mile radius), while in reality this represents the radius of the earth. Since 99.99% of people should not use this parameter, perhaps rename to "planet_radius" or "sphere_radius"? Seems like it would be really nice to be able to do both distance filtering and bounding box filtering. Distance filtering would normally be implemented as a combination of a bounding box + distance calculations for points within that box. Distance filtering could get even more efficient in the future... we could also calculate an "inner" box within the bounding box. Any points in the inner box would be guaranteed to be less than the distance, hence no need to calculate exact distance. It looks like the current implementation(s) does bounding box only? Code: SpatialFilterQParserPlugin should probably just be a standard parser - no need to explicitly register in solrconfig.xml oops, I see that this is already done under "sfilt" currently - so it can just be removed from solrconfig.xml is there a reason why it's ResourceLoaderAware? for the implementation that creates a single range query for lattitude and a single one for longitude: we actually need multiple ranges (or multiple boxes) to handle boxes that cross the 180 meridian, as well as boxes that cross the poles. Tests: SpatialQParserPluginTest - it might be nice (and easier to maintain) if this were slightly higher level - test that the produced filter successfully filters out points outside the box, rather than testing that a specific type of query is returned (currently the test does "query instanceof NumericRangeQuery"). Example schema: before release, we'll prob want to cut down the spatial field types to just one recommended one that people can use w/o having to figure out the difference between all the types. OK to keep multiple in during development though... makes for easier ad hoc testing. With an eye toward the end game, here's an example of what we might want to shoot for: // use case : filter by distance, sort by distance &point=49,-77 &dist=1000 &fq={!sfilt fl=location} &sort=dist($point,location) asc // we can't quite do this yet with function query syntax, but it seems nice? // use case : filter by distance, boost relevance score by a function of distance &point=49,-77 &dist=1000 &fq={!sfilt fl=location} &boost=recip(dist($point,location),1,$dist,$dist)) // "boost" is an edismax param. // function will yield values between .5 and 1
          Hide
          Lance Norskog added a comment -

          Dublin Core includes conventions for encoding points & "rectangles":
          DCMI Point Encoding Scheme
          DCMI Box Encoding Scheme

          These are maybe too high-level for the spatial tools, but are an interesting packaging of metadata & coordinates.

          Show
          Lance Norskog added a comment - Dublin Core includes conventions for encoding points & "rectangles": DCMI Point Encoding Scheme DCMI Box Encoding Scheme These are maybe too high-level for the spatial tools, but are an interesting packaging of metadata & coordinates.
          Hide
          Grant Ingersoll added a comment -

          Thanks for the thorough feedback, Yonik. Some thoughts inline.

          1. units: if we do specify units, more standard abbreviations would probably be "km" and "mi" rather than "K" and "M"
          2. just standardizing on units might be the best option, as I think it would be simplest to use (and client conversion is trivial)
          3. Examples of confusion and complications that "units" can case:
          • When filtering and sorting, what happens if there is a units mismatch?
          • does "units" apply to the optional "radius" argument?
          • what are the units for calculated and returned distances? People might assume that because they filtered using "km", that a distance sort or other function queries would be in "km" or "mi"
          • if returned distances can be in any unit, it complicates client code that may not know the exact request, and hence the units
          • boosting or adding in a function of distance to the relevancy score: changing the units would unexpectedly change how this worked.

          Agreed on abbreviations. The units mostly come into play with picking the radius. Lucene spatial seems to assume miles in a lot of places, but I imagine this is actually kind of silly given that the majority of the world uses metric. I guess ideally, I want people to be able to save on conversion, but it's probably silly to try to engineer that in. I suppose we could have a whole discussion on whether to assume English or Metric. I'm still on the fence.

          fl should probably just be "f" (fl stands for "field list")? Update: looking at the code again, it seems that the idea is to allow specifying lat,lon separately too?
          we have point and latlon fields for this though... shouldn't it always just be a single field?

          I was thinking that I might support multiple fields for those people who are already indexing lat/lon as two fields. That way they could get the benefits of this code w/o re-indexing. That being said, it is still unimplemented and I'm not sure how to implement it.

          "radius" parameter is likely to confuse people... (i.e. I specified radius=1 because I wanted to search within a 1 mile radius), while in reality this represents the radius of the earth. Since 99.99% of people should not use this parameter, perhaps rename to "planet_radius" or "sphere_radius"?

          sphere_radius it is.

          Seems like it would be really nice to be able to do both distance filtering and bounding box filtering.

          • Distance filtering would normally be implemented as a combination of a bounding box + distance calculations for points within that box.
          • Distance filtering could get even more efficient in the future... we could also calculate an "inner" box within the bounding box.
            Any points in the inner box would be guaranteed to be less than the distance, hence no need to calculate exact distance.
          • It looks like the current implementation(s) does bounding box only?

          LatLonType will implement bounding box + distance filtering. I imagine people will want to override the FieldTypes for their specifics on creating the Filter. Right now, PointType really is designed for a rectangular coordinate system. Most people will likely use LatLonType, which will be geared towards spherical (i.e. Earth). But, agreed, we will be able to optimize over time.

          is there a reason why it's ResourceLoaderAware?

          Artifact of earlier design. Removed.

          SpatialQParserPluginTest

          Already changed and "higher" level, just haven't posted the patch. I actually renamed it too.

          oops, I see that this is already done under "sfilt" currently - so it can just be removed from solrconfig.xml

          Done.

          before release, we'll prob want to cut down the spatial field types to just one recommended one that people can use w/o having to figure out the difference between all the types. OK to keep multiple in during development though... makes for easier ad hoc testing.

          For now, I have a comment in the schema that says something to the effect that you likely will not need all of these. I think this gets to a bigger discussion we should have on dev@ about what the example should be. I'd like the example to be more self contained and more self explanatory. We could have a light version and a full featured one.

          With an eye toward the end game, here's an example of what we might want to shoot for:

          Agreed the syntax is nice. I don't think I will do that as part of this patch. Otherwise, I think all of those use cases are covered.

          Show
          Grant Ingersoll added a comment - Thanks for the thorough feedback, Yonik. Some thoughts inline. units: if we do specify units, more standard abbreviations would probably be "km" and "mi" rather than "K" and "M" just standardizing on units might be the best option, as I think it would be simplest to use (and client conversion is trivial) Examples of confusion and complications that "units" can case: When filtering and sorting, what happens if there is a units mismatch? does "units" apply to the optional "radius" argument? what are the units for calculated and returned distances? People might assume that because they filtered using "km", that a distance sort or other function queries would be in "km" or "mi" if returned distances can be in any unit, it complicates client code that may not know the exact request, and hence the units boosting or adding in a function of distance to the relevancy score: changing the units would unexpectedly change how this worked. Agreed on abbreviations. The units mostly come into play with picking the radius. Lucene spatial seems to assume miles in a lot of places, but I imagine this is actually kind of silly given that the majority of the world uses metric. I guess ideally, I want people to be able to save on conversion, but it's probably silly to try to engineer that in. I suppose we could have a whole discussion on whether to assume English or Metric. I'm still on the fence. fl should probably just be "f" (fl stands for "field list")? Update: looking at the code again, it seems that the idea is to allow specifying lat,lon separately too? we have point and latlon fields for this though... shouldn't it always just be a single field? I was thinking that I might support multiple fields for those people who are already indexing lat/lon as two fields. That way they could get the benefits of this code w/o re-indexing. That being said, it is still unimplemented and I'm not sure how to implement it. "radius" parameter is likely to confuse people... (i.e. I specified radius=1 because I wanted to search within a 1 mile radius), while in reality this represents the radius of the earth. Since 99.99% of people should not use this parameter, perhaps rename to "planet_radius" or "sphere_radius"? sphere_radius it is. Seems like it would be really nice to be able to do both distance filtering and bounding box filtering. Distance filtering would normally be implemented as a combination of a bounding box + distance calculations for points within that box. Distance filtering could get even more efficient in the future... we could also calculate an "inner" box within the bounding box. Any points in the inner box would be guaranteed to be less than the distance, hence no need to calculate exact distance. It looks like the current implementation(s) does bounding box only? LatLonType will implement bounding box + distance filtering. I imagine people will want to override the FieldTypes for their specifics on creating the Filter. Right now, PointType really is designed for a rectangular coordinate system. Most people will likely use LatLonType, which will be geared towards spherical (i.e. Earth). But, agreed, we will be able to optimize over time. is there a reason why it's ResourceLoaderAware? Artifact of earlier design. Removed. SpatialQParserPluginTest Already changed and "higher" level, just haven't posted the patch. I actually renamed it too. oops, I see that this is already done under "sfilt" currently - so it can just be removed from solrconfig.xml Done. before release, we'll prob want to cut down the spatial field types to just one recommended one that people can use w/o having to figure out the difference between all the types. OK to keep multiple in during development though... makes for easier ad hoc testing. For now, I have a comment in the schema that says something to the effect that you likely will not need all of these. I think this gets to a bigger discussion we should have on dev@ about what the example should be. I'd like the example to be more self contained and more self explanatory. We could have a light version and a full featured one. With an eye toward the end game, here's an example of what we might want to shoot for: Agreed the syntax is nice. I don't think I will do that as part of this patch. Otherwise, I think all of those use cases are covered.
          Hide
          Grant Ingersoll added a comment -

          Got some tests and most of the rework from Yonik's comments. Some of the tests explicitly fail due to bugs in the underlying tile stuff in Lucene.

          Added support for handling the poles and the prime and 180th meridian to the LatLonType. I think we're in pretty good shape now, assuming the underlying Lucene bits get fixed soon.

          Show
          Grant Ingersoll added a comment - Got some tests and most of the rework from Yonik's comments. Some of the tests explicitly fail due to bugs in the underlying tile stuff in Lucene. Added support for handling the poles and the prime and 180th meridian to the LatLonType. I think we're in pretty good shape now, assuming the underlying Lucene bits get fixed soon.
          Hide
          Grant Ingersoll added a comment -

          Not an absolute blocker, but would be good to get this fixed.

          Show
          Grant Ingersoll added a comment - Not an absolute blocker, but would be good to get this fixed.
          Hide
          Grant Ingersoll added a comment -

          Unit test that shows problems at the pole.

          Show
          Grant Ingersoll added a comment - Unit test that shows problems at the pole.
          Hide
          Chris Male added a comment -

          Hi,

          I have taken a look at the latest patch and it seems fine. Any TODOs and problems with the poles can be tackled as separate/sub-issues. Particularly the pole bugs. I'd prefer to see this committed soon so we can take a look at the overall architecture, and see what parts we want to move into a module.

          Show
          Chris Male added a comment - Hi, I have taken a look at the latest patch and it seems fine. Any TODOs and problems with the poles can be tackled as separate/sub-issues. Particularly the pole bugs. I'd prefer to see this committed soon so we can take a look at the overall architecture, and see what parts we want to move into a module.
          Hide
          Grant Ingersoll added a comment -

          changing SpatialOptions w/o breaking back compat

          I wonder if we shouldn't also pass along the request parameters on this, so that if one is extending a FieldType for spatial, which I think will be somewhat common, they can have access to the params.

          Show
          Grant Ingersoll added a comment - changing SpatialOptions w/o breaking back compat I wonder if we shouldn't also pass along the request parameters on this, so that if one is extending a FieldType for spatial, which I think will be somewhat common, they can have access to the params.
          Hide
          Yonik Seeley added a comment -

          I wonder if we shouldn't also pass along the request parameters on this,

          IIRC, you pass along the QParser, and that has pointers to the complete context - stuff like the request object.

          Show
          Yonik Seeley added a comment - I wonder if we shouldn't also pass along the request parameters on this, IIRC, you pass along the QParser, and that has pointers to the complete context - stuff like the request object.
          Hide
          Grant Ingersoll added a comment -

          even better. Patch soon

          Show
          Grant Ingersoll added a comment - even better. Patch soon
          Hide
          Yonik Seeley added a comment -

          even better. Patch soon

          I meant, you already do pass along the QParser, so no patch should be necessary?

          public interface SpatialQueryable {
            public Query createSpatialQuery(QParser parser, SpatialOptions options);
          }
          

          Or perhaps you mean somewhere else?

          That extra request context is why I started adding QParser to many of the new FieldType methods such as FieldType.getRangeQuery() too.

          Show
          Yonik Seeley added a comment - even better. Patch soon I meant, you already do pass along the QParser, so no patch should be necessary? public interface SpatialQueryable { public Query createSpatialQuery(QParser parser, SpatialOptions options); } Or perhaps you mean somewhere else? That extra request context is why I started adding QParser to many of the new FieldType methods such as FieldType.getRangeQuery() too.
          Hide
          Grant Ingersoll added a comment -

          No, you are right. I wasn't thinking clearly.

          Show
          Grant Ingersoll added a comment - No, you are right. I wasn't thinking clearly.
          Hide
          Grant Ingersoll added a comment -

          small updates, started working on the multi-field solution. Almost ready

          Show
          Grant Ingersoll added a comment - small updates, started working on the multi-field solution. Almost ready
          Hide
          Chris A. Mattmann added a comment -

          +1 on the latest patch with the following caveat:

          Why not put:

             public static final double DEGREES_TO_RADIANS = Math.PI / 180.0;
             public static final double RADIANS_TO_DEGREES = 180.0 / Math.PI;
          +  public static final double DEG_45 = Math.PI / 4.0;
          +  public static final double DEG_225 = 5 * DEG_45;
          +  public static final double DEG_90 = Math.PI / 2;
          +  public static final double DEG_180 = Math.PI;
          +  public static final double SIN_45 = Math.sin(DEG_45);
          

          into solr/src/java/org/apache/solr/search/function/distance/Constants.java, and then make DistanceUtils implement (or static import) those constants?

          Cheers,
          Chris

          Show
          Chris A. Mattmann added a comment - +1 on the latest patch with the following caveat: Why not put: public static final double DEGREES_TO_RADIANS = Math .PI / 180.0; public static final double RADIANS_TO_DEGREES = 180.0 / Math .PI; + public static final double DEG_45 = Math .PI / 4.0; + public static final double DEG_225 = 5 * DEG_45; + public static final double DEG_90 = Math .PI / 2; + public static final double DEG_180 = Math .PI; + public static final double SIN_45 = Math .sin(DEG_45); into solr/src/java/org/apache/solr/search/function/distance/Constants.java, and then make DistanceUtils implement (or static import) those constants? Cheers, Chris
          Hide
          Brainski added a comment - - edited

          Hi

          I tested the last patch. I was able to apply it to the latest dev subversion check out and I was able to recompile the whole thing (kind of difficult for a non java programmer g).

          Finally I modified my schema.xml and was able to use the sfilt in my range query. Its working quite well with my data.

          My wishes:

          Please update the wiki according to the changes:
          http://wiki.apache.org/solr/SpatialSearch

          Under Filtering:
          fq=

          {!sfilt fl=store_lat_lon units=km}

          &pt=47.508249,8.731969&d=12

          The units parameter and the meas parameter is not explained.
          The dist parameter is renamed to d

          Under Scoring:
          An example for a boost function is the following. Please add this also to the boost functions, because it took me hours to find out the correct syntax without whitespaces.
          qt=dismax&bf=recip(2,store,vector(40.7143,-74.006)),1,1,1)^30

          Show
          Brainski added a comment - - edited Hi I tested the last patch. I was able to apply it to the latest dev subversion check out and I was able to recompile the whole thing (kind of difficult for a non java programmer g ). Finally I modified my schema.xml and was able to use the sfilt in my range query. Its working quite well with my data. My wishes: Please update the wiki according to the changes: http://wiki.apache.org/solr/SpatialSearch Under Filtering: fq= {!sfilt fl=store_lat_lon units=km} &pt=47.508249,8.731969&d=12 The units parameter and the meas parameter is not explained. The dist parameter is renamed to d Under Scoring: An example for a boost function is the following. Please add this also to the boost functions, because it took me hours to find out the correct syntax without whitespaces. qt=dismax&bf=recip(2,store,vector(40.7143,-74.006)),1,1,1)^30
          Hide
          Grant Ingersoll added a comment -

          not much changed, but just putting what I have up, so that it is captured.

          Show
          Grant Ingersoll added a comment - not much changed, but just putting what I have up, so that it is captured.
          Hide
          David Smiley added a comment -

          I've finally tried this out successfully. But I needed a lat-lon box search, not a circle. It wasn't hard to modify the code to offer a lower-left point and upper-right point. I only implemented this for the Spatial Tiers point implementation. The odd part was that the code internally wanted to know the diameter of a circle based search in miles. So I translated this to the distance from one corner of the box search to the oppose one... then divided by the square root of 2 to reduce this amount to what the diameter would be of a circle fitting within the box search.

          I haven't bothered with any scoring modification, I'm most interested in filtering.

          Show
          David Smiley added a comment - I've finally tried this out successfully. But I needed a lat-lon box search, not a circle. It wasn't hard to modify the code to offer a lower-left point and upper-right point. I only implemented this for the Spatial Tiers point implementation. The odd part was that the code internally wanted to know the diameter of a circle based search in miles. So I translated this to the distance from one corner of the box search to the oppose one... then divided by the square root of 2 to reduce this amount to what the diameter would be of a circle fitting within the box search. I haven't bothered with any scoring modification, I'm most interested in filtering.
          Hide
          Kyle Wilkinson added a comment -

          I was able to get this up and running with the latest patch, but I had to add a dynamicField to my schema to get everything working:
          <field name="latlon" type="latLon" indexed="true" stored="true"/>
          <dynamicField name="*_latLon" type="tdouble" indexed="true" stored="false"/>

          Is this expected? If so, can you add it to the wiki to save the next person the trouble? Also, some feedback from trying to write boost and sort functions: The LatLon type is great and it is very convenient that it handles calculating spherical distances (there's probably a better term for this) for you. However, I could not figure out how to take advantage of writing my own boost function that uses this distance using the LatLon field. Both the hsin and dist functions require the the components of the point / don't accept a LatLon (at least as documented). To use them, I ended up having to store separate fields in my document for latitude and longitude, as well as a LatLon. Also, it would be very convenient to not have to pass in the radius of the earth into the hsin function. There's a good chance with my feedback that I'm just not doing something right or don't fully understand the capabilities offered. Overall this functionality is very useful and saved me a ton of time. Thanks!

          Show
          Kyle Wilkinson added a comment - I was able to get this up and running with the latest patch, but I had to add a dynamicField to my schema to get everything working: <field name="latlon" type="latLon" indexed="true" stored="true"/> <dynamicField name="*_latLon" type="tdouble" indexed="true" stored="false"/> Is this expected? If so, can you add it to the wiki to save the next person the trouble? Also, some feedback from trying to write boost and sort functions: The LatLon type is great and it is very convenient that it handles calculating spherical distances (there's probably a better term for this) for you. However, I could not figure out how to take advantage of writing my own boost function that uses this distance using the LatLon field. Both the hsin and dist functions require the the components of the point / don't accept a LatLon (at least as documented). To use them, I ended up having to store separate fields in my document for latitude and longitude, as well as a LatLon. Also, it would be very convenient to not have to pass in the radius of the earth into the hsin function. There's a good chance with my feedback that I'm just not doing something right or don't fully understand the capabilities offered. Overall this functionality is very useful and saved me a ton of time. Thanks!
          Hide
          David Smiley added a comment -

          I commented too soon before. The results seemed erroneous so I reverted the Solr code to r941377 from last week when Solr's tests passed (as reported by Hudson) then applied a clean patch downloaded from this issue. SpatialFilterTest fails its tests except for testPoints(). I was most interested in testTiles which failed. Have others gotten these tests to work?

          Show
          David Smiley added a comment - I commented too soon before. The results seemed erroneous so I reverted the Solr code to r941377 from last week when Solr's tests passed (as reported by Hudson) then applied a clean patch downloaded from this issue. SpatialFilterTest fails its tests except for testPoints(). I was most interested in testTiles which failed. Have others gotten these tests to work?
          Hide
          Grant Ingersoll added a comment -

          Sorry for the confusion. The tests for this are still broken "by design", namely b/c of the pole bugs. Thus, while I believe most of the patch works, the Solr tests will fail.

          Show
          Grant Ingersoll added a comment - Sorry for the confusion. The tests for this are still broken "by design", namely b/c of the pole bugs. Thus, while I believe most of the patch works, the Solr tests will fail.
          Hide
          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Show
          Hoss Man added a comment - Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email... http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed. A unique token for finding these 240 issues in the future: hossversioncleanup20100527
          Hide
          David Smiley added a comment -

          I have been testing the capabilities of this patch further. I am using LatLonType and I have multiple points per document. It doesn't appear that the logic in LatLonType properly handles multi-value fields since its doing range queries for the latitude and longitude separately. I have a document with a couple points it, Copenhagen Denmark and Ethiopia. If I query for points in the box Lat: 10.8 - 5.4, and Lon: 9.50 - 14.23 then I find this document when I shouldn't. This box is at a place south of Copenhagen and West of Ethiopia.

          Oh by the way, I made some modifications after applying the patch so that I could query by a lat-lon box instead of a point-radius. It was very easy.

          Show
          David Smiley added a comment - I have been testing the capabilities of this patch further. I am using LatLonType and I have multiple points per document. It doesn't appear that the logic in LatLonType properly handles multi-value fields since its doing range queries for the latitude and longitude separately. I have a document with a couple points it, Copenhagen Denmark and Ethiopia. If I query for points in the box Lat: 10.8 - 5.4, and Lon: 9.50 - 14.23 then I find this document when I shouldn't. This box is at a place south of Copenhagen and West of Ethiopia. Oh by the way, I made some modifications after applying the patch so that I could query by a lat-lon box instead of a point-radius. It was very easy.
          Hide
          Darren Govoni added a comment -

          Yeah, I brought this issue up on the mailing list. Its a problem that originates in other blogs circulating about doing ranged queries on SQL tables to achieve spatial. In that context it works because the lat and lot are constrained together in the row. In solr documents you can have numerous free floating lats and lons values so a range query cannot be used alone. The intermediate lat lon results from a ranged calculus need to be constrained together otherwise false positives can occur (e.g. separate points have a lat and lon that individually satisfy the range but the point does not).

          Still waiting to understand how it will work. Currently, it is not entirely spatial.

          Also, the distance parameter for points. How is it to be used for bounding box, where the distance varies? In that case, calculating distance across the extent of the box is a waste of cpu - a different calculation is needed for that - and a joined ranged could work.

          I have a proposal to fix this if its needed.

          Show
          Darren Govoni added a comment - Yeah, I brought this issue up on the mailing list. Its a problem that originates in other blogs circulating about doing ranged queries on SQL tables to achieve spatial. In that context it works because the lat and lot are constrained together in the row. In solr documents you can have numerous free floating lats and lons values so a range query cannot be used alone. The intermediate lat lon results from a ranged calculus need to be constrained together otherwise false positives can occur (e.g. separate points have a lat and lon that individually satisfy the range but the point does not). Still waiting to understand how it will work. Currently, it is not entirely spatial. Also, the distance parameter for points. How is it to be used for bounding box, where the distance varies? In that case, calculating distance across the extent of the box is a waste of cpu - a different calculation is needed for that - and a joined ranged could work. I have a proposal to fix this if its needed.
          Hide
          Grant Ingersoll added a comment -

          RE: multivalued.

          Yes, this isn't necessarily a spatial issue. Best solution would be a position based query that took into account a specific position kind of like SpanFirstQuery.

          Also, the distance parameter for points. How is it to be used for bounding box, where the distance varies? In that case, calculating distance across the extent of the box is a waste of cpu - a different calculation is needed for that - and a joined ranged could work.

          I have a proposal to fix this if its needed.

          Sure, it would be useful to have a proposal for this. Hasn't been my focus for now, but open to suggestions.

          Show
          Grant Ingersoll added a comment - RE: multivalued. Yes, this isn't necessarily a spatial issue. Best solution would be a position based query that took into account a specific position kind of like SpanFirstQuery. Also, the distance parameter for points. How is it to be used for bounding box, where the distance varies? In that case, calculating distance across the extent of the box is a waste of cpu - a different calculation is needed for that - and a joined ranged could work. I have a proposal to fix this if its needed. Sure, it would be useful to have a proposal for this. Hasn't been my focus for now, but open to suggestions.
          Hide
          Grant Ingersoll added a comment -

          Up to date with trunk, also now reuses the CartesianShapeFilter instead of rolling my own. Still fails tests, etc. around the poles. Otherwise, I think it is usable for those who don't have data around the poles.

          Show
          Grant Ingersoll added a comment - Up to date with trunk, also now reuses the CartesianShapeFilter instead of rolling my own. Still fails tests, etc. around the poles. Otherwise, I think it is usable for those who don't have data around the poles.
          Hide
          Grant Ingersoll added a comment -

          OK, here's my take on dealing w/ the poles.

          I know how to determine when we crossed the pole (you can use angular distance and latitude), so my proposal is that we handle the poles in a slightly different way than the typical "bounding box" approach b/c the queries get too complicated.

          I have two proposals for dealing with the Poles:

          1. Document that we do not deal with the poles and throw an exception. Someone who needs calculations around the pole can implement their own projection and likely come up w/ a far better way of handling things than I have time for at the moment.
          2. Instead of a bounding box, create a bounding bowl by taking the smaller (closer to the equator) latitude (i.e. the lower left latitude) and then get all points that are above this latitude. Obviously, this is inaccurate when considered in light of the bounding box model, but it will be faster/simpler and will mainly be a documentation issue and is better than simply throwing an exception.

          Thoughts? I'm also open to others who have more knowledge about dealing with the Poles to produce a patch.

          Show
          Grant Ingersoll added a comment - OK, here's my take on dealing w/ the poles. I know how to determine when we crossed the pole (you can use angular distance and latitude), so my proposal is that we handle the poles in a slightly different way than the typical "bounding box" approach b/c the queries get too complicated. I have two proposals for dealing with the Poles: Document that we do not deal with the poles and throw an exception. Someone who needs calculations around the pole can implement their own projection and likely come up w/ a far better way of handling things than I have time for at the moment. Instead of a bounding box, create a bounding bowl by taking the smaller (closer to the equator) latitude (i.e. the lower left latitude) and then get all points that are above this latitude. Obviously, this is inaccurate when considered in light of the bounding box model, but it will be faster/simpler and will mainly be a documentation issue and is better than simply throwing an exception. Thoughts? I'm also open to others who have more knowledge about dealing with the Poles to produce a patch.
          Hide
          Grant Ingersoll added a comment -

          So, the correct name for approach 2 above is spherical cap: http://mathworld.wolfram.com/SphericalCap.html

          Show
          Grant Ingersoll added a comment - So, the correct name for approach 2 above is spherical cap: http://mathworld.wolfram.com/SphericalCap.html
          Hide
          Grant Ingersoll added a comment -

          Note, this bowl approach also has benefits for the LatLonType in that we don't have to do a range query on the longitude, just the latitude, which translates to one less field query

          Show
          Grant Ingersoll added a comment - Note, this bowl approach also has benefits for the LatLonType in that we don't have to do a range query on the longitude, just the latitude, which translates to one less field query
          Hide
          Lance Norskog added a comment -

          Use quaternions: http://local.wasp.uwa.edu.au/~pbourke/miscellaneous/quaternions/

          Quaternions allow you to embed 3D & polar in a 4-number mathematical space. You can do all of the spatial manipulations without zero points. For example, geohash fails in London because the Greenwich line is the zero point. Quats give you a basis for a geohash format that works everywhere.

          Show
          Lance Norskog added a comment - Use quaternions: http://local.wasp.uwa.edu.au/~pbourke/miscellaneous/quaternions/ Quaternions allow you to embed 3D & polar in a 4-number mathematical space. You can do all of the spatial manipulations without zero points. For example, geohash fails in London because the Greenwich line is the zero point. Quats give you a basis for a geohash format that works everywhere.
          Hide
          Uwe Schindler added a comment -

          Grant: Spherical Cap is the way almost all Map-based solutions work to clip out the current map viewport. If you e.g. Look with Google Earth on the north pole and use viewport based KML reloading, GE will send a BBOX to your KML producing servlet only with the south latitude filled and all others are complete open. So as soon as you cross the pole, it just gets a lessThan/greaterThan.

          For crossing the date line, you get a "inverse" BBOX, with EastBoundLongitude < WestBoundLongitude.

          These are the only two special cases in the BBOX model used by maps applications (Google Earth, Google Maps, WMS/WFS).

          Show
          Uwe Schindler added a comment - Grant: Spherical Cap is the way almost all Map-based solutions work to clip out the current map viewport. If you e.g. Look with Google Earth on the north pole and use viewport based KML reloading, GE will send a BBOX to your KML producing servlet only with the south latitude filled and all others are complete open. So as soon as you cross the pole, it just gets a lessThan/greaterThan. For crossing the date line, you get a "inverse" BBOX, with EastBoundLongitude < WestBoundLongitude. These are the only two special cases in the BBOX model used by maps applications (Google Earth, Google Maps, WMS/WFS).
          Hide
          Grant Ingersoll added a comment -

          This version passes all tests, but I have not tried the example yet.

          I removed SpatialTileField (see http://search.lucidimagination.com/search/document/db59ada859289cfa/spatial_rethinking_cartesian_tiers_implementation#c32e81783642df47) and reverted all changes to the tiers package. Tiers are simply too broken to justify putting them into Solr

          Still a fair amount of clean up, review, testing, etc. to be done.

          Show
          Grant Ingersoll added a comment - This version passes all tests, but I have not tried the example yet. I removed SpatialTileField (see http://search.lucidimagination.com/search/document/db59ada859289cfa/spatial_rethinking_cartesian_tiers_implementation#c32e81783642df47 ) and reverted all changes to the tiers package. Tiers are simply too broken to justify putting them into Solr Still a fair amount of clean up, review, testing, etc. to be done.
          Hide
          Grant Ingersoll added a comment -

          Renames SpatialQParser to SpatialFilterQParser to be in line with the Plugin name.

          Cleans up some of the SpatialFitlerTest class and adds explicit checking of doc ids.

          Show
          Grant Ingersoll added a comment - Renames SpatialQParser to SpatialFilterQParser to be in line with the Plugin name. Cleans up some of the SpatialFitlerTest class and adds explicit checking of doc ids.
          Hide
          Grant Ingersoll added a comment -

          Minor doc fixes.

          I think this is ready to go.

          Show
          Grant Ingersoll added a comment - Minor doc fixes. I think this is ready to go.
          Hide
          Yonik Seeley added a comment -

          Configurable units may look like a convenience at first blush, but I think they will end up just adding significant burden or confusion to clients (for many of the reasons I already listed).

          Show
          Yonik Seeley added a comment - Configurable units may look like a convenience at first blush, but I think they will end up just adding significant burden or confusion to clients (for many of the reasons I already listed).
          Hide
          Grant Ingersoll added a comment -

          FYI: after committing, I intend to immediately move DistanceUtils up one package, as there is nothing tier specific about them. I'm leaving them where they are now due to the fact that I don't think SVN will work very nicely with a change, delete and add.

          Show
          Grant Ingersoll added a comment - FYI: after committing, I intend to immediately move DistanceUtils up one package, as there is nothing tier specific about them. I'm leaving them where they are now due to the fact that I don't think SVN will work very nicely with a change, delete and add.
          Hide
          Grant Ingersoll added a comment -

          Committed revision 962727.

          Show
          Grant Ingersoll added a comment - Committed revision 962727.
          Hide
          Grant Ingersoll added a comment -

          Committed to 3.x too.

          Show
          Grant Ingersoll added a comment - Committed to 3.x too.
          Hide
          Bill Bell added a comment -

          The calculations of distance appears to be off.

          Note: "The radius of the sphere to be used when calculating distances on a sphere (i.e. haversine). Default is the Earth's mean radius in kilometers (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) which is set to 3,958.761458084784856. Most applications will not need to set this."

          The radius of the earth in KM is 6371.009 km (≈3958.761 mi).

          Also filtering distance appears to be off - example data:

          45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 miles = 220 kilometers

          http://....../solr/select?fl=*,score&start=0&rows=10&q=

          {!sfilt%20fl=store_lat_lon}

          &qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348)) asc

          Nothing shows. d=285 shows results. This is off by a lot.

          Bill

          Show
          Bill Bell added a comment - The calculations of distance appears to be off. Note: "The radius of the sphere to be used when calculating distances on a sphere (i.e. haversine). Default is the Earth's mean radius in kilometers (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) which is set to 3,958.761458084784856. Most applications will not need to set this." The radius of the earth in KM is 6371.009 km (≈3958.761 mi). Also filtering distance appears to be off - example data: 45.17614,-93.87341 to 44.9369054,-91.3929348 Approx 137 miles Google. 169 miles = 220 kilometers http://....../solr/select?fl=*,score&start=0&rows=10&q= {!sfilt%20fl=store_lat_lon} &qt=standard&pt=44.9369054,-91.3929348&d=280&sort=dist(2,store,vector(44.9369054,-91.3929348)) asc Nothing shows. d=285 shows results. This is off by a lot. Bill
          Hide
          Grant Ingersoll added a comment -

          Default is the Earth's mean radius in kilometers (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) which is set to 3,958.761458084784856

          That's a typo. I'll fix.

          I think Yonik just fixed this on rev 999175. Can you try trunk?

          Show
          Grant Ingersoll added a comment - Default is the Earth's mean radius in kilometers (see org.apache.solr.search.function.distance.Constants.EARTH_MEAN_RADIUS_KM) which is set to 3,958.761458084784856 That's a typo. I'll fix. I think Yonik just fixed this on rev 999175. Can you try trunk?
          Hide
          Yonik Seeley added a comment -

          Didn't fix things (it was just a pole check) - the bounding box calc is off... see SOLR-2125

          Show
          Yonik Seeley added a comment - Didn't fix things (it was just a pole check) - the bounding box calc is off... see SOLR-2125
          Hide
          Yonik Seeley added a comment -

          Here's a patch that changes sfilt to use getParam() which first gets from local params, but then also checks global request params if missing. I also added "sfield" since getting "fl" from global params as the spatial field is obviously wrong.

          Since a single solr request may have multiple spatial related things (distance function, sort by distance, filter) it allows easier sharing.

          example: sfield=store&pt=10,20&d=200&fq=

          {!sfilt}

          &sort=sdist() asc

          (I made up sdist, it doesn't exist yet... but you get the idea).

          Show
          Yonik Seeley added a comment - Here's a patch that changes sfilt to use getParam() which first gets from local params, but then also checks global request params if missing. I also added "sfield" since getting "fl" from global params as the spatial field is obviously wrong. Since a single solr request may have multiple spatial related things (distance function, sort by distance, filter) it allows easier sharing. example: sfield=store&pt=10,20&d=200&fq= {!sfilt} &sort=sdist() asc (I made up sdist, it doesn't exist yet... but you get the idea).
          Hide
          Yonik Seeley added a comment -

          Hmmm, I'm not understanding the purpose of the checks for if we cross the equator, and
          addEquatorialBoundary(). It looks like it creates two ranges... [min TO 0] and [0 to max], where
          [min TO max] should always work. Is there something special about the equator I'm missing?

          Show
          Yonik Seeley added a comment - Hmmm, I'm not understanding the purpose of the checks for if we cross the equator, and addEquatorialBoundary(). It looks like it creates two ranges... [min TO 0] and [0 to max] , where [min TO max] should always work. Is there something special about the equator I'm missing?
          Hide
          Yonik Seeley added a comment -

          I'm further confused by the following:

             else if (ll[LONG] < 0.0 && ur[LONG] > 0.0) {//prime meridian (0 degrees
          

          I don't see what's special about crossing the 0 deg longitude line here.
          It seems like the only special-case for longitude is if ll > ur, in which case you went over the +-180 line and need two ranges to cover that [ur TO -180] OR [ll TO 180] ?

          Show
          Yonik Seeley added a comment - I'm further confused by the following: else if (ll[LONG] < 0.0 && ur[LONG] > 0.0) { //prime meridian (0 degrees I don't see what's special about crossing the 0 deg longitude line here. It seems like the only special-case for longitude is if ll > ur, in which case you went over the +-180 line and need two ranges to cover that [ur TO -180] OR [ll TO 180] ?
          Hide
          Yonik Seeley added a comment -

          FYI, I'm in the middle of refactoring this code, so please give feedback as comments, not patches for now...

          Show
          Yonik Seeley added a comment - FYI, I'm in the middle of refactoring this code, so please give feedback as comments, not patches for now...
          Hide
          Chris Male added a comment -

          Yonik, I recommend you glance over http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates (which I think Bill mentioned earlier). In addition to proposing quite a nice bounding box algorithm, he also discusses how to handle the poles and the 180 degree line. I recommend we follow the behaviour that he suggests.

          Show
          Chris Male added a comment - Yonik, I recommend you glance over http://janmatuschek.de/LatitudeLongitudeBoundingCoordinates (which I think Bill mentioned earlier). In addition to proposing quite a nice bounding box algorithm, he also discusses how to handle the poles and the 180 degree line. I recommend we follow the behaviour that he suggests.
          Hide
          Yonik Seeley added a comment -

          OK, so I didn't touch the current math, but I heavily re-factored (and fixed I hope) the logic.

          It breaks down like this: we always have exactly one latitude constraint, and between 0 and 2 longitude constraints. (0 when doing a polar cap, 1 normal, 2 when we cross the +-180 long line)

          The part of the code that calculates the ranges and builds the query is now much shorter (ignore the other work in progress, SpatialDistanceQuery, for now).

          Show
          Yonik Seeley added a comment - OK, so I didn't touch the current math, but I heavily re-factored (and fixed I hope) the logic. It breaks down like this: we always have exactly one latitude constraint, and between 0 and 2 longitude constraints. (0 when doing a polar cap, 1 normal, 2 when we cross the +-180 long line) The part of the code that calculates the ranges and builds the query is now much shorter (ignore the other work in progress, SpatialDistanceQuery, for now).
          Hide
          Yonik Seeley added a comment -

          I recommend we follow the behaviour that he suggests.

          Looks like a really nice page. I don't have time to figure out and redo any math right now, but I hope in the future that we can have random tests to verify this stuff.

          Show
          Yonik Seeley added a comment - I recommend we follow the behaviour that he suggests. Looks like a really nice page. I don't have time to figure out and redo any math right now, but I hope in the future that we can have random tests to verify this stuff.
          Hide
          Chris Male added a comment -

          Yeah it is a nice page, and its math I think we should follow, but I'll attack that in another issue.

          Show
          Chris Male added a comment - Yeah it is a nice page, and its math I think we should follow, but I'll attack that in another issue.
          Hide
          Grant Ingersoll added a comment -

          Yonik, you are right on the equator and 0 meridian. I think I was just so into the tile thing at the time, that I carried it over. We just need the 180 meridian and the poles.

          As for the math on the box, I think that works, but what we have now is pretty simple too and covers the whole circle of the radius specified by the distance.

          Show
          Grant Ingersoll added a comment - Yonik, you are right on the equator and 0 meridian. I think I was just so into the tile thing at the time, that I carried it over. We just need the 180 meridian and the poles. As for the math on the box, I think that works, but what we have now is pretty simple too and covers the whole circle of the radius specified by the distance.
          Hide
          Yonik Seeley added a comment -

          Here's a refresh (still uncompleted stuff in there).

          • adds a bbox query for "bounding box" (or basically, fast but may include points outside the actual distance/shape)
          • changes sfilt to filter out all points outside of the distance (for latlon type) - I think this matches users expectations better.

          So when you ask for a box, you get a range query on the underlying tri-range fields of course.
          But when you ask for a filter (i.e. distance must be calculated to tell if it truly matches) then I actually use the value sources and check against the box, since those values will be needed sometimes to calculate the radius anyway.

          Not sure what I think about that... it will be faster if the main query doesn't select many documents (I plan on executing this in parallel with the query), but slower if it does.

          Show
          Yonik Seeley added a comment - Here's a refresh (still uncompleted stuff in there). adds a bbox query for "bounding box" (or basically, fast but may include points outside the actual distance/shape) changes sfilt to filter out all points outside of the distance (for latlon type) - I think this matches users expectations better. So when you ask for a box, you get a range query on the underlying tri-range fields of course. But when you ask for a filter (i.e. distance must be calculated to tell if it truly matches) then I actually use the value sources and check against the box, since those values will be needed sometimes to calculate the radius anyway. Not sure what I think about that... it will be faster if the main query doesn't select many documents (I plan on executing this in parallel with the query), but slower if it does.
          Hide
          Yonik Seeley added a comment -

          Here's a patch with SpatialDistanceQuery working pretty well now, including nice explains / toString.
          It also produces scores (the distance) so it can be used as the main query if desired.

          We also really need a way to get these more expensive filters to run in parallel with the main query.

          Show
          Yonik Seeley added a comment - Here's a patch with SpatialDistanceQuery working pretty well now, including nice explains / toString. It also produces scores (the distance) so it can be used as the main query if desired. We also really need a way to get these more expensive filters to run in parallel with the main query.
          Hide
          Yonik Seeley added a comment - - edited

          Here's a patch that implements a more user friendly distance function called "sdist".
          Way back, the original plan was to dispatch function queries as well as filters through the SpatialQueryable interface - but for now this is hardcoded to haversine on vector value sources.

          Like sfilt, if needed, it can read arguments from the global params "pt" and "sfield". This will make it easy for a user to set these once in a request and still use a number of filter and distance operations w/o repeating those params.

          Examples:
          sfilt(1,2,3,4)
          sfilt($a,$b)&a=1,2&b=3,4
          sfilt(1,2)&pt=3,4
          sfilt(1,2)&sfield=store
          sfilt()&pt=1,2&sfield=store
          sfilt(store1,store2)
          sfilt(vector(mylat,mylon),3,4)
          sfilt(mylat,mylon,3,4)

          If it detects that one of the points is constant, and the other is a vector value source (the standard case)
          it uses a slightly more optimized version of haversine that avoids one cosine call and also avoids getting multiple values through an array.

          edit: oh, and everything is in degrees (users really don't want radians), distances are on earth (i.e. the sphere radius is not specified), and calculated distance values are in km.

          Show
          Yonik Seeley added a comment - - edited Here's a patch that implements a more user friendly distance function called "sdist". Way back, the original plan was to dispatch function queries as well as filters through the SpatialQueryable interface - but for now this is hardcoded to haversine on vector value sources. Like sfilt, if needed, it can read arguments from the global params "pt" and "sfield". This will make it easy for a user to set these once in a request and still use a number of filter and distance operations w/o repeating those params. Examples: sfilt(1,2,3,4) sfilt($a,$b)&a=1,2&b=3,4 sfilt(1,2)&pt=3,4 sfilt(1,2)&sfield=store sfilt()&pt=1,2&sfield=store sfilt(store1,store2) sfilt(vector(mylat,mylon),3,4) sfilt(mylat,mylon,3,4) If it detects that one of the points is constant, and the other is a vector value source (the standard case) it uses a slightly more optimized version of haversine that avoids one cosine call and also avoids getting multiple values through an array. edit: oh, and everything is in degrees (users really don't want radians), distances are on earth (i.e. the sphere radius is not specified), and calculated distance values are in km.
          Hide
          Grant Ingersoll added a comment -

          Note, we have a strdist. sdist sounds a lot like string distance to me.

          Also, I'm not sure, but perhaps you forget to add some new files to the patch? I don't see a new haversine call in the file, but maybe it's b/c it's too early in the morning.

          Show
          Grant Ingersoll added a comment - Note, we have a strdist. sdist sounds a lot like string distance to me. Also, I'm not sure, but perhaps you forget to add some new files to the patch? I don't see a new haversine call in the file, but maybe it's b/c it's too early in the morning.
          Hide
          Yonik Seeley added a comment -

          OK, here's the patch again (after I did "svn add" on the missing files).

          Any suggestions other than "sdist"?
          I would have preferred "dist", but it's taken.
          gdist (geo distance)? I just thought it was strange to have an "sfilt" but a "gdist".

          Show
          Yonik Seeley added a comment - OK, here's the patch again (after I did "svn add" on the missing files). Any suggestions other than "sdist"? I would have preferred "dist", but it's taken. gdist (geo distance)? I just thought it was strange to have an "sfilt" but a "gdist".
          Hide
          Grant Ingersoll added a comment -

          I could go for "gfilt" and "gdist" or "llfilt" and "lldist" (that's two lower case "L"s, as in lat-lon)

          Show
          Grant Ingersoll added a comment - I could go for "gfilt" and "gdist" or "llfilt" and "lldist" (that's two lower case "L"s, as in lat-lon)
          Hide
          Yonik Seeley added a comment -

          OK, final patch - this changes stuff to geofilt/geodist.

          Show
          Yonik Seeley added a comment - OK, final patch - this changes stuff to geofilt/geodist.
          Hide
          Lance Norskog added a comment - - edited

          1- There are no unit tests. One thing that has become apparent during the checkered history of Solr GIS projects is that this stuff needs really comprehensive testing. Barbie says "GIS is hard!".

          2- What is 'f' instead of CommonParams.FL ?

          Show
          Lance Norskog added a comment - - edited 1- There are no unit tests. One thing that has become apparent during the checkered history of Solr GIS projects is that this stuff needs really comprehensive testing. Barbie says "GIS is hard!". 2- What is 'f' instead of CommonParams.FL ?
          Hide
          Yonik Seeley added a comment -

          1- There are no unit tests.

          What's missing unit tests?

          What is 'f' instead of CommonParams.FL ?

          Since we fall back to global params, "fl" was not an option, so I changed it to "sfield".
          As a shortcut, local-param only, I changed "fl" to "f" for consistency with all of the other QParsers that use "f" for the field.

          Show
          Yonik Seeley added a comment - 1- There are no unit tests. What's missing unit tests? What is 'f' instead of CommonParams.FL ? Since we fall back to global params, "fl" was not an option, so I changed it to "sfield". As a shortcut, local-param only , I changed "fl" to "f" for consistency with all of the other QParsers that use "f" for the field.
          Hide
          Lance Norskog added a comment -

          I made a mistake in my text search, never mind. All I can find are nits about comparing doubles and printed doubles directly. This makes the test cases brittle. Perhaps SolrTestCaseJ4 could include 'compare double&float with epsilon' methods?

          Lance

          Show
          Lance Norskog added a comment - I made a mistake in my text search, never mind. All I can find are nits about comparing doubles and printed doubles directly. This makes the test cases brittle. Perhaps SolrTestCaseJ4 could include 'compare double&float with epsilon' methods? Lance
          Hide
          Yonik Seeley added a comment -

          All I can find are nits about comparing doubles and printed doubles directly.

          Yep - me too. Not an easy fix currently (junit can compare w/ epsilon, but you need the actual double values, which we don't have).
          I don't think xpath has epsilon comparison methods, but I was thinking of somehow adding epsilon support (maybe just global to start) to the json-testing framework I've been slowly building out. Prob won't get to it until tests break due to the brittleness

          Show
          Yonik Seeley added a comment - All I can find are nits about comparing doubles and printed doubles directly. Yep - me too. Not an easy fix currently (junit can compare w/ epsilon, but you need the actual double values, which we don't have). I don't think xpath has epsilon comparison methods, but I was thinking of somehow adding epsilon support (maybe just global to start) to the json-testing framework I've been slowly building out. Prob won't get to it until tests break due to the brittleness
          Hide
          Yonik Seeley added a comment -

          committed, w/ additional javadoc updates.

          Show
          Yonik Seeley added a comment - committed, w/ additional javadoc updates.
          Hide
          Yonik Seeley added a comment -

          Hmmm, I just discovered that our LatLon type doesn't support point queries or range queries!
          I'll see if I can rectify that quickly...

          Show
          Yonik Seeley added a comment - Hmmm, I just discovered that our LatLon type doesn't support point queries or range queries! I'll see if I can rectify that quickly...
          Hide
          Yonik Seeley added a comment -

          Here's the patch so you can do exact point matches (field queries) and manual bounding boxes (range queries) with our LatLonType.

          Show
          Yonik Seeley added a comment - Here's the patch so you can do exact point matches (field queries) and manual bounding boxes (range queries) with our LatLonType.
          Hide
          Grant Ingersoll added a comment -

          FYI, I believe I have backported this to 3.x

          Show
          Grant Ingersoll added a comment - FYI, I believe I have backported this to 3.x
          Hide
          Grant Ingersoll added a comment -

          Bulk close for 3.1.0 release

          Show
          Grant Ingersoll added a comment - Bulk close for 3.1.0 release

            People

            • Assignee:
              Grant Ingersoll
              Reporter:
              Grant Ingersoll
            • Votes:
              3 Vote for this issue
              Watchers:
              9 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development