Solr
  1. Solr
  2. SOLR-3304

Add Solr support for the new Lucene spatial module

    Details

    • Type: New Feature New Feature
    • Status: Closed
    • Priority: Critical Critical
    • Resolution: Fixed
    • Affects Version/s: 4.0-ALPHA
    • Fix Version/s: 4.0
    • Component/s: None
    • Labels:

      Description

      Get the Solr spatial module integrated with the lucene spatial module.

        Issue Links

          Activity

          Hide
          David Smiley added a comment -

          We're going to get to it; don't worry Bill.

          Show
          David Smiley added a comment - We're going to get to it; don't worry Bill.
          Hide
          Bill Bell added a comment -

          Let me know if I can help.

          Show
          Bill Bell added a comment - Let me know if I can help.
          Hide
          David Smiley added a comment -

          Attached is the first draft.

          The Solr support for the new Lucene spatial module is a set of field types which are effectively adapters into the real logic in Lucene. Even all the configuration option initialization from the fieldType attributes are done in Lucene. A map of these is passed with a utility wrapper to detect which attributes were seen so that they can be removed afterwards. And so if you typo a parameter then you'll get an error like you would for any other field type.

          The Lucene spatial module supports sorting indirectly by having a spatial Query's score be the distance. Of course you don't always need the score/distance and so there is a local-param attribute needScore which can be set to false, such as in a filter query (see SOLR-2883).

          I'll say the tests here are so-so; I welcome input on how they can be structured to be more effective. Since the real logic is all in the Lucene spatial module, ideally the tests on the Solr side would be fairly minimal in this regard... just instantiate the various field types exercising a few features and a few options.

          Some TODOs:

          • Document how to use it in the wiki
          • At least some basic tests for the SpatialTwoDoublesFieldType and SpatialTermQueryPrefixTreeFieldtype.

          At least for now, I don't envision the capabilities here replacing the Solr spatial introduced in 3x; it is a new alternative.

          Show
          David Smiley added a comment - Attached is the first draft. The Solr support for the new Lucene spatial module is a set of field types which are effectively adapters into the real logic in Lucene. Even all the configuration option initialization from the fieldType attributes are done in Lucene. A map of these is passed with a utility wrapper to detect which attributes were seen so that they can be removed afterwards. And so if you typo a parameter then you'll get an error like you would for any other field type. The Lucene spatial module supports sorting indirectly by having a spatial Query's score be the distance. Of course you don't always need the score/distance and so there is a local-param attribute needScore which can be set to false, such as in a filter query (see SOLR-2883 ). I'll say the tests here are so-so; I welcome input on how they can be structured to be more effective. Since the real logic is all in the Lucene spatial module, ideally the tests on the Solr side would be fairly minimal in this regard... just instantiate the various field types exercising a few features and a few options. Some TODOs: Document how to use it in the wiki At least some basic tests for the SpatialTwoDoublesFieldType and SpatialTermQueryPrefixTreeFieldtype. At least for now, I don't envision the capabilities here replacing the Solr spatial introduced in 3x; it is a new alternative.
          Hide
          David Smiley added a comment -

          Another TODO:

          • in solr's example schema.xml add SpatialRecursivePrefixTreeFieldType and for that matter remove the old geohash one.
          Show
          David Smiley added a comment - Another TODO: in solr's example schema.xml add SpatialRecursivePrefixTreeFieldType and for that matter remove the old geohash one.
          Hide
          Itamar Syn-Hershko added a comment -

          In continuation to the discussion on the spatial4j list, +1 for having all the tests with actual spatial logic reside in the Lucene spatial module, and have the Solr tests rely on that

          Show
          Itamar Syn-Hershko added a comment - In continuation to the discussion on the spatial4j list, +1 for having all the tests with actual spatial logic reside in the Lucene spatial module, and have the Solr tests rely on that
          Hide
          Cody Young added a comment -

          Haven't seen any movement on this in a while. What needs to be finished for this to be committed?

          Show
          Cody Young added a comment - Haven't seen any movement on this in a while. What needs to be finished for this to be committed?
          Hide
          David Smiley added a comment -

          I've been slowly getting back into this. I developed some basic documentation but I haven't put it on the wiki yet. I think I'll start by simply adding it as new page. I welcome input on what to name the page since it works significantly different than the existing Solr spatial (what I call Solr 3 spatial). I moved a particular extensive geohash based search test that is in this patch to LUCENE-4157 last night. I think what remains as far as this code is concerned is mostly about testing various field types with their configuration, but not worrying about in-depth testing of the algorithms which instead belongs in Lucene spatial module's tests.

          Show
          David Smiley added a comment - I've been slowly getting back into this. I developed some basic documentation but I haven't put it on the wiki yet. I think I'll start by simply adding it as new page. I welcome input on what to name the page since it works significantly different than the existing Solr spatial (what I call Solr 3 spatial). I moved a particular extensive geohash based search test that is in this patch to LUCENE-4157 last night. I think what remains as far as this code is concerned is mostly about testing various field types with their configuration, but not worrying about in-depth testing of the algorithms which instead belongs in Lucene spatial module's tests.
          Hide
          David Smiley added a comment -
          Show
          David Smiley added a comment - Some initial documentation: http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
          Hide
          David Smiley added a comment -

          Here is an updated patch.

          Even though Lucene spatial now has most/all of the tests that were in this patch, I feel they were short enough to just keep them here as well, in their Solr form, to demonstrate how to do queries at the Solr level.

          Changes:

          • Account for the Spatial4j query package moving to Lucene Spatial
          • Added SpatialRecursivePrefixTreeFieldType into the example schema.xml registered with name "location_2d_trie" – comments on name?
          • Removed the old GeoHashField from the example schema.xml
          • One Test file: TestSolr4Spatial that is parameterized for 3 geo fields with varying configuration.
          • Added test for lat,lon order in results (which is a known failure right now).

          Left to do:

          • Ensure output field is lat,lon order for a geospatial context. This should be addressed in Lucene Spatial or Spatial4j but probably not here, I feel. Comments?
          • Test 2d (non-geo) spatial. Ugh. I'd probably do another test file with similar tests, perhaps fewer.
          • Fix "distErrPct" taking a fraction and not a percent. This problem is in Lucene spatial.

          Concerns:

          • The TwoDoubles type is not tested here. If I were to add it, it would surely break as it can't handle circles and probably has other problems. Thoughts, Ryan? I know you wrote that one.
          • Consider moving the factory stuff out of Lucene spatial into the Solr adapters? I put them in Lucene spatial because I was thinking of the bigger picture of clients other than Solr that might use the same name-value initialization parameters. Chris doesn't seem to be a fan because the least-common-denominator way I represented this is a Map<String,String> which is admittedly dirtly.
          Show
          David Smiley added a comment - Here is an updated patch. Even though Lucene spatial now has most/all of the tests that were in this patch, I feel they were short enough to just keep them here as well, in their Solr form, to demonstrate how to do queries at the Solr level. Changes: Account for the Spatial4j query package moving to Lucene Spatial Added SpatialRecursivePrefixTreeFieldType into the example schema.xml registered with name "location_2d_trie" – comments on name? Removed the old GeoHashField from the example schema.xml One Test file: TestSolr4Spatial that is parameterized for 3 geo fields with varying configuration. Added test for lat,lon order in results (which is a known failure right now). Left to do: Ensure output field is lat,lon order for a geospatial context. This should be addressed in Lucene Spatial or Spatial4j but probably not here, I feel. Comments? Test 2d (non-geo) spatial. Ugh. I'd probably do another test file with similar tests, perhaps fewer. Fix "distErrPct" taking a fraction and not a percent. This problem is in Lucene spatial. Concerns: The TwoDoubles type is not tested here. If I were to add it, it would surely break as it can't handle circles and probably has other problems. Thoughts, Ryan? I know you wrote that one. Consider moving the factory stuff out of Lucene spatial into the Solr adapters? I put them in Lucene spatial because I was thinking of the bigger picture of clients other than Solr that might use the same name-value initialization parameters. Chris doesn't seem to be a fan because the least-common-denominator way I represented this is a Map<String,String> which is admittedly dirtly.
          Hide
          David Smiley added a comment -

          Attached is a new patch which assumes LUCENE-4188 (Strategy doesn't store shapes, and puts fieldName in Strategy) and LUCENE-4192 (no single createField()).

          Show
          David Smiley added a comment - Attached is a new patch which assumes LUCENE-4188 (Strategy doesn't store shapes, and puts fieldName in Strategy) and LUCENE-4192 (no single createField()).
          Hide
          Andy Fowler added a comment -

          I've been playing around with this against the current 4x branch. I've attached the original patch with one small tweak to AbstractSpatialFieldType.createFields, fixing a bug caught at compile-time.

          I'm pretty far out of my league trying to work on this, but I'm getting an NPE at AbstractSpatialFieldType.getQueryFromSpatialArgs(AbstractSpatialFieldType.java:178). It appears that createFields is never being called, so the fieldStrategyMap is empty when a query arrives. Not sure where in the lifecycle createFields is supposed to happen.

          Just throwing my progress out there for others looking at this.

          Show
          Andy Fowler added a comment - I've been playing around with this against the current 4x branch. I've attached the original patch with one small tweak to AbstractSpatialFieldType.createFields, fixing a bug caught at compile-time. I'm pretty far out of my league trying to work on this, but I'm getting an NPE at AbstractSpatialFieldType.getQueryFromSpatialArgs(AbstractSpatialFieldType.java:178). It appears that createFields is never being called, so the fieldStrategyMap is empty when a query arrives. Not sure where in the lifecycle createFields is supposed to happen. Just throwing my progress out there for others looking at this.
          Hide
          Andy Fowler added a comment -

          Attached is one more update to David Smiley's patch which resolves the NPE I was getting when trying to query on a geo field, before a document had been added (i.e. after restarting solr with an already-created index).

          Instead of assuming that the spacialStrategy had been instantiated during CreateFields, the same logic is used at query-time.

          It applies cleanly to branch_4x and all tests pass for me. Thanks for your work on this, David!

          Show
          Andy Fowler added a comment - Attached is one more update to David Smiley's patch which resolves the NPE I was getting when trying to query on a geo field, before a document had been added (i.e. after restarting solr with an already-created index). Instead of assuming that the spacialStrategy had been instantiated during CreateFields, the same logic is used at query-time. It applies cleanly to branch_4x and all tests pass for me. Thanks for your work on this, David!
          Hide
          David Smiley added a comment -

          Thanks for finding and fixing that bug Andy. Your fix wasn't quite right though since the getStrategy() method you refactored synchronized on a parameter (pointless) instead of the field. I fixed this.

          This new patch makes that and various other changes:

          • synchronized with the latest source tree (e.g. Spatial4j 0.3)
            • This means distances are now degrees based (0-180 for circle radius) not kilometers
          • removed ignoreIncompatibleGeometry option (see LUCENE-4173)
          • Use the input string as the stored value that is returned. So if you give "lat,lon" then that's what you get back, in whatever number of decimal places you chose.
          • added prefixGridScanLevel performance tuning option to SpatialRecursivePrefixTreeFieldType (simply exposed it from the strategy)
          • keep distErrPct as a fraction (no change)

          It would be nice to have a kilometer unit option but that isn't easily done until Spatial4j's shape reader gets to be more flexible. That can wait.

          That "needScore" local-param hack (see SOLR-2883) is unfortunate, as Solr can't get a Filter from a field type. I'm tempted to change the default to 'false' as leaving it at true' triggers large RAM requirements and slow-downs for SpatialRecursivePrefixTreeFieldType. This could be an opportunity to specify what the score should be, come to think of it. Instead of needScore="false", maybe score="none" (default) or score="distance" or score="recipDistance" or something like that.

          The TwoDoubles strategy needs more attention and tests in Lucene spatial, but I don't want that to hold up this patch. Shall I remove the adapter or let it get committed but don't advertise it until it's more worthy?

          Show
          David Smiley added a comment - Thanks for finding and fixing that bug Andy. Your fix wasn't quite right though since the getStrategy() method you refactored synchronized on a parameter (pointless) instead of the field. I fixed this. This new patch makes that and various other changes: synchronized with the latest source tree (e.g. Spatial4j 0.3) This means distances are now degrees based (0-180 for circle radius) not kilometers removed ignoreIncompatibleGeometry option (see LUCENE-4173 ) Use the input string as the stored value that is returned. So if you give "lat,lon" then that's what you get back, in whatever number of decimal places you chose. added prefixGridScanLevel performance tuning option to SpatialRecursivePrefixTreeFieldType (simply exposed it from the strategy) keep distErrPct as a fraction (no change) It would be nice to have a kilometer unit option but that isn't easily done until Spatial4j's shape reader gets to be more flexible. That can wait. That "needScore" local-param hack (see SOLR-2883 ) is unfortunate, as Solr can't get a Filter from a field type. I'm tempted to change the default to 'false' as leaving it at true' triggers large RAM requirements and slow-downs for SpatialRecursivePrefixTreeFieldType. This could be an opportunity to specify what the score should be, come to think of it. Instead of needScore="false", maybe score="none" (default) or score="distance" or score="recipDistance" or something like that. The TwoDoubles strategy needs more attention and tests in Lucene spatial, but I don't want that to hold up this patch. Shall I remove the adapter or let it get committed but don't advertise it until it's more worthy?
          Hide
          Andy Fowler added a comment -

          Thanks David! I'm just getting back to looking at this. We previously used your SOLR-2155 patch, and this one is a big deal to us. It looks like you're depending on Lucene 5 StorableField in the fieldtype (from LUCENE-3312). I can't tell from that ticket if those changes are planned to be backported.

          Show
          Andy Fowler added a comment - Thanks David! I'm just getting back to looking at this. We previously used your SOLR-2155 patch, and this one is a big deal to us. It looks like you're depending on Lucene 5 StorableField in the fieldtype (from LUCENE-3312 ). I can't tell from that ticket if those changes are planned to be backported.
          Hide
          David Smiley added a comment -

          RE StorableField – small details like that I work out at commit time. I develop against trunk and then the 4x backport has been trivial.

          Show
          David Smiley added a comment - RE StorableField – small details like that I work out at commit time. I develop against trunk and then the 4x backport has been trivial.
          Hide
          Robert Purdy added a comment -

          Hi there, I was wondering if there was any further documentation on the syntax to use this feature?

          I have applied the latest patch and read through the wiki page and I am unsure of how to get the distance back from the point provided in the url. What would be the best shape to use to accomplish finding documents with a lon/lat and a radius? My example url I am using is: http://localhost:8080/solr/test/select?q=*%3A*&wt=xml&fq=

          {!needScore=true}

          location:"Intersects(Circle(54.729696,-98.525391 d=0.0899320))"&fl=*,score

          Also mentioned in the wiki pages was that there was no documentation on sorting based on distance, anyone have an example of a sold query doing this?

          Thanks Robert.

          Show
          Robert Purdy added a comment - Hi there, I was wondering if there was any further documentation on the syntax to use this feature? I have applied the latest patch and read through the wiki page and I am unsure of how to get the distance back from the point provided in the url. What would be the best shape to use to accomplish finding documents with a lon/lat and a radius? My example url I am using is: http://localhost:8080/solr/test/select?q=*%3A*&wt=xml&fq= {!needScore=true} location:"Intersects(Circle(54.729696,-98.525391 d=0.0899320))"&fl=*,score Also mentioned in the wiki pages was that there was no documentation on sorting based on distance, anyone have an example of a sold query doing this? Thanks Robert.
          Hide
          David Smiley added a comment -

          Here is an updated patch. It depends on LUCENE-4208 (makeDistanceValueSource) and LUCENE-4389 (TwoDoubles dateline and circle) being applied.
          Notes:

          • TwoDoublesStrategy is a decent strategy now, due to LUCENE-4389 and tests in that issue and this one.
          • The 'needScore' local-param is gone, replaced by 'score'. 'score' can be set to 'none' (the default) to use a constant of 1, or 'distance' to use the distance, or 'recipDistance' to use the reciprocal distance. If you want to sort then use 'distance' (ascending) and if you want to boost relevancy by distance then I recommend 'recipDistance'.
          Show
          David Smiley added a comment - Here is an updated patch. It depends on LUCENE-4208 (makeDistanceValueSource) and LUCENE-4389 (TwoDoubles dateline and circle) being applied. Notes: TwoDoublesStrategy is a decent strategy now, due to LUCENE-4389 and tests in that issue and this one. The 'needScore' local-param is gone, replaced by 'score'. 'score' can be set to 'none' (the default) to use a constant of 1, or 'distance' to use the distance, or 'recipDistance' to use the reciprocal distance. If you want to sort then use 'distance' (ascending) and if you want to boost relevancy by distance then I recommend 'recipDistance'.
          Hide
          David Smiley added a comment -

          Hi Rob Purdy. I don't have time to explain this query as I write this but peruse this example query which does just about everything:

          http://localhost:8983/solr/select?q=*%3A*&wt=xml&fq=

          {!%20v=$sq}

          &sq=store:%22Intersects%28Circle%2854.729696,-98.525391%20d=10%29%29%22&debugQuery=on&sort=query%28$sortsq%29+asc&fl=id,store,score,distdeg:query%28$sortsq%29&sortsq=

          {!%20score=distance%20v=$sq}

          I'll update the wiki within the next few days, hopefully.

          Show
          David Smiley added a comment - Hi Rob Purdy. I don't have time to explain this query as I write this but peruse this example query which does just about everything: http://localhost:8983/solr/select?q=*%3A*&wt=xml&fq= {!%20v=$sq} &sq=store:%22Intersects%28Circle%2854.729696,-98.525391%20d=10%29%29%22&debugQuery=on&sort=query%28$sortsq%29+asc&fl=id,store,score,distdeg:query%28$sortsq%29&sortsq= {!%20score=distance%20v=$sq} I'll update the wiki within the next few days, hopefully.
          Hide
          David Smiley added a comment -

          The two issues this depends on are finally closed. I plan on committing this one tomorrow to allow more time for feedback.

          Show
          David Smiley added a comment - The two issues this depends on are finally closed. I plan on committing this one tomorrow to allow more time for feedback.
          Hide
          David Smiley added a comment -

          I made the return type of the AbstractSpatialFieldType.createField() be covariant and return Field, so as to be the same between both branches.

          Committed to trunk & 4x.

          Show
          David Smiley added a comment - I made the return type of the AbstractSpatialFieldType.createField() be covariant and return Field, so as to be the same between both branches. Committed to trunk & 4x.
          Hide
          David Smiley added a comment -

          The CHANGES.txt was just added as follows:

          * SOLR-3304: Added Solr adapters for Lucene 4's new spatial module.  With
            SpatialRecursivePrefixTreeFieldType ("location_rpt" in example schema), it is
            possible to index a variable number of points per document (and sort on them),
            index not just points but any Spatial4j supported shape such as polygons, and
            to query on these shapes too.  Polygons requires adding JTS to the classpath.
            (David Smiley)
          
          Show
          David Smiley added a comment - The CHANGES.txt was just added as follows: * SOLR-3304: Added Solr adapters for Lucene 4's new spatial module. With SpatialRecursivePrefixTreeFieldType ("location_rpt" in example schema), it is possible to index a variable number of points per document (and sort on them), index not just points but any Spatial4j supported shape such as polygons, and to query on these shapes too. Polygons requires adding JTS to the classpath. (David Smiley)
          Hide
          Ari Maniatis added a comment -

          Early in this thread David Smiley listed a TODO item: "Document how to use it in the wiki". We are having considerable difficulty migrating from Solr 3.5 with SOLR-2155 to this new way of doing things. Any pointers to documentation for the new mechanism would be very much appreciated.

          Show
          Ari Maniatis added a comment - Early in this thread David Smiley listed a TODO item: "Document how to use it in the wiki". We are having considerable difficulty migrating from Solr 3.5 with SOLR-2155 to this new way of doing things. Any pointers to documentation for the new mechanism would be very much appreciated.
          Hide
          Ari Maniatis added a comment -

          I know this is not a support forum, but for your reference, here are the issues our developer is struggling to find Solr4 documentation for:

          1) filter by the distance between course location field and Intersects(Circle) calculated by passed suburb params always returns empty result if we use solr.SpatialRecursivePrefixTreeFieldType filed type (as described in documentation this query and configuration changes should be used to kick off the plugin).
          2) updated org.apache.solr.schema.GeoHashField field type does not support geodist for multivalued fields and I not found the way how we can adjust this type to the same functionality as we have in SOLR-2155 plugin.

          Show
          Ari Maniatis added a comment - I know this is not a support forum, but for your reference, here are the issues our developer is struggling to find Solr4 documentation for: 1) filter by the distance between course location field and Intersects(Circle) calculated by passed suburb params always returns empty result if we use solr.SpatialRecursivePrefixTreeFieldType filed type (as described in documentation this query and configuration changes should be used to kick off the plugin). 2) updated org.apache.solr.schema.GeoHashField field type does not support geodist for multivalued fields and I not found the way how we can adjust this type to the same functionality as we have in SOLR-2155 plugin.
          Hide
          David Smiley added a comment -
          Show
          David Smiley added a comment - Ari; this is documented here: http://wiki.apache.org/solr/SolrAdaptersForLuceneSpatial4
          Hide
          Ari Maniatis added a comment -

          Thanks David. I've asked my developer to follow up on the mailing list with his detailed questions.

          http://apache.markmail.org/thread/af5kg26rdbdhkftw

          Show
          Ari Maniatis added a comment - Thanks David. I've asked my developer to follow up on the mailing list with his detailed questions. http://apache.markmail.org/thread/af5kg26rdbdhkftw
          Hide
          Commit Tag Bot added a comment -

          [branch_4x commit] David Wayne Smiley
          http://svn.apache.org/viewvc?view=revision&revision=1386475

          SOLR-3304: Added CHANGES.txt entry

          Show
          Commit Tag Bot added a comment - [branch_4x commit] David Wayne Smiley http://svn.apache.org/viewvc?view=revision&revision=1386475 SOLR-3304 : Added CHANGES.txt entry
          Hide
          Commit Tag Bot added a comment -

          [branch_4x commit] David Wayne Smiley
          http://svn.apache.org/viewvc?view=revision&revision=1386466

          SOLR-3304 Solr adapters for the new Lucene 4 spatial module

          Show
          Commit Tag Bot added a comment - [branch_4x commit] David Wayne Smiley http://svn.apache.org/viewvc?view=revision&revision=1386466 SOLR-3304 Solr adapters for the new Lucene 4 spatial module
          Hide
          Uwe Schindler added a comment -

          Closed after release.

          Show
          Uwe Schindler added a comment - Closed after release.

            People

            • Assignee:
              David Smiley
              Reporter:
              Bill Bell
            • Votes:
              7 Vote for this issue
              Watchers:
              16 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development