Solr
  1. Solr
  2. SOLR-6797

Add score=degrees|kilometers|miles for AbstractSpatialFieldType

    Details

    • Type: Improvement Improvement
    • Status: Closed
    • Priority: Major Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.0, 6.0
    • Component/s: spatial
    • Labels:
      None

      Description

      Annoyingly, the units="degrees" attribute is required for fields extending AbstractSpatialFieldType (e.g. RPT, BBox). And it doesn't really have any effect. I propose the following:

      • Simply drop the attribute; ignore it if someone sets it to "degrees" (for back-compat).
      • When using score="distance", or score=area or area2D (as seen in BBoxField) then use kilometers if geo=true, otherwise degrees.
      • Add support for score=degrees|kilometers|miles|degrees
      1. SOLR-6797.patch
        62 kB
        David Smiley
      2. SOLR-6797.patch
        58 kB
        David Smiley
      3. SOLR-6797.patch
        36 kB
        Ishan Chattopadhyaya
      4. SOLR-6797.patch
        22 kB
        Ishan Chattopadhyaya
      5. SOLR-6797.patch
        22 kB
        Ishan Chattopadhyaya
      6. SOLR-6797.patch
        8 kB
        Ishan Chattopadhyaya
      7. SOLR-6797.patch
        4 kB
        Ishan Chattopadhyaya

        Activity

        Hide
        David Smiley added a comment -

        The origins of why I added the units attribute was for the coordinate values, which are in degrees. And for consistency, I returned distances in degrees as well.

        Show
        David Smiley added a comment - The origins of why I added the units attribute was for the coordinate values, which are in degrees. And for consistency, I returned distances in degrees as well.
        Hide
        Ishan Chattopadhyaya added a comment -

        David Smiley, I'm wondering how could a user specify score=distance (bullet point 2) and score=miles (bullet point 3), since both seem to use the score parameter?

        Shouldn't we use something like:
        score=distance|area|area2D
        and
        units=degrees|kilometers|miles
        ?

        Show
        Ishan Chattopadhyaya added a comment - David Smiley , I'm wondering how could a user specify score=distance (bullet point 2) and score=miles (bullet point 3), since both seem to use the score parameter? Shouldn't we use something like: score=distance|area|area2D and units=degrees|kilometers|miles ?
        Hide
        David Smiley added a comment -

        If 'units' were named 'distanceUnits' then perhaps, but as-is, 'units' is ambiguous. Units of what? It was units of the coordinate values, which actually remains degrees. My proposal is that score=degrees|kilometers implies distance as the score, plus it allows the application to state the preference in their code & URL parameter and have it be self-documenting to know what the units are, instead of requiring looking at the field type definition to know the units. I think that will be intuitive? This doesn't preclude adding a distanceUnits attribute on the field type to be used as a default for distance|area|area2D, which may be nice; I'm +0 on that (ambivalent).

        Show
        David Smiley added a comment - If 'units' were named 'distanceUnits' then perhaps, but as-is, 'units' is ambiguous. Units of what? It was units of the coordinate values, which actually remains degrees. My proposal is that score=degrees|kilometers implies distance as the score, plus it allows the application to state the preference in their code & URL parameter and have it be self-documenting to know what the units are, instead of requiring looking at the field type definition to know the units. I think that will be intuitive? This doesn't preclude adding a distanceUnits attribute on the field type to be used as a default for distance|area|area2D, which may be nice; I'm +0 on that (ambivalent).
        Hide
        Ishan Chattopadhyaya added a comment -

        That makes sense; more intuitive than a separate units/distanceUnits parameter. Attached a patch that supports score=distance (back compat, km when geo) | kilometers | miles | degrees | area/area2D (km^2 when geo, deg^2 in 2D). Tested manually and seems to work.

        Show
        Ishan Chattopadhyaya added a comment - That makes sense; more intuitive than a separate units/distanceUnits parameter. Attached a patch that supports score=distance (back compat, km when geo) | kilometers | miles | degrees | area/area2D (km^2 when geo, deg^2 in 2D). Tested manually and seems to work.
        Hide
        David Smiley added a comment -

        Thanks for supplying a patch!

        • if score=miles|kilometers then I think we can quite explicitly do that without checking ctx.isGeo. This is what the user asked for, so they know what they are doing.
        • if score=distance, we should have backwards-compatible behavior. If you set units="degrees" on the field type then that's what you get here. I'm unsure if we should otherwise throw an exception, or otherwise let you set units to miles|kilomters or otherwise always choose a default based on ctx.isGeo (km or degrees). My preference is the latter, and we should discourage score=distance generally as it's unclear. But just a preference; I think you've expressed an opinion for the 2nd and that's fine with me too.
        • We should allow "units" to be unset (preferred actually). This is my 1st bullet in the description.
        • in ShapeAreaValueSource, you can pull the computed multiplier of the function so that it's not called each time: (geoArea ? DistanceUtils.DEG_TO_KM * DistanceUtils.DEG_TO_KM: 1.0)

        Tests please?

        Show
        David Smiley added a comment - Thanks for supplying a patch! if score=miles|kilometers then I think we can quite explicitly do that without checking ctx.isGeo. This is what the user asked for, so they know what they are doing. if score=distance, we should have backwards-compatible behavior. If you set units="degrees" on the field type then that's what you get here. I'm unsure if we should otherwise throw an exception, or otherwise let you set units to miles|kilomters or otherwise always choose a default based on ctx.isGeo (km or degrees). My preference is the latter, and we should discourage score=distance generally as it's unclear. But just a preference; I think you've expressed an opinion for the 2nd and that's fine with me too. We should allow "units" to be unset (preferred actually). This is my 1st bullet in the description. in ShapeAreaValueSource, you can pull the computed multiplier of the function so that it's not called each time: (geoArea ? DistanceUtils.DEG_TO_KM * DistanceUtils.DEG_TO_KM: 1.0) Tests please?
        Hide
        Ishan Chattopadhyaya added a comment - - edited

        Thanks for your review, David.
        I've incorporated your suggestions in the attached patch. This unsets and ignores/doesn't use the "units" param (checks for "degrees" for backcompat behaviour), introduces an optional param "distanceUnits" (if someone wants to specify units explicitly).

        I'm still writing the tests, will update the patch soon.

        Show
        Ishan Chattopadhyaya added a comment - - edited Thanks for your review, David. I've incorporated your suggestions in the attached patch. This unsets and ignores/doesn't use the "units" param (checks for "degrees" for backcompat behaviour), introduces an optional param "distanceUnits" (if someone wants to specify units explicitly). I'm still writing the tests, will update the patch soon.
        Hide
        Ishan Chattopadhyaya added a comment -

        Updated patch with tests using a managed schema approach. I wasn't sure where the tests should reside, so went with a separate test class.
        Please review, I hope I didn't miss out anything this time

        Show
        Ishan Chattopadhyaya added a comment - Updated patch with tests using a managed schema approach. I wasn't sure where the tests should reside, so went with a separate test class. Please review, I hope I didn't miss out anything this time
        Hide
        David Smiley added a comment -

        I like this patch Ishan.
        There are a few bits of feedback:

        • in TestBBoxStrategy, you should test a multiplier other than 1.0. Try 2.0.
        • I'm not a fan of allowing attributes to be an empty string, which this code allows (for distanceUnits, line 375). Normalize it or don't allow it (preferable, IMO).

        Wow, this is a nice new test. I like the dynamic schema modifications... probably much easier to test schema stuff than to always have to look at a separate file. What was the template of this code (in other words, what did you copy-paste to get started)? schema-bm25.xml etc. looks very misplaced; please strip it down. BTW, did you notice the existing spatial tests (particularly the new since Solr 4 ones)? They aren't particularly obvious but they are TestSolr4Spatial and TestSolr4Spatial2.

        After addressing the couple items above, I think it'll be ready.

        Show
        David Smiley added a comment - I like this patch Ishan. There are a few bits of feedback: in TestBBoxStrategy, you should test a multiplier other than 1.0. Try 2.0. I'm not a fan of allowing attributes to be an empty string, which this code allows (for distanceUnits, line 375). Normalize it or don't allow it (preferable, IMO). Wow, this is a nice new test. I like the dynamic schema modifications... probably much easier to test schema stuff than to always have to look at a separate file. What was the template of this code (in other words, what did you copy-paste to get started)? schema-bm25.xml etc. looks very misplaced; please strip it down. BTW, did you notice the existing spatial tests (particularly the new since Solr 4 ones)? They aren't particularly obvious but they are TestSolr4Spatial and TestSolr4Spatial2. After addressing the couple items above, I think it'll be ready.
        Hide
        Ishan Chattopadhyaya added a comment -

        Thanks David!

        > What was the template of this code (in other words, what did you copy-paste to get started)?
        I used the TestManagedSchema's testAddFieldThenReload() template.

        > schema-bm25.xml etc. looks very misplaced; please strip it down.
        Cool, I'll have a look!

        > BTW, did you notice the existing spatial tests (particularly the new since Solr 4 ones)?
        > They aren't particularly obvious but they are TestSolr4Spatial and TestSolr4Spatial2.
        I did have a look, and changed the schema used in that test to use distanceUnits instead of units. However, I verified that those tests pass even with older units parameter, so backward compatibility is intact.

        Show
        Ishan Chattopadhyaya added a comment - Thanks David! > What was the template of this code (in other words, what did you copy-paste to get started)? I used the TestManagedSchema's testAddFieldThenReload() template. > schema-bm25.xml etc. looks very misplaced; please strip it down. Cool, I'll have a look! > BTW, did you notice the existing spatial tests (particularly the new since Solr 4 ones)? > They aren't particularly obvious but they are TestSolr4Spatial and TestSolr4Spatial2. I did have a look, and changed the schema used in that test to use distanceUnits instead of units. However, I verified that those tests pass even with older units parameter, so backward compatibility is intact.
        Hide
        Ishan Chattopadhyaya added a comment -

        Updated the patch with your suggestions. Thanks

        Show
        Ishan Chattopadhyaya added a comment - Updated the patch with your suggestions. Thanks
        Hide
        David Smiley added a comment -

        Oh, I forgot something key. maxDistErr needs to reflect the same units as the new distanceUnits you have here. Same with SpatialArgs.distErr. Example/default schema should reflect 1 meter's worth and not depend on degrees. You might want to add a little Units class for this – we've got 3 values.

        Show
        David Smiley added a comment - Oh, I forgot something key. maxDistErr needs to reflect the same units as the new distanceUnits you have here. Same with SpatialArgs.distErr. Example/default schema should reflect 1 meter's worth and not depend on degrees. You might want to add a little Units class for this – we've got 3 values.
        Hide
        David Smiley added a comment -

        FYI it may be non-obvious how to address maxDistErr since that part is actually populated by Lucene spatial's SpatialPrefixTreeFactory. I suggest updating the Map with an updated maxDistErr after doing the conversion at the Solr layer.
        p.s. on IRC I'm "dsmiley"

        Show
        David Smiley added a comment - FYI it may be non-obvious how to address maxDistErr since that part is actually populated by Lucene spatial's SpatialPrefixTreeFactory. I suggest updating the Map with an updated maxDistErr after doing the conversion at the Solr layer. p.s. on IRC I'm "dsmiley"
        Hide
        David Smiley added a comment -

        Ugh, and yet another piece of the codebase affected: SpatialFilterQParser (which parses 'd'). Perhaps the SpatialQueryable interface should have a getSphereRadius() method.

        Show
        David Smiley added a comment - Ugh, and yet another piece of the codebase affected: SpatialFilterQParser (which parses 'd'). Perhaps the SpatialQueryable interface should have a getSphereRadius() method.
        Hide
        Ishan Chattopadhyaya added a comment -

        Here's a patch that addresses some of these issues. Please suggest if I'm on the right track.

        Few things that are yet to do:

        • Example/default schema to reflect 1m of maxDistErr and not depend on degrees.
        • Wasn't sure what use a Units class would have. Although I didn't understand its need, I felt something similar should sit in Spatial4j instead of Solr.
        • Need to write tests for all these conversions done to "d", sphere radius, SpatialArgs.distErr, maxDistErr.
        Show
        Ishan Chattopadhyaya added a comment - Here's a patch that addresses some of these issues. Please suggest if I'm on the right track. Few things that are yet to do: Example/default schema to reflect 1m of maxDistErr and not depend on degrees. Wasn't sure what use a Units class would have. Although I didn't understand its need, I felt something similar should sit in Spatial4j instead of Solr. Need to write tests for all these conversions done to "d", sphere radius, SpatialArgs.distErr, maxDistErr.
        Hide
        David Smiley added a comment -

        You're sort of on the right track with the exception of getting what I mean by a Units (or perhaps better named DistanceUnits) class. I think it's fine for it to be on the Solr side (not Spatial4j). There are several places where you're testing distanceUnits against constants, which is a code smell. Instead you could have an instance of DistanceUnits. The back-compat aspect means there's more to it but it will help. An enum is tempting but someone might want to add their own and we should let them... though there would need to be a hook to parse the string value so some other one could be parsed (maybe simply a protected method on the field type).

        In AbstractSpatialFieldType.getSphereRadius, you were trying to come up with the sphere radius for degrees. That's solving for the radius of this equation: 2*pi*R = 360, which is 180 / pi

        The only other thing I noticed was that I'm not sure I like SpatialOptions.radius being < 0 as a means of saying it's unset. Did you introduce this or was it -1 before but I didn't notice? I suggest a Double type and let it be null.

        Thanks for pushing forward with this one, Ishan.
        p.s. if you like, feel free to fork on GitHub and we can do code reviews there if convenient for you. As the reviewer I found it way better than reviewing diffs to review progress for my GSOC student, and I noticed your patch was a git diff. Alternatively there's reviews.apache.org but it didn't like your git diff file. IntelliJ had no issue with it on my svn checkout.

        Show
        David Smiley added a comment - You're sort of on the right track with the exception of getting what I mean by a Units (or perhaps better named DistanceUnits) class. I think it's fine for it to be on the Solr side (not Spatial4j). There are several places where you're testing distanceUnits against constants, which is a code smell. Instead you could have an instance of DistanceUnits. The back-compat aspect means there's more to it but it will help. An enum is tempting but someone might want to add their own and we should let them... though there would need to be a hook to parse the string value so some other one could be parsed (maybe simply a protected method on the field type). In AbstractSpatialFieldType.getSphereRadius, you were trying to come up with the sphere radius for degrees. That's solving for the radius of this equation: 2*pi*R = 360, which is 180 / pi The only other thing I noticed was that I'm not sure I like SpatialOptions.radius being < 0 as a means of saying it's unset. Did you introduce this or was it -1 before but I didn't notice? I suggest a Double type and let it be null. Thanks for pushing forward with this one, Ishan. p.s. if you like, feel free to fork on GitHub and we can do code reviews there if convenient for you. As the reviewer I found it way better than reviewing diffs to review progress for my GSOC student, and I noticed your patch was a git diff. Alternatively there's reviews.apache.org but it didn't like your git diff file. IntelliJ had no issue with it on my svn checkout.
        Show
        David Smiley added a comment - https://reviews.apache.org/r/29487/
        Hide
        David Smiley added a comment -

        RE test maxDistErr and/or distErr... you might trivially just see if strategy.getDistErrPct and/or SpatialPrefixTree.getMaxLevels() yields what you expect given some particular input.

        Show
        David Smiley added a comment - RE test maxDistErr and/or distErr... you might trivially just see if strategy.getDistErrPct and/or SpatialPrefixTree.getMaxLevels() yields what you expect given some particular input.
        Hide
        David Smiley added a comment -

        Here's a patch, modified from the most recent one you (Ishan) put up on ReviewBoard. I hoped to upload it to the same review ticket but I saw no way to do so... perhaps since you created it only you can? I'd appreciate it it you add this one if for no other reason then to see what my changes were exactly.

        From memory...

        • A test or two failed and it was related to a modification to the schema to use kilometers without correspondingly updated maxDistErr.
        • Some places were checking units.equals("degrees") when in fact the distanceUnits can be used because you've got a back-compat instance.
        • I refactored the units & distanceUnits initialization logic. It should be equivalent, I think. I didn't want parseDistanceUnits concerned with initialization-only logic.
        • Moved DistanceUnits out of the spatial sub-package you made... it was the only class there and I felt it didn't warrant a package. New packages require package level javadocs too. "ant precommit" exposes such things.
        Show
        David Smiley added a comment - Here's a patch, modified from the most recent one you (Ishan) put up on ReviewBoard. I hoped to upload it to the same review ticket but I saw no way to do so... perhaps since you created it only you can? I'd appreciate it it you add this one if for no other reason then to see what my changes were exactly. From memory... A test or two failed and it was related to a modification to the schema to use kilometers without correspondingly updated maxDistErr. Some places were checking units.equals("degrees") when in fact the distanceUnits can be used because you've got a back-compat instance. I refactored the units & distanceUnits initialization logic. It should be equivalent, I think. I didn't want parseDistanceUnits concerned with initialization-only logic. Moved DistanceUnits out of the spatial sub-package you made... it was the only class there and I felt it didn't warrant a package. New packages require package level javadocs too. "ant precommit" exposes such things.
        Hide
        Ishan Chattopadhyaya added a comment -

        Thanks David! Updated the review request with this patch. Have a happy new year!

        Show
        Ishan Chattopadhyaya added a comment - Thanks David! Updated the review request with this patch. Have a happy new year!
        Hide
        David Smiley added a comment -

        The remaining part we forgot about was geodist, as mentioned in the Review Board minutes ago. That's a blocker.

        BTW this is the CHANGES.txt additions I wrote up:

        Upgrading:

        * Spatial fields originating from Solr 4 (e.g. SpatialRecursivePrefixTreeFieldType, BBoxField)
          have the 'units' attribute deprecated, now replaced with 'distanceUnits'.  If you change it to
          a unit other than 'degrees' (or if you don't specify it, which will default to kilometers if
          geo=true), then be sure to update maxDistErr as it's in those units.  If you keep units=degrees
          then it should be backwards compatible but you'll get a deprecation warning on startup.  See
          SOLR-6797.
        

        New Features:

        * SOLR-6797: Spatial fields that used to require units=degrees like
          SpatialRecursivePrefixTreeFieldType (RPT) now don't; it's deprecated.  It's replaced with a
          distanceUnits attribute allowing degrees|kilometers|miles. It affects the result of the
          score=distance|area|area2d mode of queries using this field which will now use these units,
          as well as maxDistErr, distErr and the 'd' parameter referenced by geofilt and geodist.  score
          now supports kilometers|miles|degrees to choose at query time. It does NOT affect distances
          embedded in WKT strings like BUFFER(POINT(200 10),0.2))
          (Ishan Chattopadhyaya, David Smiley)
        
        Show
        David Smiley added a comment - The remaining part we forgot about was geodist, as mentioned in the Review Board minutes ago. That's a blocker. BTW this is the CHANGES.txt additions I wrote up: Upgrading: * Spatial fields originating from Solr 4 (e.g. SpatialRecursivePrefixTreeFieldType, BBoxField) have the 'units' attribute deprecated, now replaced with 'distanceUnits'. If you change it to a unit other than 'degrees' (or if you don't specify it, which will default to kilometers if geo=true), then be sure to update maxDistErr as it's in those units. If you keep units=degrees then it should be backwards compatible but you'll get a deprecation warning on startup. See SOLR-6797. New Features: * SOLR-6797: Spatial fields that used to require units=degrees like SpatialRecursivePrefixTreeFieldType (RPT) now don't; it's deprecated. It's replaced with a distanceUnits attribute allowing degrees|kilometers|miles. It affects the result of the score=distance|area|area2d mode of queries using this field which will now use these units, as well as maxDistErr, distErr and the 'd' parameter referenced by geofilt and geodist. score now supports kilometers|miles|degrees to choose at query time. It does NOT affect distances embedded in WKT strings like BUFFER(POINT(200 10),0.2)) (Ishan Chattopadhyaya, David Smiley)
        Hide
        David Smiley added a comment -

        The attached patch is modified a little from Ishan's last to tweak GeoDistValueSourceParser.SpatialStrategyMultiValueSource (a private inner class) to hold the distanceUnits in addition to the strategy, which is simpler than re-looking up the field type just to get the units.

        I'll commit shortly.

        Show
        David Smiley added a comment - The attached patch is modified a little from Ishan's last to tweak GeoDistValueSourceParser.SpatialStrategyMultiValueSource (a private inner class) to hold the distanceUnits in addition to the strategy, which is simpler than re-looking up the field type just to get the units. I'll commit shortly.
        Hide
        ASF subversion and git services added a comment -

        Commit 1649243 from David Smiley in branch 'dev/trunk'
        [ https://svn.apache.org/r1649243 ]

        SOLR-6797: spatial distanceUnits=degrees|kilometers|miles
        units=degrees is now deprecated.

        Show
        ASF subversion and git services added a comment - Commit 1649243 from David Smiley in branch 'dev/trunk' [ https://svn.apache.org/r1649243 ] SOLR-6797 : spatial distanceUnits=degrees|kilometers|miles units=degrees is now deprecated.
        Hide
        ASF subversion and git services added a comment -

        Commit 1649245 from David Smiley in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1649245 ]

        SOLR-6797: spatial distanceUnits=degrees|kilometers|miles
        units=degrees is now deprecated.

        Show
        ASF subversion and git services added a comment - Commit 1649245 from David Smiley in branch 'dev/branches/branch_5x' [ https://svn.apache.org/r1649245 ] SOLR-6797 : spatial distanceUnits=degrees|kilometers|miles units=degrees is now deprecated.
        Hide
        David Smiley added a comment -

        Thanks Ishan!

        Show
        David Smiley added a comment - Thanks Ishan!
        Hide
        Anshum Gupta added a comment -

        Bulk close after 5.0 release.

        Show
        Anshum Gupta added a comment - Bulk close after 5.0 release.

          People

          • Assignee:
            David Smiley
            Reporter:
            David Smiley
          • Votes:
            0 Vote for this issue
            Watchers:
            4 Start watching this issue

            Dates

            • Created:
              Updated:
              Resolved:

              Development