Details

    • Type: New Feature
    • Status: Closed
    • Priority: Major
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 5.2
    • Component/s: spatial
    • Labels:
      None

      Description

      We should add a Solr spatial FieldType that uses the new CompositeSpatialStrategy. Essentially, this enables geometry-backed shapes for accuracy, combined with the grid indexes of RPT for speed.

        Activity

        David Smiley added a comment -

        This one will be a tad more interesting than simply extending the abstract spatial field type as it should have these features:

        • Reference the RPT field type via a dynamicField pattern, or omit the asterisk to reference an existing field explicitly. createField should probably route through the other field type (like a kind of internal copyField), or maybe that's overkill and it could simply call createField on the underlying Strategy.
          • Come to think of it, BBoxField & PointVector ought to be able to work similarly (reference a field completely instead of by pattern), but that's not part of this issue.
          • Note: SDV (SerializedDVStrategy) probably doesn't deserve its own field type, but we can internally create one.
        • SerializedDVStrategy could be subclassed to override makeShapeValueSource to be backed by a SolrCache.
          • We could cast to JtsGeometry if possible to call index(), which uses JTS PreparedGeometry to speed up the checks. When the shape is first put into the cache it might not do this, but on a successful fetch from the cache it could, and calling it again has no effect if it has already been done. It's thread-safe.
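The "index only on a successful cache fetch" idea in the last bullet can be sketched roughly as follows. This is a minimal, self-contained illustration and not the patch: IndexableShape and ShapeCache are hypothetical stand-ins (for JtsGeometry and a SolrCache-backed shape source respectively), and a plain ConcurrentHashMap plays the role of the cache.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

/** Hypothetical stand-in for a shape whose checks get faster after a one-time index() call. */
final class IndexableShape {
    private volatile boolean indexed;
    private final String wkt;

    IndexableShape(String wkt) { this.wkt = wkt; }

    /** Idempotent and thread-safe: repeated calls have no further effect. */
    void index() { indexed = true; }

    boolean isIndexed() { return indexed; }

    String wkt() { return wkt; }
}

/** Hypothetical cache wrapper: a shape is indexed only once it has proven to be reused. */
final class ShapeCache {
    private final Map<Integer, IndexableShape> cache = new ConcurrentHashMap<>();
    private final IntFunction<IndexableShape> loader;

    ShapeCache(IntFunction<IndexableShape> loader) { this.loader = loader; }

    IndexableShape get(int docId) {
        IndexableShape cached = cache.get(docId);
        if (cached != null) {
            // Cache hit: the shape is being reused, so pay the indexing cost now.
            // Later hits call index() again, which is a no-op.
            cached.index();
            return cached;
        }
        // Cache miss: load and store the shape without indexing it yet.
        IndexableShape loaded = loader.apply(docId);
        IndexableShape raced = cache.putIfAbsent(docId, loaded);
        return raced != null ? raced : loaded;
    }
}
```

With this shape of code, a document that is looked up only once never pays for indexing, while a document that is fetched repeatedly gets indexed on its first hit.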
        David Smiley added a comment -

        Here's a patch. Notes:

        • I chose the name RptWithGeometrySpatialField; feedback welcome. It inherits the same schema attribute options as the RPT field but strictly speaking doesn't subclass that field type.
          • I overrode the default distErrPct at indexing time to be 0.15.
        • Compatibility with heatmaps.
        • Uses a SolrCache if you define one in solrconfig.xml.
        • Includes some getters on Lucene spatial's CompositeSpatialStrategy.

        I was tempted to subclass the RPT field type, and attempted it, since that would have made the initialization simpler and less error-prone, and would have made heatmap compatibility work without issue. But it started becoming an ugly hack. The approach in this patch is perhaps a hack too, in that it contains another fieldType and deals with some initialization quirks in init(), but there isn't much to it. Another option is to do it like BBoxField's component numeric fields, though I don't love that it requires the user to make more definitions in the schema. But maybe that's a better trade-off, all things considered (it wouldn't have required the modification to heatmap here).

        The cache is very interesting. Typically, a SolrCache gets blown away on every commit, but with a NoOpRegenerator it is effectively reinstated. That can only be used for caching certain types of things, and it may require the code using the cache to cooperate, so don't expect it to work on the FilterCache, for example. The trick I use here is a special cache key composed of a weak reference to a LeafReader segment core key plus the segment-local docId. Unfortunately, these cache entries won't eagerly clean themselves up if the segment becomes unreachable; however, they shouldn't stick around long if an LRU cache is used, since those entries won't be used again. The cache should be configured similarly to the following, assuming a hypothetical field named "geom":

        <cache name="perSegSpatialFieldCache_geom"
               class="solr.LRUCache"
               size="256"
               initialSize="0"
               autowarmCount="100%"
               regenerator="solr.NoOpRegenerator"/>
        

        The second and subsequent successful cache lookups will be the fastest, for polygons in particular, since JtsGeometry.index() is called on the shape on the first cache hit (if it is of that type).
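The per-segment cache key described above (a weak reference to the segment core key plus the segment-local docId) might look roughly like the sketch below. The class name PerSegCacheKey and the use of a plain Object for the core key are assumptions for illustration, as is comparing core keys by identity; this is not claimed to match the committed code.

```java
import java.lang.ref.WeakReference;
import java.util.Objects;

/** Hypothetical cache key: weakly referenced segment core key plus segment-local doc id. */
final class PerSegCacheKey {
    private final WeakReference<Object> segmentCoreKey;
    // Captured eagerly so hashCode() stays stable even after the core key is collected.
    private final int coreKeyHash;
    private final int docId;

    PerSegCacheKey(Object segmentCoreKey, int docId) {
        Objects.requireNonNull(segmentCoreKey, "segmentCoreKey");
        this.segmentCoreKey = new WeakReference<>(segmentCoreKey);
        this.coreKeyHash = System.identityHashCode(segmentCoreKey);
        this.docId = docId;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof PerSegCacheKey)) return false;
        PerSegCacheKey that = (PerSegCacheKey) o;
        if (docId != that.docId || coreKeyHash != that.coreKeyHash) return false;
        Object mine = segmentCoreKey.get();
        // A key whose segment has been collected never matches another key, so its
        // entry simply ages out of an LRU cache instead of being hit again.
        return mine != null && mine == that.segmentCoreKey.get();
    }

    @Override
    public int hashCode() {
        return 31 * coreKeyHash + docId;
    }
}
```

Because the key holds the segment only weakly, a cached entry does not keep a closed segment alive; the stale entry itself lingers until the LRU policy evicts it, which matches the caveat above.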

        David Smiley added a comment -

        I'll commit this tonight. I'll classify this feature as experimental so that it can get more usage.

        Viktor Gal added a comment -

        I'll try to give it a go today...

        ASF subversion and git services added a comment -

        Commit 1680862 from David Smiley in branch 'dev/trunk'
        [ https://svn.apache.org/r1680862 ]

        SOLR-7379: Spatial RptWithGeometrySpatialField (based on CompositeSpatialStrategy)

        ASF subversion and git services added a comment -

        Commit 1680869 from David Smiley in branch 'dev/branches/branch_5x'
        [ https://svn.apache.org/r1680869 ]

        SOLR-7379: Spatial RptWithGeometrySpatialField (based on CompositeSpatialStrategy)

        David Smiley added a comment -

        Thanks Viktor Gal for kicking the tires.

        Anshum Gupta added a comment -

        Bulk close for 5.2.0.


          People

          • Assignee:
            David Smiley
            Reporter:
            David Smiley
          • Votes:
            0
            Watchers:
            5
