Details

    • Type: Improvement
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 1.5, 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      Per SOLR-773, create field types that hide the details of creating tiers, geohash and lat/lon fields.

      Fields should take in lat/lon points in a single form, as in:
      <field name="foo">lat lon</field>

      1. examplegeopointdoc.patch.txt
        1 kB
        Chris A. Mattmann
      2. SOLR-1586.Mattmann.112209.geopointonly.patch.txt
        6 kB
        Chris A. Mattmann
      3. SOLR-1586.Mattmann.112209.geopointonly.patch.txt
        5 kB
        Chris A. Mattmann
      4. SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt
        13 kB
        Chris A. Mattmann
      5. SOLR-1586.Mattmann.112409.geopointandgeohash.patch.txt
        12 kB
        Chris A. Mattmann
      6. SOLR-1586.Mattmann.112509.geopointandgeohash.patch.txt
        13 kB
        Chris A. Mattmann
      7. SOLR-1586.Mattmann.120709.geohashonly.patch.txt
        10 kB
        Chris A. Mattmann
      8. SOLR-1586.Mattmann.121209.geohash.outarr.patch.txt
        10 kB
        Chris A. Mattmann
      9. SOLR-1586.Mattmann.121209.geohash.outstr.patch.txt
        9 kB
        Chris A. Mattmann
      10. SOLR-1586.Mattmann.122609.patch.txt
        34 kB
        Chris A. Mattmann
      11. SOLR-1586.patch
        32 kB
        Grant Ingersoll
      12. SOLR-1586.patch
        31 kB
        Grant Ingersoll
      13. SOLR-1586-geohash.patch
        9 kB
        Grant Ingersoll

        Issue Links

          Activity

          Chris A. Mattmann added a comment -

          For reference, I'm going to put up a patch (or series of them) on this issue that implements Ryan McKinley's suggestion (from SOLR-773):

          It would be great if the schema field type could define everything needed to index and search. There are (at least) three approaches to indexing points that each have their advantages (and disadvantages) - we should be able to support any of these options.

          • GeoPointField (abstract? the standard stuff about dealing with points)
            • GeoPointFieldHash (represented as a GeoHash, fast bounds query (with limited accuracy))
            • GeoPointFieldTiers (highly scalable, fast, complex)
              • GeoPointFieldTrie (...)
          • GeoLineField...
          • GeoPolygonField...

          I think it makes sense to try to follow the georss format to represent geometry:

          <georss:point>45.256 -71.92</georss:point>
          
          <georss:line>45.256 -110.45 46.46 -109.48 43.84 -109.86</georss:line>
          
          <georss:polygon>
          	45.256 -110.45 46.46 -109.48 43.84 -109.86 45.256 -110.45
          </georss:polygon>
          
          <georss:box>42.943 -71.032 43.039 -69.856</georss:box>
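
          A minimal sketch of reading the georss:point value format ("lat lon", whitespace-separated decimal degrees) shown above. The class and method names are hypothetical and are not part of Solr or of any patch on this issue; the range check is an assumption about what a field type would want to validate.

```java
import java.util.Arrays;

/** Hypothetical helper for illustration only; not a Solr class. */
final class GeoRssPointSketch {

    /** Parses "lat lon" into a two-element array {lat, lon}. */
    static double[] parsePoint(String value) {
        String[] parts = value.trim().split("\\s+");
        if (parts.length != 2) {
            throw new IllegalArgumentException("Expected 'lat lon' but got: " + value);
        }
        double lat = Double.parseDouble(parts[0]);
        double lon = Double.parseDouble(parts[1]);
        // Assumed validation: reject values outside the usual degree ranges.
        if (lat < -90 || lat > 90 || lon < -180 || lon > 180) {
            throw new IllegalArgumentException("Coordinate out of range: " + value);
        }
        return new double[] {lat, lon};
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(parsePoint("45.256 -71.92")));
    }
}
```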
          
          Grant Ingersoll added a comment -

          Sounds good, but how are you going to deal with the field types that need multiple fields (i.e. SOLR-1131)?

          We certainly could put up a GeohashField to get things started.

          Chris A. Mattmann added a comment -

          Hey Grant:

          Sounds good, but how are you going to deal with the field types that need multiple fields (i.e. SOLR-1131)?

          Heh, I wasn't. I was just starting with GeoPointField, and was going to start indexing it as e.g., a single String value, in georss:point format. I thought about the whole 2-field approach, i.e., to do a double lat, double lon thingeee, but I just wanted to start simple, with what exists, and see where it leads me. Sound OK?

          Cheers,
          Chris

          Grant Ingersoll added a comment -

          I'm not sure what good that does to put a lat/lon in a single String in georss:point format. What's your intent for searching/sorting/faceting?

          Chris A. Mattmann added a comment -

          Good question: I'm not sure what my intent is yet either. Sorting on a lat,lon pair is different depending on which lines you are trying to follow (North, South, etc.), so it's not entirely clear to me the best way to do that. As far as searching my guess is that, at least in the beginning, like requiring the user to input Dates in ISO 8601, maybe we start out asking the users to input georss points, but then get smarter as we understand more. Dunno, just a thought.

          Grant Ingersoll added a comment -

          I'd say wait until SOLR-1131 is done for everything other than the GeohashFieldType, as what you are proposing doesn't get you anything over just using StrField. By all means, put up a patch for GeohashFieldType when you have one. We can commit that now.

          Chris A. Mattmann added a comment -

          Hey Grant:

          what you are proposing doesn't get you anything over just using StrField.

          I sort of get this, but then I don't. The outlier is DateField – what does it get you other than some magic around ensuring that dates are stored as ISO 8601 dates? In the end, it's just a special type of StrField though too, right? Not trying to be difficult, just trying to understand.

          In any case, I'll focus on GeoHashFieldType for now, regardless...

          Cheers,
          Chris

          Chris A. Mattmann added a comment -

          I'm linking this to SOLR-1592 since regardless of how we store the spatial point field types, we should have the ability to output those fields as georss per ryan's suggestion.

          Chris A. Mattmann added a comment -
          • patch only includes GeoField, and doesn't do any fancy multi-field stuff yet, as discussed. However, the part about writing its output in georss is probably still usable in its current form. Note: to test this patch, post.sh the example doc I'm attaching to your Solr instance using the mods to the example schema I've attached. Then, do a default Solr query to get the doc back, and observe a georss:point field coming back.

          Chris A. Mattmann added a comment -
          • note: check out this patch instead. I messed up the copyrights on the other one I included (Eclipse threw my work copyright ones in there rather than Apache)...

          Grant Ingersoll added a comment -

          Hey Chris,

          I'm not sure we want to bring in the actual namespace for georss. That seems like overkill, but I'm open to hear what others think.

          Grant Ingersoll added a comment -

          Also, where does this patch actually encode the Geohash value? The Lucene spatial contrib JAR has GeoHashUtils for just this. See the GeohashFunction for usage.

          Chris A. Mattmann added a comment -

          Hey Grant:

          It doesn't encode the geohash, I was working on that. What's hilarious is that I was reading up on Wikipedia on how to implement Geohash: http://en.wikipedia.org/wiki/Geohash. I noted that it needed a Base32 encoder/decoder as part of this. So, of course I went over to commons-codec and looked for it there: http://commons.apache.org/codec/. I saw CODEC-88 and said oh, no one has implemented an ASL base32 encoder: I guess I'll implement one as part of this issue and then contribute it back to commons-codec. However if you are saying this exists already in the spatial contrib jar, acccck!!!! What's more if that implements the whole GeoHash thing then double acccck!

          I'll have a patch up in 30 minutes if that's the case. However, if it is the case, then I'm sad because I just got my Base32.encode function to work: http://tools.ietf.org/html/rfc3548

          Cheers,
          Chris
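
          For reference, a minimal sketch of the Geohash encoding described on the Wikipedia page linked above: longitude and latitude bits are interleaved (longitude first) by repeatedly halving the coordinate range, and every 5 bits become one character. Note that Geohash uses its own 32-character alphabet, which differs from the RFC 3548 Base32 alphabet. The class below is hypothetical and is not the GeoHashUtils class from the Lucene spatial contrib.

```java
/** Hypothetical illustration of Geohash encoding; not Lucene's GeoHashUtils. */
final class GeohashSketch {

    // Geohash alphabet: digits plus lowercase letters without a, i, l, o.
    private static final String BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz";

    static String encode(double lat, double lon, int precision) {
        if (precision < 1) {
            throw new IllegalArgumentException("precision must be >= 1");
        }
        double latLo = -90, latHi = 90, lonLo = -180, lonHi = 180;
        StringBuilder hash = new StringBuilder(precision);
        boolean lonTurn = true; // bits alternate, starting with longitude
        int bits = 0;
        int value = 0;
        while (hash.length() < precision) {
            if (lonTurn) {
                double mid = (lonLo + lonHi) / 2;
                if (lon >= mid) { value = (value << 1) | 1; lonLo = mid; }
                else            { value = value << 1;       lonHi = mid; }
            } else {
                double mid = (latLo + latHi) / 2;
                if (lat >= mid) { value = (value << 1) | 1; latLo = mid; }
                else            { value = value << 1;       latHi = mid; }
            }
            lonTurn = !lonTurn;
            if (++bits == 5) { // five bits select one alphabet character
                hash.append(BASE32.charAt(value));
                bits = 0;
                value = 0;
            }
        }
        return hash.toString();
    }

    public static void main(String[] args) {
        System.out.println(encode(57.64911, 10.40744, 4));
    }
}
```

          Longer hashes narrow the cell, so a shared prefix implies proximity, which is what makes prefix-based bounds queries on a geohash field cheap but only approximately accurate.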

          Chris A. Mattmann added a comment -

          sniffle, found the geohash in spatial contrib. Uh, uh, I mean, yes!

          Patch, attached...

          Chris A. Mattmann added a comment -

          another Eclipse copyright snafu (I took it out back though and I don't think this will be happening again), please use this latest patch...

          Chris A. Mattmann added a comment -
          • updated patch based on SOLR-1592 being committed...
          Grant Ingersoll added a comment -

          FYI, see the SOLR-1131 for an implementation of a Point Field Type.

          Grant Ingersoll added a comment -

          we should have the ability to output those fields as georss per ryan's suggestion

          Ryan can correct me if I am putting words in his mouth, but I don't think he literally meant we needed to use those exact tags. I think he just meant the format of the actual values.

          Chris A. Mattmann added a comment -

          Hey Grant:

          Ryan can correct me if I am putting words in his mouth, but I don't think he literally meant we needed to use those exact tags. I think he just meant the format of the actual values.

          Ah no worries – I think it would be a nice feature to actually output using those exact tags. That's the point of a standard, right? With the tags comes namespacing and all that good stuff, which I believe to be important.

          Also, since XmlWriter is even more flexible per SOLR-1592, then I see no reason not to use those tags in the output?

          Cheers,
          Chris

          Chris A. Mattmann added a comment -

          FYI, see the SOLR-1131 for an implementation of a Point Field Type.

          Sure, I'll take a look at it and try to bring this patch up to speed w.r.t. that. Independently though, the geohash implementation I put up should be good to go right now. Please take a look and let me know if you are +1 to commit. I included an example doc to test it out with.

          Cheers,
          Chris

          Grant Ingersoll added a comment -

          Can you put a patch containing just the geohash stuff?

          Chris A. Mattmann added a comment -

          updated patch containing only the geohash goodies.

          Chris A. Mattmann added a comment -

          Okay, so I gave up on outputting georss in the SOLRXmlResponse (sniffle). Instead, here's the 1st of 2 patches. This one outputs the point as a double array. I'm torn. It's probably more conceptually correct, but it's weirder from an "I put in a whitespace-delimited string and got out a point as an array" standpoint. Nevertheless, I'm attaching it. The next one will just be a string.

          Chris A. Mattmann added a comment -

          And #2, the string version. My +1 for this in the end.

          Grant Ingersoll added a comment -

          I committed PointType as part of SOLR-1131. This leaves the geohash stuff, which I'll take a look at now.

          Grant Ingersoll added a comment -

          Here's a patch for Geohash along w/ tests and support in the examples.

          Chris A. Mattmann added a comment -

          Grant:

          Thanks! +1 on the patch – I think it's pretty much ready to go.

          Cheers,
          Chris

          Grant Ingersoll added a comment -

          Should have a CartesianTier field type patch today.

          Chris Male added a comment -

          Hi Grant,

          Are you building the CartesianTier field type against the existing CartesianTier API?

          Grant Ingersoll added a comment -

          For better or worse, yes. It's either that, or it needs to be duplicated here until Solr is on 3.x of Lucene and can incorporate your changes there.

          Grant Ingersoll added a comment -

          Here's a patch with both geohash and Cartesian Tier.

          Note, the test for Cartesian Tier (in PolyFieldTest) is not yet correct even though I think the underlying functionality is. (In other words, the test itself is not right).

          Comments/improvements welcome. Still needs javadocs (and wiki docs on http://wiki.apache.org/solr/SpatialSearch).

          One of the interesting things for Cart Tier is what the notion of a field query and range query are. See my thoughts in the comments. Also, I currently am throwing an UnsupportedOpException in getValueSource in Cart Tier stuff. Not sure if it is meaningful or not to allow functions to operate on the whole tier.

          Chris A. Mattmann added a comment -
          • updated Grant's patch with more javadocs
          • formatting updates
          • fixed bug about referencing StrFieldSource via SOLR-1688

          I get errors on the following tests:

          [junit] Tests run: 2, Failures: 1, Errors: 0, Time elapsed: 2.473 sec
          [junit] Test org.apache.solr.search.function.distance.DistanceFunctionTest FAILED
          [junit] Tests run: 5, Failures: 1, Errors: 0, Time elapsed: 6.32 sec
          [junit] Test org.apache.solr.schema.PolyFieldTest FAILED

          Which I think Grant noted in his prior comment.

          Grant Ingersoll added a comment -

          The DistanceFunctionTest failure is unrelated and has been fixed. The PolyFieldTest failure is expected.

          Grant Ingersoll added a comment -

          fixed bug about referencing StrFieldSource via SOLR-1688

          What's the bug here?

          Grant Ingersoll added a comment -

          The problem w/ the PolyFieldTest is that the CartesianShapeFilterbuilder automatically picks the "best fit" tier w/o knowing what tiers were actually indexed. So, it picks tier 15 when the test only sets up tiers 4-10.
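
          One possible guard against the mismatch described above, sketched under assumptions: if the filter builder knew the indexed tier range, it could clamp its "best fit" choice to that range. This is not what the patch or the Lucene spatial contrib does; the class and method names are hypothetical.

```java
/** Hypothetical illustration; not part of the patch or of Lucene spatial. */
final class TierClampSketch {

    /** Restricts a best-fit tier to the tiers that were actually indexed (inclusive bounds assumed). */
    static int clampTier(int bestFitTier, int startTier, int endTier) {
        if (startTier > endTier) {
            throw new IllegalArgumentException("startTier must be <= endTier");
        }
        return Math.max(startTier, Math.min(endTier, bestFitTier));
    }

    public static void main(String[] args) {
        // The test indexes tiers 4-10, but the builder picks tier 15.
        System.out.println(clampTier(15, 4, 10)); // prints 10
    }
}
```

          Falling back to a coarser tier only widens the candidate set, so a filter built this way would still need an exact distance check afterwards.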

          Yonik Seeley added a comment -

          One of the interesting things for Cart Tier is what the notion of a field query and range query are

          Hmmm, why is there even a fieldType for cartesian tier?
          How will it be used? I don't see any examples of end-user config and query syntax this is meant to support.

          Chris A. Mattmann added a comment - - edited

          Hi Grant:

          What's the bug here?

          Your last patch referenced StrFieldSource in the GeoHashField class we developed:

          +  @Override
          +  public ValueSource getValueSource(SchemaField field, QParser parser) {
          +    return new StrFieldSource(field.name);
          +  }
          +
          

          Of course, StrFieldSource is a private class defined in StrField in the o.a.solr.schema package. This led me to take a look and realize that the FieldSources are really defined inconsistently (check out SOLR-1688, patch available).

          Cheers,
          Chris

          Grant Ingersoll added a comment -

          How will it be used?

          It's a common GIS technique for reducing the number of terms/space to enumerate in a single field by dynamically selecting the appropriate tier based on the lat/lon input and a distance. It's mainly used for creating filters.

          Once SOLR-1568 is converted to use this field type, then it will be fully supported.

          Grant Ingersoll added a comment -

          Of course, StrFieldSource is a private class

          No it's not. It's package private. Is there something that isn't working?

          Chris A. Mattmann added a comment -

          No it's not. It's package private. Is there something that isn't working?

          Interesting. Eclipse was giving me an error on using StrFieldSource in GeoHashField, right on that line I pasted above. It said StrFieldSource not found, and so taking a look it looked like StrFieldSource was an inner class (hard to see that little "}" at the end and whether it was defined inside of StrField or not). So I thought you were referencing an inner (private) class from StrField external to it. Funny now though after an Eclipse rebuild, Eclipse seems to be OK with StrFieldSource and its referencing in GeoHashField (which it should since they are both in the same package).

          Regardless though, this illustrates my point on SOLR-1688 – these FieldCacheSources should be defined a bit more consistently – when looking at a bunch of code, it's hard to see whether it was an inner class or a separate class defined in the same java file.
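
          The distinction under discussion can be sketched as follows, assuming StrFieldSource is a second top-level class declared without "public" in the same source file as StrField. All class names below are hypothetical stand-ins, not the Solr classes.

```java
// A top-level class without "public" is package-private:
// any class in the same package can reference it.
class ExampleFieldSource {
    final String fieldName;

    ExampleFieldSource(String fieldName) {
        this.fieldName = fieldName;
    }
}

class ExampleStrField {
    // A private nested class, by contrast, is visible only inside ExampleStrField.
    private static class HiddenSource { }

    ExampleFieldSource getValueSource(String name) {
        return new ExampleFieldSource(name);
    }
}

class ExampleGeoHashField {
    ExampleFieldSource getValueSource(String name) {
        // Compiles: ExampleFieldSource is in the same package.
        // "new ExampleStrField.HiddenSource()" would be rejected here.
        return new ExampleFieldSource(name);
    }

    public static void main(String[] args) {
        System.out.println(new ExampleGeoHashField().getValueSource("geo").fieldName);
    }
}
```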

          Yonik Seeley added a comment -

          > How will it be used?

          It's a common GIS technique

          I meant as it pertains to Solr... what will one put in their schema and then what will an example query look like that does both a filter and a sort by distance? Or is that out of scope for this issue?

          Grant Ingersoll added a comment -

          There is an example of what goes in the schema on this patch:

          <!--
                A CartesianTier is like a set of zoom levels on an interactive map (i.e. Google Maps or MapQuest).  It takes a lat/lon
                field and indexes it into (endTier - startTier) different fields, each representing a different zoom level.
                This can then be leveraged to quickly narrow the search space by creating a filter, at an appropriate tier level,
                that only has to enumerate a minimum number of terms.
          
                See http://wiki.apache.org/solr/SpatialSearch
               -->
              <fieldType name="tier" class="solr.CartesianTierField" start="4" end="15" subFieldSuffix="_d"/>
          

          I think the filter question is best answered on SOLR-1568, but I'll give a brief thought. Something like:

          &fq={!tier dist=20}location:49.32,-79.0
          

          or it could be:

          &fq={!tier lat=49.32 lon=-79.0 dist=20}
          

          I'm not sure which I prefer.
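
          The "zoom levels" idea from the schema comment above can be sketched roughly as follows. This is a simplification under stated assumptions: each tier t is treated as a plain 2^t by 2^t grid over the lat/lon ranges, both the start and end tiers are treated as inclusive, and the tile id is a row-major cell number. The real Cartesian tier code in the Lucene spatial contrib uses its own projection and box ids; the names here are hypothetical.

```java
import java.util.LinkedHashMap;
import java.util.Map;

/** Hypothetical, simplified tiling; not the Lucene CartesianTier implementation. */
final class TileTierSketch {

    /** Row-major id of the grid cell containing (lat, lon) at the given tier. */
    static long tileId(double lat, double lon, int tier) {
        long cells = 1L << tier; // cells per axis doubles with each tier
        long col = Math.min(cells - 1, (long) ((lon + 180.0) / 360.0 * cells));
        long row = Math.min(cells - 1, (long) ((lat + 90.0) / 180.0 * cells));
        return row * cells + col;
    }

    /** One tile id per tier, mirroring "one sub-field per zoom level". */
    static Map<Integer, Long> tilesForTiers(double lat, double lon, int startTier, int endTier) {
        Map<Integer, Long> tiles = new LinkedHashMap<>();
        for (int tier = startTier; tier <= endTier; tier++) {
            tiles.put(tier, tileId(lat, lon, tier));
        }
        return tiles;
    }

    public static void main(String[] args) {
        // Tiers 4..15 as in the example fieldType, for the point used in the filter examples.
        tilesForTiers(49.32, -79.0, 4, 15)
            .forEach((tier, id) -> System.out.println("tier " + tier + " -> tile " + id));
    }
}
```

          A filter at a coarse tier then only has to enumerate the handful of tile ids that overlap the search radius, rather than every distinct point value.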

          Grant Ingersoll added a comment -

          Renamed CartesianTierFieldType to SpatialTileField and updated the related nomenclature to match, as I think the "tile" name is much more commonly used in the GIS community.

          Yonik Seeley added a comment -

          The reason why I was asking about interface examples is that it looks like filtering is being considered separately (i.e. it would be up to the user to correlate the point field with the tile field). While it's fine to allow the explicit creation of a tile filter, it doesn't seem like we should require clients to know all the details.

          #gfilt short for geo-filter?
          q=foo&fq={!gfilt p=10,20 f=store_location d=1000}&sort=gdist(store_location,10,20)

          So it would be really nice if the same request would work regardless of which point field was being used (trie based, spatial tile, or geohash).

          Grant Ingersoll added a comment -

          So it would be really nice if the same request would work regardless of which point field was being used (trie based, spatial tile, or geohash).

          Agreed, how about I rename SOLR-1568 to be "Create a Spatial Filter Parser Plugin" or something to that effect and we handle it there?

          Grant Ingersoll added a comment -

          Committed revision 894301.

          Yonik Seeley added a comment -

          Unlike PointType, it seems odd that a user would need to understand and declare any kind of subFieldType/Suffix for SpatialTileField... seems like this one we really want to be an implementation detail somehow... and ultimately it seems like we want to allow the SpatialTileField to be an implementation detail as well. It would be nice to just define a SpatialTilePoint and just use it for everything (filtering, distance calculations, etc).

          Grant Ingersoll added a comment -

          Unlike PointType, it seems odd that a user would need to understand and declare any kind of subFieldType/Suffix for SpatialTileField... seems like this one we really want to be an implementation detail somehow..

          I suppose they could just always be a DoubleField subtype, but how do you guarantee it is registered in the schema? I suppose it is 99.99% likely it will be there, so we could just assume it. I can change it to do this.

          and ultimately it seems like we want to allow the SpatialTileField to be an implementation detail as well. It would be nice to just define a SpatialTilePoint and just use it for everything (filtering, distance calculations, etc).

          I'm not sure I see how to use a tile for anything other than filtering (is the point in the box or not). I suppose it could be used for very crude distance calculations, but that doesn't seem all that useful. I think there may be too much of a goal to hide all the details from the application. The choice of the data structure is going to depend on the application, just as one chooses to use int, float or double depending on their application needs. Many applications will do just fine using PointType with a double, even for range queries. Others may specifically want a tile approach as it best solves their problem.

          Yonik Seeley added a comment - - edited

          I'm not sure I see how to use a tile for anything other than filtering

          That's the point though - as a casual user, I want a point field. I want to be able to do efficient spatial search on that field and not worry about all of the details. See the example I gave above... it's doing everything spatial-related on the same field. So a higher level SpatialTilePoint would be a point field (i.e. it would still have lat/lon separately) that used tiles under the covers for efficient bounding-box filters.

          edit: don't get me wrong, I think it's also good to enable the use of SpatialTileField separately (as this issue does). It's the overall spatial Solr capabilities I'm talking about.

          The choice of the data structure is going to depend on the application

          There will often be many applications / clients. One should be able to change the underlying implementation and use the same requests. We can do this today with range queries on any type of numeric field... we should be able to do it with a bounding box or distance filter.

          Grant Ingersoll added a comment -

          A tile is not a point. A tile is a box containing lots of points and is used as a quick substitution for all of those points in the box. It's basically an index-time optimization that precalculates the bounding boxes ahead of time.

          Yonik Seeley added a comment -

          A tile is not a point.

          Then we're talking past each other a bit. I understand what a tile is.
          A tile can also be viewed as an implementation detail to speed up spatial querying / filtering. So we could have a SpatialTilePoint that is a point, and under the covers, it also does stuff (like index spatial tiles) to speed up filtering.

          I'm not suggesting changing SpatialTileField... I'm suggesting that in the overall scheme of things, it's not the highest level abstraction we want.

          Grant Ingersoll added a comment -

          A tile can also be viewed as an implementation detail to speed up spatial querying / filtering. So we could have a SpatialTilePoint that is a point, and under the covers, it also does stuff (like index spatial tiles) to speed up filtering.

          Yeah, I debated whether we wanted the SpatialTileField to also index the point (as in deferring to PointType) but decided this was easily enough done via a copy field. If you think there is value there, though, it would be trivial to implement the combination of PointType and SpatialTileField.

          FWIW, I think SOLR-1568 will take care of hiding the details sufficiently from the user.

          Chris Male added a comment -

          Hi,

          I'm not entirely clear on the outcome of the discussion re the SpatialTilePoint, but I would really recommend keeping this as far away from the user as possible. Ideally they shouldn't have to know about it at all since it seems that the implementation of the spatial tiling is still heavily in development. Alternatives have even been suggested that would make it redundant.

          I agree with the idea that users should only be concerned with their documents having a Point. It then frees us up to do all kinds of changes to the underlying logic, without the definition of their documents having to change.

          Grant Ingersoll added a comment -

          Yep, the user still simply adds a Point to the document; that side of the coin won't change. How the tile is implemented under the hood is in fact one of the benefits of doing it as a FieldType. At some point, though, if an app designer does want a tile-based system, they need to declare as much.

          Alternatives have even been suggested that would make it redundant.

          Would be good to provide a reference if you have it handy, just so it is recorded here.

          Chris Male added a comment -

          The alternatives I'm alluding to are the use of TrieRanges to do an efficient bounding box style filter instead of the tiling system. In SOLR-773 this was touched on, but I never saw an outcome to that discussion. I think it is a worthwhile thing to explore, even as part of the work being done here in Solr.

          Grant Ingersoll added a comment -

          Do you mean TrieFields (not familiar w/ TrieRanges)? Assuming you do, TrieFields can be used, but their downside is they require searching two fields instead of one. They are already supported out of the box by Solr.

          Chris Male added a comment -

          Ah yes, sorry, TrieFields. I don't see searching 2 fields as a downside since that's just an implementation detail like the Spatial Tile (which requires you to have up to 15 fields). Assuming you can use the Point FieldType to index an x and y field, then it just becomes another option like Spatial Tile. The fact they are supported out of the box is part of the attraction, as it would reduce how much custom code has to be maintained.

          Grant Ingersoll added a comment -

          I don't see searching 2 fields as a downside since that's just an implementation detail like the Spatial Tile

          Searching 2 fields instead of one can be significant. AIUI, the big problem comes in when you have really dense areas that are used by high traffic sites, such as Manhattan or somewhere similar and could have a million lat/lon pairs all in a 5 mile radius.

          Chris Male added a comment -

          If it's not something we want to support, then that's fine. Particularly given the stats Patrick has, it's been clear that in those high density environments it's not a good choice. My original point was that this aspect of the spatial search is still heavily in development and I was advocating trying to reduce the visibility of the Spatial Tile implementation so that we are freer to do that development.

          Yonik Seeley added a comment -

          My original point was that this aspect of the spatial search is still heavily in development and I was advocating trying to reduce the visibility of the Spatial Tile implementation so that we are freer to do that development.

          +1
          One could also imagine future implementations that allow varying resolution depending on the area to help fix the dense city issues.
          Also, an implementation based just on trie range queries might be nice, as a reference to test other implementations against. As Patrick points out, the only missing code is the part that determines the bounding box so that range queries can be created.
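
          As a rough illustration of the bounding-box step for a range-query-based reference implementation, the sketch below turns a center point and a distance into a lat/lon box and a pair of range clauses. It assumes a spherical earth, uses a small-distance approximation for the longitude span, and ignores the poles and the date line; the field names store_lat and store_lon are made up for the example.

```java
import java.util.Locale;

public class BoundingBoxSketch {
    static final double EARTH_RADIUS_MILES = 3958.8; // approximate mean radius

    /** Returns {minLat, maxLat, minLon, maxLon} in degrees. */
    static double[] boundingBox(double lat, double lon, double distMiles) {
        double angular = distMiles / EARTH_RADIUS_MILES; // distance in radians
        double latDelta = Math.toDegrees(angular);
        // A degree of longitude shrinks by cos(latitude); approximation only.
        double lonDelta = Math.toDegrees(angular / Math.cos(Math.toRadians(lat)));
        return new double[] {lat - latDelta, lat + latDelta, lon - lonDelta, lon + lonDelta};
    }

    /** Builds two range clauses, one per coordinate field. */
    static String rangeFilter(String latField, String lonField, double[] box) {
        return String.format(Locale.ROOT, "%s:[%f TO %f] AND %s:[%f TO %f]",
                latField, box[0], box[1], lonField, box[2], box[3]);
    }

    public static void main(String[] args) {
        double[] box = boundingBox(49.32, -79.0, 20.0);
        System.out.println(rangeFilter("store_lat", "store_lon", box));
    }
}
```

          The box over-selects at the corners, so an exact distance check on the surviving documents would still be needed where precision matters.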

          Grant Ingersoll added a comment -

          Sounds good. I'm open to specific suggestions on how to do that. I think the key lies in the QParser, which will completely hide it from the app; only the schema designer needs to choose which field type to set up for indexing. I don't see it as something that would work as an attribute on the generic PointType, but we could have a derived 2D PointType that specifically captures both the point capabilities and the Tile capabilities.

          I also don't feel like having a SpatialTileField necessarily ties our hands development-wise. We can still change the underlying implementation (heck, it could likely all be done in a single field w/ payloads; I'd like to see the performance characteristics of that). The user is still just passing in a lat/lon pair against that field.

          Yonik Seeley added a comment -

          Sounds good. I'm open to specific suggestions on how to do that. I think the key lies in the QParser

          Right - I've mentioned a spatial base class a few times and this is why. It allows the implementation to be hidden, while also allowing custom classes to plug right into it. The QParser for "sfilt" would simply delegate to a method on the spatial base.

          I don't see it as something that would work as an attribute on the generic PointType, but we could have a derived 2D PointType that specifically captures both the point capabilities and the Tile capabilities.

          Yep, that's what I had in mind.
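
          The delegation idea can be sketched roughly as follows. Everything here is hypothetical (the class names, the createFilter method, the string "filters", and the map standing in for the schema); it only shows the shape of a parser that looks up the field's type and lets that type decide how to filter, so the same request works against either implementation.

```java
import java.util.HashMap;
import java.util.Locale;
import java.util.Map;

public class SfiltSketch {
    /** Hypothetical base class: each point implementation builds its own filter. */
    static abstract class SpatialPointBase {
        abstract String createFilter(String field, double lat, double lon, double dist);
    }

    /** Placeholder for a point type backed by numeric range queries. */
    static class RangeBackedPoint extends SpatialPointBase {
        @Override
        String createFilter(String field, double lat, double lon, double dist) {
            return String.format(Locale.ROOT, "range-filter(%s, %.2f, %.2f, %.1f)", field, lat, lon, dist);
        }
    }

    /** Placeholder for a point type that also indexes tiles. */
    static class TileBackedPoint extends SpatialPointBase {
        @Override
        String createFilter(String field, double lat, double lon, double dist) {
            return String.format(Locale.ROOT, "tile-filter(%s, %.2f, %.2f, %.1f)", field, lat, lon, dist);
        }
    }

    // Stand-in for the schema: field name -> field type.
    static final Map<String, SpatialPointBase> SCHEMA = new HashMap<>();

    /** What an "sfilt" parser would do: look up the type and delegate. */
    static String sfilt(String field, double lat, double lon, double dist) {
        SpatialPointBase type = SCHEMA.get(field);
        if (type == null) {
            throw new IllegalArgumentException("not a spatial field: " + field);
        }
        return type.createFilter(field, lat, lon, dist);
    }

    public static void main(String[] args) {
        SCHEMA.put("store_location", new TileBackedPoint());
        System.out.println(sfilt("store_location", 10, 20, 1000));
        // Swapping the implementation does not change the request.
        SCHEMA.put("store_location", new RangeBackedPoint());
        System.out.println(sfilt("store_location", 10, 20, 1000));
    }
}
```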

          Hoss Man added a comment -

          Correcting Fix Version based on CHANGES.txt, see this thread for more details...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Grant Ingersoll added a comment -

          Bulk close for 3.1.0 release


            People

            • Assignee:
              Grant Ingersoll
              Reporter:
              Grant Ingersoll
            • Votes:
              1 Vote for this issue
              Watchers:
              2 Start watching this issue
