Details

    • Type: New Feature
    • Status: Closed
    • Priority: Minor
    • Resolution: Fixed
    • Affects Version/s: None
    • Fix Version/s: 3.1, 4.0-ALPHA
    • Component/s: None
    • Labels:
      None

      Description

      Local Lucene has been donated to the Lucene project. It has some Solr components, but we should evaluate how best to incorporate it into Solr.

      See http://lucene.markmail.org/message/orzro22sqdj3wows?q=LocalLucene

      Attachments

      1. spatial-solr.tar.gz
        1.86 MB
        patrick o'leary
      2. SOLR-773-local-lucene.patch
        17 kB
        Ryan McKinley
      3. SOLR-773-local-lucene.patch
        23 kB
        Ryan McKinley
      4. SOLR-773-local-lucene.patch
        24 kB
        Ryan McKinley
      5. SOLR-773-local-lucene.patch
        21 kB
        Ryan McKinley
      6. SOLR-773-local-lucene.patch
        22 kB
        Ryan McKinley
      7. SOLR-773.patch
        24 kB
        Grant Ingersoll
      8. lucene.tar.gz
        1.29 MB
        Grant Ingersoll
      9. SOLR-773.patch
        24 kB
        patrick o'leary
      10. lucene-spatial-2.9-dev.jar
        61 kB
        Chris Male
      11. SOLR-773-spatial_solr.patch
        44 kB
        Chris Male
      12. exampleSpatial.zip
        58 kB
        Chris Male
      13. solrGeoQuery.tar
        30 kB
        Brad Giaccio
      14. screenshot-1.jpg
        66 kB
        Eric Pugh


          Activity

          Yonik Seeley added a comment -

          I haven't yet looked at the contributions (but I did read the whitepaper).

          It seems like we want lat+lon in the same field value. That will remove the need for any other mechanism to correlate the two (what lat goes with what lon) and will allow future indexing mechanisms that operate on both values at once.

          Do we need a new basic output type (in addition to str, int, long, etc)? For now perhaps we should just use a string representation?
          <str name="my_house">12.345,-67.89</str>
          or in JSON
          "my_house":"12.345,-67.89"

          So breaking things down, it seems like we basically need to be able to:
          1) filter by a bounding box
          2) filter by a geo radius (impl could first get the bounding box and narrow within that...)
          3) sort by distance
          4) return the distance

          It also seems like there could be an opportunity to make much/most of this generic (not specific to geosearch).
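For reference, points 2-4 all hinge on one great-circle distance computation. Here is a minimal sketch (hypothetical, not taken from the contributed code; all names are illustrative) using the haversine formula, with a filter that also returns the distance, sorted nearest-first:

```python
import math

EARTH_RADIUS_MI = 3958.8  # mean Earth radius in miles

def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * EARTH_RADIUS_MI * math.asin(math.sqrt(a))

def within_radius(points, lat, lon, radius_mi):
    """Filter (name, lat, lon) tuples to those within radius_mi of the center,
    returning each hit with its distance, sorted nearest-first (points 2-4)."""
    hits = [(name, haversine_miles(lat, lon, plat, plon))
            for name, plat, plon in points]
    return sorted((h for h in hits if h[1] <= radius_mi), key=lambda h: h[1])
```

A bounding-box prefilter (point 1) would go in front of this so the trig only runs on candidates inside the box.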

          Ryan McKinley added a comment -

          LocalLucene/Solr are currently designed to do exactly points 1-4.

          As for storing lat/lon in a single field... that sounds really interesting. Currently the local lucene stuff uses two fields and NumberUtils.java to index/store the distance. It does a lot of good work to break various bounding box levels into tokens and only performs math on the minimum result set.

          We should consider a geohash field type: http://en.wikipedia.org/wiki/Geohash to store lat/lon in a single string. This has some really interesting features that are ideal for lucene. In particular, checking if a point is within a bounding box is simply a lexicographic range query.

          Here is a public domain python geohash implementation: http://mappinghacks.com/code/geohash.py.txt
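To make the lexicographic-range property concrete, here is a minimal geohash encoder (an illustrative sketch of the standard algorithm, not code from the donation): nearby points share a common prefix, so "all points in this cell" becomes a single prefix/range query on the encoded string.

```python
BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"  # the geohash base-32 alphabet

def geohash(lat, lon, precision=9):
    """Standard geohash: interleave longitude/latitude bisection bits,
    emitting one base-32 character per 5 bits."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    even, ch, nbits, out = True, 0, 0, []
    while len(out) < precision:
        if even:  # even bit positions refine longitude
            mid = (lon_lo + lon_hi) / 2
            if lon >= mid:
                ch, lon_lo = ch * 2 + 1, mid
            else:
                ch, lon_hi = ch * 2, mid
        else:     # odd bit positions refine latitude
            mid = (lat_lo + lat_hi) / 2
            if lat >= mid:
                ch, lat_lo = ch * 2 + 1, mid
            else:
                ch, lat_hi = ch * 2, mid
        even, nbits = not even, nbits + 1
        if nbits == 5:  # five bits make one output character
            out.append(BASE32[ch])
            ch, nbits = 0, 0
    return "".join(out)
```

Two points a few hundred meters apart encode to strings sharing a long prefix, which is exactly what makes the bounding-box check a lexicographic range query.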

          Ryan McKinley added a comment -

          We should also consider the OGC standard "Well Known Text":
          http://en.wikipedia.org/wiki/Well-known_text

          This is what MySQL and PostGIS use to enter GIS data.
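As a concrete illustration (a hypothetical sketch, not part of any patch here), a WKT point is just `POINT(x y)`, with the gotcha that WKT order is x y, i.e. longitude before latitude:

```python
import re

# Hypothetical helper: parse an OGC WKT POINT such as "POINT(-67.89 12.345)".
# Note the coordinate order: WKT is x y, i.e. longitude first, latitude second.
_POINT_RE = re.compile(
    r"\s*POINT\s*\(\s*(-?\d+(?:\.\d+)?)\s+(-?\d+(?:\.\d+)?)\s*\)\s*$",
    re.IGNORECASE)

def parse_wkt_point(wkt):
    m = _POINT_RE.match(wkt)
    if m is None:
        raise ValueError("not a WKT point: %r" % wkt)
    lon, lat = float(m.group(1)), float(m.group(2))
    return lat, lon  # returned in the lat,lon order used elsewhere in this thread
```

The axis-order swap is the main thing a WKT-based input format would need to get right relative to a "lat,lon" string representation.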

          patrick o'leary added a comment -

          Hey guys

          Placing both lat and long in the same field is good when used internally, but the majority of
          LocalSolr users have separate fields representing lat and long, so make sure the representation
          does not affect the original document.

          WKT uses "point" as the naming convention for single items, and I'd suggest that rather than just str, it would
          also be nice to get to a KML wt format as well. I've done some work integrating with mapping components, and
          KML goes down really well.

          However, be aware that as soon as you start supporting WKT, you will be asked for ESRI support, polygon support,
          ray tracing, collision detection, and a lot more fun things.

          P

          Ryan McKinley added a comment -

          I'm looking over the code grant now... (thanks again!)

          There are two implementations of LocalUpdateProcessorFactory:
          com.mapquest
          com.pjaol

          They do slightly different things...

          • The pjaol version creates a CartesianTierPlotter and then builds a bunch of fields for each level: _localTierN
          • The mapquest version puts a bunch of spatial tokens (sid/SpatialIndex) into a single field.

          Any pointers on why one approach over the other? Do they solve the same problem?

          The mapquest version seems like it could be easily replaced with an Analyzer... perhaps one that takes a single lat/lon string:
          <str name="location">12.345 -67.89</str>
          and then generates tokens for it. All the plumbing to encode the data in an updateProcessor and then decode it in a FieldType seems a bit awkward.
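The analyzer idea above — one "lat lon" string in, one grid-cell token out per tier level — could be sketched roughly like this (purely illustrative: the token format and grid math are made up, and the real CartesianTierPlotter is more involved):

```python
def tier_tokens(latlon, max_level=4):
    """Turn a 'lat lon' string into one grid-cell token per tier level.
    Each level doubles the grid resolution, so coarse levels match big
    boxes cheaply and fine levels narrow the candidate set."""
    lat, lon = (float(v) for v in latlon.split())
    tokens = []
    for level in range(1, max_level + 1):
        cells = 2 ** level  # grid resolution doubles per level
        row = min(int((lat + 90.0) / 180.0 * cells), cells - 1)
        col = min(int((lon + 180.0) / 360.0 * cells), cells - 1)
        tokens.append("tier_%d:%d_%d" % (level, row, col))
    return tokens
```

A bounding-box filter then only needs to OR together the tokens of the cells its box overlaps at a suitable level.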

          patrick o'leary added a comment -

          I believe you guys are using a branch of the code as we were looking at using the mapquest sids.

          Both versions are solving the same basic problem: creating a pseudo quad-tree implementation.
          com.pjaol was the initial API I built, com.mapquest is donated to us by MapQuest.

          Both versions work by flattening out the earth onto a series of grids; the grids get progressively smaller
          with each _localTierN. In the MapQuest version there is a notion of zooming.
          Some quick info graphics here:
          http://www.nsshutdown.com/projects/lucene/whitepaper/locallucene_v2.html

          The differences are: com.pjaol uses pretty exact measurements. The flattening method is based on something
          called a sinusoidal projection, where I translate lat/longs to x,y coordinates, which provides an equally spaced projection on a flat surface. Then I use GeoTools for the actual precise distance calculation.

          It comes at a slight performance cost to be that exact, but users have a need for it.

          The com.mapquest code does a direct conversion from lat/long to cartesian x,y coordinates, encodes and generates sids, and uses a standard great-circle equation for distance calculation. So it's not as convoluted.
          It does come at a slight accuracy cost - but only in a few places: Greenland, New Zealand, and some places around the poles and equator.

          So it's perfect for web-based applications, as the +/- error differential is small enough to be acceptable for most users.
          There is, however, a good audience for Local Lucene who use it for more exact calculations, even down to the meter range. It's also used by some research groups for non-land-based activities, hence the desire to retain the exactness.
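For reference, the sinusoidal projection mentioned above is simple to state: scale longitude by the cosine of latitude so east-west distances stay roughly true as you move toward the poles. An illustrative sketch (not the com.pjaol code):

```python
import math

EARTH_RADIUS_KM = 6371.0  # mean Earth radius

def sinusoidal_xy(lat, lon):
    """Project lat/lon (degrees) to x,y kilometers on an equal-area
    sinusoidal projection: x shrinks with cos(latitude), y is linear in latitude."""
    x = EARTH_RADIUS_KM * math.radians(lon) * math.cos(math.radians(lat))
    y = EARTH_RADIUS_KM * math.radians(lat)
    return x, y
```

Plain Euclidean distance between two projected points then approximates the true ground distance for nearby points, which is what makes equal-spaced grid cells on the flat surface workable.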

          Ryan McKinley added a comment -

          Thanks for the clarification...

          >
          > Then I use GeoTools for the actual precise distance calculation.
          >

          FYI, in the initial Apache check-in, I'm removing the GeoTools dependency (it is LGPL).

          I'd like to make the distanceHandler logic for distance calculations pluggable so it's easy to link to GeoTools when necessary.

          patrick o'leary added a comment -

          Yep agree

          patrick o'leary added a comment -

          Port of LocalSolr to spatial-solr for inclusion in solr's contrib.
          Provides geographical based search capabilities to solr

          Ryan McKinley added a comment -

          Thanks patrick!

          Two things stick out to me:

          1. LocalSolrQueryComponent duplicates most of the code from SolrQueryComponent. Perhaps a better solution would be to have a custom QParser that builds the query and then add a SearchComponent to the chain to augment the results with the calculated distance.

          2. (related) If the query is implemented as a QParser, we would just need to implement:

            public SortSpec getSort(boolean useGlobalParams) throws ParseException 
          

          rather than use the LocalSolrSortParser.

          Ryan McKinley added a comment -

          Not a big deal, but it looks like the List<CartesianTierPlotter> plotters could be initialized once for the Factory and then reused, rather than being initialized for each request.

          Ryan McKinley added a comment -

          Here is a (totally untested) patch that uses QParser.

          This requires some small tweaks to the QParser class to make the sort parsing extensible.

          Take a look and see what you think...

          Ryan McKinley added a comment -

          This version runs, but still no tests.

          I added spatial stuff to the example configs, but I'm not sure I like that long term. The examples are getting a bit cluttered.

          http://localhost:8983/solr/select?q=*:*&qt=geo&lat=40&long=-75&radius=99

          patrick o'leary added a comment -

          Lucene uses a static sort comparator cache, getCachedComparator, in Lucene's FieldSortedHitQueue.java,
          the assumption being that the sort comparator would never have any data in it, I guess.

          As the distances in the geo sort are a hashmap produced by the distance query, the ScoreDocComparator creates a memory leak
          unless the scope of the distance query is within the process block.
          It's messy, but it's the only workaround I could find.

          Putting the distance query in the response builder could make this leak again.

          Ryan McKinley added a comment -

          Hmmm, I don't follow. Is the problem that the HashMap stays in static memory for each request? If so, could we put the map in the request context?

          Is this an issue with the lucene Sort Comparator interface or with how the solr implementation passes the results around?

          patrick o'leary added a comment -

          It's because of the FieldSortedHitQueue in Lucene: even though sorts are generally created as new objects, the FieldSortedHitQueue maintains a static cache of them.

          Somebody actually had another work around
          http://mail-archives.apache.org/mod_mbox/lucene-java-user/200806.mbox/%3C571296.22735.qm@web50301.mail.re2.yahoo.com%3E

          I haven't tried it, but it might be an option.

          Ryan McKinley added a comment -

          here is an updated patch using SOLR-948

          -------

          The memory leak issue is bad news. It's worth looking at the SortComparatorSourceUncacheable option...

          patrick o'leary added a comment -

          There is a patch, LUCENE-1304, for SortComparatorSourceUncacheable, which hasn't had any TLC in a while.
          It's been associated with LUCENE-1483, which looks like a major change that could take a while to get in.

          I'd like to see if we can get movement on LUCENE-1304 as it would help with some of the scope madness I've had to deal
          with, and resolve the issue once and for all.

          Grant Ingersoll added a comment -

          Anyone know the status on this one?

          Grant Ingersoll added a comment -

          This latest patch doesn't compile because it is missing the SpatialParams class.

          Ryan McKinley added a comment -

          dooh – here is a patch that includes SpatialParams

          I just ran 'svn up' and 'ant test', and a bunch of solrj things fail – I can't look into them just now, but I'll post anyway.

          -------

          Note, this patch has a bunch of weirdness to try to avoid a memory error with custom sorting in lucene. The new field options in LUCENE-1483 should avoid this problem, but LocalLucene must be refactored to use the new sorting classes first.

          patrick o'leary added a comment -

          Thanks Ryan. I've also updated local/spatial Lucene to use the new FieldComparatorSource from LUCENE-1588,
          but haven't had a chance to test it in Solr yet.

          Grant Ingersoll added a comment -

          I started documentation at: http://wiki.apache.org/solr/LocalSolr

          I've also at least taken care of PJ's comment on incorporating FieldCompSource from a compilation standpoint. I'm in the process of setting up some unit tests as well.

          Grant Ingersoll added a comment -

          We should be able to incorporate the GeoHash stuff in Lucene now, right? I'm no spatial expert, but this means we could have an update processor that only uses one field, right?

          patrick o'leary added a comment -

          GeoHash can be incorporated to reduce memory, but it should be optional, as there's still overhead in decoding the
          field for distance calculations. Again, I haven't been able to put a benchmark together for it, but I did notice it was slower.

          Grant Ingersoll added a comment -

          Here's a patch that compiles and the example works. The Lucene gzip contains the Lucene libs that I used (basically trunk from two nights ago) including the spatial contrib. It incorporates LUCENE-1588 for sorting.

          Still needs tests and some more example data.

          Grant Ingersoll added a comment -

          OK, so color me a total geo newbie, but...

          So, if I index the spatial.xml in the patch I just submitted and I execute:

          http://localhost:8983/solr/select?q=name:five
          

          I get one result, which is expected.

          If I then do a geo search:

          http://localhost:8983/solr/select?q=name:five&qt=geo&long=-74.0093994140625&lat=40.75141843299745&radius=100&debugQuery=true
          

          I get two results. The second result is the other theater in the spatial.xml file. Yet, it does not contain the value "five" in the name field even though it meets the spatial search criteria.

          Shouldn't there just be one result? What am I not understanding?

          Grant Ingersoll added a comment -

          OK, I think I understand why it does this, but it seems a little odd to me. The reason is that the geo handler uses the geo QParser, which ignores the query parameter and produces a query based solely on the lat/lon information.

          Like I said, I'm a newbie to geo search, but it seems like the QParser should delegate the parsing of the q param to some other parser, and then it would only do distance calculations on the docset returned from the QueryComponent. Of course, I guess one could ask what the semantics of combining a text query with a spatial query are, but I suppose we could combine them with either AND or OR, right? Such that if I OR'd them together, I would get all docs matching the query term OR'd with all docs in the bounding box. Similarly, AND would yield all docs with the term in the bounding box, right?

          Again, I am likely missing something, so bear with me.

          patrick o'leary added a comment -

          Looking at it, there's no actual query parsing going on.
          You could call LuceneQParser, but it just doesn't seem like the right place for it.
          The original LocalSolr code created a filter to perform the geo-distance stuff, but it did have
          to duplicate a lot of the SearchComponent code.

          patrick o'leary added a comment -

          This fixes the query parsing issue: it defaults to using the default QParserPlugin,
          and allows you to specify an optional basedOn argument to use a different QParserPlugin:

          <queryParser name="spatial_tier" class="org.apache.solr.spatial.tier.SpatialTierQueryParserPlugin" basedOn="dismax"/>
          

          There are a couple of things to note

          • Latest distance facet code not included
          • Faster distance filter using query intersect isn't working (spatial lucene fix)
          • fsv for shard sorting not present

          I feel fsv should be extracted into a separate component to reduce the duplication of effort across
          other search components. But this will give us the basics for the moment.

          Yonik Seeley added a comment -

          It seems like quite a lot of work has gone into working around some of Solr's current limitations... perhaps we should fix them instead? It seems like we should be able to avoid custom request handlers, query components, or update processors and simply use generic mechanisms.

          From the user interface point of view, what's needed is:

          • A way to filter by a bounding box. This could simply be a custom QParser
            fq= {!gbox p=101.2,234.5 f=position, d=1.5}

            // a bounding box, centered on 101.2,234.5 including everything within 1.5 miles

          • A function query that calculates distances
            gdist(position,101.2,234.3)
          • A way to sort by a function query... this is generic desired functionality anyway!
          • A way to return the value of a function query for documents - also generic desired functionality. Perhaps use meta as proposed in SOLR-705?

          If we had that, then geo becomes very generic - no need for special distributed search support, and one could do things like boosting the relevancy score by a function of the distance (not even necessarily a linear boost because of the flexibility of function query). If/when we get faceting on a function query, it will also automatically work with distances.

          It seems like points should be stored and represented in a single field, that way there can be multiple points per document (otherwise how would one correlate which latitude went with which longitude). How it's indexed (multiple fields, etc) is more of an implementation detail. There is an issue with how to allow a single field to index to multiple fields - another Solr limitation we should figure out how to fix (an earlier version of TrieRangeQuery needed this too).
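The `{!gbox}` filter sketched above reduces to simple spherical arithmetic. A minimal illustration (the class, method, and constant names are hypothetical; this is a flat spherical approximation, not Solr's actual implementation):

```java
// Hypothetical sketch of the bounding-box math a {!gbox p=lat,lon d=miles}
// parser would need: convert a radius in miles into lat/lon deltas around
// the center point. The longitude delta widens with latitude.
public class BoundingBox {
    static final double EARTH_RADIUS_MI = 3958.8;

    /** Returns {minLat, minLon, maxLat, maxLon}. */
    public static double[] bounds(double lat, double lon, double miles) {
        double latDelta = Math.toDegrees(miles / EARTH_RADIUS_MI);
        double lonDelta = Math.toDegrees(miles /
                (EARTH_RADIUS_MI * Math.cos(Math.toRadians(lat))));
        return new double[] { lat - latDelta, lon - lonDelta,
                              lat + latDelta, lon + lonDelta };
    }
}
```

The resulting bounds would then feed two ordinary range filters (one per coordinate), which is exactly where TrieRange-style fields could plug in.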

          Ryan McKinley added a comment -

          Yonik – I like all 4 proposals.

I am not familiar with the function query internals – would it get called for things that do not match the filter? Distance calculations are typically the most expensive part of the query.

          re "It seems like points should be stored and represented in a single field..." – I agree that the schema and URL API should point to a single field to represent the geometry field. In practice, the indexing will probably need multiple fields to get the job done (efficiently).

          It would be great if the schema field type could define everything needed to index and search. There are (at least) three approaches to indexing points that each have their advantages (and disadvantages) – we should be able to support any of these options.

          • GeoPointField (abstract? the standard stuff about dealing with points)
            • GeoPointFieldHash (represented as a GeoHash, fast bounds query (with limited accuracy))
            • GeoPointFieldTiers (highly scalable, fast, complex)
            • GeoPointFieldTrie (...)
          • GeoLineField...
          • GeoPolygonField...

          I think it makes sense to try to follow the georss format to represent geometry:

            <georss:point>45.256 -71.92</georss:point>
          
          <georss:line>45.256 -110.45 46.46 -109.48 43.84 -109.86</georss:line>
          
          <georss:polygon>
          	45.256 -110.45 46.46 -109.48 43.84 -109.86 45.256 -110.45
          </georss:polygon>
          
          <georss:box>42.943 -71.032 43.039 -69.856</georss:box>
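A field type following this format would need to split the whitespace-separated payload of a point element; a minimal sketch (class name hypothetical, not part of any proposed schema API):

```java
// Hypothetical parser for the "lat lon" payload of a <georss:point>
// value, e.g. "45.256 -71.92" -> {45.256, -71.92}.
public class GeoRssPoint {
    public static double[] parse(String text) {
        String[] parts = text.trim().split("\\s+");
        return new double[] { Double.parseDouble(parts[0]),
                              Double.parseDouble(parts[1]) };
    }
}
```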
          
          Yonik Seeley added a comment -

          I am not familiar with the function query internal - would it get called for things that do not match the filter? Distance calculations are typically the most expensive part of the query.

          Right... function query currently does get called before filters are checked - but that's because current function queries are part of the main relevancy query. A function query that was only used to sort would only be called for docs that match the main relevancy query and all filters though.

          The performance issue would be the "boost" scenario - when the distance calculation is part of the main query. That's another generic Solr issue we should tackle at some point... filter efficiency. Related to LUCENE-1536 I think (but we could already do this relatively easily for BitDocSet... just not HashDocSet).

          Chris A. Mattmann added a comment -

          Hi Guys,

I'm interested in using LocalSOLR, and the 4 proposals described by Yonik above are exactly what I need for an app to stand up oceans data search here at work. We need the ability to do bounding box queries and spatial queries of the following form:

1. Lat, Lon, Radius Template Element
"&lat={geo:lat?}&lon={geo:lon?}&r={geo:radius?}"

With latitude and longitude in decimal degrees in EPSG:4326 format. The radius parameter is in meters along the surface.

2. Box Template Element
"&bbox={geo:box?}"

Bounding box coordinates in EPSG:4326 format in decimal degrees.

Ordering is "west, south, east, north".

3. Polygon
"&p={geo:polygon?}"

Replaced with the latitude/longitude pairs describing a bounding area to perform a search within. The polygon is defined in latitude, longitude pairs, in clockwise order around the polygon, with

          I realize that #3 above is probably a ways off, but how close are we to #1 and #2? I'm trying to push to use SOLR here rather than leverage a custom or COTS solution, but will need at least support for #1 and #2 to make any headway. I'm willing to contribute and help out towards this – I just want to find out where we are.

          Thanks!

          Cheers,
          Chris
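A client filling in the box template above would produce a parameter in the "west, south, east, north" ordering; a trivial hypothetical helper (not part of any proposed Solr API):

```java
// Hypothetical helper building the "&bbox=west,south,east,north" value
// in the EPSG:4326 decimal-degree ordering described above.
public class BboxParam {
    public static String bbox(double w, double s, double e, double n) {
        return "bbox=" + w + "," + s + "," + e + "," + n;
    }
}
```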

          Grant Ingersoll added a comment - - edited

I think, and correct me if I'm wrong, that one of the things that often happens with geo data is that there are a lot of unique values. This often has memory ramifications when used with FunctionQueries, since most ValueSources uninvert the field.

Otherwise, I like the sound of Yonik's proposal as well.

          Grant Ingersoll added a comment -

          Also, how does the TrieRange stuff factor into this?

          Yonik Seeley added a comment -

          FunctionQuery would just be the interface to the underlying geo distance function... it doesn't seem like it should affect the memory requirements of that underlying function (however it's currently implemented in local solr).

          Use of TrieRange could just be another implementation detail on how to quickly implement a bounding box function... it doesn't sound like it's necessarily needed with the cartesian tier strategy.

          Uwe Schindler added a comment - - edited

          Also, how does the TrieRange stuff factor into this?

LocalLucene does something similar to TrieRange, but in two dimensions. It stores the latitude and longitude in one field as the number of a small rectangle (Cartesian tier), and the lower precisions are simply bigger rectangles (I think they are squares). The effect is that you only need one field name for the search, but you have the problem of limited precision.

TrieRange, on the other hand, is more universal for any numeric searches and is not limited to geo. The bounding box search in Solr as proposed in this issue can also simply be done with two int fields (e.g. by scaling the lat/lon by a factor like 1000000 for 6 digits after the decimal point) or float-field TrieRangeQueries. A comparison in speed and index size between LocalLucene and TrieRange would be interesting. Both can easily be done with Solr, but I had no time for it.

For our case (PANGAEA) we have another problem that is only solvable by TrieRange, not LocalLucene: our datasets themselves are bounding boxes, and if the user enters a bounding box, a hit is recorded if they intersect. This can be easily done with four half-open ranges. There is a small speed impact because the half-open ranges may hit very many TermDocs for the lower precisions, but maybe I will create a special combined filter that collects TermDocs into only one BitSet, so you can combine these ranges easily (but no idea how to make a sensible API for that).

Another idea for using TrieRange for geo search is using a Hilbert curve on the earth and just doing a range around the position on this curve (look at the picture on http://en.wikipedia.org/wiki/Hilbert_curve and the idea becomes clear). As far as I know, geohash works with this Hilbert curve (it's the position on this curve), so if you index the binary geohash as a long with TrieRange, you could do this range very simply (correct me if I am wrong!). The drawback is that you will only find quadratic areas (so the use case is: find all phone cells around (lat,lon)).

In my opinion, I would recommend the following:
If you need standard queries like "find all phone cells around a position", use LocalLucene. If you need full flexibility, just treat lat/lon or whatever CRS (Gauss-Krüger etc.) as two numeric values, on which you can do SQL-like "between", ">", "<", ">=" and "<=" searches very fast.
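The int-scaling trick mentioned above (multiplying by 1000000 for 6 digits after the decimal point) can be sketched as follows (class and method names are hypothetical):

```java
// Hypothetical encoding of decimal-degree coordinates as ints, as Uwe
// describes: scale by 1,000,000 to keep six decimal digits of precision,
// so an ordinary int range query can be used per coordinate.
public class ScaledCoord {
    static final int SCALE = 1_000_000;

    public static int encode(double degrees) {
        return (int) Math.round(degrees * SCALE);
    }

    public static double decode(int scaled) {
        return scaled / (double) SCALE;
    }
}
```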

          patrick o'leary added a comment -

          Sorry for not getting into this sooner-

Let's take a step back for a second and ask a couple of questions; my thoughts are provided.

          1) What is the goal we want to achieve?

          • Provide a first iteration of a geographical search entity to SOLR
• Bring an external, popular plugin in from the cold into ASF and SOLR; this helps Solr users out and increases developers from 1 to many.

          2) What is the level of commitment, and road map of spatial solutions in lucene and solr?

• The primary goal of SOLR is as a text search engine, not GIS search; there are other and better ways to do that
  without reinventing the wheel and shoehorning it into Lucene
  (e.g. persistent doc id mappings that can be referenced outside of Lucene, so things like PostGIS and other tools can be used)
          • We can never fully solve everyone's needs at once, lets start with what we have, and iterate upon it.
• I'm happy for any improvements as long as they keep to two goals: A. don't make it stupid; B. don't make it complex.

3) Raw math through trie data structures, spatial ids (geohash), or tier ids (Cartesian tiers) – which one?

          • Why not all? Again we can't solve everyone's needs so why not let them have the tools to help themselves.

As for benchmarking, I have performed some recently using tdouble precision 0,
          ~1 Million docs covering the state of NY
          Top density was ~300,000 between Manhattan & Brooklyn area.

          Returning all results, avg of 100 hits:
          Trie Double: 108ms
          Cartesian Tier: 12ms

The reason for the difference is that with Trie ranges you are doing 2 sets of range filters/queries,
while with Cartesian tiers you are doing 1 iteration over maybe 4 to 16 fielded ids.
And maybe switching the _localTier fields from sdouble to tdouble might improve that; I haven't tried, and 12ms is something I can live with.

          However, the distance calculation is the killer, 300,000 took about 1.8 seconds in a single thread on a 3.2GHz machine.

I was working on some additional features in LocalLucene, such as polylines and convex hulls, which, using the Cartesian tier ids,
can give some basic quick features such as intersects and contains; and a nifty feature of having sorted ids is nearby results.

          Also faceting on tierId's can give you hot spot results.
One final feature: the projection method is an implementation of IProjector, which allows you to create your own projection.
Currently I'm using sinusoidal, but you can do your own, such as, say,

          • Google Mercator (I use a similar quad grid concept, just different projection method)
          • Open Map
            etc..

          There's a lot that can be done, but we should stay focused on primary goals, and iterate, iterate iterate.
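The distance calculation cited above as the bottleneck is typically a great-circle (haversine) computation; a self-contained sketch (not LocalLucene's actual code, names hypothetical):

```java
// Hypothetical haversine great-circle distance in miles, the kind of
// per-document computation that dominates once the tier filter has
// narrowed the candidate set.
public class Haversine {
    static final double EARTH_RADIUS_MI = 3958.8;

    public static double distance(double lat1, double lon1,
                                  double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_MI * Math.asin(Math.sqrt(a));
    }
}
```

The trig calls per document are exactly why caching or coarser filtering first (tiers, bounding boxes) pays off at 300,000 candidates.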

          Uwe Schindler added a comment -

          Hi Patrick,

thanks for doing the comparison!

          As for bench marking, I have performed some recently using tdouble precision 0,
          ~1 Million docs covering the state of NY
          Top density was ~300,000 between Manhattan & Brooklyn area.

I wonder what you mean by precision 0 – what was the precision step? 2, 4 or 8? precisionStep=0 should throw IAE; 64 should do a classical RangeQuery (enumerating all terms).

          And maybe switching the _localTier fields from sdouble to tdouble might improve that, I haven't tried, 12ms is something I can live with.

I think much faster will not be possible. Even with TrieRange you always have to visit TermDocs. And something else: as you only return 100 docs, the number of terms visited may not be so big. The speed improvement of TrieRange is more visible the more distinct values are in the range.

          patrick o'leary added a comment -

Misread positionIncrementGap as precisionStep, so I would have been using the default, which I guess is 8.

          Grant Ingersoll added a comment - - edited

          1) What is the goal we want to achieve?

          • Provide a first iteration of a geographical search entity to SOLR
          • Bring an external popular plugin, in out of the cold into ASF and SOLR, helps solr users out, increases developers from 1 to many.

Agreed on the first, not 100% certain on the second. On the second, this issue is the gatekeeper. If people reviewing the patch feel there are better ways to do things, then we should work through them before committing. What you are effectively seeing is an increase in the developers working on it from 1 to many; it's just not on committed code.

          2) What is the level of commitment, and road map of spatial solutions in lucene and solr?

          • The primary goal of SOLR is as a text search engine, not GIS search, there are other and better ways to do that
            without reinventing the wheel and shoe horn-ing it into lucene.
            (e.g. persistent doc id mappings that can be referenced outside of lucene, so things like postGis and other tools can be used)
          • We can never fully solve everyone's needs at once, lets start with what we have, and iterate upon it.
          • I'm happy for any improvements as long as they keep to two goals A. don't make it stupid B. don't make it complex.

On the first point, I don't follow. Aren't LocalLucene and LocalSolr exactly a GIS search capability for Lucene/Solr? I'm not sure I would categorize it as shoehorning. There are many things that Lucene/Solr can power, and GIS search with text is one of them. By committing this patch (or some variation), we are saying Solr is going to support it. Of course, there are other ways to do it, but that doesn't preclude it from L/S. The combination of text search plus GIS search is very powerful, as you know.

          Still, I think Yonik's main point is why reinvent the wheel when it comes to things like distributed search and the need for custom code for indexing, etc. when they likely can be handled through function queries and field types and therefore all of Solr's current functionality would just work. The other capabilities (like sorting by a FunctionQuery) is icing on the cake that helps solve other problems as well.

          Totally agree on the other points. Also very cool to see the benchmarking info.

          patrick o'leary added a comment -

On 1.2: LocalSolr has suffered from not being in the trunk of SOLR; it is popular and has successfully driven a lot of projects. But I have to put my hand up and say that I am its biggest bottleneck in keeping it up to date with SOLR changes, and I think it would gain a lot just from being current. Most of the changes that caused problems have been minor signature changes where any developer can resolve the issue; thus the 1-to-many element really wins.

Certainly improvements are always good, and there are plenty of ways to improve LocalSolr, but even at this stage I've had to move the trunk of LocalSolr in SF forward to meet other needs. It would be good to centralize the development, even in a contrib manner: working, but open for improvement.

On 2: GIS search can be defined in more ways than I can think of; the OpenGIS consortium has a fairly large list of standards:
http://www.opengeospatial.org/standards/is
LocalSolr supports only 1 set of those items, which is why I define LocalSolr as not a full GIS solution. It has a framework that
can grow to be more.

          Uwe Schindler added a comment -

          Agreed on the first, not 100% certain on the second. On the second, this issue is the gate keeper. If people reviewing the patch feel there are better ways to do things, then we should work through them before committing. What you are effectively seeing is an increase in the developers working on from 1 to many, it's just not on committed code.

          I agree with iterating on the patch and also on LocalLucene (not only LocalSolr).

          On the first point, I don't follow. Isn't LocalLucene and LocalSolr, just exactly a GIS search capability for Lucene/Solr? I'm not sure if I would categorize it as shoe-horning. There are many things that Lucene/Solr can power, GIS search with text is one of them. By committing this patch (or some variation), we are saying Solr is going to support it. Of course, there are other ways to do it, but that doesn't preclude it from L/S. The combination of text search plus GIS search is very powerful, as you know.

          Yes, and we tried solutions in the past that use unique doc ids to do joins between an RDBMS used for geo search and Lucene used for the full-text part. The biggest problem is that these join operations are very inefficient if many documents are affected. Lucene as a full-text engine has the great advantage of displaying results very fast without retrieving the whole hit set (you normally display only the best-ranking ones). If you combine it with databases, you have to intersect the results in a HitCollector while filling the PriorityQueue. RDBMSs have the problem that they always have "transactions" around select statements and will only deliver the results when the query is completely done. This adds an additional time lag. Doing the geo query completely in Lucene made our search in PANGAEA about a hundred times faster in most cases (with TrieRange).

          Still, I think Yonik's main point is why reinvent the wheel when it comes to things like distributed search and the need for custom code for indexing, etc. when they likely can be handled through function queries and field types and therefore all of Solr's current functionality would just work. The other capabilities (like sorting by a FunctionQuery) is icing on the cake that helps solve other problems as well.

          I also agree about thinking to reimplement specific parts of the code that may easily be done with "standard" Lucene/Solr tools (I would count TrieRange among those, even if it is not "standard" today - but it is generic, not bound to geo, and hopefully will move to Lucene Core as NumericRangeQuery & utils).

          In my opinion, LocalLucene should be as generic as possible and should not add too many custom datatypes, specific index structures, fixed field names, etc. A problem with most GIS solutions for relational databases is that you are tied to specific database schemas. E.g. for our search at PANGAEA, we want to display the results of the Lucene query also as a map. But for that you cannot use a common GIS solution, because it does not know how to extract the data from Lucene.

          Soon I will start a small project to add a plugin to GeoServer's feature store that does not use an RDBMS or shape files or whatever for the features, but instead uses Lucene. Using that, it may also be possible to retrieve the geo objects (in our case data sets with lat/lon) and display them in a WMS using OpenLayers, stream them to Google Earth using the GeoServer KML streaming API (using TrieRange to support the bounding box filter), and so on.

          About your benchmarks:
          I suspect that you have warmed up the readers, but I think you should get faster performance out of TrieRange. In my opinion, you should not use doubles for lat/lon; just use ints and scale the float lat/lon by multiplying by 1E7 to get 7 decimal digits (which is surely enough for geo; 180*1E7 is still < Integer.MAX_VALUE).
          In general, the biggest speedup of TrieRange can be seen, in comparison to other range queries, when the range contains a lot of distinct values and so hits many documents. E.g. you will also get 100 ms if you do a search around the African continent, where thousands of hits fall inside, each having a different lat/lon pair! How does LocalLucene behave with that?
          Because of this, I would implement the tiers using tint or tfloat or whatever.
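          Uwe's scaled-int idea can be sketched as follows. `GeoScale` and its method names are hypothetical helpers for illustration, not code from any patch attached here.

```java
// Hypothetical sketch of storing lat/lon as scaled ints with 7 decimal
// digits of precision, as suggested above. 180 * 1E7 = 1,800,000,000,
// which is below Integer.MAX_VALUE (2,147,483,647), so the full
// +/-180 degree range fits in an int.
final class GeoScale {
    private static final int SCALE = 10_000_000; // 1E7

    // Encode degrees as a scaled int (rounding to the nearest 1E-7 degree).
    static int encode(double degrees) {
        return (int) Math.round(degrees * SCALE);
    }

    // Decode a scaled int back to degrees.
    static double decode(int scaled) {
        return scaled / (double) SCALE;
    }
}
```

          The scaled ints could then be indexed with a trie-encoded int field (tint), as Uwe suggests, so range queries stay cheap.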

          Norman Leutner added a comment -

          Hi,

          just a comment on the distance function.

          So breaking things down, it seems like we basically need to be able to:
          1) filter by a bounding box
          2) filter by a geo radius (impl could first get the bounding box and narrow within that...)
          3) sort by distance
          4) return the distance

          Because the surface distance at 0° latitude is 111.320 km per 1° of longitude, while at 90° latitude it is 0 km per 1° of longitude, using a rectangle that does not include any sphere information would be very inaccurate.

          Instead (if not too load-intensive), some mathematical functions should be used here. For example, you can calculate the distance between a given latitude/longitude and another position by calculating the radian measure between these two points, using the angle to the earth's center.
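          Norman's radian-measure calculation is essentially the great-circle distance. Below is a minimal haversine sketch, assuming a spherical earth with mean radius 6371 km; the class and method names are illustrative only.

```java
// Great-circle distance between two lat/lon points via the haversine
// formula, treating the earth as a sphere of mean radius 6371 km.
final class GeoDistance {
    private static final double EARTH_RADIUS_KM = 6371.0;

    static double haversineKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                   * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.sqrt(a));
    }
}
```

          At the equator this yields roughly 111.2 km per degree of longitude, shrinking to 0 km at the poles, matching the figures above.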

          Yonik Seeley added a comment -

          if you use a rectangle that does not include any sphere information, this would be very inaccurate.

          I've really just been commenting on what seemed to be the best way to hook into Solr: the interface, not the implementation.
          The bounding box filter would simply guarantee to contain all of the points of interest in an efficient manner (but could include some outside the specified radius as well, to increase efficiency).

          Chris A. Mattmann added a comment -

          From the user interface point of view, what's needed is:

          • A way to filter by a bounding box. This could simply be a custom QParser
            fq= {!gbox p=101.2,234.5 f=position, d=1.5}

            // a bounding box, centered on 101.2,234.5 including everything within 1.5 miles

          • A function query that calculates distances
            gdist(position,101.2,234.3)
          • A way to sort by a function query... this is generic desired functionality anyway!
          • A way to return the value of a function query for documents - also generic desired functionality. Perhaps use meta as proposed in SOLR-705?
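          The bounding-box item in the list above can be sketched as a simple pre-filter: compute a lat/lon box guaranteed to contain every point within the radius, then let the exact distance check discard the corners. The class below is a hypothetical illustration (spherical earth, no date-line or pole handling), not code from the proposed patches.

```java
// Lat/lon bounding box around a center point that is guaranteed to
// contain all points within radiusKm (and some outside it, which the
// exact distance check discards afterwards). Assumes a spherical earth
// and does not handle date-line wrap-around or the poles.
final class BoundingBox {
    private static final double EARTH_RADIUS_KM = 6371.0;

    final double minLat, maxLat, minLon, maxLon;

    BoundingBox(double lat, double lon, double radiusKm) {
        double latDelta = Math.toDegrees(radiusKm / EARTH_RADIUS_KM);
        // A degree of longitude shrinks by cos(latitude).
        double lonDelta = Math.toDegrees(
                radiusKm / (EARTH_RADIUS_KM * Math.cos(Math.toRadians(lat))));
        minLat = lat - latDelta;
        maxLat = lat + latDelta;
        minLon = lon - lonDelta;
        maxLon = lon + lonDelta;
    }

    boolean contains(double lat, double lon) {
        return lat >= minLat && lat <= maxLat && lon >= minLon && lon <= maxLon;
    }
}
```

          Indexed points failing the box test can be skipped without computing a distance at all, which is where the efficiency Yonik mentions comes from.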

          Yonik: clear description of what we need to do here, thanks for that. My 3 collaborators at JPL (Paul Ramirez, Sean McCleese and Sean Hardman) and I are going to spend time this summer, over the next few months, getting some patches together that implement this architecture, in order to generically support GIS search in SOLR. We have a large corpus of ocean data and lunar data over here at JPL that we'd like to get this working for.

          Thanks and more to come – soon

          Cheers,
          Chris

          Ryan McKinley added a comment -

          My apologies for being out of the loop... going back to some of Patrick's high-level points: I agree with Grant (and, by extension, with most of Patrick's points). Our main issue now is how to move forward.

          We have a few options:

          1. Get whatever we can working and integrated ASAP and iterate from there.
          2. Make some core structural changes to Solr that will make integrating spatial stuff easier/cleaner. With this in place, we can then integrate.
          3. Hybrid of 1 & 2 – get a spatial contrib working ASAP with the knowledge that most of it needs to be replaced/reworked as Solr core evolves to better support it. We would probably want to keep the spatial contrib out of the 1.4 release, or mark it as "experimental" and subject to change without notice, etc.

          I am partial to #3 so that we can point to concrete issues and have something to patch against.

          ---------

          The primary goal of SOLR is as a text search engine, not GIS search; there are other and better ways to do that without reinventing the wheel and shoehorning it into Lucene. (E.g. persistent doc id mappings that can be referenced outside of Lucene, so things like PostGIS and other tools can be used.)

          of course, but solr should make this kind of integration easy. The beauty of open source is that we need to get a good foundation and the various implementation extensions can be contributed down the road.

          Shekhar added a comment -

          Can someone please let me know where I can download the latest spatial Solr code from?

          patrick o'leary added a comment -

          It's in lucene trunk under contrib/spatial

          Shekhar added a comment -

          I checkedout code from https://svn.apache.org/repos/asf/lucene/java/trunk/contrib/. It generates lucene-spatial-2.9-dev.jar.
          But I am looking for solr-spatial (org.apache.solr.spatial.tier package).

          patrick o'leary added a comment -

          That comes from this patch.
          This was an older port of LocalSolr to Solr that has fallen behind and hasn't been maintained.
          I'll take a look at it and see about getting it working.

          Shekhar added a comment - - edited

          Patrick,

          I finally got it compiled. Had to do some minor changes. But it is giving 0 results. I am trying to run localsolr and collapse together. But so far no luck.

          Request :
          http://localhost:9080/solr/select?indent=on&version=2.2&q=LCD&qt=geo&long=-74.418689&radius=500&lat=40.755677

          Following is my solrconfig :

          <requestHandler name="geo" class="org.apache.solr.handler.component.SearchHandler">
          <lst name="defaults">
          <str name="echoParams">explicit</str>
          <str name="defType">spatial_tier</str>
          <str name="fq">doctype:provider</str>
          </lst>
          <lst name="invariants">
          <str name="latField">lat</str>
          <str name="lngField">lng</str>
          <str name="distanceField">geo_distance</str>
          <str name="tierPrefix">tier</str>
          </lst>
          <arr name="components">
          <str>collapse</str>
          <str>geodistance</str>
          </arr>
          </requestHandler>

          =======================
          Response :

          <response>
          <lst name="responseHeader">
          <int name="QTime">6</int>

          <lst name="params">
          <str name="lat">40.755677</str>
          <str name="radius">500</str>
          <str name="indent">on</str>
          <str name="q">LCD</str>
          <str name="qt">geo</str>
          <str name="long">-74.418689</str>
          <str name="version">2.2</str>
          </lst>
          </lst>
          <result name="response" numFound="0" start="0"/>
          </response>

          Bill Bell added a comment -

          It seems to me, as an outsider, that this project is not being incorporated and just languishing. Patrick has some really cool code that is very useful. Why don't we just incorporate it as part of the build first? Get it in there "as is" into the 1.4 build?

          As ideas come up, we can track them separately in JIRA, and people can volunteer to fix them.

          A lot of people use this today, and it is being left in the dust.

          Bill

          patrick o'leary added a comment -

          This is a dash and run comment, as I'm heading to the airport and am out of reach for a couple of weeks but:

          • Solr is in the run up to a 1.4 launch, I don't expect local / spatial solr to get into 1.4 at this stage.
          • This patch is out of date; it's a patch, these things happen.
          • LocalSolr will continue in some format on sourceforge

          It's been there for over a year now, playing catch-up with both Lucene and Solr releases, and while I can't guarantee it will always
          have support, I've done all that I can to bring in other engineers to help keep it going.

          It would be great to get local/spatial solr into Solr, but I have no idea where function query enhancements (to provide LocalSolr's current features in a different format) sit on the Solr roadmap, or what priority they will have.

          For those reasons I cannot reasonably commit time to something that may / may not happen for who knows how long.
          But there are a multitude of components that folks are asking for on the localsolr side of things, and once I'm back I'll be posting
          a wish list to the localsolr community asking for features that folks would like.

          Again it will move things more out of date with solr, but there isn't much I can do about that.

          Chris Male added a comment - - edited

          I have just added a patch which adds support to Solr for the multi-threaded spatial search I've added in LUCENE-1732 (Note, I have attached the jar built using the code in the Lucene issue). The performance improvements made by the multi-threaded search reduces the time taken to filter 1.2 million documents from 3s to between 500-800ms.

          In addition to the support for the improved spatial search, I have changed the query syntax supported by Solr for spatial searches. The syntax now uses local params which contain any information specific to a spatial search. An example of a search using the new syntax is:

          q={!spatial_tier lat=50.0 long=4.0 radius=10}*:*

          Also as part of the patch, I have removed the need for a specific DistanceCalculatingComponent by changing the query produced by the SpatialTierQueryParserPlugin to a FilteredQuery, and by introducing the notion of a FieldValueSource.

          FieldValueSources, which can be registered with the new FieldValueSourceRegistry, are used to add arbitrary information to documents as they are being written by ResponseWriters. Hence a DistanceFieldValueSource is created and registered by the SpatialTierQueryParserPlugin so that the distances calculated during the spatial search can be added to the resulting documents. This removes the need to add the distances in a special component. A useful feature of FieldValueSources is that they can be controlled through the fl request parameter. This means that for spatial search, the distances calculated do not necessarily have to be included in the response.
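          The FieldValueSource idea described above can be pictured with a minimal sketch. The names mirror the patch description, but the interface and signatures here are hypothetical, not the patch's actual code.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical sketch: a source of per-document values that a response
// writer consults when serializing results, so a search component does
// not have to patch the documents itself.
interface FieldValueSource {
    String fieldName();
    Object valueFor(int docId);
}

// Records distances computed during the spatial filter, keyed by doc id.
class DistanceFieldValueSource implements FieldValueSource {
    private final Map<Integer, Double> distances = new HashMap<Integer, Double>();

    void record(int docId, double distanceKm) {
        distances.put(docId, distanceKm);
    }

    public String fieldName() {
        return "geo_distance"; // field name under which distances appear
    }

    public Object valueFor(int docId) {
        return distances.get(docId); // null if no distance was recorded
    }
}
```

          A response writer holding a list of such sources could then append one extra field per source to each document it writes, honoring fl to decide whether to include it.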

          The final contribution of the patch: since the new spatial search uses multiple threads through an ExecutorService, it is necessary for Solr to have an ExecutorService that can be configured and managed. Consequently, the patch includes support for defining an ExecutorService in solrconfig.xml. The ExecutorService is then cleaned up when the SolrCore it belongs to is closed.

          I am intending on creating an example configuration over the next few days, which will also include some example data.

          Noble Paul added a comment -

          I am not going to comment on the "spatial search" part of this. Let us not keep the ExecutorService in SolrConfig. SolrConfig is just a place where configurations are parsed. SolrCore can create and keep the ExecutorService.

          There is already another threadpool Executor maintained for distributed search. That one does not require any configuration and uses some defaults (but it would be useful to have some configurability there). It makes sense to maintain one global threadpool at the core level which every component should use.

          Uri Boness added a comment -

          I guess it is possible to configure the executor service via the configuration of the query parser. That said, having a way to configure executor services in the Solr config would eliminate some code duplication. I don't think it's good practice to have one executor service for all components to use - the last thing you want is to have components depend on each other in terms of "race conditions" over threads. I think it is better to fine-tune each component with a thread pool of its own.
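          Uri's per-component suggestion amounts to something like the sketch below, where each component owns a small bounded ExecutorService whose lifecycle is tied to the component. The class name and pool size are illustrative only.

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

// A component-owned, bounded thread pool: components cannot starve each
// other of threads because each has its own executor, shut down when the
// component is closed.
final class ComponentPool implements AutoCloseable {
    private final ExecutorService executor;

    ComponentPool(int threads) {
        this.executor = Executors.newFixedThreadPool(threads);
    }

    // Run one task and wait for its result (a convenience for illustration;
    // real code would submit many tasks and collect the Futures).
    <T> T run(Callable<T> task) {
        try {
            return executor.submit(task).get();
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }

    @Override
    public void close() {
        executor.shutdown();
    }
}
```

          The trade-off against Noble's single global pool is isolation versus total thread count: per-component pools cap each component independently, at the cost of more idle threads overall.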

          Noble Paul added a comment -

          the last thing you want is to have component depend on each other in terms of "race conditions" over threads.

          Since each thread is going to compete for the same CPU resources anyway, I guess it should not be a problem. If necessary, we can take this discussion to a separate issue. If we discuss it here, it may take away the focus from this one.

          Chris Male added a comment -

          I have now attached an example configuration for the spatial search patch I added. It contains some sample documents and the lucene spatial search jar that my patch is designed to integrate with.

          Grant Ingersoll added a comment -

          Spawning off some of Yonik's ideas into separate issues so they can be dealt with one at a time.

          Ryan McKinley added a comment -

          SOLR-705 is a sketch for how we may add arbitrary metadata to returned documents.

          Ryan McKinley added a comment -

          SOLR-1131 offers a way that a single field type could write to multiple fields.

          Ryan McKinley added a comment -

          Wow, this issue just keeps growing! We need to figure out the best way to move forward that will keep things clean and have the flexibility to enable the wide range of spatial features we all want. As noted earlier, I hope we can come up with a simple interface that could support various strategies - including: trie, cartesian tier, geohash, rtree, jts/geotools, etc.

          To get things going, it seems the biggest hurdles are getting the solr framework to support some basic wiring to make these things possible. As Yonik pointed out before, this comes down to a few core features:

          • SOLR-1131 – FieldType should be able to write multiple fields (consider WKT -> many fields)
          • SOLR-1298 – Add function query calculation to result
          • SOLR-705 – Attach arbitrary metadata (distance) to the results
          • A way to sort by a function query

          With this, geosearch could be implemented with:

          • Custom QParser like: fq= {!gbox p=101.2,234.5 f=position, d=1.5}

            // a bounding box, centered on 101.2,234.5 including everything within 1.5 miles

          • A function query that calculates distances gdist(position,101.2,234.3) (may need to share data with the Query/Filter)
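          The gdist() function query sketched above presumably computes great-circle distance. As a point of reference, here is a self-contained haversine sketch; the class and method names are illustrative, not Solr's actual function-query API:

```java
// Minimal haversine great-circle distance sketch. Class/method names are
// illustrative only, not the Solr function-query API discussed above.
public class Haversine {
    static final double EARTH_RADIUS_KM = 6371.0; // mean Earth radius

    /** Great-circle distance in km between two lat/lon points (degrees). */
    static double distKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.sin(dLat / 2) * Math.sin(dLat / 2)
                 + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                 * Math.sin(dLon / 2) * Math.sin(dLon / 2);
        return EARTH_RADIUS_KM * 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
    }
}
```

          The trig and sqrt/atan2 work per pair of points is what makes per-document distance evaluation expensive at scale, which is why the filtering strategies discussed in this issue matter.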
          Uri Boness added a comment -

          Ryan, you should really have a look at the patch Chris added as it already tackles a few of the requirements you listed:

          • The FieldValueSource is an abstraction that can be used to add "dynamic" fields to the returned docs. I think this approach is the most flexible and can be used as a starting point. I'm still not sure whether it should add fields to the docs or some sort of a meta data information, but for both approaches the mechanism can stay the same (if meta data approach is chosen then I guess it can be renamed to MetaDataSource instead)
          • A distance calculation abstraction was already added in the form of the GeoDistanceCalculator interface (there are currently two implementations, but a third one can easily be added based on JTS). I agree there might be other abstractions that one would want to use.
          • The query parser is already there. The only thing is that right now it differs a bit from the syntax you suggested... it's more in the form q={!spatial lat=XXX lng=YYY radius=10 calc=arc unit=km}.

          Grant Ingersoll added a comment - - edited

          The thing I keep coming back to is Yonik's and Ryan's comments that most of this stuff need not require any custom work at all, other than fixing things in Solr that prevent it from using existing capabilities. I'd much rather see work done there than more work done customizing "spatial" code. At a minimum: implementing a FunctionQuery for Great Circle distance (and others), adding sort-by-function and pseudo-fields, those kinds of things, and then maybe working on Ryan's FieldType ideas. It seems like none of those, other than FieldTypes, require custom components, right?

          Chris Male added a comment -

          From my experience, more than just a FunctionQuery is required for LocalSolr to be efficient. Without the Cartesian tier information that is added by the UpdateProcessor, you will have to calculate the distance for every single document in the index. The great-circle distance calculations are actually quite expensive, and when multiplied by, say, 1 million documents, the query time climbs to around 2 or 3 seconds. If you then repeat the calculation again for sorting on distance, the time will be even worse. Therefore it seems necessary to include some way to reduce the number of distance calculations that are done.
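          To illustrate the Cartesian tier idea Chris refers to: at tier t the world is divided into a 2^t x 2^t grid, a cell id is indexed per document, and a radius query only has to run distance calculations on documents in the handful of cells overlapping the search box. A toy cell-id computation (a simplification for illustration, NOT Lucene's actual CartesianTierPlotter math):

```java
// Toy tier/grid-cell id, illustrating the idea behind Cartesian tiers.
// This is a simplified sketch, not Lucene's CartesianTierPlotter.
public class TierGrid {
    /** Cell id for (lat, lon) at the given tier (2^tier x 2^tier grid). */
    static long cellId(int tier, double lat, double lon) {
        long cells = 1L << tier;
        // Map lat from [-90, 90] and lon from [-180, 180] onto grid rows/cols.
        long row = (long) Math.floor((lat + 90.0) / 180.0 * cells);
        long col = (long) Math.floor((lon + 180.0) / 360.0 * cells);
        if (row == cells) row--;   // clamp lat == 90
        if (col == cells) col--;   // clamp lon == 180
        return row * cells + col;
    }
}
```

          Nearby points share a cell id at coarse tiers and only separate at finer tiers, which is what lets a filter restrict the candidate set cheaply before any haversine work is done.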

          Shalin Shekhar Mangar added a comment -

          Marking for 1.5

          Brad Giaccio added a comment -

          I'm going to have to disagree with Chris's assertion that more than a FunctionQuery is needed. I have a FunctionQuery that simply starts by getting a TermEnum at the minimum latitude that could possibly match your spatial extent, and exits when it gets to the max lat. This way I take advantage of the lexical ordering of the strings, and then only have to compute distances for things that are in the box.

          This code runs sub-second on a shard of 12 million documents; actually, it's sub-second hitting 8 shards of 12 million each.

          Just a thought. If interested, I have a SearchComponent that makes use of this filter that I can attach.
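          The TermEnum range trick Brad describes only works if latitude terms sort lexicographically in numeric order. One hypothetical way to index such terms (not Brad's actual code) is to shift latitudes into a non-negative range and zero-pad them to a fixed width:

```java
// Hypothetical fixed-width latitude term encoding so that lexicographic
// term order matches numeric order; not Brad's actual implementation.
public class LatEncoder {
    /** Shift latitude into [0, 180] and zero-pad at 0.0001-degree resolution. */
    static String encode(double lat) {
        if (lat < -90 || lat > 90) throw new IllegalArgumentException("lat out of range");
        // 180.0000 degrees -> 1800000, so 7 digits always suffice.
        long scaled = Math.round((lat + 90.0) * 10000);
        return String.format("%07d", scaled);
    }
}
```

          With terms encoded this way, walking a TermEnum from encode(minLat) to encode(maxLat) visits exactly the candidate latitude range, which is the lexical-search property Brad relies on.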

          Grant Ingersoll added a comment -

          I think the main takeaway here is Ryan's point that there are multiple ways to do this. Cartesian tier is useful, as are other approaches; let's just not re-invent the wheel if we don't have to.

          Chris Male added a comment -

          Brad,

          Would you be willing to add your FunctionQuery to the issue?

          Padraic Hannon added a comment -

          I realize this is marked for inclusion in 1.5; however, does the group feel that the patches here are ready to be used on 1.4, or should one stick with the LocalSolr project as found on SourceForge? And if so, should one then use 1.3 instead of 1.4?

          Any input would be greatly appreciated, and if this is the wrong forum to ask such a question, please remove the comment.

          aloha
          Padraic Hannon

          Chris Male added a comment -

          Hi Padraic,

          Most of these patches, particularly the latest ones, are built against Solr 1.4, so I recommend you use this version instead of 1.3. I wouldn't recommend using LocalSolr from SourceForge, as it does not seem to have been updated recently.

          patrick o'leary added a comment -

          Chris / Padraic
          I have to disagree -
          A patch is not an adequate way to maintain software for a company.

          If you have something small, and you don't mind the bleeding edge software, then go ahead and use this.
          But if you need stability, then use a completed piece of software such as localsolr.

          Vincent Yeung added a comment -

          I noticed that the current implementation only stores individual points per document; are there any plans to store a bounding box per document? This would be useful where complex geometries are needed: lucene/solr does the heavy lifting, filtering by the bounding box, and JTS completes the more complicated spatial comparisons. (JTS will handle all your WKTs with ease too.)
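          The cheap first pass in Vincent's suggestion is a plain axis-aligned box-intersection test, leaving only the survivors for JTS's exact geometry comparisons. A minimal sketch (class and field names are hypothetical, and date-line wraparound at +/-180 is deliberately ignored):

```java
// Minimal axis-aligned bounding-box intersection test. Class/field names
// are hypothetical; longitude wraparound at the date line is ignored.
public class BBox {
    final double minLat, minLon, maxLat, maxLon;

    BBox(double minLat, double minLon, double maxLat, double maxLon) {
        this.minLat = minLat; this.minLon = minLon;
        this.maxLat = maxLat; this.maxLon = maxLon;
    }

    /** True if this box and other share any area (touching edges count). */
    boolean intersects(BBox other) {
        return minLat <= other.maxLat && maxLat >= other.minLat
            && minLon <= other.maxLon && maxLon >= other.minLon;
    }
}
```

          Only documents whose stored box passes this test would be handed to JTS, which keeps the expensive exact comparison off the vast majority of the index.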

          Sean McCleese added a comment -

          Chris Mattmann (who commented above), Faranak Davoodi, some others at JPL, and I are currently working on integrating JTS for just this purpose. We're looking at tying it closely into Chris Male's patches as posted above, and we've been communicating with him and Ryan McKinley about this process.

          Right now we're focusing on how to tie JTS into the process as Vincent mentions, without requiring it to do all the filtering, as the speed hit there would be pretty intense. I'm thinking of basically co-opting the local lucene calls in Chris Male's approach and siphoning off the gbox-related ones to JTS. This might also allow for more complex geodetic functions (like swath data and such) down the line.

          Bill Bell added a comment -

          OK, I need some sort of distance-from-lat/lon returned in the results. I also need a sort=distance...

          Patrick: I cannot get your current localsolr to work with locallucene trunk. Do you have a copy that works?

          Thanks.

          Bill Bell added a comment -

          So right now I cannot get a clean build with LocalSolr and Solr trunk. If I take Patrick's latest and copy the lucene lib and his localsolr.jar, I get:

          INFO: [core0] webapp=/solr path=/admin/ping params={} status=0 QTime=54
          Sep 6, 2009 8:05:27 PM org.apache.lucene.spatial.tier.CartesianPolyFilterBuilder getBoxShape
          INFO: Best Fit is : 10
          Sep 6, 2009 8:05:29 PM org.apache.solr.common.SolrException log
          SEVERE: java.lang.AbstractMethodError
          at org.apache.lucene.search.TopFieldCollector$MultiComparatorNonScoringCollector.setNextReader(TopFieldCollector.java:437)
          at org.apache.solr.search.DocSetDelegateCollector.setNextReader(DocSetHitCollector.java:140)
          at org.apache.lucene.search.IndexSearcher.search(IndexSearcher.java:251)
          at org.apache.lucene.search.Searcher.search(Searcher.java:173)
          at org.apache.solr.search.SolrIndexSearcher.getDocListAndSetNC(SolrIndexSearcher.java:1101)
          at org.apache.solr.search.SolrIndexSearcher.getDocListC(SolrIndexSearcher.java:880)
          at org.apache.solr.search.SolrIndexSearcher.search(SolrIndexSearcher.java:341)
          at org.apache.solr.search.SolrIndexSearcher.getDocListAndSet(SolrIndexSearcher.java:1302)
          at com.pjaol.search.solr.component.LocalSolrQueryComponent.process(LocalSolrQueryComponent.java:300)
          at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:195)
          at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131)
          at org.apache.solr.core.SolrCore.execute(SolrCore.java:1299)
          at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338)
          at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241)
          at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235)
          at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206)
          at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233)
          at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191)
          at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128)
          at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102)
          at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109)
          at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:286)
          at org.apache.coyote.http11.Http11AprProcessor.process(Http11AprProcessor.java:857)
          at org.apache.coyote.http11.Http11AprProtocol$Http11ConnectionHandler.process(Http11AprProtocol.java:565)
          at org.apache.tomcat.util.net.AprEndpoint$Worker.run(AprEndpoint.java:1509)
          at java.lang.Thread.run(Thread.java:619)

          Any ideas on this one? Maybe bring in something that is missing from Lucene?

          Maybe we should have 2 issues:

          • Working with Patrick's code and SOLR.
          • Getting new code to work with distance and sorting.
          Mark Miller added a comment -

          setNextReader had an unused param dropped: numSlotsFull or something. You can just remove it, and you should at least get past that one.

          patrick o'leary added a comment -

          I'm holding off updating localsolr on SF until Solr 1.4 comes out.
          There's a lot of flux right now, and I'm not maintaining a patch.

          There was a June solr-1.4-dev version I made available at http://www.nsshutdown.com/solr-example.tgz

          Once 1.4 comes out with a stabilized interface, I'll adopt it and re-release.

          Noble Paul added a comment -

          Hi, for everyone who has not followed everything on the list:

          As I see it, we have a workable solution now (correct me if I am wrong). What is preventing us from committing this (after 1.4, of course)?

          Chris Male added a comment -

          Which solution are you referring to? My patch is a little out of sync with the latest Spatial Lucene code but updating is very easy.

          Noble Paul added a comment -

          I checked out the trunk code from SF

          Bill Bell added a comment - - edited

          Chris: Can you add the distance injection from lat/lon, or the sort=distance? The sort=distance appears more difficult. I could probably just loop through the results and get the distance by doing a simple geospatial calculation, but the sorting needs to be in your patch.

          Noble: Ideas on the best way to add the sorting? Local Lucene has functions for sorting.... Not sure how to expose them.

          Thanks!

          Bill Bell added a comment -

          Brad,

          Were you able to complete your patch?

          You commented:
          Brad Giaccio added a comment - 12/Aug/09 04:08 PM
          I'm going to have to disagree with Chris's assertion that more than a FunctionQuery is needed. I have a FunctionQuery that simply starts by getting a TermEnum at the minimum latitude that could possibly match your spatial extent, and exits when it gets to the max lat. This way I take advantage of the lexical ordering of the strings, and then only have to compute distances for things that are in the box.

          This code runs sub-second on a shard of 12 million documents; actually, it's sub-second hitting 8 shards of 12 million each.

          Just a thought. If interested, I have a SearchComponent that makes use of this filter that I can attach.

          Brad Giaccio added a comment -

          Sorry for the long delay on this, I had to get approval to submit it.

          Basically this code adds a new search component that uses a field identified in solrconfig.xml for searching. It does a circular search based on min/max radius. The component also handles searching across shards, provided

          In the next few weeks I've been tasked to also do bounding box searches (i.e. find all documents that fall inside of a box defined by nw and se corners).

          Hope this helps someone. Let me know if you have questions or can't get it to build.

          Brad

          Gijs Kunze added a comment -

          I've written a Solr plugin which uses a field with the computed Hilbert space-filling curve value to cluster the resulting documents so they can be efficiently placed on a Google Maps control. Basically, given a precision and a southwest lat/lng and northeast lat/lng bounding box, it returns a group of clusters, each with an exact lat/lng location, a bounding box for all the documents in the cluster, and the count of documents in that cluster. Depending on settings given to the application (number of results in the docset and/or size of the requested bounding box), it will instead return the list of documents, so that when you're zoomed in far enough the clusters transform into actual distinct documents.

          My implementation is very specific to our website and is not generally applicable:

          • The calculation of the Hilbert space-filling curve value is done by our index script
          • Several field names are hardcoded
          • It uses a hardcoded precision for the hilbert value (30 bits)
          • It still uses highly inefficient methods for some actions (it stores the value in a sint field instead of a trie int, as I was waiting for Solr 1.4 to be released before continuing work on the plugin, but now I'll have to find/make the time)

          I think LocalSolr would really benefit from something like this as I think when you're storing geographic data displaying it on a map (whether it be google maps, bing maps, open streetview or whatever) is something a lot of people will want to do (and I love full faceted browsing on a map).

          My implementation can be seen running on: http://www.mysecondhome.co.uk/search.html?view=map (It's not perfect, there are small bugs but in general it works fast enough on our dataset)
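          For background, the (x, y) to curve-distance conversion that a Hilbert-curve clustering scheme like this relies on can be sketched with the standard textbook algorithm (generic code for illustration, not the plugin Gijs describes):

```java
// Textbook Hilbert-curve (x, y) -> distance-along-curve conversion for an
// n x n grid where n is a power of two; not the plugin's actual code.
public class Hilbert {
    static long xy2d(long n, long x, long y) {
        long d = 0;
        for (long s = n / 2; s > 0; s /= 2) {
            long rx = (x & s) > 0 ? 1 : 0;
            long ry = (y & s) > 0 ? 1 : 0;
            d += s * s * ((3 * rx) ^ ry);
            // Rotate/flip the quadrant so the next level sees a canonical
            // orientation of the curve.
            if (ry == 0) {
                if (rx == 1) {
                    x = n - 1 - x;
                    y = n - 1 - y;
                }
                long t = x; x = y; y = t;
            }
        }
        return d;
    }
}
```

          The useful property for clustering is that points close on the curve index are close in space, so a range of curve values approximates a spatial region and sorts well in an index.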

          Bill Bell added a comment -

          Patrick,

          Now that 1.4 is out, what are we looking at to get your LOCALSOLR to work with it?

          Thanks in advance.

          Grant Ingersoll added a comment -

          Hi Bill (and everyone else),

I'm working on bits and pieces of this, as are others. I don't think there will be one monolithic patch called "Local Solr" at this point as the donated LocalSolr solves one particular spatial problem in one particular way. I already added in distance function queries (see SOLR-1302) and am now working on a QParserPlugin that will produce CartesianTier filters, possibly reusing what is in contrib/spatial from Lucene, although I am not totally sold on what is in there just yet either, implementation-wise. It may require some cleanup as well to be more generic and use newer Lucene capabilities. Basically, I am executing on what Yonik, Ryan and I laid out around https://issues.apache.org/jira/browse/SOLR-773?focusedCommentId=12733900&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12733900, https://issues.apache.org/jira/browse/SOLR-773?focusedCommentId=12703259&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12703259 and https://issues.apache.org/jira/browse/SOLR-773?focusedCommentId=12631963&page=com.atlassian.jira.plugin.system.issuetabpanels%3Acomment-tabpanel#action_12631963. The result should be a much better Solr system overall, with the side effect being we can now support spatial search.

          As it stands now, I've been doing searches w/ filtering using the dist() and hsin() methods in conjunction with Solr's frange functionality (see http://www.lucidimagination.com/blog/2009/07/06/ranges-over-functions-in-solr-14/) and that seems to be working quite well.

          Other than that, I think the pieces that make up what is needed for spatial search are now being tracked through the various dependent JIRA issues listed above. I am going to keep this issue open as a way of tracking all the bits and pieces that go into making Solr do spatial work. Once I feel they are ready, then I will update here.
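For reference, the hsin() mentioned above is a haversine great-circle distance. A minimal stand-alone sketch of the same math (not Solr's actual implementation; the Earth-radius constant and kilometre units here are assumptions):

```python
import math

EARTH_RADIUS_KM = 6371.0087  # mean Earth radius; Solr may use a different constant

def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in km between two lat/lon points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = math.radians(lat2 - lat1)
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return EARTH_RADIUS_KM * 2 * math.asin(math.sqrt(a))
```

Filtering with frange then amounts to keeping only documents whose computed distance is below the requested radius.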

          Grant Ingersoll added a comment -

          Not a blocker, but may be useful.

          patrick o'leary added a comment -

          11/21/09 21:00 PDT
          patrick o'leary
          to locallucene-users, locallucene-developers

          Folks

          I've updated localsolr to work with solr-1.4 release, also works with solr-1.5?? nightly as of 11/21/09

          There are a couple of changes needed to upgrade to this version.

          1) schema.xml has to be updated
the lat / long fields and the dynamic field _localTier* have to be updated to type="tdouble"

          2) your index has to be rebuilt from scratch.

          This is not ideal, but unfortunately numeric util updates in lucene force us down this path.

          As always I've put a batteries included demo on http://www.nsshutdown.com/solr-example.tgz

          Thanks
          Patrick
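
The schema.xml changes described above might look roughly like this (the lat / lng field names and the exact fieldType attributes are assumptions; adapt them to whatever your schema already defines):

```xml
<!-- Trie-encoded double, needed after the Lucene NumericUtils changes -->
<fieldType name="tdouble" class="solr.TrieDoubleField" precisionStep="8"
           omitNorms="true" positionIncrementGap="0"/>

<!-- assumed field names; use whatever your schema already calls them -->
<field name="lat" type="tdouble" indexed="true" stored="true"/>
<field name="lng" type="tdouble" indexed="true" stored="true"/>
<dynamicField name="_localTier*" type="tdouble" indexed="true" stored="true"/>
```

As Patrick notes, after changing the field types the index has to be rebuilt from scratch.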

          Benoît Terradillos added a comment -

          Hello folks,

          thanks a lot for your job on this issue, it is very useful for the project I'm working on (geo-localisation of cultural events in the french part of Switzerland).

Special thanks to patrick o'leary for your last post! I had been expecting this update for several weeks, as I'm updating solr to version 1.4 and the version I was using didn't work anymore.

I made modifications to your code to allow configuration of the tierPrefix and distanceField values in the solrconfig (like what has been added to the spatial-solr project). Would you like my modifications? May I commit them?

          patrick o'leary added a comment -

It's important to realize that localsolr is just a stop gap until its functionality / feature set is included in solr.
          Grant is doing some fantastic work here, and I'm looking forward to seeing the outcome.

          Please feel free to join locallucene-users listserv on sourceforge http://sourceforge.net/mail/?group_id=208194
          and send patches there, and I'll do my best to include them.

          Grant Ingersoll added a comment -

          Grant is doing some fantastic work here, and I'm looking forward to seeing the outcome

          Grant would definitely welcome help! This is way too big for me. People wanting to help, should take a look at all of the linked items on this issue and see where they can contribute. If in doubt, please ask. I'm good at telling people what to do

          patrick o'leary added a comment -

Yeah, this has become a big re-arch of solr.
          The implementation of the spatial search is almost secondary to the key features

          1) A method to add meta data to a document from a query life cycle

• Should be common between lucene and solr (end goal)

2) Meta data can be used to perform sorting and boosting

          I think once those two things are completed then spatial search will just fit right in

          Grant Ingersoll added a comment -

          Not so much a re-arch, but an extension of some pieces to handle some new ideas. I think we all agree that Solr does a pretty good job of hiding some of the complexity of Lucene. So, by being able to simply declare a new field that is a CartesianTier field type, then the user need not worry at all about managing the tier prefix stuff that contrib/spatial requires.

          Bill Bell added a comment -

          Patrick:

In your http://www.nsshutdown.com/solr-example.tgz the localsolr.jar is missing, and the other jar is missing as well... Can you please add them?

          What repo are we supposed to build from now? (kinda confusing).

          Bill

          patrick o'leary added a comment -

Already added; someone mailed me earlier about it. I had tar'd it without following symlinks.

          What are you building against? Lucene / Solr are ASF, localsolr is sourceforge

          patrick o'leary added a comment -

          I will be making some updates that fix a few bugs, and start working on polygon searching.
          Right now there's a very very basic example in svn
          https://locallucene.svn.sourceforge.net/svnroot/locallucene/trunk/contrib/polySpatial

Simply run ant in that directory to create dist/polySpatial.war.
Load it up in a web container like tomcat on port 8080 and hit http://localhost:8080/polySpatial/
          Click on the map to start seeing results around the generated polygon.

          Let me know your thoughts

          Eric Pugh added a comment - - edited

Patrick, I tried out your "Batteries Included" example, and it worked great. One of the questions I have is that it seems like the scoring process doesn't take into account the distance from a central point. In other words, if I specify a 10 mile radius, and there is a really high scoring match more than 10 miles out, it doesn't get returned. The radius functions as a strict filter of what gets returned. However, I think what we are really trying to do is to find the best search results, and have distance factored in as well.

I was thinking that I could sort of do this "fuzzy" boundary by making a query with a radius x, and then doing the same query with radius x * 2. Then, if any of the documents in x * 2 are much better than those in radius x, include them. Obviously this would be somewhat clunky to do from the client side!

A use case I can think of is searching for gas stations within 5 miles of me, but if a gas station has really cheap gas, and is 6 miles away, then include that. But if it's just a penny cheaper, ignore it.

          I added as a "screenshot" a drawing of what I was sort of thinking.
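
The two-query idea above could be sketched client-side along these lines (a toy illustration, not part of any patch here; the score-margin rule is an assumed heuristic):

```python
def merge_fuzzy(inner_hits, outer_hits, margin=0.01):
    """inner_hits / outer_hits: lists of (doc_id, score) from queries run with
    radius x and radius x * 2 respectively.  Keep everything inside radius x,
    plus outer documents whose score beats the best inner score by `margin`."""
    inner_ids = {doc_id for doc_id, _ in inner_hits}
    best_inner = max((score for _, score in inner_hits), default=0.0)
    merged = list(inner_hits)
    for doc_id, score in outer_hits:
        if doc_id not in inner_ids and score >= best_inner + margin:
            merged.append((doc_id, score))  # the 6-miles-away cheap gas station
    return sorted(merged, key=lambda t: t[1], reverse=True)
```

With this rule, an outer hit that is "just a penny cheaper" (inside the margin) is dropped, while a clearly better one is pulled in.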

          patrick o'leary added a comment -

          You can certainly implement a fuzzy scoring method, but you really want to avoid having to calculate distances for all your results, so
          some sort of restriction is good.

          If your data set is small ~100K docs, you might get away with using a value scorer and boost on distances.
But if your data set is on the order of millions, that's not going to be a good idea.

          Grant Ingersoll added a comment -

          Just an update:

          1. SOLR-1131: aka poly fields is almost ready to go. Please review.
          2. SOLR-1297: sort by function query just needs review and then can be committed.

          After that, we can add in the Cartesian Tier indexing and the Cartesian Tier QParserPlugin (after a little re-write). Then we need pseudo-fields and we likely want to hook in a per request function cache (maybe)

          Dave Craft added a comment -

          Hi,

I've created a blog post on installing LocalSolr onto Solr 1.4, which takes all the comments and breaks them down into step-by-step instructions.

          Hope it helps

          http://craftyfella.blogspot.com/2009/12/installing-localsolr-onto-solr-14.html

          Otis Gospodnetic added a comment -

          Dave - useful, thanks!
          Do you think creating/editing a Wiki page with this information would be good?
          See: http://wiki.apache.org/solr/LocalSolr

          Grant Ingersoll added a comment - - edited

          There is already a spot for Spatial at: http://wiki.apache.org/solr/SpatialSearch

          It probably would be useful to see if the LocalSolr project can make use of it, since Solr itself is not going to require any custom install stuff.

          Grant Ingersoll added a comment -

          SOLR-1131 is committed. I'm now working on SOLR-1586.

          Grant Ingersoll added a comment -

          SOLR-1586 is committed for GeohashField and SpatialTileField. We likely will add one more FieldType that combines both a 2D PointType and the tiling capabilities into a single FieldType, mostly as a convenience mechanism.

          Brian Westphal added a comment -

          We've got Localsolr (2.9.1 lucene-spatial library) running on Solr 1.4 with Tomcat 1.6. Everything's looking good, except for a couple little issues.

          Issue #1:
          If we specify fl=id (or fl= anything) and wt=json it seems that the fl parameter is ignored (thus we get a lot more detail in our results than we'd like).

If we specify fl=id and leave out wt=json (which defaults to returning xml results), we get the expected fields back. We'd really prefer to use wt=json because the results are easier for us to deal with (the same issue arises with wt=python and wt=ruby).

          Issue #2:
          It looks like the defType parameter isn't properly passed through for geo queries, making it really hard to use things like dismax + geo. I've been playing with the code a bit and have a "working" patch for it. However, as I'm very new to the solr/localsolr source, I'd be uncomfortable submitting it without additional testing.

          ---------

          If anyone knows any workarounds for these issues, please let me know.

          Grant Ingersoll added a comment -

          Brian,

          Have you tried what's in trunk of Solr?

          Brian Westphal added a comment -

          Hi Grant,

Trying the solr trunk right now, but I'm getting an exception: java.lang.NoSuchFieldError: rsp at com.pjaol.search.solr.component.LocalSolrQueryComponent.prepare(LocalSolrQueryComponent.java:119)

Working on trying to figure out what the issue could be still – very well my fault – but thought I'd mention it in case it rang any bells. I'm very new to looking at the solr and localsolr code so it might take me a bit of time to figure out – I looked at the code for ResponseBuilder and it seems like it has an rsp field.

          Thanks

          Grant Ingersoll added a comment -

          Sorry, meant w/o LocalSolr. Most of LocalSolr has been incorporated into Solr at this point, with the exception of the Tier filtering. Docs are under way at http://wiki.apache.org/solr/SpatialSearch

          Brian Westphal added a comment -

          I'm gonna work on getting stuff tested with solr 1.5. I wanted to ask about another issue in the meantime however.

I've noticed that I get an "Illegal Latitude Value" exception sometimes when working with points near the poles or just when working with very large radii. I would personally rather the system just cut off at -90 and 90 artificially than throw an error. I'm not worried about finding things near the poles as much as I'd like to be able to use bigger search radii, but I don't care if it wraps around the earth correctly, latitudinally speaking. (If avoiding this issue were doable by a flag or something, that'd be great too.)

          Here's the more precise error if it helps:
          -----------------
          HTTP Status 500 - Illegal latitude value 113.29902312168412 java.lang.IllegalArgumentException: Illegal latitude value 113.29902312168412 at org.apache.lucene.spatial.geometry.FloatLatLng.<init>(FloatLatLng.java:31) at org.apache.lucene.spatial.geometry.shape.LLRect.createBox(LLRect.java:85) at org.apache.lucene.spatial.tier.DistanceUtils.getBoundary(DistanceUtils.java:54) at org.apache.lucene.spatial.tier.CartesianPolyFilterBuilder.getBoxShape(CartesianPolyFilterBuilder.java:59) at org.apache.lucene.spatial.tier.CartesianPolyFilterBuilder.getBoundingArea(CartesianPolyFilterBuilder.java:121) at org.apache.lucene.spatial.tier.DistanceQueryBuilder.<init>(DistanceQueryBuilder.java:59) at com.pjaol.search.solr.component.LocalSolrQueryComponent.prepare(LocalSolrQueryComponent.java:151) at org.apache.solr.handler.component.SearchHandler.handleRequestBody(SearchHandler.java:174) at org.apache.solr.handler.RequestHandlerBase.handleRequest(RequestHandlerBase.java:131) at org.apache.solr.core.SolrCore.execute(SolrCore.java:1316) at org.apache.solr.servlet.SolrDispatchFilter.execute(SolrDispatchFilter.java:338) at org.apache.solr.servlet.SolrDispatchFilter.doFilter(SolrDispatchFilter.java:241) at org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:235) at org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:206) at org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:233) at org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:191) at org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:128) at org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:102) at org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:109) at org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:293) at org.apache.coyote.http11.Http11Processor.process(Http11Processor.java:849) at 
org.apache.coyote.http11.Http11Protocol$Http11ConnectionHandler.process(Http11Protocol.java:583) at org.apache.tomcat.util.net.JIoEndpoint$Worker.run(JIoEndpoint.java:454) at java.lang.Thread.run(Thread.java:637)
          -------------------
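
The clamping suggested above could be applied when deriving the latitude span of the search box, roughly like this (a sketch, not a patch against lucene-spatial; the kilometres-per-degree figure is the usual approximation):

```python
KM_PER_DEG_LAT = 111.195  # approx. km per degree of latitude

def lat_bounds(center_lat, radius_km):
    """Latitude span of a search circle, clamped to the legal [-90, 90] range
    instead of raising an "Illegal latitude value" error near the poles."""
    delta = radius_km / KM_PER_DEG_LAT
    south = max(-90.0, center_lat - delta)
    north = min(90.0, center_lat + delta)
    return south, north
```

A large radius centered near a pole then simply saturates at the pole rather than producing an out-of-range latitude like 113.29.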

          Bill Bell added a comment - - edited

          See LUCENE-1781 to fix the pole issue

          Grant Ingersoll added a comment -

          Just an update here on other work related to this issue.

I should have a patch up for SOLR-1568 pretty soon. I'm also going to add a new FieldType specifically for Lat/Lon that extends PointType and is fixed to two dimensions and can be a bit more intelligent about that specific use case.

          Could use some help on SOLR-1298 so that we could get pseudo-fields in sooner rather than later.

          Dan Bentson added a comment -

          I'm trying to test spatial search out in 1.5 going by the docs on http://wiki.apache.org/solr/SpatialSearch

          First of all I believe
          <fieldType name="location" class="solr.PointType" dimension="2" subFieldTypes="double"/>
          should be changed to
          <fieldType name="location" class="solr.PointType" dimension="2" subFieldType="double"/>

          But more importantly I'm having trouble getting this to work.
          I was able to index my data using the Geohash type and can see it in my store field when not doing spatial queries. However, when doing the following query:
...?q=_val_:"recip(dist(2, store, vector(34.0232,-81.0664)),1,1,0)"&fl=*,score
          I get error message:
          Illegal number of sources. There must be an even number of sources

          I also tried ...?q={!sfilt fl=location}&pt=49.32,-79.0&dist=20 and get the message unknown query type 'sfilt'.

          Is there something I'm missing or is code just not committed to trunk yet?
          Thanks

          Dan Bentson added a comment -

          Update to my above comment. I was able to get both types of searches working.

          Using the Spatial Filter QParser I'm getting the results I want (example query: q=pizza {!sfilt fl=location}&pt=49.32,-79.0&dist=20). I have a couple of questions though:

          First of all what is the distance unit of measurement? Miles? Meters?

          Also, using Patrick's plug-in it returned the distance as a result field. Is there any way to do that in SOLR 1.5?

          Any help with this would be greatly appreciated!!!

          Thanks

          Grant Ingersoll added a comment -

          Dan, sfilt can take a units measurement, but internally it uses miles.

          Grant Ingersoll added a comment -

          Status update:

          SOLR-1568, which is the last big piece, I think, is almost done. I added a new LatLonType which should make it super easy to do pure lat/lon stuff (Point is more for a rectangular coordinate system; I guess maybe we should rename it?) and it should be easy to extend to use different distance methods. I will try to document some more on the wiki.

          There are some minor bugs related to sorting by function right now, but it should be usable for people just doing spatial stuff (SOLR-1297). Probably the next most important piece to get in place is SOLR-1298 and its related item SOLR-705. Help on those pieces would be most appreciated.

          As always, people kicking the tires on the trunk is appreciated too.

          Uri Boness added a comment - - edited

          Grant, I started looking at SOLR-1298 yesterday. The idea is to somehow merge all the related issues (there are currently two open issues for the same purpose with two different patches). But this should be done in a somewhat collaborative manner so everybody will be on the same page here.... also regarding the discussion about the different approaches (inline the pseudo fields or have them nested in a separate "meta" element). Is there some way to merge the issues, or perhaps mark one of them as a duplicate, so the discussion will be centralized?

          btw, the other "duplicate" issue is SOLR-1566

          Hoss Man added a comment -

          Bulk updating 240 Solr issues to set the Fix Version to "next" per the process outlined in this email...

          http://mail-archives.apache.org/mod_mbox/lucene-dev/201005.mbox/%3Calpine.DEB.1.10.1005251052040.24672@radix.cryptio.net%3E

          Selection criteria was "Unresolved" with a Fix Version of 1.5, 1.6, 3.1, or 4.0. email notifications were suppressed.

          A unique token for finding these 240 issues in the future: hossversioncleanup20100527

          Bill Bell added a comment -

          The wiki does not show example output. Do you give the number of miles from the pt as part of the output?

          Thanks.

          P B added a comment -

          I am a newbie and I have a question:

          Is it now possible to use a Solr nightly build to solve this problem:

          find all points (using the index) within a radius of R km on earth?

          Is there a ready-to-use query sample?

          Oliver Beattie added a comment -

          I'm extremely interested in the Polygon search here, is there any activity on that front or am I the only one interested in it? I'd be willing to try and contribute to this effort, but I imagine a lot of it will depend on the work being done on point searching.

          Tamas Sandor added a comment -

          I'm also interested in the status of Polygon search...

          Simon Rijnders added a comment -

          Same question as P B:
          Is it now possible to use a Solr nightly build to solve this problem: find all points (using the index) within a radius of R km on earth?

          I'm willing to act as an (ignorant) test subject, so if the above is possible, please let me know and I'll see what I can do....

          Bill Bell added a comment -

          PB and SR:

          http://<host>:8983/solr/core0/select?fl=*,score&qf=namesearch&pf=namesearch&start=0&rows=10&q=bill&qt=standard&pt=39.7391536,-104.9847034&d=160.9344&fq={!sfilt%20fl=store_lat_lon}&sort=hsin(6371,true,store,vector(39.7391536,-104.9847034))+asc,+score+desc

          This is an example that queries the index and returns points within 100 miles (160.93 km) of Denver, CO (39.7391536,-104.9847034).

          It also does a dismax query on namesearch for "bill".

          It sorts the results by distance in km. To show the distance, use a JavaScript hsin() function when you loop through your results.

          i.e.:

          function toRad(val) {
              return (Math.PI * val / 180);
          }

          function hsin(lat1, lon1, lat2, lon2) {
              var R = 3958.761; // earth radius in miles
              var dLat = toRad(lat2 - lat1);
              var dLon = toRad(lon2 - lon1);
              var a = Math.sin(dLat/2) * Math.sin(dLat/2) +
                      Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) *
                      Math.sin(dLon/2) * Math.sin(dLon/2);
              var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
              var d = R * c;
              return d;
          }
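          A quick sanity check of the function above (not from the original comment; the result loop, the docs array, and the Boulder, CO coordinates are all hypothetical examples assumed for illustration) might look like:

          ```javascript
          function toRad(val) {
              return (Math.PI * val / 180);
          }

          function hsin(lat1, lon1, lat2, lon2) {
              // Haversine distance; R is the earth radius in miles
              var R = 3958.761;
              var dLat = toRad(lat2 - lat1);
              var dLon = toRad(lon2 - lon1);
              var a = Math.sin(dLat/2) * Math.sin(dLat/2) +
                      Math.cos(toRad(lat1)) * Math.cos(toRad(lat2)) *
                      Math.sin(dLon/2) * Math.sin(dLon/2);
              var c = 2 * Math.atan2(Math.sqrt(a), Math.sqrt(1 - a));
              return R * c;
          }

          // Hypothetical result loop: assume each doc's "store" field
          // holds a "lat,lon" string; the query point is Denver, CO.
          var queryLat = 39.7391536, queryLon = -104.9847034;
          var docs = [{ name: "Boulder store", store: "40.0150,-105.2705" }];
          docs.forEach(function (doc) {
              var parts = doc.store.split(",");
              doc.distance = hsin(queryLat, queryLon,
                                  parseFloat(parts[0]), parseFloat(parts[1]));
          });
          // docs[0].distance is roughly 24 miles (Denver to Boulder)
          ```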
          
          Grant Ingersoll added a comment -

          I believe I have backported all trunk spatial related things to 3.x, including the bbox and related stuff.

          Grant Ingersoll added a comment -

          I'm going to mark this issue as resolved at this point. For a long time, this issue has served to track a bunch of different issues related to Solr, but I think we have incorporated almost all of the major features of local solr (and some others, too) such that it makes sense to just track things individually at this point.

          Grant Ingersoll added a comment -

          Bulk close for 3.1.0 release


            People

            • Assignee:
              Grant Ingersoll
              Reporter:
              Grant Ingersoll
            • Votes:
              35 Vote for this issue
              Watchers:
              62 Start watching this issue

              Dates

              • Created:
                Updated:
                Resolved:

                Development