Sorry for not getting into this sooner.
Let's take a step back for a second and ask a couple of questions; my thoughts are provided under each.
1) What is the goal we want to achieve?
- Provide a first iteration of a geographical search entity to SOLR
- Bring a popular external plugin in out of the cold and into the ASF and SOLR; this helps Solr users out and increases the number of developers from 1 to many.
2) What is the level of commitment to, and the road map for, spatial solutions in Lucene and Solr?
- The primary goal of SOLR is to be a text search engine, not a GIS search engine; there are other and better ways to do GIS
without reinventing the wheel and shoehorning it into Lucene
(e.g. persistent doc id mappings that can be referenced outside of Lucene, so that things like PostGIS and other tools can be used).
- We can never fully solve everyone's needs at once; let's start with what we have and iterate upon it.
- I'm happy with any improvements as long as they keep to two goals: A. don't make it stupid, B. don't make it complex.
3) Raw math through trie data structures, spatial ids (geohash), or tier ids (Cartesian tiers): which one?
- Why not all? Again, we can't solve everyone's needs, so why not let them have the tools to help themselves?
As for benchmarking, I recently ran some tests using tdouble precision 0:
~1 million docs covering the state of NY.
Top density was ~300,000 in the Manhattan & Brooklyn area.
Returning all results, avg of 100 hits:
Trie Double: 108ms
Cartesian Tier: 12ms
The reason for the difference is that with trie ranges you are doing 2 sets of range filters/queries.
With Cartesian tiers you are doing 1 iteration over maybe 4 to 16 fielded ids.
Switching the _localTier fields from sdouble to tdouble might improve that further; I haven't tried, as 12ms is something I can live with.
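
To make the "4 to 16 fielded ids" point concrete, here is a minimal sketch of how a grid of tier ids can cover a bounding box. It is not the locallucene tier plotter: the class name TierGridSketch, the plain lat/lon grid with 2^tier cells per side, and the row * cells + col id scheme are all assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical simplified grid; NOT the actual locallucene Cartesian tier code.
class TierGridSketch {
    // Number of cells along each axis at a given tier level: 2^tier.
    static int cellsPerSide(int tier) {
        return 1 << tier;
    }

    // Map a value in [min, max] to a cell index in [0, cells - 1].
    static int cellIndex(double value, double min, double max, int cells) {
        int idx = (int) Math.floor((value - min) / (max - min) * cells);
        return Math.max(0, Math.min(cells - 1, idx));
    }

    // Single id for the cell containing (lat, lon): row * cellsPerSide + col.
    static long tierId(double lat, double lon, int tier) {
        int cells = cellsPerSide(tier);
        int col = cellIndex(lon, -180.0, 180.0, cells);
        int row = cellIndex(lat, -90.0, 90.0, cells);
        return (long) row * cells + col;
    }

    // Ids of every cell that overlaps the given lat/lon bounding box.
    static List<Long> coveringIds(double minLat, double minLon,
                                  double maxLat, double maxLon, int tier) {
        int cells = cellsPerSide(tier);
        int col0 = cellIndex(minLon, -180.0, 180.0, cells);
        int col1 = cellIndex(maxLon, -180.0, 180.0, cells);
        int row0 = cellIndex(minLat, -90.0, 90.0, cells);
        int row1 = cellIndex(maxLat, -90.0, 90.0, cells);
        List<Long> ids = new ArrayList<>();
        for (int row = row0; row <= row1; row++) {
            for (int col = col0; col <= col1; col++) {
                ids.add((long) row * cells + col);
            }
        }
        return ids;
    }
}
```

With a scheme like this, a search box becomes a short list of exact ids to match against one indexed field, rather than two numeric range scans (one for latitude, one for longitude) that then have to be intersected.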
However, the distance calculation is the killer: 300,000 took about 1.8 seconds in a single thread on a 3.2GHz machine.
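
For a sense of the per-document work involved, here is a minimal sketch of a great-circle distance check. I'm assuming the haversine formula and a mean Earth radius of 6371 km for illustration; the post does not say which formula the benchmark used, and HaversineSketch and its method names are made up for this example.

```java
// Hypothetical sketch; the formula and radius are assumptions, not the benchmarked code.
class HaversineSketch {
    // Mean Earth radius in kilometres (assumed value).
    static final double EARTH_RADIUS_KM = 6371.0;

    // Great-circle distance in km between two lat/lon points given in degrees.
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double sinLat = Math.sin(dLat / 2);
        double sinLon = Math.sin(dLon / 2);
        double a = sinLat * sinLat
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * sinLon * sinLon;
        // Clamp to 1.0 to guard against rounding pushing sqrt(a) slightly above 1.
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    // Count how many candidate points fall within radiusKm of the centre.
    static int countWithin(double centreLat, double centreLon,
                           double[][] candidates, double radiusKm) {
        int hits = 0;
        for (double[] c : candidates) {
            if (distanceKm(centreLat, centreLon, c[0], c[1]) <= radiusKm) {
                hits++;
            }
        }
        return hits;
    }
}
```

Every candidate that survives the tier filter pays for several trig calls and a square root, which is why shrinking the candidate set before this step matters more than shaving time off the filter itself.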
I was working on some additional features in locallucene, such as polylines and convex hulls, which, using the Cartesian tier ids,
can give some basic quick features such as intersects and contains; a nifty side effect of having sorted ids is nearby results.
Also, faceting on tier ids can give you hot spot results.
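
As a rough illustration of the hot spot idea, the sketch below counts documents per tier id and returns the densest cells, which is essentially what a facet on a tier id field would report. The class TierFacetSketch and the plain in-memory array of ids are assumptions for the example; in Solr the counting would come from faceting on the field rather than from code like this.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical in-memory stand-in for faceting on a tier id field.
class TierFacetSketch {
    // Return up to topN (tierId, docCount) entries, densest first.
    static List<Map.Entry<Long, Integer>> hotSpots(long[] tierIds, int topN) {
        Map<Long, Integer> counts = new HashMap<>();
        for (long id : tierIds) {
            counts.merge(id, 1, Integer::sum); // one more doc in this cell
        }
        List<Map.Entry<Long, Integer>> sorted = new ArrayList<>(counts.entrySet());
        sorted.sort((a, b) -> Integer.compare(b.getValue(), a.getValue()));
        return sorted.subList(0, Math.min(topN, sorted.size()));
    }
}
```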
One final feature: the projection method is an implementation of IProjector, which allows you to create your own projection.
Currently I'm using Sinusoidal, but you can do your own, such as:
- Google Mercator (I use a similar quad grid concept, just different projection method)
- Open Map
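
To show what swapping the projection might look like, here is a minimal sketch with a stand-in interface. The real IProjector signature is not shown in this post, so the Projector interface, its project method, and both class names below are hypothetical; the sinusoidal and spherical Mercator formulas are the standard unit-sphere ones.

```java
// Hypothetical stand-in for IProjector; the real interface may look different.
interface Projector {
    // Project latitude/longitude in degrees to {x, y} on a unit sphere.
    double[] project(double latDeg, double lonDeg);
}

// Sinusoidal: x = lon * cos(lat), y = lat (both in radians).
class SinusoidalSketch implements Projector {
    public double[] project(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        return new double[] { lon * Math.cos(lat), lat };
    }
}

// Spherical Mercator: x = lon, y = ln(tan(pi/4 + lat/2)).
class MercatorSketch implements Projector {
    public double[] project(double latDeg, double lonDeg) {
        double lat = Math.toRadians(latDeg);
        double lon = Math.toRadians(lonDeg);
        return new double[] { lon, Math.log(Math.tan(Math.PI / 4 + lat / 2)) };
    }
}
```

The grid code only needs the projected x and y, so in a design like this the choice of projection stays isolated behind the one interface.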
There's a lot that can be done, but we should stay focused on the primary goals, and iterate, iterate, iterate.