I welcome any suggestions you have pertaining naming (or again on anything). If you can suggest a better name to what "leafy branch pruning" does, then at a minimum it could be expressed in the javadocs. Not long ago it had another name but it was even worse . Naming is hard. Likewise... if SpatialPrefixTree might have a better literature/industry based name then I'd love to know what that is. When I named it as-such I looked around but didn't seem to find anything that was perfect. It's a type of Trie, a spatial trie, and "PrefixTree" is a synonym for Trie, so... there you go. Maybe I didn't look hard enough. I'm sorry if I seemed touchy on the terminology; I merely want to ensure we mutually understood what we were and weren't talking about – that's all. So when you proposed that what StreamingQuadPrefixTree did was leafy branch pruning, and as the coiner of the term I can see it didn't meet my definition; clarification is in order.
I want to make sure there is no confusion here (especially for anyone else willing to participate and the sensitivity surrounding my use of the pruneLeafyBranches term). SPT's in the context of lucene-spatial's definition, no? (since SPT is not a common name for a geo data structure its a derivative)
Yes, in the context of lucene-spatial's definition.
There are use-cases for traversing beyond the within state in GeoSpatial data structures. So someone might come along and contribute the "unexpected". Just want to be prepared for that future discussion.
Interesting; I'm curious what that might be.
A lot of it is quite big so I'll have to add to the build.xml to pull a compressed version from somewhere.
Yes, as you may have observed, this is how all existing test data is handled in the benchmark module: fetched & decompressed on-demand. I added the Geonames point data. And I added facilities to SpatialDocMaker to automatically turn the points into circles & rects of random size.
I'm looking forward to seeing PackedQuadPrefixTree kick ass in being compressed allowing distErrPct=0 – I'm still puzzled but I look forward to being pleasantly astonished. I do understand that it uses all 8 bits (instead of just 2 as the legacy impl) of the first number of bytes for the quadrant info, and that should lead to shorter terms / prefixes for higher-precision data. But I don't see that there would be any change in the number of terms, which is, as I see, the scalability challenge. I have an idea on solving this floating in my head but maybe I needn't ponder it longer if PackedQuadPrefixTree handles it somehow. It's not obvious to me but where in the code of PackedQuadCell are the 5 "depth bits" encoded & decoded?
In essence, the rough benchmark you performed doesn't benchmark default Lucene spatial behavior. Is that not a problem?
It is a problem – it's not ideal, that is. Preferably it would stay enabled but I think you indicated it's not supported by PackedQuadTree? I didn't look closer.
RE Geo3d: As an example for anything with me; it's an outlier, not an example that proves the rule. If I am blind to facts I can't see then show me. Hopefully you noticed that Karl and I are working together and it's come real far now (lots of discussion on ReviewBoard & bugs I helped Karl find via randomized testing). Your patch here has nothing in common with it – PackedQuadTree obviously belongs here, and it should be quite evident that I'm here helping by providing feedback and running a benchmark. And yes, being critical. Anyway... lets get back to work.
The only thing about this patch that is a blocker (-1) for me is StreamingPrefixTreeStrategy. Not most of the code in it, but it's existence as a SpatialStrategy subclass instead of an SPT subclass. I don't mind that it optimizes something that I don't think needed optimization, though I suspect if you noticed TreeCellIterator originally you wouldn't have bothered.
I have other by-line feedback I'd like to give (and would prefer to do so via ReviewBoard/GitHub) but we needn't steamroll this in under the mantra of "progress not perfection".