Details
- Improvement
- Status: Patch Available
- Major
- Resolution: Unresolved
- New
Description
LUCENE-4644 implemented the "IsWithin" predicate for a RecursivePrefixTree-based field. It is slow because it looks across the whole world to ensure it doesn't match docs with data anywhere outside the query shape. It can be configured to look outside the query shape only within a very small buffer distance; that filters out documents spanning the query shape boundary, but not indexed shapes composed of multiple disjoint parts. The solution proposed here is to index a point per disjoint part in such a way that it can be easily retrieved (e.g. DocValues), and then have a post-process of WithinPrefixTreeFilter remove false positives.
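To make the false-positive case concrete, here is a minimal sketch of the post-process check, not the actual WithinPrefixTreeFilter code. It assumes the query shape is an axis-aligned rectangle and that each document supplies one representative point per disjoint part; the class name `DisjointPartCheck` and the method `allPartsWithin` are hypothetical.

```java
/**
 * Illustrative post-filter check (hypothetical helper, not part of Lucene).
 * Assumption: the query shape is an axis-aligned rectangle, and each
 * document carries one representative (x, y) point per disjoint part.
 */
public final class DisjointPartCheck {
    private DisjointPartCheck() {}

    /** Returns true only if every representative point lies inside the rectangle. */
    public static boolean allPartsWithin(double[][] partPoints,
                                         double minX, double minY,
                                         double maxX, double maxY) {
        for (double[] p : partPoints) {
            double x = p[0];
            double y = p[1];
            if (x < minX || x > maxX || y < minY || y > maxY) {
                // A disjoint part lies outside the query shape, so the
                // document is a false positive of the small-buffer search.
                return false;
            }
        }
        return true;
    }
}
```

A document with one part at (1, 1) and another at (50, 50) has no data near the boundary of the rectangle (0, 0) to (10, 10), so a small-buffer search alone would accept it; the check above rejects it.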
This isn't particularly hard or advanced, but it requires advances in some APIs that aren't quite there yet. Spatial4j's ShapeCollection (aka WKT GeometryCollection or Multi*) needs to be released, and it needs a vertex iterator. There needs to be code to read and write a set of points to a BinaryDocValues field (one value per doc). And finally, of course, WithinPrefixTreeFilter needs a mode in which it uses the smallest buffer and then, at the end, checks the DocValues to remove false positives.
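The read/write step for a set of points could look roughly like the sketch below. It is only one possible layout, assumed for illustration: a 4-byte point count followed by each point as two 8-byte doubles. The class name `PointSetCodec` and its methods are hypothetical, and the sketch deliberately avoids Lucene classes so that it stands alone.

```java
import java.nio.ByteBuffer;

/**
 * Illustrative codec (hypothetical, not part of Lucene or Spatial4j) that
 * packs a set of (x, y) points into a single byte array, one array per doc.
 * Assumed layout: int count, then count pairs of doubles (x, y).
 */
public final class PointSetCodec {
    private PointSetCodec() {}

    /** Serializes the points into one byte array. */
    public static byte[] encode(double[][] points) {
        ByteBuffer buf = ByteBuffer.allocate(
                Integer.BYTES + points.length * 2 * Double.BYTES);
        buf.putInt(points.length);
        for (double[] p : points) {
            buf.putDouble(p[0]); // x
            buf.putDouble(p[1]); // y
        }
        return buf.array();
    }

    /** Reads back the points written by encode(). */
    public static double[][] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int count = buf.getInt();
        double[][] points = new double[count][2];
        for (int i = 0; i < count; i++) {
            points[i][0] = buf.getDouble();
            points[i][1] = buf.getDouble();
        }
        return points;
    }
}
```

In an index, the resulting bytes would presumably be wrapped in a `BytesRef` and stored through a `BinaryDocValuesField`; a more compact encoding would also be possible, and this layout is chosen here only for readability.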
Issue Links
- depends upon
  - LUCENE-4644 Implement spatial WITHIN query for RecursivePrefixTree (Closed)
  - LUCENE-4698 Overhaul ShapeFieldCache because its a memory pig (Closed)