  1. Lucene - Core
  2. LUCENE-4869

Optimize IsWithin spatial RPT to use a point cache for false-positive removal

Details

    • Improvement
    • Status: Open
    • Major
    • Resolution: Unresolved
    • modules/spatial
    • New

    Description

      LUCENE-4644 implemented the "IsWithin" predicate for a RecursivePrefixTree-based field. It's slow because it looks across the whole world to ensure it doesn't match docs with data anywhere outside the query shape. It can be configured to look outside the query shape only within a very small buffer distance; that filters out documents spanning the query shape boundary, but not indexed shapes composed of multiple disjoint parts, some of which lie beyond the buffer. The solution proposed here is to index a point per disjoint part in such a way that it can be easily retrieved (e.g. DocValues); a post-process step in WithinPrefixTreeFilter would then remove false positives.
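
The post-process step could look roughly like the sketch below. It is plain Java with no Lucene or Spatial4j dependency. `PointCachePostFilter`, `Rect`, and `removeFalsePositives` are hypothetical names invented for illustration, the query shape is simplified to an axis-aligned rectangle, and dropping docs that have no cached points is an assumption rather than a decided behavior.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical sketch: none of these names are Lucene or Spatial4j APIs.
final class PointCachePostFilter {

    /** Axis-aligned rectangle standing in for an arbitrary query shape. */
    static final class Rect {
        final double minX, minY, maxX, maxY;

        Rect(double minX, double minY, double maxX, double maxY) {
            this.minX = minX;
            this.minY = minY;
            this.maxX = maxX;
            this.maxY = maxY;
        }

        boolean contains(double x, double y) {
            return x >= minX && x <= maxX && y >= minY && y <= maxY;
        }
    }

    /**
     * Keeps only the candidate docs whose cached points (one per disjoint
     * part of the indexed shape) all fall inside the query.
     *
     * @param candidates  doc ids accepted by the small-buffer IsWithin pass
     * @param pointsByDoc doc id -> flat array [x0, y0, x1, y1, ...]
     */
    static List<Integer> removeFalsePositives(List<Integer> candidates,
                                              Map<Integer, double[]> pointsByDoc,
                                              Rect query) {
        List<Integer> accepted = new ArrayList<>();
        for (int docId : candidates) {
            double[] pts = pointsByDoc.get(docId);
            if (pts == null) {
                continue; // assumption: a doc with no cached points cannot be verified
            }
            boolean allInside = true;
            for (int i = 0; i + 1 < pts.length; i += 2) {
                if (!query.contains(pts[i], pts[i + 1])) {
                    allInside = false; // a disjoint part lies outside the query
                    break;
                }
            }
            if (allInside) {
                accepted.add(docId);
            }
        }
        return accepted;
    }
}
```

A doc whose shape has one part inside the query and another far away passes the small-buffer pass but is rejected here, because the point for the far part fails the containment test.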

      This isn't particularly hard or advanced, but it requires some API advances that aren't quite there yet. Spatial4j's ShapeCollection (aka WKT GeometryCollection or Multi*) needs to be released, and it needs a vertex iterator. There needs to be code to read and write a set of points to a BinaryDocValues field (one value per doc). And finally, of course, WithinPrefixTreeFilter needs a mode in which it uses the smallest buffer and then checks the DocValues at the end to remove false positives.
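
As a rough illustration of the read/write piece, the sketch below packs a doc's points into a single byte array and unpacks them again. The layout (an int point count followed by raw x,y doubles) and the `PointSetCodec` name are assumptions made up for this example, not an agreed format. In Lucene the resulting bytes would presumably be wrapped in a `BytesRef` and stored in a binary doc-values field, which is left out here to keep the example dependency-free.

```java
import java.nio.ByteBuffer;

// Hypothetical codec: the byte layout is an assumption, not a Lucene format.
final class PointSetCodec {

    /** Encodes [x0, y0, x1, y1, ...] as: int pointCount, then each double. */
    static byte[] encode(double[] xy) {
        if (xy.length % 2 != 0) {
            throw new IllegalArgumentException("expected x,y pairs");
        }
        ByteBuffer buf = ByteBuffer.allocate(Integer.BYTES + xy.length * Double.BYTES);
        buf.putInt(xy.length / 2);
        for (double v : xy) {
            buf.putDouble(v);
        }
        return buf.array();
    }

    /** Decodes the bytes produced by encode back into a flat x,y array. */
    static double[] decode(byte[] bytes) {
        ByteBuffer buf = ByteBuffer.wrap(bytes);
        int pointCount = buf.getInt();
        double[] xy = new double[pointCount * 2];
        for (int i = 0; i < xy.length; i++) {
            xy[i] = buf.getDouble();
        }
        return xy;
    }
}
```

Full doubles keep the sketch simple; a real implementation would likely want a more compact encoding, since the points only need enough precision to decide containment.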

            People

              Assignee: Unassigned
              Reporter: David Smiley (dsmiley)