[LUCENE-4942] Indexed non-point shapes index excessive terms - ASF JIRA

Details

Type: Improvement
Status: Closed
Priority: Major
Resolution: Fixed
Affects Version/s: None
Fix Version/s: 5.1
Component/s: modules/spatial
Labels: None

Lucene Fields: New

Description

Indexed non-point shapes are composed of a set of terms that represent grid cells. Cells that lie completely within the shape, or cells on the intersecting edge that are at the maximum detail depth being indexed for the shape, are denoted as "leaf" cells. The tokens for such cells have a trailing '+'. These tokens are actually indexed twice: once with the leaf byte and once without.
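To make the duplication concrete, here is a minimal sketch of the two token schemes. It is not Lucene code: the `(token, is_leaf)` cell representation, the function names, and the sample cells are assumptions made up for illustration, and the '+' string suffix stands in for the leaf byte.

```python
from typing import List, Tuple

LEAF_MARKER = "+"  # stands in for the trailing leaf byte described above

# Hypothetical representation of a grid cell: (token, is_leaf).
Cell = Tuple[str, bool]


def tokens_with_redundancy(cells: List[Cell]) -> List[str]:
    """Scheme described in the issue: a leaf cell yields two terms."""
    out: List[str] = []
    for token, is_leaf in cells:
        out.append(token)  # plain token, emitted for every cell
        if is_leaf:
            out.append(token + LEAF_MARKER)  # second copy carrying the leaf marker
    return out


def tokens_trimmed(cells: List[Cell]) -> List[str]:
    """Assumed trimmed scheme: a leaf cell yields only its marked term."""
    return [token + LEAF_MARKER if is_leaf else token for token, is_leaf in cells]


# Made-up cells for one shape: two ancestor cells and three leaves.
cells: List[Cell] = [
    ("A", False),
    ("AB", False),
    ("ABC", True),
    ("ABD", True),
    ("AC", True),
]

redundant = tokens_with_redundancy(cells)
trimmed = tokens_trimmed(cells)
```

The number of terms saved equals the number of leaf cells, so the relative saving depends on how many of a shape's cells are leaves.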

The TermQuery-based PrefixTree strategy doesn't consider the notion of "leaf" cells, so for it the tokens with '+' are completely redundant.

The recursive-algorithm-based PrefixTree strategy supports correct search of indexed non-point shapes better than the TermQuery-based one does, and for it the leaf distinction is relevant. However, the foundational search algorithms used by this strategy (Intersects and Contains; the other two are based on these) could each be upgraded to work correctly when each leaf cell is indexed only once. This is not trivial but very doable.
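As a rough illustration of why the leaf distinction matters to an Intersects-style search, the sketch below tests one query cell against the terms of one document, assuming each leaf cell is indexed only with its '+' marker. It is a simplified linear scan, not the strategy's actual recursive traversal of the terms dictionary; the function name and the sample terms are hypothetical.

```python
from typing import Iterable

LEAF_MARKER = "+"  # stands in for the trailing leaf byte


def intersects(indexed_terms: Iterable[str], query_cell: str) -> bool:
    """Return True if any indexed cell overlaps the query cell.

    A cell token that is a prefix of another denotes an ancestor cell.
    """
    for term in indexed_terms:
        is_leaf = term.endswith(LEAF_MARKER)
        cell = term[:-1] if is_leaf else term
        # The indexed cell lies inside (or equals) the query cell.
        if cell.startswith(query_cell):
            return True
        # The query cell lies inside an indexed leaf cell. A non-leaf
        # ancestor proves nothing here, because the shape may cover only
        # a different part of that ancestor.
        if is_leaf and query_cell.startswith(cell):
            return True
    return False


# Made-up terms for one document, with leaves indexed once.
doc_terms = ["A", "AB", "ABC+", "ABD+", "AC+"]

hit_inside_leaf = intersects(doc_terms, "ABCX")  # finer cell under leaf ABC
hit_ancestor = intersects(doc_terms, "AB")       # coarser cell containing indexed cells
miss_sibling = intersects(doc_terms, "ABE")      # under non-leaf AB only
```

In this sketch the marked term alone carries enough information: stripping the marker recovers the plain cell, so no second, unmarked copy of a leaf token is needed.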

In the end, spatial non-point indexes can probably be trimmed by ~40% by doing this.

Attachments

- LUCENE-4942_non-point_excessive_terms.patch (68 kB), David Smiley, 10/Mar/15 16:39
- LUCENE-4942_non-point_excessive_terms.patch (61 kB), David Smiley, 09/Mar/15 21:12
- LUCENE-4942-clone.diff (3 kB), Ryan McKinley, 11/Mar/15 16:53
- spatial.alg (4 kB), David Smiley, 10/Mar/15 16:03

Issue Links

relates to: LUCENE-5529 Spatial: Small optimization searching on indexed non-point shapes (Closed)

People

Assignee: David Smiley

Reporter: David Smiley

Votes: 1

Watchers: 8

Dates

Created: 18/Apr/13 14:29

Updated: 28/Aug/22 13:44

Resolved: 10/Mar/15 20:15