Description
I think it could be compelling if we restricted doc values to use an
iterator API at read time, instead of the more general random access
API we have today:
- It would make doc values disk usage more of a "you pay for what
you actually use" model, like postings, which is a compelling
reduction for sparse usage.
- I think codecs could compress better and maybe speed up decoding
of doc values, even in the non-sparse case, since the read-time
API is more restrictive "forward only" instead of random access.
- We could remove getDocsWithField entirely, since that's
implicit in the iteration, and the awkward "return 0 if the
document didn't have this field" would go away.
- We can remove the annoying thread locals we must make today in
CodecReader, and close the trappy "I accidentally shared a
single XXXDocValues instance across threads" bug, since an iterator
is inherently "use once".
- We could maybe leverage the numerous optimizations we've done for
postings over time, since the two problems ("iterate over doc ids
and store something interesting for each") are very similar.
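To make the points above concrete, here is a minimal sketch of what a forward-only read API over a sparse numeric field might look like. It is only an illustration under stated assumptions: the class names (SparseNumericIterator, DocValuesIteratorDemo), the method names and the in-memory array layout are all hypothetical and are not the API this issue proposes or any existing Lucene class. The sketch stores only the documents that have a value, so there is no separate "docs with field" lookup and no "return 0 if missing" case.

```java
/**
 * Hypothetical forward-only view over a sparse numeric field.
 * Illustration only; this is not Lucene's actual API.
 */
final class SparseNumericIterator {
    /** Sentinel returned once the iterator is exhausted (assumed convention). */
    static final int NO_MORE_DOCS = Integer.MAX_VALUE;

    private final int[] docIds;  // sorted ids of the documents that have a value
    private final long[] values; // values[i] belongs to docIds[i]
    private int index = -1;      // -1 means "not yet positioned"

    SparseNumericIterator(int[] docIds, long[] values) {
        if (docIds.length != values.length) {
            throw new IllegalArgumentException("docIds and values must have the same length");
        }
        this.docIds = docIds;
        this.values = values;
    }

    /** Moves to the next document that has a value, or returns NO_MORE_DOCS. */
    int nextDoc() {
        if (index < docIds.length) {
            index++;
        }
        return docID();
    }

    /** Moves forward to the first document with id >= target that has a value. */
    int advance(int target) {
        int doc;
        do {
            doc = nextDoc();
        } while (doc < target);
        return doc;
    }

    /** Current document id: -1 before the first call, NO_MORE_DOCS when exhausted. */
    int docID() {
        if (index < 0) {
            return -1;
        }
        return index < docIds.length ? docIds[index] : NO_MORE_DOCS;
    }

    /** Value for the current document; only valid while positioned on a document. */
    long longValue() {
        if (index < 0 || index >= docIds.length) {
            throw new IllegalStateException("iterator is not positioned on a document");
        }
        return values[index];
    }
}

final class DocValuesIteratorDemo {
    /** Sums the values of all documents that have the field, visiting each once. */
    static long sum(SparseNumericIterator it) {
        long total = 0;
        for (int doc = it.nextDoc();
             doc != SparseNumericIterator.NO_MORE_DOCS;
             doc = it.nextDoc()) {
            total += it.longValue();
        }
        return total;
    }

    public static void main(String[] args) {
        // Only 3 documents have a value; nothing is stored for the others.
        SparseNumericIterator it =
            new SparseNumericIterator(new int[] {2, 7, 40}, new long[] {10L, 20L, 30L});
        System.out.println(sum(it)); // prints 60
    }
}
```

Because the iterator carries its own position, each consumer would create its own instance, which is why sharing one across threads stops being a tempting mistake in this style.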
This idea has come up many times in the past, e.g. LUCENE-7253 is a
recent example, and very early iterations of doc values started with
exactly this.
However, it's a truly enormous change, likely 7.0 only. Or maybe we
could have the new iterator APIs also ported to 6.x side by side with
the deprecated existing random-access APIs.
Attachments
Issue Links
- breaks
-
SOLR-9599 DocValues performance regression with new iterator API
- Open
-
SOLR-10596 unique and hll functions don't work after first bucket
- Resolved
-
SOLR-11664 range facets with sub aggregations on string fields give incorrect results
- Closed
- is related to
-
SOLR-9837 Performance regression of numeric field uninversion time
- Resolved
-
SOLR-9582 TestSortingResponseWriter.testSortingOutput() failure: docs were sent out-of-order
- Resolved
-
LUCENE-7835 ToChildBlockJoinSortField to sort children by a parent field
- Patch Available
-
SOLR-13024 ValueSourceAugmenter - avoid creating new FunctionValues per doc
- Open
-
LUCENE-7253 Make sparse doc values and segments merging more efficient
- Resolved
-
LUCENE-7474 Improve doc values writers
- Resolved
-
LUCENE-10534 MinFloatFunction / MaxFloatFunction calls exists twice
- Closed
-
LUCENE-10542 FieldSource exists implementations can avoid value retrieval
- Closed
- relates to
-
SOLR-9628 Trie fields have unset lastDocId
- Resolved
-
LUCENE-7871 false positive match BlockJoinSelector[SortedDV] when child value is absent
- Closed
-
LUCENE-7835 ToChildBlockJoinSortField to sort children by a parent field
- Patch Available
-
LUCENE-7457 Default doc values format should optimize for iterator access
- Resolved
-
LUCENE-5542 Explore making DVConsumer sparse-aware
- Resolved
-
LUCENE-7459 LegacyNumericDocValuesWrapper should only check bits when the value is != 0
- Resolved
-
LUCENE-7461 Refactor doc values queries to better use the new doc values APIs
- Resolved
-
LUCENE-7462 Faster search APIs for doc values
- Resolved
-
LUCENE-7463 Create a Lucene70DocValuesFormat
- Resolved
-
LUCENE-7475 Sparse norms
- Resolved
-
LUCENE-7489 Improve sparsity support of Lucene70DocValuesFormat
- Resolved
-
LUCENE-7460 Should SortedNumericDocValues expose a per-document random-access API?
- Resolved