It is nice that DocValues gives us the freedom to do this, but... I'm
not sure we should, because it's a sizable performance trap.
I.e., we'll be silently inserting a call to ReaderUtil.subSearcher on
every doc value lookup (vs. previously, when it was a single top-level
array lookup).
While client code that has relied on this in the past will continue
to function properly if we make this change, its performance is going
to silently take a (possibly sizable) hit.
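To make the trap concrete, here's a minimal, Lucene-free sketch of what "down low" switching costs. SegmentValues and MultiValuesSketch are hypothetical stand-ins (not Lucene classes): every top-level lookup first binary-searches the segment start offsets, roughly the kind of work a ReaderUtil.subSearcher-style call adds, before it can do the actual per-segment array read.

```java
// Hypothetical per-segment values: one float per doc in a single segment.
final class SegmentValues {
    private final float[] values;
    SegmentValues(float[] values) { this.values = values; }
    float floatVal(int localDoc) { return values[localDoc]; }
    int maxDoc() { return values.length; }
}

// Hypothetical "down low" wrapper presenting several segments as one
// top-level DocValues. Every floatVal call must first locate the segment.
final class MultiValuesSketch {
    private final SegmentValues[] segments;
    private final int[] starts; // docBase of each segment, ascending

    MultiValuesSketch(SegmentValues... segments) {
        this.segments = segments;
        this.starts = new int[segments.length];
        int base = 0;
        for (int i = 0; i < segments.length; i++) {
            starts[i] = base;
            base += segments[i].maxDoc();
        }
    }

    // Last segment whose start <= doc (skips empty segments that share a
    // start). This O(log N) search is paid on EVERY lookup.
    private int segmentFor(int doc) {
        int lo = 0, hi = starts.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (starts[mid] <= doc) lo = mid; else hi = mid - 1;
        }
        return lo;
    }

    float floatVal(int doc) {
        int seg = segmentFor(doc);
        return segments[seg].floatVal(doc - starts[seg]);
    }
}
```

A caller looping over all docs through floatVal pays that search once per doc, and nothing in the signature tells it so.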
In general, with Lucene, we can do the per-segment switching "up high"
(which is what the core now does, exclusively), or we can do it "down
low" (creating MultiTermDocs, MultiTermEnum, MultiTermPositions,
MultiDocValues, etc.), which has sizable performance costs. It's also
costly for us because we'll have N different places where we must
create & maintain a MultiXXX class. I would love to someday deprecate
all of the "down low" switching classes.
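By contrast, here's a minimal sketch of switching "up high", again with a hypothetical class name and plain float arrays standing in for per-segment values: the outer loop resolves the segment once, and the caller adds docBase itself only when it needs a top-level docID.

```java
// Hypothetical sketch of switching "up high": the caller walks segments
// itself, so each per-doc lookup is a plain array read with no segment search.
final class UpHighSketch {
    // Returns the top-level docID holding the largest value (first one on
    // ties), or -1 if there are no docs. Each inner array is one segment.
    static int argMax(float[][] segments) {
        int best = -1;
        float bestVal = Float.NEGATIVE_INFINITY;
        int docBase = 0;
        for (float[] seg : segments) {              // switch once per segment
            for (int doc = 0; doc < seg.length; doc++) {
                if (best == -1 || seg[doc] > bestVal) {
                    best = docBase + doc;           // map back to top-level docID
                    bestVal = seg[doc];
                }
            }
            docBase += seg.length;
        }
        return best;
    }
}
```

The per-segment switch happens N times total (once per segment), instead of once per doc as in a MultiXXX-style wrapper.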
In the core I think we should always switch "up high". We've already
done this w/ searching and collection/sorting. In
fixing IndexSearcher.explain to do so as well.
With external code, I'd like over time to strongly encourage only
switching "up high" as well.
Maybe it'd be best if we could somehow allow this "down low" switching
for 2.9, but 1) warn that you'll see a performance hit right off, 2)
deprecate it, and 3) somehow state that in 3.0 you'll have to send
only a SegmentReader to this API instead.
E.g., imagine an app that created an external custom HitCollector that
calls, say, FloatFieldSource on the top reader in order to use a
float value per doc in each collect() call. On upgrading to 2.9, this
app will already have to make the switch to the Collector API, which'd
be a great time for them to also switch to pulling these float
values per segment. But if we make the proposed change here, the app
could in fact just keep working off the top-level values (e.g., if the
ctor in their class is pulling these values), thinking everything is
fine when in fact there is a sizable, silent perf hit. I'd prefer that
in 2.9 they also switch their DocValues lookup to be per segment.
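For the collector scenario above, here's a hedged sketch of the per-segment pattern. SegmentCollector, setNextSegment, ThresholdCollector and the search driver are all hypothetical; they're loosely modeled on the 2.9 Collector API's setNextReader(IndexReader reader, int docBase) / collect(int doc) pair, with plain float arrays standing in for the values the app would pull for each segment.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, Lucene-free stand-in for a per-segment collector.
interface SegmentCollector {
    void setNextSegment(float[] segmentValues, int docBase);
    void collect(int localDoc);
}

// Collects top-level docIDs whose per-doc float exceeds a threshold.
final class ThresholdCollector implements SegmentCollector {
    private final float threshold;
    private float[] values;   // this segment's values, pulled once per segment
    private int docBase;      // offset of this segment in the top-level reader
    final List<Integer> hits = new ArrayList<>();

    ThresholdCollector(float threshold) { this.threshold = threshold; }

    @Override
    public void setNextSegment(float[] segmentValues, int docBase) {
        this.values = segmentValues;   // switch "up high", not in the ctor
        this.docBase = docBase;
    }

    @Override
    public void collect(int localDoc) {
        // Direct per-segment array read; no segment search per doc.
        if (values[localDoc] > threshold) hits.add(docBase + localDoc);
    }

    // Hypothetical driver standing in for the searcher's per-segment loop.
    static void search(float[][] segments, SegmentCollector c) {
        int base = 0;
        for (float[] seg : segments) {
            c.setNextSegment(seg, base);
            for (int doc = 0; doc < seg.length; doc++) c.collect(doc);
            base += seg.length;
        }
    }
}
```

The design point is that the values are fetched in setNextSegment rather than once in the constructor against the top reader, so nothing silently routes through a top-level wrapper.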
[Aside: once we gain clarity on LUCENE-831, hopefully we can do away
with FieldSource, etc. I.e., these classes
basically copy what FieldCache does, but expose a per-doc method call
instead of a fixed array lookup.]