Some of your comments seem to indicate you think we will need to end up with an object rather than raw arrays?
Well, really I threw out all these future items to stir up the pot and
see if some clarity comes out of it. This is what I try to do
whenever I'm stuck on how to design something... some sort of defense
That said, what requires an object instead of an array? For binary
fields (deleted docs) we'd have e.g. "BitVector getBits(...)".
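
To make the "arrays for simple cases, something else for binary fields" idea concrete, here is a minimal sketch. All names (FieldValues, IntFieldValues, DeletedDocsValues) are hypothetical and not part of any Lucene API, and java.util.BitSet only stands in for BitVector so the example is self-contained:

```java
import java.util.BitSet;

// Hypothetical sketch, not the actual Lucene API: one accessor per
// representation, and an implementation throws for shapes it cannot serve.
interface FieldValues {
    int[] getInts();   // simple case: one int per document
    BitSet getBits();  // binary case: one bit per document
}

// Backed by a plain array, as in the simple numeric case.
final class IntFieldValues implements FieldValues {
    private final int[] values;

    IntFieldValues(int[] values) {
        this.values = values;
    }

    public int[] getInts() {
        return values;
    }

    public BitSet getBits() {
        throw new UnsupportedOperationException("not a binary field");
    }
}

// Deleted docs viewed as a binary "value per doc".
final class DeletedDocsValues implements FieldValues {
    private final BitSet deleted;

    DeletedDocsValues(BitSet deleted) {
        this.deleted = deleted;
    }

    public int[] getInts() {
        throw new UnsupportedOperationException("not an int field");
    }

    public BitSet getBits() {
        return deleted;
    }
}
```

Callers that only need ints still get a raw array back; nothing here forces an object per value.
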
For multi-valued fields, I'm not sure what's best. I think Yonik did
something neat with Solr for holding multi-valued fields but I can't
find it now. But, with ValueSource, we have the freedom to use arrays
for simple cases and something else for interesting ones? It's not
And we would want to lose exposing Parser so that CFS can be a seamless backing.
I see the CFS/CSF confusion has already struck!
But yes cleaner API would be a nice step forward...
We have it? Just pass the CSFValueSource at IndexReader creation?
Yes I think we have this one.
Though... I feel like ValueSource should represent a single field's
values, and something else (FieldType?) returns the ValueSource for
that field. Ie, I think we are overloading ValueSource now?
Good point. We need a way to update, that can throw UnsupportedOperationException?
Maybe... or we can defer for future. We don't need full answers nor
impls for all of these now...
> Possible future when Lucene computes sort cache (for text fields)
> and stores in the index
I'm not familiar with that idea, so not sure what effect this has...
Sort cache is just getStringIndex()... all other types just use the
values directly (no need for separate ords). If it's costly to
compute per-reopen we may want to store it in the index. But
honestly, since we load the full thing into RAM, I wonder how
different the time'd really be loading it vs recomputing it.
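
For readers unfamiliar with what the sort cache holds, here is a rough sketch of the ords-plus-lookup structure that getStringIndex() returns. The class name StringIndexSketch and the idea of building it from an in-memory String[] are assumptions for illustration only; the real thing is computed by walking the terms in the index:

```java
import java.util.Arrays;
import java.util.TreeSet;

// Illustrative only: builds an ord-per-doc array plus a sorted term table
// from per-document string values already held in memory.
final class StringIndexSketch {
    final String[] lookup; // sorted unique terms; slot 0 is reserved for "no value"
    final int[] order;     // order[doc] is the index of that doc's term in lookup

    StringIndexSketch(String[] perDocValues) {
        // Collect the unique terms in sorted order.
        TreeSet<String> unique = new TreeSet<>();
        for (String v : perDocValues) {
            if (v != null) {
                unique.add(v);
            }
        }
        lookup = new String[unique.size() + 1];
        int next = 1;
        for (String term : unique) {
            lookup[next++] = term;
        }
        // Map each document to the ordinal of its term (0 when it has none).
        order = new int[perDocValues.length];
        for (int doc = 0; doc < perDocValues.length; doc++) {
            String v = perDocValues[doc];
            order[doc] = (v == null)
                ? 0
                : Arrays.binarySearch(lookup, 1, lookup.length, v);
        }
    }
}
```

Sorting then compares order[docA] with order[docB] as ints, which is why only text fields need this extra structure.
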
Good point again. Getting norms under this API will add a bit more meat to this issue.
Yeah I'm not sure whether norms/deleted docs "fit"; certainly we'd
need updatability first. It's just that, from a distance, they are
clearly a "value per doc" for every doc in the index. If we had norms
& deletions under this API then suddenly, [almost] for free, we'd get
pluggability of deleted docs & norms.
I am kind of liking Uwe's idea of assigning ValueSources per field, though that could probably get messy. Perhaps a default, and then per field overrides?
I'm also more liking "per field" to be somehow handled. Whether
IndexReader exposes that vs a FieldType (that also holds other
per-field stuff), I'm not sure.
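
One possible shape for "a default, and then per field overrides", purely as a sketch. ValueSourceSketch and PerFieldValueSources are made-up names; whether this lookup would live on IndexReader or on a FieldType is exactly the open question above:

```java
import java.util.HashMap;
import java.util.Map;

// Stand-in for whatever a single field's value source ends up being.
interface ValueSourceSketch {
    String name();
}

// Hypothetical registry: one default source, optional per-field overrides.
final class PerFieldValueSources {
    private final ValueSourceSketch defaultSource;
    private final Map<String, ValueSourceSketch> overrides = new HashMap<>();

    PerFieldValueSources(ValueSourceSketch defaultSource) {
        this.defaultSource = defaultSource;
    }

    // Registers an override and returns this so calls can be chained.
    PerFieldValueSources override(String field, ValueSourceSketch source) {
        overrides.put(field, source);
        return this;
    }

    // Fields without an override fall back to the default source.
    ValueSourceSketch forField(String field) {
        return overrides.getOrDefault(field, defaultSource);
    }
}
```

With this, something like new PerFieldValueSources(uninverting).override("price", csf) would serve "price" from the CSF-backed source and every other field from the default.
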
Is anybody updating norms on a regular basis on a serious project?
This is a good question – I'd love to know too.
But I think updating CSFs would be compelling; having to reindex the
entire doc because only 1 or 2 metadata fields had changed is a common
annoyance. Of course we'd have to figure out (or rule out) updating
the postings for such changes...