I didn't propose that we add a DV format, I was saying that if there was one, then a DirectFacets format would make sense, b/c the app wouldn't need to write special code to work with it ... it would just return the ints more efficiently.
And we're abusing DV now, just like we abused payloads before, so nothing has changed.
I did propose on another issue (forgot where, maybe the migration layer issue?) to develop a FacetsCodec, but you were against it. Perhaps after you worked on DV 2.0 you now think it makes more sense? It will solve a slew of problems I think.
This FacetsCodec today is mimicked by CategoryListIterator, which exposes that getInts API. But Mike and I saw that the DV abstraction (getBytes) + CLI (getInts) hurts performance, therefore the *fast* aggregators / collectors sidestep the CLI abstraction and use only DV. On LUCENE-4764, Mike sidesteps the DV abstraction too, which results in more duplicated code. I'm all for those specializations, but they become harder to maintain. I just think of all the places we'd need to change if someone finds a better encoding than gap+vint.
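To make the gap+vint point concrete, here is a rough sketch of what such an encoding looks like: sorted ordinals are stored as deltas, and each delta is written as a variable-length int. The class and method names are made up for illustration, and the byte layout (low-order 7 bits first) is an assumption, not necessarily what the facet module's encoder writes. Every specialized aggregator that inlines a decode loop like this one is a place we'd have to touch if the encoding changed.

```java
import java.io.ByteArrayOutputStream;
import java.util.Arrays;

// Hypothetical illustration only; not the facet module's actual encoder.
final class GapVIntSketch {

  /** Encodes sorted, non-negative ordinals as VInt-encoded gaps. */
  static byte[] encode(int[] sortedOrdinals) {
    ByteArrayOutputStream out = new ByteArrayOutputStream();
    int prev = 0;
    for (int ord : sortedOrdinals) {
      int gap = ord - prev; // store the delta to the previous ordinal
      prev = ord;
      // Write 7 bits per byte; the high bit means "more bytes follow".
      while ((gap & ~0x7F) != 0) {
        out.write((gap & 0x7F) | 0x80);
        gap >>>= 7;
      }
      out.write(gap);
    }
    return out.toByteArray();
  }

  /** Decodes the bytes produced by encode() back into ordinals. */
  static int[] decode(byte[] bytes) {
    int[] ords = new int[bytes.length]; // upper bound: at least one byte per value
    int count = 0;
    int prev = 0;
    int pos = 0;
    while (pos < bytes.length) {
      int value = 0;
      int shift = 0;
      byte b;
      do {
        b = bytes[pos++];
        value |= (b & 0x7F) << shift;
        shift += 7;
      } while ((b & 0x80) != 0);
      prev += value; // undo the gap
      ords[count++] = prev;
    }
    return Arrays.copyOf(ords, count);
  }
}
```

If that decode loop lived behind one getInts-style call, a better encoding would be a one-place change.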
Plus, the specialization doesn't serve the different facet features. I.e. if I'm interested in fast sum-score, I need to write a specialized one. If I'm interested in fast sum-association, I need to write one. Just to be clear, I'm not complaining and I think it makes sense for expert apps to write some specialized code. What I am saying is that if we could make the abstractions FAST, then we'd lower the bar of when apps would need to do that ...
So far, our latest optimizations only pertain to the counting case. It is the common case and I think it's important that we did that. Perhaps the rest of the API changes improved the other cases too, but it's clear that if we want to really speed them up, we should specialize them.
Maybe if we had a FacetsCodec, with CategoryListFormat (an extension to Codec, private to Facets), then all facet features would benefit out-of-the-box from LUCENE-4764 and this issue. Because that format will expose what facets need - a getInts API. And if we make this one a Codec and FastDV a Codec, then we force the app to declare a special facets Codec anyway, so at least from that aspect, we won't require more ...
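Roughly what I have in mind, as a sketch only: every type and method name below is hypothetical (nothing here exists in Lucene today), and I'm assuming the format can hand back a document's ordinals as an int[]. The point is that counting and sum-score are both written against the same getInts call, so whatever makes that call fast benefits both without a specialized version of each.

```java
// Hypothetical names throughout; none of these types exist in Lucene.
interface CategoryListSource {
  /** Returns the category ordinals of the given doc; how they are stored is hidden. */
  int[] getInts(int docID);
}

final class FacetFeaturesSketch {

  /** Counting: one increment per ordinal of every matching doc. */
  static int[] count(CategoryListSource source, int[] docs, int numOrdinals) {
    int[] counts = new int[numOrdinals];
    for (int doc : docs) {
      for (int ord : source.getInts(doc)) {
        counts[ord]++;
      }
    }
    return counts;
  }

  /** Sum-score: the same loop, accumulating the doc's score instead of 1. */
  static float[] sumScore(CategoryListSource source, int[] docs, float[] scores, int numOrdinals) {
    float[] sums = new float[numOrdinals];
    for (int i = 0; i < docs.length; i++) {
      for (int ord : source.getInts(docs[i])) {
        sums[ord] += scores[i]; // scores[i] is the score of docs[i]
      }
    }
    return sums;
  }
}
```

Sum-association would be a third method with the same shape, provided the format also exposed the association values.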
And if we do a FacetsCodec w/ CategoryListFormat, then at first it can continue to abuse DV, but then in the future we can explore a different format to manage the per-document categories (and support category associations). Maybe even a way to manage the taxonomy in the main index, in its own data structure ...
Perhaps these two issues show the usefulness of having such a Codec?