Are you sure only one thing makes sense? What if I need integers that are larger than a short, but the range of values (max-min)
is actually small? Then a packed impl could make more sense. So we should think about this...
I understand your point. I am a big supporter of packed ints myself and will probably use them more often than fixed ints, but I still think that fixed_ints would be a good default (no one would be surprised if the doc values of a field declared as an int in their schema required 4 bytes per value).
But if Lucene were able to switch automatically from packed ints to fixed_ints whenever fixed_ints have less than x% overhead compared to packed ints, this would be great!
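To make the "less than x% overhead" idea concrete, here is a rough sketch in plain Java. It does not use Lucene's packed ints API: the class name, the method names, and the definition of overhead as (fixed bits - packed bits) / packed bits are all assumptions made for illustration.

```java
// Hypothetical helper, not part of Lucene: decides between a fixed-width and a
// packed encoding from the min/max of the values seen for a field.
public final class IntEncodingChooser {

    /** Bits per value needed if each value is stored as an offset from min. */
    public static int bitsRequired(long min, long max) {
        long range = max - min;
        if (range < 0) {
            return 64; // max - min overflowed, so the full 64 bits are needed
        }
        // A range of 0 still costs 1 bit per value in this sketch.
        return Math.max(1, 64 - Long.numberOfLeadingZeros(range));
    }

    /** Smallest fixed width (8, 16, 32 or 64 bits) that holds min and max directly. */
    public static int fixedBits(long min, long max) {
        if (min >= Byte.MIN_VALUE && max <= Byte.MAX_VALUE) return 8;
        if (min >= Short.MIN_VALUE && max <= Short.MAX_VALUE) return 16;
        if (min >= Integer.MIN_VALUE && max <= Integer.MAX_VALUE) return 32;
        return 64;
    }

    /**
     * Returns true when the fixed encoding wastes at most maxOverhead
     * (e.g. 0.20 for 20%) relative to the packed encoding.
     */
    public static boolean preferFixed(long min, long max, double maxOverhead) {
        int packed = bitsRequired(min, max);
        int fixed = fixedBits(min, max);
        double overhead = (fixed - packed) / (double) packed;
        return overhead <= maxOverhead;
    }
}
```

With a 20% threshold, values between 100000 and 100010 need 4 packed bits against 32 fixed bits, so packed wins. Values between 0 and 2^30 need 31 packed bits against 32 fixed bits, so fixed wins.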
Well, I don't think there should be so many types.
If you want to sort on a String field, there are 6 available types. And I think it should be easy for people getting started with Solr to do simple things such as sorting data without having to understand the different trade-offs of these doc values types in order to choose one. Otherwise the risk is that they keep using the field cache instead because they find it more convenient.
(I hate this argument because some people will certainly have trouble with SORTED doc values on a unique field of a very large index, but anyway it is still better than the field cache?)
In my opinion, instead of IndexWriter streaming doc values to the codec directly, only to have the codec buffer them up in RAM and use
Counter for accounting, IndexWriter should buffer and things like STRAIGHT/VAR would just be optimizations...
I'm still worried about this case: I don't like them being treated as stored fields. It's only going to be more seeks if people have disk-enabled doc values that we must fetch in addition to the stored fields.
I haven't looked at the relevant bits, but is it possible we could treat "*" as still meaning just the stored fields? Basically, if you CHOOSE to
request them, you get them, but we don't do anything trappy.
If we allow for direct doc values, it makes sense not to load them by default, but I think we should add documentation to the example schema.xml so that people know that it is wasteful to store fields if doc values are enabled and in memory, and that they can be added very easily to the response by adding the field name to the fl parameter.
In case the unique key has doc values and is not stored, maybe it still makes sense to fetch it when fl=*?
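As a rough sketch of the rule being discussed ("*" means stored fields only, doc values are returned only when named explicitly, with a possible exception for the unique key), something like the following could work. This is plain Java and not Solr code: `FieldListResolver`, `Field`, and `resolve` are hypothetical names, and the unique key exception is only the suggestion raised above, not settled behavior.

```java
import java.util.Collection;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

// Hypothetical illustration of how an fl list could be resolved to field names.
public final class FieldListResolver {

    /** Minimal stand-in for a schema field; not Solr's schema classes. */
    public static final class Field {
        final String name;
        final boolean stored;
        final boolean docValues;

        public Field(String name, boolean stored, boolean docValues) {
            this.name = name;
            this.stored = stored;
            this.docValues = docValues;
        }
    }

    /** Returns the field names to put in the response, in schema order. */
    public static Set<String> resolve(List<String> fl, Collection<Field> schema, String uniqueKey) {
        Set<String> out = new LinkedHashSet<>();
        boolean star = fl.contains("*");
        for (Field f : schema) {
            boolean explicit = fl.contains(f.name);
            if (f.stored && (star || explicit)) {
                out.add(f.name); // stored fields: returned for "*" or by name
            } else if (f.docValues && explicit) {
                out.add(f.name); // doc values: only when you CHOOSE to request them
            } else if (f.docValues && star && f.name.equals(uniqueKey)) {
                out.add(f.name); // assumed exception: non-stored unique key with doc values
            }
        }
        return out;
    }
}
```

With a schema where only title is stored and both id (the unique key) and price have doc values, fl=* would return id and title, while fl=*,price would also return price.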