Thanks for looking at this! I'll try to split this DocValuesHolder class into separate classes per DV type.
In your patch you do an anonymous class for each one but I think it would be clearer to not do that and eliminate the DocValuesHolder which is holding for multiple types of values at once which is a little confusing.
MemoryIndex is supposed to be thread safe after the freeze() method has been invoked. (this is describe in the jdocs) Returning the same doc values instance for binary, sorted, number and sorted number doc values is fine, but not for sorted set doc values. That implementation keeps state in order to iterate over the values. So for sorted set doc values we always need to return a new instance and in order to be consistent with the other doc values types I did they same thing there too. I think this isn't a big of deal.
Could you move fields() above MemoryFields so it reads more naturally?
Sure, will do
Why is there a different sorting approach to sorted numeric vs BytesRef DV types? The former you add to an array and sort eventually, and the latter you use a TreeSet. They could both be handled consistently – array first then sort later.
That is what I thought initially too, but the difference between the two is that sorted numeric DV can hold duplicates while sort set DV doesn't hold duplicates. This is why I took the approach of using treeset during the building phase for any binary DV.
I noticed the BytesRefs DV data isn't copied; we retain a reference. Is that allowed?
Good point, let me fix this.