I am not acknowledging there is a problem. I'm just telling you that if you have 'sparse' values in a docvalues field, and you want to emulate what FieldCache does (letting you optionally pull a bitset that tells you whether a value does or doesn't exist), you can do the same thing yourself at index time today.
I'm against changing the "default" of 0 because it's both unnecessary and unhelpful for differentiating whether a value exists in the field. It won't work: for numeric types the default could be a "real value". That's why FieldCache does this as a bitset, and that's why FieldCache has a "hardcoded default" of 0. I don't want to add unnecessary complexity that ultimately provides no benefit (because that solves nothing, sorry).
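To make the ambiguity concrete, here is a minimal sketch in plain Java. It does not use any Lucene classes: SparseColumnSketch and its methods are hypothetical names made up for illustration, and the long[] just stands in for a dense per-document column with a default of 0.

```java
import java.util.BitSet;

// Hypothetical stand-in for a dense per-document numeric column; not a Lucene class.
final class SparseColumnSketch {
    private final long[] values;        // one slot per document, 0 when nothing was indexed
    private final BitSet docsWithValue; // bit set only for documents that supplied a value

    SparseColumnSketch(int maxDoc) {
        this.values = new long[maxDoc];
        this.docsWithValue = new BitSet(maxDoc);
    }

    // Record a value and remember that this document really has one.
    void set(int docId, long value) {
        values[docId] = value;
        docsWithValue.set(docId);
    }

    // Returns the stored value, or the default 0 for documents without one.
    long get(int docId) {
        return values[docId];
    }

    // The only reliable way to tell "real 0" apart from "missing".
    boolean exists(int docId) {
        return docsWithValue.get(docId);
    }

    public static void main(String[] args) {
        SparseColumnSketch col = new SparseColumnSketch(3);
        col.set(0, 0L);   // a real value that happens to be 0
        col.set(2, 42L);  // doc 1 never gets a value
        for (int doc = 0; doc < 3; doc++) {
            System.out.println("doc " + doc + " value=" + col.get(doc) + " exists=" + col.exists(doc));
        }
    }
}
```

Docs 0 and 1 both read back as 0, and only the bitset tells them apart. Picking a different default just moves the collision to a different number.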
I'm not opposed to allowing the comparators to take a Bits from somewhere other than the FieldCache (which I think always returns MatchAllBits for dv fields). This way, if someone wants this, they can do it. I do have some reservations about it, because it doesn't give 1-1 consistency with the FieldCache API (so it wouldn't "automatically" work for function queries without giving them special ctors too). So this would make the APIs harder to use, and I don't like that... but it's an option, and it's totally clear to the user what is happening.
I'm significantly less opposed to supporting an equivalent to FieldCache.getDocsWithField for docvalues. The advantage is that we could pass FieldCache.getDocsWithField through to it, and sort missing-first/last, function query exists() and so on would automatically work. The downside is that it adds some complexity under the hood to deal with (e.g. IndexWriter consumers, codec APIs need to change, codecs need to optimize for the case where none are missing, etc). And is this really complexity we should be adding for what is supposed to be a column-stride type (like norms)? I'm just not sure it's the right tradeoff.
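As a rough illustration of what "automatically work" would mean for sort missing-last, here is a sketch in plain Java. It is not the Lucene comparator code: MissingLastSortSketch and sortMissingLast are hypothetical names, and a java.util.BitSet stands in for whatever a docs-with-field lookup would return.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.Comparator;

// Hypothetical sketch of a missing-last sort driven by a docs-with-field bitset.
final class MissingLastSortSketch {

    // Returns doc ids ordered by ascending value, with value-less docs at the end.
    static Integer[] sortMissingLast(long[] values, BitSet docsWithField) {
        Integer[] docs = new Integer[values.length];
        for (int i = 0; i < docs.length; i++) {
            docs[i] = i;
        }
        Comparator<Integer> cmp = (a, b) -> {
            boolean hasA = docsWithField.get(a);
            boolean hasB = docsWithField.get(b);
            if (hasA != hasB) {
                return hasA ? -1 : 1;          // a doc with a value sorts before a doc without one
            }
            if (!hasA) {
                return Integer.compare(a, b);  // both missing: fall back to doc id
            }
            int c = Long.compare(values[a], values[b]);
            return c != 0 ? c : Integer.compare(a, b); // tie-break on doc id
        };
        Arrays.sort(docs, cmp);
        return docs;
    }

    public static void main(String[] args) {
        long[] values = {5L, 0L, 0L, -3L};     // doc 1 holds only the default 0
        BitSet docsWithField = new BitSet(4);
        docsWithField.set(0);
        docsWithField.set(2);                  // doc 2 holds a real 0
        docsWithField.set(3);
        System.out.println(Arrays.toString(sortMissingLast(values, docsWithField)));
        // prints [3, 2, 0, 1]
    }
}
```

The comparator never looks at the stored default for a missing doc, which is why the default itself does not need to change.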