From the dev list discussion:
My original post:
Zero is different from not existing. Let's say I want to process a
stream and, for example, facet on an integer field over the result set.
There's no way on the client side to distinguish between a document that
has a zero in the field and one that didn't have the field in the first
place, so I'll over-count the zero bucket.
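To make the over-counting concrete, here is a minimal client-side sketch in Python. It does not call Solr: the documents, the field name `fieldA`, and the `facet` helper are all hypothetical stand-ins for tuples read from a stream.

```python
from collections import Counter

# Hypothetical documents; the third one has no value for fieldA at all.
docs = [{"id": "1", "fieldA": 0}, {"id": "2", "fieldA": 5}, {"id": "3"}]


def facet(tuples, field):
    """Count tuples per field value; a missing field is counted under None."""
    return Counter(t.get(field) for t in tuples)


# What the client sees if a missing value is emitted as 0.
zero_filled = [{**d, "fieldA": d.get("fieldA", 0)} for d in docs]

# What the client sees if a missing value is simply left out (null).
null_preserving = docs

overcounted = facet(zero_filled, "fieldA")
accurate = facet(null_preserving, "fieldA")

print(overcounted[0])  # zero bucket includes the document with no value
print(accurate[0], accurate[None])  # zero and missing are counted separately
```

Once the missing value has been turned into 0, nothing in the tuple lets the client recover which documents really contained a zero.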
From Dennis Gove:
Is this true for non-numeric fields as well? I agree that this seems like a very bad thing.
I can't imagine that a fix would cause a problem with Streaming Expressions, ParallelSQL, or other components, given that the /select handler is not returning 0 for these missing fields. The /select handler is the default handler for the Streaming API, so if nulls were a problem I imagine we'd have already seen it.
That said, within Streaming Expressions there is a select(...) function that supports a replace(...) operation, which lets you replace one value (or null) with some other value. If a 0 were necessary, one could use select(...) to replace null with 0 using an expression like this:
select(<stream>, replace(fieldA, null, withValue=0)).
The end result is that fieldA would never have a null value: in every tuple where it was null, it would be replaced with 0.
Details on the select function can be found at https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=61330338#StreamingExpressions-select.
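As an illustration only, the effect of replacing null with 0 can be mimicked on plain Python dictionaries. This is a sketch of the semantics described above, not Solr's implementation; `replace_null` and the sample tuples are hypothetical, and it assumes a null field may show up either as an absent key or as an explicit `None`.

```python
def replace_null(tuples, field, with_value):
    """Yield copies of tuples with a missing or None field set to with_value."""
    for t in tuples:
        if t.get(field) is None:
            yield {**t, field: with_value}
        else:
            yield dict(t)


# Hypothetical stream: one real value, one absent field, one explicit null.
stream = [{"id": "1", "fieldA": 7}, {"id": "2"}, {"id": "3", "fieldA": None}]
result = list(replace_null(stream, "fieldA", 0))

for t in result:
    print(t["id"], t["fieldA"])
```

The point of doing this as an explicit step is that the substitution becomes opt-in: a client that needs zeros can ask for them, while a client that needs to tell zero from missing still can.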
And to answer Dennis's question, null gets returned for string DocValues fields.
- is related to
LUCENE-7548 Docvalues sorting treats empty values as the default