Consider a table that has STRING, CHAR and VARCHAR columns, where some of the values in these columns are empty strings.
If I query the number of distinct values with DataSketches HLL in Impala, the empty string is counted as a value and adds 1 to the result.
However, Hive's implementation omits empty strings, so for the example above Hive would return 1 for each column.
I assume Hive omits empty strings because of this line:
The first step of this task is to decide which approach is the correct one; the second step is to adjust Impala accordingly, if we decide to follow Hive's behaviour.
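
To make the two behaviours concrete, here is a minimal sketch in Python. It is not the Impala or Hive code: a plain set stands in for the HLL sketch (so the count is exact rather than estimated), `count_distinct` and `skip_empty` are hypothetical names, and skipping NULLs in both variants is an assumption.

```python
def count_distinct(values, skip_empty):
    """Exact distinct count; a set stands in for the HLL sketch (assumption)."""
    seen = set()
    for v in values:
        if v is None:
            # Assumption: NULLs are not fed to the sketch in either variant.
            continue
        if skip_empty and v == "":
            # Variant that omits empty strings, as described for Hive above.
            continue
        seen.add(v)
    return len(seen)


# Hypothetical column: one non-empty value, two empty strings, one NULL.
column = ["a", "", "", None]

print(count_distinct(column, skip_empty=False))  # 2: the empty string counts
print(count_distinct(column, skip_empty=True))   # 1: the empty string is omitted
```

Whichever behaviour we pick, the difference is always at most 1 per column, but it is visible for low-cardinality columns like the one here.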
Btw, in Impala this function adds strings to the HLL sketches: