Details
Type: Improvement
Status: Open
Priority: Major
Resolution: Unresolved
Description
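There are two styles of encoding nulls in dictionary arrays: masked (nulls are marked in the indices' validity bitmap) and encoded (null is stored as a value in the dictionary, and the indices point to it). In compute::DictionaryEncode this is controlled by an option. Today, passing an array that is already dictionary-encoded into DictionaryEncode is a no-op, whatever that option says.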
Instead, it should check whether the input already uses the requested null encoding (this is easily checked in constant time, e.g. from the null counts of the indices and of the dictionary) and, if not, convert it.
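A minimal Python sketch of that check and conversion, using a toy model rather than Arrow's API: `DictArray`, `to_encoded`, `to_masked` and `decode` are hypothetical names. Null counts are computed once at construction, standing in for Arrow's cached null counts, so checking the style is O(1); converting rewrites only the indices and adds or drops at most the null dictionary entry.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DictArray:
    """Hypothetical toy dictionary array (not Arrow's API).

    indices: None marks a masked null slot.
    dictionary: a None entry is an encoded null value.
    """
    indices: List[Optional[int]]
    dictionary: List[Optional[str]]
    indices_null_count: int = field(init=False)
    dictionary_null_count: int = field(init=False)

    def __post_init__(self):
        # Cached once, so later style checks are constant time.
        self.indices_null_count = sum(i is None for i in self.indices)
        self.dictionary_null_count = sum(v is None for v in self.dictionary)


def decode(arr: DictArray) -> List[Optional[str]]:
    """Materialize the logical values (used here only to check equivalence)."""
    return [None if i is None else arr.dictionary[i] for i in arr.indices]


def to_encoded(arr: DictArray) -> DictArray:
    """Masked -> encoded: point null slots at a null dictionary entry."""
    if arr.indices_null_count == 0:
        return arr  # already encoded: nothing to do
    dictionary = list(arr.dictionary)
    try:
        null_pos = dictionary.index(None)  # reuse an existing null entry
    except ValueError:
        null_pos = len(dictionary)
        dictionary.append(None)
    indices = [null_pos if i is None else i for i in arr.indices]
    return DictArray(indices, dictionary)


def to_masked(arr: DictArray) -> DictArray:
    """Encoded -> masked: drop null dictionary entries, mask slots that used them."""
    if arr.dictionary_null_count == 0:
        return arr  # already masked: nothing to do
    remap: List[Optional[int]] = []  # old dictionary position -> new position
    dictionary: List[Optional[str]] = []
    for value in arr.dictionary:
        if value is None:
            remap.append(None)
        else:
            remap.append(len(dictionary))
            dictionary.append(value)
    indices = [None if i is None else remap[i] for i in arr.indices]
    return DictArray(indices, dictionary)


masked = DictArray([0, None, 1, 0], ["a", "b"])
encoded = to_encoded(masked)
print(encoded.indices, encoded.dictionary)
```

Neither direction looks up or re-hashes the dictionary values, which is what would make this cheaper than decoding and re-encoding.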
So that this does not change existing behavior, either the default NullEncodingBehavior should change to EXISTING_OR_ENCODE, or a separate option should be added to control it.
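A small sketch of how DictionaryEncode might decide what to do with input that is already dictionary-encoded, based only on the two null counts. The enum and `plan_for_dictionary_input` are hypothetical stand-ins; MASK and ENCODE mirror the two styles described above, and the meaning given to EXISTING_OR_ENCODE (leave such input untouched, as today) is an assumed interpretation of the proposal.

```python
from enum import Enum


class NullEncodingBehavior(Enum):
    """Hypothetical mirror of the option; not Arrow's actual definition."""
    MASK = "mask"                              # nulls in the indices' validity bitmap
    ENCODE = "encode"                          # null stored as a dictionary value
    EXISTING_OR_ENCODE = "existing_or_encode"  # proposed; meaning assumed below


def plan_for_dictionary_input(behavior: NullEncodingBehavior,
                              indices_null_count: int,
                              dictionary_null_count: int) -> str:
    """Return "keep", "convert_to_masked" or "convert_to_encoded" in O(1)."""
    if behavior is NullEncodingBehavior.EXISTING_OR_ENCODE:
        # Assumption: keep whatever style the input already uses (today's no-op).
        return "keep"
    if behavior is NullEncodingBehavior.MASK:
        # Masked form requires a dictionary with no null entries.
        return "keep" if dictionary_null_count == 0 else "convert_to_masked"
    # ENCODE: encoded form requires indices with no nulls.
    return "keep" if indices_null_count == 0 else "convert_to_encoded"


print(plan_for_dictionary_input(NullEncodingBehavior.ENCODE, 2, 0))
```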
Once this is done, partition.cc could be improved. It currently requires dictionaries to use encoded nulls; if it is given a dictionary that uses masked nulls, it decodes and re-encodes the dictionary, which is potentially costly. It could instead use the conversion described above.