Description
Just as ORC lets users choose which columns get bloom filters, it would be nice to have a way to specify the columns on which DICTIONARY_V2 encoding should be disabled.
Currently, whether a column is dictionary-encoded depends on the results of sampling the first row-stride within a stripe. A user who knows that a column's cardinality is too high for a dictionary to be effective could simply disable dictionary encoding on just that column, and avoid the cost of sampling in the first row-stride.
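The requested behavior can be sketched roughly as below. This is a minimal illustration in Python, not ORC's writer code: the function name `choose_encoding`, the `direct_columns` set, and the 0.8 distinct-ratio threshold are all assumptions made for the example. The point is only that a per-column override is checked before any sampling happens.

```python
from typing import Callable, Sequence, Set


def choose_encoding(
    column: str,
    sample_first_stride: Callable[[], Sequence[str]],
    direct_columns: Set[str],
    key_threshold: float = 0.8,  # assumed threshold, for illustration only
) -> str:
    """Pick an encoding for one column (hypothetical helper, not ORC's API)."""
    # User override: skip dictionary encoding without sampling at all.
    if column in direct_columns:
        return "DIRECT_V2"

    # Otherwise fall back to the sampling heuristic on the first row-stride.
    values = sample_first_stride()
    if not values:
        # Nothing to sample; this sketch arbitrarily falls back to direct.
        return "DIRECT_V2"

    # Ratio of distinct values to total values in the sample.
    distinct_ratio = len(set(values)) / len(values)
    return "DICTIONARY_V2" if distinct_ratio <= key_threshold else "DIRECT_V2"


# Usage: "uuid" is listed by the user, so its sampler is never invoked.
calls = []


def sample_uuid() -> Sequence[str]:
    calls.append("uuid")
    return ["a", "b", "c", "d"]


uuid_encoding = choose_encoding("uuid", sample_uuid, {"uuid"})
country_encoding = choose_encoding(
    "country", lambda: ["US", "US", "IN", "US"], {"uuid"}
)
```

Passing the sampler as a callable makes the saved cost explicit: for a column in `direct_columns`, the first row-stride is never inspected.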
Attachments
Issue Links
- is related to
  - SPARK-25635 Support selective direct encoding in native ORC write (Resolved)
  - ORC-308 Add function to get subtypes by name (Closed)
  - ORC-299 Improve heuristics for bailing on dictionary encoding (Open)