Details
-
Bug
-
Status: Open
-
Critical
-
Resolution: Unresolved
-
None
-
None
-
None
Description
The spec for DICTIONARY_ENCODING states that:
If the dictionary grows too big, whether in size or number of distinct values, the encoding will fall back to the plain encoding.
However, the parquet-mr implementation was deliberately changed to a different fallback mechanism in https://issues.apache.org/jira/browse/PARQUET-52
I'm assuming the parquet-mr implementation is authoritative here. But then the spec is incorrect and should be fixed to reflect expected behavior.