[PARQUET-2221] [Format] Encoding spec incorrect for dictionary fallback - ASF JIRA

The spec for DICTIONARY_ENCODING states that:

"If the dictionary grows too big, whether in size or number of distinct values, the encoding will fall back to the plain encoding."

However, in PARQUET-52 (https://issues.apache.org/jira/browse/PARQUET-52) the parquet-mr implementation was deliberately changed to use a different fallback mechanism.

I'm assuming the parquet-mr implementation is authoritative here. If so, the spec is incorrect and should be updated to reflect the expected behavior.
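
For reference, here is a minimal sketch of the fallback as the spec text describes it, not of what parquet-mr actually does. Everything in it is an assumption made for illustration: the class name `DictionaryWriter`, the two limits, and the `fallback_encoding` parameter are hypothetical and do not come from the spec or from parquet-mr. The fallback target is left as a parameter because the point of this issue is that the implementation may choose something other than plain.

```python
from dataclasses import dataclass, field


@dataclass
class DictionaryWriter:
    """Toy writer: dictionary-encode values until a limit is hit, then fall back."""

    max_entries: int = 4              # hypothetical limit on distinct values
    max_dict_bytes: int = 64          # hypothetical limit on dictionary size in bytes
    fallback_encoding: str = "PLAIN"  # what the spec text says; an implementation may differ
    dictionary: dict = field(default_factory=dict)  # value -> dictionary index
    dict_bytes: int = 0
    fell_back: bool = False
    output: list = field(default_factory=list)      # (encoding, payload) pairs

    def write(self, value: bytes) -> None:
        if not self.fell_back and value not in self.dictionary:
            too_many = len(self.dictionary) + 1 > self.max_entries
            too_big = self.dict_bytes + len(value) > self.max_dict_bytes
            if too_many or too_big:
                # The dictionary would grow too big: switch encodings from here on.
                self.fell_back = True
            else:
                self.dictionary[value] = len(self.dictionary)
                self.dict_bytes += len(value)

        if self.fell_back:
            self.output.append((self.fallback_encoding, value))
        else:
            self.output.append(("DICTIONARY", self.dictionary[value]))


writer = DictionaryWriter(max_entries=2)
for v in [b"a", b"b", b"a", b"c", b"a"]:
    writer.write(v)

for encoding, payload in writer.output:
    print(encoding, payload)
```

In this sketch, values written before the limit is reached stay dictionary-encoded, and once the writer has fallen back it never returns to the dictionary, even for values that are already in it. Whether that matches the intended behavior is exactly what the spec should state.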
