Parquet currently chooses the value writer for a column based on the column's type and the Parquet version. Value writers are responsible for writing out values with the appropriate encoding. For example, for the Boolean type we use BooleanPlainValuesWriter (v1.0) or RunLengthBitPackingHybridValuesWriter (v2.0). The code that makes these decisions lives in ParquetProperties.
As a result of this setup, the writer(s) (and hence the encoding) for each data type are hard-coded in the Parquet source code.
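A simplified, hypothetical model of the Boolean case above shows what "hard-coded" means in practice. The WriterVersion enum and the HardCodedWriterChoice class are standalone stand-ins written for this sketch, not the actual ParquetProperties code, and every type other than Boolean is omitted.

```java
// Hypothetical stand-in for the v1.0 / v2.0 split described above.
enum WriterVersion { PARQUET_1_0, PARQUET_2_0 }

final class HardCodedWriterChoice {
    private HardCodedWriterChoice() {}

    /**
     * Returns the simple name of the value writer used for a Boolean column.
     * The mapping is fixed in code: no configuration can change it.
     */
    static String booleanValuesWriter(WriterVersion version) {
        switch (version) {
            case PARQUET_1_0:
                return "BooleanPlainValuesWriter";
            case PARQUET_2_0:
                return "RunLengthBitPackingHybridValuesWriter";
            default:
                throw new IllegalArgumentException("Unknown writer version: " + version);
        }
    }
}
```

Because the choice is a plain switch on type and version, changing the encoding for a type currently means editing and rebuilding Parquet itself.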
It would be nice to be able to override the encodings per type via config. This would let users experiment with different encoding strategies manually, and override the hard-coded defaults when they don't suit their use case.
We can override encodings per data type (int32 / int64 / ...), for example through one config key per type whose value names the encoding to use.
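A minimal sketch of reading such per-type overrides from a flat configuration map follows. The key prefix parquet.writer.encoding-override.<type> and the EncodingOverrides class are hypothetical names chosen for illustration, not existing Parquet properties or API. The value is treated as a comma-separated, ordered list so that the same format also covers the primary + fallback case.

```java
import java.util.Arrays;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.stream.Collectors;

final class EncodingOverrides {
    // Hypothetical key prefix used only for this sketch.
    static final String PREFIX = "parquet.writer.encoding-override.";

    // Parquet primitive types an override may target.
    static final Set<String> PRIMITIVE_TYPES = Set.of(
            "boolean", "int32", "int64", "int96", "float", "double",
            "binary", "fixed_len_byte_array");

    private EncodingOverrides() {}

    /**
     * Collects per-type overrides from a flat config map, e.g.
     * "parquet.writer.encoding-override.int32" = "plain" becomes int32 -> [PLAIN].
     * Types without a key are absent from the result and keep their default writer.
     */
    static Map<String, List<String>> parse(Map<String, String> conf) {
        Map<String, List<String>> overrides = new HashMap<>();
        for (Map.Entry<String, String> entry : conf.entrySet()) {
            String key = entry.getKey();
            if (!key.startsWith(PREFIX)) {
                continue; // unrelated setting, leave it alone
            }
            String type = key.substring(PREFIX.length()).toLowerCase(Locale.ROOT);
            if (!PRIMITIVE_TYPES.contains(type)) {
                throw new IllegalArgumentException("Unknown primitive type in " + key);
            }
            // Comma-separated, ordered list of encoding names, normalised to upper case.
            List<String> encodings = Arrays.stream(entry.getValue().split(","))
                    .map(String::trim)
                    .filter(name -> !name.isEmpty())
                    .map(name -> name.toUpperCase(Locale.ROOT))
                    .collect(Collectors.toList());
            if (encodings.isEmpty()) {
                throw new IllegalArgumentException("No encoding given for " + key);
            }
            overrides.put(type, Collections.unmodifiableList(encodings));
        }
        return overrides;
    }
}
```

For instance, a map holding parquet.writer.encoding-override.int32 = plain plus an unrelated parquet.block.size entry parses to {int32=[PLAIN]}. The sketch does not check that an encoding is valid for the type it is applied to; a real implementation would have to reject combinations the chosen writer version cannot produce.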
When both a primary and a fallback encoding need to be specified, the config value can list them in order, primary first.
In such cases, we can mandate that the first encoding listed allows for fallback, i.e. that its writer implements RequiresFallback.
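The fallback rule could be enforced by validating each ordered encoding list once it has been parsed. In this hypothetical sketch, FallbackRule and SUPPORTS_FALLBACK are illustrative names, not Parquet API. The set stands in for "encodings whose writer implements RequiresFallback" and assumes that only the dictionary encodings qualify, mirroring parquet-mr where the dictionary writers are the ones implementing it. Allowing at most one fallback is also an assumption of the sketch.

```java
import java.util.List;
import java.util.Set;

final class FallbackRule {
    // Assumption: only these encodings have writers that implement RequiresFallback.
    static final Set<String> SUPPORTS_FALLBACK = Set.of("PLAIN_DICTIONARY", "RLE_DICTIONARY");

    private FallbackRule() {}

    /**
     * Validates an ordered encoding list such as [RLE_DICTIONARY, DELTA_BYTE_ARRAY].
     * A single encoding is always accepted. With a fallback, the primary (first)
     * encoding must support falling back, and only one fallback is allowed.
     */
    static void validate(String type, List<String> encodings) {
        if (encodings.isEmpty() || encodings.size() > 2) {
            throw new IllegalArgumentException(
                    type + ": expected 1 or 2 encodings, got " + encodings);
        }
        if (encodings.size() == 2 && !SUPPORTS_FALLBACK.contains(encodings.get(0))) {
            throw new IllegalArgumentException(type + ": primary encoding " + encodings.get(0)
                    + " does not implement RequiresFallback, so it cannot have a fallback");
        }
    }
}
```

Failing fast at configuration time keeps an invalid pair such as plain followed by delta_byte_array from surfacing later as a writer that never falls back.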