Details
Type: Wish
Status: Open
Priority: Minor
Resolution: Unresolved
Description
In very large datasets, packing several INT8 values into a single INT32 field (or a byte array) can make a big difference.
In Parquet, efficient encoding algorithms exist for INT32, so if the LogicalType is INT_8, the encoded integer might take up only one byte.
However, further optimizations could be made by allowing the user to specify the types more precisely.
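As a rough illustration of the aggregation meant here, the sketch below packs four signed 8-bit values into one signed 32-bit integer and unpacks them again. It is plain Python using the standard `struct` module, not Parquet API code; the function names `pack_int8x4` and `unpack_int8x4` and the little-endian byte order are assumptions made for this example.

```python
import struct


def pack_int8x4(values):
    """Pack four signed 8-bit integers into one signed 32-bit integer.

    Hypothetical helper for illustration; little-endian order is an assumption.
    """
    if len(values) != 4:
        raise ValueError("expected exactly four values")
    # "<4b" writes four signed bytes (struct.error if a value is outside -128..127);
    # "<i" reinterprets those same four bytes as one little-endian signed int32.
    return struct.unpack("<i", struct.pack("<4b", *values))[0]


def unpack_int8x4(packed):
    """Recover the four signed 8-bit integers from a packed signed 32-bit integer."""
    return list(struct.unpack("<4b", struct.pack("<i", packed)))
```

The packed value could then be stored in an ordinary INT32 column, at the cost of the reader having to know how the field was packed.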
What about a BYTE_ARRAY logical type, backed by the FIXED_LEN_BYTE_ARRAY physical type (or possibly INT32)?
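To make the fixed-length variant concrete, here is a minimal sketch of how a group of INT8 values might be laid out as a fixed-width byte string, one byte per value. Again this is plain Python with the standard `struct` module and not an existing Parquet feature; `pack_int8_fixed`, `unpack_int8_fixed`, and the `width` parameter are hypothetical names for this example.

```python
import struct


def pack_int8_fixed(values, width):
    """Serialize exactly `width` signed 8-bit integers into `width` bytes.

    Hypothetical helper; `width` plays the role of the fixed array length.
    """
    if len(values) != width:
        raise ValueError("number of values must equal the fixed width")
    # One signed byte per value; struct.error is raised if a value is outside -128..127.
    return struct.pack(f"{width}b", *values)


def unpack_int8_fixed(blob):
    """Decode a byte string produced by pack_int8_fixed back into a list of ints."""
    return list(struct.unpack(f"{len(blob)}b", blob))
```

Unlike the INT32 packing, this layout is not limited to four values per field, which is presumably why a byte-array backing is suggested first.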