Details
-
Improvement
-
Status: Resolved
-
Major
-
Resolution: Fixed
-
None
-
ghx-label-6
Description
Hadoop uses a native block format for LZ4 (same as parquet-mr api) which is incompatible with LZ4 block format.
As a result Parquet/LZ4 could have different block formats.
The parquet-cpp api (now Apache Arrow) uses LZ4 frame format, which is also incompatible with LZ4 block format.
The current decision is to use a format compatible with Hive, Spark and parquet-mr.