As described in HADOOP-12990, the Hadoop Lz4Codec uses the lz4 block format and prepends 8 extra bytes to the compressed data. I believe the lz4 implementation in parquet-cpp also uses the lz4 block format, but it does not prepend these 8 bytes.
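To make the difference concrete, here is a minimal sketch of the 8-byte prefix. It assumes the common case where Hadoop emits a single compressed chunk. In that case the prefix is the uncompressed length followed by the compressed length, each a big-endian 32-bit integer, and the raw lz4 block comes after it. Hadoop's block stream can also split large input into several chunks, and this sketch does not handle that. The helper names `add_hadoop_prefix` and `strip_hadoop_prefix` are hypothetical, and the payload is treated as opaque bytes rather than real lz4 output.

```python
import struct
from typing import Tuple

# Assumed Hadoop-style prefix: two big-endian unsigned 32-bit ints (8 bytes total):
# uncompressed length, then compressed length.
_PREFIX = struct.Struct(">II")


def add_hadoop_prefix(lz4_block: bytes, uncompressed_size: int) -> bytes:
    """Hypothetical helper: frame one raw lz4 block as a single Hadoop chunk."""
    return _PREFIX.pack(uncompressed_size, len(lz4_block)) + lz4_block


def strip_hadoop_prefix(data: bytes) -> Tuple[int, bytes]:
    """Hypothetical helper: return (uncompressed_size, raw lz4 block).

    Only handles the single-chunk case; multi-chunk Hadoop output is not parsed.
    """
    if len(data) < _PREFIX.size:
        raise ValueError("input shorter than the 8-byte Hadoop prefix")
    uncompressed_size, compressed_size = _PREFIX.unpack_from(data, 0)
    block = data[_PREFIX.size:_PREFIX.size + compressed_size]
    if len(block) != compressed_size:
        raise ValueError("compressed size in prefix exceeds available data")
    return uncompressed_size, block
```

A reader that expects a bare lz4 block would pass those first 8 bytes straight to the lz4 decompressor. That is consistent with the incompatibility described here.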
Using Java parquet-mr, I wrote a Parquet file with lz4 compression:
When I attempted to read this file with parquet-cpp, I got the following error:
https://github.com/apache/arrow/issues/3491 reported an incompatibility in the other direction: using Spark (which uses the Hadoop lz4 codec) to read a Parquet file that was written with parquet-cpp.
Given that the Hadoop lz4 codec has long been in use and users have accumulated Parquet files written with it, I propose changing parquet-cpp to match the Hadoop implementation.
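One possible shape for the reader side of such a change is sketched below under stated assumptions. The reader first tries to interpret a page as Hadoop-framed and falls back to a bare lz4 block, so that files already written by parquet-cpp stay readable. `decompress_lz4_page` is a hypothetical name. `decompress_block` stands in for whatever raw lz4 block decompressor is available, and `uncompressed_size` is assumed to come from the page header's uncompressed page size. The size checks are a heuristic: they make a false match unlikely but do not rule it out.

```python
import struct
from typing import Callable

# Assumed Hadoop prefix: big-endian uint32 uncompressed size, then compressed size.
_PREFIX = struct.Struct(">II")


def decompress_lz4_page(
    data: bytes,
    uncompressed_size: int,
    decompress_block: Callable[[bytes, int], bytes],
) -> bytes:
    """Hypothetical reader: try single-chunk Hadoop framing, else a raw lz4 block."""
    if len(data) >= _PREFIX.size:
        declared_u, declared_c = _PREFIX.unpack_from(data, 0)
        # Treat as Hadoop-framed only if both sizes agree with what we know.
        if declared_u == uncompressed_size and declared_c == len(data) - _PREFIX.size:
            try:
                out = decompress_block(data[_PREFIX.size:], uncompressed_size)
                if len(out) == uncompressed_size:
                    return out
            except Exception:
                pass  # prefix matched by chance; fall back to the raw-block path
    # No plausible Hadoop prefix: assume a bare lz4 block as parquet-cpp writes it.
    return decompress_block(data, uncompressed_size)
```

On the write side, matching Hadoop would mean emitting the same 8-byte prefix before each block.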