Description
Currently we use Hadoop 3.3.1's shaded client libraries. lz4-java is a provided dependency in Hadoop Common 3.3.1, used by Lz4Codec, but it is not excluded from relocation in these shaded artifacts. As a result, using lz4 as the Parquet codec hits the following exception even if we add lz4-java as a dependency:
[info] Cause: java.lang.NoClassDefFoundError: org/apache/hadoop/shaded/net/jpountz/lz4/LZ4Factory
[info]   at org.apache.hadoop.io.compress.lz4.Lz4Compressor.<init>(Lz4Compressor.java:66)
[info]   at org.apache.hadoop.io.compress.Lz4Codec.createCompressor(Lz4Codec.java:119)
[info]   at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:152)
[info]   at org.apache.hadoop.io.compress.CodecPool.getCompressor(CodecPool.java:168)
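The root cause can be seen without Spark at all: the shade plugin rewrites Lz4Codec's references from net.jpountz.lz4.LZ4Factory to the relocated name, but no class by that name exists on the classpath even when lz4-java is present. A minimal sketch (the object name is hypothetical, chosen for illustration):

```scala
// Sketch: attempt to load the relocated class name that the shaded
// Hadoop client bakes into Lz4Compressor. With only lz4-java on the
// classpath (which provides net.jpountz.lz4.LZ4Factory), the relocated
// name cannot be resolved, mirroring the NoClassDefFoundError above.
object RelocationDemo {
  def main(args: Array[String]): Unit = {
    val relocated = "org.apache.hadoop.shaded.net.jpountz.lz4.LZ4Factory"
    val found =
      try { Class.forName(relocated); true }
      catch { case _: ClassNotFoundException => false }
    println(if (found) s"found $relocated" else s"missing: $relocated")
  }
}
```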
I already submitted a PR to Hadoop to fix it. Until that fix is released, on the Spark side we should either downgrade to Hadoop 3.3.0 or revert to the non-shaded Hadoop client libraries.
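The downgrade option could look like the following sbt fragment (a sketch only; the version values and module choices are assumptions, and Spark's actual build pins the Hadoop version through its own build profiles):

```scala
// Hypothetical sbt settings: pin Hadoop back to 3.3.0 until the
// relocation fix ships in a Hadoop release.
val hadoopVersion = "3.3.0"

libraryDependencies ++= Seq(
  "org.apache.hadoop" % "hadoop-client-api"     % hadoopVersion,
  "org.apache.hadoop" % "hadoop-client-runtime" % hadoopVersion,
  // lz4-java is a provided dependency of Hadoop Common, so the
  // application must still supply it on the classpath itself.
  "org.lz4"           % "lz4-java"              % "1.7.1"
)
```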
Issue Links
- is related to HADOOP-17891 "lz4-java and snappy-java should be excluded from relocation in shaded Hadoop libraries" (Resolved)
- relates to SPARK-36679 "Remove lz4 hadoop wrapper classes after Hadoop 3.3.2" (Resolved)