Details
- Improvement
- Status: Resolved
- Minor
- Resolution: Fixed
- 3.4.0
- Reviewed
Description
maven-shade-plugin rewrites classes when moving them into the hadoop-client JARs. It does so even when shading does not require any change to the byte code of those classes.
We use a tool that checks the classpath for duplicate classes whose byte code is not identical. This tool flags classes brought in via Hadoop. Each flagged class came from two places: a JAR containing relocated classes (hadoop-client-api or -runtime) and the JAR those classes were copied from (hadoop-common or hadoop-shaded-guava). We checked, and the byte code for the same class does differ between the relocated and non-relocated JARs.
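The kind of check described above can be approximated with a few lines of Java. This is only a minimal sketch, not the tool we actually use: the class name `DuplicateClassCheck` and the method `differingClasses` are hypothetical, and the comparison is a plain byte-for-byte check of `.class` entries that share the same path in two JARs.

```java
import java.io.IOException;
import java.io.InputStream;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Enumeration;
import java.util.List;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;

// Hypothetical helper, for illustration only.
public class DuplicateClassCheck {

    /** Returns the names of .class entries present in both JARs whose bytes differ. */
    public static List<String> differingClasses(String jarA, String jarB) throws IOException {
        List<String> differing = new ArrayList<>();
        try (JarFile a = new JarFile(jarA); JarFile b = new JarFile(jarB)) {
            Enumeration<JarEntry> entries = a.entries();
            while (entries.hasMoreElements()) {
                JarEntry entryA = entries.nextElement();
                String name = entryA.getName();
                if (!name.endsWith(".class")) {
                    continue; // only class files are of interest
                }
                JarEntry entryB = b.getJarEntry(name);
                if (entryB == null) {
                    continue; // not a duplicate: the class exists in one JAR only
                }
                try (InputStream inA = a.getInputStream(entryA);
                     InputStream inB = b.getInputStream(entryB)) {
                    // Duplicate class name: flag it when the byte code is not identical.
                    if (!Arrays.equals(inA.readAllBytes(), inB.readAllBytes())) {
                        differing.add(name);
                    }
                }
            }
        }
        return differing;
    }

    public static void main(String[] args) throws IOException {
        if (args.length != 2) {
            System.err.println("usage: DuplicateClassCheck <first.jar> <second.jar>");
            return;
        }
        for (String name : differingClasses(args[0], args[1])) {
            System.out.println(name);
        }
    }
}
```

Run against, say, a hadoop-client-runtime JAR and a hadoop-common JAR, a check like this would list the classes that exist in both but are not byte-identical.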
This is because maven-shade-plugin, before 3.3.0, was rewriting class files even when the relocation was a "no-op". See MSHADE-391 and apache/maven-shade-plugin#95.
Maven Shade internally uses ASM's ClassRemapper and defines a custom Remapper subclass, which takes care of relocation, partially doing the work by itself and partially delegating to the ASM parent class. An ASM ClassReader reads each class file from the original JAR and unconditionally writes it into a ClassWriter, plugging in the transformer.
Even when not a single relocation (package name mapping) takes place, this transformation often produces binary differences between the original class and the transformed class, because the constant pool or the stack map frames have been adjusted. These adjustments do not change the functionality of the class, but they make it look as if something changed when class files are compared before and after the relocation process.
Upgrading to maven-shade-plugin 3.3.0 fixes the unnecessary rewrite of classes.
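To illustrate the general idea of skipping no-op rewrites, here is a minimal sketch in plain Java. It is not the plugin's actual code: `NoOpRelocationSketch`, `TrackingRelocator` and `chooseOutput` are hypothetical names, the relocation rule in the usage note is made up, and the real plugin works through ASM rather than simple string prefixes. The sketch only shows the decision: remember whether any name was really remapped, and keep the original bytes when none was.

```java
import java.util.Map;

// Hypothetical sketch, for illustration only; not maven-shade-plugin's implementation.
public class NoOpRelocationSketch {

    /** Prefix-based relocator that records whether it actually changed any name. */
    public static final class TrackingRelocator {
        private final Map<String, String> rules; // old prefix -> new prefix
        private boolean remapped = false;

        public TrackingRelocator(Map<String, String> rules) {
            this.rules = rules;
        }

        /** Maps an internal class name, e.g. "com/google/common/base/Joiner". */
        public String map(String internalName) {
            for (Map.Entry<String, String> rule : rules.entrySet()) {
                if (internalName.startsWith(rule.getKey())) {
                    remapped = true; // a real relocation took place
                    return rule.getValue() + internalName.substring(rule.getKey().length());
                }
            }
            return internalName; // no rule matched: the name is unchanged
        }

        public boolean remapped() {
            return remapped;
        }
    }

    /** Keeps the original bytes unless the relocator really changed something. */
    public static byte[] chooseOutput(byte[] original, byte[] rewritten, TrackingRelocator relocator) {
        return relocator.remapped() ? rewritten : original;
    }
}
```

With a made-up rule mapping `com/google/common/` to `org/example/shaded/guava/`, a class that only references `org/apache/hadoop/...` names never sets the flag, so `chooseOutput` returns the original bytes and the class stays byte-identical to its source JAR.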
Attachments